[HUDI-426] Implement Spark DataSource Support for querying bootstrapped tables - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.6.0
Component/s: spark
Labels:
- pull-request-available

Description

The Spark DataSource needs to be able to query a COW (copy-on-write) table that has been bootstrapped as described in RFC-12: https://cwiki.apache.org/confluence/display/HUDI/RFC+-+12+:+Efficient+Migration+of+Large+Parquet+Tables+to+Apache+Hudi#RFC-12:EfficientMigrationofLargeParquetTablestoApacheHudi-BootstrapIndex:

The current implementation delegates to the Parquet DataSource, but this won't work because we need to be able to stitch the columns externally.
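
To illustrate what "stitching the columns" could mean here, the following is a minimal Python sketch, not the actual Hudi implementation. It assumes, as a reading of RFC-12, that a bootstrapped file group keeps the Hudi metadata columns (such as `_hoodie_commit_time` and `_hoodie_record_key`) in one file and the original data columns in the source Parquet file, with rows aligned by position. The function name `stitch_rows` and the sample rows are hypothetical, and plain dictionaries stand in for rows read from the two files.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def stitch_rows(skeleton_rows: List[Row], data_rows: List[Row]) -> List[Row]:
    """Merge metadata rows and data rows positionally into full rows.

    Assumption: row i of the metadata ("skeleton") file describes row i of
    the source data file, so the two lists must have the same length.
    """
    if len(skeleton_rows) != len(data_rows):
        raise ValueError("skeleton and data files must have the same row count")

    stitched = []
    for meta, data in zip(skeleton_rows, data_rows):
        # The two column sets are expected to be disjoint; fail loudly if not.
        overlap = meta.keys() & data.keys()
        if overlap:
            raise ValueError(f"columns present in both files: {sorted(overlap)}")
        # Metadata columns first, then the original data columns.
        stitched.append({**meta, **data})
    return stitched


# Hypothetical sample rows standing in for the two files of one file group.
skeleton = [
    {"_hoodie_commit_time": "001", "_hoodie_record_key": "k1"},
    {"_hoodie_commit_time": "001", "_hoodie_record_key": "k2"},
]
source = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
]

rows = stitch_rows(skeleton, source)
```

A plain Parquet reader sees only one file at a time, which is why a read path that simply delegates to it cannot produce the combined rows shown in this sketch.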

Issue Links

links to

GitHub Pull Request #1475

GitHub Pull Request #1702

People

Assignee: Udit Mehrotra

Reporter: Balaji Varadarajan

Votes: 0

Watchers: 2

Dates

Created: 17/Dec/19 14:20

Updated: 10/Aug/20 22:56

Resolved: 10/Aug/20 22:56

Time Tracking

Estimated: Not Specified

Remaining: 0h

Logged: 10m