  1. Apache Hudi
  2. HUDI-242 [RFC-12] Support Efficient bootstrap of large parquet datasets to Hudi
  3. HUDI-426

Implement Spark DataSource Support for querying bootstrapped tables

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • None
    • 0.6.0
    • spark

    Description

      We need the ability in the Spark DataSource to query a COW (copy-on-write) table that was bootstrapped as per RFC-12: https://cwiki.apache.org/confluence/display/HUDI/RFC+-+12+:+Efficient+Migration+of+Large+Parquet+Tables+to+Apache+Hudi#RFC-12:EfficientMigrationofLargeParquetTablestoApacheHudi-BootstrapIndex:

       

      The current implementation delegates to the Parquet DataSource, but this won't work because we need the ability to stitch the columns externally.
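
      To illustrate what "stitching the columns externally" could mean, here is a minimal sketch in plain Python. It is not the Hudi implementation. It assumes, following RFC-12, that a bootstrapped file group has a skeleton file holding only the Hudi metadata columns and a source Parquet file holding the data columns, and that rows in the two files line up by position. The helper `stitch_rows` and the sample rows are hypothetical; only the `_hoodie_*` column names are taken from Hudi.

```python
from typing import Dict, Iterator, List

# Standard Hudi metadata column names.
HOODIE_META_COLUMNS = [
    "_hoodie_commit_time",
    "_hoodie_commit_seqno",
    "_hoodie_record_key",
    "_hoodie_partition_path",
    "_hoodie_file_name",
]

Row = Dict[str, object]


def stitch_rows(skeleton_rows: List[Row], data_rows: List[Row]) -> Iterator[Row]:
    """Hypothetical helper: merge skeleton and source rows by position.

    Assumption: row i of the skeleton file describes row i of the source file.
    """
    if len(skeleton_rows) != len(data_rows):
        raise ValueError("skeleton and source files must have the same row count")
    for meta, data in zip(skeleton_rows, data_rows):
        # Metadata columns come first, followed by the original data columns.
        merged: Row = {name: meta[name] for name in HOODIE_META_COLUMNS}
        merged.update(data)
        yield merged


# Made-up sample rows standing in for the contents of the two files.
skeleton = [
    {
        "_hoodie_commit_time": "00000000000001",
        "_hoodie_commit_seqno": "00000000000001_0_0",
        "_hoodie_record_key": "id-1",
        "_hoodie_partition_path": "2020/01/01",
        "_hoodie_file_name": "skeleton-0.parquet",
    },
    {
        "_hoodie_commit_time": "00000000000001",
        "_hoodie_commit_seqno": "00000000000001_0_1",
        "_hoodie_record_key": "id-2",
        "_hoodie_partition_path": "2020/01/01",
        "_hoodie_file_name": "skeleton-0.parquet",
    },
]
source = [
    {"id": "id-1", "value": 10},
    {"id": "id-2", "value": 20},
]

stitched = list(stitch_rows(skeleton, source))
```

      A plain Parquet reader sees only one of the two files at a time, which is why delegating to the Parquet DataSource cannot produce the combined rows on its own.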

       

          People

            uditme Udit Mehrotra
            vbalaji Balaji Varadarajan

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 10m