BEAM-7925

ParquetIO supports neither column projection nor filter predicate

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: P3
    • Resolution: Won't Fix
    • Affects Version/s: 2.14.0
    • Fix Version/s: Missing
    • Component/s: io-java-parquet
    • Labels: None

    Description

      The current ParquetIO supports neither column projection nor filter predicates, which defeats the performance motivation for using Parquet in the first place. That's why we have our own implementation of ParquetIO in Scio.
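
      To illustrate why these two features matter, here is a minimal sketch of a toy columnar reader. It is not ParquetIO or the Parquet library: the class ColumnarReadSketch, its RowGroup type, and the chunksDecoded counter are all hypothetical names invented for this example, and the per-column max statistic is a simplified stand-in for the statistics a real columnar file keeps per row group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ColumnarReadSketch {

    /** Toy row group (hypothetical): one array per column plus a per-column max statistic. */
    static final class RowGroup {
        final Map<String, long[]> columns = new HashMap<>();
        final Map<String, Long> max = new HashMap<>();

        RowGroup with(String name, long... values) {
            columns.put(name, values);
            max.put(name, Arrays.stream(values).max().orElse(Long.MIN_VALUE));
            return this;
        }
    }

    /** Number of column chunks decoded so far; a stand-in for I/O and decoding cost. */
    static int chunksDecoded = 0;

    static long[] decode(RowGroup group, String column) {
        chunksDecoded++;
        return group.columns.get(column);
    }

    /** Returns the projected columns of every row where filterColumn >= threshold. */
    static List<Map<String, Long>> read(
            List<RowGroup> groups, List<String> projection, String filterColumn, long threshold) {
        List<Map<String, Long>> rows = new ArrayList<>();
        for (RowGroup group : groups) {
            // Filter predicate: skip the whole group when its statistics rule out any match.
            if (group.max.get(filterColumn) < threshold) {
                continue;
            }
            long[] filterValues = decode(group, filterColumn);
            // Column projection: decode only the requested columns, nothing else.
            Map<String, long[]> decoded = new HashMap<>();
            for (String column : projection) {
                decoded.put(column, column.equals(filterColumn) ? filterValues : decode(group, column));
            }
            for (int i = 0; i < filterValues.length; i++) {
                if (filterValues[i] >= threshold) {
                    Map<String, Long> row = new HashMap<>();
                    for (String column : projection) {
                        row.put(column, decoded.get(column)[i]);
                    }
                    rows.add(row);
                }
            }
        }
        return rows;
    }
}
```

      With two row groups of three columns each, projecting a single column and filtering on another decodes only two of the six column chunks when the first group's statistics exclude it. A reader without projection or predicate support would decode all six.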

      Reading Parquet as Avro with column projection has some complications: the resulting Avro records may be incomplete and will not survive serialization/deserialization. A workaround may be to provide a TypedRead interface that takes a Function<A, B> that maps the invalid Avro record A into a user-defined type B.
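
      The proposed workaround could look roughly like the sketch below. This is not an existing Beam API: TypedReadSketch, PartialRecord, UserScore, and typedRead are hypothetical names, and PartialRecord is a plain map standing in for an Avro record whose non-projected fields are absent. The point is only that the Function<A, B> is applied at read time, so the incomplete record A never has to be serialized; only the complete user-defined type B leaves the reader.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class TypedReadSketch {

    /** Stand-in for a projected Avro record (type A): fields outside the projection are absent. */
    static final class PartialRecord {
        private final Map<String, Object> fields = new HashMap<>();

        PartialRecord set(String name, Object value) {
            fields.put(name, value);
            return this;
        }

        Object get(String name) {
            return fields.get(name);
        }
    }

    /** User-defined type B: holds only the projected fields, so every instance is complete. */
    static final class UserScore {
        final String user;
        final long score;

        UserScore(String user, long score) {
            this.user = user;
            this.score = score;
        }
    }

    /** Hypothetical typed read: maps each partial record to B before it leaves the reader. */
    static <B> List<B> typedRead(Iterable<PartialRecord> projected, Function<PartialRecord, B> mapFn) {
        List<B> out = new ArrayList<>();
        for (PartialRecord record : projected) {
            out.add(mapFn.apply(record));
        }
        return out;
    }
}
```

      A caller would pass a lambda such as r -> new UserScore((String) r.get("user"), (Long) r.get("score")), touching only the fields that were actually projected.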

            People

              Assignee: Unassigned
              Reporter: Neville Li (sinisa_lyh)
              Votes: 0
              Watchers: 6

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 3h 20m