Beam / BEAM-11908

Deprecate .withProjection from ParquetIO

Details

    • Type: Improvement
    • Status: Open
    • Priority: P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: io-java-parquet

    Description

      There are several problems with the withProjection API:

      1. The current API requires an extra encoderSchema, which is not needed to project data in Parquet. With the Parquet API, the simplest way to project is to pass the same projectionSchema as both the Avro read schema and the requested projection:

      AvroReadSupport.setAvroReadSchema(conf, projectionSchema);
      AvroReadSupport.setRequestedProjection(conf, projectionSchema);

      We could offer an alternative method `withProjection(Configuration conf, List<String> fields)` so users don't have to build their own projection Schema, but historically we have let users rely on the upstream connector API. If we follow that approach, we can better document in ParquetIO how to project fields using the Parquet APIs, and avoid maintaining this extra code on the Beam side.
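
      A minimal sketch of the field-list convenience done on the user side, assuming Avro 1.8+, parquet-avro and hadoop-common on the classpath. `ProjectionSchemas`, `project` and `withProjection` are hypothetical names, not existing Beam or Parquet APIs. The sketch only handles top-level fields and does not copy field aliases or custom properties. It builds the projection schema from field names, then applies it with the two `AvroReadSupport` calls from the Parquet example.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.hadoop.conf.Configuration;
import org.apache.parquet.avro.AvroReadSupport;

/** Hypothetical helper for illustration only; not part of Beam or Parquet. */
public final class ProjectionSchemas {

  private ProjectionSchemas() {}

  /**
   * Returns a record schema that keeps only the requested top-level fields of
   * fullSchema, in the order they appear in fullSchema.
   */
  public static Schema project(Schema fullSchema, List<String> fieldNames) {
    // Fail fast on typos instead of silently dropping unknown names.
    for (String name : fieldNames) {
      if (fullSchema.getField(name) == null) {
        throw new IllegalArgumentException("Unknown field: " + name);
      }
    }
    List<Schema.Field> projected = new ArrayList<>();
    for (Schema.Field field : fullSchema.getFields()) {
      if (fieldNames.contains(field.name())) {
        // Avro Field objects cannot be shared between schemas, so copy each one.
        projected.add(
            new Schema.Field(field.name(), field.schema(), field.doc(), field.defaultVal()));
      }
    }
    // Keep the original record name and namespace for the projected record.
    return Schema.createRecord(
        fullSchema.getName(),
        fullSchema.getDoc(),
        fullSchema.getNamespace(),
        fullSchema.isError(),
        projected);
  }

  /** Sets the same projection schema as both the Avro read schema and the requested projection. */
  public static Configuration withProjection(
      Configuration conf, Schema fullSchema, List<String> fieldNames) {
    Schema projection = project(fullSchema, fieldNames);
    AvroReadSupport.setAvroReadSchema(conf, projection);
    AvroReadSupport.setRequestedProjection(conf, projection);
    return conf;
  }
}
```

      Validating the names up front is a design choice in this sketch: a misspelled field fails immediately instead of producing records that silently lack that field.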

       

          People

            Assignee: Unassigned
            Reporter: Ismaël Mejía (iemejia)
            Votes: 0
            Watchers: 7
