Beam / BEAM-13104

ParquetIO.read(): potential data loss while using FilterPredicate and withSplit()

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P1
    • Resolution: Fixed
    • Affects Version/s: 2.25.0
    • Fix Version/s: 2.34.0
    • Component/s: io-java-parquet
    • Labels: None

    Description

      If ParquetIO.read()

      • is used with withConfiguration() and a FilterPredicate,
      • is used with withSplit(),
      • and the records matching the filter are not at the beginning of the block being read,

      then those records are skipped, which causes data loss.
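
      The failure mode described above can be pictured with a toy model. This is only a sketch under a stated assumption: it assumes the loss comes from a per-block read loop that stops at the first record rejected by the filter, which this report does not confirm as the actual cause in ParquetIO. The class and method names (SplitFilterSketch, readBlockStoppingAtFirstMiss, readBlockSkippingMisses) are hypothetical and use no Beam or Parquet APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Toy model only: a "block" is an int array and the filter is an IntPredicate.
// Nothing here is taken from Beam's ParquetIO implementation.
class SplitFilterSketch {

    // ASSUMED buggy loop: the first record rejected by the filter is treated
    // as the end of the block, so matching records later in the block are lost.
    static List<Integer> readBlockStoppingAtFirstMiss(int[] block, IntPredicate filter) {
        List<Integer> out = new ArrayList<>();
        for (int record : block) {
            if (!filter.test(record)) {
                break; // assumed bug: stops instead of skipping this record
            }
            out.add(record);
        }
        return out;
    }

    // Expected behaviour: skip rejected records and keep scanning the whole block.
    static List<Integer> readBlockSkippingMisses(int[] block, IntPredicate filter) {
        List<Integer> out = new ArrayList<>();
        for (int record : block) {
            if (filter.test(record)) {
                out.add(record);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] block = {1, 2, 3, 4, 5, 6};
        // The matching records (4, 5, 6) are not at the beginning of the block.
        IntPredicate keepAtLeastFour = r -> r >= 4;
        System.out.println(readBlockStoppingAtFirstMiss(block, keepAtLeastFour)); // prints []
        System.out.println(readBlockSkippingMisses(block, keepAtLeastFour));      // prints [4, 5, 6]
    }
}
```

      In this model, a block whose matching records come first would be read correctly by both loops, which is consistent with the third condition in the list above.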

          People

            Assignee: Alexey Romanenko (aromanenko)
            Reporter: Alexey Romanenko (aromanenko)

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 1.5h