Hadoop Map/Reduce / MAPREDUCE-7182

MapReduce input format/record readers to support S3 select queries


    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: mrv2
    • Labels: None
    • Target Version/s:

      Description

      HADOOP-15229 adds S3 Select support through the new asynchronous openFile() API, but the classic RecordReader and related classes can't handle its output, because:

      1. the data returned by a select query is shorter than the file length reported by getFileStatus(), and the readers assume that an EOFException in that situation is an error;
      2. everything assumes that plain text is splittable, which does not hold for S3 Select output;
      3. if a file has a .gz extension, the readers assume the gzip codec should be used, which breaks on transcoded or uncompressed select output.
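
      A minimal sketch of the decision that points 2 and 3 call for, written in Java since Hadoop is. This is not Hadoop's API: the class SelectReadPolicy and the selectSource flag (meaning "this stream is S3 Select output, not the raw object bytes") are hypothetical, and the splitting rule is simplified to plain text versus .gz, whereas TextInputFormat actually asks the resolved codec whether it is splittable.

```java
import java.util.Locale;

/** Hypothetical helper, not part of Hadoop. */
final class SelectReadPolicy {
    private SelectReadPolicy() {}

    private static boolean hasGzExtension(String path) {
        return path.toLowerCase(Locale.ROOT).endsWith(".gz");
    }

    /**
     * Point 3: only pick the gzip codec from the file extension when reading
     * the raw object bytes; select output has already been transcoded.
     */
    static boolean useGzipCodec(String path, boolean selectSource) {
        return !selectSource && hasGzExtension(path);
    }

    /**
     * Point 2: S3 Select output is never split, because its length and byte
     * offsets differ from the stored object. Otherwise (simplified here)
     * plain text is splittable and gzipped text is not.
     */
    static boolean isSplitable(String path, boolean selectSource) {
        return !selectSource && !hasGzExtension(path);
    }
}
```

      For example, with selectSource set, a query over a hypothetical s3a://bucket/data.csv.gz would be read as one unsplit stream with no codec applied.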

      To handle S3 Select data sources, we need to be able to address these issues, either through changes to the existing code (risky?) or through some new readers.
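
      One possible shape for a new reader, as a hedged JDK-only sketch rather than a proposed implementation: ShortStreamLineReader and its methods are hypothetical. readExactly() mirrors the assumption in point 1 by calling DataInputStream.readFully() for the full declared length, which throws EOFException when the stream is shorter. readLines() instead treats the declared length (for example, from getFileStatus()) only as an upper bound and stops cleanly at end of stream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical JDK-only sketch; not a Hadoop class. */
final class ShortStreamLineReader {
    private ShortStreamLineReader() {}

    /**
     * The assumption from point 1: read exactly declaredLength bytes.
     * DataInputStream.readFully throws EOFException if the stream ends first.
     */
    static byte[] readExactly(InputStream in, long declaredLength) throws IOException {
        byte[] buf = new byte[Math.toIntExact(declaredLength)];
        new DataInputStream(in).readFully(buf);
        return buf;
    }

    /**
     * Tolerant alternative: declaredLength is only an upper bound, and reaching
     * end of stream earlier simply ends the data. Lines are split on '\n' and
     * decoded as UTF-8.
     */
    static List<String> readLines(InputStream in, long declaredLength) throws IOException {
        List<String> lines = new ArrayList<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        InputStream buffered = new BufferedInputStream(in);
        long consumed = 0;
        while (consumed < declaredLength) {
            int b = buffered.read();
            if (b == -1) {
                break; // shorter than declared: expected for S3 Select output
            }
            consumed++;
            if (b == '\n') {
                lines.add(current.toString(StandardCharsets.UTF_8.name()));
                current.reset();
            } else {
                current.write(b);
            }
        }
        if (current.size() > 0) {
            // Emit a final line that had no trailing '\n'.
            lines.add(current.toString(StandardCharsets.UTF_8.name()));
        }
        return lines;
    }
}
```

      The sketch splits only on '\n' and reads one byte at a time through a buffer; a real reader would also need to handle "\r\n" line endings and split boundaries.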

              People

              • Assignee: Unassigned
                Reporter: Steve Loughran (stevel@apache.org)
              • Votes: 0
                Watchers: 2

                Dates

                • Created:
                  Updated: