Hadoop Map/Reduce / MAPREDUCE-6800

FileInputFormat.singleThreadedListStatus to use listFiles(recursive)


Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.7.3
    • Fix Version/s: None
    • Component/s: mrv2
    • Labels: None

    Description

      FileInputFormat.singleThreadedListStatus walks the directory tree recursively, one directory at a time, to pick the files to scan. This is very inefficient on object stores, and can be avoided if listFiles(recursive=true) is used instead.

      Based on the experience of SPARK-2984, the listing should also be resilient to a source file going away during the iteration, downgrading a FileNotFoundException (FNFE) to a "skip that nonexistent path".
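
      A minimal sketch of the proposed behaviour, not a patch: in Hadoop, FileSystem.listFiles(Path, boolean) returns a RemoteIterator<LocatedFileStatus> whose hasNext() and next() may throw IOException. To stay runnable without Hadoop on the classpath, the code below uses a hypothetical stand-in interface (RemoteIter) with the same shape, plus a hypothetical fake listing and made-up s3a:// paths. It also assumes that a next() call which fails with an FNFE still advances the iterator; a real implementation would have to verify that for each filesystem.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class ListingSketch {

  /** Hypothetical stand-in for Hadoop's RemoteIterator: both calls may throw IOException. */
  interface RemoteIter<T> {
    boolean hasNext() throws IOException;
    T next() throws IOException;
  }

  /**
   * Drains a flat (already recursive) listing into a list, skipping any entry
   * that raises FileNotFoundException because the file vanished mid-iteration.
   * Assumption: a failed next() has still consumed the missing entry.
   */
  static <T> List<T> collectSkippingMissing(RemoteIter<T> listing) throws IOException {
    List<T> found = new ArrayList<>();
    while (listing.hasNext()) {
      try {
        found.add(listing.next());
      } catch (FileNotFoundException e) {
        // Downgrade: the path no longer exists, so skip it instead of failing the listing.
      }
    }
    return found;
  }

  /** Illustration only: a listing in which the given paths disappear before they are returned. */
  static RemoteIter<String> fakeListing(List<String> paths, List<String> deleted) {
    Iterator<String> it = paths.iterator();
    return new RemoteIter<String>() {
      public boolean hasNext() {
        return it.hasNext();
      }

      public String next() throws IOException {
        String p = it.next(); // advance first, so a failure cannot loop forever
        if (deleted.contains(p)) {
          throw new FileNotFoundException(p);
        }
        return p;
      }
    };
  }

  public static void main(String[] args) throws IOException {
    // Made-up paths; b.csv is "deleted" while the listing is in progress.
    List<String> paths = Arrays.asList(
        "s3a://bucket/in/a.csv",
        "s3a://bucket/in/sub/b.csv",
        "s3a://bucket/in/sub/c.csv");
    List<String> deleted = Arrays.asList("s3a://bucket/in/sub/b.csv");
    System.out.println(collectSkippingMissing(fakeListing(paths, deleted)));
  }
}
```

      Only FileNotFoundException is downgraded; any other IOException still propagates, so genuine listing failures are not hidden.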

          People

            Assignee: Unassigned
            Reporter: Steve Loughran (stevel@apache.org)
