SPARK-32674

Add suggestion for parallel directory listing in tuning doc


Details

    • Type: Documentation
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 2.4.7, 3.0.1, 3.1.0
    • Component/s: Documentation
    • Labels: None

    Description

      Directory listing can become a bottleneck when a job has a large number of input directories. This is especially true when reading from an object store such as S3.

      A few configuration parameters can be tuned to mitigate this. This issue proposes documenting them in the tuning guide so that the knowledge is shared more widely.
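
      The issue text does not name the parameters. As a rough sketch, the settings below are the ones commonly associated with parallel listing in Spark and Hadoop; treating them as the parameters meant here is an assumption, as are the values shown and the job file name `my_job.py`. The helper `to_submit_args` is hypothetical and only renders the settings as `spark-submit` arguments.

```python
# Illustrative values only (assumptions); tune them for your own workload.
listing_conf = {
    # Threads Hadoop's FileInputFormat uses to list input paths.
    "spark.hadoop.mapreduce.input.fileinputformat.list-status.num-threads": "16",
    # Number of input paths above which Spark SQL lists files
    # with a distributed job instead of on the driver.
    "spark.sql.sources.parallelPartitionDiscovery.threshold": "32",
    # Upper bound on the parallelism of that distributed listing job.
    "spark.sql.sources.parallelPartitionDiscovery.parallelism": "100",
}


def to_submit_args(conf):
    """Render a config dict as a flat list of spark-submit --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# Print a command line that could be pasted into a shell.
# "my_job.py" is a placeholder application file.
print(" ".join(["spark-submit"] + to_submit_args(listing_conf) + ["my_job.py"]))
```

      The same keys could instead be placed in `spark-defaults.conf` or set on the session builder; which setting matters depends on whether the job reads through a Hadoop input format or through a Spark SQL file-based data source.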

          People

            Assignee: Chao Sun (csun)
            Reporter: Chao Sun (csun)
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved: