SPARK-27534

Do not load `content` column in binary data source if it is not selected

Details

    • Type: Story
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      A follow-up to SPARK-25348. To save I/O, the binary file data source should not read a file's bytes when the query does not select the `content` column. For example, the following query needs only the `length` column:

      spark.read.format("binaryFile").load(path).filter($"length" < 1000000).count()
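
      The idea can be illustrated without Spark. The following is a minimal sketch in plain Scala, not Spark's actual implementation: the object `BinaryFilePruningSketch`, the method `readRow`, and the `contentReads` counter are hypothetical names introduced here. It assumes the reader is handed the list of requested columns and touches the file's bytes only when `content` is among them; `length` and `modificationTime` come from file-system metadata.

```scala
import java.nio.file.{Files, Path}

// Hypothetical sketch, not Spark's implementation: build one row per file and
// read the file's bytes only if the `content` column was requested.
object BinaryFilePruningSketch {
  // Counts how often file bytes were actually read (for illustration only).
  var contentReads: Int = 0

  // Computes the value of a single column for the given file.
  private def columnValue(file: Path, column: String): Any = column match {
    case "path"             => file.toUri.toString
    case "modificationTime" => Files.getLastModifiedTime(file).toMillis
    case "length"           => Files.size(file) // metadata only, no data read
    case "content" =>
      contentReads += 1
      Files.readAllBytes(file) // the only branch that reads the file's data
    case other =>
      throw new IllegalArgumentException(s"Unknown column: $other")
  }

  // Returns a column -> value map restricted to the requested columns.
  def readRow(file: Path, requiredColumns: Seq[String]): Map[String, Any] =
    requiredColumns.map(c => c -> columnValue(file, c)).toMap
}
```

      With this shape, a query like the one above would call `readRow(file, Seq("length"))`, which never reaches the `content` branch, so the cost per file is a metadata lookup rather than a full read.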
      

            People

              Assignee: Weichen Xu (weichenxu123)
              Reporter: Xiangrui Meng (mengxr)