Spark / SPARK-30281

'archive' option in FileStreamSource fails to consider the partitioned and recursive options

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: Structured Streaming
    • Labels: None

    Description

      The cleanup option for FileStreamSource was introduced in SPARK-20568.

      To simplify verification of the archive path, the implementation assumed that FileStreamSource only reads files that meet one of two conditions: 1) the file's parent directory matches the source pattern, or 2) the file itself matches the source pattern.

      A post-hoc review found cases that invalidate this assumption: partitioned sources and the recursive option. With either of these, FileStreamSource can read arbitrary files in subdirectories under paths that match the source pattern, so simply checking the depth of the archive path does not work.

      We need to restore the path check logic, even though it will not be easy to explain to end users.
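The following is a minimal sketch of why a depth-only check is insufficient, written in plain Python rather than against Spark's code. It assumes that archiving prepends the archive directory to the file's original path and that a recursive or partitioned source reads every file below any directory matching the source pattern. All function names (`archived_path`, `depth_only_check`, `under_source_pattern`) are hypothetical and do not correspond to Spark's implementation.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def archived_path(archive_dir: str, source_file: str) -> str:
    """Assumed layout: the archive dir is prepended to the file's original path."""
    return archive_dir.rstrip("/") + source_file


def depth(path: str) -> int:
    """Number of path components below the root."""
    return len([part for part in PurePosixPath(path).parts if part != "/"])


def depth_only_check(path: str, source_glob: str) -> bool:
    """Simplified check: treat a path as safe if it lies deeper than
    'file matches the pattern' or 'parent directory matches the pattern'."""
    return depth(path) > depth(source_glob) + 1


def under_source_pattern(path: str, source_glob: str) -> bool:
    """True if the leading components of `path` match `source_glob`
    component by component, i.e. a recursive read would pick the file up."""
    path_parts = PurePosixPath(path).parts
    glob_parts = PurePosixPath(source_glob).parts
    if len(path_parts) < len(glob_parts):
        return False
    return all(fnmatchcase(p, g) for p, g in zip(path_parts, glob_parts))


source_glob = "/data/*"
source_file = "/data/logs/2019/a.txt"

# Archive dir located under a directory that matches the source pattern.
bad = archived_path("/data/archive", source_file)
# Archive dir located outside anything the source pattern can match.
good = archived_path("/archive", source_file)

print(bad, depth_only_check(bad, source_glob), under_source_pattern(bad, source_glob))
print(good, depth_only_check(good, source_glob), under_source_pattern(good, source_glob))
```

In this sketch the archived file under `/data/archive` passes the depth-only check yet still sits below a directory matching `/data/*`, so a recursive source would read it again. Comparing the path against the pattern component by component catches that case, while the archive under `/archive` is correctly treated as safe.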

            People

              Assignee: Jungtaek Lim (kabhwan)
              Reporter: Jungtaek Lim (kabhwan)
              Votes: 0
              Watchers: 2
