Beam / BEAM-14267

Update watchForNewFiles to allow re-reading already-read files that have a new timestamp

Details

    • Type: New Feature
    • Status: In Progress
    • Priority: P2
    • Resolution: Unresolved
    • Component/s: io-java-files

Description

      TextIO and AvroIO have a configuration option called watchForNewFiles, and FileIO.MatchConfiguration has a corresponding option called watchInterval. Currently, these options match all files that satisfy the file pattern and then periodically poll for new files. A file is considered new only if its filename differs from the names of all files that have already been read.

      We want to add an option that also treats a file as new when its timestamp differs from that of a previously read file with the same name. This allows a file that is rewritten in place to be read again.
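The difference between the current and the proposed behavior can be viewed as a change in the key used to decide whether a matched file was already read. The Java sketch below is a hypothetical in-memory model. It is not Beam's implementation. The class NewFileDetector, the FileInfo holder, and the matchUpdatedFiles flag are illustrative names. The sketch also assumes that "timestamp" means the file's last-modified time. A real matcher would obtain names and timestamps from the filesystem on each poll rather than from a list.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical model (not Beam's implementation) of how a polling file matcher
 * could decide which matched files to emit on each poll.
 */
final class NewFileDetector {

  /** Hypothetical stand-in for matched file metadata: name plus last-modified time in millis. */
  static final class FileInfo {
    final String filename;
    final long lastModifiedMillis;

    FileInfo(String filename, long lastModifiedMillis) {
      this.filename = filename;
      this.lastModifiedMillis = lastModifiedMillis;
    }
  }

  // false: a file is new only if its name was never seen (current behavior).
  // true: a known name with a different timestamp also counts as new (proposed option).
  private final boolean matchUpdatedFiles;

  // Keys of every file version emitted so far; this set only grows.
  private final Set<Object> seenKeys = new HashSet<>();

  NewFileDetector(boolean matchUpdatedFiles) {
    this.matchUpdatedFiles = matchUpdatedFiles;
  }

  // Deduplication key: the name alone, or the (name, timestamp) pair.
  private Object keyOf(FileInfo file) {
    return matchUpdatedFiles
        ? List.<Object>of(file.filename, file.lastModifiedMillis)
        : file.filename;
  }

  /** Returns the names of files in this poll that were not emitted before, and records them. */
  List<String> poll(List<FileInfo> matched) {
    List<String> fresh = new ArrayList<>();
    for (FileInfo file : matched) {
      // Set.add returns false when the key is already present, i.e. the file is not new.
      if (seenKeys.add(keyOf(file))) {
        fresh.add(file.filename);
      }
    }
    return fresh;
  }
}
```

With matchUpdatedFiles set to false, a second poll that sees a.txt with a newer timestamp emits nothing. With it set to true, that poll emits a.txt again. One consequence of keying on (name, timestamp) is that every rewrite of a file adds another entry to the set of seen keys. In that mode, the retained state grows with the number of file versions, not the number of distinct files.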

      See the following design doc for more detail:

      https://docs.google.com/document/d/1xnacyLGNh6rbPGgTAh5D1gZVR8rHUBsMMRV3YkvlL08/edit?usp=sharing&resourcekey=0-be0uF-DdmwAz6Vg4Li9FNw

People

    • Assignee: Yi Hu (yihu)
    • Reporter: Yi Hu (yihu)
    • Votes: 0
    • Watchers: 2

Time Tracking

    • Estimated: Not Specified
    • Remaining: 0h
    • Logged: 3h 40m