Hadoop Map/Reduce / MAPREDUCE-7101

Add config parameter to allow JHS to always scan user dir irrespective of modTime

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.10.0, 3.2.0, 3.1.1
    • Component/s: None
    • Labels: None

    Description

      Currently, the JHS scans a user directory only if the directory's modification time has changed:

       
          public synchronized void scanIfNeeded(FileStatus fs) {
            long newModTime = fs.getModificationTime();
            if (modTime != newModTime) {
              <... omitted some logics ...>
              // reset scanTime before scanning happens
              scanTime = System.currentTimeMillis();
              Path p = fs.getPath();
              try {
                scanIntermediateDirectory(p);
      

      This logic relies on the assumption that a directory's modification time is updated whenever a file is placed under the directory.

      However, the semantics of a directory's modification time are not consistent across FS implementations. For example, MAPREDUCE-6680 fixed some issues with truncated modification times, and HADOOP-12837 mentions that on S3 a directory's modification time is always 0.

      I think we need to revisit the behavior of this logic so that it works more robustly on different file systems.
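      To illustrate the problem and the direction suggested by the issue title, here is a minimal sketch, not the actual JHS code. The class name `UserDirScanSketch`, the `alwaysScan` flag, and the scan counter are hypothetical, and the initial modification time of 0 is an assumption made for the example. It shows how a modTime-only check never scans when the file system always reports 0, while a configurable "always scan" switch bypasses the check.

```java
/** Hypothetical sketch of the scan decision; not the real JHS implementation. */
class UserDirScanSketch {
    // Hypothetical switch standing in for the proposed config parameter.
    private final boolean alwaysScan;
    // Last modification time seen for the directory (assumed to start at 0).
    private long modTime = 0L;
    // Number of scans performed; stands in for the real directory scan.
    private int scanCount = 0;

    UserDirScanSketch(boolean alwaysScan) {
        this.alwaysScan = alwaysScan;
    }

    /** Returns true if a scan was performed for the reported modification time. */
    synchronized boolean scanIfNeeded(long newModTime) {
        if (alwaysScan || modTime != newModTime) {
            modTime = newModTime;
            scanCount++; // a real implementation would scan the directory here
            return true;
        }
        return false;
    }

    synchronized int getScanCount() {
        return scanCount;
    }

    public static void main(String[] args) {
        // Simulate a file system that always reports modification time 0.
        UserDirScanSketch modTimeOnly = new UserDirScanSketch(false);
        UserDirScanSketch always = new UserDirScanSketch(true);
        for (int i = 0; i < 3; i++) {
            modTimeOnly.scanIfNeeded(0L);
            always.scanIfNeeded(0L);
        }
        System.out.println("modTime-only scans: " + modTimeOnly.getScanCount());
        System.out.println("always-scan scans: " + always.getScanCount());
    }
}
```

      The trade-off in this sketch is that always scanning does more work per pass, which is why it is modeled as an opt-in flag rather than a change to the default behavior.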

      Attachments

        1. MAPREDUCE-7101.001.patch
          4 kB
          Arun Suresh
        2. MAPREDUCE-7101.001.patch
          4 kB
          Thomas Marquardt

            People

              Assignee: Thomas Marquardt (tmarquardt)
              Reporter: Wangda Tan (leftnoteasy)
              Votes: 0
              Watchers: 8
