Hadoop Map/Reduce / MAPREDUCE-6680

JHS UserLogDir scan algorithm can sometimes skip a directory with updates in CloudFS (Azure FileSystem, S3, etc.)


Details

    • Reviewed

    Description

      In our cluster, which is based on a cloud file system, we noticed that the JHS sometimes skips a directory containing a .jhist file during scanning.
      The behavior is as follows:
      In the first scan round, no .jhist file is found:

      16/04/13 11:14:34 DEBUG azure.NativeAzureFileSystem: Found path as a directory with 6 files in it.
      16/04/13 11:14:34 DEBUG hs.HistoryFileManager: Found 0 files
      ...
      

      Then we see "Scan not needed of ..." for the same directory every 3 minutes until the application fails due to a timeout.

      From our analysis, the root cause is that most cloud file systems (Azure FS, S3, etc.) truncate file/directory modification times to seconds instead of milliseconds, which could be due to a limitation of the HTTP protocol (see the discussion at: https://forums.aws.amazon.com/thread.jspa?messageID=476615).

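      The issue can appear with the following time sequence: the latest non-.jhist modification of the directory happens at T1, the directory scan happens at T2, and the .jhist file is added to the directory at T3. If T1 < T2 < T3 and T1 equals T3 after truncation to seconds, the directory's reported modification time does not change between scans, so every later scan is skipped and the .jhist file is never picked up.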
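
      The time sequence above can be illustrated with a minimal sketch. This is a simplified model, not the actual HistoryFileManager code: the class ScanSkipSketch, its field and its methods are hypothetical names, the timestamps are arbitrary, and the sketch assumes the scanner rescans a directory only when the reported modification time differs from the last value it saw.

```java
/** Hypothetical, simplified model of a directory scanner keyed on modification time. */
class ScanSkipSketch {
    // Last directory modification time seen by the scanner (hypothetical field).
    private long lastSeenModTime = -1L;

    /** Models a store that reports modification times with one-second granularity. */
    static long truncateToSeconds(long millis) {
        return (millis / 1000L) * 1000L;
    }

    /** Returns true if a scan is performed, false for the "Scan not needed" case. */
    boolean scanIfNeeded(long reportedDirModTime) {
        if (reportedDirModTime != lastSeenModTime) {
            lastSeenModTime = reportedDirModTime;
            return true;
        }
        return false;
    }

    /**
     * Replays the sequence from the description: the directory is last modified
     * at t1, scanned, then a .jhist file is added at t3 and the directory is
     * scanned again. Returns true if the second scan is skipped.
     */
    static boolean secondScanSkipped(long t1, long t3, boolean truncating) {
        ScanSkipSketch scanner = new ScanSkipSketch();
        long firstReported = truncating ? truncateToSeconds(t1) : t1;
        scanner.scanIfNeeded(firstReported);           // scan at T2, no .jhist yet
        long secondReported = truncating ? truncateToSeconds(t3) : t3;
        return !scanner.scanIfNeeded(secondReported);  // scan after .jhist was added
    }

    public static void main(String[] args) {
        long t1 = 34_100L; // last non-.jhist change (arbitrary milliseconds)
        long t3 = 34_900L; // .jhist added within the same second
        System.out.println("truncating store skips second scan: "
                + secondScanSkipped(t1, t3, true));
        System.out.println("millisecond store skips second scan: "
                + secondScanSkipped(t1, t3, false));
    }
}
```

      In this model the skip only happens when T1 and T3 fall within the same second on a store that truncates; with millisecond precision, or when T3 lands in a later second, the second scan still runs.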

      Attachments

        1. MAPREDUCE-6680-v3.patch
          2 kB
          Junping Du
        2. MAPREDUCE-6680-v2.patch
          2 kB
          Junping Du
        3. MAPREDUCE-6680.patch
          2 kB
          Junping Du

            People

              junping_du Junping Du
              junping_du Junping Du
              Votes: 0
              Watchers: 4
