HBase / HBASE-28222

Leak in ExportSnapshot during verifySnapshot on S3A

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.6.0, 3.0.0-beta-1
    • Component/s: None
    • Labels: None
    • Release Note:
      ExportSnapshot now uses FileSystems from the global FileSystem cache, and as such does not close those FileSystems when it finishes. Users who run ExportSnapshot repeatedly in a single process against different FileSystem URLs should call FileSystem.closeAll() between runs. See JIRA for details.

    Description

      Each S3AFileSystem creates an S3AInstrumentation and various metrics sources, with no real way to disable that. HADOOP-18526 fixed a bug so that these are no longer leaked, but to benefit from that fix you must call S3AFileSystem.close() when you are done with the instance.

      In ExportSnapshot, ever since HBASE-12819 we set fs.impl.disable.cache to true. It looks like that was added in order to prevent conflicting calls to close() between mapper and main thread when running in a single JVM.

      When verifySnapshot is enabled, SnapshotReferenceUtil.verifySnapshot iterates over all store files (possibly many thousands) and calls SnapshotReferenceUtil.verifyStoreFile on each one. verifyStoreFile makes a number of static calls which end up in CommonFSUtils.getRootDir, which calls Path.getFileSystem().

      Since the FS cache is disabled, every single call to Path.getFileSystem() creates a new FileSystem instance. That FS is short-lived and gets GC'd without close() ever being called on it. In the case of S3AFileSystem, this means the instrumentation and metrics sources of every such instance are leaked.
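      The mechanism can be illustrated with a small toy model. This is only a sketch: ToyFileSystem, REGISTERED_SOURCES, CACHE, and getFileSystem below are hypothetical stand-ins and not Hadoop or HBase classes. The model assumes that each instance registers something in a process-wide registry on creation and removes it only in close().

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these types are Hadoop or HBase classes.
class FsCacheLeakSketch {
  // Hypothetical stand-in for a process-wide metrics registry.
  static final Set<Integer> REGISTERED_SOURCES = new HashSet<>();
  // Hypothetical stand-in for the global FileSystem cache, keyed by URI.
  static final Map<String, ToyFileSystem> CACHE = new HashMap<>();
  static int nextId = 0;

  static final class ToyFileSystem implements AutoCloseable {
    final int id = nextId++;

    ToyFileSystem() {
      // Registration happens on creation ...
      REGISTERED_SOURCES.add(id);
    }

    @Override
    public void close() {
      // ... and is only undone by an explicit close().
      REGISTERED_SOURCES.remove(id);
    }
  }

  // With the cache disabled, every lookup builds a brand new instance.
  static ToyFileSystem getFileSystem(String uri, boolean cacheDisabled) {
    if (cacheDisabled) {
      return new ToyFileSystem();
    }
    return CACHE.computeIfAbsent(uri, u -> new ToyFileSystem());
  }

  // Simulates one lookup per store file and returns how many
  // metrics sources are still registered afterwards.
  static int sourcesAfterVerify(int storeFiles, boolean cacheDisabled) {
    REGISTERED_SOURCES.clear();
    CACHE.clear();
    for (int i = 0; i < storeFiles; i++) {
      // The returned instance is dropped without close(), as in the report.
      getFileSystem("s3a://bucket/hbase", cacheDisabled);
    }
    return REGISTERED_SOURCES.size();
  }

  public static void main(String[] args) {
    System.out.println("cache disabled: " + sourcesAfterVerify(5000, true));
    System.out.println("cache enabled:  " + sourcesAfterVerify(5000, false));
  }
}
```

      In this model the number of leftover registrations grows with the number of store files when the cache is disabled, and stays at one when lookups go through the cache.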

      We have two easy possible fixes:

      1. Only set fs.impl.disable.cache when running Hadoop in local mode, since that was the original problem.
      2. When calling verifySnapshot, create a new Configuration which does not include the fs.impl.disable.cache setting.

      I tested out #2 in my environment and it fixed the leak.
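      As a rough sketch of the idea behind option 2, the snippet below models a configuration as a plain Map rather than a Hadoop Configuration. The class and method names (VerifyConfSketch, confForVerify, cacheDisabled) are hypothetical, and this is not the actual patch. It only shows deriving a separate configuration for verification without touching the one used for the export itself.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: a Map stands in for a Hadoop Configuration.
class VerifyConfSketch {
  // Key name as written in the issue description.
  static final String DISABLE_CACHE_KEY = "fs.impl.disable.cache";

  // Returns a copy of exportConf without the disable-cache override.
  // The original map is left untouched, so the export keeps its setting.
  static Map<String, String> confForVerify(Map<String, String> exportConf) {
    Map<String, String> verifyConf = new HashMap<>(exportConf);
    verifyConf.remove(DISABLE_CACHE_KEY);
    return verifyConf;
  }

  // An absent key is treated as "cache enabled" in this model.
  static boolean cacheDisabled(Map<String, String> conf) {
    return Boolean.parseBoolean(conf.getOrDefault(DISABLE_CACHE_KEY, "false"));
  }

  public static void main(String[] args) {
    Map<String, String> exportConf = new HashMap<>();
    exportConf.put(DISABLE_CACHE_KEY, "true");

    Map<String, String> verifyConf = confForVerify(exportConf);
    System.out.println("export cache disabled: " + cacheDisabled(exportConf));
    System.out.println("verify cache disabled: " + cacheDisabled(verifyConf));
  }
}
```

      With a real Hadoop Configuration, the equivalent would presumably be a copy made with the copy constructor followed by unset() on the key, but that mapping is an assumption here rather than something taken from the fix.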

            People

              Assignee: Bryan Beaudreault (bbeaudreault)
              Reporter: Bryan Beaudreault (bbeaudreault)