Solr: SOLR-11381

HdfsDirectoryFactory throws NPE on cleanup because file system has been closed

Details

    • Type: Bug
    • Status: Closed
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 8.1, 9.0
    • Component/s: Hadoop Integration, hdfs
    • Labels: None

    Description

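      I saw this happen in tests related to autoscaling. Cleanup of old index directories is triggered on core close and runs in a separate thread, which creates a race condition: the filesystem can be closed before the cleanup starts running. When that happens, listing the index directories fails with "java.io.IOException: Filesystem closed", the error is logged, and the cleanup then fails with an NPE (see the log below).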
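The race can be reproduced in plain Java without Hadoop. This is a minimal sketch under stated assumptions: `FakeFileSystem` is a hypothetical stand-in for the HDFS `FileSystem`, and `cleanupOldIndexDirectories` only guesses at the shape suggested by the log (the `IOException` is caught and logged, then the method dereferences the missing listing). It is not Solr's code. A `CountDownLatch` forces the losing interleaving (filesystem closed before cleanup starts) so the failure shows up on every run instead of intermittently.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

class RaceDemo {
    // Hypothetical stand-in for an HDFS FileSystem: listing fails once it is closed,
    // similar to DFSClient.checkOpen in the stack trace.
    static class FakeFileSystem {
        private volatile boolean closed = false;

        String[] listStatus(String path) throws IOException {
            if (closed) throw new IOException("Filesystem closed");
            return new String[] {"index.20170101", "index.20170102"};
        }

        void close() { closed = true; }
    }

    // Assumed shape of the buggy cleanup: the IOException is logged, but the
    // method keeps going and dereferences the listing that never arrived.
    static void cleanupOldIndexDirectories(FakeFileSystem fs) {
        String[] entries = null;
        try {
            entries = fs.listStatus("/solr/data"); // hypothetical path
        } catch (IOException e) {
            System.err.println("Error checking for old index directories to clean-up: " + e.getMessage());
        }
        System.out.println("found " + entries.length + " entries"); // NPE if listing failed
    }

    // Simulates core close: cleanup runs on its own thread while the filesystem is closed.
    // The latch makes the cleanup wait until close() has happened.
    static Throwable closeCore() {
        FakeFileSystem fs = new FakeFileSystem();
        CountDownLatch fsClosed = new CountDownLatch(1);
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread cleanup = new Thread(() -> {
            try {
                fsClosed.await();
                cleanupOldIndexDirectories(fs);
            } catch (Throwable t) {
                failure.set(t); // record whatever killed the cleanup thread
            }
        }, "OldIndexDirectoryCleanupThread");
        cleanup.start();
        fs.close();
        fsClosed.countDown();
        try {
            cleanup.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return failure.get();
    }

    public static void main(String[] args) {
        System.out.println("cleanup thread failed with: " + closeCore());
    }
}
```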

      Fixing the NPE is simple, but I think this is a real bug: old index directories can be left behind on HDFS. I don't know enough about HDFS to investigate further, so I'm leaving it here for interested people to pitch in.
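One possible shape for a fix, sketched with a hypothetical `FakeFileSystem` (the class, paths and `index.<timestamp>` directory names are made up for illustration; this is not the patch that resolved this issue). It combines two separate changes: returning early when the listing fails, which removes the NPE but leaves old directories in place, and waiting for the cleanup task to finish before closing the filesystem, which removes the race itself.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CleanupFixSketch {
    // Hypothetical stand-in for the HDFS FileSystem; delete() just records the path.
    static class FakeFileSystem {
        private volatile boolean closed = false;
        private final String[] dirs = {"index.20170101", "index.20170102", "index.20170103"};
        final List<String> deleted = Collections.synchronizedList(new ArrayList<>());

        String[] listStatus(String path) throws IOException { checkOpen(); return dirs.clone(); }
        void delete(String dir) throws IOException { checkOpen(); deleted.add(dir); }
        void close() { closed = true; }

        private void checkOpen() throws IOException {
            if (closed) throw new IOException("Filesystem closed");
        }
    }

    // Guard fix: if the listing fails, log and return instead of continuing with null.
    static int cleanupOldIndexDirectories(FakeFileSystem fs, String currentIndex) {
        String[] entries;
        try {
            entries = fs.listStatus("/solr/data"); // hypothetical path
        } catch (IOException e) {
            System.err.println("Error checking for old index directories to clean-up: " + e.getMessage());
            return 0; // nothing deleted; old directories remain for a later attempt
        }
        int removed = 0;
        for (String dir : entries) {
            if (dir.startsWith("index.") && !dir.equals(currentIndex)) {
                try {
                    fs.delete(dir);
                    removed++;
                } catch (IOException e) {
                    System.err.println("Failed to delete " + dir + ": " + e.getMessage());
                }
            }
        }
        return removed;
    }

    // Ordering fix: close the filesystem only after the cleanup task has completed.
    static int closeCore(FakeFileSystem fs) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Callable<Integer> task = () -> cleanupOldIndexDirectories(fs, "index.20170103");
            Future<Integer> cleanup = executor.submit(task);
            int removed = cleanup.get(); // block until cleanup finishes
            fs.close();
            return removed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        FakeFileSystem fs = new FakeFileSystem();
        System.out.println("removed before close: " + closeCore(fs));
        System.out.println("removed after close: " + cleanupOldIndexDirectories(fs, "index.20170103"));
    }
}
```

The guard alone only turns the NPE into a logged skip; whenever the close still wins the race, old directories stay on HDFS. Blocking close on the cleanup trades a slower core close for not leaking directories.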

      105029 ERROR (OldIndexDirectoryCleanupThreadForCore-control_collection_shard1_replica_n1) [n:127.0.0.1:58542_ c:control_collection s:shard1 r:core_node2 x:control_collection_shard1_replica_n1] o.a.s.c.HdfsDirectoryFactory Error checking for old index directories to clean-up.
      java.io.IOException: Filesystem closed
      	at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
      	at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2083)
      	at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2069)
      	at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:791)
      	at org.apache.hadoop.hdfs.DistributedFileSystem.access$700(DistributedFileSystem.java:106)
      	at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:853)
      	at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:849)
      	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
      	at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:860)
      	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1517)
      	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1557)
      	at org.apache.solr.core.HdfsDirectoryFactory.cleanupOldIndexDirectories(HdfsDirectoryFactory.java:540)
      	at org.apache.solr.core.SolrCore.lambda$cleanupOldIndexDirectories$32(SolrCore.java:3019)
      	at java.lang.Thread.run(Thread.java:745)
      105030 ERROR (OldIndexDirectoryCleanupThreadForCore-control_collection_shard1_replica_n1) [n:127.0.0.1:58542_ c:control_collection s:shard1 r:core_node2 x:control_collection_shard1_replica_n1] o.a.s.c.SolrCore Failed to cleanup old index directories for core control_collection_shard1_replica_n1
      java.lang.NullPointerException
      	at org.apache.solr.core.HdfsDirectoryFactory.cleanupOldIndexDirectories(HdfsDirectoryFactory.java:558)
      	at org.apache.solr.core.SolrCore.lambda$cleanupOldIndexDirectories$32(SolrCore.java:3019)
      	at java.lang.Thread.run(Thread.java:745)
      

            People

              Assignee: Kevin Risden (krisden)
              Reporter: Shalin Shekhar Mangar (shalin)
              Votes: 0
              Watchers: 5
