HIVE-22690

When directories are deleted from HDFS while MSCK is running, it fails with FileNotFoundException


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.1.1
    • Fix Version/s: None
    • Component/s: Hive
    • Labels: None

    Description

      Assume an external table `emp` defined as follows:

      create external table 
          emp (id int, name string) 
      partitioned by 
          (dept string)
      location
          'hdfs://namenode.com:8020/hive/data/db/emp'
      ;

      Create, say, 1000 partition directories in HDFS, as sketched below.
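
      For illustration, the directories could be created straight in HDFS with the Hadoop FileSystem API. This is only a sketch; the class name and the partition values (dept=d0000 ... dept=d0999) are made up:

      import java.net.URI;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class CreatePartitionDirs {
        public static void main(String[] args) throws Exception {
          // Connect to the same namenode as in the table's LOCATION clause
          FileSystem fs = FileSystem.get(URI.create("hdfs://namenode.com:8020"), new Configuration());
          Path tableDir = new Path("/hive/data/db/emp");
          for (int i = 0; i < 1000; i++) {
            // One directory per partition, e.g. /hive/data/db/emp/dept=d0042.
            // The directories exist only in HDFS; the metastore does not know about them yet.
            fs.mkdirs(new Path(tableDir, String.format("dept=d%04d", i)));
          }
          fs.close();
        }
      }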


      Now, to synchronize the MetaStore, we run the MSCK command and, in parallel, delete the HDFS partition directories. At some point MSCK fails with FileNotFoundException; a minimal reproduction sketch follows, and then the stack trace.
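
      Here is a reproduction sketch, assuming hive-jdbc on the classpath and a HiveServer2 at the made-up address hs2.example.com:10000. One thread runs MSCK while the main thread deletes the directories created above:

      import java.net.URI;
      import java.sql.Connection;
      import java.sql.DriverManager;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class MsckDeleteRace {
        public static void main(String[] args) throws Exception {
          // Thread 1: ask Hive to discover the partitions.
          Thread msck = new Thread(() -> {
            try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hs2.example.com:10000/db", "", "")) {
              conn.createStatement().execute("MSCK REPAIR TABLE emp");
            } catch (Exception e) {
              e.printStackTrace(); // fails with the FileNotFoundException below
            }
          });
          msck.start();

          // Thread 2 (main): delete partition directories while MSCK is walking them.
          FileSystem fs = FileSystem.get(URI.create("hdfs://namenode.com:8020"), new Configuration());
          for (int i = 0; i < 1000; i++) {
            fs.delete(new Path("/hive/data/db/emp/" + String.format("dept=d%04d", i)), true);
          }
          msck.join();
          fs.close();
        }
      }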


      2019-12-10 23:21:50,027 WARN  hive.ql.exec.DDLTask: [HiveServer2-Background-Pool: Thread-500224]: Failed to run metacheck: 
      org.apache.hadoop.hive.ql.metadata.HiveException: java.io.FileNotFoundException: File hdfs://namenode.com:8020/hive/data/db/emp/dept=CS does not exist.
      	at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkPartitionDirs(HiveMetaStoreChecker.java:554) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
      	at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkPartitionDirs(HiveMetaStoreChecker.java:443) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
      	at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.findUnknownPartitions(HiveMetaStoreChecker.java:334) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
      	at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:310) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
      	at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:253) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
      	at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkMetastore(HiveMetaStoreChecker.java:118) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
      	at org.apache.hadoop.hive.ql.exec.DDLTask.msck(DDLTask.java:1862) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
      	at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:413) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
      	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
      	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
      	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
      	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
      	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
      	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
      	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1334) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
      	at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256) [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
      	at org.apache.hive.service.cli.operation.SQLOperation.access$600(SQLOperation.java:92) [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
      	at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345) [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
      	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_121]
      	at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_121]
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) [hadoop-common-3.0.0-cdh6.2.1.jar:?]
      	at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:357) [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_121]
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_121]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_121]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_121]
      	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
      Caused by: java.io.FileNotFoundException: File hdfs://namenode.com:8020/hive/data/db/emp/dept=CS does not exist.
      	at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:985) ~[hadoop-hdfs-client-3.0.0-cdh6.2.1.jar:?]
      	at org.apache.hadoop.hdfs.DistributedFileSystem.access$1000(DistributedFileSystem.java:121) ~[hadoop-hdfs-client-3.0.0-cdh6.2.1.jar:?]
      	at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1045) ~[hadoop-hdfs-client-3.0.0-cdh6.2.1.jar:?]
      	at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1042) ~[hadoop-hdfs-client-3.0.0-cdh6.2.1.jar:?]
      	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-3.0.0-cdh6.2.1.jar:?]
      	at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:1052) ~[hadoop-hdfs-client-3.0.0-cdh6.2.1.jar:?]
      	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1853) ~[hadoop-common-3.0.0-cdh6.2.1.jar:?]
      	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1895) ~[hadoop-common-3.0.0-cdh6.2.1.jar:?]
      	at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker$PathDepthInfoCallable.processPathDepthInfo(HiveMetaStoreChecker.java:474) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
      	at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker$PathDepthInfoCallable.call(HiveMetaStoreChecker.java:467) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
      	at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker$PathDepthInfoCallable.call(HiveMetaStoreChecker.java:448) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
      	... 4 more
      

      I analyzed the stack trace and found that the problem is in HiveMetaStoreChecker::processPathDepthInfo [1].

      What the checker does here is:

      1. Create a queue.
      2. Put the table's data directory in the queue.
      3. Start a few threads that take directories from the queue, list their contents, and add newly discovered subdirectories back to the queue.

      This process has a race condition. Say there are 1000 first-level directories and 1000*500 second-level directories; then a substantial amount of time can pass between a path being put in the queue and its contents being listed. That window is large enough for an external HDFS delete to remove the directory, which produces exactly the failure above. A simplified model of the walk follows.
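
      This is a simplified, single-threaded model of the walk, not the actual Hive code; the real checker uses a pool of worker threads, but the gap between enqueueing a path and listing it is the same:

      import java.net.URI;
      import java.util.ArrayDeque;
      import java.util.Queue;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileStatus;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class PartitionDirWalk {
        public static void main(String[] args) throws Exception {
          FileSystem fs = FileSystem.get(URI.create("hdfs://namenode.com:8020"), new Configuration());
          Queue<Path> queue = new ArrayDeque<>();
          queue.add(new Path("/hive/data/db/emp")); // seed with the table directory

          while (!queue.isEmpty()) {
            Path dir = queue.poll();
            // `dir` may have sat in the queue for a long time. If it was deleted
            // in the meantime, listStatus throws FileNotFoundException and the
            // whole check dies.
            for (FileStatus child : fs.listStatus(dir)) {
              if (child.isDirectory()) {
                queue.add(child.getPath()); // discovered now, listed much later
              }
            }
          }
          fs.close();
        }
      }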


      Possible improvements:

      1. [best, in my opinion] Catch the exception, skip the vanished directory, and perhaps log it at DEBUG level (see the sketch after this list).
      2. Check that the directory exists before listing its contents. Note that this only narrows the window: the directory can still be deleted between the existence check and the listing.
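
      Here is a sketch of improvement 1 applied to the model above; it is not a patch against the actual Hive code. A directory that vanished between discovery and listing is skipped and logged at DEBUG:

      import java.io.FileNotFoundException;
      import java.io.IOException;
      import java.util.Queue;

      import org.apache.hadoop.fs.FileStatus;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.slf4j.Logger;
      import org.slf4j.LoggerFactory;

      public class TolerantDirLister {
        private static final Logger LOG = LoggerFactory.getLogger(TolerantDirLister.class);

        // Lists `dir` and enqueues its subdirectories; a directory deleted after
        // being discovered is treated as empty instead of failing the whole check.
        static void listInto(FileSystem fs, Path dir, Queue<Path> queue) throws IOException {
          FileStatus[] children;
          try {
            children = fs.listStatus(dir);
          } catch (FileNotFoundException e) {
            LOG.debug("Directory {} vanished during the scan, skipping", dir, e);
            return;
          }
          for (FileStatus child : children) {
            if (child.isDirectory()) {
              queue.add(child.getPath());
            }
          }
        }
      }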


      References:

      [1] https://github.com/apache/hive/blob/01faca2f9d7dcb0f5feabfcb07fa5ea12b79c5b9/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java#L474


          People

            Assignee: Madhusoodan (mmpataki)
            Reporter: Madhusoodan (mmpataki)
            Votes: 0
            Watchers: 1
