
[IMPALA-11503] Dropping files of an Iceberg table in HiveCatalog will cause DROP TABLE to fail


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Impala 4.1.0
    • Fix Version/s: None
    • Component/s: Frontend
    • Labels: ghx-label-9

    Description

      When the files of an Iceberg table are dropped, a subsequent DROP TABLE results in an error and the table still shows up in SHOW TABLES.
      Here are the steps to repro:

      1) Run from Impala-shell

      DROP DATABASE IF EXISTS `drop_incomplete_table2` CASCADE;
      CREATE DATABASE `drop_incomplete_table2`;
      CREATE TABLE drop_incomplete_table2.iceberg_tbl (i int) stored as iceberg;
      INSERT INTO drop_incomplete_table2.iceberg_tbl VALUES (1), (2), (3); 

      2) Drop the folder of the table with hdfs dfs

      hdfs dfs -rm -r hdfs://localhost:20500/test-warehouse/drop_incomplete_table2.db/iceberg_tbl 

      3) Try to drop the table from Impala-shell

      DROP TABLE drop_incomplete_table2.iceberg_tbl;
      

      This results in the following error:

      ERROR: NotFoundException: Failed to open input stream for file: hdfs://localhost:20500/test-warehouse/drop_incomplete_table2.db/iceberg_tbl/metadata/00001-e2568132-d74d-44c2-9b7f-8838453e5944.metadata.json
      CAUSED BY: FileNotFoundException: File does not exist: /test-warehouse/drop_incomplete_table2.db/iceberg_tbl/metadata/00001-e2568132-d74d-44c2-9b7f-8838453e5944.metadata.json
          at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:87)
          at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:77)
          at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:159)
          at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2040)
          at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:737)
          at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:454)
          at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
          at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
          at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
          at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989)
          at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:422)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
          at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2894)
      CAUSED BY: RemoteException: File does not exist: /test-warehouse/drop_incomplete_table2.db/iceberg_tbl/metadata/00001-e2568132-d74d-44c2-9b7f-8838453e5944.metadata.json
          at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:87)
          at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:77)
          at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:159)
          at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2040)
          at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:737)
          at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:454)
          at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
          at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
          at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
          at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989)
          at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:422)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
          at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2894) 

      Meanwhile, the table still appears in the SHOW TABLES output, even after an INVALIDATE METADATA.

      Note: for the repro it is important to execute some SQL against the newly created table so that Impala loads it. Here I used an INSERT INTO, but e.g. an ALTER TABLE would work as well. When the table is still "incomplete" (the state right after CREATE TABLE), the DROP TABLE works fine, but it fails once the table has been loaded.
      The suspicious part of the code is in StmtMetadataLoader.loadTables() and getMissingTables(), where there is a distinction between loaded and incomplete tables:
      https://github.com/apache/impala/blob/2f74e956aa10db5af6a7cdc47e2ad42f63d5030f/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java#L196
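
      Below is a minimal, hypothetical Java sketch of that distinction, not Impala's actual code: only the loadTables()/getMissingTables() names come from the real StmtMetadataLoader, while the class and field names (SketchTable, loaded, etc.) are illustrative stand-ins. It just models the idea that only tables which are still incomplete get requested from the catalog again, while tables loaded earlier (e.g. by the INSERT in the repro) are reused as-is.

      // Hypothetical, simplified model of the loaded-vs-incomplete distinction
      // made by StmtMetadataLoader.loadTables()/getMissingTables(); the types
      // below are illustrative stand-ins, not Impala's real classes.
      import java.util.ArrayList;
      import java.util.Arrays;
      import java.util.List;

      public class MissingTablesSketch {

        /** Minimal stand-in for a table reference seen by the statement loader. */
        static final class SketchTable {
          final String name;
          final boolean loaded;  // false ~ "incomplete", the state right after CREATE TABLE
          SketchTable(String name, boolean loaded) {
            this.name = name;
            this.loaded = loaded;
          }
        }

        /**
         * Models getMissingTables(): only tables that are still incomplete are
         * returned, i.e. requested from the catalog again; tables that were
         * loaded earlier keep the metadata they captured at load time.
         */
        static List<SketchTable> getMissingTables(List<SketchTable> referenced) {
          List<SketchTable> missing = new ArrayList<>();
          for (SketchTable t : referenced) {
            if (!t.loaded) missing.add(t);
          }
          return missing;
        }

        public static void main(String[] args) {
          SketchTable incomplete = new SketchTable("iceberg_tbl right after CREATE TABLE", false);
          SketchTable loaded = new SketchTable("iceberg_tbl after INSERT INTO", true);

          for (SketchTable t : Arrays.asList(incomplete, loaded)) {
            boolean requestedAgain = !getMissingTables(Arrays.asList(t)).isEmpty();
            // In the repro above, DROP TABLE succeeds for the incomplete table but
            // fails with NotFoundException for the loaded one, which is why this
            // distinction looks suspicious.
            System.out.printf("%-42s -> requested from catalog again: %s%n",
                t.name, requestedAgain);
          }
        }
      }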

       

      Note 2: this issue is quite similar to https://issues.apache.org/jira/browse/IMPALA-11502, but the repro steps and the error are somewhat different.

People

    • Assignee: Unassigned
    • Reporter: Gabor Kaszab
