Details
- Type: Bug
- Status: Resolved
- Priority: Critical
- Resolution: Fixed
- Impala 4.3.0
- None
Description
REFRESH <table> PARTITION can fail while firing RELOAD events (when --enable_reload_events=true) if the partition is dropped by a concurrent DDL. The failure is a NullPointerException:
E0229 15:04:25.578933 7381 JniUtil.java:183] 824a23c46a6f71de:78a2f3dc00000000] Error in REFRESH TABLE default.part_tbl PARTITIONS issued by quanlong. Time spent: 1s061ms
I0229 15:04:25.579373 7381 jni-util.cc:302] 824a23c46a6f71de:78a2f3dc00000000] java.lang.NullPointerException
    at org.apache.impala.catalog.HdfsPartition.access$500(HdfsPartition.java:101)
    at org.apache.impala.catalog.HdfsPartition$Builder.<init>(HdfsPartition.java:1314)
    at org.apache.impala.service.CatalogOpExecutor.fireReloadEventAndUpdateRefreshEventId(CatalogOpExecutor.java:6810)
    at org.apache.impala.service.CatalogOpExecutor.execResetMetadataImpl(CatalogOpExecutor.java:6744)
    at org.apache.impala.service.CatalogOpExecutor.execResetMetadata(CatalogOpExecutor.java:6612)
    at org.apache.impala.service.JniCatalog.lambda$resetMetadata$4(JniCatalog.java:327)
    at org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
    at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
    at org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
    at org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:100)
    at org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:243)
    at org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:257)
    at org.apache.impala.service.JniCatalog.resetMetadata(JniCatalog.java:326)
The problem is that execResetMetadataImpl() does not hold the table lock for the whole operation. It holds the lock while reloading the metadata, releases it, and then acquires it again to fire the RELOAD events. In the window between the two, a concurrent DDL can drop the partition. Firing the RELOAD events then fails because the partition can no longer be found.
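The lock gap described above can be illustrated with a small toy model. This is only a sketch: ToyTable, reloadPartition, dropPartition, fireReloadEventUnsafe and fireReloadEventGuarded are hypothetical names invented for illustration, not Impala classes or methods. The interleaving is replayed sequentially so the outcome is deterministic. The guarded variant shows one possible way to tolerate a dropped partition and is not necessarily how the actual fix works.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.locks.ReentrantLock;

/** Toy model of the race. All names here are hypothetical, not Impala code. */
public class ReloadRaceSketch {
  /** Stand-in for a table: partition name -> partition metadata. */
  static final class ToyTable {
    final ReentrantLock lock = new ReentrantLock();
    final Map<String, String> partitions = new HashMap<>();
  }

  /** Phase 1: reload the partition metadata while holding the table lock. */
  static void reloadPartition(ToyTable tbl, String name) {
    tbl.lock.lock();
    try {
      tbl.partitions.computeIfPresent(name, (k, v) -> v + "+reloaded");
    } finally {
      tbl.lock.unlock(); // the lock is released here, which opens the window
    }
  }

  /** Stand-in for a concurrent ALTER TABLE ... DROP PARTITION. */
  static void dropPartition(ToyTable tbl, String name) {
    tbl.lock.lock();
    try {
      tbl.partitions.remove(name);
    } finally {
      tbl.lock.unlock();
    }
  }

  /** Phase 2 (unsafe): re-acquire the lock and assume the partition still exists. */
  static String fireReloadEventUnsafe(ToyTable tbl, String name) {
    tbl.lock.lock();
    try {
      String part = tbl.partitions.get(name); // null if the partition was dropped
      return "RELOAD(" + name + ", metadataLength=" + part.length() + ")";
    } finally {
      tbl.lock.unlock();
    }
  }

  /** Phase 2 (guarded): re-check for the partition after re-acquiring the lock. */
  static Optional<String> fireReloadEventGuarded(ToyTable tbl, String name) {
    tbl.lock.lock();
    try {
      String part = tbl.partitions.get(name);
      if (part == null) return Optional.empty(); // partition gone: skip the event
      return Optional.of("RELOAD(" + name + ", metadataLength=" + part.length() + ")");
    } finally {
      tbl.lock.unlock();
    }
  }

  /** Replays the interleaving: reload, concurrent drop, then fire the event. */
  static String runScenario() {
    ToyTable tbl = new ToyTable();
    tbl.partitions.put("p=1", "meta");
    reloadPartition(tbl, "p=1");
    dropPartition(tbl, "p=1"); // the concurrent DDL lands between the two phases

    String unsafe;
    try {
      unsafe = fireReloadEventUnsafe(tbl, "p=1");
    } catch (NullPointerException e) {
      unsafe = "NPE";
    }
    String guarded = fireReloadEventGuarded(tbl, "p=1").orElse("skipped");
    return "unsafe=" + unsafe + ", guarded=" + guarded;
  }

  public static void main(String[] args) {
    System.out.println(runScenario()); // prints: unsafe=NPE, guarded=skipped
  }
}
```

In this model, the other way to close the window would be to hold the lock across both phases, at the cost of blocking concurrent DDLs on the table for longer.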
Reproducing the issue
To reproduce the issue, start catalogd with --enable_reload_events=true:
bin/start-impala-cluster.py --catalogd_args="--enable_reload_events=true"
Create a partitioned table
create table part_tbl (i int) partitioned by (p int);
Run a loop that repeatedly adds and drops a partition on this table:
while true; do impala-shell.sh --quiet -B -q "ALTER TABLE part_tbl ADD PARTITION (p=1); ALTER TABLE part_tbl DROP PARTITION (p=1);" > /dev/null; done
In another session, run a loop to REFRESH the partition
while true; do impala-shell.sh --quiet -B -q "REFRESH part_tbl PARTITION (p=1)" > /dev/null; done
After a while, some of the REFRESH statements fail:
Could not execute command: REFRESH part_tbl PARTITION (p=1)
ERROR: NullPointerException: null
Issue Links
- is caused by IMPALA-11822 Optimize the Refresh/Invalidate event processing by skipping unnecessary events (Resolved)