[IMPALA-12855] NullPointerException in firing RELOAD events if the partition is just dropped - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: Impala 4.3.0
Fix Version/s: Impala 4.4.0
Component/s: Catalog
Labels: None
Epic Link: event-processor-completeness

Description

REFRESH <table> PARTITION can fail when firing RELOAD events (with --enable_reload_events=true) if the partition is dropped by a concurrent DDL. The failure is a NullPointerException:

E0229 15:04:25.578933  7381 JniUtil.java:183] 824a23c46a6f71de:78a2f3dc00000000] Error in REFRESH TABLE default.part_tbl PARTITIONS issued by quanlong. Time spent: 1s061ms
I0229 15:04:25.579373  7381 jni-util.cc:302] 824a23c46a6f71de:78a2f3dc00000000] java.lang.NullPointerException
        at org.apache.impala.catalog.HdfsPartition.access$500(HdfsPartition.java:101)
        at org.apache.impala.catalog.HdfsPartition$Builder.<init>(HdfsPartition.java:1314)
        at org.apache.impala.service.CatalogOpExecutor.fireReloadEventAndUpdateRefreshEventId(CatalogOpExecutor.java:6810)
        at org.apache.impala.service.CatalogOpExecutor.execResetMetadataImpl(CatalogOpExecutor.java:6744)
        at org.apache.impala.service.CatalogOpExecutor.execResetMetadata(CatalogOpExecutor.java:6612)
        at org.apache.impala.service.JniCatalog.lambda$resetMetadata$4(JniCatalog.java:327)
        at org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
        at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
        at org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
        at org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:100)
        at org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:243)
        at org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:257)
        at org.apache.impala.service.JniCatalog.resetMetadata(JniCatalog.java:326)

The problem is that execResetMetadataImpl() does not hold the table lock for the whole operation. It acquires the lock to reload the metadata, releases it, and acquires it again to fire the RELOAD events. In between, a concurrent DDL can drop the partition, so firing the RELOAD events fails because the partition can no longer be found: the NullPointerException above is raised in the HdfsPartition.Builder constructor called from fireReloadEventAndUpdateRefreshEventId().
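The race can be illustrated with a minimal, hypothetical Java model. It is not Impala's code and not necessarily the fix that shipped in 4.4.0: the class ReloadEventRace, its methods refreshReleasingLock, refreshHoldingLock and buildReloadEvent, and the String used as "partition metadata" are all assumptions made for illustration. The concurrent DDL is injected through a Runnable so the bad interleaving happens deterministically instead of depending on thread timing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified model of the race; none of these names are Impala APIs.
class ReloadEventRace {
  // Partition name -> partition metadata (modeled as a String).
  private final Map<String, String> partitions = new HashMap<>();
  private final ReentrantLock tableLock = new ReentrantLock();

  public void addPartition(String name) {
    tableLock.lock();
    try { partitions.put(name, "loaded"); } finally { tableLock.unlock(); }
  }

  // Stands in for a concurrent ALTER TABLE ... DROP PARTITION.
  public void dropPartition(String name) {
    tableLock.lock();
    try { partitions.remove(name); } finally { tableLock.unlock(); }
  }

  // Dereferences the partition, so a null partition throws NullPointerException.
  private static String buildReloadEvent(String name, String meta) {
    return "RELOAD " + name + " [" + meta.toUpperCase() + "]";
  }

  // Buggy pattern: the lock is released between reloading and firing events.
  public List<String> refreshReleasingLock(List<String> names, Runnable window) {
    tableLock.lock();
    try {
      for (String n : names) partitions.computeIfPresent(n, (k, v) -> "reloaded");
    } finally {
      tableLock.unlock();
    }
    window.run(); // a concurrent DDL can run here and drop a partition
    List<String> events = new ArrayList<>();
    tableLock.lock();
    try {
      for (String n : names) {
        events.add(buildReloadEvent(n, partitions.get(n))); // NPE if dropped
      }
    } finally {
      tableLock.unlock();
    }
    return events;
  }

  // Sketch of a fix: one critical section, and skip partitions that are gone.
  public List<String> refreshHoldingLock(List<String> names) {
    List<String> events = new ArrayList<>();
    tableLock.lock();
    try {
      for (String n : names) partitions.computeIfPresent(n, (k, v) -> "reloaded");
      for (String n : names) {
        String meta = partitions.get(n);
        if (meta == null) continue; // partition no longer exists: no event
        events.add(buildReloadEvent(n, meta));
      }
    } finally {
      tableLock.unlock();
    }
    return events;
  }

  public static void main(String[] args) {
    ReloadEventRace t = new ReloadEventRace();
    t.addPartition("p=1");
    try {
      t.refreshReleasingLock(List.of("p=1"), () -> t.dropPartition("p=1"));
    } catch (NullPointerException e) {
      System.out.println("buggy: NullPointerException");
    }
    t.addPartition("p=1");
    System.out.println("fixed: " + t.refreshHoldingLock(List.of("p=1", "p=2")));
  }
}
```

Running main() should print "buggy: NullPointerException" followed by "fixed: [RELOAD p=1 [RELOADED]]". The sketch combines two defenses: keeping reload and event firing in one critical section closes the window, and the null check keeps event firing safe even if the lock has to be split for other reasons.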

Reproducing the issue

To reproduce the issue, start catalogd with --enable_reload_events=true:

bin/start-impala-cluster.py --catalogd_args="--enable_reload_events=true"

Create a partitioned table:

create table part_tbl (i int) partitioned by (p int);

Run a loop that repeatedly adds and drops a partition of this table:

while true; do impala-shell.sh --quiet -B -q "ALTER TABLE part_tbl ADD PARTITION (p=1); ALTER TABLE part_tbl DROP PARTITION (p=1);" > /dev/null; done

In another session, run a loop that refreshes the same partition:

while true; do impala-shell.sh --quiet -B -q "REFRESH part_tbl PARTITION (p=1)" > /dev/null; done

After a while, some REFRESH statements fail:

Could not execute command: REFRESH part_tbl PARTITION (p=1)
ERROR: NullPointerException: null

Issue Links

is caused by

IMPALA-11822 Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Resolved

People

Assignee: Quanlong Huang

Reporter: Quanlong Huang

Votes: 0

Watchers: 3

Dates

Created: 29/Feb/24 07:15

Updated: 07/Mar/24 01:02

Resolved: 07/Mar/24 01:02