IMPALA-12855

NullPointerException in firing RELOAD events if the partition is just dropped

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 4.3.0
    • Fix Version/s: Impala 4.4.0
    • Component/s: Catalog
    • Labels: None

    Description

      REFRESH <table> PARTITION can fail while firing RELOAD events (when --enable_reload_events=true) if a concurrent DDL drops the partition. The failure surfaces as a NullPointerException:

      E0229 15:04:25.578933  7381 JniUtil.java:183] 824a23c46a6f71de:78a2f3dc00000000] Error in REFRESH TABLE default.part_tbl PARTITIONS issued by quanlong. Time spent: 1s061ms
      I0229 15:04:25.579373  7381 jni-util.cc:302] 824a23c46a6f71de:78a2f3dc00000000] java.lang.NullPointerException
              at org.apache.impala.catalog.HdfsPartition.access$500(HdfsPartition.java:101)
              at org.apache.impala.catalog.HdfsPartition$Builder.<init>(HdfsPartition.java:1314)
              at org.apache.impala.service.CatalogOpExecutor.fireReloadEventAndUpdateRefreshEventId(CatalogOpExecutor.java:6810)
              at org.apache.impala.service.CatalogOpExecutor.execResetMetadataImpl(CatalogOpExecutor.java:6744)
              at org.apache.impala.service.CatalogOpExecutor.execResetMetadata(CatalogOpExecutor.java:6612)
              at org.apache.impala.service.JniCatalog.lambda$resetMetadata$4(JniCatalog.java:327)
              at org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
              at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
              at org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
              at org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:100)
              at org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:243)
              at org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:257)
              at org.apache.impala.service.JniCatalog.resetMetadata(JniCatalog.java:326)

      The problem is that execResetMetadataImpl() does not hold the table lock for the whole operation. It takes the lock to reload the metadata, releases it, and then takes it again to fire the RELOAD events. Between these two critical sections, a concurrent DDL can drop the partition. Firing the RELOAD event then fails because the partition can no longer be found, and the resulting null reference triggers the NullPointerException in the HdfsPartition.Builder constructor shown in the stack trace.
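
      The lock gap described above can be illustrated with a minimal, self-contained sketch. It is not Impala's code: the class ReloadEventRaceSketch, its methods, and the string-valued partition map are all hypothetical stand-ins, and the null guard shown is only one possible mitigation, not necessarily the fix applied for this issue. The interleaving is made deterministic by calling the drop between the two locked phases.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for a catalog table; not Impala's actual classes. */
class ReloadEventRaceSketch {
    private final ReentrantLock tableLock = new ReentrantLock();
    // Partition name -> metadata string (a simplification of real partition objects).
    private final Map<String, String> partitions = new HashMap<>();

    void addPartition(String name) {
        tableLock.lock();
        try { partitions.put(name, "initial"); } finally { tableLock.unlock(); }
    }

    void dropPartition(String name) {
        tableLock.lock();
        try { partitions.remove(name); } finally { tableLock.unlock(); }
    }

    /** Phase 1: reload metadata under the lock, then release it. */
    void reloadPartition(String name) {
        tableLock.lock();
        try { partitions.computeIfPresent(name, (k, v) -> "reloaded"); }
        finally { tableLock.unlock(); }
    }

    /** Phase 2, unguarded: re-acquires the lock but assumes the partition still exists. */
    String fireReloadEventUnsafe(String name) {
        tableLock.lock();
        try {
            String meta = partitions.get(name);            // null if dropped in the gap
            return "RELOAD " + name + " " + meta.toUpperCase(); // NPE when meta is null
        } finally { tableLock.unlock(); }
    }

    /** Phase 2, guarded: re-checks the partition after re-acquiring the lock. */
    Optional<String> fireReloadEventGuarded(String name) {
        tableLock.lock();
        try {
            String meta = partitions.get(name);
            if (meta == null) return Optional.empty();     // partition gone: skip the event
            return Optional.of("RELOAD " + name + " " + meta.toUpperCase());
        } finally { tableLock.unlock(); }
    }

    public static void main(String[] args) {
        ReloadEventRaceSketch table = new ReloadEventRaceSketch();
        table.addPartition("p=1");
        table.reloadPartition("p=1");  // lock taken and released
        table.dropPartition("p=1");    // simulated concurrent DDL in the gap
        try {
            table.fireReloadEventUnsafe("p=1");
        } catch (NullPointerException e) {
            System.out.println("unguarded: NullPointerException");
        }
        System.out.println("guarded event present: "
            + table.fireReloadEventGuarded("p=1").isPresent());
    }
}
```

      Re-checking after the second lock acquisition keeps the two critical sections short. Holding the lock across both phases would also close the gap, at the cost of blocking concurrent DDLs for longer.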

      Reproducing the issue

      To reproduce the issue, start catalogd with --enable_reload_events=true:

      bin/start-impala-cluster.py --catalogd_args="--enable_reload_events=true"

      Create a partitioned table

      create table part_tbl (i int) partitioned by (p int);

      Run a loop to ADD+DROP partition on this table

      while true; do impala-shell.sh --quiet -B -q "ALTER TABLE part_tbl ADD PARTITION (p=1); ALTER TABLE part_tbl DROP PARTITION (p=1);" > /dev/null; done

      In another session, run a loop to REFRESH the partition

      while true; do impala-shell.sh --quiet -B -q "REFRESH part_tbl PARTITION (p=1)" > /dev/null; done

      After a while, some of the REFRESH statements fail:

      Could not execute command: REFRESH part_tbl PARTITION (p=1)
      ERROR: NullPointerException: null

            People

              Assignee: Quanlong Huang (stigahuang)
              Reporter: Quanlong Huang (stigahuang)