KUDU-3286

Add special handling for empty strings for Bloom filter predicate push down


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.13.0
    • Fix Version/s: 1.15.0
    • Component/s: None
    • Labels: None
    • Release Note:
      Updated the hash computation for empty strings in the FastHash implementation to conform with the
      handling in Apache Impala. For the Bloom filter predicate pushdown feature, which uses FastHash,
      this makes Kudu clients older than version 1.15.0 incompatible with Kudu server version 1.15.0,
      and Kudu clients at version 1.15.0 or newer incompatible with Kudu servers older than version
      1.15.0. Both the client library and the Kudu server need to be updated to version 1.15.0 or above
      when using the Bloom filter predicate feature.

      This incompatibility manifests as the following messages in the logs:

      - "Not implemented: call requires unsupported application feature flags: 4".
      - "Not implemented: call requires unsupported application feature flags: 5".

    Description

      FastHash, which is used for Bloom filter predicate pushdown, has special handling for nullptr.

      https://github.com/apache/kudu/blob/master/src/kudu/util/hash_util.h#L95

      However, there is no special handling for empty objects/strings. FastHash of an empty string with seed=0 produces a hash value of 0. This does not set any bits in the Bloom filter, so empty strings are reported as not present.
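
To make the zero-hash case concrete, here is a minimal sketch: a pure-Python port of the publicly available FastHash 64-bit routine. The constants and structure are assumptions transcribed from that public reference code and have not been checked against Kudu's hash_util.h. The zero result for empty input with seed 0 does not depend on the exact constants: the state starts as seed XOR (length * constant), which is 0, and the final mixing step maps 0 to 0.

```python
MASK64 = (1 << 64) - 1
M = 0x880355F21E6D1965  # multiplier from the public FastHash code (assumed)


def _mix(h: int) -> int:
    """FastHash mixing step, truncated to 64 bits."""
    h ^= h >> 23
    h = (h * 0x2127599BF4325C37) & MASK64
    h ^= h >> 47
    return h


def fasthash64(data: bytes, seed: int) -> int:
    """Sketch of FastHash64 over `data`, reading words as little-endian."""
    n = len(data)
    h = (seed ^ (n * M)) & MASK64
    full = n - (n % 8)
    # Consume the input in 8-byte words.
    for i in range(0, full, 8):
        v = int.from_bytes(data[i:i + 8], "little")
        h ^= _mix(v)
        h = (h * M) & MASK64
    # Fold in the 1 to 7 trailing bytes, if any.
    if n % 8:
        v = int.from_bytes(data[full:], "little")
        h ^= _mix(v)
        h = (h * M) & MASK64
    return _mix(h)


# Empty input with seed 0: no word is consumed and the state stays 0.
print(fasthash64(b"", 0))
```

With a non-zero seed the initial state is non-zero, so the empty-input result is non-zero as well; the problem described here is specific to seed=0.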

      Impala uses the direct Bloom filter approach and includes special handling for empty strings.
      https://github.com/apache/impala/blob/master/be/src/runtime/raw-value.inline.h#L352

      This leads to a discrepancy between Impala and Kudu, and joins return incorrect results.
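
One general way to give empty strings special handling, sketched below, is to hash a fixed non-empty sentinel whenever the input is empty. This is an illustration only, not the code used by Impala or Kudu: `EMPTY_SENTINEL` and `hash_with_empty_handling` are hypothetical names, and `zlib.crc32` is merely a stand-in hash that shares the relevant property (empty input with a zero starting value yields 0).

```python
import zlib

# Hypothetical sentinel; any fixed non-empty byte string serves for this sketch.
EMPTY_SENTINEL = b"\x00EMPTY"


def hash_with_empty_handling(data: bytes, seed: int = 0) -> int:
    """Hash `data`, substituting a fixed sentinel when `data` is empty."""
    payload = data if data else EMPTY_SENTINEL
    # Stand-in hash: like the case described above, crc32 of empty input
    # with a zero starting value is 0, so the sentinel avoids that path.
    return zlib.crc32(payload, seed)


# The unguarded stand-in hash of empty input is 0.
print(zlib.crc32(b"", 0))
# The guarded version hashes the sentinel instead.
print(hash_with_empty_handling(b"") == zlib.crc32(EMPTY_SENTINEL, 0))
```

For the join results to agree, both sides (the system building the Bloom filter and the one probing it) would have to apply the same substitution, which is why a change of this kind affects client/server compatibility.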


          People

            Assignee: Bankim Bhavsar (bankim)
            Reporter: Bankim Bhavsar (bankim)