Details
Type: Improvement
Status: Resolved
Priority: Critical
Resolution: Fixed
Impala 2.5.0
Description
TBloomFilters have a 'directory' structure that is a list of individual buckets (buckets are about 64k wide). The total size of the directory can be 1MB or much more. That produces a large number of buckets and very inefficient deserialization, because each bucket has to be allocated separately on the heap.
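To make the cost concrete, here is a minimal C++ sketch of the per-bucket pattern. It assumes a hypothetical wire format in which every bucket arrives as its own string (loosely modelled on a Thrift list field). `BucketListWire`, `PerBucketDirectory` and `DeserializePerBucket` are illustrative names, not Impala's actual code. The point it shows is that the number of heap allocations grows linearly with the number of buckets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

// Hypothetical wire format: one string per bucket, as a list-of-buckets
// field would arrive after deserialization.
using BucketListWire = std::vector<std::string>;

// Hypothetical in-memory directory that owns one heap block per bucket.
struct PerBucketDirectory {
  std::vector<std::unique_ptr<uint8_t[]>> buckets;
  std::size_t bucket_bytes = 0;
  std::size_t bucket_allocations = 0;  // number of per-bucket heap allocations
};

// Copies every bucket into its own heap allocation, so the allocation count
// is proportional to the number of buckets.
PerBucketDirectory DeserializePerBucket(const BucketListWire& wire) {
  PerBucketDirectory dir;
  if (wire.empty()) return dir;
  dir.bucket_bytes = wire.front().size();
  dir.buckets.reserve(wire.size());
  for (const std::string& b : wire) {
    assert(b.size() == dir.bucket_bytes);  // all buckets share one width
    std::unique_ptr<uint8_t[]> bucket(new uint8_t[dir.bucket_bytes]);
    std::memcpy(bucket.get(), b.data(), dir.bucket_bytes);
    dir.buckets.push_back(std::move(bucket));
    ++dir.bucket_allocations;
  }
  return dir;
}
```

With a directory of 1024 buckets, `DeserializePerBucket` performs 1024 separate bucket allocations; in a multi-threaded process those allocations are what contend on the allocator.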
Instead, the TBloomFilter representation should use one contiguous string (as the real BloomFilter does), so that it can be allocated with a single operation and deserialized with a single copy.
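A minimal C++ sketch of the contiguous alternative, assuming the directory travels as one string whose length is a multiple of a fixed bucket width. `ContiguousDirectory`, `DeserializeContiguous` and `SerializeContiguous` are hypothetical names for illustration only. Deserialization is one allocation plus one `memcpy`, independent of the bucket count, and individual buckets are addressed by offset into the shared buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical in-memory directory backed by a single contiguous buffer.
struct ContiguousDirectory {
  std::unique_ptr<uint8_t[]> data;
  std::size_t size_bytes = 0;
  std::size_t bucket_bytes = 0;
  std::size_t allocations = 0;  // heap allocations made while deserializing

  std::size_t num_buckets() const {
    return bucket_bytes == 0 ? 0 : size_bytes / bucket_bytes;
  }
  // Bucket i is just an offset into the shared buffer; no per-bucket storage.
  const uint8_t* bucket(std::size_t i) const { return data.get() + i * bucket_bytes; }
};

// One allocation and one copy, however many buckets the directory holds.
ContiguousDirectory DeserializeContiguous(const std::string& wire, std::size_t bucket_bytes) {
  assert(bucket_bytes > 0 && wire.size() % bucket_bytes == 0);
  ContiguousDirectory dir;
  dir.bucket_bytes = bucket_bytes;
  dir.size_bytes = wire.size();
  dir.data.reset(new uint8_t[wire.size()]);
  dir.allocations = 1;
  std::memcpy(dir.data.get(), wire.data(), wire.size());
  return dir;
}

// The reverse direction is likewise a single copy into one string.
std::string SerializeContiguous(const ContiguousDirectory& dir) {
  return std::string(reinterpret_cast<const char*>(dir.data.get()), dir.size_bytes);
}
```

For the same 1024-bucket directory, `DeserializeContiguous` allocates exactly once, and `SerializeContiguous` round-trips the bytes unchanged. Keeping the in-memory and wire layouts identical is what makes the single copy possible.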
Issue Links
relates to
IMPALA-3100 Runtime filter kernel spinlock contention driven by memory allocation (Resolved)