  1. Parquet
  2. PARQUET-2256

Adding Compression for BloomFilter

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: format-2.9.0
    • Fix Version/s: None
    • Component/s: parquet-format
    • Labels: None

    Description

      In current Parquet implementations, if the ndv (number of distinct values) is not set for a BloomFilter, most implementations fall back to 1M as the ndv and use it, together with the fpp (false-positive probability), to size the filter. So, if the fpp is 0.01, the BloomFilter may grow to about 2 MB for each column, which is really huge. Should we support compression for the BloomFilter, like:

      ```
      /**
       * The compression used in the Bloom filter.
       **/
      struct Uncompressed {}
      union BloomFilterCompression {
        1: Uncompressed UNCOMPRESSED;
      + 2: CompressionCodec COMPRESSION;
      }
      ```
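
The size arithmetic in the description, and the kind of saving compression might bring, can be sketched roughly as follows. This is only an illustration under stated assumptions: the sizing formula (bits = -8 * ndv / ln(1 - fpp^(1/8)), rounded up to a power of two) is assumed to match what common implementations do, and `bloom_filter_bytes` and `toy_bitset` are hypothetical helpers, not Parquet APIs. The toy bitset uses SHA-256 and a flat layout, not the real split-block layout or hash function, and zlib stands in for whichever `CompressionCodec` would be chosen.

```python
import hashlib
import math
import zlib


def bloom_filter_bytes(ndv: int, fpp: float) -> int:
    """Estimate the filter size in bytes for `ndv` distinct values at rate `fpp`.

    Assumption: bits = -8 * ndv / ln(1 - fpp**(1/8)), rounded up to a power
    of two. Real implementations may bound or round the size differently.
    """
    if ndv <= 0 or not 0.0 < fpp < 1.0:
        raise ValueError("ndv must be positive and fpp must be in (0, 1)")
    bits = -8.0 * ndv / math.log(1.0 - fpp ** (1.0 / 8.0))
    num_bytes = math.ceil(bits / 8.0)
    return 1 << (num_bytes - 1).bit_length()  # next power of two >= num_bytes


def toy_bitset(num_bytes: int, values) -> bytes:
    """Hypothetical stand-in for a Bloom filter: sets 8 bits per value."""
    bits = bytearray(num_bytes)
    for value in values:
        digest = hashlib.sha256(str(value).encode()).digest()  # 32 bytes
        for i in range(8):
            chunk = digest[4 * i : 4 * i + 4]
            pos = int.from_bytes(chunk, "little") % (num_bytes * 8)
            bits[pos >> 3] |= 1 << (pos & 7)
    return bytes(bits)


# Size for the fallback ndv of 1M at fpp = 0.01.
default_size = bloom_filter_bytes(ndv=1_000_000, fpp=0.01)
print(default_size)  # 2097152 bytes, i.e. 2 MiB

# A column with only 1,000 distinct values leaves the filter mostly zero.
sparse = toy_bitset(default_size, range(1_000))
compressed = zlib.compress(sparse)
print(len(sparse), len(compressed))
```

A filter sized for 1M values but holding far fewer is mostly zero bytes, so a general-purpose codec shrinks it substantially. A filter filled close to its design ndv has roughly half of its bits set and would be expected to compress poorly, which is why keeping an uncompressed option in the union still seems useful.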

          People

            Assignee: mwish Xuwei Fu
            Reporter: mwish Xuwei Fu
            Votes: 0
            Watchers: 4
