IMPALA-5304

Parquet scanner transfers decompression buffers when not needed

    Description

      The Parquet scanner always transfers decompression buffers to the scratch batch:

      Status BaseScalarColumnReader::ReadDataPage() {
        // We're about to move to the next data page.  The previous data page is
        // now complete, pass along the memory allocated for it.
        parent_->scratch_batch_->mem_pool()->AcquireData(decompressed_data_pool_.get(), false);
        // ... (remainder of the function omitted)
      }

      The transferred buffers are in turn passed along with the row batch. This is safe, but it is unnecessary in the many cases where the batch holds no pointers into the decompression buffer: when the column has only fixed-length data, or when the data page is dictionary-encoded.
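
      The condition described above could be sketched roughly as follows. This is a minimal illustration, not Impala's actual code: ColumnPageInfo, PageEncoding, BufferPool, BatchReferencesPageBuffer and FinishDataPage are all hypothetical names. It also assumes that fixed-length values are copied into the batch, and that values from dictionary-encoded pages point into dictionary memory that is kept alive separately from the page buffer.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// All names below are hypothetical stand-ins, not Impala's real types.
enum class PageEncoding { PLAIN, DICTIONARY };

struct ColumnPageInfo {
  bool is_var_len;        // true for variable-length columns such as strings
  PageEncoding encoding;  // encoding of the data page that was just completed
};

// True only if rows in the batch may still point into the decompressed page:
// variable-length values that were read directly from a plain-encoded page.
constexpr bool BatchReferencesPageBuffer(const ColumnPageInfo& page) {
  return page.is_var_len && page.encoding == PageEncoding::PLAIN;
}

// Toy memory pool: just a list of owned buffers.
struct BufferPool {
  std::vector<std::vector<uint8_t>> buffers;

  // Take ownership of every buffer held by 'src', leaving it empty.
  void AcquireAll(BufferPool* src) {
    for (auto& buf : src->buffers) buffers.push_back(std::move(buf));
    src->buffers.clear();
  }

  // Drop all buffers; a real pool might recycle the memory instead.
  void Clear() { buffers.clear(); }
};

// Called when a data page is complete. Transfers the decompression buffers
// to the scratch batch only when the batch can reference them; otherwise the
// buffers are released locally. Returns true if a transfer happened.
bool FinishDataPage(const ColumnPageInfo& page, BufferPool* decompressed,
                    BufferPool* scratch_batch_pool) {
  if (BatchReferencesPageBuffer(page)) {
    scratch_batch_pool->AcquireAll(decompressed);
    return true;
  }
  decompressed->Clear();
  return false;
}
```

      With this shape, a fixed-length column or a dictionary-encoded page never adds buffers to the scratch batch, so nothing extra travels with the row batch across threads.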

      This can make problems like IMPALA-4923 worse than they would be otherwise because extra data is transferred across threads.

            People

              tarmstrong Tim Armstrong
