IMPALA-6383

Memory from previous row groups can accumulate in Parquet scanner

Details

    Description

      I ran across this bug while porting scanners to the new buffer pool. Before that, the only symptom was excessive memory consumption, but with reservations the failures become easy-to-detect hard failures.

      The problem is in HdfsParquetScanner::NextRowGroup(), which calls InitColumns() on the column readers. That starts scans, which allocate memory. If the row group is then skipped, either because of dictionary predicates or because of an error, the scans aren't cancelled and the I/O buffers aren't released, so memory from skipped row groups accumulates.
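
      The pattern described above can be modelled with a minimal sketch. This is not Impala's code: BufferTracker, ColumnReader, RowGroup, StartScan(), CancelScan() and the release_on_skip flag are all hypothetical names invented for illustration, and the real scanner's interfaces differ. The sketch only shows why buffers pile up when the skip path does not cancel scans, and one plausible shape of a fix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for I/O buffer accounting (not Impala's real API).
struct BufferTracker {
  std::size_t bytes_in_use = 0;
};

// Hypothetical column reader: starting a scan allocates I/O buffer memory.
class ColumnReader {
 public:
  explicit ColumnReader(BufferTracker* tracker) : tracker_(tracker) {}

  void StartScan(std::size_t bytes) {
    tracker_->bytes_in_use += bytes;
    held_ += bytes;
  }

  // Cancels the scan and returns its buffers. Safe to call repeatedly.
  void CancelScan() {
    tracker_->bytes_in_use -= held_;
    held_ = 0;
  }

 private:
  BufferTracker* tracker_;
  std::size_t held_ = 0;
};

// Hypothetical row group descriptor.
struct RowGroup {
  std::size_t bytes_per_column;
  bool skipped;  // e.g. filtered out by predicates, or failed after init
};

// Returns the index of the first row group that is not skipped, or -1.
// With release_on_skip == false, this mimics the behaviour in the report:
// scans started for skipped row groups keep their buffers.
int NextRowGroup(const std::vector<RowGroup>& groups,
                 std::vector<ColumnReader>& readers,
                 bool release_on_skip) {
  for (std::size_t i = 0; i < groups.size(); ++i) {
    // Analogous to initializing the column readers: scans start here.
    for (ColumnReader& reader : readers) {
      reader.StartScan(groups[i].bytes_per_column);
    }
    if (!groups[i].skipped) return static_cast<int>(i);
    // Assumed fix: cancel scans and release buffers before moving on.
    if (release_on_skip) {
      for (ColumnReader& reader : readers) reader.CancelScan();
    }
  }
  return -1;
}
```

      In this model, with two column readers and two skipped row groups ahead of a usable one, the tracker ends up holding buffers for all three row groups unless the skip path releases them.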

            People

              tarmstrong Tim Armstrong