Apache Drill / DRILL-8416

Memory leak when the async Parquet reader skips empty pages


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.21.0
    • Fix Version/s: 1.21.1
    • Component/s: Storage - Parquet
    • Labels: None

    Description

      When I query the following Parquet file, which is stored on the Hadoop file system (HDFS):

      SELECT * FROM `hdfs.data`.`./v2/meta_steps/me-2023-03-20-13-15-30-inv230021-kontrollsystemf39st9qrx20-03-2/meta_steps.parquet`

      I get the following error:

      org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory leaked: (64) Allocator(op:0:0:1:ParquetRowGroupScan) 1000000/64/34688/10000000000 (res/actual/peak/limit)
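      The allocator summary at the end of the error packs four byte counts into one slash-separated field, labelled by the `(res/actual/peak/limit)` suffix. The sketch below pulls them apart, assuming the message always follows the format shown above. `parse_allocator_line` and `AllocatorStats` are hypothetical helpers for reading the log, not part of Drill.

```python
import re
from typing import NamedTuple, Optional

# Matches the allocator summary in the leak message, e.g.
# "Allocator(op:0:0:1:ParquetRowGroupScan) 1000000/64/34688/10000000000 (res/actual/peak/limit)".
# Assumption: the format is exactly as printed in this issue.
_ALLOC_RE = re.compile(
    r"Allocator\((?P<name>[^)]*)\)\s+"
    r"(?P<res>\d+)/(?P<actual>\d+)/(?P<peak>\d+)/(?P<limit>\d+)\s+"
    r"\(res/actual/peak/limit\)"
)


class AllocatorStats(NamedTuple):
    name: str
    res: int
    actual: int
    peak: int
    limit: int


def parse_allocator_line(message: str) -> Optional[AllocatorStats]:
    """Return the four counts from the first allocator summary, or None if absent."""
    m = _ALLOC_RE.search(message)
    if m is None:
        return None
    return AllocatorStats(
        name=m.group("name"),
        res=int(m.group("res")),
        actual=int(m.group("actual")),
        peak=int(m.group("peak")),
        limit=int(m.group("limit")),
    )


if __name__ == "__main__":
    err = ("Memory was leaked by query. Memory leaked: (64) "
           "Allocator(op:0:0:1:ParquetRowGroupScan) "
           "1000000/64/34688/10000000000 (res/actual/peak/limit)")
    print(parse_allocator_line(err))
```

      Read this way, the `actual` field (64) is the same figure reported as "Memory leaked: (64)", and the allocator it belongs to is the Parquet row group scan operator.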

      The same query works fine with Drill 1.19.

      If I select only columns without NULL values, the query also works in 1.21.0:

      SELECT `name`,`type` FROM `hdfs.data`.`./v2/meta_steps/me-2023-03-20-13-15-30-inv230021-kontrollsystemf39st9qrx20-03-2/meta_steps.parquet`

      I generated a new example.parquet with pyarrow 8.0.0 containing a float column with NULL values, and the same error occurred.

      Attachments

        1. meta_steps.parquet (3 kB, Matthias Rosenthaler)
        2. example.parquet (2 kB, Matthias Rosenthaler)


          People

            Assignee: James Turton (dzamo)
            Reporter: Matthias Rosenthaler (matthros)
            Votes: 0
            Watchers: 3
