  1. IMPALA
  2. IMPALA-4835 HDFS scans should operate with a constrained number of I/O buffers
  3. IMPALA-6290

Simplify ScannerContext buffer management to only use one I/O buffer at a time.

Details

    Description

      I'm doing this as part of the HDFS buffer management work but splitting it out as a subtask since it's a logically independent change.

      ScannerContext currently depends on the scanners calling ReleaseCompletedResources() repeatedly to free up buffers. This works OK today, but if we add a hard limit on the number of I/O buffers, we could hit resource exhaustion if we scan too far ahead without calling ReleaseCompletedResources(). E.g. if we have 3 * 8MB I/O buffers (24MB in total) and try to scan 25MB before calling ReleaseCompletedResources(), we end up in a state where all I/O buffers are sitting in the ScannerContext and none is free for the next read.

      Certain ScannerContext operations can also exhaust the I/O buffers no matter how frequently ReleaseCompletedResources() is called. E.g. a single ReadBytes(25MB) or SkipBytes(25MB) call would run into that problem with the current implementation.

      I spent some time looking at the ScannerContext API and the calling patterns of the scanners and came to the conclusion that there's no requirement for us to accumulate buffers in completed_io_buffers_. After IMPALA-5307 we don't generally assume that the memory returned from previous calls remains valid when the read position in the stream is advanced.
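
      The exhaustion scenario above can be modeled with a toy sketch. This is not Impala's actual code: BufferPool, SkipAccumulating and SkipOneAtATime are hypothetical names made up for illustration, and the model only counts buffers rather than doing any I/O. It assumes a pool with a hard cap of 3 * 8MB buffers and compares holding every consumed buffer until a later release against returning each buffer before taking the next.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr int64_t kMB = 1024 * 1024;

// Hypothetical stand-in for a hard-capped pool of fixed-size I/O buffers.
struct BufferPool {
  int64_t buffer_size;
  int free_buffers;

  // Hands out one buffer, or fails if the cap has been reached.
  bool TryAcquire() {
    if (free_buffers == 0) return false;
    --free_buffers;
    return true;
  }
  void Release() { ++free_buffers; }
};

// Models the accumulating behaviour: consumed buffers stay held until the
// caller gets around to releasing them. Returns how many bytes could be
// skipped before the pool ran dry.
int64_t SkipAccumulating(BufferPool& pool, int64_t bytes) {
  int64_t skipped = 0;
  int held = 0;
  while (skipped < bytes) {
    if (!pool.TryAcquire()) break;  // every buffer is already held by us
    ++held;
    skipped += std::min(pool.buffer_size, bytes - skipped);
  }
  // Stand-in for a late ReleaseCompletedResources() call.
  for (; held > 0; --held) pool.Release();
  return skipped;
}

// Models the proposed behaviour: at most one buffer is held at a time, so
// the amount skipped is independent of the pool's cap.
int64_t SkipOneAtATime(BufferPool& pool, int64_t bytes) {
  int64_t skipped = 0;
  while (skipped < bytes) {
    if (!pool.TryAcquire()) break;
    skipped += std::min(pool.buffer_size, bytes - skipped);
    pool.Release();  // give the buffer back before asking for the next one
  }
  return skipped;
}
```

      With a pool of 3 * 8MB buffers, SkipAccumulating(pool, 25 * kMB) stalls at 24MB in this model, while SkipOneAtATime(pool, 25 * kMB) covers the full 25MB. The second variant is only safe because, per the IMPALA-5307 note above, callers don't rely on earlier memory staying valid once the read position advances.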

          People

            tarmstrong Tim Armstrong
            tarmstrong Tim Armstrong