IMPALA-2831

Impala can spin up too many scanner threads

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.2, Impala 2.3.0, Impala 2.5.0
    • Fix Version/s: Impala 2.7.0
    • Component/s: Backend

    Description

      We have observed a number of problems with the way Impala dynamically creates scanner threads, where more scanner threads are created than is ideal.

      • The scanner memory heuristic can lead to excessive memory consumption, especially for very selective scans with wide rows, where the current heuristic for limiting memory consumption performs poorly. There are likely several interlinked causes, which need further investigation.
      • The non-deterministic scanner thread heuristic can lead to a great deal of performance variability. At a minimum, the number of scanner threads should always converge to the same number for the same plan and data if the query is the only one running on the cluster.
      • Beyond a certain point, adding scanner threads does not improve performance (and can degrade it), but the heuristic keeps spinning up scanner threads as long as tokens and memory are available.
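
To make the last two points concrete, here is a minimal Python sketch that contrasts a greedy policy (add a thread whenever a token and enough memory remain) with a deterministic, capped target. It is an illustration only, not Impala's code and not the fix that was applied for this issue. Every name in it (ScanResources, greedy_thread_count, capped_thread_count, max_useful_threads) is hypothetical, and the per-thread memory estimate is assumed to be known and positive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanResources:
    """Hypothetical inputs to a scanner-thread heuristic (not Impala names)."""
    available_tokens: int          # thread tokens the scan could acquire
    mem_available_bytes: int       # memory still available to this scan
    est_mem_per_thread_bytes: int  # assumed memory cost of one scanner thread
    num_scan_ranges: int           # units of scan work remaining


def greedy_thread_count(res: ScanResources) -> int:
    """Add threads for as long as a token and enough memory remain."""
    if res.est_mem_per_thread_bytes <= 0:
        raise ValueError("est_mem_per_thread_bytes must be positive")
    threads = 0
    tokens_left = res.available_tokens
    mem_left = res.mem_available_bytes
    while tokens_left > 0 and mem_left >= res.est_mem_per_thread_bytes:
        threads += 1
        tokens_left -= 1
        mem_left -= res.est_mem_per_thread_bytes
    return threads


def capped_thread_count(res: ScanResources, max_useful_threads: int) -> int:
    """Deterministic target: the same inputs always give the same count."""
    if res.est_mem_per_thread_bytes <= 0:
        raise ValueError("est_mem_per_thread_bytes must be positive")
    by_memory = res.mem_available_bytes // res.est_mem_per_thread_bytes
    target = min(
        res.available_tokens,
        by_memory,
        res.num_scan_ranges,   # never more threads than units of work
        max_useful_threads,    # assumed point past which threads stop helping
    )
    # Always allow one thread so the scan can make progress.
    return max(1, target)


MIB = 1024 * 1024
example = ScanResources(
    available_tokens=64,
    mem_available_bytes=8 * 1024 * MIB,
    est_mem_per_thread_bytes=64 * MIB,
    num_scan_ranges=1000,
)
greedy = greedy_thread_count(example)
capped = capped_thread_count(example, max_useful_threads=16)
```

With these example numbers the greedy policy uses all 64 tokens, while the capped policy stops at 16. The cap value itself is an assumption; in practice it would have to come from measurement of where additional threads stop improving scan throughput.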

            People

              Assignee: Michael Ho (kwho)
              Reporter: Tim Armstrong (tarmstrong)
              Votes: 1
              Watchers: 7
