Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-3825

Distribute runtime filter aggregation across cluster

    XMLWordPrintableJSON

Details

    Description

      Runtime filters can be tens of MB or more, and incasting all filters from all shuffle joins to the coordinator can put a lot of memory pressure on that node. To alleviate this we should consider spreading out the aggregation operation across the cluster, so that a different node aggregates each runtime filter.

      This still restricts aggregation to #runtime-filters nodes, which will usually be less than the cluster size. If we want to smooth that out further we could use tree-based aggregation, but let's measure the benefits of simply distributing the aggregation work first.

      Attachments

        Issue Links

          Activity

            People

              rizaon Riza Suminto
              henryr Henry Robinson
              Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: