Details
-
Improvement
-
Status: Resolved
-
Critical
-
Resolution: Resolved
-
Impala 2.5.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0
Description
Queries may exhibit very slow planning times if they have the following characteristics:
- references many tables and/or views with many columns
- many nested views, inline views and/or WITH-clause views
- many unions and/or unions with many operands
The slowness is due to the expensive equivalence class computation which grows exponentially in the number of "slots" in the query. Slots are roughly equivalent to column references (physical as well as virtual).
The planning timeline will show a large time spent in "Equivalence classes computed":
Planner Timeline: 13m12s
- Metadata load started: 2s062ms (2s062ms)
- Metadata load finished: 9s573ms (7s510ms)
- Analysis finished: 12s043ms (2s470ms)
- Equivalence classes computed: 13m1s (12m49s) <--- Very slow!
- Single node plan created: 13m10s (9s136ms)
- Runtime filters computed: 13m10s (20.928ms)
- Distributed plan created: 13m11s (278.404ms)
- Planning finished: 13m12s (1s016ms)
For example, while experimenting with inserting a single row per partition as part of ETL performance testing I noticed that queries with a large number of UNION operators spend a lot of time in computeEquivClasses().
The majority of the time is spent in HashSet.contains.
Stack Trace | Sample Count | Percentage(%) |
---|---|---|
java.util.HashSet.contains(Object) | 35,273 | 80.402 |
com.cloudera.impala.analysis.Analyzer$ValueTransferGraph.hasValueTransfer(SlotId, SlotId) | 35,273 | 80.402 |
com.cloudera.impala.analysis.Analyzer.hasValueTransfer(SlotId, SlotId) | 35,273 | 80.402 |
com.cloudera.impala.analysis.Analyzer.computeEquivClasses() | 35,104 | 80.016 |
com.cloudera.impala.planner.SingleNodePlanner.createSingleNodePlan() | 35,104 | 80.016 |
com.cloudera.impala.planner.Planner.createPlan() | 35,104 | 80.016 |
com.cloudera.impala.service.Frontend.createExecRequest(TQueryCtx, StringBuilder) | 35,104 | 80.016 |
com.cloudera.impala.service.JniFrontend.createExecRequest(byte[]) | 35,104 | 80.016 |
Not clear what the complexity of loops in computeEquivClasses() but it is obviously inefficient.
https://github.com/apache/incubator-impala/blob/master/fe/src/main/java/org/apache/impala/analysis/Analyzer.java#L1954
For the attached query with 1800 UNION operators computing equivalence classes takes several minutes.
Attachments
Attachments
Issue Links
- is duplicated by
-
IMPALA-1798 Very slow performance of Views on top of another Views
- Resolved
1.
|
Remove equivalence classes | Resolved | Tianyi Wang | |
2.
|
Only compute value transfers for materialized slots | Open | Unassigned |