Details
- New Feature
- Status: Resolved
- Major
- Resolution: Won't Fix
- Impala 2.2, Impala 2.3.0, Impala 2.5.0, Impala 2.4.0, Impala 2.6.0, Impala 2.7.0
- None
Description
To improve compression and/or the effectiveness of min/max pruning, it is desirable to control the order in which rows are inserted into a table (mostly for Parquet).
To that end, we should introduce a "sortby" plan hint for insert statements. Example:
CREATE TABLE dst (...);
INSERT INTO dst /*+ sortby(day,hour) */ SELECT * FROM src;
This would produce the following plan:
SCAN -> SORT(day,hour) -> TABLE SINK
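To illustrate why the extra SORT step helps min/max pruning, here is a minimal Python sketch (not Impala code). It assumes a writer that cuts rows into fixed-size blocks and records the min/max of "day" per block; the helper names write_blocks and blocks_to_read, the block size, and the synthetic data are all made up for illustration.

```python
def write_blocks(rows, block_size):
    """Hypothetical table sink: split rows into fixed-size blocks and
    record the min/max of 'day' for each block."""
    blocks = []
    for start in range(0, len(rows), block_size):
        chunk = rows[start:start + block_size]
        days = [r["day"] for r in chunk]
        blocks.append({"min": min(days), "max": max(days), "rows": chunk})
    return blocks


def blocks_to_read(blocks, day):
    """Count blocks whose [min, max] range may contain the requested day.
    Blocks outside the range can be skipped by a reader."""
    return sum(1 for b in blocks if b["min"] <= day <= b["max"])


# Synthetic source: 3000 rows with days 1..30 interleaved, 100 rows per day.
src = [{"day": i % 30 + 1, "hour": (i // 30) % 24} for i in range(3000)]

# SCAN -> TABLE SINK (no sort): every block spans days 1..30.
unsorted_blocks = write_blocks(src, block_size=100)

# SCAN -> SORT(day,hour) -> TABLE SINK: each block holds a single day.
sorted_rows = sorted(src, key=lambda r: (r["day"], r["hour"]))
sorted_blocks = write_blocks(sorted_rows, block_size=100)

unsorted_hits = blocks_to_read(unsorted_blocks, day=15)
sorted_hits = blocks_to_read(sorted_blocks, day=15)
print(unsorted_hits, sorted_hits)  # prints "30 1"
```

With this data, a filter on day = 15 has to read all 30 blocks of the unsorted output but only one block of the sorted output. Real gains depend on the data distribution and on the actual block boundaries chosen by the writer.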
Syntax and behavior
INSERT INTO dst /*+ sortby(day,hour) */ SELECT * FROM src;
- We will not support the legacy hint style with brackets, e.g. [sortby(day,hour)]
- To keep the "clustered" hint strictly separate from the "sortby" hint, it is only legal to use non-partition columns in "sortby" for HDFS tables.
- Similarly, it is only legal to use non-primary-key columns in "sortby" for Kudu tables.
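The two column restrictions above could be checked along these lines. This is a sketch in Python rather than the planner's actual analysis code; the function validate_sortby, its parameters, and the "hdfs"/"kudu" labels are hypothetical names chosen for the example.

```python
def validate_sortby(sort_cols, table_kind, partition_cols=(), primary_key_cols=()):
    """Return the sortby columns that violate the proposed rules.

    An empty list means the hint is legal. Assumed rules, per the
    description: HDFS tables may not sort by partition columns, and
    Kudu tables may not sort by primary-key columns.
    """
    if table_kind == "hdfs":
        forbidden = set(partition_cols)
    elif table_kind == "kudu":
        forbidden = set(primary_key_cols)
    else:
        raise ValueError("unknown table kind: " + table_kind)
    # Preserve the order in which the offending columns appear in the hint.
    return [c for c in sort_cols if c in forbidden]


# Legal: day and hour are not partition columns of this hypothetical table.
ok = validate_sortby(["day", "hour"], "hdfs", partition_cols=["year"])

# Illegal: "year" is a partition column, "id" is a primary-key column.
bad_hdfs = validate_sortby(["year", "day"], "hdfs", partition_cols=["year"])
bad_kudu = validate_sortby(["id", "day"], "kudu", primary_key_cols=["id"])
```

Returning the offending columns instead of a boolean makes it easy to produce an error message that names them.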
Issue Links
- blocks: IMPALA-3909 Parquet file writer should populate the min/max statistics per block per column to be used by the reader (Resolved)