[HIVE-22489] Reduce Sink operator should order nulls by parameter - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 4.0.0-alpha-1
Component/s: Query Planning
Labels:
None

Target Version/s:

4.0.0

Description

When the property hive.default.nulls.last is set to true and no null order is explicitly specified in the ORDER BY clause of the query null ordering should be NULLS LAST.
But some of the Reduce Sink operators still orders null first.

SET hive.default.nulls.last=true;

EXPLAIN EXTENDED
SELECT src1.key, src2.value FROM src src1 JOIN src src2 ON (src1.key = src2.key) ORDER BY src1.key LIMIT 5;

PREHOOK: query: EXPLAIN EXTENDED
SELECT src1.key, src2.value FROM src src1 JOIN src src2 ON (src1.key = src2.key) ORDER BY src1.key
PREHOOK: type: QUERY
PREHOOK: Input: default@src
#### A masked pattern was here ####
POSTHOOK: query: EXPLAIN EXTENDED
SELECT src1.key, src2.value FROM src src1 JOIN src src2 ON (src1.key = src2.key) ORDER BY src1.key
POSTHOOK: type: QUERY
POSTHOOK: Input: default@src
#### A masked pattern was here ####
OPTIMIZED SQL: SELECT `t0`.`key`, `t2`.`value`
FROM (SELECT `key`
FROM `default`.`src`
WHERE `key` IS NOT NULL) AS `t0`
INNER JOIN (SELECT `key`, `value`
FROM `default`.`src`
WHERE `key` IS NOT NULL) AS `t2` ON `t0`.`key` = `t2`.`key`
ORDER BY `t0`.`key`
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
#### A masked pattern was here ####
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
        Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
#### A masked pattern was here ####
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: src1
                  filterExpr: key is not null (type: boolean)
                  Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
                  GatherStats: false
                  Filter Operator
                    isSamplingPred: false
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: key (type: string)
                      outputColumnNames: _col0
                      Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: string)
                        null sort order: a
                        sort order: +
                        Map-reduce partition columns: _col0 (type: string)
                        Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
                        tag: 0
                        auto parallelism: true
            Execution mode: vectorized, llap
            LLAP IO: no inputs
            Path -> Alias:
#### A masked pattern was here ####
            Path -> Partition:
#### A masked pattern was here ####
                Partition
                  base file name: src
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  properties:
                    COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
                    bucket_count -1
                    bucketing_version 2
                    column.name.delimiter ,
                    columns key,value
                    columns.comments 'default','default'
                    columns.types string:string
#### A masked pattern was here ####
                    name default.src
                    numFiles 1
                    numRows 500
                    rawDataSize 5312
                    serialization.ddl struct src { string key, string value}
                    serialization.format 1
                    serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    totalSize 5812
#### A masked pattern was here ####
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    properties:
                      COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
                      bucket_count -1
                      bucketing_version 2
                      column.name.delimiter ,
                      columns key,value
                      columns.comments 'default','default'
                      columns.types string:string
#### A masked pattern was here ####
                      name default.src
                      numFiles 1
                      numRows 500
                      rawDataSize 5312
                      serialization.ddl struct src { string key, string value}
                      serialization.format 1
                      serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                      totalSize 5812
#### A masked pattern was here ####
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    name: default.src
                  name: default.src
            Truncated Path -> Alias:
              /src [src1]
        Map 4 
            Map Operator Tree:
                TableScan
                  alias: src2
                  filterExpr: key is not null (type: boolean)
                  Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
                  GatherStats: false
                  Filter Operator
                    isSamplingPred: false
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: key (type: string), value (type: string)
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: string)
                        null sort order: a
                        sort order: +
                        Map-reduce partition columns: _col0 (type: string)
                        Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
                        tag: 1
                        value expressions: _col1 (type: string)
                        auto parallelism: true
            Execution mode: vectorized, llap
            LLAP IO: no inputs
            Path -> Alias:
#### A masked pattern was here ####
            Path -> Partition:
#### A masked pattern was here ####
                Partition
                  base file name: src
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  properties:
                    COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
                    bucket_count -1
                    bucketing_version 2
                    column.name.delimiter ,
                    columns key,value
                    columns.comments 'default','default'
                    columns.types string:string
#### A masked pattern was here ####
                    name default.src
                    numFiles 1
                    numRows 500
                    rawDataSize 5312
                    serialization.ddl struct src { string key, string value}
                    serialization.format 1
                    serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    totalSize 5812
#### A masked pattern was here ####
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    properties:
                      COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
                      bucket_count -1
                      bucketing_version 2
                      column.name.delimiter ,
                      columns key,value
                      columns.comments 'default','default'
                      columns.types string:string
#### A masked pattern was here ####
                      name default.src
                      numFiles 1
                      numRows 500
                      rawDataSize 5312
                      serialization.ddl struct src { string key, string value}
                      serialization.format 1
                      serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                      totalSize 5812
#### A masked pattern was here ####
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    name: default.src
                  name: default.src
            Truncated Path -> Alias:
              /src [src2]
        Reducer 2 
            Execution mode: llap
            Needs Tagging: false
            Reduce Operator Tree:
              Merge Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 _col0 (type: string)
                  1 _col0 (type: string)
                outputColumnNames: _col0, _col2
                Position of Big Table: 1
                Statistics: Num rows: 791 Data size: 140798 Basic stats: COMPLETE Column stats: COMPLETE
                Select Operator
                  expressions: _col0 (type: string), _col2 (type: string)
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 791 Data size: 140798 Basic stats: COMPLETE Column stats: COMPLETE
                  Reduce Output Operator
                    key expressions: _col0 (type: string)
                    null sort order: z
                    sort order: +
                    Statistics: Num rows: 791 Data size: 140798 Basic stats: COMPLETE Column stats: COMPLETE
                    tag: -1
                    value expressions: _col1 (type: string)
                    auto parallelism: false
        Reducer 3 
            Execution mode: vectorized, llap
            Needs Tagging: false
            Reduce Operator Tree:
              Select Operator
                expressions: KEY.reducesinkkey0 (type: string), VALUE._col0 (type: string)
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 791 Data size: 140798 Basic stats: COMPLETE Column stats: COMPLETE
                File Output Operator
                  compressed: false
                  GlobalTableId: 0
#### A masked pattern was here ####
                  NumFilesPerFileSink: 1
                  Statistics: Num rows: 791 Data size: 140798 Basic stats: COMPLETE Column stats: COMPLETE
#### A masked pattern was here ####
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      properties:
                        columns _col0,_col1
                        columns.types string:string
                        escape.delim \
                        hive.serialization.extend.additional.nesting.levels true
                        serialization.escape.crlf true
                        serialization.format 1
                        serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                  TotalFiles: 1
                  GatherStats: false
                  MultiFileSpray: false

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

HIVE-22489.1.patch
13/Nov/19 14:18
24 kB
Krisztian Kasa
HIVE-22489.10.patch
07/Jan/20 17:11
8.80 MB
Krisztian Kasa
HIVE-22489.10.patch
07/Jan/20 11:22
8.80 MB
Krisztian Kasa
HIVE-22489.11.patch
09/Jan/20 08:04
8.80 MB
Krisztian Kasa
HIVE-22489.12.patch
13/Jan/20 13:33
8.85 MB
Krisztian Kasa
HIVE-22489.13.patch
16/Jan/20 17:29
8.85 MB
Krisztian Kasa
HIVE-22489.13.patch
16/Jan/20 05:36
8.85 MB
Krisztian Kasa
HIVE-22489.13.patch
14/Jan/20 20:39
8.85 MB
Krisztian Kasa
HIVE-22489.14.patch
17/Jan/20 13:58
8.85 MB
Krisztian Kasa
HIVE-22489.2.patch
18/Nov/19 20:01
25 kB
Krisztian Kasa
HIVE-22489.3.patch
04/Dec/19 10:23
1.26 MB
Krisztian Kasa
HIVE-22489.3.patch
29/Nov/19 14:46
1.26 MB
Krisztian Kasa
HIVE-22489.4.patch
12/Dec/19 15:00
6.98 MB
Krisztian Kasa
HIVE-22489.5.patch
18/Dec/19 17:12
8.43 MB
Krisztian Kasa
HIVE-22489.6.patch
19/Dec/19 11:35
8.42 MB
Krisztian Kasa
HIVE-22489.7.patch
03/Jan/20 11:50
8.44 MB
Krisztian Kasa
HIVE-22489.8.patch
06/Jan/20 12:23
8.77 MB
Krisztian Kasa
HIVE-22489.9.patch
07/Jan/20 07:01
8.79 MB
Krisztian Kasa
HIVE-22489.9.patch
06/Jan/20 14:24
8.79 MB
Krisztian Kasa

Issue Links

links to

Review Board

Reduce Sink operator should order nulls by parameter

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates