IMPALA-4866

Hash join node does not apply limits correctly


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.8.0, Impala 2.9.0
    • Fix Version/s: Impala 2.10.0
    • Component/s: Backend

    Description

      There are several cases where the hash join node does not apply limits correctly and does not update the number of rows returned correctly. Most of the issues are masked by the fact that the planner places an exchange on top of most joins, but they turn into correctness issues when num_nodes=1.

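      Note how these examples return 10 rows (the batch size) when the limit is set to 5. In the first example the summary also reports 0 rows for the hash join, even though 10 rows were fetched.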

      [localhost:21000] > set batch_size=10;
      BATCH_SIZE set to 10
      [localhost:21000] > set num_nodes=1;
      NUM_NODES set to 1
      [localhost:21000] > select straight_join t1.id, t2.id from functional.alltypes t1 inner join functional.alltypes t2 on t1.id = t2.id limit 5;summary;
      Query: select straight_join t1.id, t2.id from functional.alltypes t1 inner join functional.alltypes t2 on t1.id = t2.id limit 5
      Query submitted at: 2017-02-01 16:26:29 (Coordinator: http://tarmstrong-box:25000)
      Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=414f96c409fdeace:79ad287d00000000
      +------+------+
      | id   | id   |
      +------+------+
      | 5460 | 5460 |
      | 5461 | 5461 |
      | 5462 | 5462 |
      | 5463 | 5463 |
      | 5464 | 5464 |
      | 5465 | 5465 |
      | 5466 | 5466 |
      | 5467 | 5467 |
      | 5468 | 5468 |
      | 5469 | 5469 |
      +------+------+
      Fetched 10 row(s) in 1.04s
      +-----------------+--------+----------+----------+-------+------------+-----------+---------------+------------------------+
      | Operator        | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail                 |
      +-----------------+--------+----------+----------+-------+------------+-----------+---------------+------------------------+
      | 02:HASH JOIN    | 1      | 5.90ms   | 5.90ms   | 0     | 5          | 1.25 MB   | 31.37 KB      | INNER JOIN             |
      | |--01:SCAN HDFS | 1      | 7.09ms   | 7.09ms   | 7.30K | 7.30K      | 622.09 KB | 160.00 MB     | functional.alltypes t2 |
      | 00:SCAN HDFS    | 1      | 766.09ms | 766.09ms | 10    | 7.30K      | 606.09 KB | 160.00 MB     | functional.alltypes t1 |
      +-----------------+--------+----------+----------+-------+------------+-----------+---------------+------------------------+
      
      [localhost:21000] > select straight_join t1.id, t2.id from functional.alltypes t1 right join functional.alltypes t2 on t1.id = t2.int_col + 100000 limit 5;summary;
      Query: select straight_join t1.id, t2.id from functional.alltypes t1 right join functional.alltypes t2 on t1.id = t2.int_col + 100000 limit 5
      Query submitted at: 2017-02-01 16:30:11 (Coordinator: http://tarmstrong-box:25000)
      Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=e485f8d3a8d2487:c8cb4f2100000000
      +------+------+
      | id   | id   |
      +------+------+
      | NULL | 3953 |
      | NULL | 3943 |
      | NULL | 3933 |
      | NULL | 3923 |
      | NULL | 3913 |
      | NULL | 3903 |
      | NULL | 3893 |
      | NULL | 3883 |
      | NULL | 3873 |
      | NULL | 3863 |
      +------+------+
      Fetched 10 row(s) in 1.05s
      +-----------------+--------+----------+----------+-------+------------+-----------+---------------+------------------------+
      | Operator        | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail                 |
      +-----------------+--------+----------+----------+-------+------------+-----------+---------------+------------------------+
      | 02:HASH JOIN    | 1      | 17.78ms  | 17.78ms  | 10    | 5          | 1.02 MB   | 62.74 KB      | RIGHT OUTER JOIN       |
      | |--01:SCAN HDFS | 1      | 8.13ms   | 8.13ms   | 7.30K | 7.30K      | 626.09 KB | 160.00 MB     | functional.alltypes t2 |
      | 00:SCAN HDFS    | 1      | 757.83ms | 757.83ms | 7.30K | 7.30K      | 630.09 KB | 160.00 MB     | functional.alltypes t1 |
      +-----------------+--------+----------+----------+-------+------------+-----------+---------------+------------------------+
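
      The expected behaviour can be sketched as follows. This is a minimal Python model, not Impala's backend code: limited_batches and its parameters are hypothetical names chosen for illustration. It assumes an operator that emits rows in batches and shows the two things the examples above get wrong: truncating the final batch to the remaining limit, and keeping the returned-row counter in step with what is actually emitted.

```python
from typing import Iterable, Iterator, List


def limited_batches(batches: Iterable[List[int]], limit: int) -> Iterator[List[int]]:
    """Yield row batches, returning at most `limit` rows in total.

    Hypothetical model of limit enforcement in a batch-producing operator.
    """
    rows_returned = 0  # must track every row handed to the parent
    it = iter(batches)
    while rows_returned < limit:
        batch = next(it, None)
        if batch is None:
            break  # child is exhausted before the limit was reached
        # Truncate the batch so the running total never exceeds the limit.
        out = batch[: limit - rows_returned]
        rows_returned += len(out)
        if out:
            yield out


# Usage: two batches of 10 rows (batch_size=10) with limit 5.
batches = [list(range(5460, 5470)), list(range(5470, 5480))]
result = list(limited_batches(batches, 5))
print(sum(len(b) for b in result))  # 5
```

      With batch_size=10 and limit 5, a model like this returns only the first 5 rows of the first batch and stops pulling from its child.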
      

People

    • Assignee: Anuj Phadke (anujphadke)
    • Reporter: Tim Armstrong (tarmstrong)