Hive / HIVE-12336

Sort Merge Partition Map Join

Details

    Description

      Logically and functionally, bucketing and partitioning are quite similar: both provide a mechanism to segregate and separate the table's data based on its content. Thanks to that, significant further optimisations such as [partition] PRUNING or [bucket] MAP JOIN are possible.
      The difference seems to be imposed by design: PARTITIONing is open/explicit, while BUCKETing is discrete/implicit.
      Partitioning seems to be very common, if not a standard feature, in all current RDBMSs, while BUCKETING seems to be specific to Hive.
      In a way, BUCKETING could also be called "hashing" or simply "IMPLICIT PARTITIONING".
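
To make the explicit/implicit distinction concrete, here is a minimal Python sketch. It is not Hive code: the function names are hypothetical, and `zlib.crc32` is only a stand-in for whatever hash function the engine actually applies. Partitioning keeps one group per distinct column value, while bucketing derives the group from a hash of the value modulo a fixed bucket count.

```python
import zlib
from collections import defaultdict


def partition_rows(rows, column):
    """Explicit partitioning: one group per distinct value of `column`."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[column]].append(row)
    return dict(parts)


def bucket_rows(rows, column, num_buckets):
    """Implicit partitioning: the group is hash(value) mod num_buckets."""
    buckets = defaultdict(list)
    for row in rows:
        # crc32 is an illustrative stand-in, not the hash Hive uses.
        h = zlib.crc32(str(row[column]).encode("utf-8"))
        buckets[h % num_buckets].append(row)
    return dict(buckets)


rows = [
    {"key": "201511", "value": "x"},
    {"key": "201512", "value": "y"},
    {"key": "201512", "value": "z"},
]
parts = partition_rows(rows, "key")
buckets = bucket_rows(rows, "key", 4)
```

In both cases rows with the same key land in the same group, which is the property that pruning and map-side joins rely on; only the way the group is named differs.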

      Although these two are recognised as separate features in Hive, there should be nothing to prevent leveraging the same existing query/join optimisations across the two.

      PARTITION SORT MERGE MAPJOIN
      Use the same type of optimization as in SORT MERGE BUCKETED MAP JOIN for partitioned tables.
      The sort-merge join optimization could be performed when the PARTITIONED tables being joined are sorted and partitioned on the join columns.

      The corresponding partitions are joined with each other at the mapper. If both A and B are partitioned on their KEY column, the following join
      SELECT /*+ MAPJOIN(b) */ a.key, a.value
      FROM A a JOIN B b ON a.key = b.key
      can be done on the mapper only. The mapper for the partition key='201512' of A will traverse the corresponding partition of B. Traversing is possible if the corresponding partitions are sorted on the same columns. This dependency is taken care of by HIVE-11525.
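
The per-partition traversal described above can be sketched in Python. This is an illustration of the idea, not the proposed Hive implementation. It assumes a hypothetical layout in which each table is a dict mapping a partition value (for example '201512') to a list of rows already sorted on a join column called `id`; the names `merge_join` and `partition_merge_join` are made up for this sketch.

```python
def merge_join(left, right, key):
    """Merge-join two row lists that are both sorted on `key`."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row with that key against the run.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    out.append((left[i], r))
                i += 1
            j = j_end
    return out


def partition_merge_join(parts_a, parts_b, key):
    """Join only partitions present in both tables, one pair at a time."""
    result = []
    for pval in sorted(parts_a.keys() & parts_b.keys()):
        result.extend(merge_join(parts_a[pval], parts_b[pval], key))
    return result


parts_a = {
    "201511": [{"id": 1, "value": "a1"}],
    "201512": [{"id": 1, "value": "a2"}, {"id": 2, "value": "a3"},
               {"id": 2, "value": "a4"}],
}
parts_b = {
    "201512": [{"id": 2, "w": "b1"}, {"id": 3, "w": "b2"}],
    "201601": [{"id": 1, "w": "b3"}],
}
joined = partition_merge_join(parts_a, parts_b, "id")
```

Each pair of matching partitions is read once, front to back, with no shuffle and no in-memory hash table. That single pass is only valid because both sides are sorted on the same column.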

            People

              Assignee: Unassigned
              Reporter: Maciek Kocon (mkoc)