HIVE-17206: Make a version of Compactor specific to unbucketed tables

Project: Hive
Parent issue: HIVE-17204 support un-bucketed tables in acid


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Transactions
    • Labels: None

    Description

      The current Compactor will work for unbucketed tables, but it is not optimized or flexible enough.

      The current Compactor is designed to generate a number of splits equal to the number of buckets in the table. That number is its degree of parallelism.

      For unbucketed tables, the same approach is used, but the "number of buckets" is derived from the files found in the deltas. A small write will likely produce just one bucket_00000 file. For a large write, the parallelism of the write determines the number of output files.
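
      As a rough illustration of the derivation described above, the sketch below collects the distinct bucket ids that appear in a list of delta file paths. It is not the Compactor's actual code: the function name `derive_bucket_ids`, the example delta directory names, and the choice to count distinct ids are assumptions made for illustration only.

```python
import re

# Matches the bucket file naming mentioned above, e.g. "bucket_00000".
# Anchored at the end of the path so other files in a delta are ignored.
BUCKET_FILE = re.compile(r"bucket_(\d+)$")


def derive_bucket_ids(delta_files):
    """Return the sorted distinct bucket ids found among delta file paths.

    Hypothetical helper: the number of ids returned stands in for the
    "number of buckets" that would drive the number of splits.
    """
    ids = set()
    for path in delta_files:
        match = BUCKET_FILE.search(path)
        if match:
            ids.add(int(match.group(1)))
    return sorted(ids)


# Illustrative paths only; the directory names are assumptions.
small_write = ["delta_0000001_0000001_0000/bucket_00000"]
large_write = [
    f"delta_0000002_0000002_0000/bucket_{i:05d}" for i in range(4)
]

small_ids = derive_bucket_ids(small_write)
all_ids = derive_bucket_ids(small_write + large_write)
```

      In this sketch a small write yields a single bucket id, so compaction would run with one split, while the larger write's four files would yield four. The parallelism is inherited from the writer rather than chosen by the Compactor, which is the limitation this issue describes.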

      The Compactor needs to be able to control parallelism for unbucketed tables as it sees fit, for example by hash-partitioning all records (by ROW__ID?) into N disjoint sets.
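
      The hash-partitioning idea can be sketched as follows. This is a minimal sketch under stated assumptions, not a proposed implementation: `RowId` is a hypothetical stand-in for ROW__ID with illustrative field names, and CRC32 is used only because it is deterministic across processes. The issue does not specify a hash function.

```python
import zlib
from collections import namedtuple

# Hypothetical stand-in for ROW__ID; the field names are assumptions.
RowId = namedtuple("RowId", ["write_id", "bucket", "row_id"])


def partition_of(row_id, n):
    """Map a row id to one of n partitions with a deterministic hash."""
    if n < 1:
        raise ValueError("n must be at least 1")
    key = f"{row_id.write_id}:{row_id.bucket}:{row_id.row_id}".encode()
    return zlib.crc32(key) % n


def partition_rows(row_ids, n):
    """Split row ids into n disjoint lists, one per compaction task."""
    parts = [[] for _ in range(n)]
    for rid in row_ids:
        parts[partition_of(rid, n)].append(rid)
    return parts


# Every row starts in bucket 0, as with a small write to an unbucketed
# table, yet the caller is free to choose n = 4.
rows = [RowId(w, 0, r) for w in range(1, 4) for r in range(50)]
parts = partition_rows(rows, 4)
```

      Because each record maps to exactly one partition, the N sets are disjoint and together cover every record, so N can be chosen independently of how many files the original writes produced.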

          People

            Assignee: Eugene Koifman (ekoifman)
            Reporter: Eugene Koifman (ekoifman)
            Votes: 0
            Watchers: 3
