  1. Hive
  2. HIVE-21769 Support Partition level filtering for hive replication command
  3. HIVE-21771

Support partition filter (where clause) in REPL dump command (Bootstrap Dump)


Details

    • Type: Sub-task
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • 4.0.0
    • None
    • HiveServer2, repl

    Description

      Bootstrap for managed table

      Users should be allowed to execute REPL DUMP with a where clause. The where clause should support filtering partitions out of the dump. The format of the where clause should be similar to "REPL DUMP dbname from 10 where 't0' where key < 10,'t1' where key = 3, '(t2*)|'t3' where key > 3". The initial version will support only very basic filter conditions; more complex conditions will be added as and when required.

      • From the AST generated for the where clause, extract the table information.
      • Generate an AST for each table.
      • List the partitions of each table from that table's AST, using the same metastore API that select queries use.
      • During bootstrap load, use the partition list to dump the partitions.
      • During incremental dump, use the list to filter out events.
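
      The first two steps (splitting the where clause into one filter per table) can be illustrated with a minimal Python sketch. This is not the patch's implementation, which works on the Hive AST; the function name split_table_filters and the regular expression are hypothetical, the table pattern '(t2*)|t3' is a simplified form of the example above, and conditions are assumed to contain no commas.

      ```python
      import re

      # One entry looks like: 'table-pattern' where <condition>
      # Assumption: a condition never contains a comma, so a comma ends the entry.
      _ENTRY = re.compile(r"'([^']*)'\s+where\s+([^,]+)", re.IGNORECASE)


      def split_table_filters(filter_text):
          """Return a list of (table_pattern, condition) pairs, in input order."""
          return [(m.group(1), m.group(2).strip()) for m in _ENTRY.finditer(filter_text)]


      filters = split_table_filters(
          "'t0' where key < 10,'t1' where key = 3, '(t2*)|t3' where key > 3"
      )
      for pattern, condition in filters:
          print(pattern, "->", condition)
      ```

      Each resulting pair corresponds to the per-table AST that the later steps would hand to the metastore when listing partitions.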

      In case of bootstrap load, all the tables of the database will be scanned, and the following rules apply:

      • If the table is not partitioned, then it will be dumped.
      • If the key provided in the filter condition for the table is not a partition column, then the dump will fail.
      • If the table is not mentioned in the where clause, then all partitions of the table will be dumped.
      • Otherwise, all partitions of the table that satisfy the where clause will be dumped.
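
      The bootstrap rules above can be sketched as a small decision function. This is an illustrative Python model only: Table, TableFilter, select_partitions and the "WHOLE_TABLE" marker are hypothetical names, partitions are held in memory instead of being listed through the metastore, table patterns are treated as regular expressions, and only the operators <, = and > from the example are handled.

      ```python
      import operator
      import re
      from dataclasses import dataclass, field

      OPS = {"<": operator.lt, "=": operator.eq, ">": operator.gt}


      @dataclass
      class Table:
          name: str
          partition_columns: list = field(default_factory=list)
          partitions: list = field(default_factory=list)  # each a dict: column -> value


      @dataclass
      class TableFilter:
          pattern: str  # table name pattern (assumed to be a regular expression)
          column: str
          op: str
          value: int


      def select_partitions(table, filters):
          """Return "WHOLE_TABLE" or the list of partitions to dump for one table."""
          if not table.partition_columns:
              return "WHOLE_TABLE"  # unpartitioned tables are always dumped
          flt = next((f for f in filters if re.fullmatch(f.pattern, table.name)), None)
          if flt is None:
              return list(table.partitions)  # table not mentioned: dump every partition
          if flt.column not in table.partition_columns:
              raise ValueError(f"{flt.column} is not a partition column of {table.name}")
          compare = OPS[flt.op]
          return [p for p in table.partitions if compare(p[flt.column], flt.value)]


      t0 = Table("t0", ["key"], [{"key": 5}, {"key": 12}])
      selected = select_partitions(t0, [TableFilter("t0", "key", "<", 10)])
      print(selected)
      ```

      Raising an exception for a non-partition key mirrors the "dump will fail" rule; how the real command reports that failure is not specified here.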

      Incremental for managed table (Not part of this patch)

      In case of incremental dump, the events from the notification log will be scanned. Once the partition spec has been extracted from an event, it will be checked against the filter condition.

      • If the table is not partitioned, then the event will be added to the dump.
      • If the key mentioned is not a partition column, then the dump will fail.
      • If the table is not mentioned in the filter, then the event will be added to the dump.
      • If the event covers multiple partitions, then the event will be added to the dump. (Filtering out redundant partitions from the message will be done as part of a separate task.)
      • If the partition spec matches the filter, then the event will be added to the dump.
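
      The event rules above can be sketched the same way. This Python model is hypothetical and not part of any patch: should_dump_event, the dict-shaped event and the tuple-shaped filters are illustrative names, table patterns are treated as regular expressions, and every event on a partitioned table is assumed to carry at least one partition spec. The non-partition-key check is applied before the multi-partition rule, following the order of the list; the description does not fix that order.

      ```python
      import operator
      import re

      OPS = {"<": operator.lt, "=": operator.eq, ">": operator.gt}


      def should_dump_event(event, partition_columns, filters):
          """Decide whether one notification event goes into the incremental dump.

          event:   {"table": str, "partitions": [ {column: value}, ... ]}
          filters: list of (table_pattern, column, op, value) tuples
          """
          if not partition_columns:
              return True  # table is not partitioned
          match = next((f for f in filters if re.fullmatch(f[0], event["table"])), None)
          if match is None:
              return True  # table is not mentioned in the filter
          _, column, op, value = match
          if column not in partition_columns:
              raise ValueError(f"{column} is not a partition column of {event['table']}")
          if len(event["partitions"]) > 1:
              return True  # multi-partition events are kept whole for now
          return any(OPS[op](spec[column], value) for spec in event["partitions"])


      filters = [("t1", "key", "=", 3)]
      keep = should_dump_event({"table": "t1", "partitions": [{"key": 3}]}, ["key"], filters)
      skip = should_dump_event({"table": "t1", "partitions": [{"key": 4}]}, ["key"], filters)
      print(keep, skip)
      ```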

       

      Attachments

        1. HIVE-21771.01.patch
          33 kB
          mahesh kumar behera
        2. HIVE-21771.02.patch
          33 kB
          mahesh kumar behera


            People

              Assignee: mahesh kumar behera (maheshk114)
              Reporter: mahesh kumar behera (maheshk114)
              Votes: 0
              Watchers: 2


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 0.5h