Bootstrap for managed table
Users should be able to run REPL DUMP with a where clause that filters partitions out of the dump. The format of the where clause should be similar to "REPL DUMP dbname from 10 where 't0' where key < 10,'t1' where key = 3, '(t2*)|'t3' where key > 3". The initial version will support only very basic filter conditions; more complex conditions will be added as and when required.
- From the AST generated for the where clause, extract the table information.
- Generate an AST for each table.
- List the partitions of each table from its AST, using the same metastore API that select queries use.
- During bootstrap dump, use the partition list to decide which partitions to dump.
- During incremental dump, use the list to filter out events.
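As a rough illustration of the first step, the sketch below pulls per-table conditions out of a simplified clause. It is not Hive's parser: the function name `parse_table_filters`, the `Condition` tuple, and the restriction to a single `key <op> integer` comparison per quoted table pattern are all assumptions made for this example. Table patterns are kept as plain strings and are not expanded.

```python
import re
from typing import Dict, NamedTuple


class Condition(NamedTuple):
    """One comparison on a (hypothetical) partition key."""
    key: str
    op: str
    value: int


# Matches entries such as:  't0' where key < 10
# Longer operators come first so "<=" is not read as "<".
_ENTRY = re.compile(
    r"'([^']+)'\s+where\s+(\w+)\s*(<=|>=|!=|=|<|>)\s*(-?\d+)",
    re.IGNORECASE,
)


def parse_table_filters(clause: str) -> Dict[str, Condition]:
    """Map each quoted table pattern to its condition (simplified grammar)."""
    filters: Dict[str, Condition] = {}
    for match in _ENTRY.finditer(clause):
        table, key, op, value = match.groups()
        filters[table] = Condition(key, op, int(value))
    return filters


if __name__ == "__main__":
    print(parse_table_filters("'t0' where key < 10, 't1' where key = 3"))
```

A real implementation would work on the AST produced by the Hive parser rather than on the raw string; the regex here only stands in for that step.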
In case of a bootstrap dump, all the tables of the database will be scanned and:
- If the table is not partitioned, it will be dumped.
- If the key provided in the filter condition for the table is not a partition column, the dump will fail.
- If the table is not mentioned in the where clause, all partitions of the table will be dumped.
- Otherwise, all partitions of the table that satisfy the where clause will be dumped.
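The bootstrap rules above can be sketched as a small decision function. This is only an illustration under stated assumptions: `Table`, `TableFilter`, and `partitions_to_dump` are hypothetical names, partitions are modelled as plain dictionaries with integer values, and filters are looked up by exact table name rather than by pattern.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Partition = Dict[str, int]  # partition column -> value (simplified)


@dataclass
class Table:
    name: str
    partition_keys: List[str] = field(default_factory=list)
    partitions: List[Partition] = field(default_factory=list)


@dataclass
class TableFilter:
    key: str                          # column named in the where clause
    predicate: Callable[[int], bool]  # e.g. lambda v: v < 10


def partitions_to_dump(
    table: Table, filters: Dict[str, TableFilter]
) -> Optional[List[Partition]]:
    """Return the partitions to dump; None means 'dump the unpartitioned table'."""
    if not table.partition_keys:
        return None  # unpartitioned tables are always dumped
    table_filter = filters.get(table.name)
    if table_filter is None:
        return list(table.partitions)  # table not mentioned: dump everything
    if table_filter.key not in table.partition_keys:
        # Filtering on a non-partition column makes the dump fail.
        raise ValueError(
            f"{table_filter.key!r} is not a partition column of {table.name!r}"
        )
    return [
        p for p in table.partitions if table_filter.predicate(p[table_filter.key])
    ]
```

For example, with a filter `key < 10` on `t0`, a table `t0` holding partitions `key=1` and `key=12` would contribute only `key=1`, while a partitioned table that has no entry in the filter map contributes all of its partitions.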
Incremental for managed table (Not part of this patch)
In case of an incremental dump, the events from the notification log will be scanned. Once the partition spec is extracted from an event, it will be checked against the filter condition:
- If the table is not partitioned, the event will be added to the dump.
- If the key mentioned is not a partition column, the dump will fail.
- If the table is not mentioned in the filter, the event will be added to the dump.
- If the event covers multiple partitions, the event will be added to the dump. (Filtering redundant partitions out of the message will be done as part of a separate task.)
- If the partition spec matches the filter, the event will be added to the dump.
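The event rules above could look roughly like the following. Again this is a sketch, not the planned implementation: `Event`, `TableFilter`, and `should_dump_event` are hypothetical names, and the handling of an event on a partitioned table that carries no partition spec (kept in the dump here) is an assumption, since the description does not cover that case.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

PartitionSpec = Dict[str, int]  # partition column -> value (simplified)


@dataclass
class TableFilter:
    key: str                          # column named in the where clause
    predicate: Callable[[int], bool]  # e.g. lambda v: v > 3


@dataclass
class Event:
    table: str
    table_partition_keys: List[str] = field(default_factory=list)
    partition_specs: List[PartitionSpec] = field(default_factory=list)


def should_dump_event(event: Event, filters: Dict[str, TableFilter]) -> bool:
    """Decide whether a notification-log event goes into the incremental dump."""
    if not event.table_partition_keys:
        return True  # unpartitioned table
    table_filter = filters.get(event.table)
    if table_filter is None:
        return True  # table not mentioned in the filter
    if table_filter.key not in event.table_partition_keys:
        raise ValueError(
            f"{table_filter.key!r} is not a partition column of {event.table!r}"
        )
    if len(event.partition_specs) > 1:
        return True  # multi-partition events are kept as a whole for now
    if not event.partition_specs:
        return True  # assumption: events without a partition spec are kept
    spec = event.partition_specs[0]
    return table_filter.predicate(spec[table_filter.key])
```

Keeping multi-partition events whole mirrors the note that trimming redundant partitions from the message is left to a separate task.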