Hive replicates external tables differently from managed tables. For an external table, a list is built of the table and partition locations to be replicated. A partition location that lies within the table location is not added to the list, because copying the table location already covers it. A partition location outside the table location is added to the list. During an incremental dump, data-related events are ignored and only metadata-related events are dumped; the location list is prepared and used for replicating the data. During load, the events are replayed and then the distcp tasks are created, one for each location in the list.
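The location-list rule above can be sketched as follows. This is only an illustration, not Hive's actual implementation: the function names `is_within` and `locations_to_copy` are hypothetical, and locations are assumed to be plain absolute paths rather than full filesystem URIs.

```python
from pathlib import PurePosixPath


def is_within(child: str, parent: str) -> bool:
    """Return True if `child` is `parent` itself or lies beneath it.

    The comparison is done per path component, so /warehouse/t10 is not
    treated as being inside /warehouse/t1.
    """
    c, p = PurePosixPath(child), PurePosixPath(parent)
    return c == p or p in c.parents


def locations_to_copy(table_location: str, partition_locations: list[str]) -> list[str]:
    """Build the list of locations to replicate (hypothetical helper).

    The table location is always listed. A partition location is listed
    only when it lies outside the table location.
    """
    locations = [table_location]
    for loc in partition_locations:
        if not is_within(loc, table_location) and loc not in locations:
            locations.append(loc)
    return locations


example = locations_to_copy(
    "/warehouse/ext/sales",
    [
        "/warehouse/ext/sales/year=2018",      # inside the table location: skipped
        "/data/archive/sales/year=2017",       # outside: listed
        "/warehouse/ext/sales_old/year=2016",  # shares a name prefix only: listed
    ],
)
print(example)
```

Under this sketch, one copy task would then be created per entry of the returned list.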
With partition-level replication, not every partition is present in the dump. The location of each partition included in the dump is therefore added to the list, even when it lies within the table location.
- If a WHERE condition is present in the REPL DUMP command, add the location of each partition that satisfies it, even if that location is within the table location.
- If the table is not mentioned in the WHERE clause, follow the older behavior.
- If the table is mentioned with a key that does not match any of its partition columns, fail the REPL DUMP.
- If the table is mentioned with a key, add the location of each partition even when all partitions satisfy the filter condition. This avoids copying partitions that are added using ALTER after the dump.
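The four rules above can be sketched as a single decision function. Everything here is hypothetical and for illustration only: `dump_locations`, its parameters, and the `matches` predicate (which stands in for evaluating the WHERE condition against a partition) are not Hive APIs. The sketch also assumes that, when a filter applies, the table location itself is left out of the list, which is how the last rule is read here.

```python
from pathlib import PurePosixPath


def is_within(child, parent):
    """True if `child` is `parent` or lies beneath it, compared per component."""
    c, p = PurePosixPath(child), PurePosixPath(parent)
    return c == p or p in c.parents


def dump_locations(table_location, partitions, partition_columns,
                   filter_key=None, matches=None):
    """Pick the locations to copy for one external table.

    partitions        -- list of (partition_spec dict, location) pairs
    partition_columns -- names of the table's partition columns
    filter_key        -- key used for this table in the WHERE clause, or
                         None if the table is not mentioned there
    matches           -- predicate over a partition spec (assumed stand-in
                         for the WHERE condition)
    """
    if filter_key is None:
        # Table not mentioned: older behavior, i.e. the table location plus
        # any partition location outside it.
        locations = [table_location]
        for _, loc in partitions:
            if not is_within(loc, table_location) and loc not in locations:
                locations.append(loc)
        return locations

    if filter_key not in partition_columns:
        # The key does not match any partition column: fail the dump.
        raise ValueError(f"{filter_key!r} is not a partition column")

    # Filtered dump: list every satisfying partition on its own, even when
    # it lies inside the table location, so that partitions added after the
    # dump are not copied along with the table directory.
    locations = []
    for spec, loc in partitions:
        if matches(spec) and loc not in locations:
            locations.append(loc)
    return locations


parts = [
    ({"year": "2017"}, "/data/archive/sales/year=2017"),
    ({"year": "2018"}, "/warehouse/ext/sales/year=2018"),
]
unfiltered = dump_locations("/warehouse/ext/sales", parts, ["year"])
filtered = dump_locations("/warehouse/ext/sales", parts, ["year"],
                          filter_key="year",
                          matches=lambda spec: spec["year"] == "2018")
print(unfiltered)
print(filtered)
```

Listing partitions individually in the filtered case trades a longer location list for a copy that is limited to exactly the partitions captured in the dump.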