Hive / HIVE-21029

External table replication for existing deployments running incremental replication.


Details

    Description

      Existing deployments that use Hive replication do not replicate external tables. To enable external table replication, such deployments must set a specific switch that first bootstraps the external tables as part of Hive incremental replication; after that, incremental replication takes care of further changes to external tables.

      The switch will be provided by an additional Hive configuration (for example, hive.repl.bootstrap.external.tables) and is to be used in the WITH clause of the REPL DUMP command.

      Additionally, the existing Hive config hive.repl.include.external.tables must always be set to "true" in the same clause.
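
A minimal Python sketch of how the WITH clause described above could be assembled for the next incremental REPL DUMP. The helper `repl_dump_command`, the database name `db1` and the starting event id are hypothetical; the statement shape (FROM, optional TO and LIMIT, then WITH ('key'='value', ...)) follows the Hive REPL DUMP grammar as we understand it. The two checks encode rules stated in this issue: hive.repl.include.external.tables must be "true" whenever the bootstrap switch is on, and TO/LIMIT should not be used for the bootstrap dump.

```python
from typing import Dict, Optional

# Configuration keys named in HIVE-21029.
INCLUDE_KEY = "hive.repl.include.external.tables"
BOOTSTRAP_KEY = "hive.repl.bootstrap.external.tables"


def repl_dump_command(db: str, from_event: int, config: Dict[str, str],
                      to_event: Optional[int] = None,
                      limit: Optional[int] = None) -> str:
    """Build a REPL DUMP statement string (hypothetical helper, not part of Hive)."""
    if config.get(BOOTSTRAP_KEY) == "true":
        # Bootstrapping external tables only makes sense if they are included.
        if config.get(INCLUDE_KEY) != "true":
            raise ValueError(f"{INCLUDE_KEY} must be 'true' when {BOOTSTRAP_KEY} is 'true'")
        # TO/LIMIT could stop the dump before the latest events are dumped.
        if to_event is not None or limit is not None:
            raise ValueError("TO and LIMIT must not be used when bootstrapping external tables")
    parts = [f"REPL DUMP {db} FROM {from_event}"]
    if to_event is not None:
        parts.append(f"TO {to_event}")
    if limit is not None:
        parts.append(f"LIMIT {limit}")
    if config:
        # Render each key/value pair as 'key'='value', preserving insertion order.
        pairs = ", ".join(f"'{key}'='{value}'" for key, value in config.items())
        parts.append(f"WITH ({pairs})")
    return " ".join(parts) + ";"
```

For example, `repl_dump_command("db1", 100, {INCLUDE_KEY: "true", BOOTSTRAP_KEY: "true"})` yields `REPL DUMP db1 FROM 100 WITH ('hive.repl.include.external.tables'='true', 'hive.repl.bootstrap.external.tables'='true');`.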

      Proposed usage for enabling external table replication on an existing replication policy:
      1. Consider an ongoing repl policy <db1> in the incremental phase.
      Enable hive.repl.include.external.tables=true and hive.repl.bootstrap.external.tables=true in the next incremental REPL DUMP.

      • Dumps all events but skips the events related to external tables.
      • Instead, includes a combined bootstrap dump of all external tables under the "_bootstrap" directory.
      • Also includes the data locations file "_external_tables_info".
      • The LIMIT or TO clause should not be used, so that the latest events are dumped before the external tables are bootstrapped.

      2. REPL LOAD on this dump first applies all the events, then copies the external tables' data, and finally bootstraps the external tables (metadata).

      • The external tables (metadata) may not be point-in-time consistent with the rest of the tables.
      • However, they become eventually consistent once the next incremental load is applied.
      • This REPL LOAD is fault tolerant and can be retried if it fails.

      3. All future REPL DUMPs on this repl policy should set hive.repl.bootstrap.external.tables=false.

      • If it is not set to false, the target might end up with an inconsistent set of external tables, because bootstrap does not clean up external tables that have been dropped.
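
The three steps can be modelled with sets of table names to show why the switch must be turned off after the first bootstrap dump. This is an illustrative Python model, not Hive code: the `Event` type, the `dump_and_load` function and the table names are hypothetical, only CREATE and DROP events are represented, data copying is ignored, and hive.repl.include.external.tables is assumed to be "true" in every cycle. While bootstrap is on, external-table events (including drops) are skipped and the bootstrap only adds the external tables that currently exist on the source.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Event:
    event_id: int
    op: str            # "CREATE" or "DROP"; no other event types are modelled
    table: str
    is_external: bool


def dump_and_load(target: Set[str], events: List[Event],
                  source_external_tables: Set[str], bootstrap: bool) -> Set[str]:
    """Return the target's table names after one REPL DUMP + REPL LOAD cycle."""
    result = set(target)
    # Apply events in event-id order; in bootstrap mode external-table events are skipped.
    for event in sorted(events, key=lambda e: e.event_id):
        if bootstrap and event.is_external:
            continue
        if event.op == "CREATE":
            result.add(event.table)
        elif event.op == "DROP":
            result.discard(event.table)
    # The bootstrap runs after the events and only adds tables; it never drops any.
    if bootstrap:
        result |= source_external_tables
    return result
```

Starting from a target holding only m1, a first cycle with bootstrap on, events CREATE m2 (managed) and CREATE e2 (external), and source external tables {e1, e2} gives {m1, m2, e1, e2}. If a later cycle drops e1 on the source but leaves bootstrap on, e1 remains on the target; with bootstrap off, the DROP event is replayed and e1 is removed.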

      Attachments

        1. HIVE-21029.01.patch (50 kB, Sankar Hariappan)
        2. HIVE-21029.02.patch (52 kB, Sankar Hariappan)
        3. HIVE-21029.03.patch (56 kB, Sankar Hariappan)
        4. HIVE-21029.04.patch (57 kB, Sankar Hariappan)


            People

              sankarh Sankar Hariappan
              anishek Anishek Agarwal
              Votes: 0
              Watchers: 5
