  Hive / HIVE-21206

Bootstrap replication is slow as it opens a lot of metastore connections.


Details

    Description

      Bootstrap replication of 1 TB of data from one on-prem cluster to another is slower in Hive3 than in Hive2.

      Time taken for bootstrap replication of a table with 1000 partitions:

      Replication       Bootstrap time
      Hive2 -> Hive2    7m
      Hive3 -> Hive3    17m

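      Every MoveTask closes the current metastore connection and opens a new one, which causes the slowdown. In the HiveServer2 log excerpt, the MOVE tasks for Stage-5 and Stage-11 each create a new Kerberos-based Thrift connection to the metastore (no delegation token is found), and the previous connection is closed when Stage-11 starts.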
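      To illustrate why a per-task connection adds up during a large bootstrap load, here is a minimal Java sketch contrasting one connection per move task with one connection shared across the whole load. FakeMetastoreConnection, runWithConnectionPerTask and runWithSharedConnection are hypothetical stand-ins, not Hive's HiveMetaStoreClient API, the task count of 1000 is arbitrary, and the shared-connection variant is not necessarily what the attached patches do.

```java
import java.util.ArrayList;
import java.util.List;

public class MoveTaskConnectionSketch {

    /** Hypothetical stand-in for a metastore client; NOT Hive's IMetaStoreClient. */
    static final class FakeMetastoreConnection implements AutoCloseable {
        static int opened = 0; // total connections opened so far

        FakeMetastoreConnection() {
            opened++; // stands in for the Kerberos/Thrift handshake seen in the log
        }

        void loadData(String target) {
            // a real client would issue metastore calls here
        }

        @Override
        public void close() {
            // a real client would tear down the Thrift transport here
        }
    }

    /** Pattern seen in the log: every move task opens (and later closes) its own connection. */
    static void runWithConnectionPerTask(List<String> targets) {
        for (String target : targets) {
            try (FakeMetastoreConnection conn = new FakeMetastoreConnection()) {
                conn.loadData(target);
            }
        }
    }

    /** Alternative: open one connection and reuse it for every move task of the load. */
    static void runWithSharedConnection(List<String> targets) {
        try (FakeMetastoreConnection conn = new FakeMetastoreConnection()) {
            for (String target : targets) {
                conn.loadData(target);
            }
        }
    }

    public static void main(String[] args) {
        int taskCount = 1000; // hypothetical number of move tasks
        List<String> targets = new ArrayList<>();
        for (int i = 1; i <= taskCount; i++) {
            targets.add("task-" + i);
        }

        FakeMetastoreConnection.opened = 0;
        runWithConnectionPerTask(targets);
        System.out.println("per-task connections: " + FakeMetastoreConnection.opened); // 1000

        FakeMetastoreConnection.opened = 0;
        runWithSharedConnection(targets);
        System.out.println("shared connections:   " + FakeMetastoreConnection.opened); // 1
    }
}
```

      The number of handshakes grows linearly with the number of move tasks in the first variant and stays constant in the second; how much time each handshake costs depends on the cluster (Kerberos, network, metastore load).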

      2019-02-08T12:28:30,174 INFO  [HiveServer2-Background-Pool: Thread-1134]: ql.Driver (:()) - Starting task [Stage-5:MOVE] in serial mode
      2019-02-08T12:28:30,177 INFO  [HiveServer2-Background-Pool: Thread-1134]: exec.Task (:()) - Loading data to table nondefault.nondefault_table1 from hdfs://mycluster1/warehouse/tablespace/managed/hive/nondefault.db/nondefault_table1/.hive-staging_hive_2019-02-08_12-28-23_584_1482331698286040936-3/-ext-10001
      2019-02-08T12:28:30,189 INFO  [HiveServer2-Background-Pool: Thread-1134]: metastore.HiveMetaStoreClient (:()) - Trying to connect to metastore with URI thrift://ctr-e139-1542663976389-62755-01-000014.hwx.site:9083
      2019-02-08T12:28:30,189 INFO  [HiveServer2-Background-Pool: Thread-1134]: metastore.HiveMetaStoreClient (:()) - HMSC::open(): Could not find delegation token. Creating KERBEROS-based thrift connection.
      2019-02-08T12:28:30,206 INFO  [HiveServer2-Background-Pool: Thread-1134]: metastore.HiveMetaStoreClient (:()) - Opened a connection to metastore, current connections: 4
      2019-02-08T12:28:30,206 INFO  [HiveServer2-Background-Pool: Thread-1134]: metastore.HiveMetaStoreClient (:()) - Connected to metastore.
      2019-02-08T12:28:30,206 INFO  [HiveServer2-Background-Pool: Thread-1134]: metastore.RetryingMetaStoreClient (:()) - RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=hive/ctr-e139-1542663976389-62755-01-000014.hwx.site@HWQE.HORTONWORKS.COM (auth:KERBEROS) retries=24 delay=5 lifetime=0
      2019-02-08T12:28:30,325 INFO  [org.apache.ranger.audit.queue.AuditBatchQueue1]: provider.BaseAuditHandler (:()) - Audit Status Log: name=hiveServer2.async.multi_dest.batch, finalDestination=hiveServer2.async.multi_dest.batch.solr, interval=01:00.002 minutes, events=2, succcessCount=1, totalEvents=56, totalSuccessCount=25
      2019-02-08T12:28:30,520 INFO  [HiveServer2-Background-Pool: Thread-1134]: common.FileUtils (FileUtils.java:mkdir(580)) - Creating directory if it doesn't exist: hdfs://mycluster1/warehouse/tablespace/managed/hive/nondefault.db/nondefault_table1/base_0000001
      2019-02-08T12:28:31,245 INFO  [HiveServer2-Background-Pool: Thread-1134]: ql.Driver (:()) - Starting task [Stage-11:MOVE] in serial mode
      2019-02-08T12:28:31,245 INFO  [HiveServer2-Background-Pool: Thread-1134]: metastore.HiveMetaStoreClient (:()) - Closed a connection to metastore, current connections: 3
      2019-02-08T12:28:31,246 INFO  [HiveServer2-Background-Pool: Thread-1134]: exec.Task (:()) - Loading data to table nondefault.nondefault_table2 from hdfs://mycluster1/warehouse/tablespace/managed/hive/nondefault.db/nondefault_table2/.hive-staging_hive_2019-02-08_12-28-23_810_7457138692783022870-3/-ext-10002
      2019-02-08T12:28:31,327 INFO  [HiveServer2-Background-Pool: Thread-1134]: metastore.HiveMetaStoreClient (:()) - Trying to connect to metastore with URI thrift://ctr-e139-1542663976389-62755-01-000014.hwx.site:9083
      2019-02-08T12:28:31,327 INFO  [HiveServer2-Background-Pool: Thread-1134]: metastore.HiveMetaStoreClient (:()) - HMSC::open(): Could not find delegation token. Creating KERBEROS-based thrift connection.
      2019-02-08T12:28:31,336 INFO  [HiveServer2-Background-Pool: Thread-1134]: metastore.HiveMetaStoreClient (:()) - Opened a connection to metastore, current connections: 4
      2019-02-08T12:28:31,337 INFO  [HiveServer2-Background-Pool: Thread-1134]: metastore.HiveMetaStoreClient (:()) - Connected to metastore.
      2019-02-08T12:28:31,337 INFO  [HiveServer2-Background-Pool: Thread-1134]: metastore.RetryingMetaStoreClient (:()) - RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=hive/ctr-e139-1542663976389-62755-01-000014.hwx.site@HWQE.HORTONWORKS.COM (auth:KERBEROS) retries=24 delay=5 lifetime=0
      2019-02-08T12:28:31,642 INFO  [HiveServer2-Background-Pool: Thread-1134]: common.FileUtils (FileUtils.java:mkdir(580)) - Creating directory if it doesn't exist: hdfs://mycluster1/warehouse/tablespace/managed/hive/nondefault.db/nondefault_table2/base_0000001
      
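      One way to check the pattern in a larger HiveServer2 log is to count MOVE stages and measure how long each metastore connect takes. The Java sketch below matches on message fragments copied from the excerpt ("Starting task [...:MOVE] in serial mode", "Trying to connect to metastore", "Opened a connection to metastore") and assumes every line starts with a timestamp in the excerpt's format. The class and method names are hypothetical, and the sample lines are abbreviated from the excerpt (logger and thread names shortened, URI elided).

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class MoveTaskLogCheck {
    // Message fragments copied from the HiveServer2 log excerpt in this issue.
    static final String MOVE_START = ":MOVE] in serial mode";
    static final String TRYING = "Trying to connect to metastore";
    static final String OPENED = "Opened a connection to metastore";

    // Timestamps look like 2019-02-08T12:28:30,174 (comma before milliseconds).
    static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss,SSS");

    static LocalDateTime timestamp(String line) {
        return LocalDateTime.parse(line.substring(0, 23), TS);
    }

    /** Number of MOVE stages started in the given log lines. */
    static int moveTasks(List<String> lines) {
        return (int) lines.stream().filter(l -> l.contains(MOVE_START)).count();
    }

    /** Milliseconds from each "Trying to connect" line to the next "Opened a connection" line. */
    static List<Long> connectLatenciesMs(List<String> lines) {
        List<Long> result = new ArrayList<>();
        LocalDateTime start = null;
        for (String line : lines) {
            if (line.contains(TRYING)) {
                start = timestamp(line);
            } else if (line.contains(OPENED) && start != null) {
                result.add(Duration.between(start, timestamp(line)).toMillis());
                start = null;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "2019-02-08T12:28:30,174 INFO ql.Driver - Starting task [Stage-5:MOVE] in serial mode",
            "2019-02-08T12:28:30,189 INFO HiveMetaStoreClient - Trying to connect to metastore with URI thrift://...",
            "2019-02-08T12:28:30,206 INFO HiveMetaStoreClient - Opened a connection to metastore, current connections: 4",
            "2019-02-08T12:28:31,245 INFO ql.Driver - Starting task [Stage-11:MOVE] in serial mode",
            "2019-02-08T12:28:31,245 INFO HiveMetaStoreClient - Closed a connection to metastore, current connections: 3",
            "2019-02-08T12:28:31,327 INFO HiveMetaStoreClient - Trying to connect to metastore with URI thrift://...",
            "2019-02-08T12:28:31,336 INFO HiveMetaStoreClient - Opened a connection to metastore, current connections: 4");

        System.out.println("MOVE tasks started: " + moveTasks(log));          // 2
        System.out.println("connect times (ms): " + connectLatenciesMs(log)); // [17, 9]
    }
}
```

      If the number of connects is close to the number of MOVE stages, each MoveTask is reconnecting, as in the excerpt.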

      Attachments

        1. HIVE-21206.03.patch (5 kB, Sankar Hariappan)
        2. HIVE-21206.02.patch (4 kB, Sankar Hariappan)
        3. HIVE-21206.01.patch (2 kB, Sankar Hariappan)


            People

              sankarh Sankar Hariappan
              sankarh Sankar Hariappan
              Votes: 0
              Watchers: 3


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m