Hive / HIVE-16865 Handle replication bootstrap of large databases / HIVE-16896

Move replication load work from the semantic analysis phase to the execution phase using a task

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      We want to avoid creating too many tasks in memory during the analysis phase while loading data. Currently we load all the files in the bootstrap dump location as a FileStatus[] and then iterate over it to load objects. We should instead move to

      org.apache.hadoop.fs.RemoteIterator<LocatedFileStatus> listFiles(Path f, boolean recursive)

      which would internally batch and return values.
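
      The difference between holding a full FileStatus[] and consuming a batched iterator can be sketched as follows. This is only an illustration and not code from the patch: PagedLister, BatchedIterator, InMemoryLister and the /dump/... paths are hypothetical stand-ins so the example runs without Hadoop on the classpath. In Hive the iterator would come from FileSystem.listFiles(Path, boolean), whose RemoteIterator methods hasNext() and next() can throw IOException.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class BatchedListingSketch {

    /** Hypothetical stand-in for a paged listing source; not a Hadoop API. */
    interface PagedLister {
        // Returns up to batchSize entries starting at offset; an empty list means no more entries.
        List<String> listBatch(int offset, int batchSize);
    }

    /** Iterator that keeps at most one batch in memory and fetches the next one on demand. */
    static final class BatchedIterator implements Iterator<String> {
        private final PagedLister lister;
        private final int batchSize;
        private List<String> current = new ArrayList<>();
        private int posInBatch = 0;
        private int nextOffset = 0;
        private boolean exhausted = false;

        BatchedIterator(PagedLister lister, int batchSize) {
            this.lister = lister;
            this.batchSize = batchSize;
        }

        @Override
        public boolean hasNext() {
            if (posInBatch < current.size()) {
                return true;
            }
            if (exhausted) {
                return false;
            }
            // Current batch is used up: fetch the next one.
            current = lister.listBatch(nextOffset, batchSize);
            nextOffset += current.size();
            posInBatch = 0;
            if (current.isEmpty()) {
                exhausted = true;
                return false;
            }
            return true;
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return current.get(posInBatch++);
        }
    }

    /** In-memory lister over a fixed list of paths that counts how many batches were fetched. */
    static final class InMemoryLister implements PagedLister {
        final List<String> all;
        int calls = 0;

        InMemoryLister(List<String> all) {
            this.all = all;
        }

        @Override
        public List<String> listBatch(int offset, int batchSize) {
            calls++;
            int end = Math.min(all.size(), offset + batchSize);
            return offset >= end ? new ArrayList<>() : new ArrayList<>(all.subList(offset, end));
        }
    }

    public static void main(String[] args) {
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            paths.add("/dump/db/table" + i + "/_metadata"); // hypothetical dump layout
        }
        InMemoryLister lister = new InMemoryLister(paths);
        Iterator<String> files = new BatchedIterator(lister, 4);
        int seen = 0;
        while (files.hasNext()) {
            files.next(); // a real loader would create load work for this entry here
            seen++;
        }
        System.out.println(seen + " files seen in " + lister.calls + " batch fetches");
    }
}
```

      The caller never holds more than one batch of entries, so memory use depends on the batch size rather than on the size of the dump.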

      Additionally, since we can't hand off partial tasks from the analysis phase to the execution phase, we are going to move the whole repl load functionality to the execution phase so that we can better control the creation and execution of tasks (not related to Hive Task; we may get rid of ReplCopyTask).
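
      One way to picture controlled task creation during execution is a driver that materializes only a bounded number of work units per round. This is a rough sketch under assumed names, not the design in the attached patches: runInRounds, maxTasksPerRound and the entry names are hypothetical, and plain Runnable objects stand in for whatever unit of work the load ends up using.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IncrementalLoadSketch {

    /**
     * Consumes dump entries in rounds. At most maxTasksPerRound work units exist
     * in memory at any time; each round is executed before the next one is built.
     * Returns the number of rounds that were run.
     */
    static int runInRounds(Iterator<String> dumpEntries, int maxTasksPerRound, List<String> loaded) {
        int rounds = 0;
        while (dumpEntries.hasNext()) {
            List<Runnable> round = new ArrayList<>();
            while (dumpEntries.hasNext() && round.size() < maxTasksPerRound) {
                String entry = dumpEntries.next();
                // Stand-in for real load work: record the entry as loaded.
                round.add(() -> loaded.add(entry));
            }
            for (Runnable task : round) {
                task.run();
            }
            rounds++;
        }
        return rounds;
    }

    public static void main(String[] args) {
        List<String> loaded = new ArrayList<>();
        Iterator<String> entries = Arrays.asList("t1", "t2", "t3", "t4", "t5").iterator();
        int rounds = runInRounds(entries, 2, loaded);
        System.out.println(loaded.size() + " entries loaded in " + rounds + " rounds");
    }
}
```

      Because the work units are built while the load is running, the cap applies to what is in memory at one time and not to the total size of the dump, which is not possible when every task has to be created up front during analysis.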

      An additional consideration at the end of this jira is whether we want to specifically do a multi-threaded load of the bootstrap dump.

      Attachments

        1. HIVE-16896.1.patch
          119 kB
          Anishek Agarwal
        2. HIVE-16896.2.patch
          120 kB
          Anishek Agarwal
        3. HIVE-16896.3.patch
          140 kB
          Anishek Agarwal

            People

              Assignee: Anishek Agarwal
              Reporter: Anishek Agarwal
              Votes: 0
              Watchers: 6
