  1. HCatalog
  2. HCATALOG-142

Reducing JobConf size used by HCatInputFormat

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • 0.2, 0.3
    • None
    • None

    Description

      Currently, the HCatInputFormat.setInput() call fetches information about every partition the job will read from and stores it in the JobConf. It is stored there because setInput() is called statically, while the information is needed later, when the MR framework calls getSplits(). Since getSplits() is a call on an object instantiated by the MR framework, nothing can be passed between the two calls through member variables, so we pass the information along through the JobConf.
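
      The handoff described above can be sketched as follows. This is a minimal illustration, not HCatalog code: a plain Map stands in for the JobConf, and the class name, the configuration key, and the comma-separated encoding are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the pattern described above; not the HCatalog API.
class ConfHandoffSketch {
    // Hypothetical configuration key; the real key name is not given in this issue.
    static final String PARTITIONS_KEY = "sketch.input.partitions";

    // Stand-in for the static setInput(): resolve every partition up front
    // and serialize the result into the job configuration.
    static void setInput(Map<String, String> conf, List<String> partitionLocations) {
        conf.put(PARTITIONS_KEY, String.join(",", partitionLocations));
    }

    // Stand-in for getSplits(): runs on a fresh instance created by the
    // framework, so the configuration is its only source of information.
    List<String> getSplits(Map<String, String> conf) {
        String serialized = conf.get(PARTITIONS_KEY);
        List<String> splits = new ArrayList<>();
        if (serialized == null || serialized.isEmpty()) {
            return splits;
        }
        for (String location : serialized.split(",")) {
            splits.add("split:" + location);
        }
        return splits;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        List<String> partitions = new ArrayList<>();
        for (int day = 1; day <= 1000; day++) {
            partitions.add("/warehouse/t/day=" + day);
        }
        setInput(conf, partitions);
        // The serialized value grows linearly with the partition count.
        System.out.println("conf value length: " + conf.get(PARTITIONS_KEY).length());
        System.out.println("splits: " + new ConfHandoffSketch().getSplits(conf).size());
    }
}
```

      In this sketch the configuration entry grows with every partition added, which is the size problem this issue is about.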

      We could instead move the metastore contact to getSplits() time. That defers the metastore access, but it breaks other things, such as being able to check up front whether the input can succeed, or checking the schema. A hybrid approach would address that too: contact the metastore during setInput() to get the schema and check whether input is possible, without fetching the partition objects to set in the JobConf at that time, and then contact the metastore again during getSplits() to populate the splits with information fetched from the partition objects.

      Issues with this approach still exist:
      a) Contacting the metastore more than once increases the number of requests it has to serve. (Technically, this is still only moving accesses around, so it should be okay; they are just separated a bit more.)
      b) Checks such as whether the partition objects are valid, or whether the specified storage drivers exist and can be instantiated, now happen at getSplits() time. Programs have a harder time handling errors, since the failure happens after they submit a job rather than in a pre-run check. (This should also be okay for most programs.)
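
      A rough sketch of the hybrid approach and of both issues, under stated assumptions: FakeMetastore, the configuration keys, and the empty-string "invalid partition" check are hypothetical stand-ins, and the metastore client is passed in directly here, whereas a real getSplits() would have to build its own client from the configuration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the hybrid approach; none of these names are HCatalog APIs.
class HybridInputSketch {
    // Minimal stand-in for a metastore client that counts how often it is contacted.
    static class FakeMetastore {
        int calls = 0;
        final Map<String, List<String>> partitionsByTable = new HashMap<>();

        String getSchema(String table) {
            calls++;
            if (!partitionsByTable.containsKey(table)) {
                throw new IllegalArgumentException("no such table: " + table);
            }
            return "schema-of-" + table;
        }

        List<String> listPartitions(String table) {
            calls++;
            return partitionsByTable.get(table);
        }
    }

    // Submit-time step: verify the table and record only small, fixed-size facts.
    static void setInput(Map<String, String> conf, FakeMetastore metastore, String table) {
        conf.put("sketch.table", table);
        conf.put("sketch.schema", metastore.getSchema(table));
    }

    // Split-time step: contact the metastore again for the partition objects.
    // Partition-level problems can only surface here, after job submission.
    static List<String> getSplits(Map<String, String> conf, FakeMetastore metastore) {
        List<String> splits = new ArrayList<>();
        for (String location : metastore.listPartitions(conf.get("sketch.table"))) {
            if (location.isEmpty()) {
                throw new IllegalStateException("invalid partition found at getSplits() time");
            }
            splits.add("split:" + location);
        }
        return splits;
    }

    public static void main(String[] args) {
        FakeMetastore metastore = new FakeMetastore();
        List<String> parts = new ArrayList<>();
        parts.add("/warehouse/t/day=1");
        parts.add("/warehouse/t/day=2");
        metastore.partitionsByTable.put("t", parts);

        Map<String, String> conf = new HashMap<>();
        setInput(conf, metastore, "t");
        System.out.println("conf entries after setInput: " + conf.size());
        System.out.println("splits: " + getSplits(conf, metastore).size());
        System.out.println("metastore calls: " + metastore.calls);
    }
}
```

      The configuration stays the same size regardless of partition count, at the cost of a second metastore contact (issue a) and of partition-level failures surfacing only in getSplits() (issue b).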

      Further discussion and thoughts on this issue are welcome.

            People

              Assignee: Unassigned
              Reporter: Sushanth Sowmyan