[Hive] union all queries broken - all kinds of problems

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.3.0
    • Component/s: Query Processor
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note: HIVE-318. Fix union all queries. (Namit Jain via zshao)

    Description

      1. Map-only job: same input
      Hangs because the mapper tries to open the same file twice, and the Hadoop filesystem complains.

      Fix: initialize only once, keeping state at the Operator level to track this. Close should do the same.
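
      The fix for item 1 can be pictured with a minimal sketch. It assumes a hypothetical SketchOperator class (not Hive's actual Operator) whose initialize() and close() are guarded by per-operator flags, so an operator reachable from two parents is opened and closed only once. A real close would probably also wait until every parent has finished; the sketch shows only the guard.

```java
import java.util.ArrayList;
import java.util.List;

class InitOnceSketch {
    /** Hypothetical stand-in for a plan operator; not Hive's actual Operator class. */
    static class SketchOperator {
        final String name;
        final List<SketchOperator> children = new ArrayList<>();
        private boolean initialized = false; // state kept at the operator level
        private boolean closed = false;
        int initCount = 0;  // how many times real initialization work ran
        int closeCount = 0; // how many times real close work ran

        SketchOperator(String name) { this.name = name; }

        SketchOperator addChild(SketchOperator child) {
            children.add(child);
            return this;
        }

        void initialize() {
            if (initialized) return; // a second parent reaching us is a no-op
            initialized = true;
            initCount++;             // e.g. open the output file here, exactly once
            for (SketchOperator c : children) c.initialize();
        }

        void close() {
            if (closed) return;      // same guard for close
            closed = true;
            closeCount++;
            for (SketchOperator c : children) c.close();
        }
    }

    /** Two branches over the same input feed one UNION and one file sink. */
    static int[] runDemo() {
        SketchOperator sink = new SketchOperator("FS");
        SketchOperator union = new SketchOperator("UNION").addChild(sink);
        SketchOperator left = new SketchOperator("SEL-1").addChild(union);
        SketchOperator right = new SketchOperator("SEL-2").addChild(union);
        left.initialize();
        right.initialize(); // reaches UNION and FS again, but does nothing
        left.close();
        right.close();
        return new int[] { sink.initCount, sink.closeCount };
    }

    public static void main(String[] args) {
        int[] counts = runDemo();
        System.out.println("sink initialized " + counts[0]
                + " time(s), closed " + counts[1] + " time(s)");
    }
}
```

      Without the two flags, the sink in this sketch would be initialized twice, which corresponds to the double open described above.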

      2. Map-only job: different inputs
      Data is lost because of the rename.

      Fix: instead of renaming, move the files into the destination directory.
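
      The idea behind the fix for item 2 can be sketched with plain java.nio.file calls instead of the Hadoop FileSystem API. The directory names, file names, and the moveFilesInto helper are assumptions made for illustration; the issue does not say how the patch lays out its temporary output. Each branch's files are moved one by one into a shared destination directory, rather than the whole branch directory being renamed onto the destination. Each target name is prefixed with the branch directory name so that identically named files cannot collide.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

class MoveIntoDirSketch {
    /** Hypothetical helper: move every file in srcDir into destDir, keeping the others. */
    static void moveFilesInto(Path srcDir, Path destDir) throws IOException {
        Files.createDirectories(destDir); // no-op if the directory already exists
        try (DirectoryStream<Path> files = Files.newDirectoryStream(srcDir)) {
            for (Path f : files) {
                // Prefix with the branch directory name (an assumption of this
                // sketch) so "part-0" from two branches gets two distinct names.
                String target = srcDir.getFileName() + "_" + f.getFileName();
                // No REPLACE_EXISTING: a collision fails loudly instead of losing data.
                Files.move(f, destDir.resolve(target));
            }
        }
    }

    /** Builds two branch outputs, merges them, and returns the number of files kept. */
    static int demo() {
        try {
            Path root = Files.createTempDirectory("union-sketch");
            Path a = Files.createDirectories(root.resolve("branchA"));
            Path b = Files.createDirectories(root.resolve("branchB"));
            Files.write(a.resolve("part-0"), "rows from A\n".getBytes(StandardCharsets.UTF_8));
            Files.write(b.resolve("part-0"), "rows from B\n".getBytes(StandardCharsets.UTF_8));

            Path dest = root.resolve("out");
            moveFilesInto(a, dest);
            moveFilesInto(b, dest); // adds to dest rather than replacing it
            try (Stream<Path> merged = Files.list(dest)) {
                return (int) merged.count();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("files in destination: " + demo());
    }
}
```

      Both branches' output survives in the destination, whereas a directory-level rename of each branch onto the same target can leave only one branch's data.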

      3. Map-only job in subquery + ReduceSink: works currently

      4. Two variables, so 4 sub-cases:

      Number of sub-queries having map-reduce jobs (1 or 2)
      Operator after Union: ReduceSink (RS) or FileSink (FS)

      a. Number of sub-queries having map-reduce jobs: 1
      Operator after Union: RS

      Can be done in 2 MR jobs, but that is really difficult with the current infrastructure.
      For now, use 3 MR jobs, breaking the plan on top of UNION.
      Future optimization: move the operators between Union and RS to before the Union.

      b. Number of sub-queries having map-reduce jobs: 2
      Operator after Union: RS

      Needs 3 MR jobs, breaking the plan on top of UNION.
      Future optimization: move the operators between Union and RS to before the Union.

      c. Number of sub-queries having map-reduce jobs: 1
      Operator after Union: FS

      Can be done in 1 MR job, but that is really difficult with the current infrastructure.
      Can easily be done with 2 MR jobs by removing UNION and cloning the operators between Union and FS.
      For now, use 3 MR jobs, breaking the plan on top of UNION.
      Follow-up optimization: handle this case with 2 MR jobs.

      d. Number of sub-queries having map-reduce jobs: 2
      Operator after Union: FS

      Can easily be done with 2 MR jobs by removing UNION and cloning the operators between Union and FS.
      For now, use 3 MR jobs, breaking the plan on top of UNION.
      Follow-up optimization: handle this case with 2 MR jobs.
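
      The four sub-cases of item 4 can be summarized as a small lookup. This is only a sketch that tabulates the job counts stated in the description. The class, enum, and method names are hypothetical and do not correspond to anything in the Hive planner.

```java
class UnionPlanSketch {
    /** Operator that follows the UNION: ReduceSink or FileSink. */
    enum AfterUnion { RS, FS }

    private static void check(int mrSubqueries) {
        if (mrSubqueries != 1 && mrSubqueries != 2) {
            throw new IllegalArgumentException("expected 1 or 2 map-reduce sub-queries");
        }
    }

    /** Jobs used for now: every sub-case breaks the plan on top of UNION. */
    static int jobsForNow(int mrSubqueries, AfterUnion op) {
        check(mrSubqueries);
        return 3;
    }

    /** Fewest jobs the description says are possible, however hard to implement. */
    static int fewestJobsPossible(int mrSubqueries, AfterUnion op) {
        check(mrSubqueries);
        if (op == AfterUnion.RS) {
            return mrSubqueries == 1 ? 2 : 3; // sub-cases a and b
        }
        return mrSubqueries == 1 ? 1 : 2;     // sub-cases c and d
    }

    public static void main(String[] args) {
        for (AfterUnion op : AfterUnion.values()) {
            for (int n = 1; n <= 2; n++) {
                System.out.println(n + " MR sub-query(ies), " + op + " after UNION: "
                        + jobsForNow(n, op) + " jobs for now, "
                        + fewestJobsPossible(n, op) + " possible");
            }
        }
    }
}
```

      Only sub-case b already matches its stated minimum; the other three are where the follow-up optimizations would save jobs.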

      Attachments

        1. hive.318.patch
          165 kB
          Namit Jain
        2. hive.318.2.patch
          166 kB
          Namit Jain
        3. hive.318.3.patch
          165 kB
          Namit Jain
        4. hive.318.4.patch
          162 kB
          Namit Jain
        5. hive.318.5.patch
          167 kB
          Namit Jain
        6. hive.318.6.patch
          3 kB
          Namit Jain
        7. hive.318.7.patch
          168 kB
          Namit Jain

            People

              namit Namit Jain