Hive / HIVE-21096

Remove unnecessary Spark dependency from HS2 process


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: HiveServer2, Spark
    • Labels: None

    Description

      When a HiveOnSpark job is kicked off, most of the work is done by the RemoteDriver, which is a separate process. There are a couple of smaller parts of the code where the HS2 process depends on Spark jars; these include, for example, receiving stats from the driver and putting together a Spark conf object, used mostly during communication with the RemoteDriver.

      We can limit the data types used for such communication so that we don't use (and serialize) types from the Spark codebase, and hence we can refactor our code so that Spark jars are only needed in the RemoteDriver process.
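      One way to sketch this decoupling, with hypothetical names (SparkJobStats is not an existing Hive class, just an illustration of the idea), is to have the RemoteDriver copy values out of Spark's counter types into a plain serializable DTO before sending them over the RPC channel, so HS2 never deserializes a Spark class:

      ```java
      import java.io.Serializable;
      import java.util.HashMap;
      import java.util.Map;

      /**
       * Hypothetical plain DTO for shipping job statistics from the
       * RemoteDriver back to HS2. It references no Spark classes, so
       * deserializing it on the HS2 side needs no Spark jars on the
       * classpath. The RemoteDriver (which does have Spark on its
       * classpath) would populate it from Spark's own counters.
       */
      class SparkJobStats implements Serializable {
        private static final long serialVersionUID = 1L;

        private final Map<String, Long> counters = new HashMap<>();

        void setCounter(String name, long value) {
          counters.put(name, value);
        }

        long getCounter(String name) {
          return counters.getOrDefault(name, 0L);
        }

        Map<String, Long> getCounters() {
          return new HashMap<>(counters); // defensive copy
        }
      }
      ```

      The same pattern would apply to the Spark conf object: HS2 could hold a plain Map of config strings and only materialize an actual SparkConf inside the RemoteDriver process.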

      I think this would be cleaner from a dependency point of view, and also less error-prone when users have to compile the classpath for their HS2 processes.
      (E.g., due to a change between Spark 2.2 and 2.4 we also had to include spark-unsafe*.jar, even though it was an internal change to Spark.)


          People

            Assignee: Ádám Szita (szita)
            Reporter: Ádám Szita (szita)
            Votes: 0
            Watchers: 2
