HIVE-7644: Hive custom UDF cannot be used in the join condition (ON)

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.12.0
    • Fix Version/s: None
    • Component/s: Clients
    • Labels: None

    Description

      console:
      hive> ADD JAR xxxxx;
      Added xxxxx to class path
      Added resource: xxxxx
      hive> create temporary function func1 as 'xxx';
      OK
      Time taken: 0.009 seconds
      hive> list jars;
      xxx.jar
      hive> select /*+ MAPJOIN(certain column1) */
      > *
      > from tb1
      > join tb2 on tb1.column2 = func1(tb2.column3)
      > ;
      Total MapReduce jobs = 1
      Execution log at: /tmp/[username]/[username]_20140807173838_d673690f-c452-4ebb-bf53-9d663c49d04e.log
      2014-08-07 05:38:05 Starting to launch local task to process map join; maximum memory = 2027290624
      Execution failed with exit status: 2
      Obtaining error information

      Task failed!
      Task ID:
      Stage-4

      Logs:

      /tmp/[username]/hive.log
      FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask

      --------------------------------------------------------------------------------------
      I then checked the log at /tmp/[username]/[username]_20140807173838_d673690f-c452-4ebb-bf53-9d663c49d04e.log, which contains:
      2014-08-07 16:46:59,105 INFO mr.MapredLocalTask (SessionState.java:printInfo(417)) - 2014-08-07 04:46:59 Starting to launch local task to process map join; maximum memory = 2027290624
      2014-08-07 16:46:59,114 INFO mr.MapredLocalTask (MapredLocalTask.java:initializeOperators(389)) - fetchoperator for tmp_compete created
      2014-08-07 16:46:59,196 INFO exec.TableScanOperator (Operator.java:initialize(338)) - Initializing Self 0 TS
      2014-08-07 16:46:59,197 INFO exec.TableScanOperator (Operator.java:initializeChildren(403)) - Operator 0 TS initialized
      2014-08-07 16:46:59,197 INFO exec.TableScanOperator (Operator.java:initializeChildren(407)) - Initializing children of 0 TS
      2014-08-07 16:46:59,197 INFO exec.HashTableSinkOperator (Operator.java:initialize(442)) - Initializing child 1 HASHTABLESINK
      2014-08-07 16:46:59,197 INFO exec.HashTableSinkOperator (Operator.java:initialize(338)) - Initializing Self 1 HASHTABLESINK
      2014-08-07 16:46:59,198 INFO mapjoin.MapJoinMemoryExhaustionHandler (MapJoinMemoryExhaustionHandler.java:<init>(72)) - JVM Max Heap Size: 2027290624
      2014-08-07 16:46:59,222 ERROR mr.MapredLocalTask (MapredLocalTask.java:executeFromChildJVM(324)) - Hive Runtime Error: Map local work failed
      org.apache.hadoop.hive.ql.exec.UDFArgumentException: The UDF implementation class 'xxx' is not present in the class path
      at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.initialize(GenericUDFBridge.java:142)
      at org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:116)
      at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:127)
      at org.apache.hadoop.hive.ql.exec.JoinUtil.getObjectInspectorsFromEvaluators(JoinUtil.java:66)
      at org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.initializeOp(HashTableSinkOperator.java:140)
      at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:377)
      at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:453)
      at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:409)
      at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:188)
      at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:377)
      at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.initializeOperators(MapredLocalTask.java:408)
      at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:302)
      at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:728)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:606)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

      ------------------------------------------------------------------------------------
      I have confirmed that this is not an authorization problem. When the UDF is used outside the join condition, for example in 'select udf(column_name)' or 'where udf(column_name)', it works correctly.
      Has anyone else encountered this problem?
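
      The contrast described above can be written out with the names from the console session (tb1, tb2, func1, column2, column3). The "is not null" predicate is only an illustrative stand-in for 'where udf(column_name)'. The last statement is an untested diagnostic sketch, not a confirmed fix: because the stack trace fails inside MapredLocalTask, it assumes the hive.auto.convert.join and hive.ignore.mapjoin.hint properties can be used to force a common (reduce-side) join so that the local map-join task is skipped.

```sql
-- Reported as working: UDF in the select list or the where clause.
select func1(column3) from tb2;
select * from tb2 where func1(column3) is not null;  -- illustrative predicate

-- Reported as failing: UDF inside the join condition of a map join.
-- The hint argument is copied as given in the report.
select /*+ MAPJOIN(certain column1) */ *
from tb1
join tb2 on tb1.column2 = func1(tb2.column3);

-- Diagnostic sketch (assumption, not verified on 0.12.0):
-- disable map-join conversion and ignore the hint, then rerun the join.
set hive.auto.convert.join=false;
set hive.ignore.mapjoin.hint=true;
select *
from tb1
join tb2 on tb1.column2 = func1(tb2.column3);
```

      If the last query succeeds while the hinted one fails, that would suggest the problem is limited to the class path of the child JVM that runs the local map-join task, rather than to the UDF itself.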

            People

              Assignee: Unassigned
              Reporter: Hayok (Hayok_bee)