Zeppelin / ZEPPELIN-4971

XGBoost4J-Spark fails at StringIndexer


Details

    Description

      I'm trying to follow the XGBoost4J-Spark tutorial (https://xgboost.readthedocs.io/en/latest/jvm/xgboost4j_spark_tutorial.html) on a Spark 3.0.0 cluster in Apache Zeppelin 0.8.2.

      However, when I load the dependencies:

      export SPARK_SUBMIT_OPTIONS="--packages ml.dmlc:xgboost4j-spark_2.12:1.0.0"
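
      For reference, a --packages coordinate has the form groupId:artifactId:version, and the artifact's _2.12 suffix has to match the Scala binary version Spark was built with (2.12 for Spark 3.0.0). The sketch below is one way to sanity-check a coordinate string from a notebook paragraph. The PackageCoordinate object, its method names, and the 1.0.0 version in the usage note are illustrative assumptions, not part of Spark, Zeppelin, or XGBoost.

```scala
object PackageCoordinate {
  // One Maven-style coordinate: groupId:artifactId:version.
  final case class Coordinate(groupId: String, artifactId: String, version: String) {
    // Scala binary version encoded as an "_2.12"-style suffix, if present.
    def scalaSuffix: Option[String] = {
      val i = artifactId.lastIndexOf('_')
      if (i >= 0 && i < artifactId.length - 1) Some(artifactId.substring(i + 1))
      else None
    }
  }

  // Accepts exactly three non-empty, colon-separated parts.
  def parse(s: String): Either[String, Coordinate] =
    s.split(":", -1) match {
      case Array(g, a, v) if g.nonEmpty && a.nonEmpty && v.nonEmpty =>
        Right(Coordinate(g, a, v))
      case _ =>
        Left(s"expected groupId:artifactId:version, got '$s'")
    }

  // Binary version of the running Scala, e.g. "2.12" from "2.12.10".
  def runningScalaBinaryVersion: String =
    scala.util.Properties.versionNumberString.split('.').take(2).mkString(".")
}
```

      Calling PackageCoordinate.parse("ml.dmlc:xgboost4j-spark_2.12:1.0.0").map(_.scalaSuffix) and comparing the result with PackageCoordinate.runningScalaBinaryVersion would show whether the artifact suffix and the interpreter's Scala version agree.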
      

      I get the error shown below when I fit this StringIndexer:

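      import org.apache.spark.ml.feature.StringIndexer

      // Index the string label column "class" into a numeric "classIndex" column.
      // rawInput is the DataFrame loaded earlier in the tutorial.
      val stringIndexer = new StringIndexer().
        setInputCol("class").
        setOutputCol("classIndex").
        fit(rawInput)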
      
       
      java.lang.NoSuchMethodError: com.esotericsoftware.kryo.Kryo.setInstantiatorStrategy(Lorg/objenesis/strategy/InstantiatorStrategy;)V
        at com.twitter.chill.KryoBase.setInstantiatorStrategy(KryoBase.scala:99)
        at com.twitter.chill.EmptyScalaKryoInstantiator.newKryo(ScalaKryoInstantiator.scala:62)
        at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:131)
        at org.apache.spark.serializer.KryoSerializer$$anon$1.create(KryoSerializer.scala:102)
        at com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.borrow(KryoPoolQueueImpl.java:48)
        at org.apache.spark.serializer.KryoSerializer$PoolWrapper.borrow(KryoSerializer.scala:109)
        at org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:336)
        at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:389)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.apply(Unknown Source)
        at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:184)
        at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:175)
        at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
        at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
        at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
        at scala.collection.TraversableLike.map(TraversableLike.scala:237)
        at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
        at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
        at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3625)
        at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2938)
        at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3616)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
        at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3614)
        at org.apache.spark.sql.Dataset.collect(Dataset.scala:2938)
        at org.apache.spark.ml.feature.StringIndexer.countByValue(StringIndexer.scala:204)
        at org.apache.spark.ml.feature.StringIndexer.sortByFreq(StringIndexer.scala:212)
        at org.apache.spark.ml.feature.StringIndexer.fit(StringIndexer.scala:241)
        ... 46 elided
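
      A NoSuchMethodError like this generally means the calling class (here com.twitter.chill.KryoBase) was compiled against a different version of the target class (com.esotericsoftware.kryo.Kryo) than the one loaded at runtime. I have not confirmed which jar is responsible. The sketch below is one way to check from a notebook paragraph: it prints the jar each class was loaded from and the setInstantiatorStrategy signatures the loaded Kryo class actually exposes. KryoClasspathCheck and its method names are hypothetical helpers that use only JDK reflection. They are not part of Spark, Zeppelin, or Kryo.

```scala
import scala.util.Try

object KryoClasspathCheck {
  // Loads a class without running its static initializers; None if it is not found.
  private def load(className: String): Option[Class[_]] =
    Try(Class.forName(className, false, getClass.getClassLoader)).toOption

  // Jar or directory the class was loaded from, if the class can be found at all.
  def jarOf(className: String): Option[String] =
    load(className).map { cls =>
      Option(cls.getProtectionDomain.getCodeSource)
        .flatMap(cs => Option(cs.getLocation))
        .map(_.toString)
        .getOrElse("(bootstrap class loader, no code source)")
    }

  // Public overloads of a method, rendered as "name(paramTypes): returnType".
  def signaturesOf(className: String, methodName: String): Seq[String] =
    load(className).toSeq.flatMap { cls =>
      cls.getMethods.toSeq.filter(_.getName == methodName).map { m =>
        val params = m.getParameterTypes.map(_.getName).mkString(", ")
        s"${m.getName}($params): ${m.getReturnType.getName}"
      }
    }

  def main(args: Array[String]): Unit = {
    // Class names taken from the first frames of the stack trace.
    Seq(
      "com.esotericsoftware.kryo.Kryo",
      "com.twitter.chill.KryoBase",
      "org.objenesis.strategy.InstantiatorStrategy"
    ).foreach { name =>
      println(s"$name -> ${jarOf(name).getOrElse("not on classpath")}")
    }
    signaturesOf("com.esotericsoftware.kryo.Kryo", "setInstantiatorStrategy")
      .foreach(println)
  }
}
```

      If Kryo turns out to be loaded from a jar other than the one shipped with Spark, or if no listed signature takes org.objenesis.strategy.InstantiatorStrategy and returns void, that would point to a conflicting Kryo on the interpreter classpath.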
      

       

          People

            Assignee: Unassigned
            Reporter: Vinh Tran (vinhdiesal)