ZEPPELIN-5914

spark.sparkContext.textFile(...) is not working


Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 0.11.0
    • Fix Version/s: None
    • Component/s: spark, zeppelin-interpreter
    • Labels: None

    Description

      When using Spark 3.1.2 with Hadoop 3.2.0, spark.sparkContext.textFile(...) throws a NullPointerException (NPE):

       
      java.lang.NullPointerException
      at org.apache.hadoop.mapred.TextInputFormat.isSplitable(TextInputFormat.java:49)
      at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:363)
      at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:205)
      at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300)
      at scala.Option.getOrElse(Option.scala:189)
      at org.apache.spark.rdd.RDD.partitions(RDD.scala:296)
      at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
      at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300)
      at scala.Option.getOrElse(Option.scala:189)
      at org.apache.spark.rdd.RDD.partitions(RDD.scala:296)
      at org.apache.spark.SparkContext.runJob(SparkContext.scala:2261)
      at org.apache.spark.rdd.RDD.count(RDD.scala:1253)
      ... 114 elided
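
      A minimal Zeppelin paragraph that reproduces the failure is sketched below; the file path /tmp/sample.txt is only an illustration and not taken from the report:

      %spark
      // Any RDD action over textFile() fails as soon as partitions are computed,
      // matching the count() call at the bottom of the stack trace above.
      val rdd = spark.sparkContext.textFile("/tmp/sample.txt")
      rdd.count()  // throws java.lang.NullPointerException in TextInputFormat.isSplitable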
       

      Built Zeppelin from the master branch via:
      mvn clean package -DskipTests -Dcheckstyle.skip -Pbuild-distr -Pspark-3.1 -Phadoop3 -Pspark-scala-2.12 -Pscala-2.12 -Pinclude-hadoop -Pweb-dist
       

      This works in spark-shell (Spark 3.1.2 with Hadoop 3.2.0) but not in Zeppelin's Spark interpreter.
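
      For comparison, the same call issued from spark-shell (again a sketch with a hypothetical path):

      // Launched outside Zeppelin, e.g. via $SPARK_HOME/bin/spark-shell
      // on the same Spark 3.1.2 / Hadoop 3.2.0 setup; here the action
      // completes without hitting the NPE in isSplitable, per the report.
      spark.sparkContext.textFile("/tmp/sample.txt").count()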

       

      This is blocking us from using Spark 3.1.2 in production.

       

      Attachments

      Activity

      People

            Assignee: Unassigned
            Reporter: amandeep.kaur (Amandeep)
            Votes: 0
            Watchers: 2
