Spark / SPARK-19776

Is the JavaKafkaWordCount example correct for Spark version 2.1?

Details

    • Type: Question
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 2.1.0
    • Fix Version/s: None
    • Component/s: Examples, ML
    • Labels: None

    Description

      My question: is the example at https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaKafkaWordCount.java correct?

      I'm pretty new to both Spark and Java. I wanted to find an example of Spark Streaming using Java, streaming from Kafka. The JavaKafkaWordCount at https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaKafkaWordCount.java looked to be perfect.

      However, when I tried running it, I found a couple of issues that I needed to overcome.

      1. This line was unnecessary:

      StreamingExamples.setStreamingLogLevels();
      

      Having this line in there (along with its associated import) sent me looking for the spark-examples_2.10 dependency, which is of no real use to me.

      2. After running it, this line:

      JavaPairReceiverInputDStream<String, String> messages = KafkaUtils.createStream(jssc, args[0], args[1], topicMap);
      

      appeared to throw a logging-related error:

      Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/Logging
              at java.lang.ClassLoader.defineClass1(Native Method)
              at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
              at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
              at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
              at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
              at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
              at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
              at java.security.AccessController.doPrivileged(Native Method)
              at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
              at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
              at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
              at org.apache.spark.streaming.kafka.KafkaUtils$.createStream(KafkaUtils.scala:91
              at org.apache.spark.streaming.kafka.KafkaUtils$.createStream(KafkaUtils.scala:66
              at org.apache.spark.streaming.kafka.KafkaUtils$.createStream(KafkaUtils.scala:11
              at org.apache.spark.streaming.kafka.KafkaUtils.createStream(KafkaUtils.scala)
              at main.java.com.cm.JavaKafkaWordCount.main(JavaKafkaWordCount.java:72)
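
A NoClassDefFoundError like the one above generally means a class that was present at compile time is missing at run time. Spark 2.x no longer ships org.apache.spark.Logging as a public class (logging moved to org.apache.spark.internal.Logging), so a Kafka integration jar built against Spark 1.x would plausibly fail in exactly this way. One way to check which classes are actually visible is a small classpath probe. This is only a sketch: the class name ClasspathProbe and the method isOnClasspath are hypothetical names chosen for illustration, and the list of class names is an assumption about what is worth checking.

```java
public class ClasspathProbe {

    // Returns true if the named class can be located by this class's
    // class loader. The class is looked up without being initialized.
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class whose own dependencies are missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed candidates: the old and new Logging classes, and the
        // KafkaUtils classes from the older and the 0-10 integrations.
        String[] names = {
            "org.apache.spark.Logging",
            "org.apache.spark.internal.Logging",
            "org.apache.spark.streaming.kafka.KafkaUtils",
            "org.apache.spark.streaming.kafka010.KafkaUtils"
        };
        for (String name : names) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "missing"));
        }
    }
}
```

Run with the same classpath as the failing job. If org.apache.spark.streaming.kafka.KafkaUtils is found but org.apache.spark.Logging is missing, that would be consistent with a Kafka integration artifact that does not match the Spark version in use.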
      

      To get around this, I used the code sample in https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html to work out the right lines to see streaming from Kafka in action. Specifically, that sample calls createDirectStream instead of createStream.

      So is the example at https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaKafkaWordCount.java incorrect, or is there something I could have done differently to get it working?

People

    • Assignee: Unassigned
    • Reporter: Russell Abedin (russellacm)