Hadoop Common / HADOOP-18670

Spark application's dependency conflicts with Hadoop's dependency


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Invalid
    • Affects Version/s: 3.3.2
    • Fix Version/s: None
    • Component/s: common
    • Labels: None

    Description

      The issue I'm going to describe happens with the distribution: Spark 3.3.2 (git revision 5103e00c4c) built for Hadoop 3.3.2

      Based on this ticket, my understanding is that from Hadoop 3 onward there shouldn't be any conflict between Hadoop's dependencies and a Spark application's dependencies. However, I see a runtime failure in my Spark app because of such a conflict. Stack trace below:

      Caused by: java.lang.NoSuchMethodError: com.google.common.collect.Sets.newConcurrentHashSet()Ljava/util/Set;
          at org.apache.cassandra.config.Config.<init>(Config.java:102)
          at org.apache.cassandra.config.DatabaseDescriptor.clientInitialization(DatabaseDescriptor.java:288)
          at org.apache.cassandra.io.sstable.CQLSSTableWriter.<clinit>(CQLSSTableWriter.java:109)
          at com.<redacted>.spark.cassandra.bulkload.GameRecommendationsSSTWriter.init(GameRecommendationsSSTWriter.java:60)
          at com.<redacted>.spark.cassandra.bulkload.GameRecommendationsSSTWriter.<init>(GameRecommendationsSSTWriter.java:23)
          at com.<redacted>.spark.cassandra.bulkload.CassandraBulkLoad.execute(CassandraBulkLoad.java:93)
          at com.<redacted>.spark.cassandra.bulkload.CassandraBulkLoad.main(CassandraBulkLoad.java:60)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:498)
          at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:740)

      My Spark app has a transitive dependency on the Guava library: it depends on cassandra-all, which in turn depends on Guava. The guava-14.0.1 jar that ships in the "spark-3.3.2-bin-hadoop3/jars" directory is a decade old and doesn't have the Sets.newConcurrentHashSet() method (added in Guava 15.0). I was able to run the Spark app successfully by deleting that old Guava jar from the jars directory and including a recent Guava version in my project's pom.xml.
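      To confirm which Guava actually wins on the classpath, a small standalone check can help. The sketch below (a hypothetical GuavaCheck class, not part of the application above) reports where the JVM loaded Guava's Sets class from and whether the method from the stack trace exists:

      ```java
      import java.security.CodeSource;

      // Diagnostic sketch: report which Guava jar is on the classpath
      // and whether Sets.newConcurrentHashSet() (added in Guava 15.0) exists.
      public class GuavaCheck {
          public static void main(String[] args) {
              try {
                  Class<?> sets = Class.forName("com.google.common.collect.Sets");
                  CodeSource cs = sets.getProtectionDomain().getCodeSource();
                  System.out.println("Guava Sets loaded from: "
                      + (cs == null ? "unknown" : cs.getLocation()));
                  sets.getMethod("newConcurrentHashSet");
                  System.out.println("newConcurrentHashSet() is available");
              } catch (ClassNotFoundException e) {
                  System.out.println("Guava is not on the classpath");
              } catch (NoSuchMethodException e) {
                  System.out.println("Guava is too old: Sets.newConcurrentHashSet() is missing"
                      + " (added in Guava 15.0)");
              }
          }
      }
      ```

      Run with the same classpath the Spark driver/executors use (e.g. via spark-submit) so the result reflects what the application actually sees at runtime.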
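      Deleting jars from the Spark distribution is fragile, since every node and every redeploy must be patched. A common alternative (a sketch, assuming a Maven build; plugin version and the relocated package name are illustrative) is to relocate Guava inside the application jar with the maven-shade-plugin, so the app's Guava classes can never collide with the copy on Spark's classpath:

      ```xml
      <!-- pom.xml fragment (sketch): shade Guava into a private package -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.4.1</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals><goal>shade</goal></goals>
            <configuration>
              <relocations>
                <relocation>
                  <pattern>com.google.common</pattern>
                  <shadedPattern>myapp.shaded.com.google.common</shadedPattern>
                </relocation>
              </relocations>
            </configuration>
          </execution>
        </executions>
      </plugin>
      ```

      Another option is Spark's spark.driver.userClassPathFirst / spark.executor.userClassPathFirst settings, which prefer the application's jars over Spark's, though Spark's documentation marks them as experimental.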


      People

        Assignee: Unassigned
        Reporter: Kiran N (kirann)