Spark / SPARK-12557

Spark 1.5.1 is unable to read S3 file system (Java exception - s3a.S3AFileSystem not found)


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.5.1
    • Fix Version/s: None
    • Component/s: EC2, PySpark
    • Labels: None
    • Environment: AWS (EC2) instances + S3 + Hadoop CDH

    Description

      Hello Technical Support team,

      This is a critical production issue we are facing on Spark version 1.5.1: it throws a Java runtime exception, "org.apache.hadoop.fs.s3a.S3AFileSystem not found", although the same query works perfectly on Spark version 1.3.1. Is this a known issue in Spark 1.5.1? I have opened a case with Cloudera CDH, but they do not fully support this yet. We use spark-shell (Scala) heavily these days, so end users would prefer this environment to execute their HQL, and most of our datasets live in an S3 bucket. Note that there is no complaint when the dataset is read from HDFS (Hadoop FS), so the problem seems to be related to my Spark configuration or something similar. Please help identify the root cause and a solution. The following technical information is provided for review:

      scala> val rdf1 = sqlContext.sql("Select * from ntcom.nc_currency_dim").collect()
      rdf1: Array[org.apache.spark.sql.Row] = Array([-1,UNK,UNKNOWN,UNKNOWN,0.74,1.35,1.0,1.0,DBUDAL,11-JUN-2014 20:36:41,JHOSLE,2008-03-26 00:00:00.0,105.0,6.1,2014-06-11 20:36:41,2015-07-08 22:10:02,N], [-1,UNK,UNKNOWN,UNKNOWN,1.0,1.0,1.0,1.0,PDHAVA,08-JUL-2015 22:10:03,JHOSLE,2008-03-26 00:00:00.0,null,null,2015-07-08 22:10:03,3000-01-01 00:00:00,Y], [1,DKK,Danish Krone,Danish Krone,0.13,7.46,0.180965147453,5.53,DBUDAL,11-JUN-2014 20:36:41,NCBATCH,2007-01-16 00:00:00.0,19.0,1.1,2014-06-11 20:36:41,2015-07-08 22:10:02,N], [1,DKK,Danish Krone,Danish Krone,0.134048257372654,7.46,0.134048257372654,7.46,PDHAVA,08-JUL-2015 22:10:03,NCBATCH,2007-01-16 00:00:00.0,null,null,2015-07-08 22:10:03,3000-01-01 00:00:00,Y], [2,EUR,Euro,EMU currency (Euro),1.0,1.0,1.35,0.74,DBUDAL,11-JUN-2014 20:36:41,NCBA...

      rdf1 = sqlContext.sql("Select * from dev_ntcom.nc_currency_dim").collect()
      java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
      at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
      at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2578)
      at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getTableOption$1$$anonfun$2.apply(ClientWrapper.scala:303)
      at scala.Option.map(Option.scala:145)
      Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
      at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1980)
      at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2072)
      ... 120 more
      15/11/05 20:31:01 ERROR log: error in initSerDe: org.apache.hadoop.hive.serde2.SerDeException Encountered exception determining schema. Returning signal schema to indicate problem: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
      org.apache.hadoop.hive.serde2.SerDeException: Encountered exception determining schema. Returning signal schema to indicate problem: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs
      at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:524)
      at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:391)
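      For reference, this ClassNotFoundException typically means the hadoop-aws module (which contains org.apache.hadoop.fs.s3a.S3AFileSystem) and the AWS SDK jar it depends on are not on Spark's classpath. A minimal sketch of the common workaround; the jar versions and paths below are illustrative and must be matched to your Hadoop/CDH distribution:

      ```shell
      # Make the S3A filesystem classes visible to spark-shell by putting the
      # hadoop-aws module and the AWS SDK on the classpath. Versions/paths are
      # examples only -- use the jars shipped with your Hadoop distribution.
      spark-shell --jars /path/to/hadoop-aws-2.6.0.jar,/path/to/aws-java-sdk-1.7.4.jar

      # Alternatively, the implementation class can be registered explicitly in
      # core-site.xml (or via the Hadoop configuration Spark picks up):
      #   fs.s3a.impl = org.apache.hadoop.fs.s3a.S3AFileSystem
      # but the class still has to be present on the classpath for this to work.
      ```

      Since Spark 1.3.1 worked while 1.5.1 does not, comparing the effective classpath (e.g. spark.driver.extraClassPath) between the two installations may show which jars went missing.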

    People

        Assignee: Unassigned
        Reporter: Chiragkumar (chiragkumarp)
