Spark / SPARK-12557

Spark 1.5.1 is unable to read S3 file system (Java exception - s3a.S3AFileSystem not found)


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.5.1
    • Fix Version/s: None
    • Component/s: EC2, PySpark
    • Labels: None
    • Environment: AWS (EC2) instances + S3 + Hadoop CDH

    Description

      Hello Technical Support team,

      This is a critical production issue we are facing on Spark version 1.5.1: it throws a Java runtime exception, "org.apache.hadoop.fs.s3a.S3AFileSystem not found", although the same query works perfectly on Spark version 1.3.1. Is this a known issue in Spark 1.5.1? I have opened a case with Cloudera CDH, but they do not fully support this yet. We use spark-shell (Scala) heavily these days, so end users would prefer this environment to execute their HQL, and most of our datasets live in an S3 bucket. Note that there is no complaint when the dataset is read from HDFS (Hadoop FS), so the problem seems to be related to my Spark configuration or something similar. Please help identify the root cause and a solution. The following technical information is provided for review:

      scala> val rdf1 = sqlContext.sql("Select * from ntcom.nc_currency_dim").collect()
      rdf1: Array[org.apache.spark.sql.Row] = Array([-1,UNK,UNKNOWN,UNKNOWN,0.74,1.35,1.0,1.0,DBUDAL,11-JUN-2014 20:36:41,JHOSLE,2008-03-26 00:00:00.0,105.0,6.1,2014-06-11 20:36:41,2015-07-08 22:10:02,N], [-1,UNK,UNKNOWN,UNKNOWN,1.0,1.0,1.0,1.0,PDHAVA,08-JUL-2015 22:10:03,JHOSLE,2008-03-26 00:00:00.0,null,null,2015-07-08 22:10:03,3000-01-01 00:00:00,Y], [1,DKK,Danish Krone,Danish Krone,0.13,7.46,0.180965147453,5.53,DBUDAL,11-JUN-2014 20:36:41,NCBATCH,2007-01-16 00:00:00.0,19.0,1.1,2014-06-11 20:36:41,2015-07-08 22:10:02,N], [1,DKK,Danish Krone,Danish Krone,0.134048257372654,7.46,0.134048257372654,7.46,PDHAVA,08-JUL-2015 22:10:03,NCBATCH,2007-01-16 00:00:00.0,null,null,2015-07-08 22:10:03,3000-01-01 00:00:00,Y], [2,EUR,Euro,EMU currency (Euro),1.0,1.0,1.35,0.74,DBUDAL,11-JUN-2014 20:36:41,NCBA...

      rdf1 = sqlContext.sql("Select * from dev_ntcom.nc_currency_dim").collect()
      java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
      at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
      at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2578)
      at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getTableOption$1$$anonfun$2.apply(ClientWrapper.scala:303)
      at scala.Option.map(Option.scala:145)
      Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
      at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1980)
      at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2072)
      ... 120 more
      15/11/05 20:31:01 ERROR log: error in initSerDe: org.apache.hadoop.hive.serde2.SerDeException Encountered exception determining schema. Returning signal schema to indicate problem: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
      org.apache.hadoop.hive.serde2.SerDeException: Encountered exception determining schema. Returning signal schema to indicate problem: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs
      at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:524)
      at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:391)
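      For reference, this ClassNotFoundException typically means the hadoop-aws module (which contains org.apache.hadoop.fs.s3a.S3AFileSystem) and the AWS SDK jar it depends on are not on Spark's classpath. A minimal sketch of the common workaround; the jar versions and paths below are illustrative and must be matched to your Hadoop/CDH distribution:

      ```shell
      # Make the S3A filesystem classes visible to spark-shell by putting the
      # hadoop-aws module and the AWS SDK on the classpath. Versions/paths are
      # examples only -- use the jars shipped with your Hadoop distribution.
      spark-shell --jars /path/to/hadoop-aws-2.6.0.jar,/path/to/aws-java-sdk-1.7.4.jar

      # Alternatively, the implementation class can be registered explicitly in
      # core-site.xml (or via the Hadoop configuration Spark picks up):
      #   fs.s3a.impl = org.apache.hadoop.fs.s3a.S3AFileSystem
      # but the class still has to be present on the classpath for this to work.
      ```

      Since Spark 1.3.1 worked while 1.5.1 does not, comparing the effective classpath (e.g. spark.driver.extraClassPath) between the two installations may show which jars went missing.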

    People

        Assignee: Unassigned
        Reporter: Chiragkumar (chiragkumarp)
