Description
When trying to create a Dataset from an RDD of Person (all using the Java API), I got the error "java.lang.UnsupportedOperationException: no encoder found for example_java.dataset.Person". The error is not very helpful, and no other log output pointed to the cause.
It turned out that the root cause was that my Person class had neither a default constructor nor setter methods.
This JIRA is for implementing a more useful error message to help Java developers who are trying out the Dataset API for the first time.
The full stack trace is:
Exception in thread "main" java.lang.UnsupportedOperationException: no encoder found for example_java.common.Person
    at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$extractorFor(JavaTypeInference.scala:403)
    at org.apache.spark.sql.catalyst.JavaTypeInference$.extractorsFor(JavaTypeInference.scala:314)
    at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:75)
    at org.apache.spark.sql.Encoders$.bean(Encoder.scala:176)
    at org.apache.spark.sql.Encoders.bean(Encoder.scala)
NOTE that if I do provide EITHER the default constructor OR the setters, but not both, then I get a stack trace with much more useful information, but omitting BOTH causes this issue.
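To illustrate the kind of check that could back a clearer message, here is a minimal sketch using the standard java.beans introspection API. BeanCheck, describeProblems and ReadOnlyPerson are hypothetical names made up for this example; this is not Spark's code, and it assumes the encoder only needs a public no-argument constructor plus a getter and setter per property.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Spark): lists reasons a class may not be usable as a JavaBean.
final class BeanCheck {
    private BeanCheck() { }

    static List<String> describeProblems(Class<?> cls) {
        List<String> problems = new ArrayList<>();

        // A bean needs a public no-argument constructor so instances can be created reflectively.
        try {
            cls.getConstructor();
        } catch (NoSuchMethodException e) {
            problems.add(cls.getName() + " has no public no-argument constructor");
        }

        try {
            // Stop at Object.class so the inherited "class" property is not reported.
            BeanInfo info = Introspector.getBeanInfo(cls, Object.class);
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                if (pd.getReadMethod() == null) {
                    problems.add("property '" + pd.getName() + "' has no public getter");
                }
                if (pd.getWriteMethod() == null) {
                    problems.add("property '" + pd.getName() + "' has no public setter");
                }
            }
        } catch (IntrospectionException e) {
            problems.add("could not introspect " + cls.getName() + ": " + e.getMessage());
        }
        return problems;
    }
}

// Hypothetical example class: one-argument constructor and a getter only.
class ReadOnlyPerson {
    private final String first;

    public ReadOnlyPerson(String first) {
        this.first = first;
    }

    public String getFirst() {
        return first;
    }
}
```

Calling BeanCheck.describeProblems(ReadOnlyPerson.class) reports both the missing no-argument constructor and the missing setter for 'first'. A message built from such a list would name the class and the specific gaps, which is more than "no encoder found" conveys.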
The original source is below.
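import java.util.List;

import com.google.common.collect.ImmutableList;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SQLContext;

public class JavaDatasetExample {
    public static void main(String[] args) throws Exception {
        SparkConf sparkConf = new SparkConf()
            .setAppName("Example")
            .setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);
        SQLContext sqlContext = new SQLContext(sc);

        List<Person> people = ImmutableList.of(
            new Person("Joe", "Bloggs", 21, "NY")
        );

        // Encoders.bean is the call that throws "no encoder found" (see the stack trace).
        Dataset<Person> dataset = sqlContext.createDataset(people, Encoders.bean(Person.class));
    }
}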
class Person implements Serializable {
    String first;
    String last;
    int age;
    String state;

    public Person() { }

    public Person(String first, String last, int age, String state) {
        this.first = first;
        this.last = last;
        this.age = age;
        this.state = state;
    }

    public String getFirst() { return first; }

    public String getLast() { return last; }

    public int getAge() { return age; }

    public String getState() { return state; }
}
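For comparison, here is a sketch of how the Person class above might look with both pieces in place: a public no-argument constructor and a setter for every property. It assumes nothing beyond the two requirements identified in this report and has not been checked against any other constraints the bean encoder may impose.

```java
import java.io.Serializable;

// Sketch of a JavaBean-style Person: public no-argument constructor
// plus a getter and a setter for each property.
class Person implements Serializable {
    private String first;
    private String last;
    private int age;
    private String state;

    // No-argument constructor, so an instance can be created before its fields are set.
    public Person() { }

    public Person(String first, String last, int age, String state) {
        this.first = first;
        this.last = last;
        this.age = age;
        this.state = state;
    }

    public String getFirst() { return first; }
    public void setFirst(String first) { this.first = first; }

    public String getLast() { return last; }
    public void setLast(String last) { this.last = last; }

    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }

    public String getState() { return state; }
    public void setState(String state) { this.state = state; }
}
```

With this shape, the same Encoders.bean(Person.class) call from the example would see four readable and writable properties: first, last, age and state.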