Spark / SPARK-12932

Bad error message when trying to create a Dataset from an RDD of Java objects that are not bean-compliant


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: 1.6.1, 2.0.0
    • Component/s: Java API
    • Labels: None
    • Environment: Ubuntu 15.10 / Java 8

    Description

      When trying to create a Dataset from an RDD of Person (all using the Java API), I got the error "java.lang.UnsupportedOperationException: no encoder found for example_java.dataset.Person". This error is not very helpful, and no other logging information was available to help troubleshoot the problem.

      The root cause turned out to be that my Person class had neither a default constructor nor setter methods.

      This JIRA is for implementing a more useful error message to help Java developers who are trying out the Dataset API for the first time.

      The full stack trace is:

      Exception in thread "main" java.lang.UnsupportedOperationException: no encoder found for example_java.common.Person
      	at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$extractorFor(JavaTypeInference.scala:403)
      	at org.apache.spark.sql.catalyst.JavaTypeInference$.extractorsFor(JavaTypeInference.scala:314)
      	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:75)
      	at org.apache.spark.sql.Encoders$.bean(Encoder.scala:176)
      	at org.apache.spark.sql.Encoders.bean(Encoder.scala)
      

      NOTE that if I provide EITHER the default constructor OR the setters, but not both, then I get a stack trace with much more useful information; omitting BOTH is what causes this issue.

      The original source is below.

      Example.java
      import com.google.common.collect.ImmutableList;

      import org.apache.spark.SparkConf;
      import org.apache.spark.api.java.JavaSparkContext;
      import org.apache.spark.sql.Dataset;
      import org.apache.spark.sql.Encoders;
      import org.apache.spark.sql.SQLContext;

      import java.util.List;

      public class JavaDatasetExample {

          public static void main(String[] args) throws Exception {

              SparkConf sparkConf = new SparkConf()
                      .setAppName("Example")
                      .setMaster("local[*]");

              JavaSparkContext sc = new JavaSparkContext(sparkConf);

              SQLContext sqlContext = new SQLContext(sc);

              List<Person> people = ImmutableList.of(
                      new Person("Joe", "Bloggs", 21, "NY")
              );

              Dataset<Person> dataset = sqlContext.createDataset(people, Encoders.bean(Person.class));
          }
      }
      
      Person.java
      class Person implements Serializable {
      
          String first;
          String last;
          int age;
          String state;
      
          public Person() {
          }
      
          public Person(String first, String last, int age, String state) {
              this.first = first;
              this.last = last;
              this.age = age;
              this.state = state;
          }
      
          public String getFirst() {
              return first;
          }
      
          public String getLast() {
              return last;
          }
      
          public int getAge() {
              return age;
          }
      
          public String getState() {
              return state;
          }
      
      }
      
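      For context, "bean-compliant" here means the class follows the JavaBeans conventions: a public no-arg constructor plus matching getter/setter pairs. A minimal sketch using the standard `java.beans.Introspector` (the `BeanCheck`, `GetterOnly`, and `GetterSetter` names are illustrative, not part of the report) showing why a getter-only class like the Person above exposes no writable properties for an encoder to populate:

      ```java
      import java.beans.BeanInfo;
      import java.beans.Introspector;
      import java.beans.PropertyDescriptor;

      public class BeanCheck {

          // Getter-only class, like the Person above: its property is read-only.
          public static class GetterOnly {
              private String first;
              public GetterOnly() {}
              public String getFirst() { return first; }
          }

          // Fully bean-compliant variant: no-arg constructor plus getter/setter pair.
          public static class GetterSetter {
              private String first;
              public GetterSetter() {}
              public String getFirst() { return first; }
              public void setFirst(String first) { this.first = first; }
          }

          static void describe(Class<?> cls) throws Exception {
              BeanInfo info = Introspector.getBeanInfo(cls, Object.class);
              for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                  // A property is only settable if a write method (setter) exists.
                  System.out.println(cls.getSimpleName() + "." + pd.getName()
                          + ": readable=" + (pd.getReadMethod() != null)
                          + ", writable=" + (pd.getWriteMethod() != null));
              }
          }

          public static void main(String[] args) throws Exception {
              describe(GetterOnly.class);   // first: readable=true, writable=false
              describe(GetterSetter.class); // first: readable=true, writable=true
          }
      }
      ```

      With no setters and no default constructor, the introspection-based encoder has no way to construct or populate the object, which is what surfaces as the unhelpful "no encoder found" message.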

      People

          Assignee: andygrove Andy Grove
          Reporter: andygrove Andy Grove
          Votes: 0
          Watchers: 3
