Spark / SPARK-40074

Error while creating a dataset in Java Spark 3.x using Encoders.bean with DenseVector (issue arises when updating Spark from 2.4 to 3.x)


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.2, 3.2.2, 3.3.0
    • Fix Version/s: None
    • Component/s: Java API, ML, SQL
    • Labels: None
    • Environment: Scala 2.12, Spark 3.x

    Description

      Encountered a compatibility issue while upgrading Spark from 2.4 to 3.x (Scala was also upgraded from 2.11 to 2.12).

      The Java code below used to work with Spark 2.4, but after migrating to 3.x it fails with the error shown further down. I have done my own research but could not find a solution or any related information.

      Code.java
      import java.io.Serializable;
      import java.util.Arrays;
      import java.util.List;

      import org.apache.spark.ml.linalg.DenseVector;
      import org.apache.spark.sql.Dataset;
      import org.apache.spark.sql.Encoders;
      import org.apache.spark.sql.SparkSession;

      public void test() {
          final SparkSession spark = SparkSession.builder()
                  .appName("Test")
                  .getOrCreate();

          DenseClass denseFactor1 = new DenseClass(new DenseVector(new double[]{0.13, 0.24}));
          DenseClass denseFactor2 = new DenseClass(new DenseVector(new double[]{0.24, 0.32}));

          final List<DenseClass> inputsNew = Arrays.asList(denseFactor1, denseFactor2);

          // Fails here on Spark 3.x with the AnalysisException shown below; works on Spark 2.4.
          final Dataset<DenseClass> denseVectorDf =
                  spark.createDataset(inputsNew, Encoders.bean(DenseClass.class));

          denseVectorDf.printSchema();
      }

      // Bean class; the no-arg constructor and getter/setter are required by Encoders.bean.
      public static class DenseClass implements Serializable {
          private org.apache.spark.ml.linalg.DenseVector denseVector;

          public DenseClass() {}

          public DenseClass(DenseVector denseVector) { this.denseVector = denseVector; }

          public DenseVector getDenseVector() { return denseVector; }

          public void setDenseVector(DenseVector denseVector) { this.denseVector = denseVector; }
      }

      The error occurs while creating the dataset denseVectorDf.

      Error
       

      org.apache.spark.sql.AnalysisException: Cannot up cast `denseVector` from struct<> to struct<type:tinyint,size:int,indices:array<int>,values:array<double>>.
      The type path of the target object is:
       - field (class: "org.apache.spark.ml.linalg.DenseVector", name: "denseVector")
      You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object

      I have tried using a plain double field instead of the DenseVector and it works just fine; the failure only occurs when the bean contains a DenseVector and Encoders.bean is used (see the sketch below).
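      For reference, a minimal sketch of the working double-based variant described above. The bean name DoubleClass, its field value, and the method testDouble are illustrative and not from the original report; only the createDataset/Encoders.bean pattern is taken from the snippet above.

      DoubleVariant.java
      import java.io.Serializable;
      import java.util.Arrays;

      import org.apache.spark.sql.Dataset;
      import org.apache.spark.sql.Encoders;
      import org.apache.spark.sql.SparkSession;

      // Hypothetical bean with a plain double field instead of a DenseVector.
      public static class DoubleClass implements Serializable {
          private double value;

          public DoubleClass() {}

          public DoubleClass(double value) { this.value = value; }

          public double getValue() { return value; }

          public void setValue(double value) { this.value = value; }
      }

      public void testDouble(SparkSession spark) {
          // Same createDataset call as above, but with a double-typed bean property;
          // this succeeds on Spark 3.x and prints a schema with a single double column.
          final Dataset<DoubleClass> doubleDf = spark.createDataset(
                  Arrays.asList(new DoubleClass(0.13), new DoubleClass(0.24)),
                  Encoders.bean(DoubleClass.class));
          doubleDf.printSchema();
      }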

       

      StackOverflow link for the issue: https://stackoverflow.com/questions/73313660/error-while-creating-dataset-in-java-spark-3-x-using-encoders-bean-with-dense-ve

       

          People

            Unassigned Unassigned
            anujgrgv Anuj Gargava
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue
