Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-35814

Mismatched types when creating a new column when using Arrow

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Minor
    • Resolution: Invalid
    • 3.1.2
    • None
    • R
    • None

    Description

      I was looking into a question on StackOverflow where a user had an error due to a mismatch of field types  (https://stackoverflow.com/questions/68032084/how-to-fix-error-in-readbin-using-arrow-package-with-sparkr/).  I've made a reproducible example below.

       

      library(SparkR)
      
      library(arrow)
      
      SparkR::sparkR.session(sparkConfig = list(spark.sql.execution.arrow.sparkr.enabled = "true"))
      
      readr::write_csv(iris, "iris.csv")dfSchema <- structType(
        structField("Sepal.Length", "int"),
        structField("Sepal.Width", "double"),
        structField("Petal.Length", "double"),
        structField("Petal.Width", "double"),
        structField("Species", "string")
      )
      
      spark_df <- SparkR::read.df(path="iris.csv", source="csv", schema=dfSchema)
      
      # Apply an R native function to each partition.
      returnSchema <-  structType(structField("Sep_add_one", "int"))
      
      df <- SparkR::dapply(spark_df, function(rdf){
        data.frame(rdf$Sepal.Length + 1)
        }, returnSchema
      )
      
      collect(df)
      

      It results in this error:

      Error in readBin(con, raw(), as.integer(dataLen), endian = "big") : 
        invalid 'n' argument
      

      I've never used SparkR before, so could be wrong, but I think it may be due to the use of this line in DataFrame.R:

      arrowTable <- arrow::read_ipc_stream(readRaw(conn))
      

      readRaw assumes that the the value is an integer; however when returnSchema is being created, the code creates a double (the code runs without error if it's updated to `rdf$Sepal.Length + 1L`)

      I wasn't sure if perhaps readTypedObject needs to be used or if maybe some type checking might be useful?

      Apologies for the partial explanation - not entirely sure of the cause.

      Attachments

        Activity

          People

            Unassigned Unassigned
            thisisnic Nicola Crane
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: