  1. Spark
  2. SPARK-41937

SparkR datetime column compare with Sys.time() throws error in R (>= 4.2.0)

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.4.0
    • Component/s: R, SparkR

    Description

      R 4.2.0 introduced a breaking change (see the announcement "[Rd] R 4.2.0 is released"): "Calling if() or while() with a condition of length greater than one gives an error rather than a warning."

      The code below reproduces the issue: on R >= 4.2.0 it raises an error, while on earlier versions it only emits a warning. `Sys.time()` returns an object with more than one class (`"POSIXct"` and `"POSIXt"`), and the SparkR code base repeatedly uses checks of the form `if (class(x) == "Column")`. For a multi-class object this comparison yields a logical vector of length greater than one, which `if` rejects in R >= 4.2.0. R allows the class of an object to be a character vector of several names (R: Object Classes), so this kind of check was fragile from the start.

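      Both chunks below were run on R version 4.1.3. The first shows the default behavior there (a warning). The second sets the environment variable `_R_CHECK_LENGTH_1_CONDITION_` to `"true"`, which turns the warning into an error and so mimics the behavior of R >= 4.2.0.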

      {
       SparkR::sparkR.session()
       t <- Sys.time()
       sdf <- SparkR::createDataFrame(data.frame(x = t + c(-1, 1, -1, 1, -1)))
       SparkR::collect(SparkR::filter(sdf, SparkR::column('x') > t))
      }
      #> Warning in if (class(e2) == 'Column') {: the condition has length > 1 
      #> and only the first element will be used
      #> x
      #> 1 2023-01-07 20:40:20
      #> 2 2023-01-07 20:40:20 
      
      {
       Sys.setenv(`_R_CHECK_LENGTH_1_CONDITION_` = "true")
       SparkR::sparkR.session()
       t <- Sys.time()
       sdf <- SparkR::createDataFrame(data.frame(x = t + c(-1, 1, -1, 1, -1)))
       SparkR::collect(SparkR::filter(sdf, SparkR::column('x') > t))
      }
      #> Error in h(simpleError(msg, call)): error in evaluating the argument 'x' 
      #> in selecting a method for function 'collect': error in evaluating the 
      #> argument 'condition' in selecting a method for function 'filter': the
      #> condition has length > 1 
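
The root cause can also be seen without Spark. The following is a minimal base-R sketch, not code from the SparkR sources; the variable names `now`, `cmp`, and `outcome` are chosen here only for illustration. It shows that comparing the class vector of a `Sys.time()` value with `"Column"` produces a length-2 logical, and that passing it to `if` signals a warning or an error depending on the R version.

```r
# Base R only; no Spark session is needed to see the root cause.
now <- Sys.time()
cls <- class(now)          # "POSIXct" "POSIXt"
cmp <- cls == "Column"     # logical vector of length 2, both FALSE

# On R >= 4.2.0 the if() below signals an error. On older versions it
# signals a warning (or an error if _R_CHECK_LENGTH_1_CONDITION_ is "true").
outcome <- tryCatch(
  {
    if (cmp) "column" else "not a column"
  },
  warning = function(w) "warning",
  error = function(e) "error"
)
print(outcome)
```

Either way, the condition never evaluates cleanly, which is why the `filter` call above cannot be relied on with a timestamp operand.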

      The same issue affects other SparkR functions that may receive multi-class values such as `Sys.time()`: lit, fillna, when, otherwise, contains, ifelse.

      The suggested fix is to wrap the comparison in `all` (or `any`, as appropriate) when checking whether `class(.)` is `Column`: `if (all(class(.) == "Column"))`. A better option is to use `base::inherits` for this check: `if (inherits(., "Column"))`.
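
The suggested checks can be compared side by side in base R. This is a hedged sketch rather than the actual SparkR patch: `is_column` is a hypothetical helper, and `fake_col` is an S3 stand-in used only so the example runs without Spark (a real SparkR `Column` is a different kind of object).

```r
# Hypothetical helper illustrating the inherits()-based check.
is_column <- function(x) inherits(x, "Column")

# S3 stand-in for illustration only; not a real SparkR Column.
fake_col <- structure(list(), class = "Column")
now <- Sys.time()   # class is c("POSIXct", "POSIXt")

# inherits() always returns a single TRUE/FALSE, so it is safe inside if().
col_check <- is_column(fake_col)   # TRUE
time_check <- is_column(now)       # FALSE

# all()/any() also collapse the comparison to length one.
all_check <- all(class(now) == "Column")   # FALSE
any_check <- any(class(now) == "Column")   # FALSE

if (is_column(now)) message("Column") else message("not a Column")
```

`inherits` is the more robust choice because it keeps returning `TRUE` for objects that carry additional class names alongside `"Column"`, whereas `all(class(.) == "Column")` would return `FALSE` for them.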

          People

            atalvivek Vivek Atal