Spark / SPARK-32046

current_timestamp called in a cached dataframe freezes the time for all future calls


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.3.0, 2.4.4, 3.0.0
    • Fix Version/s: None
    • Component/s: SQL

    Description

      If I build a dataframe from current_timestamp and cache it (in order to freeze that dataframe's time), and do this 3 times, the 3rd dataframe and every one after it (4th, 5th, ...) is frozen to the 2nd dataframe's time. The 1st and 2nd dataframes differ in time, but from the 3rd onward the time stays static (when running on Zeppelin or Jupyter).

      Additionally, only 2 of the dataframes were actually cached; the 3rd was skipped. However,

      val df = Seq(java.time.LocalDateTime.now.toString).toDF("datetime").cache
      df.count

      // This can be run 3 times with no issue.
      // The string column can be cast to TimestampType later.

      doesn't have this problem: all 3 dataframes are cached and display the correct times.
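The workaround above can be sketched end to end as follows. This is only an illustration under stated assumptions: the object name `DriverTimestampSketch`, the helper `freezeNow`, and the local `SparkSession` are hypothetical and not part of the report (in the shell, Zeppelin, or Jupyter, `spark` already exists). The time is read on the driver and stored as ordinary data, then cast to `TimestampType`. With default (non-ANSI) settings the cast returns null for a string Spark cannot parse, so the result is worth checking.

```scala
import java.time.LocalDateTime

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.TimestampType

object DriverTimestampSketch {

  // Hypothetical helper: captures the driver's clock as a string, caches the
  // one-row dataframe, and casts the column to TimestampType afterwards.
  def freezeNow(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // The time is a plain value in the data, not an expression in the query plan.
    val df = Seq(LocalDateTime.now.toString).toDF("datetime").cache()
    df.count() // materialize the cache

    // Cast the ISO-8601 string to a timestamp column.
    df.withColumn("datetime", col("datetime").cast(TimestampType))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, used here only to keep the sketch self-contained.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("driver-timestamp-sketch")
      .getOrCreate()

    val typed = freezeNow(spark)
    typed.printSchema()
    typed.show(false)

    spark.stop()
  }
}
```

Calling `freezeNow` several times with a pause in between should give a different time on each call, which is the behavior the report describes for this variant.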

      Running the code in the shell versus Jupyter or Zeppelin (ZP) also produces different results. In the shell, current_timestamp gives only 1 unique time no matter how many times you run it. In ZP or Jupyter, however, I have always received 2 unique times before it froze.

       

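      import org.apache.spark.sql.functions.current_timestamp

      // 1st dataframe: cache it, then materialize the cache with count
      val df1 = spark.range(1).select(current_timestamp as "datetime").cache
      df1.count

      df1.show(false)

      Thread.sleep(9500)

      // 2nd dataframe: in ZP or Jupyter this shows a different time than df1
      val df2 = spark.range(1).select(current_timestamp as "datetime").cache
      df2.count

      df2.show(false)

      Thread.sleep(9500)

      // 3rd dataframe: in ZP or Jupyter this shows the same time as df2
      val df3 = spark.range(1).select(current_timestamp as "datetime").cache
      df3.count

      df3.show(false)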
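To compare the three results without reading the `show` output by eye, the reproduction can be wrapped in a small check. This is a rough sketch, not a diagnosis of the cause: the object name `FrozenTimestampCheck`, the helpers `cachedNow` and `collectTimes`, and the shorter pause are hypothetical choices. It prints each dataframe's time together with `Dataset.storageLevel`, which reports `StorageLevel.NONE` for a dataframe that is not registered as cached, and then counts the distinct times.

```scala
import java.sql.Timestamp

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.current_timestamp

object FrozenTimestampCheck {

  // Builds, caches, and materializes one single-row dataframe, as in the report.
  def cachedNow(spark: SparkSession): DataFrame = {
    val df = spark.range(1).select(current_timestamp().as("datetime")).cache()
    df.count() // materialize the cache
    df
  }

  // Hypothetical helper: repeats the steps `n` times with a pause in between
  // and returns the time each dataframe displays.
  def collectTimes(spark: SparkSession, n: Int, pauseMs: Long): Seq[Timestamp] =
    (1 to n).map { i =>
      if (i > 1) Thread.sleep(pauseMs)
      val df = cachedNow(spark)
      val ts = df.head().getTimestamp(0)
      println(s"df$i: time=$ts storageLevel=${df.storageLevel}")
      ts
    }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session; in the shell, ZP, or Jupyter use the existing `spark`.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("frozen-timestamp-check")
      .getOrCreate()

    // Assumption: a 2 s pause instead of the 9.5 s used in the report.
    val times = collectTimes(spark, n = 3, pauseMs = 2000L)
    println(s"distinct times: ${times.distinct.size} of ${times.size}")

    spark.stop()
  }
}
```

The number of distinct times depends on the environment, as described above (1 in the shell, 2 in ZP or Jupyter), so the sketch prints the count rather than asserting a value.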

          People

            Assignee: Unassigned
            Reporter: Dustin Smith (dustin.smith.TDG)