Spark / SPARK-25146

avg() returns null on some decimals


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.3.0, 2.3.1
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      We compute some 0-10 numbers in a pipeline using Spark SQL. Then we average them. The average in some cases comes out to null to our surprise (and disappointment).

      After a bit of digging it looks like these numbers have ended up with the decimal(37,30) type. I've got a Spark Shell (2.3.0 and 2.3.1) repro with this type:

      scala> (1 to 10000).map(_*0.001).toDF.createOrReplaceTempView("x")
      
      scala> spark.sql("select cast(value as decimal(37, 30)) as v from x").createOrReplaceTempView("x")
      
      scala> spark.sql("select avg(v) from x").show
      +------+
      |avg(v)|
      +------+
      |  null|
      +------+
      

      With up to 4471 numbers it calculates the average correctly. With 4472 or more numbers the result is null.
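A plausible mechanism for the exact 4471/4472 threshold (an assumption about Spark internals, not something confirmed in this report): avg() over decimal(37, 30) appears to cast its internal sum to the result type decimal(38, 34), which leaves only 38 − 34 = 4 integer digits, so any running sum of 10000 or more overflows to null. The sum of the first n values here is 0.001 · n(n+1)/2, which first reaches 10000 at n = 4472. A quick check of that arithmetic:

```python
from decimal import Decimal

# decimal(38, 34) leaves 38 - 34 = 4 integer digits, so any value
# >= 10^4 = 10000 cannot be represented and would overflow to null.
INTEGER_DIGITS = 38 - 34
LIMIT = Decimal(10) ** INTEGER_DIGITS  # 10000

def first_overflow_n():
    """Smallest n for which the running sum 0.001 + 0.002 + ... + n/1000
    no longer fits in 4 integer digits."""
    total = Decimal(0)
    n = 0
    while total < LIMIT:
        n += 1
        total += Decimal(n) / 1000
    return n

print(first_overflow_n())  # 4472 -- matches the observed threshold
```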

      Now I'll just change these numbers to double. But we got the types entirely automatically. We never asked for decimal. If this is the default type, it's important to support averaging a handful of them. (Sorry for the bitterness. I like double more.)

      Curiously, sum() works. And count() too. So it's quite the surprise that avg() fails.
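A hedged sketch of why sum() could survive while avg() fails (again assuming Spark's internal decimal widening rules): the sum buffer for decimal(37, 30) is capped at decimal(38, 30), leaving 8 integer digits, while avg() casts that same sum to decimal(38, 34) before dividing, leaving only 4. The 4472-row sum of about 10001.628 fits the former but not the latter:

```python
from decimal import Decimal

def fits(value, precision, scale):
    """True if `value`'s integer part fits in a decimal(precision, scale),
    i.e. uses at most precision - scale digits."""
    return abs(value) < Decimal(10) ** (precision - scale)

total = Decimal("10001.628")  # running sum at the 4472-row threshold

print(fits(total, 38, 30))  # sum() buffer: 8 integer digits -> fits
print(fits(total, 38, 34))  # avg()'s cast: 4 integer digits -> overflow, null
```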


    People

      Assignee: Unassigned
      Reporter: Daniel Darabos (darabos)
      Votes: 0
      Watchers: 2
