Spark / SPARK-31310

percentile_approx function not working as expected

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 2.4.0, 2.4.1, 2.4.3
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None
    • Environment: spark-sql-2.4.1v with Java 8

    Description

      I'm using spark-sql-2.4.1v with Java 8 and I'm trying to find quantiles (percentile 0, percentile 25, etc.) for a given column of a DataFrame.

      The column values are: 23456.55, 34532.55, 23456.55

      When I use the percentile_approx() function, the results do not match those of Excel's percentile_inc() function.

      Example:

      For the above data set, i.e. 23456.55, 34532.55, 23456.55, the values of
      percentile_0, percentile_10, percentile_25, percentile_50, percentile_75, percentile_90, percentile_100 are, respectively:

      Using the percentile_approx() function:
      23456.55, 23456.55, 23456.55, 23456.55, 23456.55, 23456.55, 23456.55

      Using Excel, i.e. percentile_inc():
      23456.55, 23456.55, 23456.55, 23456.55, 28994.550000000003, 32317.350000000002, 34532.55

      How can I get the same percentiles as Excel using the percentile_approx() function?

      For the details please check it.
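
      To make the expected numbers concrete, here is a minimal plain-Java sketch of an inclusive percentile with linear interpolation between the two nearest ranks, which is the scheme Excel's percentile_inc() is documented to use. It does not call Spark at all; the class name PercentileIncSketch and the method percentileInc are made up for illustration and are not part of any library.

```java
import java.util.Arrays;

// Illustrative sketch only: inclusive percentile with linear interpolation.
// Class and method names are hypothetical, not a Spark or Excel API.
class PercentileIncSketch {

    // Returns the percentile for p in [0.0, 1.0], interpolating linearly
    // between the two closest ranks of the sorted values.
    static double percentileInc(double[] values, double p) {
        if (values == null || values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        if (!(p >= 0.0 && p <= 1.0)) {
            throw new IllegalArgumentException("p must be between 0.0 and 1.0");
        }
        double[] sorted = values.clone();   // do not mutate the caller's array
        Arrays.sort(sorted);

        // Zero-based fractional rank: 0 maps to the minimum, n - 1 to the maximum.
        double position = p * (sorted.length - 1);
        int lower = (int) Math.floor(position);
        int upper = (int) Math.ceil(position);
        double fraction = position - lower;

        return sorted[lower] + fraction * (sorted[upper] - sorted[lower]);
    }

    public static void main(String[] args) {
        // Data set taken from the description above.
        double[] data = {23456.55, 34532.55, 23456.55};
        double[] percentages = {0.0, 0.10, 0.25, 0.50, 0.75, 0.90, 1.0};
        for (double p : percentages) {
            System.out.printf("percentile_%d = %.2f%n",
                    Math.round(p * 100), percentileInc(data, p));
        }
    }
}
```

      With three values the fractional rank is 2 * p, so percentile_75 falls halfway between 23456.55 and 34532.55, and percentile_90 falls 80% of the way, which agrees with the Excel row above up to floating-point rounding. As far as I understand, percentile_approx() returns one of the actual column values rather than an interpolated one, while Spark SQL's exact percentile() aggregate interpolates in a similar way; that is an assumption worth checking against the Spark SQL function documentation.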

       

          People

            Unassigned
            BdLearner Shyam
            L. C. Hsieh