Spark / SPARK-47397

count_distinct ignores null values

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.4.1
    • Fix Version/s: None
    • Component/s: Documentation, Spark Core
    • Labels: None

    Description

      The documentation states that in GROUP BY and count statements, null values are not ignored and instead form their own group.

      However, count_distinct ignores null values, so null is never counted as a distinct value.
      Either the documentation or the implementation is wrong here.
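
      The mismatch described above can be illustrated with a minimal plain-Python model. This is a sketch of the two behaviors, not Spark code: the sample data and the helper names `group_counts` and `count_distinct_skipping_nulls` are hypothetical, and `None` stands in for a SQL NULL.

```python
from collections import Counter

# Hypothetical sample column; None stands in for SQL NULL.
values = ["a", "b", None, "a", None]


def group_counts(column):
    """Model of grouping followed by a count: None forms its own group."""
    return dict(Counter(column))


def count_distinct_skipping_nulls(column):
    """Model of the reported count_distinct behavior: None is dropped first."""
    return len({v for v in column if v is not None})


groups = group_counts(values)
distinct = count_distinct_skipping_nulls(values)

print(groups)                  # three groups, one of them keyed by None
print(distinct)                # 2
print(len(groups) - distinct)  # 1: the null group that count_distinct omits
```

      In this model the grouping yields three groups while the distinct count is two, which is the kind of discrepancy the report is about.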

      Attachments

        1. image-2024-04-02-10-32-44-461.png
          56 kB
          Martin Rueckl
        2. image-2024-03-14-16-13-03-107.png
          134 kB
          Martin Rueckl
        3. image-2024-03-14-16-12-35-267.png
          125 kB
          Martin Rueckl

          People

            Assignee: Unassigned
            Reporter: Martin Rueckl (martinitus)
            Votes: 0
            Watchers: 5
