BEAM-12495

DataFrame API: groupby(dropna=False) still drops NAs when grouping on multiple columns or indexes

Details

    Description

      df.groupby(['foo', 'bar'], dropna=False).sum()
      

      Despite dropna=False, groups whose keys contain NA values are still dropped from the output.

      This is due to pandas bug 36470 "BUG: groupby(..., dropna=False) excludes NA values when grouping on MultiIndex levels".

      Beam's DataFrame API implements groupby by moving all of the grouping data into the index and requiring Index() partitioning, so it always runs into this bug, even when the user is grouping on columns rather than index levels.
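
A minimal sketch of the difference in plain pandas, not Beam's actual implementation. The column names foo, bar, baz and the set_index step are illustrative assumptions: set_index only stands in for "moving the grouping data into the index". Whether the level-based result actually loses rows depends on the installed pandas version, so the script prints what it observes rather than assuming an outcome.

```python
import pandas as pd

nan = float("nan")

# Hypothetical data: two of the four rows have an NA in one grouping key.
df = pd.DataFrame({
    "foo": ["a", "a", nan, "b"],
    "bar": [1.0, nan, 2.0, 3.0],
    "baz": [10, 20, 30, 40],
})

# Grouping directly on columns: dropna=False keeps the NA-keyed groups.
by_columns = df.groupby(["foo", "bar"], dropna=False).sum()

# Grouping on MultiIndex levels, which is the code path the pandas bug
# report describes. set_index here is only a stand-in for an
# implementation that moves the grouping columns into the index first.
indexed = df.set_index(["foo", "bar"])
by_levels = indexed.groupby(level=["foo", "bar"], dropna=False).sum()

total = int(df["baz"].sum())
for name, result in [("columns", by_columns), ("levels", by_levels)]:
    kept = int(result["baz"].sum())
    print(f"grouping on {name}: {len(result)} groups, "
          f"{total - kept} of {total} summed units missing")
```

If the level-based line reports missing units, the installed pandas is affected by the bug and the NA-keyed rows were silently excluded.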


            People

              Unassigned
              Brian Hulette (bhulette)


                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 2h 10m
