SPARK-43282 (Spark; sub-task of SPARK-44101 Support pandas 2)

Investigate DataFrame.sort_values with pandas behavior.


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 4.0.0
    • Fix Version/s: None
    • Component/s: Pandas API on Spark
    • Labels: None

    Description

      import pandas as pd
      pdf = pd.DataFrame(
          {
              "a": pd.Categorical([1, 2, 3, 1, 2, 3]),
              "b": pd.Categorical(
                  ["b", "a", "c", "c", "b", "a"], categories=["c", "b", "d", "a"]
              ),
          },
      )
      # With pandas 2, the group key "a" becomes an index level while "a" is still a column.
      pdf.groupby("a").apply(lambda x: x).sort_values(["a"])

      Traceback (most recent call last):
      ...
      ValueError: 'a' is both an index level and a column label, which is ambiguous.

      We should investigate whether this is intended behavior or a bug in pandas.
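
      The ambiguity itself can be reproduced without groupby. The following is a minimal sketch, not taken from the original report: `frame` is a hypothetical stand-in, built by hand, for a result in which "a" is both an index level and a column. It assumes that dropping the "a" index level is acceptable for the caller, which is only one possible way to disambiguate.

```python
import pandas as pd

# Hypothetical stand-in for the groupby/apply result: the name "a" is used
# for an index level and for a column at the same time.
frame = pd.DataFrame(
    {"a": [1, 2, 3, 1], "b": ["b", "a", "c", "c"]},
    index=pd.MultiIndex.from_arrays(
        [[1, 2, 3, 1], [0, 1, 2, 3]], names=["a", None]
    ),
)

# sort_values cannot tell whether "a" means the index level or the column.
try:
    frame.sort_values(["a"])
    ambiguous = False
except ValueError:
    ambiguous = True

# Dropping the "a" index level leaves only the column, so the sort is unambiguous.
by_column = frame.reset_index(level="a", drop=True).sort_values(["a"])
```

      If the index level has to be kept, renaming it to something other than "a" should avoid the clash in the same way.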

          People

            Assignee: Unassigned
            Reporter: Haejoon Lee (itholic)