[SPARK-43282] Investigate DataFrame.sort_values with pandas behavior. - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Won't Fix
Affects Version/s: 4.0.0
Fix Version/s: None
Component/s: Pandas API on Spark
Labels: None

Description

import pandas as pd
pdf = pd.DataFrame(
    {
        "a": pd.Categorical([1, 2, 3, 1, 2, 3]),
        "b": pd.Categorical(
            ["b", "a", "c", "c", "b", "a"], categories=["c", "b", "d", "a"]
        ),
    },
)
pdf.groupby("a").apply(lambda x: x).sort_values(["a"])

Traceback (most recent call last):
...
ValueError: 'a' is both an index level and a column label, which is ambiguous.

We should investigate whether this is intended behavior or a bug in pandas.
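The error message suggests the failure comes from a name collision rather than from the categorical data: after groupby("a").apply(...), the result can carry "a" both as an index level and as a column. The following is a minimal sketch of that collision in plain pandas, under the assumption that this is the cause. The frame `df` and the variable names are hypothetical and do not come from the report, and dropping the index with reset_index(drop=True) is only one possible way to remove the ambiguity, not a confirmed fix for this ticket.

```python
import pandas as pd

# Hypothetical frame (not from the report): the index level and the
# column are both named "a", mimicking the shape that triggers the error.
df = pd.DataFrame({"a": [3, 1, 2]}, index=pd.Index([3, 1, 2], name="a"))

raised = False
message = ""
try:
    # pandas cannot tell whether "a" means the index level or the column.
    df.sort_values(["a"])
except ValueError as exc:
    raised = True
    message = str(exc)

# One possible way around it: drop the named index level so that "a"
# refers only to the column, then sort.
sorted_df = df.reset_index(drop=True).sort_values(["a"])
```

If the index values need to be kept, renaming the index level instead of dropping it should remove the ambiguity as well.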

People

Assignee: Unassigned
Reporter: Haejoon Lee
Votes: 0
Watchers: 1

Dates

Created: 25/Apr/23 13:07
Updated: 15/Sep/23 05:59
Resolved: 15/Sep/23 05:59