Spark / SPARK-36348

unexpected Index loaded: pd.Index([10, 20, None], name="x")

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.0
    • Fix Version/s: 3.3.0
    • Component/s: PySpark
    • Labels: None

    Description

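      import pandas as pd
      import pyspark.pandas as ps
      # Excerpt from a test method in test_base.py [1]; self.assert_eq is the test helper
      pidx = pd.Index([10, 20, 15, 30, 45, None], name="x")
      psidx = ps.Index(pidx)
      self.assert_eq(psidx.astype(str), pidx.astype(str))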
      

      [left pandas on spark]: Index(['10.0', '20.0', '15.0', '30.0', '45.0', 'nan'], dtype='object', name='x')
      [right pandas]: Index(['10', '20', '15', '30', '45', 'None'], dtype='object', name='x')

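      pandas-on-Spark loads the index as float64, turning the integers into floats and None into NaN, while pandas keeps the original integers and None. Subsequent steps such as astype(str) therefore produce results that differ from pandas.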
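
      The mismatch above can be reproduced without Spark. The following is a minimal plain-Python sketch, not the pandas-on-Spark code path: it assumes that the float64 load is the only cause, as the description states, and the names as_object and as_float are hypothetical, chosen for illustration.

```python
# The index values from the failing test.
values = [10, 20, 15, 30, 45, None]

# Object-style handling: each element is stringified as-is,
# so integers stay integers and None becomes the string "None".
as_object = [str(v) for v in values]

# float64-style handling (assumed to mirror the float64 load):
# integers become floats and None becomes NaN before stringification.
as_float = [str(float("nan") if v is None else float(v)) for v in values]

print(as_object)
print(as_float)
```

      The two lists correspond element by element to the [right pandas] and [left pandas on spark] outputs shown above.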

      [1] https://github.com/apache/spark/blob/bcc595c112a23d8e3024ace50f0dbc7eab7144b2/python/pyspark/pandas/tests/indexes/test_base.py#L2249

            People

              Assignee: Yikun Jiang (yikunkero)
              Reporter: Yikun Jiang (yikunkero)