Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.4
Description
If you attempt to run

df = df.replace(float('nan'), somethingToReplaceWith)

it will replace all 0s in columns of type Integer.
Example code snippet to repro this:
from pyspark.sql import SQLContext
spark = SQLContext(sc).sparkSession
df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value"))
df.show()
df = df.replace(float('nan'), 5)
df.show()
Here's the output I get when I run this code:
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.4
      /_/

Using Python version 3.7.5 (default, Nov 1 2019 02:16:32)
SparkSession available as 'spark'.
>>> from pyspark.sql import SQLContext
>>> spark = SQLContext(sc).sparkSession
>>> df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value"))
>>> df.show()
+-----+-----+
|index|value|
+-----+-----+
|    1|    0|
|    2|    3|
|    3|    0|
+-----+-----+
>>> df = df.replace(float('nan'), 5)
>>> df.show()
+-----+-----+
|index|value|
+-----+-----+
|    1|    5|
|    2|    3|
|    3|    5|
+-----+-----+
>>>
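The output is consistent with the NaN replacement key being cast to each column's type before matching: on the JVM, narrowing Double.NaN to Int yields 0, so after the cast the replace effectively targets 0 in integer columns. (That cast explanation is an inference from the observed behavior, not confirmed here against Spark's source.) A correct replace should only attempt NaN matching in floating-point columns. Below is a minimal pure-Python sketch of that type-aware behavior — an illustration, not Spark's actual implementation; `replace_value` and its signature are hypothetical:

```python
import math

def replace_value(rows, col_types, target, replacement):
    """Replace `target` with `replacement` in `rows`, but only in columns
    whose declared type can actually hold the target value. A NaN target
    can only occur in float columns, so integer columns are left untouched
    (instead of being silently matched via a NaN -> 0 cast)."""
    out = []
    for row in rows:
        new_row = []
        for val, typ in zip(row, col_types):
            if typ is float and isinstance(target, float) and math.isnan(target):
                # NaN never equals itself, so match it explicitly,
                # and only for values in float columns.
                is_nan = isinstance(val, float) and math.isnan(val)
                new_row.append(replacement if is_nan else val)
            elif typ is type(target) and val == target:
                # Ordinary equality match, restricted to same-typed columns.
                new_row.append(replacement)
            else:
                new_row.append(val)
        out.append(tuple(new_row))
    return out
```

With this restriction, replacing NaN in the integer frame from the repro above is a no-op, while NaN in a float column is still replaced as expected.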