Description
As per *https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala#L119-L126*,
"Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the referenced columns only include the internal corrupt record column"
However, a query that references only the internal corrupt record column is still allowed when the column is used in a filter operation.
```python
from pyspark.sql.types import *

schema = StructType([
    StructField("_corrupt_record", StringType(), False),
    StructField("Name", StringType(), False),
    StructField("Colour", StringType(), True),
    StructField("Price", IntegerType(), True),
    StructField("Quantity", IntegerType(), True)])

df = spark.read.csv("fruit.csv", schema=schema, mode="PERMISSIVE")
df.filter(df._corrupt_record.isNotNull()).show()  # Allowed
```