SPARK-46862

Incorrect count() of a dataframe loaded from CSV datasource


Details

    Description

      The example below illustrates the issue. The same DataFrame reports 4 rows before caching and 5 rows after:

      >>> df=spark.read.option("multiline", "true").option("header", "true").option("escape", '"').csv("es-939111-data.csv")
      >>> df.count()
      4
      >>> df.cache()
      DataFrame[jobID: string, Name: string, City: string, Active: string]
      >>> df.count()
      5
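
      To judge which of the two counts is correct, the records can be counted independently of Spark. The sketch below is only an illustration: it uses Python's standard csv module, whose default doubled-quote handling roughly corresponds to escape='"', and which accepts line breaks inside quoted fields, as multiline does. The helper name count_csv_records and the sample rows are hypothetical; the sample is NOT the content of the attached es-939111-data.csv, and only the header names are taken from the session above.

```python
import csv
import io


def count_csv_records(text: str) -> int:
    """Count data records (header excluded) in CSV text.

    Quoted fields may contain line breaks and use "" for a literal quote,
    which approximates the multiline/escape options used in the report.
    """
    # newline="" leaves line endings untouched so the csv module can
    # handle line breaks embedded in quoted fields itself.
    reader = csv.reader(io.StringIO(text, newline=""), doublequote=True)
    next(reader, None)  # skip the header row
    return sum(1 for row in reader if row)  # ignore completely empty lines


# Hypothetical sample data (assumption, not the attached file):
# one record has an escaped quote and a line break inside a quoted field.
sample = (
    'jobID,Name,City,Active\n'
    '1,"Ann ""A"" Lee","New\nYork",true\n'
    '2,Bob,Paris,false\n'
)

print(count_csv_records(sample))  # prints 2
```

      Running the same helper on the text of the attached file would give a reference value to compare against the 4 and 5 returned by df.count().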

      Attachments

        1. es-939111-data.csv
          0.1 kB
          Max Gekk

        Activity

          People

            maxgekk Max Gekk
