SPARK-18178

Importing Pandas Tables with Missing Values

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.0.0
    • Fix Version/s: None
    • Component/s: PySpark

    Description

      If you import a table with missing values (like the one below) and create a DataFrame from it, everything works fine until a command is actually executed (.first(), .toPandas(), etc.). The problem originally came up with a much larger table whose values were not NaN, just empty.

      ```
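      import pandas as pd
      from io import StringIO

      # The second data row has an empty 'Scan Options' cell.
      test_df = pd.read_csv(StringIO(',Scan Options\n15,SAT2\n16,\n'))
      # sqlContext is assumed to be the instance provided by the pyspark shell.
      sqlContext.createDataFrame(test_df).registerTempTable('Test')
      o_qry = sqlContext.sql("SELECT * FROM Test LIMIT 1")
      # The failure only appears here, once the query is actually executed.
      o_qry.first()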
      ```
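
The report does not identify a root cause. One plausible factor, stated here as an assumption, is that pandas typically reads the empty cell as a float NaN inside an otherwise string-valued column, so the rows handed to Spark mix types. The sketch below shows one possible workaround under that assumption: replace NaN with None in plain Python rows before building the DataFrame. The helper name `nan_to_none` is hypothetical and is not part of pandas or PySpark.

```python
import math


def nan_to_none(rows):
    """Return a copy of rows with every float NaN replaced by None."""
    def clean(value):
        # NaN is a float that math.isnan() recognises; other values pass through.
        if isinstance(value, float) and math.isnan(value):
            return None
        return value

    return [tuple(clean(v) for v in row) for row in rows]


# Rows shaped like the table in the report; the empty cell appears as NaN.
rows = [(15, "SAT2"), (16, float("nan"))]
cleaned = nan_to_none(rows)
print(cleaned)  # [(15, 'SAT2'), (16, None)]
```

With the objects from the report, the cleaned rows could then be passed along the lines of `sqlContext.createDataFrame(nan_to_none(test_df.values.tolist()), list(test_df.columns))`. Whether this avoids the failure on Spark 2.0.0 has not been verified here.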

            People

              Assignee: Unassigned
              Reporter: Kevin Mader (skicavs)
              Votes: 0
              Watchers: 3
