SPARK-18593: JDBCRDD returns incorrect results for filters on CHAR of PostgreSQL

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.6.2, 1.6.3
    • Fix Version/s: 2.0.0
    • Component/s: SQL

    Description

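      In Apache Spark 1.6.x, JDBCRDD returns incorrect results for queries that filter on a PostgreSQL CHAR column. The root cause is that PostgreSQL returns CHAR values as space-padded strings (for example, `A` comes back as `A         `). PostgreSQL itself treats trailing spaces as insignificant when comparing CHAR values, so the pushed-down predicate matches that row. Spark 1.6.x, however, re-applies the same predicate as the post-processing filter `Filter (a#0 = A)`, and this exact string comparison evaluates to false for the padded value, so the row is dropped. Spark 2.0.0 removes the post filter because the predicate is already handled in the database by `PushedFilters: [EqualTo(a,A)]`.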
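The mismatch can be illustrated without Spark or PostgreSQL. The Scala sketch below is a simplified model, not PostgreSQL's actual implementation: it assumes a CHAR(n) value is simply the string right-padded with spaces to width n, and contrasts a PostgreSQL-style comparison (trailing spaces ignored) with the exact string equality that the Spark post filter performs. The helper names `padChar`, `rtrim`, `pgCharEquals` and the width of 10 are hypothetical, chosen to mirror the padded output in the session below.

```scala
// Simplified model (assumption): a CHAR(n) value is the string right-padded with spaces to width n.
def padChar(s: String, n: Int): String = s.padTo(n, ' ')

// Strip trailing spaces only.
def rtrim(s: String): String = s.reverse.dropWhile(_ == ' ').reverse

// Hypothetical PostgreSQL-style CHAR equality: trailing spaces are insignificant.
def pgCharEquals(a: String, b: String): Boolean = rtrim(a) == rtrim(b)

// What the JDBC driver hands back for 'A' (width 10 assumed): "A" followed by 9 spaces.
val stored = padChar("A", 10)

val matchedInDatabase = pgCharEquals(stored, "A") // pushed-down EqualTo(a,A)
val matchedBySpark    = stored == "A"             // post filter (a#0 = A)

// prints: database: true, Spark post filter: false
println(s"database: $matchedInDatabase, Spark post filter: $matchedBySpark")
```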

      scala> val t_char = sqlContext.read.option("user", "postgres").option("password", "rootpass").jdbc("jdbc:postgresql://localhost:5432/postgres", "t_char", new java.util.Properties())
      t_char: org.apache.spark.sql.DataFrame = [a: string]
      
      scala> val t_varchar = sqlContext.read.option("user", "postgres").option("password", "rootpass").jdbc("jdbc:postgresql://localhost:5432/postgres", "t_varchar", new java.util.Properties())
      t_varchar: org.apache.spark.sql.DataFrame = [a: string]
      
      scala> t_char.show
      +----------+
      |         a|
      +----------+
      |A         |
      |AA        |
      |AAA       |
      +----------+
      
      
      scala> t_varchar.show
      +---+
      |  a|
      +---+
      |  A|
      | AA|
      |AAA|
      +---+
      
      
      scala> t_char.filter(t_char("a")==="A").show
      +---+
      |  a|
      +---+
      +---+
      
      
      scala> t_char.filter(t_char("a")==="A         ").show
      +----------+
      |         a|
      +----------+
      |A         |
      +----------+
      
      
      scala> t_varchar.filter(t_varchar("a")==="A").show
      +---+
      |  a|
      +---+
      |  A|
      +---+
      
      
      scala> t_char.filter(t_char("a")==="A").explain
      == Physical Plan ==
      Filter (a#0 = A)
      +- Scan JDBCRelation(jdbc:postgresql://localhost:5432/postgres,t_char,[Lorg.apache.spark.Partition;@2f65c341,{user=postgres, password=rootpass})[a#0] PushedFilters: [EqualTo(a,A)]
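The two-stage evaluation visible in this plan can be modelled the same way. In the hypothetical Scala sketch below (`scanWithPushdown` and its `keepPostFilter` flag are illustrative names, not Spark APIs), the "database" applies the pushed-down equality with CHAR semantics; the Spark 1.6.x behaviour then re-applies the predicate with exact string equality, while the Spark 2.0.0 behaviour relies on the pushed filter alone. Under the assumed width of 10, it reproduces the results of the session above.

```scala
// Hypothetical model of a JDBC scan with filter pushdown; not Spark's actual code.

// The rows of t_char as PostgreSQL returns them (CHAR width of 10 assumed).
val tChar: Seq[String] = Seq("A", "AA", "AAA").map(_.padTo(10, ' '))

// PostgreSQL-style CHAR equality: trailing spaces are insignificant (simplified).
def rtrim(s: String): String = s.reverse.dropWhile(_ == ' ').reverse
def pgCharEquals(a: String, b: String): Boolean = rtrim(a) == rtrim(b)

// keepPostFilter = true mimics Spark 1.6.x: Filter (a#0 = A) above the scan.
// keepPostFilter = false mimics Spark 2.0.0: only PushedFilters: [EqualTo(a,A)].
def scanWithPushdown(rows: Seq[String], literal: String, keepPostFilter: Boolean): Seq[String] = {
  val fromDatabase = rows.filter(pgCharEquals(_, literal)) // evaluated by PostgreSQL
  if (keepPostFilter) fromDatabase.filter(_ == literal)    // re-evaluated by Spark
  else fromDatabase
}

val paddedA = "A".padTo(10, ' ') // same as the padded literal used in the session

val spark16       = scanWithPushdown(tChar, "A", keepPostFilter = true)     // empty
val spark16Padded = scanWithPushdown(tChar, paddedA, keepPostFilter = true) // one row
val spark20       = scanWithPushdown(tChar, "A", keepPostFilter = false)    // one row

println(s"1.6 'A': ${spark16.size} rows, 1.6 padded: ${spark16Padded.size}, 2.0 'A': ${spark20.size}")
```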
      

People

    • Takeshi Yamamuro (maropu)
    • Durga Prasad Gunturu (DurgaPrasad16)
    • Votes: 1
    • Watchers: 4