Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Version/s: 1.6.2, 1.6.3
Description
In Apache Spark 1.6.x, JDBCRDD returns incorrect results for queries that filter on a PostgreSQL CHAR column. The root cause is that PostgreSQL returns CHAR values as space-padded strings. The database matches the row through the pushed-down predicate, but Spark then re-applies the post-processing filter `Filter (a#0 = A)` with exact string comparison, and it evaluates to false for the padded value. Spark 2.0.0 removes this post filter because the predicate is already handled in the database by `PushedFilters: [EqualTo(a,A)]`.
```
scala> val t_char = sqlContext.read.option("user", "postgres").option("password", "rootpass").jdbc("jdbc:postgresql://localhost:5432/postgres", "t_char", new java.util.Properties())
t_char: org.apache.spark.sql.DataFrame = [a: string]

scala> val t_varchar = sqlContext.read.option("user", "postgres").option("password", "rootpass").jdbc("jdbc:postgresql://localhost:5432/postgres", "t_varchar", new java.util.Properties())
t_varchar: org.apache.spark.sql.DataFrame = [a: string]

scala> t_char.show
+----------+
|         a|
+----------+
|A         |
|AA        |
|AAA       |
+----------+

scala> t_varchar.show
+---+
|  a|
+---+
|  A|
| AA|
|AAA|
+---+

scala> t_char.filter(t_char("a")==="A").show
+---+
|  a|
+---+
+---+

scala> t_char.filter(t_char("a")==="A         ").show
+----------+
|         a|
+----------+
|A         |
+----------+

scala> t_varchar.filter(t_varchar("a")==="A").show
+---+
|  a|
+---+
|  A|
+---+

scala> t_char.filter(t_char("a")==="A").explain
== Physical Plan ==
Filter (a#0 = A)
+- Scan JDBCRelation(jdbc:postgresql://localhost:5432/postgres,t_char,[Lorg.apache.spark.Partition;@2f65c341,{user=postgres, password=rootpass})[a#0] PushedFilters: [EqualTo(a,A)]
```
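The mismatch can be reproduced without Spark or a database. The sketch below is a hypothetical, simplified model, not Spark or PostgreSQL code: `padChar` stands in for how a CHAR(10) value comes back over JDBC (the width of 10 is assumed from the `t_char.show` output), `bpcharEquals` approximates PostgreSQL's trailing-space-insensitive CHAR comparison for the pushed-down `EqualTo(a,A)`, and a plain `==` plays the role of the post-filter `Filter (a#0 = A)`. The `postFilter` flag toggles between the 1.6.x behavior and the 2.0.0 behavior described above.

```scala
// Hypothetical model of the CHAR padding mismatch; helper names are illustrative only.
object CharPaddingDemo {
  // Assumed model: a CHAR(n) value as returned over JDBC, right-padded with spaces to n characters.
  def padChar(value: String, n: Int): String = value.padTo(n, ' ')

  // Assumed approximation of PostgreSQL CHAR equality: trailing spaces are not significant.
  def bpcharEquals(a: String, b: String): Boolean =
    a.replaceAll(" +$", "") == b.replaceAll(" +$", "")

  // Rows surviving `a = literal`: the database applies the pushed-down predicate first,
  // then (optionally) the predicate is re-applied with exact String equality.
  def query(rows: Seq[String], literal: String, postFilter: Boolean): Seq[String] = {
    val fromDb = rows.filter(bpcharEquals(_, literal))
    if (postFilter) fromDb.filter(_ == literal) else fromDb
  }

  def main(args: Array[String]): Unit = {
    val tChar = Seq("A", "AA", "AAA").map(padChar(_, 10))
    // 1.6.x: the redundant post-filter drops the padded "A         " row.
    println(query(tChar, "A", postFilter = true).size)            // 0
    // 2.0.0: no post-filter, so the database result is kept.
    println(query(tChar, "A", postFilter = false).size)           // 1
    // A fully padded literal passes both steps, as in the transcript.
    println(query(tChar, "A         ", postFilter = true).size)   // 1
  }
}
```

The model only covers equality on a single CHAR column; other predicates and PostgreSQL's exact bpchar semantics are not reproduced.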