Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Incomplete
- Affects Version/s: 2.4.4, 2.4.5
- Fix Version/s: None
- Environment:
1. Linux environment: CentOS Linux release 7.3.1611 or CentOS Linux release 7.5.1804
2. Spark client environment: Spark-2.4.4-bin-hadoop2.6 or Spark-2.4.5-bin-hadoop2.6
3. Hadoop environment: hadoop-2.6.0-cdh5.8.4
4. Hive environment: hive-1.1.0-cdh5.8.4
5. Java environment: jdk1.8.0_181
6. Python environment: python 2.7.5
Description
The problem reproduces as follows:
- create table test_1(id int, name string) partitioned by (profile string)
- insert into test_1 values (1, null)
- select * from test_1 where profile is null
After these steps, the query returns nothing. But if the condition profile='__HIVE_DEFAULT_PARTITION__' is added, the row is returned.
The temporary workaround:
select * from test_1 where profile is null or profile='__HIVE_DEFAULT_PARTITION__'
returns the correct result.
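The behavior can be illustrated with a minimal Python sketch (not Spark's actual code): Hive stores a null partition value under the sentinel name __HIVE_DEFAULT_PARTITION__, so a filter that compares raw partition values never sees a real null unless the sentinel is decoded first. The partition values below are hypothetical sample data.

```python
# Hive's sentinel name for a null partition value.
DEFAULT_PARTITION_NAME = "__HIVE_DEFAULT_PARTITION__"

# Partition values as the metastore would list them (hypothetical data):
# one real value and one null partition stored under the sentinel.
partitions = ["2020-01-01", DEFAULT_PARTITION_NAME]

# Naive "is null" filter over the raw values: the null partition is
# missed because it is stored as a non-null string.
naive_is_null = [p for p in partitions if p is None]

def decode(p):
    # Map the sentinel back to null before evaluating the predicate.
    return None if p == DEFAULT_PARTITION_NAME else p

# Decoding first makes the "is null" filter find the null partition.
correct_is_null = [decode(p) for p in partitions if decode(p) is None]

print(naive_is_null)    # []
print(correct_is_null)  # [None]
```

This mirrors the report: filtering on profile is null finds nothing, while matching the sentinel string directly does.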
Special notes:
1. The phenomenon above occurs only when the partition field type is string.
2. The same operations work correctly in Hive.
Problem localization:
As far as I can tell, the problem is in org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils and org.apache.spark.sql.catalyst.catalog.CatalogTablePartition, especially the toRow function in CatalogTablePartition.
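A hypothetical Python re-implementation (not Spark's source) of the suspected toRow-style conversion shows why only string partition columns are affected: casting the sentinel to a numeric type fails and accidentally yields null, while for a string column the sentinel survives the cast unchanged. The function names and types here are illustrative assumptions.

```python
DEFAULT_PARTITION_NAME = "__HIVE_DEFAULT_PARTITION__"

def cast_partition_value_buggy(raw, dtype):
    # Suspected behavior: cast the raw metastore string to the column
    # type without decoding the sentinel first.
    if dtype == "int":
        try:
            return int(raw)
        except ValueError:
            # The failed cast happens to turn the sentinel into null,
            # so non-string partition columns look correct by accident.
            return None
    # String column: the sentinel leaks through as a plain string.
    return raw

def cast_partition_value_fixed(raw, dtype):
    # Decode the sentinel to null first, regardless of the column type.
    if raw == DEFAULT_PARTITION_NAME:
        return None
    return int(raw) if dtype == "int" else raw

print(cast_partition_value_buggy(DEFAULT_PARTITION_NAME, "int"))     # None
print(cast_partition_value_buggy(DEFAULT_PARTITION_NAME, "string"))  # __HIVE_DEFAULT_PARTITION__
print(cast_partition_value_fixed(DEFAULT_PARTITION_NAME, "string"))  # None
```

Under this reading, the "fixed" variant matches special note 1: once the sentinel is decoded for every type, a string partition column behaves the same as any other.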