Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Not A Bug
Affects Version/s: 2.1.0
Fix Version/s: None
Environment: CentOS 6.7, Spark 2.1, JDK 8
Description
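Using the Thrift server, create two partitioned tables and then run the same insert overwrite into one partition twice, each time in a new session.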
session 1:
SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) partitioned by (pt string) stored as parquet;
--ok
!exit
session 2:
SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) partitioned by (pt string) stored as parquet;
--ok
!exit
session 3:
--connect the thriftserver
SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 partition(pt='1') select count(1) count from tmp_11;
--ok
!exit
session 4 (repeat the statement from session 3):
--connect the thriftserver
SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 partition(pt='1') select count(1) count from tmp_11;
--error
!exit
-------------------------------------------------------------------------------------
17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
java.lang.reflect.InvocationTargetException
......
......
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053512282-2/-ext-10000/part-00000 to destination hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-00000
at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
... 45 more
Caused by: java.io.IOException: Filesystem closed
....
-------------------------------------------------------------------------------------
The documentation on Parquet tables (http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files) says the following:
Hive metastore Parquet table conversion
When reading from and writing to Hive metastore Parquet tables, Spark SQL will try to use its own Parquet support instead of Hive SerDe for better performance. This behavior is controlled by the spark.sql.hive.convertMetastoreParquet configuration, and is turned on by default.
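The flag named in the quoted documentation can be inspected and changed from the same beeline session. The following is only a minimal sketch of that check. It uses nothing beyond the configuration key quoted above, and whether changing it affects the failure in this report is not established here.

```sql
-- Print the current value of the conversion flag (the docs state it is on by default)
SET spark.sql.hive.convertMetastoreParquet;

-- Turn the conversion off for this session so the Hive SerDe path is used instead
SET spark.sql.hive.convertMetastoreParquet=false;
```

Running the repro steps once with each value would show whether the error depends on which Parquet code path is taken.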
I am confused: the problem appears with a partitioned table, but the same steps work with a table that has no partitions. Does this mean Spark is not using its own Parquet support for partitioned tables?
Could someone suggest how I can avoid this issue?