[SPARK-21725] spark thriftserver insert overwrite table partition select - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Not A Bug
Affects Version/s: 2.1.0
Fix Version/s: None
Component/s: Spark Core
Labels:
- spark-sql
Environment: CentOS 6.7, Spark 2.1, JDK 8

Description

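Using the Thrift server, create two partitioned tables, then run the same INSERT OVERWRITE ... PARTITION statement from two separate sessions. The first run succeeds and the second fails.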

session 1:
SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) partitioned by (pt string) stored as parquet;
--ok
!exit

session 2:
SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) partitioned by (pt string) stored as parquet;
--ok
!exit

session 3:
--connect the thriftserver
SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 partition(pt='1') select count(1) count from tmp_11;
--ok
!exit

session 4 (run the same statement again):
--connect the thriftserver
SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 partition(pt='1') select count(1) count from tmp_11;
--error
!exit

-------------------------------------------------------------------------------------
17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
java.lang.reflect.InvocationTargetException
......
......
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053512282-2/-ext-10000/part-00000 to destination hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-00000
at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
... 45 more
Caused by: java.io.IOException: Filesystem closed
....
-------------------------------------------------------------------------------------

The documentation on Parquet tables is here: http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files

Hive metastore Parquet table conversion
When reading from and writing to Hive metastore Parquet tables, Spark SQL will try to use its own Parquet support instead of Hive SerDe for better performance. This behavior is controlled by the spark.sql.hive.convertMetastoreParquet configuration, and is turned on by default.
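
To see whether the conversion described above is active in a given Thrift server session, the setting can be printed from the same connection. The lines below are a minimal sketch that assumes a beeline session like the ones in the reproduction steps. They only inspect or set the flag and are not a confirmed fix for the error.

```sql
-- Print the current value of the conversion flag for this session.
SET spark.sql.hive.convertMetastoreParquet;

-- Optionally set it explicitly before repeating the insert
-- (the documentation quoted above says it is on by default).
SET spark.sql.hive.convertMetastoreParquet=true;
```

The stack trace above passes through Hive.loadPartition, which suggests that this particular insert went through the Hive load path rather than Spark's own Parquet writer.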

I am confused because the problem appears with the partitioned table but not with a table without partitions. Does this mean Spark is not using its own Parquet support here?
Can anyone suggest how I could avoid this issue?

People

Assignee: Unassigned

Reporter: hereTac

Votes: 0

Watchers: 5

Dates

Created: 14/Aug/17 10:42

Updated: 19/Oct/18 11:00

Resolved: 02/Nov/17 13:24