  1. Spark
  2. SPARK-21725

spark thriftserver insert overwrite table partition select


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Bug
    • Affects Version/s: 2.1.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Environment: CentOS 6.7, Spark 2.1, JDK 8

    Description

      Use the Thrift server to create tables with partitions, then insert into one of them twice from separate sessions.

      session 1:
      SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) partitioned by (pt string) stored as parquet;
      --ok
      !exit

      session 2:
      SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) partitioned by (pt string) stored as parquet;
      --ok
      !exit

      session 3:
      --connect the thriftserver
      SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 partition(pt='1') select count(1) count from tmp_11;
      --ok
      !exit

      session 4 (run the same statement again):
      --connect the thriftserver
      SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 partition(pt='1') select count(1) count from tmp_11;
      --error
      !exit

      -------------------------------------------------------------------------------------
      17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
      java.lang.reflect.InvocationTargetException
      ......
      ......
      Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053512282-2/-ext-10000/part-00000 to destination hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-00000
      at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
      at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
      at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
      at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
      ... 45 more
      Caused by: java.io.IOException: Filesystem closed
      ....
      -------------------------------------------------------------------------------------

      The documentation on Parquet tables is here: http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files

      Hive metastore Parquet table conversion
      When reading from and writing to Hive metastore Parquet tables, Spark SQL will try to use its own Parquet support instead of Hive SerDe for better performance. This behavior is controlled by the spark.sql.hive.convertMetastoreParquet configuration, and is turned on by default.
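
      To see which value of this setting is in effect, it can be queried from the same beeline session. This is a minimal sketch for inspection only: the second statement is an experiment to compare behavior with the conversion turned off, not a confirmed workaround for the error above.

      ```sql
      -- Print the current value of the conversion flag for this session
      SET spark.sql.hive.convertMetastoreParquet;

      -- Assumption: for comparison only, turn the conversion off in this session
      -- and rerun the insert to see whether the behavior changes
      SET spark.sql.hive.convertMetastoreParquet=false;
      ```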

      I am confused: the problem appears with the partitioned table, but the same insert works with a table without partitions. Does this mean Spark does not use its own Parquet support for partitioned tables?
      Could someone suggest how I can avoid this issue?

          People

            Assignee: Unassigned
            Reporter: zhangxin0112zx hereTac