Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version: 2.4.8
Fix Version: None
Environment:
java.version = 1.8
spark.version = 2.4.8
hadoop.version = 3.1.3
File Output Committer Algorithm version is 2
FileOutputCommitter skip cleanup _temporary folders under output directory: false, ignore cleanup failures: false
Description
I have the following table structure, to which I want to write a DataFrame:
CREATE EXTERNAL TABLE `usl_rdm_idl_spark_stg.okogu_h`(
  `ctl_loading` bigint,
  `ctl_validfrom` timestamp,
  `end_dt` date,
  `okogu_accept_dt` date)
PARTITIONED BY (`p1day` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'hdfs://FESS-DEV/data/usl/rdm_idl_spark/stg/okogu_h'
TBLPROPERTIES (
  'bucketing_version'='2',
  'spark.sql.partitionProvider'='catalog',
  'transient_lastDdlTime'='1654082666')
The final DataFrame has the same structure as the table above. The issue occurs when the column "p1day" (the table's partition column) contains only null values. When I try to write it with either option:
finalDF.write().mode(SaveMode.Append).partitionBy("p1day").parquet("somepath");
or
finalDF.write().mode(SaveMode.Append).insertInto(String.format("%s.%s", tgtSchema, tgtTable));
I get the following error:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.fs.FileAlreadyExistsException: /data/usl/rdm_idl_spark/stg/okogu_h/.hive-staging_hive_2022-06-01_16-59-37_442_6329951430234699240-1/-ext-10000/_temporary/0/_temporary/attempt_20220601165937_0116_m_000001_586/p1day=__HIVE_DEFAULT_PARTITION__/part-00001-05999af9-8a25-406e-a307-f97781547db2.c000 for client 10.106.105.11 already exists
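For context, a minimal standalone sketch of a job that writes a DataFrame whose partition column is entirely null (the session setup, schema, and output path here are my assumptions for illustration, not taken from the original job; whether this exact sketch reproduces the failure also depends on the committer configuration listed in the environment above):

```java
import static org.apache.spark.sql.functions.lit;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class NullPartitionRepro {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("null-partition-repro")
                .getOrCreate();

        // Build a DataFrame whose partition column is null for every row;
        // lit(null).cast("string") produces a typed all-null string column.
        Dataset<Row> df = spark.range(10)
                .withColumn("p1day", lit(null).cast("string"));

        // Every row falls into the __HIVE_DEFAULT_PARTITION__ partition,
        // matching the partition name seen in the stack trace above.
        df.write().mode(SaveMode.Append)
                .partitionBy("p1day")
                .parquet("/tmp/okogu_h_repro"); // hypothetical path

        spark.stop();
    }
}
```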
It works correctly for me only when I replace the null values in the "p1day" column with some non-null value (e.g. "1"):
finalDF.withColumn("p1day", lit("1"));
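Note that lit("1") overwrites the whole column. A variant of this workaround that replaces only the null values, keeping real partition values intact, could use the standard Spark SQL coalesce function (a sketch; the helper name is mine, and "1" is just the same placeholder value as above):

```java
import static org.apache.spark.sql.functions.coalesce;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.lit;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class NullPartitionWorkaround {
    // Replace nulls in the partition column with a placeholder so no row
    // lands in __HIVE_DEFAULT_PARTITION__; non-null values are untouched.
    static Dataset<Row> fillNullPartition(Dataset<Row> finalDF) {
        return finalDF.withColumn("p1day", coalesce(col("p1day"), lit("1")));
    }
}
```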
Is this a bug in the spark-sql code? I am using org.apache.spark:spark-sql_2.11:2.4.8.