Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Incomplete
- Affects Version: 2.3.0
- Fix Version: None
Description
We use .saveAsTable with dynamic partitioning as our only way to write data to S3 from Spark.
When only one partition column is defined for a table, .saveAsTable behaves as expected:
- with Overwrite mode, it creates the table if it doesn't exist and writes the data
- with Append mode, it appends to the given partition
- with Overwrite mode, if the table exists, it overwrites the partition
If two partition columns are used, however, the directory is created on S3 with a _SUCCESS file, but no data is actually written.
Our workaround is to check whether the table exists and, if it does not, set the partition overwrite mode back to static before running saveAsTable. The write that triggers the issue:

spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
df.write
  .mode("overwrite")
  .partitionBy("year", "month")
  .option("path", "s3://hbc-data-warehouse/integration/users_test")
  .saveAsTable("users_test")
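A minimal sketch of the workaround's decision logic, assuming a helper that picks the overwrite mode based on whether the target table already exists (the helper name is ours, not part of Spark):

```python
def choose_partition_overwrite_mode(table_exists: bool) -> str:
    # Workaround logic from the report: dynamic partition overwrite only
    # works once the table already exists, so fall back to static mode
    # for the initial, table-creating write.
    return "dynamic" if table_exists else "static"
```

Hypothetical usage with a live SparkSession (spark.catalog.tableExists is available in the Scala Catalog API; check your PySpark version before relying on it):

```python
# exists = spark.catalog.tableExists("users_test")
# spark.conf.set("spark.sql.sources.partitionOverwriteMode",
#                choose_partition_overwrite_mode(exists))
# df.write.mode("overwrite").partitionBy("year", "month") \
#   .option("path", "s3://hbc-data-warehouse/integration/users_test") \
#   .saveAsTable("users_test")
```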