[SPARK-40062] Spark - Creating Sub Folder while writing to Partitioned Hive Table - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 2.4.7
Fix Version/s: None
Component/s: Spark Submit
Labels: None

Description

We have been writing to a partitioned Hive table and noticed that the data is being written into an unexpected sub-folder.

For example, consider the table definition below:

Create table T1 ( name string, address string) Partitioned by (process_date string) stored as parquet location '/mytable/a/b/c/org=employee';

When writing to the table, the HDFS path being written looks something like this:

/mytable/a/b/c/org=employee/process_date=20220812/org=employee

The unnecessary addition of org=employee after the process_date partition happens because the Hive table location contains the "=" character, which Hive uses in directory names to identify partition columns.
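
As a rough illustration of why such a location is risky, the sketch below scans a table location for path segments shaped like key=value, which is the pattern used for partition directories. The helper name partition_like_segments is hypothetical and this is not the code Spark or Hive actually runs; it only shows a check one might apply to a location before creating the table.

```python
from pathlib import PurePosixPath
from typing import List, Tuple


def partition_like_segments(location: str) -> List[Tuple[str, str]]:
    """Return (key, value) pairs for path segments shaped like `key=value`.

    Hypothetical helper for illustration only; it is not part of Spark or Hive.
    """
    pairs = []
    for segment in PurePosixPath(location).parts:
        key, sep, value = segment.partition("=")
        # Only segments with "=" and a non-empty key and value look like
        # partition directories.
        if sep and key and value:
            pairs.append((key, value))
    return pairs


# Locations taken from the two table definitions in this report.
print(partition_like_segments("/mytable/a/b/c/org=employee"))
print(partition_like_segments("/mytable/a/b/c/employee"))
```

A non-empty result for the first location flags the org=employee segment as something that could be mistaken for a partition directory; the second location yields no matches.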

Redefining the table with a location that does not contain "=" resolves the problem:

Create table T1 ( name string, address string) Partitioned by (process_date string) stored as parquet location '/mytable/a/b/c/employee';

People

Assignee: Unassigned

Reporter: dinesh sachdev

Votes: 0

Watchers: 2

Dates

Created: 12/Aug/22 20:38

Updated: 13/Aug/22 06:28