  1. Spark
  2. SPARK-40062

Spark - Creating Sub Folder while writing to Partitioned Hive Table

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.4.7
    • Fix Version/s: None
    • Component/s: Spark Submit
    • Labels: None

    Description

      We have been writing to a partitioned Hive table and noticed that the data is being written into an extra sub-folder.

      For example, consider the following table definition:

      Create table T1 ( name string, address string) Partitioned by (process_date string) stored as parquet location '/mytable/a/b/c/org=employee';

       

      When writing to the table, the HDFS path that gets written looks something like this:

      /mytable/a/b/c/org=employee/process_date=20220812/org=employee

       

      The unnecessary org=employee after the process_date partition appears because the Hive table location contains the "=" character, which Hive uses in directory names (column=value) to identify partition columns.
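
      To illustrate the ambiguity, here is a minimal Python sketch. It is not Spark's or Hive's actual code; the helper partition_like_segments is a hypothetical name used only for this example. It treats every key=value directory in the written path as a partition-like segment and shows that org turns up twice.

```python
from collections import Counter

def partition_like_segments(path):
    """Return (key, value) pairs for every path segment shaped like key=value.

    Hypothetical helper for illustration only; not a Spark or Hive API.
    """
    pairs = []
    for segment in path.strip("/").split("/"):
        key, sep, value = segment.partition("=")
        if sep and key and value:
            pairs.append((key, value))
    return pairs

# The path reported in this issue.
written_path = "/mytable/a/b/c/org=employee/process_date=20220812/org=employee"

pairs = partition_like_segments(written_path)
# Keys that occur more than once among the partition-like segments.
repeated = [key for key, count in Counter(k for k, _ in pairs).items() if count > 1]

print(pairs)
print(repeated)
```

      Looking at the path alone, the org=employee segment that belongs to the table location cannot be told apart from a real partition directory.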

      Redefining the table with a location that does not contain "=" resolves the problem:

      Create table T1 ( name string, address string) Partitioned by (process_date string) stored as parquet location '/mytable/a/b/c/employee';
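
      As a guard against this, a table location could be checked for "=" before the table is created. The following is a sketch under that assumption; segments_with_equals is a hypothetical helper, not part of Spark or Hive.

```python
def segments_with_equals(location):
    """Return the directory names in a table location that contain '='.

    Hypothetical pre-flight check for illustration only.
    """
    return [segment for segment in location.split("/") if "=" in segment]

# The two locations from this issue.
original_location = "/mytable/a/b/c/org=employee"
redefined_location = "/mytable/a/b/c/employee"

for location in (original_location, redefined_location):
    offending = segments_with_equals(location)
    if offending:
        print(f"{location}: contains partition-like segments {offending}")
    else:
        print(f"{location}: no '=' in location")
```

      The original location is flagged because of org=employee, while the redefined location passes the check.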

          People

            Assignee: Unassigned
            Reporter: dinesh sachdev (dinesh028)