Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Incomplete
- Affects Version/s: 2.4.3
- Fix Version/s: None
- Environment: Hadoop 2.7, Scala 2.11
- Tested:
  - Spark 2.3.3 - works
  - Spark 2.4.x - all have the same issue
Description
When writing a Parquet dataset with partitionBy, the group permissions on the partition directories are changed as shown below. This causes members of the group to get "org.apache.hadoop.security.AccessControlException: Open failed for file.... error: Permission denied (13)".
This worked in Spark 2.3. I found a workaround, which is to set "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2"; this restores the correct behaviour.
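A minimal sketch of applying the workaround at submit time (the application class and jar name are hypothetical; any of the usual ways of passing Spark configuration would work):

```shell
# Pass the Hadoop output-committer setting through Spark's
# spark.hadoop.* prefix so it reaches the Hadoop configuration.
spark-submit \
  --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 \
  --class com.example.WriteJob \
  my-app.jar
```

The same key can also be set in spark-defaults.conf or on the SparkSession builder before the session is created.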
Code I used to reproduce the issue:

import spark.implicits._  // spark is an active SparkSession

Seq(("H", 1), ("I", 2))
  .toDF("Letter", "Number")
  .write
  .partitionBy("Letter")
  .parquet(...)  // output path omitted
sparktesting$ tree -dp
├── [drwxrws---]  letter_testing2.3-defaults
│   ├── [drwxrws---]  Letter=H
│   └── [drwxrws---]  Letter=I
├── [drwxrws---]  letter_testing2.4-defaults
│   ├── [drwxrwS---]  Letter=H
│   └── [drwxrwS---]  Letter=I
└── [drwxrws---]  letter_testing2.4-file-writer2
    ├── [drwxrws---]  Letter=H
    └── [drwxrws---]  Letter=I
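The difference between the two listings is the capital "S": it means the setgid bit is set but group execute is missing, so group members cannot traverse the partition directories. A minimal sketch reproducing the two permission states with plain chmod (the path is hypothetical):

```shell
# drwxrws--- : setgid plus group rwx -- what Spark 2.3 (and the
# algorithm.version=2 workaround) produce.
mkdir -p /tmp/perm_demo
chmod 2770 /tmp/perm_demo
stat -c %A /tmp/perm_demo   # drwxrws---

# drwxrwS--- : setgid retained but group execute dropped -- the
# Spark 2.4 default behaviour reported above.
chmod g-x /tmp/perm_demo
stat -c %A /tmp/perm_demo   # drwxrwS---
```

Without the group execute bit, any open of a file inside the directory fails for group members, which matches the AccessControlException above.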