Description
Temporary files such as .DS_Store, generated by the Mac OS X Finder, can break partition discovery. A directory with the following layout
> find parquet_partitioned
parquet_partitioned
parquet_partitioned/._common_metadata.crc
parquet_partitioned/._metadata.crc
parquet_partitioned/._SUCCESS.crc
parquet_partitioned/_common_metadata
parquet_partitioned/_metadata
parquet_partitioned/_SUCCESS
parquet_partitioned/year=2014/.DS_Store
parquet_partitioned/year=2014/month=9
parquet_partitioned/year=2014/month=9/.DS_Store
parquet_partitioned/year=2014/month=9/day=1/.DS_Store
parquet_partitioned/year=2014/month=9/day=1/.part-r-00008.gz.parquet.crc
parquet_partitioned/year=2014/month=9/day=1/part-r-00008.gz.parquet
parquet_partitioned/year=2015
parquet_partitioned/year=2015/month=10
parquet_partitioned/year=2015/month=10/day=25
parquet_partitioned/year=2015/month=10/day=25/.part-r-00002.gz.parquet.crc
parquet_partitioned/year=2015/month=10/day=25/.part-r-00004.gz.parquet.crc
parquet_partitioned/year=2015/month=10/day=25/part-r-00002.gz.parquet
parquet_partitioned/year=2015/month=10/day=25/part-r-00004.gz.parquet
parquet_partitioned/year=2015/month=10/day=26
parquet_partitioned/year=2015/month=10/day=26/.part-r-00005.gz.parquet.crc
parquet_partitioned/year=2015/month=10/day=26/part-r-00005.gz.parquet
parquet_partitioned/year=2015/month=9
parquet_partitioned/year=2015/month=9/day=1
parquet_partitioned/year=2015/month=9/day=1/.part-r-00007.gz.parquet.crc
parquet_partitioned/year=2015/month=9/day=1/part-r-00007.gz.parquet
causes an exception like this:
scala> val df = sqlContext.read.parquet("parquet_partitioned")
java.lang.AssertionError: assertion failed: Conflicting partition column names detected:
    ArrayBuffer(year, month)
    ArrayBuffer(year)
    ArrayBuffer(year, month, day)
    at scala.Predef$.assert(Predef.scala:179)
    at org.apache.spark.sql.sources.PartitioningUtils$.resolvePartitions(PartitioningUtils.scala:189)
    at org.apache.spark.sql.sources.PartitioningUtils$.parsePartitions(PartitioningUtils.scala:87)
    at org.apache.spark.sql.sources.HadoopFsRelation.org$apache$spark$sql$sources$HadoopFsRelation$$discoverPartitions(interfaces.scala:492)
    at org.apache.spark.sql.sources.HadoopFsRelation$$anonfun$partitionSpec$3.apply(interfaces.scala:449)
    at org.apache.spark.sql.sources.HadoopFsRelation$$anonfun$partitionSpec$3.apply(interfaces.scala:448)
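This happens because .DS_Store files are treated as data files, so the directories that contain them (year=2014 and year=2014/month=9) are parsed as partition paths with fewer columns than the directories that hold the real Parquet files.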
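The effect can be reproduced without Spark. The following is a minimal sketch, not Spark's actual implementation: the object HiddenFileFilterSketch and its helpers isDataFile, partitionColumns and distinctColumnLists are hypothetical names introduced only for illustration. It assumes that any file whose name starts with "." should be skipped, and it derives partition column names from the key=value segments of each file's parent directories.

```scala
// Illustrative sketch only; these names are hypothetical and not part of Spark.
object HiddenFileFilterSketch {

  // Assumption: a file is a data file unless its name starts with ".".
  def isDataFile(path: String): Boolean = {
    val name = path.substring(path.lastIndexOf('/') + 1)
    !name.startsWith(".")
  }

  // Partition column names taken from the key=value directory segments
  // above the file (the file name itself is dropped).
  def partitionColumns(path: String): Seq[String] =
    path.split('/').dropRight(1).toSeq
      .filter(_.contains("="))
      .map(_.takeWhile(_ != '='))

  // The distinct column-name lists seen across all files. More than one
  // distinct list corresponds to the "conflicting partition column names" case.
  def distinctColumnLists(files: Seq[String]): Seq[Seq[String]] =
    files.map(partitionColumns).distinct

  // A subset of the file listing from this report.
  val sampleFiles: Seq[String] = Seq(
    "parquet_partitioned/year=2014/.DS_Store",
    "parquet_partitioned/year=2014/month=9/.DS_Store",
    "parquet_partitioned/year=2014/month=9/day=1/.DS_Store",
    "parquet_partitioned/year=2014/month=9/day=1/part-r-00008.gz.parquet",
    "parquet_partitioned/year=2015/month=10/day=25/part-r-00002.gz.parquet"
  )

  def main(args: Array[String]): Unit = {
    // Without filtering: three different column lists, hence a conflict.
    println(distinctColumnLists(sampleFiles))
    // With hidden files skipped: a single consistent column list.
    println(distinctColumnLists(sampleFiles.filter(isDataFile)))
  }
}
```

With the hidden files included, the sample yields three different column lists (year; year, month; year, month, day), matching the three ArrayBuffers in the assertion message. Once names starting with "." are skipped, only the list (year, month, day) remains.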
Issue Links
- is duplicated by
  SPARK-8036 Ignores files whose name starts with "." while enumerating files in HadoopFsRelation (Resolved)
- links to