[SPARK-8624] DataFrameReader doesn't respect MERGE_SCHEMA setting for Parquet - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Won't Fix
Affects Version/s: 1.4.0
Fix Version/s: None
Component/s: SQL
Labels:
- parquet

Description

In 1.4.0, Parquet data read through DataFrameReader.parquet ignores the schema-merging setting. When the ParquetRelation2 object is created, its "parameters" argument is hard-coded to Map.empty[String, String], so ParquetRelation2.shouldMergeSchemas always falls back to its default value, true.
In previous versions, the spark.sql.hive.convertMetastoreParquet.mergeSchema config was respected.
This bug significantly degrades performance when reading a folder that contains hundreds of Parquet files for which we do not want a schema merge.
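To make the reported logic concrete, here is a minimal Python model of the lookup described above. It is a sketch, not Spark source: the option key "mergeSchema", the function names, and the fallback order for the expected behavior (explicit option, then session config, then the default of true) are assumptions chosen for illustration.

```python
from typing import Mapping

# Hypothetical option key and default, modeled on the description (not Spark source).
MERGE_SCHEMA_KEY = "mergeSchema"
DEFAULT_MERGE_SCHEMA = True


def _parse_bool(value: str) -> bool:
    """Interpret a string option such as "true"/"false" (case-insensitive)."""
    lowered = value.strip().lower()
    if lowered not in ("true", "false"):
        raise ValueError(f"expected 'true' or 'false', got {value!r}")
    return lowered == "true"


def should_merge_schemas_reported(parameters: Mapping[str, str]) -> bool:
    """Reported 1.4.0 behavior: only the per-relation parameters are consulted."""
    raw = parameters.get(MERGE_SCHEMA_KEY)
    return DEFAULT_MERGE_SCHEMA if raw is None else _parse_bool(raw)


def should_merge_schemas_expected(
    parameters: Mapping[str, str],
    session_conf: Mapping[str, str],
    conf_key: str,
) -> bool:
    """Assumed expected behavior: explicit option, then session config, then default."""
    raw = parameters.get(MERGE_SCHEMA_KEY)
    if raw is None:
        raw = session_conf.get(conf_key)
    return DEFAULT_MERGE_SCHEMA if raw is None else _parse_bool(raw)


if __name__ == "__main__":
    conf_key = "spark.sql.hive.convertMetastoreParquet.mergeSchema"
    conf = {conf_key: "false"}
    # An empty parameters map (as passed by DataFrameReader.parquet) never sees the config.
    print(should_merge_schemas_reported({}))                    # True
    print(should_merge_schemas_expected({}, conf, conf_key))    # False
```

Because the reader passes an empty map, the "reported" path can only ever return the default, regardless of the session config. For reference, the Spark SQL programming guide for later releases documents a per-read mergeSchema option on DataFrameReader and a spark.sql.parquet.mergeSchema setting that is off by default starting in 1.5.0.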

People

Assignee: Unassigned

Reporter: Rex Xiong

Votes: 0

Watchers: 2

Dates

Created: 25/Jun/15 09:40

Updated: 08/Oct/16 04:04

Resolved: 08/Oct/16 04:04