[SPARK-20682] Add new ORCFileFormat based on Apache ORC - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.4.1, 1.5.2, 1.6.3, 2.1.1, 2.2.0
Fix Version/s: 2.3.0
Component/s: SQL
Labels:
- releasenotes

Description

Since SPARK-2883, Apache Spark has supported Apache ORC in the `sql/hive` module, which carries a Hive dependency. This issue aims to add a new, faster ORC data source in `sql/core` and eventually to replace the old ORC data source. This issue uses the latest Apache ORC, 1.4.0 (released yesterday).

There are four key benefits:

- Speed: Uses Spark `ColumnarBatch` and ORC `RowBatch` together, which is faster than the current implementation in Spark.
- Stability: Apache ORC 1.4.0 includes many fixes, and Spark can rely more on the ORC community.
- Usability: Users can use the `ORC` data source without the Hive module, i.e., without building with `-Phive`.
- Maintainability: Reduces the Hive dependency, so the old legacy code can be removed later.
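As a rough illustration of the usability point, the following PySpark sketch builds a session without Hive support and round-trips a small dataset through ORC. It assumes Spark 2.3 or later and the configuration keys `spark.sql.orc.impl` and `spark.sql.orc.enableVectorizedReader`, which are not named in this issue itself. The helper names (`orc_reader_conf`, `build_session`, `round_trip`) and the example path are hypothetical.

```python
"""Sketch: selecting Spark's ORC implementation through session configuration."""

VALID_ORC_IMPLS = ("native", "hive")


def orc_reader_conf(impl="native", vectorized=True):
    """Return Spark SQL settings that select an ORC implementation.

    Hypothetical helper. The keys are assumed to be available in
    Spark 2.3+; "native" refers to the sql/core data source and
    "hive" to the older sql/hive one.
    """
    if impl not in VALID_ORC_IMPLS:
        raise ValueError(f"impl must be one of {VALID_ORC_IMPLS}, got {impl!r}")
    return {
        "spark.sql.orc.impl": impl,
        "spark.sql.orc.enableVectorizedReader": str(vectorized).lower(),
    }


def build_session(conf, app_name="orc-native-demo"):
    """Create a SparkSession with the given settings, without Hive support."""
    # Imported lazily so the helper above can be used without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    # Note: enableHiveSupport() is deliberately not called.
    return builder.getOrCreate()


def round_trip(spark, path):
    """Write ten rows as ORC, read them back, and return the row count."""
    spark.range(10).write.mode("overwrite").orc(path)
    return spark.read.orc(path).count()
```

A typical call would be `round_trip(build_session(orc_reader_conf()), "/tmp/orc_native_demo")`. Passing `impl="hive"` instead would select the older implementation, assuming the Spark build includes the Hive module.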

Issue Links

blocks

SPARK-20901 Feature parity for ORC with Parquet

Open

SPARK-20728 Make ORCFileFormat configurable between sql/hive and sql/core

Resolved

SPARK-21787 Support for pushing down filters for DateType in native OrcFileFormat

Resolved

is blocked by

SPARK-21422 Depend on Apache ORC 1.4.0

Resolved

is related to

SPARK-35274 old hive table's all columns are read when column pruning applies in spark3.0

Open

supersedes

SPARK-19109 ORC metadata section can sometimes exceed protobuf message size limit

Resolved

SPARK-21791 ORC should support column names with dot

Closed

links to

[Github] Pull Request #17924 (dongjoon-hyun)

[Github] Pull Request #17943 (dongjoon-hyun)

[Github] Pull Request #18953 (dongjoon-hyun)

[Github] Pull Request #19651 (dongjoon-hyun)

People

Assignee: Dongjoon Hyun

Reporter: Dongjoon Hyun

Votes: 0

Watchers: 13

Dates

Created: 09/May/17 19:02

Updated: 02/May/21 01:37

Resolved: 03/Dec/17 14:23