[SPARK-20960] make ColumnVector public - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.3.0
Fix Version/s: 2.3.0
Component/s: SQL
Labels: releasenotes
Target Version/s: 2.3.0

Description

ColumnVector is an internal interface in Spark SQL that is used only by the vectorized Parquet reader to represent the in-memory columnar format.

In Spark 2.3 we want to make ColumnVector public so that we can provide a more efficient way to exchange data between Spark and external systems. For example, we can use ColumnVector to build the columnar read API in the data source framework, or to build a more efficient UDF API.

We also want to introduce a new ColumnVector implementation based on Apache Arrow (basically just a wrapper over Arrow), so that external systems (such as Python pandas DataFrames) can build a ColumnVector very easily.
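
To make the "wrapper" idea concrete, here is a minimal, dependency-free sketch of a read-only columnar view. It is not Spark's actual ColumnVector class: the names ReadOnlyIntColumn, ArrayBackedIntColumn, and sumNonNull are hypothetical and exist only for illustration. The accessor names (hasNull, isNullAt, getInt) are borrowed from the sub-task titles and common columnar conventions, and a plain int[] stands in for an external buffer such as an Arrow vector.

```java
public class ColumnVectorSketch {

    /** Hypothetical read-only view over one integer column (not Spark's API). */
    interface ReadOnlyIntColumn {
        int numRows();
        boolean hasNull();            // true if at least one row is null
        boolean isNullAt(int rowId);
        int getInt(int rowId);        // value is unspecified when the row is null
    }

    /**
     * Wraps existing arrays without copying them, in the same spirit as a
     * thin wrapper over an externally produced columnar buffer.
     */
    static final class ArrayBackedIntColumn implements ReadOnlyIntColumn {
        private final int[] values;
        private final boolean[] nulls;
        private final boolean anyNull;

        ArrayBackedIntColumn(int[] values, boolean[] nulls) {
            if (values.length != nulls.length) {
                throw new IllegalArgumentException("values and nulls must have the same length");
            }
            this.values = values;
            this.nulls = nulls;
            boolean found = false;
            for (boolean isNull : nulls) {
                found |= isNull;
            }
            this.anyNull = found;
        }

        @Override public int numRows() { return values.length; }
        @Override public boolean hasNull() { return anyNull; }
        @Override public boolean isNullAt(int rowId) { return nulls[rowId]; }
        @Override public int getInt(int rowId) { return values[rowId]; }
    }

    /** A consumer written only against the read-only interface. */
    static long sumNonNull(ReadOnlyIntColumn col) {
        long sum = 0;
        boolean checkNulls = col.hasNull(); // skip per-row null checks when no nulls exist
        for (int i = 0; i < col.numRows(); i++) {
            if (checkNulls && col.isNullAt(i)) {
                continue;
            }
            sum += col.getInt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        ReadOnlyIntColumn col = new ArrayBackedIntColumn(
            new int[] {10, 0, 32},
            new boolean[] {false, true, false});
        System.out.println(sumNonNull(col)); // prints 42
    }
}
```

Because the consumer depends only on the read-only interface, the backing storage can be swapped (on-heap arrays here, an Arrow-backed buffer in the design described above) without changing code that reads the column.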

Issue Links

links to

[Github] Pull Request #20116 (cloud-fan)

Sub-Tasks

1.	generalize the dictionary in ColumnVector	Resolved	Wenchen Fan
2.	simplify the array offset and length in ColumnVector	Resolved	Wenchen Fan
3.	Add test suites for complicated cases in ColumnarBatchSuite	Resolved	Jin Xing
4.	move ColumnVector.Array and ColumnarBatch.Row to individual files	Resolved	Wenchen Fan
5.	remove unused features in ColumnarBatch	Resolved	Wenchen Fan
6.	remove ColumnVector#loadBytes	Resolved	Wenchen Fan
7.	remove the get address methods from ColumnVector	Resolved	Wenchen Fan
8.	ColumnarArray should be an immutable view	Resolved	Wenchen Fan
9.	remove set methods in ColumnarRow	Resolved	Wenchen Fan
10.	ColumnarRow should be an immutable view	Resolved	Wenchen Fan
11.	move dictionary related APIs from ColumnVector to WritableColumnVector	Resolved	Wenchen Fan
12.	rename ColumnVector.anyNullsSet to hasNull	Resolved	Wenchen Fan
13.	simplify ColumnVector.getArray	Resolved	Wenchen Fan
14.	add calendar interval type support to ColumnVector	Resolved	Wenchen Fan
15.	add map type support to ColumnVector	Resolved	Wenchen Fan
16.	Implement the copy() method in ColumnarMap	Resolved	Max Gekk

People

Assignee: Wenchen Fan

Reporter: Wenchen Fan

Votes: 2

Watchers: 22

Dates

Created: 02/Jun/17 04:37

Updated: 31/Jan/18 07:19

Resolved: 03/Jan/18 23:32