
SPARK-27473: Support filter push down for status fields in binary file data source


Details

    • Type: Documentation
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      As a user, I can run `spark.read.format("binaryFile").load(path).filter($"status.length" < 100000000L)` to load only files smaller than 1e8 bytes. With filter push down on the status fields, Spark shouldn't even read files bigger than 1e8 bytes in this case.
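
      For reference, a minimal sketch of the intended usage, assuming a local Spark session and a hypothetical input directory `/tmp/files` (the object name and path are illustrative; whether oversized files are actually skipped depends on the push-down support this ticket adds):

```scala
import org.apache.spark.sql.SparkSession

object BinaryFileStatusFilter {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("binaryFile status filter pushdown")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Load files through the binaryFile data source and keep only those whose
    // file status reports fewer than 1e8 bytes. With the filter pushed down to
    // the source, larger files should be skipped without being read.
    val smallFiles = spark.read.format("binaryFile")
      .load("/tmp/files")                      // hypothetical directory
      .filter($"status.length" < 100000000L)   // status field from the description

    println(s"files under 1e8 bytes: ${smallFiles.count()}")
    spark.stop()
  }
}
```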

            People

              Assignee: Weichen Xu (weichenxu123)
              Reporter: Xiangrui Meng (mengxr)
              Votes: 0
              Watchers: 1
