Spark / SPARK-28680

redundant code, or possible bug in Partitioner that could mess up check against spark.default.parallelism


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Invalid
    • Affects Version/s: 2.4.3
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None
    • Environment: master = local[*]

    Description

      This is a suggestion to reduce (what I think is) some code redundancy.
      Looking at this line of code in org.apache.spark.Partitioner:
      https://github.com/apache/spark/blob/924d794a6f5abb972fa07bf63adbb4ad544ef246/core/src/main/scala/org/apache/spark/Partitioner.scala#L83

      the first part of the && in the if condition is true if hasMaxPartitioner is non-empty, which
      means that after scanning rdds we found at least one RDD with a partitioner whose number of
      partitions was > 0; hasMaxPartitioner is then the Option-wrapped RDD whose partitioner has the
      greatest number of partitions.
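
      For reference, here is an abridged sketch of defaultPartitioner around the linked line
      (paraphrased, not a verbatim excerpt; see the link above for the exact code):

        def defaultPartitioner(rdd: RDD[_], others: RDD[_]*): Partitioner = {
          val rdds = Seq(rdd) ++ others
          val hasPartitioner = rdds.filter(_.partitioner.exists(_.numPartitions > 0))

          // the RDD (if any) with the most partitions among those that already have a partitioner
          val hasMaxPartitioner: Option[RDD[_]] =
            if (hasPartitioner.nonEmpty) Some(hasPartitioner.maxBy(_.partitions.length)) else None

          // spark.default.parallelism if set, otherwise the largest partition count among rdds
          val defaultNumPartitions =
            if (rdd.context.conf.contains("spark.default.parallelism")) rdd.context.defaultParallelism
            else rdds.map(_.partitions.length).max

          // the condition this report is about
          if (hasMaxPartitioner.nonEmpty && (isEligiblePartitioner(hasMaxPartitioner.get, rdds) ||
              defaultNumPartitions <= hasMaxPartitioner.get.getNumPartitions)) {
            hasMaxPartitioner.get.partitioner.get
          } else {
            new HashPartitioner(defaultNumPartitions)
          }
        }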

      We then pass the RDD inside hasMaxPartitioner to isEligiblePartitioner, where maxPartitions is
      set to the maximum number of partitions across rdds, and we then check whether

      log10(maxPartitions) - log10(hasMaxPartitioner.getNumPartitions) < 1

      It seems to me that the values inside the two calls to log10 will be equal, so subtracting them
      will result in 0, which is always < 1.
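
      If, as argued above, those two values are always the same, the check reduces to 0 < 1 and can
      never be false. A small self-contained illustration (the value 200 is made up purely for the
      example):

        import scala.math.log10

        val maxPartitions = 200             // hypothetical value, for illustration only
        val maxPartitionerPartitions = 200  // assumed equal to maxPartitions, per the argument above
        log10(maxPartitions) - log10(maxPartitionerPartitions) < 1   // 0.0 < 1, i.e. always true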

      So... isn't this whole block of code redundant?

      It might even be a bug: since the isEligiblePartitioner call on the left of the || would then
      always return true, the || short-circuits, and we never actually check that

      defaultNumPartitions <= hasMaxPartitioner.get.getNumPartitions
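
      A minimal, standalone illustration of that short-circuiting (the two defs below are hypothetical
      stand-ins, not Spark code):

        def alwaysEligible: Boolean = true   // stands in for isEligiblePartitioner always returning true
        def parallelismCheck: Boolean = {    // stands in for the defaultNumPartitions comparison
          println("comparing against defaultNumPartitions")
          false
        }

        alwaysEligible || parallelismCheck   // true; nothing is printed, the comparison never runs

      If that comparison is what enforces spark.default.parallelism here, it would never get the
      chance to run.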


          People

            Assignee: Unassigned
            Reporter: Chris Bedford (chris@buildlackey.com)
            Votes: 0
            Watchers: 2
