Details
Improvement
Status: Resolved
Minor
Resolution: Fixed
2.1.1
None
Description
In rdd.py, repartitionAndSortWithinPartitions is defined as follows:
def repartitionAndSortWithinPartitions(self, numPartitions=None, partitionFunc=portable_hash, ascending=True, keyfunc=lambda x: x):
The documentation includes the following sample script:
>>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)])
>>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, 2)
The third argument (ascending) is expected to be a boolean, but the sample passes the integer 2. I think the following script would be better:
>>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)])
>>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, True)
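
To illustrate why the original sample still runs even though it passes 2, here is a rough plain-Python sketch. It is not the PySpark implementation: sort_within_partitions is a hypothetical stand-in, and it assumes the sort direction is derived from the truthiness of ascending (for example via reverse=(not ascending)). Under that assumption any non-zero integer behaves like True, so the change is about readability of the example rather than behavior.

```python
def sort_within_partitions(pairs, num_partitions, partition_func,
                           ascending=True, keyfunc=lambda x: x):
    """Hypothetical stand-in: bucket (key, value) pairs, then sort each bucket by key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[partition_func(key) % num_partitions].append((key, value))
    # Assumption: the direction depends only on the truthiness of `ascending`.
    return [
        sorted(part, key=lambda kv: keyfunc(kv[0]), reverse=(not ascending))
        for part in partitions
    ]


data = [(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)]

with_int = sort_within_partitions(data, 2, lambda x: x % 2, 2)      # as in the current docs
with_bool = sort_within_partitions(data, 2, lambda x: x % 2, True)  # as proposed here

print(with_int == with_bool)  # True: 2 is truthy, so both sort in ascending order
print(with_bool)
```

Passing the flag by keyword (ascending=True) would make the intent of the sample even more explicit.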