Description
Here's a proposal for supporting window functions in the DataFrame DSL:
1. Add an over function to Column:
class Column {
  ...
  def over(window: Window): Column
  ...
}
2. Window:
object Window {
  def partitionBy(...): Window
  def orderBy(...): Window

  object Frame {
    def unbounded: Frame
    def currentRow: Frame  // used by the example in this proposal
    def preceding(n: Long): Frame
    def following(n: Long): Frame
  }

  class Frame
}

class Window {
  def orderBy(...): Window
  def rowsBetween(Frame, Frame): Window
  def rangeBetween(Frame, Frame): Window  // maybe add this later
}
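To make the shape of the builder concrete, here is a minimal, Spark-free sketch of how the Window and Frame types above could be modelled as immutable values. The representation (a sealed trait plus case classes) and the field names partitionCols, orderCols, and rows are assumptions made for illustration and are not part of the proposal; currentRow is included because the usage example in this proposal references Frame.currentRow.

```scala
// Hypothetical model of the proposed builder. Method names mirror the
// proposal; the case-class representation is an assumption.
sealed trait Frame

object Frame {
  case object Unbounded extends Frame
  case object CurrentRow extends Frame
  final case class Preceding(n: Long) extends Frame
  final case class Following(n: Long) extends Frame

  def unbounded: Frame = Unbounded
  def currentRow: Frame = CurrentRow
  def preceding(n: Long): Frame = Preceding(n)
  def following(n: Long): Frame = Following(n)
}

// Each builder call returns a new Window, so a spec can be shared safely.
final case class Window(
    partitionCols: Seq[String] = Nil,
    orderCols: Seq[String] = Nil,
    rows: Option[(Frame, Frame)] = None) {

  def orderBy(cols: String*): Window = copy(orderCols = cols.toSeq)

  def rowsBetween(start: Frame, end: Frame): Window =
    copy(rows = Some((start, end)))
}

object Window {
  def partitionBy(cols: String*): Window = Window(partitionCols = cols.toSeq)
  def orderBy(cols: String*): Window = Window(orderCols = cols.toSeq)
}
```

With this model, Window.partitionBy("country").orderBy("date").rowsBetween(Frame.preceding(50), Frame.following(10)) yields a plain value describing the window; the column names "country" and "date" are placeholders.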
Here's an example of how to use it:
df.select(
  avg("age").over(Window.partitionBy("..", "..").orderBy("..", "..")
    .rowsBetween(Frame.unbounded, Frame.currentRow))
)

df.select(
  avg("age").over(Window.partitionBy("..", "..").orderBy("..", "..")
    .rowsBetween(Frame.preceding(50), Frame.following(10)))
)
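The two frames in the example behave quite differently: the first is a running aggregate from the start of the partition to the current row, the second a sliding window around the current row. The sketch below illustrates these row-based frame semantics on a plain Scala collection standing in for one already-ordered partition. It is not Spark code; the object name RowsFrameSketch, the helper avgOverRows, and the use of Option[Int] for "unbounded" are all hypothetical.

```scala
object RowsFrameSketch {
  // Average over a ROWS frame for every row of one ordered partition.
  // `preceding` / `following` are row offsets relative to the current row;
  // None means the frame is unbounded on that side.
  def avgOverRows(
      values: Vector[Double],
      preceding: Option[Int],
      following: Option[Int]): Vector[Double] =
    values.indices.map { i =>
      // Clamp the frame to the partition boundaries.
      val lo = preceding.map(p => math.max(0, i - p)).getOrElse(0)
      val hi = following
        .map(f => math.min(values.length - 1, i + f))
        .getOrElse(values.length - 1)
      val frame = values.slice(lo, hi + 1)
      frame.sum / frame.length
    }.toVector

  def main(args: Array[String]): Unit = {
    val ages = Vector(10.0, 20.0, 30.0, 40.0)
    // Unbounded preceding to current row: running average.
    println(avgOverRows(ages, preceding = None, following = Some(0)))
    // One row before to one row after: sliding average.
    println(avgOverRows(ages, preceding = Some(1), following = Some(1)))
  }
}
```

For the four sample values, the running average is 10, 15, 20, 25 and the sliding average is 15, 20, 30, 35; frames near the partition edges simply contain fewer rows.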
Issue Links
- blocks
  - SPARK-7822 Window function support in Python DataFrame DSL (Resolved)
- is blocked by
  - SPARK-1442 Add Window function support (Resolved)
- is related to
  - SPARK-7247 Add Pandas' shift method to the Dataframe API (Closed)