
Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: SQL
    • Sprint: Spark 1.5 doc/QA sprint

    Description

      Here's a proposal for supporting window functions in the DataFrame DSL:

      1. Add an over function to Column:

      class Column {
        ...
        def over(window: Window): Column
        ...
      }
      

      2. Window:

      object Window {
        def partitionBy(...): Window
        def orderBy(...): Window
      
        object Frame {
          def unbounded: Frame
          def currentRow: Frame
          def preceding(n: Long): Frame
          def following(n: Long): Frame
        }
      
        class Frame
      }
      
      class Window {
        def orderBy(...): Window
        def rowsBetween(Frame, Frame): Window
        def rangeBetween(Frame, Frame): Window  // maybe add this later
      }
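
      To make the builder shape concrete, here is a minimal plain-Scala sketch of the signatures above. It is only a model, not Spark's implementation: it assumes column names are passed as strings, keeps Frame at the top level rather than nested inside Window, and the wrapper object WindowSketch, the field names, and the case objects are hypothetical names chosen for illustration.

```scala
object WindowSketch {
  // Frame boundaries as a small closed hierarchy.
  sealed trait Frame
  object Frame {
    case object Unbounded extends Frame
    case object CurrentRow extends Frame
    final case class Preceding(n: Long) extends Frame
    final case class Following(n: Long) extends Frame

    def unbounded: Frame = Unbounded
    def currentRow: Frame = CurrentRow
    def preceding(n: Long): Frame = Preceding(n)
    def following(n: Long): Frame = Following(n)
  }

  // Immutable window specification: each builder call returns a new copy.
  final case class Window(
      partitionCols: Seq[String] = Nil,
      orderCols: Seq[String] = Nil,
      rowFrame: Option[(Frame, Frame)] = None) {
    def orderBy(cols: String*): Window = copy(orderCols = cols)
    def rowsBetween(start: Frame, end: Frame): Window =
      copy(rowFrame = Some((start, end)))
  }

  // Entry points mirroring the proposed `object Window`.
  object Window {
    def partitionBy(cols: String*): Window = Window(partitionCols = cols)
    def orderBy(cols: String*): Window = Window(orderCols = cols)
  }
}
```

      With this model, WindowSketch.Window.partitionBy("dept").orderBy("age").rowsBetween(Frame.unbounded, Frame.currentRow) yields a value that records the partition columns, the ordering columns, and the frame bounds ("dept" is a made-up column name). Returning a new specification from each call is what lets the calls chain.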
      

      Here's an example of how to use it:

      df.select(
        avg("age").over(Window.partitionBy("..", "..").orderBy("..", "..")
          .rowsBetween(Frame.unbounded, Frame.currentRow))
      )

      df.select(
        avg("age").over(Window.partitionBy("..", "..").orderBy("..", "..")
          .rowsBetween(Frame.preceding(50), Frame.following(10)))
      )
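
      To illustrate what the two row frames above would compute, here is a small plain-Scala sketch that averages over a row frame within a single partition that is already sorted. It is an illustration of the intended semantics only: the object RowFrameSketch and the function framedAvg are hypothetical, offsets are assumed non-negative, None stands in for an unbounded side, and NULL handling is ignored.

```scala
object RowFrameSketch {
  // For each row i, average the values in rows [i - preceding, i + following],
  // clipped to the partition bounds. None means unbounded on that side.
  def framedAvg(
      values: Vector[Double],
      preceding: Option[Int],
      following: Option[Int]): Vector[Double] =
    values.indices.toVector.map { i =>
      val lo = preceding.fold(0)(p => math.max(0, i - p))
      val hi = following.fold(values.length - 1)(f => math.min(values.length - 1, i + f))
      val frame = values.slice(lo, hi + 1)
      frame.sum / frame.length
    }
}

val ages = Vector(20.0, 30.0, 40.0, 50.0)

// Like rowsBetween(Frame.unbounded, Frame.currentRow): a running average.
RowFrameSketch.framedAvg(ages, None, Some(0))    // Vector(20.0, 25.0, 30.0, 35.0)

// Like rowsBetween(Frame.preceding(1), Frame.following(1)): a moving average.
RowFrameSketch.framedAvg(ages, Some(1), Some(1)) // Vector(25.0, 30.0, 40.0, 45.0)
```

      Rows near the edges of a partition simply see a shorter frame, which is why the first and last moving averages cover two rows instead of three.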
      

            People

              Assignee: Cheng Hao (chenghao)
              Reporter: Reynold Xin (rxin)
              Votes: 2
              Watchers: 9
