Spark / SPARK-9819

reduceBy(KeyAnd)Window should specify which is the accumulator argument in invReduceFunc


Details

    Description

      reduceByWindow has an optional invReduceFunc argument which allows the reduction to be performed incrementally.

      The incremental reduction performed in ReducedWindowedDStream depends only on reduceFunc being associative (as shown by the reduce applied to oldValues); it does not require reduceFunc or invReduceFunc to be commutative.

      In particular, if the inverse reduction is subtraction, which is neither commutative nor associative (e.g. what you're computing is a running sum), it's necessary to know that the intermediate result (to be subtracted from) is the first argument of invReduceFunc and that the second argument is the old value to subtract.

      It's only in the commutative case that we don't care which is which.

      The Scaladoc for the various overloads of reduceByWindow should let the user know which is the accumulator, and which is the old value. A concise, unambiguous way to state this is to write an inversion law in the Scaladoc:

      invReduceFunc(reduceFunc(x, y), y) = x
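      To illustrate why the argument order matters, here is a minimal plain-Scala sketch of a windowed running sum. It does not use Spark: `InvReduceOrder`, `slidingIncremental` and `slidingDirect` are hypothetical helpers written for this example, and the incremental update only mimics the order described above (accumulator first, old value second), not the actual ReducedWindowedDStream code.

```scala
object InvReduceOrder {
  // Hypothetical helper, not part of Spark: recomputes every window from scratch.
  def slidingDirect(batches: Vector[Int], size: Int)(
      reduceFunc: (Int, Int) => Int): Vector[Int] =
    batches.sliding(size).map(_.reduce(reduceFunc)).toVector

  // Hypothetical helper, not part of Spark: updates the previous window's result
  // incrementally. The accumulator is passed as the FIRST argument of
  // invReduceFunc and the value leaving the window as the SECOND.
  def slidingIncremental(batches: Vector[Int], size: Int)(
      reduceFunc: (Int, Int) => Int,
      invReduceFunc: (Int, Int) => Int): Vector[Int] = {
    require(size > 0 && batches.length >= size, "need at least one full window")
    val first = batches.take(size).reduce(reduceFunc)
    (size until batches.length).scanLeft(first) { (acc, i) =>
      val withoutOld = invReduceFunc(acc, batches(i - size)) // drop the old value
      reduceFunc(withoutOld, batches(i))                     // add the new value
    }.toVector
  }

  def main(args: Array[String]): Unit = {
    val batches = Vector(1, 2, 3, 4, 5)
    // Correct order: accumulator - oldValue, satisfying the inversion law.
    println(slidingIncremental(batches, 3)(_ + _, (acc, old) => acc - old))
    // Swapped order: oldValue - accumulator, which violates the inversion law.
    println(slidingIncremental(batches, 3)(_ + _, (acc, old) => old - acc))
    // Reference result computed without any inverse function.
    println(slidingDirect(batches, 3)(_ + _))
  }
}
```

      With subtraction written as `acc - old`, the incremental result agrees with the direct recomputation; with the arguments swapped it does not, which is the ambiguity the Scaladoc should rule out.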

          People

            huitseeker François Garillot