Beam / BEAM-13133

sample() imposes partitioning by index unnecessarily

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.35.0
    • Component/s: dsl-dataframe
    • Labels: None

    Description

      I noticed that sample() requires data to be repartitioned when it's used at the beginning of a series of dataframe commands. In practice we should be able to sample within arbitrary partitions before combining the partitions to produce the final result.

      It looks like the root cause is that our sample expressions require partitioning by index, rather than arbitrary partitioning.
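
To illustrate the idea of sampling within arbitrary partitions and then combining, here is a minimal sketch in plain Python. It is not the Beam implementation or the fix that was merged; it just shows one technique with this property: tag every row with a random key, keep the n smallest keys in each partition, then keep the n smallest keys overall. The function names (`sample_partition`, `combine_samples`, `partitioned_sample`) and the list-of-lists representation of partitions are assumptions made for the example.

```python
import heapq
import random


def sample_partition(rows, n, rng):
    """Tag each row with a random key; keep the n rows with the smallest keys."""
    keyed = ((rng.random(), row) for row in rows)
    return heapq.nsmallest(n, keyed, key=lambda pair: pair[0])


def combine_samples(partials, n):
    """Merge per-partition candidates and keep the n smallest keys overall."""
    merged = [pair for partial in partials for pair in partial]
    smallest = heapq.nsmallest(n, merged, key=lambda pair: pair[0])
    return [row for _, row in smallest]


def partitioned_sample(partitions, n, seed=None):
    """Sample n rows without replacement, never regrouping rows by index."""
    rng = random.Random(seed)
    partials = [sample_partition(rows, n, rng) for rows in partitions]
    return combine_samples(partials, n)


# Hypothetical input: rows split across partitions in an arbitrary way.
partitions = [[0, 1, 2, 3], [4, 5], [6, 7, 8, 9, 10]]
result = partitioned_sample(partitions, n=3, seed=42)
print(result)
```

Because the n rows with the globally smallest keys are always among the per-partition candidates, the result does not depend on how rows were assigned to partitions, so no repartitioning by index is needed before sampling.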


            People

              Assignee: bhulette Brian Hulette
              Reporter: bhulette Brian Hulette

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2h 20m