BEAM-10462

org.apache.beam.sdk.transforms corrupt data when a value is Double.NaN

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P1
    • Resolution: Fixed
    • Affects Version/s: 0.2.0-incubating, 2.22.0
    • Fix Version/s: 2.24.0
    • Component/s: sdk-java-core

    Description

      When the PCollection passed into Min or Max contains a NaN value, we get an effectively random value back, because the result depends on the order in which the CombineFn compares the elements. Per the SQL standard, we should always get NaN back. I'm going to add a special case to get the right answer.

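      It looks like we switched from Double.compare to the `>=` operator in https://github.com/apache/beam/commit/21a5b44c3b541ba6c89df5649afe00412df73d10, which introduced a data corruption bug. Every `>=` comparison involving NaN evaluates to false, whereas Double.compare orders NaN above every other value.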
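To make the order dependence concrete, here is a minimal sketch. It is not Beam's actual code: `NanMaxDemo`, `maxWithOperator`, `maxWithCompare`, and `fold` are hypothetical stand-ins for a max combiner written with `>=` and one written with Double.compare.

```java
import java.util.function.DoubleBinaryOperator;

class NanMaxDemo {
  // Hypothetical stand-in for a max combiner built on the >= operator.
  static double maxWithOperator(double a, double b) {
    return a >= b ? a : b;
  }

  // Hypothetical stand-in for a max combiner built on Double.compare.
  static double maxWithCompare(double a, double b) {
    return Double.compare(a, b) >= 0 ? a : b;
  }

  // Folds the values left to right with the given combiner.
  static double fold(DoubleBinaryOperator op, double... values) {
    double acc = values[0];
    for (int i = 1; i < values.length; i++) {
      acc = op.applyAsDouble(acc, values[i]);
    }
    return acc;
  }

  public static void main(String[] args) {
    // Same three values, three different orders, three different "maxima".
    System.out.println(fold(NanMaxDemo::maxWithOperator, Double.NaN, 0.9, 0.1)); // 0.9
    System.out.println(fold(NanMaxDemo::maxWithOperator, 0.9, Double.NaN, 0.1)); // 0.1
    System.out.println(fold(NanMaxDemo::maxWithOperator, 0.9, 0.1, Double.NaN)); // NaN

    // With Double.compare, NaN wins regardless of position.
    System.out.println(fold(NanMaxDemo::maxWithCompare, 0.9, Double.NaN, 0.1)); // NaN
  }
}
```

With the `>=` version, the answer depends only on where the NaN happens to sit in the input, which is why the result looks random in a distributed combine.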

      A test case demonstrating this issue:

        import org.junit.Assert;
        import org.junit.Test;

        public class DoubleNaNTest {
          @Test
          public void testDouble() {
            // Every >= comparison involving NaN is false, whichever side NaN is on.
            Assert.assertFalse(Double.NaN >= 0.9);
            Assert.assertFalse(0.9 >= Double.NaN);
            Assert.assertFalse(Double.NaN >= Double.POSITIVE_INFINITY);
            Assert.assertFalse(Double.POSITIVE_INFINITY >= Double.NaN);
            // Double.compare orders NaN above every other value, including +Infinity.
            Assert.assertTrue(Double.compare(Double.NaN, 0.9) >= 0);
            Assert.assertFalse(Double.compare(0.9, Double.NaN) >= 0);
            Assert.assertTrue(Double.compare(Double.NaN, Double.POSITIVE_INFINITY) >= 0);
            Assert.assertFalse(Double.compare(Double.POSITIVE_INFINITY, Double.NaN) >= 0);
          }
        }
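
The special case proposed in the description might look roughly like the sketch below. This is an illustration under assumptions, not the actual Beam fix: `NanAwareMinMax`, `nanAwareMax`, and `nanAwareMin` are hypothetical names. The explicit check matters for Min in particular, because Double.compare orders NaN above every other value, so a compare-based minimum would never return it.

```java
class NanAwareMinMax {
  // Hypothetical helper: returns NaN if either input is NaN, otherwise the larger value.
  static double nanAwareMax(double a, double b) {
    if (Double.isNaN(a) || Double.isNaN(b)) {
      return Double.NaN;
    }
    return a >= b ? a : b;
  }

  // Hypothetical helper: returns NaN if either input is NaN, otherwise the smaller value.
  static double nanAwareMin(double a, double b) {
    if (Double.isNaN(a) || Double.isNaN(b)) {
      return Double.NaN;
    }
    return a <= b ? a : b;
  }
}
```

Because NaN is checked before any comparison, the result no longer depends on argument order. Java's built-in Math.max and Math.min also return NaN when either argument is NaN, so they may be a simpler alternative where they fit.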
      

            People

              Assignee: apilloud Andrew Pilloud
              Reporter: apilloud Andrew Pilloud
              Votes: 0
              Watchers: 1

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1.5h