
Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.4.0
    • Component/s: SQL
    • Labels: None

    Description

      In the Scala 2.12 build, a few Hive tests still fail in a way that suggests mismatched schema inference. It's not clear whether this is the same issue as SPARK-25044. Examples:

      - SPARK-5775 read array from partitioned_parquet_with_key_and_complextypes *** FAILED ***
      Results do not match for query:
      Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
      Timezone Env: 
      
      == Parsed Logical Plan ==
      'Project ['arrayField, 'p]
      +- 'Filter ('p = 1)
         +- 'UnresolvedRelation `partitioned_parquet_with_key_and_complextypes`
      
      == Analyzed Logical Plan ==
      arrayField: array<int>, p: int
      Project [arrayField#82569, p#82570]
      +- Filter (p#82570 = 1)
         +- SubqueryAlias `default`.`partitioned_parquet_with_key_and_complextypes`
            +- Relation[intField#82566,stringField#82567,structField#82568,arrayField#82569,p#82570] parquet
      
      == Optimized Logical Plan ==
      Project [arrayField#82569, p#82570]
      +- Filter (isnotnull(p#82570) && (p#82570 = 1))
         +- Relation[intField#82566,stringField#82567,structField#82568,arrayField#82569,p#82570] parquet
      
      == Physical Plan ==
      *(1) Project [arrayField#82569, p#82570]
      +- *(1) FileScan parquet default.partitioned_parquet_with_key_and_complextypes[arrayField#82569,p#82570] Batched: false, Format: Parquet, Location: PrunedInMemoryFileIndex[file:/home/srowen/spark-2.12/sql/hive/target/tmp/spark-d8d87d74-33e7-4f22..., PartitionCount: 1, PartitionFilters: [isnotnull(p#82570), (p#82570 = 1)], PushedFilters: [], ReadSchema: struct<arrayField:array<int>>
      == Results ==
      !== Correct Answer - 10 ==   == Spark Answer - 10 ==
      !struct<>                    struct<arrayField:array<int>,p:int>
      ![Range 1 to 1,1]            [WrappedArray(1),1]
      ![Range 1 to 10,1]           [WrappedArray(1, 2),1]
      ![Range 1 to 2,1]            [WrappedArray(1, 2, 3),1]
      ![Range 1 to 3,1]            [WrappedArray(1, 2, 3, 4),1]
      ![Range 1 to 4,1]            [WrappedArray(1, 2, 3, 4, 5),1]
      ![Range 1 to 5,1]            [WrappedArray(1, 2, 3, 4, 5, 6),1]
      ![Range 1 to 6,1]            [WrappedArray(1, 2, 3, 4, 5, 6, 7),1]
      ![Range 1 to 7,1]            [WrappedArray(1, 2, 3, 4, 5, 6, 7, 8),1]
      ![Range 1 to 8,1]            [WrappedArray(1, 2, 3, 4, 5, 6, 7, 8, 9),1]
      ![Range 1 to 9,1]            [WrappedArray(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),1] (QueryTest.scala:163)
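      The diff above shows the expected rows rendering as Range values while Spark returns the materialized elements as a WrappedArray. The following is a minimal sketch (no Spark required, and a hypothetical illustration rather than the confirmed root cause) of how two such sequences can be equal element-wise yet render differently:

```scala
// Minimal sketch: a Range and its materialized elements compare equal as
// Seqs, but their rendered (toString) forms differ, which matches the
// shape of the mismatch in the diff above.
object RangeVsMaterialized {
  def main(args: Array[String]): Unit = {
    val expected: Seq[Int] = 1 to 3       // renders like "Range 1 to 3" on Scala 2.12+
    val actual: Seq[Int] = Seq(1, 2, 3)   // renders like "List(1, 2, 3)"

    // Element-wise Seq equality holds...
    println(expected == actual)
    // ...but the rendered forms differ.
    println(expected.toString == actual.toString)
  }
}
```

      If the test's answer comparison goes through the rows' rendered form, as the "!" markers on every row suggest, this kind of difference is enough to fail the check even when the values match.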
      - SPARK-2693 udaf aggregates test *** FAILED ***
      Results do not match for query:
      Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
      Timezone Env: 
      
      == Parsed Logical Plan ==
      'GlobalLimit 1
      +- 'LocalLimit 1
         +- 'Project [unresolvedalias('percentile('key, 'array(1, 1)), None)]
            +- 'UnresolvedRelation `src`
      
      == Analyzed Logical Plan ==
      percentile(key, array(1, 1), 1): array<double>
      GlobalLimit 1
      +- LocalLimit 1
         +- Aggregate [percentile(key#205098, cast(array(1, 1) as array<double>), 1, 0, 0) AS percentile(key, array(1, 1), 1)#205101]
            +- SubqueryAlias `default`.`src`
               +- HiveTableRelation `default`.`src`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [key#205098, value#205099]
      
      == Optimized Logical Plan ==
      GlobalLimit 1
      +- LocalLimit 1
         +- Aggregate [percentile(key#205098, [1.0,1.0], 1, 0, 0) AS percentile(key, array(1, 1), 1)#205101]
            +- Project [key#205098]
               +- HiveTableRelation `default`.`src`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [key#205098, value#205099]
      
      == Physical Plan ==
      CollectLimit 1
      +- ObjectHashAggregate(keys=[], functions=[percentile(key#205098, [1.0,1.0], 1, 0, 0)], output=[percentile(key, array(1, 1), 1)#205101])
         +- Exchange SinglePartition
            +- ObjectHashAggregate(keys=[], functions=[partial_percentile(key#205098, [1.0,1.0], 1, 0, 0)], output=[buf#205104])
               +- Scan hive default.src [key#205098], HiveTableRelation `default`.`src`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [key#205098, value#205099]
      == Results ==
      !== Correct Answer - 1 ==                        == Spark Answer - 1 ==
      !struct<array(max(key), max(key)):array<int>>    struct<percentile(key, array(1, 1), 1):array<double>>
      ![WrappedArray(498, 498)]                        [WrappedArray(498.0, 498.0)] (QueryTest.scala:163)
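      Here the expected answer's schema (array<int>, from array(max(key), max(key))) differs from percentile's result type (array<double>), so the rows hold 498 vs 498.0. The following is a minimal sketch (no Spark required, a hypothetical illustration) of why that can fail a rendered-form comparison even though the numbers are equal:

```scala
// Minimal sketch: Int 498 and Double 498.0 compare equal under Scala's
// cooperative numeric equality, but their rendered (toString) forms
// differ, matching the 498 vs 498.0 mismatch in the diff above.
object IntVsDoubleArrays {
  def main(args: Array[String]): Unit = {
    val expected = Seq(498, 498)       // array<int>-shaped expected values
    val actual = Seq(498.0, 498.0)     // array<double>-shaped percentile result

    // Element-wise equality holds across Int/Double...
    println(expected == actual)
    // ...but the string forms do not match.
    println(expected.toString == actual.toString)
  }
}
```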

          People

            Assignee: Darcy Shen (sadhen)
            Reporter: Sean R. Owen (srowen)
