Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-26777

SQL worked in 2.3.2 and fails in 2.4.0

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Incomplete
    • 2.4.0
    • None
    • Spark Core
    • None

    Description

      Following SQL worked in Spark 2.3.2 and now fails on 2.4.0 (AWS EMR Spark)

      ------------ PySpark call below:

      spark.sql("select partition_year_utc,partition_month_utc,partition_day_utc \
      from datalake_reporting.copy_of_leads_notification \
      where partition_year_utc = (select max(partition_year_utc) from datalake_reporting.copy_of_leads_notification) \
      and partition_month_utc = \
      (select max(partition_month_utc) from datalake_reporting.copy_of_leads_notification as m \
      where \
      m.partition_year_utc = (select max(partition_year_utc) from datalake_reporting.copy_of_leads_notification)) \
      and partition_day_utc = (select max(d.partition_day_utc) from datalake_reporting.copy_of_leads_notification as d \
      where d.partition_month_utc = \
      (select max(m1.partition_month_utc) from datalake_reporting.copy_of_leads_notification as m1 \
      where m1.partition_year_utc = \
      (select max(y.partition_year_utc) from datalake_reporting.copy_of_leads_notification as y) \
      ) \
      ) \
      order by 1 desc, 2 desc, 3 desc limit 1 ").show(1,False)

      Error: (no need for data, this is syntax).
      py4j.protocol.Py4JJavaError: An error occurred while calling o1326.showString.
      : java.lang.UnsupportedOperationException: Cannot evaluate expression: scalar-subquery#4495 []

       

      Note: all 3 columns in query are Partitioned columns - see bottom of the schema)

       

      Hive EMR AWS Schema is:

       

      CREATE EXTERNAL TABLE `copy_of_leads_notification`(

      `message.environment.siteorigin` string, `dcpheader.dcploaddateutc` string, `message.id` int, `source.properties._country` string, `message.created` string, `dcpheader.generatedmessageid` string, `message.tags` bigint, `source.properties._enqueuedtimeutc` string, `source.properties._leadtype` string, `message.itemid` string, `message.prospect.postcode` string, `message.prospect.email` string, `message.referenceid` string, `message.item.year` string, `message.identifier` string, `dcpheader.dcploadmonthutc` string, `message.processed` string, `source.properties._tenant` string, `message.item.price` string, `message.subscription.confirmresponse` boolean, `message.itemtype` string, `message.prospect.lastname` string, `message.subscription.insurancequote` boolean, `source.exchangename` string, `message.prospect.identificationnumbers` bigint, `message.environment.ipaddress` string, `dcpheader.dcploaddayutc` string, `source.properties._itemtype` string, `source.properties._requesttype` string, `message.item.make` string, `message.prospect.firstname` string, `message.subscription.survey` boolean, `message.prospect.homephone` string, `message.extendedproperties` bigint, `message.subscription.financequote` boolean, `message.uniqueidentifier` string, `source.properties._id` string, `dcpheader.sourcemessageguid` string, `message.requesttype` string, `source.routingkey` string, `message.service` string, `message.item.model` string, `message.environment.pagesource` string, `source.source` string, `message.sellerid` string, `partition_date_utc` string, `message.selleridentifier` string, `message.subscription.newsletter` boolean, `dcpheader.dcploadyearutc` string, `message.leadtype` string, `message.history` bigint, `message.callconnect.calloutcome` string, `message.callconnect.datecreatedutc` string, `message.callconnect.callrecordingurl` string, `message.callconnect.transferoutcome` string, `message.callconnect.hiderecording` boolean, `message.callconnect.callstartutc` string, `message.callconnect.code` string, `message.callconnect.callduration` string, `message.fraudnetinfo` string, `message.callconnect.answernumber` string, `message.environment.sourcedevice` string, `message.comments` string, `message.fraudinfo.servervariables` bigint, `message.callconnect.servicenumber` string, `message.callconnect.callid` string, `message.callconnect.voicemailurl` string, `message.item.stocknumber` string, `message.callconnect.answerduration` string, `message.callconnect.callendutc` string, `message.item.series` string, `message.item.detailsurl` string, `message.item.pricetype` string, `message.item.description` string, `message.item.colour` string, `message.item.badge` string, `message.item.odometer` string, `message.environment.requestheader` string, `message.item.registrationnumber` string, `message.item.bodytype` string, `message.item.fueltype` string, `message.item.redbookcode` string, `message.item.spotid` string, `message.item.id` string, `message.item.transmission` string, `message.item.vin` string, `message.item.enginedescription` string, `message.prospect.mobilephone` string, `message.prospect.membertrackingid` string, `message.environment.username` string, `message.prospect.workphone` string, `message.environment.servername` string, `message.environment.sessionid` string, `message.tradein.type` string, `message.tradein.model` string, `message.tradein.year` string, `message.tradein.make` string, `message.tradein.kms` string, `message.fraudinfo.servertimestamp` string, `message.prospect.suburb` string, `message.callconnect.username` string, `message.callconnect.password` string, `message.status` string, `message.tradein.colour` string, `message.prospect.address` string, `message.prospect.state` string, `message.tradein.detailsurl` string, `message.prospect.faxnumber` string, `message.prospect.companyname` string, `message.prospect.title` string, `message.callconnect.calloutcometext` string, `message.prospect.preferredcontacttime` string, `message.fraudinfo.devicedata` string, `message.prospect.preferredcontactmethod` string, `message.assignment.assigned` string, `message.assignment.email` string, `message.assignment.name` string, `message.country` string, `message.financepackage.id` string, `message.financepackage.version` string)

      PARTITIONED BY ( `partition_year_utc` string, `partition_month_utc` string, `partition_day_utc` string, `job_run_guid` string)

      ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS

      INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'

      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'

      LOCATION 's3://datalake/yurib_test/leads_notification'

       

      Attachments

        Activity

          People

            Unassigned Unassigned
            yuri.budilov Yuri Budilov
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: