  1. Spark
  2. SPARK-34344

Add the ability to trace Spark SQL queries back from the application ID submitted on YARN

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.6.3, 2.3.0, 2.4.5
    • Fix Version/s: None
    • Component/s: Spark Shell, SQL
    • Labels: None

    Description

      We need the application ID from the resource manager to be mapped to the specific Spark SQL query that was executed under that application ID, so that the query can be traced back from the ID.

      For example, if I run a query using the Spark shell:

      spark.sql("select dt.d_year,item.i_brand_id brand_id,item.i_brand brand,sum(ss_ext_sales_price) sum_agg from date_dim dt,store_sales,item where dt.d_date_sk = store_sales.ss_sold_date_sk and store_sales.ss_item_sk = item.i_item_sk and item.i_manufact_id = 436 and dt.d_moy=12 group by dt.d_year,item.i_brand,item.i_brand_id order by dt.d_year,sum_agg desc,brand_id limit 100").show();

      When I look at the event logs or the history server, I don't see the query text anywhere, although the query plan is there. This makes it difficult to trace back which query was actually submitted when it has to be mapped to a specific application ID on YARN.
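As a possible interim workaround (not the feature requested here), the query text can be attached to the jobs it triggers and printed next to the application ID. The sketch below is meant to be pasted into the Spark shell. `runTraced` is a hypothetical helper name, not part of Spark; it relies only on `SparkContext.setJobDescription` and `SparkContext.applicationId`. It assumes that the job description is carried in the job-level properties, so it should show up on the Jobs page and in the job-start events of the event log, but this has not been verified against the affected versions listed above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper (not part of Spark): tags the jobs triggered by a SQL
// query with the query text and prints the application ID next to it.
def runTraced(spark: SparkSession, query: String): DataFrame = {
  val sc = spark.sparkContext
  // The description stays set on the current thread until it is changed,
  // so jobs started by a later action (e.g. show()) on this thread carry it.
  sc.setJobDescription(query)
  // applicationId is the ID assigned by the cluster manager (on YARN, the
  // application_... ID); print it so driver logs hold the mapping as well.
  println(s"applicationId=${sc.applicationId} query=$query")
  spark.sql(query)
}
```

In the shell this would be used as `runTraced(spark, "select ...").show()`. Very long query strings may be unwieldy as a description, so truncating them or logging them separately may be preferable.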

          People

            Assignee: Unassigned
            Reporter: Arpan Bhandari (arpan3189)
            Votes: 0
            Watchers: 3
