Spark / SPARK-47034

Join between cached temp tables results in missing entries

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.5.0
    • Fix Version/s: None
    • Component/s: Examples
    • Labels: None

    Description

      We create several temp tables (views) by loading several Delta tables and joining them.
      These views are used to calculate different metrics, and each metric requires a different set of views. Some of the more popular views are cached for better performance.

      We noticed that after upgrading from Spark 3.4.2 to Spark 3.5.0, some of the joins started to fail.

      We can reproduce a case where we have 2 data frames (views). These are not the real names/values we use; they are just for the example.

      1. users, with the columns user_id, campaign_id, user_name.
        We make sure it has a single entry:
        '111111', '22222', 'Jhon Doe'
      2. actions, with the columns user_id, campaign_id, action_id, action count.
        We make sure it has a single entry:
        '111111', '22222', 'clicks', 5

       

      1. The users view can be filtered on user_id = '111111' and/or campaign_id = '22222', and it returns the existing single row.
      2. The actions view can be filtered on user_id = '111111' and/or campaign_id = '22222', and it returns the existing single row.
      3. users and actions can be inner joined on user_id OR campaign_id, and the join succeeds.
      4. users and actions cannot be inner joined on user_id AND campaign_id. The join returns no rows.
         1. If we write both views to S3 and read them back into new data frames, the join works.
         2. If we disable AQE, the join works.
         3. Running checkpoint on the views does not make join #4 work.
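
The checks listed above could be scripted roughly as follows. This is a minimal PySpark sketch, not the reporter's code. It builds the two views from in-memory rows instead of Delta tables, so it may well not trigger the problem; it only shows the shape of the comparison (each join key alone versus both keys, with AQE on and off). The helper names `register_cached_views`, `join_count` and `main`, and the column name `action_count`, are assumptions made for illustration.

```python
# Single-row sample data copied from the report (illustrative values).
USERS = [("111111", "22222", "Jhon Doe")]
USERS_COLS = ["user_id", "campaign_id", "user_name"]
ACTIONS = [("111111", "22222", "clicks", 5)]
# "action_count" is an assumed spelling of the report's "action count" column.
ACTIONS_COLS = ["user_id", "campaign_id", "action_id", "action_count"]

# Join predicates to compare: each key alone, then both keys together.
JOIN_PREDICATES = {
    "user_id only": "u.user_id = a.user_id",
    "campaign_id only": "u.campaign_id = a.campaign_id",
    "user_id AND campaign_id": (
        "u.user_id = a.user_id AND u.campaign_id = a.campaign_id"
    ),
}


def register_cached_views(spark):
    """Register the two single-row temp views and mark them as cached."""
    spark.createDataFrame(USERS, USERS_COLS).createOrReplaceTempView("users")
    spark.createDataFrame(ACTIONS, ACTIONS_COLS).createOrReplaceTempView("actions")
    spark.catalog.cacheTable("users")
    spark.catalog.cacheTable("actions")


def join_count(spark, predicate):
    """Count the rows of an inner join between the views on `predicate`."""
    query = f"SELECT * FROM users u INNER JOIN actions a ON {predicate}"
    return spark.sql(query).count()


def main():
    # Imported here so the definitions above load without a Spark install.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[2]")
        .appName("SPARK-47034-sketch")
        .getOrCreate()
    )
    register_cached_views(spark)
    # Toggle adaptive query execution (AQE) and compare the join results.
    for aqe in ("true", "false"):
        spark.conf.set("spark.sql.adaptive.enabled", aqe)
        for label, predicate in JOIN_PREDICATES.items():
            print(f"AQE={aqe} join on {label}: {join_count(spark, predicate)} row(s)")
    spark.stop()
```

Calling `main()` in an environment with PySpark installed prints one row count per join predicate and AQE setting. A count of 0 for the two-key join only while AQE is enabled would match the behaviour described in item 4.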

          People

            Assignee: Unassigned
            Reporter: shurik mermelshtein (shurikm)
            Votes: 0
            Watchers: 2