Spark / SPARK-47217

De-duplication of relations in joins can result in plan resolution failure

Details

    Description

      In some flavours of nested joins where the same relation appears more than once, projected columns passed to the DataFrame.select API in the form df.column can cause plan resolution to fail, because the attribute is never resolved.

      A scenario in which this happens is:

                     
                     Project ( DataFrame A.column("col-a") )
                                      |
                                    Join2
                                   /     \
                               Join1     DataFrame A
                              /     \
                    DataFrame A     DataFrame B
      
      

      In such cases, if the right leg of Join2 (DataFrame A) gets re-aliased by the de-duplication of relations, and the Project uses a Column obtained from DataFrame A, the column's exprId will not match any attribute of the re-aliased right leg of Join2, causing a resolution failure.
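
      The exprId mismatch described above can be illustrated with a small toy model. This is only a sketch and not Spark's analyzer code: `Attr`, `relation`, `dedup_right` and `resolves` are hypothetical names invented for the illustration, and the model assumes that de-duplication simply gives fresh ids to right-side attributes whose ids already appear on the left side of a join.

```python
from dataclasses import dataclass
from itertools import count

# Hypothetical stand-in for a global expression-id generator.
_ids = count(1)


@dataclass(frozen=True)
class Attr:
    """Toy attribute: a column name plus a unique expression id."""
    name: str
    expr_id: int


def relation(*names):
    """Build a toy relation whose columns each get a fresh expr_id."""
    return [Attr(n, next(_ids)) for n in names]


def dedup_right(left, right):
    """Re-alias right-side attributes whose expr_id already occurs on the left."""
    left_ids = {a.expr_id for a in left}
    return [Attr(a.name, next(_ids)) if a.expr_id in left_ids else a
            for a in right]


def resolves(col, output):
    """A column resolves against an output only if some attribute has the same expr_id."""
    return any(a.expr_id == col.expr_id for a in output)


df_a = relation("col-a")
df_b = relation("col-b")
col_a = df_a[0]  # analogous to taking the column "col-a" from DataFrame A

# Join1: DataFrame A joined with DataFrame B, no id conflict.
join1 = df_a + dedup_right(df_a, df_b)

# Join2: the right leg is DataFrame A again, so its attributes get re-aliased.
right_leg = dedup_right(join1, df_a)

print(resolves(col_a, df_a))       # True: ids match the original relation
print(resolves(col_a, right_leg))  # False: the right leg now carries a fresh id
```

      In this model the column taken from DataFrame A keeps its original id, while the right leg of Join2 exposes "col-a" under a new id, so a lookup against that leg finds nothing. Whether this surfaces as a failure in Spark depends on the actual plan shape. The report only states that some flavours of nested joins are affected.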

            People

              Assignee: Unassigned
              Reporter: Asif (ashahid7)