Spark / SPARK-26864

Query may return incorrect results when a Python UDF is used as a join condition and the UDF uses attributes from both legs of a left semi join.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.4.1, 3.0.0
    • Component/s: SQL

    Description

      SPARK-25314 added support for a Python UDF that refers to attributes from both legs of a join condition. It does so by rewriting
      the plan, converting an inner join or left semi join into a filter over a cross join. For a left semi join, this rewrite can
      return incorrect results when more than one row on the right leg satisfies the join condition for the same row on the left leg:
      a left semi join returns each matching left row once, whereas a filter over a cross join returns it once per matching right row.
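
The difference can be illustrated without Spark. The following is a minimal plain-Python sketch, not the Spark implementation: the rows, the `udf_condition` predicate, and both helper functions are hypothetical stand-ins chosen for illustration. It compares left semi join semantics with a filter over a cross join when two right rows match the same left row.

```python
# Hypothetical data: two right rows share key 1 with the first left row.
left = [(1, "a"), (2, "b")]
right = [(1, "x"), (1, "y"), (3, "z")]


def udf_condition(l, r):
    """Stand-in for a Python UDF that reads attributes from both legs."""
    return l[0] == r[0]


def left_semi_join(left_rows, right_rows, cond):
    # A left row is kept at most once, if any right row satisfies cond.
    return [l for l in left_rows if any(cond(l, r) for r in right_rows)]


def filter_over_cross_join(left_rows, right_rows, cond):
    # Cross join, filter, then keep only the left columns:
    # one output row per matching (left, right) pair.
    return [l for l in left_rows for r in right_rows if cond(l, r)]


semi = left_semi_join(left, right, udf_condition)
rewritten = filter_over_cross_join(left, right, udf_condition)

print(semi)       # [(1, 'a')]
print(rewritten)  # [(1, 'a'), (1, 'a')]
```

In this sketch the rewritten form returns the left row `(1, 'a')` twice, which is the kind of duplicate the description refers to.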

          People

            Assignee: Dilip Biswal (dkbiswal)
            Reporter: Dilip Biswal (dkbiswal)