Details
- Bug
- Status: Resolved
- Major
- Resolution: Duplicate
- 2.3.1
- None
- None
Description
- Execute the following code:
 {
   val events = Seq(("a", 0)).toDF("id", "ts")
   val dim = Seq(("a", 0, 24), ("a", 24, 48)).toDF("id", "start", "end")

   val dimOriginal = dim.as("dim")
   val dimShifted = dim.as("dimShifted")

   val r = events
     .join(dimOriginal, "id")
     .where(dimOriginal("start") <= $"ts" && $"ts" < dimOriginal("end"))

   val r2 = r
     .join(dimShifted, "id")
     .where(dimShifted("start") <= $"ts" + 24 && $"ts" + 24 < dimShifted("end"))

   r2.show()
   r2.explain(true)
 }
- Expected effect:
- One row is shown
- Logical plan shows two independent joins with "dim" and "dimShifted"
- Observed effect:
- No rows are printed.
- Logical plan shows two filters are applied:
- 'Filter ((start#17 <= ('ts + 24)) && (('ts + 24) < end#18))'
- Filter ((start#17 <= ts#6) && (ts#6 < end#18))
- Both of these filters refer to the same start#17 and end#18 columns, so they are applied to the same dataframe, not to two different ones.
- It appears that dimShifted("start") is resolved to be identical to dimOriginal("start")
- I get the desired effect if I replace the second where with
.where($"dimShifted.start" <= $"ts" + 24 && $"ts" + 24 < $"dimShifted.end")
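
For reference, the workaround above can be put together as a standalone program. This is only a sketch: the object name, application name, and local master setting are assumptions added for illustration, and the only change from the original code is that the second where refers to the columns by their alias-qualified names instead of going through dimShifted("...").

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object; not part of the original report.
object AliasedSelfJoinWorkaround {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (app name and master are assumptions).
    val spark = SparkSession.builder()
      .appName("aliased-self-join-workaround")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // enables toDF and the $"..." column syntax

    val events = Seq(("a", 0)).toDF("id", "ts")
    val dim = Seq(("a", 0, 24), ("a", 24, 48)).toDF("id", "start", "end")

    val dimOriginal = dim.as("dim")
    val dimShifted = dim.as("dimShifted")

    // First join and filter are unchanged from the report.
    val r = events
      .join(dimOriginal, "id")
      .where(dimOriginal("start") <= $"ts" && $"ts" < dimOriginal("end"))

    // Second filter names the columns through the "dimShifted" alias,
    // so they are resolved against the joined plan by qualifier.
    val r2 = r
      .join(dimShifted, "id")
      .where($"dimShifted.start" <= $"ts" + 24 && $"ts" + 24 < $"dimShifted.end")

    r2.show()
    r2.explain(true)

    spark.stop()
  }
}
```

In the explain output, the thing to check is whether the two filters now reference different attribute IDs for start and end.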