Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Description
It is fairly easy for users to shoot themselves in the foot by running cartesian joins. Often they are not even aware of which join method was chosen. This happened to me a few times in the last few weeks.
It would be a good idea to disable cartesian joins by default and require users to enable them explicitly, via a "crossJoin" method or, in SQL, "cross join". However, this might be too large a scope for 2.0 given the timing. As a small, quick fix, we can add a single config option (spark.sql.join.enableCartesian) that controls this behavior. In the future we can implement the fine-grained control.
Note that the error message should be friendly and say "Set spark.sql.join.enableCartesian to true to turn on cartesian joins."
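To make the proposal concrete, here is a minimal sketch of the config-gated check in plain Python. It is a toy model, not Spark's planner code: check_join, CartesianJoinDisabledError, and the dict-based conf are hypothetical names introduced for illustration, and treating "no join condition" as the only cartesian case is a simplifying assumption. The config key and the error text are taken from this ticket; the default of "false" is assumed from the proposal to disable cartesian joins by default.

```python
# Hypothetical model of the proposed check; not Spark's actual implementation.
CARTESIAN_KEY = "spark.sql.join.enableCartesian"  # config name proposed in this ticket


class CartesianJoinDisabledError(Exception):
    """Raised when a cartesian join is requested while the option is off."""


def check_join(conf, join_condition):
    """Classify a join, raising if it is cartesian and the option is disabled.

    conf: dict mapping config keys to string values (stand-in for the SQL conf).
    join_condition: the join predicate, or None when the join has no condition.
    """
    # Simplifying assumption: a join is cartesian only when it has no condition.
    if join_condition is not None:
        return "non-cartesian"

    # Assumed default: the option is "false", so cartesian joins are off.
    enabled = conf.get(CARTESIAN_KEY, "false").strip().lower() == "true"
    if not enabled:
        # Friendly message that tells the user exactly which option to set.
        raise CartesianJoinDisabledError(
            f"Set {CARTESIAN_KEY} to true to turn on cartesian joins."
        )
    return "cartesian"
```

With an empty conf, check_join({}, None) raises the error with the friendly message, while check_join({CARTESIAN_KEY: "true"}, None) is allowed through. Joins that carry a condition are never affected by the option in this sketch.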