SQL: Join, Semi Join, Left Join and ON vs WHERE clauses
The relational algebra for a JOIN is the following:
\[\pi_{post.date} \sigma_{post.author = author.id \land author.name = <...>} (post \times author)
\]
that is, on its bare minimum: make the product of two tables, make it consistent, and select the desired rows according to some extra condition.
Misc stuff: “or” statements can be translated using UNION: select the y such that p or q becomes select the y such that p UNION select the y such that q. Now back to the joins.
A Natural JOIN is simply an equi-join where predicates arise implicitly (if they can)
\[R \bowtie S = \sigma_{R.A1 = S.A1 \land … \land R.Am = S.Am} (R \times S)
\]
A Semi Join is defined as
\[R \bowtie S = R \bowtie \pi_{Attr(R) \cap Attr(S)} (S)
\]
and it means: get me all the R which appear in S.
In full details, it means: join R with the projection of the attributes that appear in both R and S (presumably a pair of key/fk). In plain SQL it is equivalent to
SELECT DISTINCT * FROM account NATURAL JOIN (SELECT no FROM movement) AS derived ORDER BY no
or, in a more meaningful query,
SELECT * FROM account WHERE no IN (SELECT no FROM movement)
Note that a semi-join doesn’t really “join” anything; it just (possibly) “cuts” the first table.
A Left Outer Join returns every row in R even if they don’t join; entries that do not join are filled with NULLs.
Now you can also fully appreciate the difference between
t1 JOIN t2 ON x AND y
vs
t1 JOIN t2 ON x WHERE y = <conditions>
In the first case the join doesn’t happen unless both conditions are met at the same time. In other words:
-> if I have a LEFT JOIN and y does not hold, I will have a row of the sort ID | NULL: an entry for the left row which sorry, it didn’t join;
-> whereas in the second case I will not have such entry, cause the rows are joined first and discarded afterwards by the where (unless the condition is IS NULL, but that would be pretty awkward). A visual reminder:
ON WHERE (1, a, a) (1, a, a) (2, a, NULL)