SQL: Join, Semi Join, Left Join and ON vs WHERE clauses

The relational algebra for a JOIN is the following:

\pi_{post.date} \sigma_{post.author = author.id \land author.name = <...>} (post \times author)

that is, on its bare minimum: make the product of two tables, make it consistent, and select the desired rows according to some extra condition.

Misc stuff: “or” statements can be translated using UNION: select the y such that p or q becomes select the y such that p UNION select the y such that q. Now back to the joins.

A Natural JOIN is simply an equi-join where predicates arise implicitly (if they can)

R \bowtie S = \sigma_{R.A1 = S.A1 \land … \land R.Am = S.Am} (R \times S)

A Semi Join is defined as

R \bowtie S = R \bowtie \pi_{Attr(R) \cap Attr(S)} (S)

and it means: get me all the R which appear in S.
In full details, it means: join R with the projection of the attributes that appear in both R and S (presumably a pair of key/fk). In plain SQL it is equivalent to

    (SELECT no FROM movement) AS derived 

or, in a more meaningful query,

SELECT * FROM account WHERE no IN (SELECT no FROM movement)

Note that a semi-join doesn’t really “join” anything; it just (possibly) “cuts” the first table.

A Left Outer Join returns every row in R even if they don’t join; entries that do not join are filled with NULLs.

Now you can also fully appreciate the difference between

 t1 JOIN t2 ON x AND y


t1 JOIN t2 ON x
WHERE y = <conditions>

In the first case the join doesn’t happen unless both conditions are met at the same time. In other words:
-> if I have a LEFT JOIN and y does not hold, I will have a row of the sort ID | NULL: an entry for the left row which sorry, it didn’t join;
-> whereas in the second case I will not have such entry, cause the rows are joined first and discarded afterwards by the where (unless the condition is IS NULL, but that would be pretty awkward). A visual reminder:

    ON                     WHERE
(1, a, a)                (1, a, a)
(2, a, NULL)