# SQL: Join, Semi Join, Left Join and ON vs WHERE clauses

The relational algebra for a **JOIN** is the following:

\pi_{post.date} \sigma_{post.author = author.id \land author.name = <...>} (post \times author)

\]

that is, on its bare minimum: make the product of two tables, make it consistent, and select the desired rows according to some extra condition.

Misc stuff: “or” statements can be translated using UNION: *select the y such that p or q* becomes *select the y such that p UNION select the y such that q*. Now back to the joins.

A **Natural JOIN** is simply an equi-join where predicates arise implicitly (if they can)

R \bowtie S = \sigma_{R.A1 = S.A1 \land … \land R.Am = S.Am} (R \times S)

\]

A ** Semi Join** is defined as

R \bowtie S = R \bowtie \pi_{Attr(R) \cap Attr(S)} (S)

\]

and it means: * get me all the R which appear in S*.

In full details, it means: join R with the projection of the attributes that appear in both R and S (presumably a pair of key/fk). In plain SQL it is equivalent to

SELECT DISTINCT * FROM account NATURAL JOIN (SELECT no FROM movement) AS derived ORDER BY no

or, in a more meaningful query,

SELECT * FROM account WHERE no IN (SELECT no FROM movement)

__Note that a semi-join doesn’t really “join” anything;__ it just (possibly) “cuts” the first table.

A **Left Outer Join** returns *every row in R even if they don’t join*; entries that do not join are filled with NULLs.

Now you can also fully appreciate the difference between

t1 JOIN t2 ON x AND y

vs

t1 JOIN t2 ON x WHERE y = <conditions>

In the first case the join doesn’t happen unless both conditions are met at the same time. In other words:

-> if I have a LEFT JOIN and y does not hold, I will have a row of the sort *ID | NULL*: an entry for the left row which sorry, it didn’t join;

-> whereas in the second case I will **not** have such entry, cause the rows are joined first and discarded afterwards by the *where* (unless the condition is IS NULL, but that would be pretty awkward). A visual reminder:

ON WHERE (1, a, a) (1, a, a) (2, a, NULL)