SQL: string matching, ranking and partitioning
String Matching
- _ => matches a single char
- % => matches any number of chars
Say I have this dummy relation
Tyron Lannister Jamie Lannister Cercei Lannister Arya Stark
Get all the members of the Lannister family:
WHERE name LIKE '% , Lannister'
Get all the first names of the database:
SELECT CASE WHEN POSITION(' ' IN name) = 0 THEN name ELSE SUBSTRING(name FROM 1 FOR POSITION(' ' IN name)-1) END AS name FROM person
When fiddling with (formatted) dates it can be handy to extract the year we’re interested in as in
SELECT EXTRACT(YEAR FROM person.dob) AS born_year
Ranking without a rank function
A ranking problem can be seen as a matter of counting how many (items) are more (something) than the current one. For example, with reference to the mondial db:
SELECT c.name, c.population, COUNT(higher.*) AS rank FROM country AS c JOIN country AS higher ON c.population <= higher.population GROUP BY c.name, c.population ORDER BY rank LIMIT 10
In this case you join every country with every country with a higher population, you group them by countrie’s name, and count how many joins there are. It’s important to join on a value bigger/less than or equal, otherwise the first item would not join, and the list would start off at the second one.
If you feel lazy and the db supports it, using RANK is kind of nice though:
SELECT c.name, c.population, RANK() OVER (ORDER BY population DESC) AS rank FROM country AS c LIMIT 10
PARTITION BY
- GROUP BY returns one row for each value
- PARTITION BY returns a row for each entry, logically grouped by value.
Say I have the relation
located(city, sea)
it is straightforward to look up how many cities are located on the Mediterranean Sea:
SELECT sea, COUNT(city) AS howmany FROM located WHERE sea = 'Mediterranean Sea' GROUP BY sea;
sea | howmany -------------------+--------- Mediterranean Sea | 14
However I cannot add the name of the cities to the above relation, for the good reason that 14 cities do not fit in one row. Here is when partition by comes to the rescue
SELECT sea, city, COUNT(city) OVER (PARTITION BY sea) AS howmany FROM located WHERE sea = 'Mediterranean Sea'
(the outcome would proably be more useful without the sea constraint)
sea | city | howmany -------------------+------------+--------- Mediterranean Sea | Alexandria | 14 Mediterranean Sea | Athens | 14 Mediterranean Sea | Bengasi | 14 Mediterranean Sea | Beirut | 14 (...)
Using CASEs to create new columns
This especially frequent when creating “inventories” of the different values on the same value. Given the schema
organization(abbreviation, established)
we can find out how many orgs were founded in each decade this way:
SELECT SUM(CASE WHEN EXTRACT(YEAR FROM established) >= 1950 AND EXTRACT(YEAR FROM established) <= 1959 THEN 1 ELSE 0 END) AS fifites, SUM(CASE WHEN EXTRACT(YEAR FROM established) >= 1960 AND EXTRACT(YEAR FROM established) <= 1969 THEN 1 ELSE 0 END) AS sixties FROM organization