SQL: string matching, ranking and partitioning
String Matching
- _ => matches a single char
- % => matches any number of chars
Say I have this dummy relation
Tyron Lannister Jamie Lannister Cercei Lannister Arya Stark
Get all the members of the Lannister family:
WHERE name LIKE '% , Lannister'
Get all the first names of the database:
SELECT CASE
WHEN POSITION(' ' IN name) = 0 THEN name
ELSE
SUBSTRING(name FROM 1 FOR POSITION(' ' IN name)-1)
END
AS name
FROM person
When fiddling with (formatted) dates it can be handy to extract the year we’re interested in as in
SELECT EXTRACT(YEAR FROM person.dob) AS born_year
Ranking without a rank function
A ranking problem can be seen as a matter of counting how many (items) are more (something) than the current one. For example, with reference to the mondial db:
SELECT c.name, c.population, COUNT(higher.*) AS rank
FROM country AS c
JOIN country AS higher
ON c.population <= higher.population
GROUP BY c.name, c.population
ORDER BY rank
LIMIT 10
In this case you join every country with every country with a higher population, you group them by countrie’s name, and count how many joins there are. It’s important to join on a value bigger/less than or equal, otherwise the first item would not join, and the list would start off at the second one.
If you feel lazy and the db supports it, using RANK is kind of nice though:
SELECT c.name, c.population,
RANK() OVER (ORDER BY population DESC) AS rank
FROM country AS c
LIMIT 10
PARTITION BY
- GROUP BY returns one row for each value
- PARTITION BY returns a row for each entry, logically grouped by value.
Say I have the relation
located(city, sea)
it is straightforward to look up how many cities are located on the Mediterranean Sea:
SELECT sea, COUNT(city) AS howmany FROM located WHERE sea = 'Mediterranean Sea' GROUP BY sea;
sea | howmany -------------------+--------- Mediterranean Sea | 14
However I cannot add the name of the cities to the above relation, for the good reason that 14 cities do not fit in one row. Here is when partition by comes to the rescue
SELECT sea, city, COUNT(city) OVER (PARTITION BY sea)
AS howmany
FROM located
WHERE sea = 'Mediterranean Sea'
(the outcome would proably be more useful without the sea constraint)
sea | city | howmany -------------------+------------+--------- Mediterranean Sea | Alexandria | 14 Mediterranean Sea | Athens | 14 Mediterranean Sea | Bengasi | 14 Mediterranean Sea | Beirut | 14 (...)
Using CASEs to create new columns
This especially frequent when creating “inventories” of the different values on the same value. Given the schema
organization(abbreviation, established)
we can find out how many orgs were founded in each decade this way:
SELECT
SUM(CASE WHEN EXTRACT(YEAR FROM established) >= 1950
AND EXTRACT(YEAR FROM established) <= 1959
THEN 1 ELSE 0
END)
AS fifites,
SUM(CASE WHEN EXTRACT(YEAR FROM established) >= 1960
AND EXTRACT(YEAR FROM established) <= 1969
THEN 1 ELSE 0
END)
AS sixties
FROM organization