SQL: string matching, ranking and partitioning

String Matching

  • _  => matches a single char
  • % => matches any number of chars

Say I have this dummy relation

Tyron Lannister
Jamie Lannister 
Cercei Lannister
Arya Stark

Get all the members of the Lannister family:

WHERE name LIKE '% , Lannister'

Get all the first names of the database:

SELECT CASE
       WHEN POSITION(' ' IN name) = 0 THEN name
       ELSE
       SUBSTRING(name FROM 1 FOR POSITION(' ' IN name)-1)
       END
AS name
FROM person

When fiddling with (formatted) dates it can be handy to extract the year we’re interested in as in

SELECT EXTRACT(YEAR FROM person.dob) AS born_year

Ranking without a rank function

A ranking problem can be seen as a matter of counting how many (items) are more (something) than the current one. For example, with reference to the mondial db:

SELECT c.name, c.population, COUNT(higher.*) AS rank
FROM country AS c
JOIN country AS higher
     ON c.population <= higher.population
GROUP BY c.name, c.population
ORDER BY rank
LIMIT 10

In this case you join every country with every country with a higher population, you group them by countrie’s name, and count how many joins there are. It’s important to join on a value bigger/less than or equal, otherwise the first item would not join, and the list would start off at the second one.

If you feel lazy and the db supports it, using RANK is kind of nice though:

SELECT c.name, c.population,
     RANK() OVER (ORDER BY population DESC) AS rank
FROM country AS c
LIMIT 10

PARTITION BY

  • GROUP BY returns one row for each value
  • PARTITION BY returns a row for each entry, logically grouped by value.

Say I have the relation

located(city, sea)

it is straightforward to look up how many cities are located on the Mediterranean Sea:

SELECT sea, COUNT(city) AS howmany
FROM located
WHERE sea = 'Mediterranean Sea'
GROUP BY sea;
        sea        | howmany 
-------------------+---------
 Mediterranean Sea |      14

However I cannot add the name of the cities to the above relation, for the good reason that 14 cities do not fit in one row. Here is when partition by comes to the rescue

SELECT sea, city, COUNT(city) OVER (PARTITION BY sea)
       AS howmany
FROM located
WHERE sea = 'Mediterranean Sea'

(the outcome would proably be more useful without the sea constraint)

        sea        |    city    | howmany 
-------------------+------------+---------
 Mediterranean Sea | Alexandria |      14
 Mediterranean Sea | Athens     |      14
 Mediterranean Sea | Bengasi    |      14
 Mediterranean Sea | Beirut     |      14
 (...)

Using CASEs to create new columns

This especially frequent when creating “inventories” of the different values on the same value. Given the schema

organization(abbreviation, established)

we can find out how many orgs were founded in each decade this way:

SELECT
SUM(CASE WHEN EXTRACT(YEAR FROM established) >= 1950
         AND EXTRACT(YEAR FROM established) <= 1959
         THEN 1 ELSE 0
    END)
    AS fifites,
SUM(CASE WHEN EXTRACT(YEAR FROM established) >= 1960
         AND EXTRACT(YEAR FROM established) <= 1969
         THEN 1 ELSE 0
    END)
    AS sixties
FROM organization