SQL: string matching, ranking and partitioning

String Matching

  • _  => matches a single char
  • % => matches any number of chars

Say I have this dummy relation

Tyron Lannister
Jamie Lannister 
Cercei Lannister
Arya Stark

Get all the members of the Lannister family:

WHERE name LIKE '% , Lannister'

Get all the first names of the database:

       WHEN POSITION(' ' IN name) = 0 THEN name
       SUBSTRING(name FROM 1 FOR POSITION(' ' IN name)-1)
AS name
FROM person

When fiddling with (formatted) dates it can be handy to extract the year we’re interested in as in

SELECT EXTRACT(YEAR FROM person.dob) AS born_year

Ranking without a rank function

A ranking problem can be seen as a matter of counting how many (items) are more (something) than the current one. For example, with reference to the mondial db:

SELECT c.name, c.population, COUNT(higher.*) AS rank
FROM country AS c
JOIN country AS higher
     ON c.population <= higher.population
GROUP BY c.name, c.population

In this case you join every country with every country with a higher population, you group them by countrie’s name, and count how many joins there are. It’s important to join on a value bigger/less than or equal, otherwise the first item would not join, and the list would start off at the second one.

If you feel lazy and the db supports it, using RANK is kind of nice though:

SELECT c.name, c.population,
     RANK() OVER (ORDER BY population DESC) AS rank
FROM country AS c


  • GROUP BY returns one row for each value
  • PARTITION BY returns a row for each entry, logically grouped by value.

Say I have the relation

located(city, sea)

it is straightforward to look up how many cities are located on the Mediterranean Sea:

SELECT sea, COUNT(city) AS howmany
FROM located
WHERE sea = 'Mediterranean Sea'
        sea        | howmany 
 Mediterranean Sea |      14

However I cannot add the name of the cities to the above relation, for the good reason that 14 cities do not fit in one row. Here is when partition by comes to the rescue

SELECT sea, city, COUNT(city) OVER (PARTITION BY sea)
       AS howmany
FROM located
WHERE sea = 'Mediterranean Sea'

(the outcome would proably be more useful without the sea constraint)

        sea        |    city    | howmany 
 Mediterranean Sea | Alexandria |      14
 Mediterranean Sea | Athens     |      14
 Mediterranean Sea | Bengasi    |      14
 Mediterranean Sea | Beirut     |      14

Using CASEs to create new columns

This especially frequent when creating “inventories” of the different values on the same value. Given the schema

organization(abbreviation, established)

we can find out how many orgs were founded in each decade this way:

SUM(CASE WHEN EXTRACT(YEAR FROM established) >= 1950
         AND EXTRACT(YEAR FROM established) <= 1959
         THEN 1 ELSE 0
    AS fifites,
SUM(CASE WHEN EXTRACT(YEAR FROM established) >= 1960
         AND EXTRACT(YEAR FROM established) <= 1969
         THEN 1 ELSE 0
    AS sixties
FROM organization