Databases: Functional Dependencies and Candidate Keys

Theory

A Functional Dependency is a constraint between two sets of attributes in a relation

\[
X \rightarrow Y
\]

such that the value of Y is always determined by the value of X. For instance,

author_id -> author_name    // correct
author_name -> author_id    // wrong

The axioms for FDs are:

  1. Reflexivity: If Y is a subset of X, then X → Y
  2. Augmentation: If X → Y, then XZ → YZ
  3. Transitivity: If X → Y and Y → Z, then X → Z

which can be used to prove the following rules:

  1. Union: If X → Y and X → Z, then X → YZ
  2. Decomposition: If X → YZ, then X → Y and X → Z
  3. Pseudotransitivity: If X → Y and WY → Z, then WX → Z

Now some definitions:

  • Super key: a set of attributes X in a relation R that determines all the attributes of the relation
  • Minimal key: a super key such that if you remove even one attribute from it, it is not a key anymore
  • The closure of a set of attributes is the set of all the attributes that can be inferred from a first attribute
  • The closure of a set of FDs is the set of all the FDs that can be inferred from a first FD. Two sets of FDs are equivalent if their closures are equivalent
  • A Minimal Cover of a set of FDs is the smallest set of FDs needed to determine all the FDs in the relation
  • A Prime Attribute is an attribute which is part of a super key.

Computing the Minimal Cover

Given this set of FDs for a relation R(A, B, C, D, E, F, G, H)

AB -> DEH,
BEF -> A, 
FGH -> C,
D -> EG,
EG -> BF,
F -> BH

1) Rewrite all the FDs with only one attribute on the RHS

AB -> D, AB -> E, AB -> H,
BEF -> A, 
FGH -> C,
D -> E, D -> G,
EG -> B, EG -> F,
F -> B, F -> H

2) Consider if any aggregate on the LHS can be derived from something else

Since F -> B, BEF -> A becomes EF -> A
Since F -> H, FGH -> C becomes FG -> C

The new set is therefore

AB -> D, AB -> E, AB -> H,
EF -> A, 
FG -> C,
D -> E, D -> G,
EG -> B, EG -> F,
F -> B, F -> H

3) Apply transitifity and simplify.

My favourite way to do so is: for every equal LHS, pick one, develop the full tree, see if in the tree you hit the RHS of another FD in the same group; if so, the latter FD is redundant. Repeat for every FD in the group.

AB -> D, D -> EG, EG -> BF, F -> BH
thus AB -> E and AB -> H are both redundant;

D -> E (end)
D -> G (end)

EG -> B (end)
EG -> F, F -> BH
thus EG -> B is redundant

F -> B (end)
F -> H (end)

The minimal cover is therefore

AB -> D,
EF -> A, 
FG -> C,
D -> E, D -> G,
EG -> F,
F -> B, F -> H

Minimal Candiate Keys

An important fact to remember is that

  1. If an attributes appears only on the RHS, it is non-prime
  2. If an attributes appears only on the LHS, it is prime and part of every key

Which is kinda obvious: if X can be only determined by something else, it cannot be part of a candidate key; and if X is not determined by anything else then it must be at least part of a key. So the first thing to do is figure out which are the non-prime attributes. Then we fiddle with the rest of the attributes until we find a candidate key for everyone of them. If one is left out it means we haven’t found all the candidate keys.

Let’s the fiddle start.
We spot immediately that C, H are the non-prime attributes.
Then, we compute the closure for any given single attribute on the LHS of the FDs and see if we got a key.

D+ = ABCDEFGH (candidate key)
F+ = BFH (not)

Every attribute that leads to a candidate key is also a key

Since AB -> D, AB is a candidate key

Finally, every attributes that leads to a prime attribute might also be a key

// looking for A or B on the RHS to recreate AB
F -> B, therefore AF is a ck
EF -> A, F -> B therefore EF is a ck 

// looking for E or F on the RHS to recreate EF
EG -> F, therefore EG is a ck

// looking for G on the RHS to recrete EG
// only D leads to G, D already a key. Search is over.