Databases: Functional Dependencies and Candidate Keys
Theory
A Functional Dependency is a constraint between two sets of attributes in a relation
\[X \rightarrow Y
\]
such that the value of Y is always determined by the value of X. For instance,
author_id -> author_name // correct author_name -> author_id // wrong
The axioms for FDs are:
- Reflexivity: If Y is a subset of X, then X → Y
- Augmentation: If X → Y, then XZ → YZ
- Transitivity: If X → Y and Y → Z, then X → Z
which can be used to prove the following rules:
- Union: If X → Y and X → Z, then X → YZ
- Decomposition: If X → YZ, then X → Y and X → Z
- Pseudotransitivity: If X → Y and WY → Z, then WX → Z
Now some definitions:
- Super key: a set of attributes X in a relation R that determines all the attributes of the relation
- Minimal key: a super key such that if you remove even one attribute from it, it is not a key anymore
- The closure of a set of attributes is the set of all the attributes that can be inferred from a first attribute
- The closure of a set of FDs is the set of all the FDs that can be inferred from a first FD. Two sets of FDs are equivalent if their closures are equivalent
- A Minimal Cover of a set of FDs is the smallest set of FDs needed to determine all the FDs in the relation
- A Prime Attribute is an attribute which is part of a super key.
Computing the Minimal Cover
Given this set of FDs for a relation R(A, B, C, D, E, F, G, H)
AB -> DEH, BEF -> A, FGH -> C, D -> EG, EG -> BF, F -> BH
1) Rewrite all the FDs with only one attribute on the RHS
AB -> D, AB -> E, AB -> H, BEF -> A, FGH -> C, D -> E, D -> G, EG -> B, EG -> F, F -> B, F -> H
2) Consider if any aggregate on the LHS can be derived from something else
Since F -> B, BEF -> A becomes EF -> A Since F -> H, FGH -> C becomes FG -> C
The new set is therefore
AB -> D, AB -> E, AB -> H, EF -> A, FG -> C, D -> E, D -> G, EG -> B, EG -> F, F -> B, F -> H
3) Apply transitifity and simplify.
My favourite way to do so is: for every equal LHS, pick one, develop the full tree, see if in the tree you hit the RHS of another FD in the same group; if so, the latter FD is redundant. Repeat for every FD in the group.
AB -> D, D -> EG, EG -> BF, F -> BH thus AB -> E and AB -> H are both redundant; D -> E (end) D -> G (end) EG -> B (end) EG -> F, F -> BH thus EG -> B is redundant F -> B (end) F -> H (end)
The minimal cover is therefore
AB -> D, EF -> A, FG -> C, D -> E, D -> G, EG -> F, F -> B, F -> H
Minimal Candiate Keys
An important fact to remember is that
- If an attributes appears only on the RHS, it is non-prime
- If an attributes appears only on the LHS, it is prime and part of every key
Which is kinda obvious: if X can be only determined by something else, it cannot be part of a candidate key; and if X is not determined by anything else then it must be at least part of a key. So the first thing to do is figure out which are the non-prime attributes. Then we fiddle with the rest of the attributes until we find a candidate key for everyone of them. If one is left out it means we haven’t found all the candidate keys.
Let’s the fiddle start.
We spot immediately that C, H are the non-prime attributes.
Then, we compute the closure for any given single attribute on the LHS of the FDs and see if we got a key.
D+ = ABCDEFGH (candidate key) F+ = BFH (not)
Every attribute that leads to a candidate key is also a key
Since AB -> D, AB is a candidate key
Finally, every attributes that leads to a prime attribute might also be a key
// looking for A or B on the RHS to recreate AB F -> B, therefore AF is a ck EF -> A, F -> B therefore EF is a ck // looking for E or F on the RHS to recreate EF EG -> F, therefore EG is a ck // looking for G on the RHS to recrete EG // only D leads to G, D already a key. Search is over.