Guide to the TechWeb Network

Intelligent Enterprise

Better Insight for Business Decisions

Intelligent Enterprise - Better Insight for Business Decisions
search Intelligent Enterprise
Advanced Search
RSS
Webcasts
Whitepapers
Subscribe
Home




Table of Contents



Hollywood Couples


Relationships are harder than you think

by Joe Celko

One of the basic tricks in SQL is representing a many-to-many relationship. You create a third table that references the two (or more) tables involved by their primary keys. This third table has quite a few popular names, such as "junction table" or "join table," but I know that it is a relationship. People tell you this, then leave you on your own to figure out the rest.

For example, here are two tables:

CREATE TABLE Boys
(boy_name VARCHAR(30) NOT NULL PRIMARY KEY
...);
CREATE TABLE Girls
(girl_name VARCHAR(30) NOT NULL PRIMARY KEY,
... );

Yes, I know using names for a key is a bad practice, but doing so will make my examples easier to read. You can make a lot of different many-to-many relationships between these two tables. (If you don't believe me, just watch the Jerry Springer show some time.) The simplest relationship table looks like this:

CREATE TABLE Pairs
(boy_name INTEGER NOT NULL
REFERENCES Boys (boy_name),
girl_name INTEGER NOT NULL,
REFERENCES Girls(girl_name));
The "Pairs" table let’s you insert rows like this:
('Joe Celko', 'Brooke Shields')
('Joe Celko', 'Kim Bassinger')
('Alec Baldwin', 'Kim Bassinger')
('Joe Celko', 'Brooke Shields')

Oops! I am shown twice with 'Brooke Shields' because the "Pairs" table does not have its own key. This mistake is easy to make, but how to fix it is not always obvious.

The "Orgy" table gets rid of the duplicated rows and makes this a proper table:
CREATE TABLE Orgy
(boy_name INTEGER NOT NULL
REFERENCES Boys (boy_name),
girl_name INTEGER NOT NULL,
REFERENCES Girls(girl_name),
PRIMARY KEY (boy_name, girl_name)); -- compound key

The primary key for the table comprises two or more columns and is called a compound key because of that fact.

('Joe Celko', 'Brooke Shields')
('Joe Celko', 'Kim Bassinger')
('Alec Baldwin', 'Kim Bassinger')

But the only restriction on the pairs is that they appear only once. Every boy can be paired with every girl, much to the dismay of the moral majority. Now, I want to make a rule that guys can have as many gals as they want, but the gals have to stick to one guy.

The way I do this is to use a NOT NULL UNIQUE constraint on the girl_name column, which makes it a key. It’s a simple key because it is only one column, but it is also a nested key because it appears as a subset of the compound PRIMARY KEY.

"Playboys" is a proper table, without duplicated pairs, but it also enforces the condition that I get to play around with one or more ladies:

CREATE TABLE Playboys
(boy_name INTEGER NOT NULL
REFERENCES Boys (boy_name),
girl_name INTEGER NOT NULL UNIQUE, -- nested key
REFERENCES Girls(girl_name),
PRIMARY KEY (boy_name, girl_name)); -- compound key

 

('Joe Celko', 'Brooke Shields')
('Joe Celko', 'Kim Bassinger')
The ladies might want the same freedom and keep company with a series of men:
CREATE TABLE Playgirls
(boy_name INTEGER NOT NULL UNIQUE -- nested key
REFERENCES Boys (boy_name),
girl_name INTEGER NOT NULL,
REFERENCES Girls(girl_name),
PRIMARY KEY (boy_name, girl_name)); -- compound key
The Playgirls table would permit these rows from our original set:
('Joe Celko', 'Kim Bassinger')
('Alec Baldwin', 'Kim Bassinger')

The moral majority is pretty upset about this Hollywood scandal and would love for us to stop running around and settle down in nice, stable couples:

CREATE TABLE Couples
(boy_name INTEGER NOT NULL UNIQUE -- nested key
REFERENCES Boys (boy_name),
girl_name INTEGER NOT NULL UNIQUE -- nested key,
REFERENCES Girls(girl_name),
PRIMARY KEY(boy_name, girl_name)); -- compound key
The "Couples" table lets you insert these rows from the original set:
('Joe Celko', 'Brooke Shields')
('Alec Baldwin', 'Kim Bassinger')

Think about this table for a minute. The PRIMARY KEY is now redundant. If each boy appears only once in the table and each girl appears only once in the table, then each (boy_name, girl_name) pair can appear only once.

From a theoretical viewpoint, I could drop the compound key and make either boy_name or girl_name the new primary key, or I could just leave them as candidate keys.

SQL products and theory do not always match. Many products make the assumption that the PRIMARY KEY is in some way special in the data model and will be the way to access the table most of the time.

In fairness, making a special provision for the primary key is not a bad assumption because the REFERENCES clause uses the PRIMARY KEY of the referenced table as a default. Many programmers are not aware that a FOREIGN KEY constraint can also reference any UNIQUE constraint in the same table or in another table. The following nightmare will give you an idea of the possibilities. The multiple column versions follow the same syntax.

CREATE TABLE Foo
(foo_key INTEGER NOT NULL PRIMARY KEY,
...
self_ref INTEGER NOT NULL
REFERENCES Foo(fookey),
outside_ref_1 INTEGER NOT NULL
REFERENCES Bar(bar_key),
outside_ref_2 INTEGER NOT NULL
REFERENCES Bar(other_key),
...);
CREATE TABLE Bar
(bar_key INTEGER NOT NULL PRIMARY KEY,
other_key INTEGER NOT NULL UNIQUE,
...);

But getting back to the nested keys, just how far can you go with them? My favorite example is a teacher's schedule kept in a table like this (I am leaving off reference clauses and CHECK() constraints):

CREATE TABLE Schedule
(teacher VARCHAR(15) NOT NULL,
class CHAR(15) NOT NULL,
room INTEGER NOT NULL,
period INTEGER NOT NULL,
PRIMARY KEY (teacher, class, room, period));

That choice of a primary key is the most obvious one -- use all the columns. Typical rows would look like this: ('Mr. Celko', 'Database 101', 222, 6) The rules you want to enforce are:

  • A teacher is in only one room each period
  • A teacher teaches only one class each period
  • A room has only one class each period
  • A room has only one teacher in it each period.

Stop reading and see what you come up with for an answer.

CREATE TABLE Schedule
(teacher VARCHAR(15) NOT NULL,
class CHAR(15) NOT NULL,
room INTEGER NOT NULL,
period INTEGER NOT NULL,
UNIQUE (teacher, room, period), -- rule #1
UNIQUE (teacher, class, period), -- rule #2
UNIQUE (class, room, period), -- rule #3
UNIQUE (teacher, room, period), -- rule #4
PRIMARY KEY (teacher, class, room, period));

You know that you have 24 ways to pick three objects from a set of four in an ordered sequence (permutation). If order does not matter, then you have a combination and only four subsets, all of which I have used in the UNIQUE constraints. Although column order is important in creating an index, you can ignore it for now and then worry about index tuning later.

I probably want to drop the PRIMARY KEY as redundant if I have all four of these constraints in place. But what happens if I drop the PRIMARY KEY and then one of the constraints?

CREATE TABLE Schedule-2
(teacher VARCHAR(15) NOT NULL,
class CHAR(15) NOT NULL,
room INTEGER NOT NULL,
period INTEGER NOT NULL,
UNIQUE (teacher, room, period), -- rule #1
UNIQUE (teacher, class, period), -- rule #2
UNIQUE (class, room, period)); -- rule #3

I can now insert these rows in the second version of the table:

('Mr. Celko', 'Database 101', 222, 6)
('Mr. Celko', 'Database 102', 223, 6)

This gives me a very tough sixth period class load because I have to be in two different rooms at the same time. Things can get even worse when another teacher is added to the schedule:

('Mr. Celko', 'Database 101', 222, 6)
('Mr. Celko', 'Database 102', 223, 6)
('Ms. Shields', 'Database 101', 223, 6)

Ms. Shields and I are both in room 223, trying to teach different classes at the same time. I think you get the idea that a relationship table is not simple. And I did not even get into referential actions and the fact that a relationship can have its own attributes apart from those of its participants.

Puzzle:

This month, I am giving you a puzzle for which I do not have a complete answer myself. It seems like a good problem after the recent U.S. Presidential election. If you are a science fiction fan, then you should know about the preferential voting system used for the Science Fiction Achievement Awards (The Hugo, named after Hugo Gernsbach).

Here is a simplified version of the ballot process. Voters have 5 or 6 choices, and they number their vote in their order of preference, so a ballot might look like this:

CREATE TABLE Ballots
(ballot_id INTEGER NOT NULL PRIMARY KEY,
bush INTEGER CHECK (bush BETWEEN 1 AND 4),
celko INTEGER CHECK (celko BETWEEN 1 AND 4),
gore INTEGER CHECK (gore BETWEEN 1 AND 4),
nader INTEGER CHECK (nader BETWEEN 1 AND 4))

The counting rule is to take all the votes and count the first place votes. If a candidate has a majority, then he or she is the winner. If not, eliminate the choice that received the lowest number of first place votes and redistribute those votes to the second choice on the ballot. Check again and continue eliminating the candidate with the lowest total votes and redistributing until somebody has a majority of all ballots still being counted. That person is your winner.

Typical ballot rows might look like this:

(212, 1, NULL, 2, 3)

The voter of ballot #212 wants Bush to win, but if Bush is eliminated in the first round, the voter’s second choice is Gore; if both Gore and Bush are eliminated, the voter would prefer that Nader got this vote. If Bush, Gore, and Nader are out of the race, the voter would prefer that no one received this vote, rather than give it to Celko. The NULL means, "take this ballot out of the totals, I am leaving the country." This forces you to recount, using a different number of total ballots cast to determine the majority.

An all NULL ballot like this would be removed before the first count:

(213, NULL, NULL, NULL, NULL)

The Hugo Awards have an explicit "none of the above" candidate that you can use as a protest vote. And it has won quite a few times! But, let's get back to this problem and break it into parts.

1) Write a CHECK() constraint that will make sure we have no gaps in the preference rankings. If you cannot write a CHECK() constraint to prevent gaps, can you write an UPDATE statement that will close the gaps? You can assume the numbers are in the right order, so that (341, 1, 2, NULL, 4) was supposed to be (341, 1, 2, NULL, 3).

2) How do we count the votes for the first round? Remember that there are two possible results: a clear winner or a clear loser who will be removed from the second round.

3) How can we determine the winner, if any, in one query? With the fewest number of steps in a process?

Answer:

The constraint breaks down into three parts. The first condition is that all the rankings are NULL or between 1 and 4; that was done with a column constraints. Next we need to be sure that all the columns are different; we can do that with the first four NOT IN () predicates shown below.

CONSTRAINT all_rankings_different
CHECK (bush NOT IN (celko, gore, nader)
AND celko NOT IN (bush, gore, nader)
AND gore NOT IN (bush, celko, nader)
AND nader NOT IN (bush, celko, gore)

The last condition is that of the numbers we have, they are sequential. One way of doing that would be to see that they fall into one of the allowed patterns, but that leads to over 100 different possible ballot schemes.

We also know that the highest ranking should be equal to the number of non-NULL rankings on each ballot. Determining those values will be hard in this denormalized table, however.

The best trick to use a little algebra and build an expression that takes on only certain values for the sets of integers (1,2,3,4), (1,2,3), (1,2) and (1). Here is one possible answer; there are others.

CONSTRAINT rankings_in_sequence
CHECK ((COALESCE (bush + 1, 1)
* COALESCE (celko + 1, 1)
* COALESCE (gore + 1, 1)
* COALESCE (nader + 1, 1))
IN (2, 6, 24, 120))



Joe Celko is an Atlanta-based independent consultant. He is the author of Instant SQL Programming (Wrox Press, 1997). You can contact him at www.celko.com or 71062.1056@compuserve.com.





IE Weekly Newsletter
Subscribe to the newsletter
    Email Address







InformationWeek Business Technology Network
InformationWeekInformationWeek 500InformationWeek 500 ConferenceInformationWeek AnalyticsInformationWeek CIO
InformationWeek EventsInformationWeek ReportsInformationWeek MagazinebMightyByte and SwitchDark Reading
Digital LibraryIntelligent EnterpriseInternet EvolutionNetwork ComputingNo Jitter
space
Techweb Events Network
InteropVoiceConWeb 2.0 ExpoWeb 2.0 SummitEnterprise 2.0 ConferenceMobile Business ExpoSoftware ConferenceCSI - Computer Security Institute
Black HatGTECEnergy CampMashup CampStartup Camp
space
Light Reading Communications Network
Light ReadingLight Reading EuropeUnstrungLight Reading's Cable Digital NewsConstantinopleInternet Evolution
Heavy ReadingLight Reading Live!Light Reading InsiderEthernet ExpoOptical ExpoTeleco TVTower Technology Summit
space
Financial Technology Network
Advanced TradingBank Systems & TechnologyInsurance & TechnologyWall Street & TechnologyAccelerating Wall StreetBank Systems & Technology Executive SummitBuyside Trading SummitInsurance & Technology Executive Summit
space
Microsoft Technology Network
MSDN MagazineTechNetThe Architecture Journal
space