May 15, 2000, Volume 3 - Number 8
On the Trigger
All things considered, Joe is pretty happy with SQL Server 2000
Back in February, I spoke at the SQL2TheMax conference (www.sql2themax.com) in San Francisco. I gave one of my usual talks on SQL programming tricks, and I take some pride in the fact that I drew an audience at 6 p.m., just after the free beer and food had been served.
This was the week when Microsoft formally announced SQL Server 2000, and we saw a "dog and pony show" from Barry Goffe, a member of the team. Barry had to compete with his boss, Bill Gates, announcing Windows 2000 in another auditorium across the street at the same time. Still, Barry drew a full house of several hundred people. I am not sure if that is more impressive than beating out free beer and food.
The big brag for the announcement was the new TPC-C numbers for SQL Server 2000. Let me offer the usual cautions about TPC numbers and how the vendors pick their hardware and tune their software in order to demonstrate that they have the fastest, biggest, and cheapest database in the whole world.
SQL Server 2000 Enterprise Edition, running on Windows 2000 Advanced Server with COM+ transaction support as the transaction-processing monitor, achieved 227,079 order transactions per minute (tpmC) on the TPC-C benchmark. To put this into human (well, geek) terms, this volume is 575 times larger than the combined transaction volumes of Amazon.com and eBay. The previous record, held by Oracle, was 135,815 tpmC. The week before, Microsoft's best TPC-C score, using SQL Server 7.0, was less than 50,000 tpmC.
Microsoft achieved the new record using 12 Compaq ProLiant 8500 servers, each with eight Intel Pentium III Xeon 550 MHz processors, federated into a single database system. (Yes, that is a lot of highly parallel, highly tuned iron to throw at the problem, but the cost of the boxes is still cheaper than a mainframe.) Oracle's old record was set on a single RISC/Unix-based system.
The cost per tpmC was $19.12, for a total system cost of $4,341,750.48. The fastest non-Microsoft TPC-C score of 135,815 tpmC by Oracle (which, with Microsoft owning the top two spots, is the third fastest score overall) is $52.70 per tpmC. These numbers look real as far as I have been able to verify. Again, the usual warnings -- "actual results may vary" -- may apply.
My big worry is that Windows 2000 is going to be a nightmare. By now, you have probably seen the press releases about the Microsoft internal memo that said Windows 2000 has 63,000 "potentially known defects", although Microsoft claims it is really about 28,000 issues. Great comfort, isn't it?
Microsoft's other response has been to push the Uniform Computer Information Transactions Act (UCITA) in the state of Virginia. UCITA changes the Uniform Commercial Code by shifting power from customers to software vendors. Among other things, UCITA absolves software vendors of any liability for defects in their products, even if they knew the defects were there. Remember, Bill Gates comes from a long line of lawyers. His great innovations have all been legal ones, not technical ones.
However, as my regular readers know, my main crusade in life is to bring every database up to SQL-92 before I die. Therefore, I am pretty happy with SQL Server 2000 for adding declarative referential integrity (DRI) actions, UNION-ed VIEWs, and INSTEAD OF triggers. SQL Server 2000 is the only product I know that supports INSTEAD OF cursors, which allow updatable VIEW logic in the schema. It also adds materialized VIEWs, similar to what Oracle used to make that million-dollar bet a few years ago. The UNION-ed VIEWs can span multiple databases to give you a form of distributed database architecture.
Let me explain each feature and its importance. In the early days of the Sybase family, from which SQL Server evolved, you maintained referential integrity with triggers. Basically, a trigger is a module attached to a table. The trigger executes when an action (INSERT, UPDATE or DELETE) occurs on the table. The triggers can execute either before or after the triggering event. Vendors have different syntax and limitations on triggers in their products, but the trigger module can always contain procedural code.
For example, if I wanted to make sure that we only took orders for goods actually in inventory, I would write a TRIGGER something like this:
CREATE TRIGGER Must_be_in_stock
AFTER DELETE ON Inventory
REFERENCING OLD ROW AS Discontinued
BEGIN
DELETE FROM Orders
WHERE Orders.item_nbr = Discontinued.item_nbr;
END;
With SQL Server 2000, you would write this procedure with DRI as a clause in a CREATE TABLE statement:
CREATE TABLE Orders
( ...
item_nbr INTEGER NOT NULL
CONSTRAINT Must_be_in_stock
REFERENCES Inventory(item_nbr)
ON DELETE CASCADE,
... );
Triggers can obviously do many things that DRI cannot do, such as WRITE ALTER more than one table, write to log files, and so forth. However, you can express DRI constraints as predicates, as in this case:
CREATE ASSERTION Must_be_in_stock
CHECK (NOT EXISTS
(SELECT *
FROM Orders
WHERE NOT EXISTS
(SELECT *
FROM Inventory
WHERE Orders.item_nbr = Inventory.item_nbr)
));
You can take this predicate out of the CHECK() constraint and put it into any working query that uses these tables. This approach gives the optimizer more information. (In this case, if I see an item in an order, I do not need to access the Inventory table to know that it exists in stock.) A TRIGGER cant do that.
Now you ask, if I am so down on triggers, why am I so happy about the INSTEAD OF triggers in SQL Server 2000? Let me back up a bit and explain updatable VIEWs. The SQL-92 standard is actually very conservative about which VIEWs are updatable. According to the standard, the following rules apply:
- A
SELECT statement on one and only one table. This statement can go through several layers of VIEWs on top of VIEWs, however.
- The VIEW must include all the columns of a key (such as a UNIQUE or PRIMARY KEY constraint) in the base table
- All columns not shown in the
VIEW must have default values or be NULL-able. (If you think about it for a minute, this requirement lets you construct and insert a complete row into the base table.)
The whole idea behind these rules is that an updatable VIEW looks and behaves pretty much like a base table, only slightly restricted as to what you can see. One row in the VIEW maps to exactly one row in the base table from which it is drawn. This kind of updatable VIEW is the safest and easiest to define.
However, other versions of updatable VIEWs do exist, and some vendors support more than the basic version given in SQL-92.
Years ago, Nathan Goodman discussed types of updatable VIEWs in an article in the InfoDB newsletter (Summer 1990). His classification schema is as follows:
- Projection from a single table
- Selection/Projection from a single table
- UNION
-ed VIEWs
- Set Difference
VIEWs
- One-to-one joins
- One-to-one outer joins
- One-to-many outer joins
- One-to-one outer joins
- Many-to-many joins
- Translated and coded fields
By Goodmans definition, an updatable VIEW must have INSERT, UPDATE, and DELETE rules under the covers that map its rows back to a single row in the underlying base table(s).
Remember that a non-updatable VIEW is a virtual table that disappears after you use it. It may or may not be materialized, but you can never put an INSERT, UPDATE or DELETE trigger on it -- it has no persistent rows of its own to change.
The INSTEAD OF trigger can go on an updatable VIEW, however. This construct performs the actions given in the body of the trigger, instead of needing to use the INSERT, UPDATE or DELETE. Hurrah! We now have full VIEW updating, if we can write the code ourselves -- a topic I will tackle in another column, after I have had some time to play around with the new toy.
You can construct the UNION-ed VIEWs in SQL Server 2000 from tables in different databases in a manner transparent to users. Are you starting to see how you can build a distributed database system with UNION-ed VIEWs and INSTEAD OF triggers?
Another new feature of SQL Server 2000 is the ability to index on an expression, instead of just simple columns. Other products have had this ability for years, and if you know what you are doing, you can get huge performance boosts. One trick is to create your own hashing functions to distribute the data uniformly over physical storage. (Perhaps readers who have been using products with this feature would like to share their tricks?)
Puzzle Corner
This problem came into my email, and it might be useful to you. You have two sets of teams that you want to pair up and schedule for games over a set of available dates. In this example, we have five Eastern division teams and five Western division teams, so we need 25 available dates. The data looks like this:
CREATE TABLE EasternTeams
(team_name CHAR(15) NOT NULL PRIMARY KEY);
INSERT INTO EasternTeams VALUES ('Knicks');
INSERT INTO EasternTeams VALUES ('Nets');
INSERT INTO EasternTeams VALUES ('Heats');
INSERT INTO EasternTeams VALUES ('Magic');
INSERT INTO EasternTeams VALUES ('Celtics');
CREATE TABLE WesternTeams
(team_name CHAR(15) NOT NULL PRIMARY KEY);
INSERT INTO WesternTeams VALUES ('Lakers');
INSERT INTO WesternTeams VALUES ('Rockets');
INSERT INTO WesternTeams VALUES ('Blazers');
INSERT INTO WesternTeams VALUES ('Spurs');
INSERT INTO WesternTeams VALUES ('Mavericks');
CREATE TABLE AvailableDates
(match_date DATE NOT NULL PRIMARY KEY);
INSERT INTO AvailableDates VALUES ('2000-01-01');
INSERT INTO AvailableDates VALUES ('2000-01-02');
INSERT INTO AvailableDates VALUES ('2000-01-03');
INSERT INTO AvailableDates VALUES ('2000-01-04');
INSERT INTO AvailableDates VALUES ('2000-01-05');
INSERT INTO AvailableDates VALUES ('2000-01-06');
INSERT INTO AvailableDates VALUES ('2000-01-07');
INSERT INTO AvailableDates VALUES ('2000-01-08');
INSERT INTO AvailableDates VALUES ('2000-01-09');
INSERT INTO AvailableDates VALUES ('2000-01-10');
INSERT INTO AvailableDates VALUES ('2000-01-11');
INSERT INTO AvailableDates VALUES ('2000-01-12');
INSERT INTO AvailableDates VALUES ('2001-01-01');
INSERT INTO AvailableDates VALUES ('2001-01-02');
INSERT INTO AvailableDates VALUES ('2001-01-03');
INSERT INTO AvailableDates VALUES ('2001-01-04');
INSERT INTO AvailableDates VALUES ('2001-01-05');
INSERT INTO AvailableDates VALUES ('2001-01-06');
INSERT INTO AvailableDates VALUES ('2001-01-07');
INSERT INTO AvailableDates VALUES ('2001-01-08');
INSERT INTO AvailableDates VALUES ('2001-01-09');
INSERT INTO AvailableDates VALUES ('2001-01-10');
INSERT INTO AvailableDates VALUES ('2001-01-11');
INSERT INTO AvailableDates VALUES ('2001-01-12');
INSERT INTO AvailableDates VALUES ('2000-01-01');
Answer:
Let me just show you the query, and then I will explain what happened.
SELECT E1.team_name AS east, W1.team_name AS west, A1.match_date
FROM EasternTeams AS E1, WesternTeams AS W1,
AvailableDates AS A1, AvailableDates AS A2
WHERE A1.match_date >= A2.match_date
GROUP BY E1.team_name, W1.team_name, A1.match_date
HAVING COUNT(A2.match_date)
= (SELECT COUNT(*)
FROM EasternTeams AS E2, WesternTeams AS W2
WHERE E1.team_name || W1.team_name
<= E2.team_name || W2.team_name);
This is one of the few times that you actually want to keep the results of a CROSS JOIN (formerly known as a Cartesian product) between the two division tables. That is why there is no join condition between them.
What we are doing is counting the number of dates that come before each date in the outermost query. This approach enumerates the dates. Then we count the numbers of pairs of teams that are alphabetically before the current pair in the subquery in the HAVING clause. The HAVING clause matches the two enumerations:
Schedule
east west match_date
==========================================
Nets Spurs 2000-01-01
Nets Rockets 2000-01-02
Nets Mavericks 2000-01-03
Nets Lakers 2000-01-04
Nets Blazers 2000-01-05
Magic Spurs 2000-01-06
Magic Rockets 2000-01-07
Magic Mavericks 2000-01-08
Magic Lakers 2000-01-09
Magic Blazers 2000-01-10
Knicks Spurs 2000-01-11
Knicks Rockets 2000-01-12
Knicks Mavericks 2001-01-01
Knicks Lakers 2001-01-02
Knicks Blazers 2001-01-03
Heats Spurs 2001-01-04
Heats Rockets 2001-01-05
Heats Mavericks 2001-01-06
Heats Lakers 2001-01-07
Heats Blazers 2001-01-08
Celtics Spurs 2001-01-09
Celtics Rockets 2001-01-10
Celtics Mavericks 2001-01-11
Celtics Lakers 2001-01-12
Celtics Blazers 2000-01-01
Resources:
Codd, E. F.; TheRelational Model For Database Management: Ver.2; "RV-6 VIEW Updating"; (ISBN 0-201-14192-2).
Date, C. J. & Hugh Darwen; Relational Database Writings -1989-1991; "Role of Functional Dependencies in Query Decomposition"; (ISBN 0-201-54303-6).
Date, C. J.; Relational Database: Selected Writings; "Updating VIEWs"; (ISBN 0-201-14196-5).
Goodman, Nathan; InfoDB Vol. 5, No 2 (Summer 1990); "VIEW Update is Practical".
Umeshar, Dayal & P. A. Bernstein; ACM Transactions On Database Systems Vol. 7, No 3 (Dec 1982); "On the Correct Translation of Update Operations on Relational VIEWs".
UCITA www.acm.org/technews/articles/2000-2/0310f.html#item7
Joe Celko is an Atlanta-based independent consultant. He is the author of three
books on SQL -- SQL For Smarties (Morgan-Kaufmann, 1995),
SQL Puzzles and Answers (Morgan-Kaufmann, 1995), and
Instant SQL Programming (Wrox Press, 1997) -- and wrote the SQL for Smarties column for DBMS
magazine. You can contact him via email at www.celko.com or 71062.1056@compuserve.com.
Copyright © 2000 CMP Media Inc. ALL RIGHTS RESERVED
No Reproduction without permission