CMP -- United Business Media

Intelligent Enterprise

Better Insight for Business Decisions

UBM
Intelligent Enterprise - Better Insight for Business Decisions
Part of the TechWeb Network
Intelligent Enterprise
search Intelligent Enterprise





October 8, 2002

Knowing the Difference

How well do you know your SQL and set theory?

by Joe Celko

I have not done a theoretical database column in a long time, so let me share this problem from Steve Kass. It's a good measure of how well you know your SQL and set theory. The setup is simple: Create a 1x1 table, and then execute these two queries:

Q1: SELECT CASE WHEN 1 = 1

THEN 99

ELSE COUNT(*) END AS x

FROM floob WHERE 1 = 0;

Q2: SELECT CASE WHEN 1 = 0

THEN 99

ELSE COUNT(*) END AS x

FROM floob WHERE 1 = 0;

(Key this into your query tool and see what happens before you read any further.)

Technically, the MAX(), MIN(), AVG(), SUM(), COUNT(), and COUNT(*) functions are known as set functions in the SQL-92 Standard, but everyone calls them aggregate functions.

Why COUNT(*) Counts

Only COUNT(*) uses the asterisk, and it's not like the other functions. It counts the number of rows in a table — what's called the cardinality of a set. The asterisk isn't being used consistently in SQL; the SELECT * option is a shorthand for all the columns in a table, not rows. This difference is important; a table is not made of rows and columns; it's made of rows, and rows are made of columns.

By definition, COUNT(*) returns a zero when it has an empty set. But the other aggregate functions return an empty set when given an empty set. The other set functions actually perform operations on the values inside the table. If there are no values, they can't return anything — from nothing comes nothing.

One line of reasoning for this result is that you have a COUNT(*) aggregate function hidden in the CASE expression, so you have to return a result — even if that branch of the CASE expression is dead code.

A second line of reasoning is that you can parse the SELECT clause orthogonally and reduce it to a single expression in both sample queries because of the logically constant values in the WHEN and WHERE clauses. So you effectively have

Q1: SELECT 99 AS x

FROM floob WHERE 1 = 0;

which returns an empty set, and

Q2: SELECT COUNT(*) AS x

FROM floob WHERE 1 = 0;

which returns zero.

If SQL were a procedural language, a good compiler would prune out the dead code at compile time, and these results would always occur.

But I asked people to test these queries on various SQL products, and the consensus was to return 99 and 0 for Q1 and Q2, respectively.

These results mean that I can get anything I want out of an empty set. All I have to do is put in a dummy CASE expression with a COUNT(*) in a dead code branch. This ability bothers me — from nothing comes nothing always struck me as a good idea.

Wanted: Precise Syntax

The line of reasoning I favor until SQL establishes firm rules is that you have an ambiguous syntax and you just blow up. Perhaps what SQL should have is two very distinct key words, such as TALLY(<expression>) vs. COUNT(*), or even something more verbose but accurate, such as CARDINALITY(<table expression>). This last suggestion would show that the function operates on the entire table and not from values inside it.



Rate This Article

Comments:

Optional e-mail address:

This situation isn't the result of NULLs, as in some of the other unexpected logical results in SQL. With the CASE expression, these two queries jump between the "set as a whole" level (cardinality) and the "values within the set" level (AVG(), MIN(), MAX(), and so forth), and SQL has different rules for them.

Gregor Cantor invented set theory to explain infinite sets by handling them as completed units of work, instead of a sequence of elements that approach infinity.

At last! We're at the foundations of set theory, and I can finally use my Masters degree in Math. Suddenly, there's a huge difference between set functions and aggregate functions!


Joe Celko [71062.1056@compuserve.com] is an independent consultant in Austin, Texas and the author of Joe Celko's SQL for Smarties: Advanced SQL Programming (Morgan Kaufmann Publishers, 1999).









IE Weekly Newsletter
Subscribe to the newsletter
    Email Address