CMP -- United Business Media

Intelligent Enterprise

Better Insight for Business Decisions

UBM
Intelligent Enterprise - Better Insight for Business Decisions
Part of the TechWeb Network
Intelligent Enterprise
search Intelligent Enterprise




An Old Celko Puzzle | Intelligent Enterprise Blog
An Old Celko Puzzle

Posted by Joe Celko
Friday, March 25, 2005
8:13 PM

If you go over to http://www.dbdebunk.com/page/page/666711.htm, you will find a letter to the editor from someone named PV about an old column of mine. Here is the jist of it:

Back in June of 1996, Jack Wells submitted this SQL problem to my SQL FOR SMARTIES column.

His situation is pretty typical for SQL programmers who work with 3GL people.
The programmers are writing a report on the employees, and they want information about each employee's current and previous salary status. The report will show the date of their promotion and the salary amount.

Jack spoke with Fabian Pascal, the week he was working on this problem, and Mr. Pascal replied that this query could not be done. He said, 'In a truly relational language it could be done, but since SQL is not relational it isn't possible, not even with SQL-92.' Sounds like a challenge to me!

Oh, I forgot to mention an addition constraint; the answer had to be in 1996 Oracle, which had no proper outer joins, no general scalar subexpressions, and so on), so your query had to run under the old SQL-86 or SQL-89 rules back then. Assume that you have this test data:

CREATE TABLE Salaries
(emp_id_id CHAR(10) NOT NULL,
sal_date DATE NOT NULL,
sal_amt DECIMAL (8, 2) NOT NULL,
PRIMARY KEY (emp_id, sal_date));

INSERT INTO Salaries VALUES ('Tom', '1996-06-20', 500.00);
INSERT INTO Salaries VALUES ('Tom', '1996-08-20', 700.00);
INSERT INTO Salaries VALUES ('Tom', '1996-10-20', 800.00);
INSERT INTO Salaries VALUES ('Tom', '1996-12-20', 900.00);
INSERT INTO Salaries VALUES ('Dick', '1996-06-20', 500.00);
INSERT INTO Salaries VALUES ('Harry', '1996-07-20', 500.00);
INSERT INTO Salaries VALUES ('Harry', '1996-09-20', 700.00);

Tom has had two promotions, Dick is a new hire, and Harry has had one promotion. My old solution was:

SELECT S0.emp_id, S0.sal_date, S0.sal_amt,
S1.sal_date, S1.sal_amt
FROM Salaries AS S0, Salaries AS S1
WHERE S0.emp_id = S1.emp_id
AND S0.sal_date
= (SELECT MAX(S2.sal_date)
FROM Salaries AS S2
WHERE S0.emp_id = S2.emp_id)
AND S1.sal_date
= (SELECT MAX(S3.sal_date)
FROM Salaries AS S3
WHERE S0.emp_id = S3.emp_id
AND S3.sal_date < S0.sal_date)
UNION ALL
SELECT S4.emp_id, MAX(S4.sal_date), MAX(S4.sal_amt),
NULL, NULL
FROM Salaries AS S4
GROUP BY S4.emp_id
HAVING COUNT(*) = 1;

+-------+------------+--------+------------+--------+
| emp_id| sal_date |sal_amt | sal_date | sal_amt|
+-------+------------+--------+------------+--------+
| Tom | 1996-12-20 | 900 | 1996-10-20 | 800 |
| Harry | 1996-09-20 | 700 | 1996-07-20 | 500 |
| Dick | 1996-06-20 | 500 | (NULL) |(NULL) |
+-------+------------+--------+------------+--------+

The writer found the solution unintuitive and difficult to understand.
Actually, DB2 programmers will recognize this is as the standard pattern used for OUTER JOINs before that syntax was added to SQL-92.

Fabian Pascal comments that "This was a very long time ago and I do not recall the exact circumstances, and whether my reply was properly represented or understood (particularly coming from Celko). My guess is that it had something to do with inability to resolve such problems without a precise definition of the tables to which the query is to be applied, the business rules in effect for the tables, and the query at issue. I will let Chris Date to respond to PV’s solution."

Chris Date responed: "Regarding whether Celko’s solution is correct or not, I neither know, nor care."

Then he posted an incorrect answer in his Tutorial D language that he cleaned in a second posting to this:

WITH (Salaries RENAME (sal_date AS prev_date,
sal_amt AS prev_amt)) AS t1,
((Salaries JOIN t1) WHERE sal_date > prev_date) AS t2,
((Salaries{emp_id} MINUS t2{emp_id}) JOIN Salaries) AS t3,
(EXTEND t3 ADD (DATE(1900-01-01) AS prev_date,
0.00 AS prev_amt)) AS t4:
t2 UNION t4

The bad news is that this does not help the poor Oracle programmer. And it is still wrong because Chris Date's version of the Relational Model disagrees with Dr. Codd's rules about having NULLs. By having '1900-01-01' as a "magical token" (remember in Date's world, we have no missing values) we just made poor Dick into the oldest emplyee in the company, and the least appreciated. All this answer did was change the programming language, change the specs and change the calendar.

Today the solution is easy enough with an outer join. Here is one way:

SELECT S0.emp_id, S0.sal_date AS curr_sal_date, S0.sal_amt AS curr_sal_amt,
S1.sal_date AS prev_sal_date, S1.sal_amt AS prev_sal_amt
FROM Salaries AS S0
LEFT OUTER JOIN
Salaries AS S1
ON S0.emp_id = S1.emp_id
AND S0.sal_date
= (SELECT MAX(S2.sal_date)
FROM Salaries AS S2
WHERE S0.emp_id = S2.emp_id)
AND S1.sal_date
= (SELECT MAX(S3.sal_date)
FROM Salaries AS S3
WHERE S0.emp_id = S3.emp_id
AND S3.sal_date < S0.sal_date);

This can be done using newer operators or with scalar subquery expression in the SELECT list.

Anyone else got a Standard SQL solution? I am pretty sure this can be made simpler and if you have the SQL-99 OVER operators it should be a snap. The first entry gets a copy of my SQL PROGRAMING STYLE when it comes out or a copy of my TREES & HIERARCHIES IN SQL.



E-MAIL | SLASHDOT | DIGG




This is a public forum. CMP Technology and its affiliates are not responsible for and do not control what is posted herein. CMP Technology makes no warranties or guarantees concerning any advice dispensed by its staff members or readers.

Community standards in this comment area do not permit hate language, excessive profanity, or other patently offensive language. Please be aware that all information posted to this comment area becomes the property of CMP Media LLC and may be edited and republished in print or electronic format as outlined in CMP Technology's Terms of Service.

Important Note: This comment area is NOT intended for commercial messages or solicitations of business.


 




    Subscribe to RSS