Document: SQL Puzzles & Answers - P3 (pptx)


PUZZLE 15 FIND THE LAST TWO SALARIES

employee. If the programmers were not so lazy, you could pass this table to them and let them format it for the report.

Answer #2

The real problem is harder. One way to do this within the limits of SQL-89 is to break the problem into two cases:

1. Employees with only one salary action
2. Employees with two or more salary actions

We know that every employee has to fall into one and only one of those cases. One solution is to UNION both of the sets together:

    SELECT S0.emp_name, S0.sal_date, S0.sal_amt, S1.sal_date, S1.sal_amt
      FROM Salaries AS S0, Salaries AS S1
     WHERE S0.emp_name = S1.emp_name
       AND S0.sal_date
           = (SELECT MAX(S2.sal_date)
                FROM Salaries AS S2
               WHERE S0.emp_name = S2.emp_name)
       AND S1.sal_date
           = (SELECT MAX(S3.sal_date)
                FROM Salaries AS S3
               WHERE S0.emp_name = S3.emp_name
                 AND S3.sal_date < S0.sal_date)
    UNION ALL
    SELECT S4.emp_name, MAX(S4.sal_date), MAX(S4.sal_amt), NULL, NULL
      FROM Salaries AS S4
     GROUP BY S4.emp_name
    HAVING COUNT(*) = 1;

    emp_name  sal_date      sal_amt  sal_date      sal_amt
    ========================================================
    'Tom'     '1996-12-20'   900.00  '1996-10-20'   800.00
    'Harry'   '1996-09-20'   700.00  '1996-07-20'   500.00
    'Dick'    '1996-06-20'   500.00  NULL           NULL

DB2 programmers will recognize this as a version of the OUTER JOIN done without an SQL-92 standard OUTER JOIN operator. The first SELECT statement is the hardest. It is a self-join on the Salaries table, with copy S0 being the source for the most recent salary information and copy S1 the source for the next most recent information. The second SELECT statement is simply a grouped query that locates the employees with one row. Since the two result sets are disjoint, we can use UNION ALL instead of UNION to save an extra sorting operation.
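The two-case UNION ALL query in Answer #2 can be tried out with a small script. This is only a sketch: it assumes SQLite (through Python's standard sqlite3 module) as the engine, and the sample rows and the TEXT/REAL column types are illustrative assumptions chosen to reproduce the result table shown above, not data taken from the book.

```python
import sqlite3

# Illustrative sample rows (an assumption), picked to match the result table.
SAMPLE_ROWS = [
    ("Tom", "1996-06-20", 500.00),
    ("Tom", "1996-08-20", 700.00),
    ("Tom", "1996-10-20", 800.00),
    ("Tom", "1996-12-20", 900.00),
    ("Dick", "1996-06-20", 500.00),
    ("Harry", "1996-07-20", 500.00),
    ("Harry", "1996-09-20", 700.00),
]

# The Answer #2 query: employees with two or more salary rows come from the
# self-join, employees with exactly one row come from the grouped query.
UNION_QUERY = """
SELECT S0.emp_name, S0.sal_date, S0.sal_amt, S1.sal_date, S1.sal_amt
  FROM Salaries AS S0, Salaries AS S1
 WHERE S0.emp_name = S1.emp_name
   AND S0.sal_date = (SELECT MAX(S2.sal_date)
                        FROM Salaries AS S2
                       WHERE S0.emp_name = S2.emp_name)
   AND S1.sal_date = (SELECT MAX(S3.sal_date)
                        FROM Salaries AS S3
                       WHERE S0.emp_name = S3.emp_name
                         AND S3.sal_date < S0.sal_date)
UNION ALL
SELECT S4.emp_name, MAX(S4.sal_date), MAX(S4.sal_amt), NULL, NULL
  FROM Salaries AS S4
 GROUP BY S4.emp_name
HAVING COUNT(*) = 1
"""


def last_two_salaries():
    """Return {emp_name: (curr_date, curr_amt, prev_date, prev_amt)}."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Salaries ("
        " emp_name TEXT NOT NULL,"
        " sal_date TEXT NOT NULL,"   # ISO dates sort correctly as text
        " sal_amt REAL NOT NULL,"
        " PRIMARY KEY (emp_name, sal_date))"
    )
    conn.executemany("INSERT INTO Salaries VALUES (?, ?, ?)", SAMPLE_ROWS)
    rows = conn.execute(UNION_QUERY).fetchall()
    conn.close()
    return {row[0]: row[1:] for row in rows}


if __name__ == "__main__":
    for name, values in sorted(last_two_salaries().items()):
        print(name, values)
```

Keying the result by employee name makes it easy to see that each employee appears exactly once, which is what the disjoint-cases argument promises.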
Answer #3

I got several answers in response to my challenge for a better solution to this puzzle. Richard Romley of Smith Barney sent in the following SQL-92 solution. It takes advantage of the subquery table expression to avoid VIEWs:

    SELECT B.emp_name, B.maxdate, Y.sal_amt, B.maxdate2, Z.sal_amt
      FROM (SELECT A.emp_name, A.maxdate, MAX(X.sal_date) AS maxdate2
              FROM (SELECT W.emp_name, MAX(W.sal_date) AS maxdate
                      FROM Salaries AS W
                     GROUP BY W.emp_name) AS A
                   LEFT OUTER JOIN
                   Salaries AS X
                   ON A.emp_name = X.emp_name
                      AND A.maxdate > X.sal_date
             GROUP BY A.emp_name, A.maxdate) AS B
           LEFT OUTER JOIN
           Salaries AS Y
           ON B.emp_name = Y.emp_name
              AND B.maxdate = Y.sal_date
           LEFT OUTER JOIN
           Salaries AS Z
           ON B.emp_name = Z.emp_name
              AND B.maxdate2 = Z.sal_date;

If your SQL product supports common table expressions (CTEs), you can pull the table subqueries named A and B out into CTEs, which act like in-line VIEWs.

Answer #4

Mike Conway came up with an answer in Oracle, which I tried to translate into SQL-92 with mixed results. The problem with the translation was that the Oracle version of SQL did not support the SQL-92 standard OUTER JOIN syntax, and you have to watch the order of execution to get the right results. Syed Kadir, an associate application engineer at Oracle, sent in an improvement on my answer using the VIEW that was created in the first solution:

    SELECT S1.emp_name, S1.sal_date, S1.sal_amt, S2.sal_date, S2.sal_amt
      FROM Salaries1 AS S1, Salaries2 AS S2 -- use the view
     WHERE S1.emp_name = S2.emp_name
       AND S1.sal_date > S2.sal_date
    UNION ALL
    SELECT emp_name, MAX(sal_date), MAX(sal_amt), NULL, NULL
      FROM Salaries1
     GROUP BY emp_name
    HAVING COUNT(*) = 1;

You might have to replace the last two columns with the expressions CAST(NULL AS DATE) and CAST(NULL AS DECIMAL(8,2)) to ensure that they are of the right data types for a UNION.
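As a sketch of the CTE remark in Answer #3, the derived tables A and B can be written as named CTEs. The example below assumes SQLite via Python's sqlite3 module; the sample rows and column types are illustrative assumptions, and the CTE column lists are one possible way to spell it rather than the author's own text.

```python
import sqlite3

# Illustrative sample rows (an assumption, not data from the book).
SAMPLE_ROWS = [
    ("Tom", "1996-06-20", 500.00),
    ("Tom", "1996-08-20", 700.00),
    ("Tom", "1996-10-20", 800.00),
    ("Tom", "1996-12-20", 900.00),
    ("Dick", "1996-06-20", 500.00),
    ("Harry", "1996-07-20", 500.00),
    ("Harry", "1996-09-20", 700.00),
]

# Romley's query with the derived tables A and B moved into CTEs.
CTE_QUERY = """
WITH A (emp_name, maxdate) AS
 (SELECT W.emp_name, MAX(W.sal_date)
    FROM Salaries AS W
   GROUP BY W.emp_name),
B (emp_name, maxdate, maxdate2) AS
 (SELECT A.emp_name, A.maxdate, MAX(X.sal_date)
    FROM A
         LEFT OUTER JOIN Salaries AS X
         ON A.emp_name = X.emp_name
            AND A.maxdate > X.sal_date
   GROUP BY A.emp_name, A.maxdate)
SELECT B.emp_name, B.maxdate, Y.sal_amt, B.maxdate2, Z.sal_amt
  FROM B
       LEFT OUTER JOIN Salaries AS Y
       ON B.emp_name = Y.emp_name
          AND B.maxdate = Y.sal_date
       LEFT OUTER JOIN Salaries AS Z
       ON B.emp_name = Z.emp_name
          AND B.maxdate2 = Z.sal_date
"""


def last_two_salaries_cte():
    """Return {emp_name: (curr_date, curr_amt, prev_date, prev_amt)}."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Salaries ("
        " emp_name TEXT NOT NULL,"
        " sal_date TEXT NOT NULL,"
        " sal_amt REAL NOT NULL,"
        " PRIMARY KEY (emp_name, sal_date))"
    )
    conn.executemany("INSERT INTO Salaries VALUES (?, ?, ?)", SAMPLE_ROWS)
    rows = conn.execute(CTE_QUERY).fetchall()
    conn.close()
    return {row[0]: row[1:] for row in rows}


if __name__ == "__main__":
    for name, values in sorted(last_two_salaries_cte().items()):
        print(name, values)
```

Because every join is a LEFT OUTER JOIN, an employee with a single salary row survives all the way to the output with NULLs in the previous-salary columns, so no UNION is needed.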
Answer #5

Jack came up with a solution using the relational algebra operators as defined in one of Chris Date’s books on the www.dbdebunk.com Web site, which I am not going to post, since (1) the original problem was to be done in Oracle, and (2) nobody has implemented Relational Algebra. There is an experimental language called Tutorial D based on Relational Algebra, but it is not widely available.

The problem with the solution was that it created false data. All employees without previous salary records were assigned a previous salary of 0.00 and a previous salary date of '1900-01-01', even though zero and no value are logically different and the universe did not start in 1900.

Fabian Pascal commented that “This was a very long time ago and I do not recall the exact circumstances, and whether my reply was properly represented or understood (particularly coming from Celko). My guess is that it had something to do with inability to resolve such problems without a precise definition of the tables to which the query is to be applied, the business rules in effect for the tables, and the query at issue.
I will let Chris Date to respond to PV’s solution.”

Chris Date posted a solution in his private language that was more compact than Jack’s solution, and which he described as “Tedious, but essentially straightforward,” along with the remark “Regarding whether Celko’s solution is correct or not, I neither know, nor care.”

A version by Andrey Odegov replaces the outer join with a COALESCE():

    SELECT S1.emp_name_id, S1.sal_date AS curr_date,
           S1.sal_amt AS curr_amt,
           CASE WHEN S2.sal_date <> S1.sal_date
                THEN S2.sal_date END AS prev_date,
           CASE WHEN S2.sal_date <> S1.sal_date
                THEN S2.sal_amt END AS prev_amt
      FROM Salaries AS S1
           INNER JOIN Salaries AS S2
           ON S2.emp_name_id = S1.emp_name_id
              AND S2.sal_date
                  = COALESCE((SELECT MAX(S4.sal_date)
                                FROM Salaries AS S4
                               WHERE S4.emp_name_id = S1.emp_name_id
                                 AND S4.sal_date < S1.sal_date),
                             S2.sal_date)
     WHERE NOT EXISTS
           (SELECT *
              FROM Salaries AS S3
             WHERE S3.emp_name_id = S1.emp_name_id
               AND S3.sal_date > S1.sal_date);

Answer #6

One approach is to build a VIEW or CTE that gives all possible pairs of salary dates, and then filter them:

    CREATE VIEW SalaryHistory
           (emp_name_id, curr_date, curr_amt, prev_date, prev_amt)
    AS
    SELECT S0.emp_name_id, S0.sal_date AS curr_date,
           S0.sal_amt AS curr_amt,
           S1.sal_date AS prev_date,
           S1.sal_amt AS prev_amt
      FROM Salaries AS S0
           LEFT OUTER JOIN Salaries AS S1
           ON S0.emp_name_id = S1.emp_name_id
              AND S0.sal_date > S1.sal_date;

then use it in a self-join query:

    SELECT S0.emp_name_id, S0.curr_date, S0.curr_amt,
           S0.prev_date, S0.prev_amt
      FROM SalaryHistory AS S0
     WHERE S0.curr_date
           = (SELECT MAX(curr_date)
                FROM SalaryHistory AS S1
               WHERE S0.emp_name_id = S1.emp_name_id)
       AND (S0.prev_date
            = (SELECT MAX(prev_date)
                 FROM SalaryHistory AS S2
                WHERE S0.emp_name_id = S2.emp_name_id)
            OR S0.prev_date IS NULL);

This is still complex, but that view might be useful for computing other statistics.
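The SalaryHistory approach of Answer #6 can be checked end to end with a short script. This is a sketch under stated assumptions: SQLite via Python's sqlite3 module as the engine, illustrative sample rows, and a view column list that names all five selected columns (emp_name_id included) so that the view definition is accepted.

```python
import sqlite3

# Illustrative sample rows (an assumption, not data from the book).
SAMPLE_ROWS = [
    ("Tom", "1996-06-20", 500.00),
    ("Tom", "1996-08-20", 700.00),
    ("Tom", "1996-10-20", 800.00),
    ("Tom", "1996-12-20", 900.00),
    ("Dick", "1996-06-20", 500.00),
    ("Harry", "1996-07-20", 500.00),
    ("Harry", "1996-09-20", 700.00),
]

# All (current, earlier) pairs per employee; the LEFT OUTER JOIN keeps
# a row with NULLs for a salary date that has no earlier one.
CREATE_VIEW = """
CREATE VIEW SalaryHistory
       (emp_name_id, curr_date, curr_amt, prev_date, prev_amt)
AS
SELECT S0.emp_name_id, S0.sal_date, S0.sal_amt, S1.sal_date, S1.sal_amt
  FROM Salaries AS S0
       LEFT OUTER JOIN Salaries AS S1
       ON S0.emp_name_id = S1.emp_name_id
          AND S0.sal_date > S1.sal_date
"""

# Keep the pair whose current date is the latest and whose previous
# date is the latest earlier one (or NULL when there is none).
FILTER_QUERY = """
SELECT S0.emp_name_id, S0.curr_date, S0.curr_amt, S0.prev_date, S0.prev_amt
  FROM SalaryHistory AS S0
 WHERE S0.curr_date = (SELECT MAX(curr_date)
                         FROM SalaryHistory AS S1
                        WHERE S0.emp_name_id = S1.emp_name_id)
   AND (S0.prev_date = (SELECT MAX(prev_date)
                          FROM SalaryHistory AS S2
                         WHERE S0.emp_name_id = S2.emp_name_id)
        OR S0.prev_date IS NULL)
"""


def salary_history_report():
    """Return {emp_name_id: (curr_date, curr_amt, prev_date, prev_amt)}."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Salaries ("
        " emp_name_id TEXT NOT NULL,"
        " sal_date TEXT NOT NULL,"
        " sal_amt REAL NOT NULL,"
        " PRIMARY KEY (emp_name_id, sal_date))"
    )
    conn.executemany("INSERT INTO Salaries VALUES (?, ?, ?)", SAMPLE_ROWS)
    conn.execute(CREATE_VIEW)
    rows = conn.execute(FILTER_QUERY).fetchall()
    conn.close()
    return {row[0]: row[1:] for row in rows}


if __name__ == "__main__":
    for name, values in sorted(salary_history_report().items()):
        print(name, values)
```

For Tom the view holds seven rows (six date pairs plus one NULL-padded row for his first salary), and the two MAX() filters narrow them to the single pair of the last two dates.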
Answer #7

Here is another version of the VIEW approach from MarkC600 on the SQL Server Newsgroup. The OUTER JOIN has been replaced with a RANK() function from SQL:2003. Study this and see how the thought pattern is changing:

    WITH SalaryRanks(emp_name, sal_date, sal_amt, pos)
    AS
    (SELECT emp_name, sal_date, sal_amt,
            RANK() OVER(PARTITION BY emp_name
                        ORDER BY sal_date DESC)
       FROM Salaries)
    SELECT C.emp_name,
           C.sal_date AS curr_date, C.sal_amt AS curr_amt,
           P.sal_date AS prev_date, P.sal_amt AS prev_amt
      FROM SalaryRanks AS C
           LEFT OUTER JOIN SalaryRanks AS P
           ON P.emp_name = C.emp_name
              AND P.pos = 2
     WHERE C.pos = 1;

Answer #8

Here is an SQL:2003 version, with OLAP functions and SQL-92 CASE expressions from Dieter Noeth:

    SELECT S1.emp_name,
           MAX (CASE WHEN rn = 1 THEN sal_date ELSE NULL END) AS curr_date,
           MAX (CASE WHEN rn = 1 THEN sal_amt ELSE NULL END) AS curr_amt,
           MAX (CASE WHEN rn = 2 THEN sal_date ELSE NULL END) AS prev_date,
           MAX (CASE WHEN rn = 2 THEN sal_amt ELSE NULL END) AS prev_amt
      FROM (SELECT emp_name, sal_date, sal_amt,
                   RANK() OVER (PARTITION BY emp_name
                                ORDER BY sal_date DESC)
              FROM Salaries) AS S1 (emp_name, sal_date, sal_amt, rn)
     WHERE rn < 3
     GROUP BY S1.emp_name;

The idea is to number the rows within each employee and then to pull out the rows for the two most recent salary dates. The other approaches build all the target output rows first and then find the ones we want. This query finds the raw rows first and puts them together last. The table is used only once, with no self-joins, but a hidden sort will be required for the RANK() function. This is probably not a problem in SQL engines that use contiguous storage or have indexing that will group the employee names together.
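The rank-then-pivot idea of Answer #8 can be sketched as a runnable script. The assumptions here: SQLite 3.25 or later (the first release with window functions) driven through Python's sqlite3 module, illustrative sample rows, and the rank named with AS rn inside the subquery because SQLite does not accept a column list after a derived-table alias.

```python
import sqlite3

# Illustrative sample rows (an assumption, not data from the book).
SAMPLE_ROWS = [
    ("Tom", "1996-06-20", 500.00),
    ("Tom", "1996-08-20", 700.00),
    ("Tom", "1996-10-20", 800.00),
    ("Tom", "1996-12-20", 900.00),
    ("Dick", "1996-06-20", 500.00),
    ("Harry", "1996-07-20", 500.00),
    ("Harry", "1996-09-20", 700.00),
]

# Rank each employee's rows by date (newest first), keep ranks 1 and 2,
# then fold the two rows into one with MAX(CASE ...).
PIVOT_QUERY = """
SELECT S1.emp_name,
       MAX(CASE WHEN rn = 1 THEN sal_date ELSE NULL END) AS curr_date,
       MAX(CASE WHEN rn = 1 THEN sal_amt ELSE NULL END) AS curr_amt,
       MAX(CASE WHEN rn = 2 THEN sal_date ELSE NULL END) AS prev_date,
       MAX(CASE WHEN rn = 2 THEN sal_amt ELSE NULL END) AS prev_amt
  FROM (SELECT emp_name, sal_date, sal_amt,
               RANK() OVER (PARTITION BY emp_name
                            ORDER BY sal_date DESC) AS rn
          FROM Salaries) AS S1
 WHERE rn < 3
 GROUP BY S1.emp_name
"""


def last_two_salaries_ranked():
    """Return {emp_name: (curr_date, curr_amt, prev_date, prev_amt)}."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Salaries ("
        " emp_name TEXT NOT NULL,"
        " sal_date TEXT NOT NULL,"
        " sal_amt REAL NOT NULL,"
        " PRIMARY KEY (emp_name, sal_date))"
    )
    conn.executemany("INSERT INTO Salaries VALUES (?, ?, ?)", SAMPLE_ROWS)
    rows = conn.execute(PIVOT_QUERY).fetchall()
    conn.close()
    return {row[0]: row[1:] for row in rows}


if __name__ == "__main__":
    for name, values in sorted(last_two_salaries_ranked().items()):
        print(name, values)
```

Each MAX() sees at most one non-NULL value per employee, so the aggregate is only a device for collapsing two rows into one; it does not pick a maximum in any meaningful sense.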
Answer #9

Here is another answer from Dieter Noeth using OLAP/CTE (tested on Teradata, but runs on MS-SQL 2005, too):

    WITH CTE (emp_name, sal_date, sal_amt, rn)
    AS
    (SELECT emp_name, sal_date, sal_amt,
            ROW_NUMBER() OVER (PARTITION BY emp_name
                               ORDER BY sal_date DESC) AS rn -- row numbering
       FROM Salaries)
    SELECT O.emp_name,
           O.sal_date AS curr_date, O.sal_amt AS curr_amt,
           I.sal_date AS prev_date, I.sal_amt AS prev_amt
      FROM CTE AS O
           LEFT OUTER JOIN CTE AS I
           ON O.emp_name = I.emp_name
              AND I.rn = 2
     WHERE O.rn = 1;

Again, SQL:2003 using OLAP functions in Teradata:

    SELECT emp_name, curr_date, curr_amt, prev_date, prev_amt
      FROM (SELECT emp_name,
                   sal_date AS curr_date,
                   sal_amt AS curr_amt,
                   MIN(sal_date)
                     OVER (PARTITION BY emp_name
                           ORDER BY sal_date DESC
                           ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)
                     AS prev_date,
                   MIN(sal_amt)
                     OVER (PARTITION BY emp_name
                           ORDER BY sal_date DESC
                           ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)
                     AS prev_amt,
                   ROW_NUMBER()
                     OVER (PARTITION BY emp_name
                           ORDER BY sal_date DESC) AS rn
              FROM Salaries) AS DT
     WHERE rn = 1;

This query would be easier if Teradata supported the WINDOW clause.

PUZZLE 16 MECHANICS

Gerard Manko at ARI posted this problem on CompuServe in April 1994. ARI had just switched over from Paradox to Watcom SQL (now part of Sybase). The conversion of the legacy database was done by making each Paradox table into a Watcom SQL table, without any thought of normalization or integrity rules, just copying the column names and data types. Yes, I know that as the SQL guru, I should have sent him to that ring of hell reserved for people who do not normalize, but that does not get the job done, and ARI’s approach is something I find in the real world all the time.

The system tracks teams of personnel to work on jobs. Each job has a slot for a single primary mechanic and a slot for a single optional assistant mechanic.
The tables involved look like this:

    CREATE TABLE Jobs
    (job_id INTEGER NOT NULL PRIMARY KEY,
     start_date DATE NOT NULL);

    CREATE TABLE Personnel
    (emp_id INTEGER NOT NULL PRIMARY KEY,
     emp_name CHAR(20) NOT NULL);

    CREATE TABLE Teams
    (job_id INTEGER NOT NULL,
     mech_type INTEGER NOT NULL,
     emp_id INTEGER NOT NULL);

Your first task is to add some integrity checking into the Teams table. Do not worry about normalization or the other tables for this problem.

What you want to do is build a query for a report that lists all the jobs by job_id, the primary mechanic (if any), and the assistant mechanic (if any). Here are some hints: You can get the job_ids from Jobs because that table has all of the current jobs, while the Teams table lists only those jobs for which a team has been assigned. The same person can be assigned as both a primary and assistant mechanic on the same job.

Answer #1

The first problem is to add referential integrity. The Teams table should probably be tied to the others with FOREIGN KEY references, and it is always a good idea to check the codes in the database schema, as follows:

    CREATE TABLE Teams
    (job_id INTEGER NOT NULL
            REFERENCES Jobs(job_id),
     mech_type CHAR(10) NOT NULL
            CHECK (mech_type IN ('Primary', 'Assistant')),
     emp_id INTEGER NOT NULL
            REFERENCES Personnel(emp_id));

Experienced SQL people will immediately think of using a LEFT OUTER JOIN, because to get the primary mechanics only, you could write:

    SELECT Jobs.job_id, Teams.emp_id AS "primary"
      FROM Jobs LEFT OUTER JOIN Teams
           ON Jobs.job_id = Teams.job_id
     WHERE Teams.mech_type = 'Primary';

You can do a similar OUTER JOIN to the Personnel table to tie it to Teams, but the problem here is that you want to do two independent outer joins for each mechanic’s slot on a team, and put the results in one table.
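One detail of the primary-mechanics query above is worth seeing on data: a test on Teams.mech_type in the WHERE clause is applied after the outer join, so jobs without a primary mechanic drop out, whereas the same test in the ON clause keeps every job and pads it with NULL. The sketch below assumes SQLite via Python's sqlite3 module; the job and employee numbers are made up for illustration, and the second query is a variation shown for comparison, not the author's query.

```python
import sqlite3

# The query as written in the text: filter applied in the WHERE clause.
FILTER_IN_WHERE = """
SELECT Jobs.job_id, Teams.emp_id AS "primary"
  FROM Jobs LEFT OUTER JOIN Teams
       ON Jobs.job_id = Teams.job_id
 WHERE Teams.mech_type = 'Primary'
 ORDER BY Jobs.job_id
"""

# Variation for comparison: the same filter moved into the ON clause.
FILTER_IN_ON = """
SELECT Jobs.job_id, Teams.emp_id AS "primary"
  FROM Jobs LEFT OUTER JOIN Teams
       ON Jobs.job_id = Teams.job_id
          AND Teams.mech_type = 'Primary'
 ORDER BY Jobs.job_id
"""


def primary_mechanics():
    """Return (rows with filter in WHERE, rows with filter in ON)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Jobs
        (job_id INTEGER NOT NULL PRIMARY KEY);

        CREATE TABLE Teams
        (job_id INTEGER NOT NULL REFERENCES Jobs(job_id),
         mech_type CHAR(10) NOT NULL
           CHECK (mech_type IN ('Primary', 'Assistant')),
         emp_id INTEGER NOT NULL);

        -- Hypothetical data: job 1 has both slots filled, job 2 has
        -- only an assistant, job 3 has no team at all.
        INSERT INTO Jobs VALUES (1), (2), (3);
        INSERT INTO Teams VALUES
          (1, 'Primary', 10), (1, 'Assistant', 20), (2, 'Assistant', 20);
    """)
    where_rows = conn.execute(FILTER_IN_WHERE).fetchall()
    on_rows = conn.execute(FILTER_IN_ON).fetchall()
    conn.close()
    return where_rows, on_rows


if __name__ == "__main__":
    where_rows, on_rows = primary_mechanics()
    print("filter in WHERE:", where_rows)
    print("filter in ON:   ", on_rows)
```

With this data the WHERE form returns only job 1, while the ON form returns all three jobs with NULL for jobs 2 and 3, which is the shape the report needs.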
It is probably possible to build a horrible, deeply nested self OUTER JOIN all in one SELECT statement, but you would not be able to read or understand it. You could do the report with views for primary and assistant mechanics, and then put them together, but you can avoid all of this mess with the following query:

    SELECT Jobs.job_id,
           (SELECT emp_id
              FROM Teams
             WHERE Jobs.job_id = Teams.job_id
               AND Teams.mech_type = 'Primary') AS "primary",
           (SELECT emp_id
              FROM Teams
             WHERE Jobs.job_id = Teams.job_id
               AND Teams.mech_type = 'Assistant') AS assistant
      FROM Jobs;

The reason that "primary" is in double quotation marks is that it is a reserved word in SQL-92, as in PRIMARY KEY. The double quotation marks make the word into an identifier. When the same word is in single quotation marks, it is treated as a character string.

One trick is the ability to use two independent scalar SELECT statements in the outermost SELECT. To add the employee’s name, simply change the innermost SELECT statements.

    SELECT Jobs.job_id,
           (SELECT emp_name
              FROM Teams, Personnel
             WHERE Jobs.job_id = Teams.job_id
               AND Personnel.emp_id = Teams.emp_id
               AND Teams.mech_type = 'Primary') AS "primary",
           (SELECT emp_name
              FROM Teams, Personnel
             WHERE Jobs.job_id = Teams.job_id
               AND Personnel.emp_id = Teams.emp_id
               AND Teams.mech_type = 'Assistant') AS assistant
      FROM Jobs;

If you have an employee acting as both primary and assistant mechanic on a single job, then you will get that employee in both slots. If you have two or more primary mechanics or two or more assistant mechanics on a job, then you will get an error, as you should. If you have no primary or assistant mechanic, then you will get an empty SELECT result, which becomes a NULL. That gives you the outer joins you wanted.

Answer #2

Skip Lees of Chico, California, wanted to make the Teams table enforce the rules that:

1. A job_id has zero or one primary mechanics.
2. A job_id has zero or one assistant mechanics.
3. A job_id always has at least one mechanic of some kind.

[...]
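The scalar-subquery report from Answer #1 can be sketched as a runnable script. Assumptions: SQLite via Python's sqlite3 module, made-up job and personnel rows, and a PRIMARY KEY (job_id, mech_type) on Teams added here for illustration. The key matters in this engine because SQLite does not raise an error when a scalar subquery returns several rows (it uses the first one), so the "you will get an error" behavior described in the text has to come from a constraint instead.

```python
import sqlite3

SCHEMA_AND_DATA = """
CREATE TABLE Jobs
(job_id INTEGER NOT NULL PRIMARY KEY);

CREATE TABLE Personnel
(emp_id INTEGER NOT NULL PRIMARY KEY,
 emp_name CHAR(20) NOT NULL);

CREATE TABLE Teams
(job_id INTEGER NOT NULL REFERENCES Jobs(job_id),
 mech_type CHAR(10) NOT NULL
   CHECK (mech_type IN ('Primary', 'Assistant')),
 emp_id INTEGER NOT NULL REFERENCES Personnel(emp_id),
 PRIMARY KEY (job_id, mech_type));  -- assumption: one row per slot

-- Hypothetical data: job 2 uses the same person in both slots,
-- job 3 has only an assistant, job 4 has no team.
INSERT INTO Jobs VALUES (1), (2), (3), (4);
INSERT INTO Personnel VALUES (10, 'Alice'), (20, 'Bob');
INSERT INTO Teams VALUES
  (1, 'Primary', 10), (1, 'Assistant', 20),
  (2, 'Primary', 20), (2, 'Assistant', 20),
  (3, 'Assistant', 10);
"""

# Two independent scalar subqueries, one per mechanic slot.
REPORT_QUERY = """
SELECT Jobs.job_id,
       (SELECT P.emp_name
          FROM Teams AS T, Personnel AS P
         WHERE Jobs.job_id = T.job_id
           AND P.emp_id = T.emp_id
           AND T.mech_type = 'Primary') AS "primary",
       (SELECT P.emp_name
          FROM Teams AS T, Personnel AS P
         WHERE Jobs.job_id = T.job_id
           AND P.emp_id = T.emp_id
           AND T.mech_type = 'Assistant') AS assistant
  FROM Jobs
 ORDER BY Jobs.job_id
"""


def build_db():
    """Create the in-memory database with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    return conn


def mechanics_report(conn):
    """Return [(job_id, primary name or None, assistant name or None)]."""
    return conn.execute(REPORT_QUERY).fetchall()


if __name__ == "__main__":
    connection = build_db()
    for row in mechanics_report(connection):
        print(row)
    connection.close()
```

An empty scalar subquery yields NULL, which is why jobs 3 and 4 still appear in the report; that is the outer-join effect the text describes, obtained without writing any OUTER JOIN.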

Date posted: 21/01/2014, 08:20
