Birkhäuser Boston. Daniel H. Greene, Donald E. Knuth. Mathematics for the Analysis of Algorithms

Modern Birkhäuser Classics

Many of the original research and survey monographs in pure and applied mathematics published by Birkhäuser in recent decades have been groundbreaking and have come to be regarded as foundational to the subject. Through the MBC Series, a select number of these modern classics, entirely uncorrected, are being re-released in paperback (and as eBooks) to ensure that these treasures remain accessible to new generations of students, scholars, and researchers.

Mathematics for the Analysis of Algorithms
Third Edition
Daniel H. Greene
Donald E. Knuth
Reprint of the 1990 Edition
Birkhäuser Boston Basel Berlin

Daniel H. Greene, Computer Science Laboratory, Xerox Palo Alto Research Center, Stanford, CA 94304, U.S.A.
Donald E. Knuth, Department of Computer Science, Stanford University, Stanford, CA 94305, U.S.A.

Originally published as Volume 1 in the series Progress in Computer Science and Applied Logic.
Cover design by Alex Gerasev.
Mathematics Subject Classification: 34E10, 34M30, 41A60, 65Q05, 68Q25, 68W40
Library of Congress Control Number: 2007936766
ISBN-13: 978-0-8176-4728-5
e-ISBN-13: 978-0-8176-4729-2
Printed on acid-free paper.

Birkhäuser Boston. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Birkhäuser Boston, c/o Springer Science+Business Media LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

www.birkhauser.com (IBT)

Daniel H. Greene, Donald E. Knuth
Mathematics for the Analysis of Algorithms
Third Edition
1990
Birkhäuser Boston Basel Berlin

Library of Congress Cataloging-in-Publication Data
Greene, Daniel H., 1955-
Mathematics for the analysis of algorithms / Daniel H. Greene, Donald E. Knuth. 3rd ed.
p. cm. (Progress in computer science and applied logic ; no. 1)
Includes bibliographical references and index.
ISBN 0-8176-3515-7 (alk. paper)
1. Electronic digital computers--Programming. 2. Algorithms. I. Knuth, Donald Ervin, 1938- . II. Title. III. Series.
QA76.6 G7423 1990   005.1-dc   90-517

Printed on acid-free paper.

Birkhäuser Boston, 1981, 1982, 1990. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior permission of the copyright owner. Permission to photocopy for internal or personal use, or the internal or personal use of specific clients, is granted by Birkhäuser Boston, Inc., for libraries and other users registered with the Copyright Clearance Center (CCC), provided that the base fee of $0.00 per copy, plus $0.20 per page is paid directly to CCC, 21 Congress Street, Salem, MA 01970, U.S.A. Special requests should be addressed directly to Birkhäuser Boston, Inc., 675 Massachusetts Avenue, Cambridge, MA 02139, U.S.A.

ISBN 0-8176-3515-7
ISBN 3-7643-3515-7

Photocomposed copy prepared with TeX using the UL~TS~ONT system.
Printed and bound by R.R. Donnelly & Sons, Harrisonburg, VA, U.S.A.
Printed in the U.S.A.

Preface

This monograph is derived from an advanced course in
computer science at Stanford University on the analysis of algorithms. The course presents examples of the major paradigms used in the precise analysis of algorithms, emphasizing some of the more difficult techniques. Much of the material is drawn from the starred sections of The Art of Computer Programming, Volume 3 [Knuth III].

Analysis of algorithms, as a discipline, relies heavily on both computer science and mathematics. This report is a mathematical look at the synthesis, emphasizing the mathematical perspective, but using motivation and examples from computer science. It covers binomial identities, recurrence relations, operator methods and asymptotic analysis, hopefully in a format that is terse enough for easy reference and yet detailed enough to be of use to those who have not attended the lectures. However, it is assumed that the reader is familiar with the fundamentals of complex variable theory and combinatorial analysis.

Winter 1980 was the fourth offering of Analysis of Algorithms, and credit is due to the previous teachers and staff, Leo Guibas, Scott Drysdale, Sam Bent, Andy Yao, and Phyllis Winkler, for their detailed contributions to the documentation of the course. Portions of earlier handouts are incorporated in this monograph. Harry Mairson, Andrei Broder, Ken Clarkson, and Jeff Vitter contributed helpful comments and corrections, and the preparation of these notes was also aided by the facilities of Xerox Corporation and the support of NSF and Hertz graduate fellowships.

In this third edition we have made a few improvements to the exposition and fixed a variety of minor errors. We have also added several new appendices containing exam problems from 1982 and 1988.

D.H.G. and D.E.K.

Contents

1 Binomial Identities
1.1 Summary of Useful Identities
1.2 Deriving the Identities
1.3 Inverse Relations
1.4 Operator Calculus
1.5 Hypergeometric Series
1.6 Identities with the Harmonic Numbers
2 Recurrence Relations
2.1 Linear Recurrence Relations
2.1.1 Finite History
2.1.1.1 Constant Coefficients
2.1.1.2 Variable Coefficients
2.1.2 Full History
2.1.2.1 Differencing
2.1.2.2 By Repertoire
2.2 Nonlinear Recurrence Relations
2.2.1 Relations with Maximum or Minimum Functions
2.2.2 Continued Fractions and Hidden Linear Recurrences
2.2.3 Doubly Exponential Sequences
3 Operator Methods
3.1 The Cookie Monster
3.2 Coalesced Hashing
3.3 Open Addressing: Uniform Hashing
3.4 Open Addressing: Secondary Clustering

APPENDICES

We proceed as in [Knuth III; exercise 5.2.2-54] to represent the sum as

$$\frac{(-1)^n\, n!}{2\pi i} \oint \frac{dz}{z(z-1)\ldots(z-n)(Q^z-1)}$$

where the contour encircles $\{1, \ldots, n\}$ and no other poles. If we increase the contour to a large rectangle whose upper and lower segments have imaginary part $\pm 2\pi(N+\frac12)/\ln Q$ where $N$ is an integer, the contour integral approaches zero, so the sum of the residues inside approaches zero. The residue at 0 is the coefficient of $z$ in

$$\frac{1}{(1-z)(1-\frac12 z)\ldots(1-\frac1n z)(1+\frac12 z\ln Q+\cdots)\ln Q},$$

namely $(H_n - \frac12\ln Q)/\ln Q$. The sum of residues at $1, \ldots, n$ is $-W$. And the sum of residues at $ibm$ and $-ibm$, where $b = 2\pi/\ln Q$ and $m \ge 1$, is $1/\ln Q$ times twice the real part of

$$B(n+1, ibm) = \frac{n!}{(ibm)(ibm+1)\ldots(ibm+n)} = \Gamma(ibm)\, n^{-ibm}\bigl(1 + O(n^{-1})\bigr).$$

(The last estimate comes by expanding in terms of generalized Stirling numbers; for example, we have … See [GKP; exercise 9.44].) Now $|\Gamma(ibm)\, n^{-ibm}| = |\Gamma(ibm)| = O(e^{-\pi b m/2})$, so we have

$$W = \frac{H_n}{\ln Q} - \frac12 + \frac{2}{\ln Q} \sum_{m\ge1} \Re\bigl(\Gamma(ibm)\, n^{-ibm}\bigr) + O(n^{-1}).$$

The sum is a bounded function $f(n)$ that is periodic in the sense that $f(n) = f(Qn)$. Tomás Feder used Euler's summation formula to deduce the remarkable representation

$$f(n) = \int_0^\infty \Bigl(\Bigl(\frac{\log(u/n)}{\log Q}\Bigr)\Bigr)\, e^{-u}\, du,$$

where $((x))$ is the sawtooth function [Knuth II].

SOLUTIONS TO FINAL EXAM III

Solution to Problem. Let $g(x) = (e^{-x^2} - 1)/x$ and $f(x) = g(x/\sqrt{n})$. Then …

… $(r - |a|)^2 + b^2$, if $a(r^2 + a^2 + b^2) > 2r(a^2 + b^2)$; $\min\bigl((r - |a|)^2 + b^2,\ |b(r^2 - a^2 - b^2)|/|a + ib|\bigr)$, otherwise. (The proof is by setting $z = re^{i\theta}$ and taking the derivative with respect to $\theta$. Extrema occur when $\sin\theta = 0$ or when we have $\cos\theta = a(r^2 + a^2 + b^2)/(2r(a^2 + b^2))$.)

Unfortunately this idea isn't enough by itself; the product of all these bounds turns out to be less than $r^{16}$. Better bounds are possible if we use the inequality $|z - r_k| \ge \bigl|\,|r_k - r| - |z - r|\,\bigr|$. Then if $|r_k - r| > .5$ we can conclude that $|z - r_k| \ge |r_k - r| - .5$, whenever $|z - r| \le .5$; similarly if $|r_k - r| < .5$ we can conclude that $|z - r_k| \ge .5 - |r_k - r|$, whenever $|z - r| \ge .5$.

Putting these ideas together yields a rigorous proof that $|A(z)| > |R(z)|$ for all $z$ on the circle $|z| = r$, for any choice of $r$ between .59 and .68. (See the attached MACSYMA transcript. The computed values $r_1, \ldots, r_9$ are only approximations to the true roots of $A(z)$; but the fact that the difference $(z - r_1)\ldots(z - r_9) - A(z)$ has very small coefficients implies that our calculations are plenty accurate when $|z| < 1$.)
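As a sanity check on the closed-form bound for a conjugate pair of roots $a \pm ib$ given above, the following Python sketch compares it with a brute-force minimum of $|z-(a+ib)|\,|z-(a-ib)|$ sampled around the circle $|z| = r$. The function names `pair_bound` and `sampled_min` are hypothetical, chosen only for this illustration. The sketch also assumes that the case test uses $|a|$, which makes the closed form the exact minimum for negative $a$ as well; the transcript's own `bound1` tests `a` itself.

```python
import math


def pair_bound(a: float, b: float, r: float) -> float:
    """Closed-form minimum of |z-(a+ib)| * |z-(a-ib)| over the circle |z| = r."""
    s = a * a + b * b
    # Value at the real point of the circle nearest to a (where sin(theta) = 0).
    t = (r - abs(a)) ** 2 + b * b
    # Assumption: compare with |a|, so the test is symmetric in the sign of a.
    if abs(a) * (r * r + s) > 2 * r * s:
        return t  # no critical point with |cos(theta)| <= 1
    return min(t, abs(b * (r * r - s)) / math.sqrt(s))


def sampled_min(a: float, b: float, r: float, samples: int = 200_000) -> float:
    """Brute-force the same minimum by sampling theta uniformly on [0, 2*pi)."""
    root = complex(a, b)
    best = math.inf
    for j in range(samples):
        theta = 2 * math.pi * j / samples
        z = complex(r * math.cos(theta), r * math.sin(theta))
        best = min(best, abs(z - root) * abs(z - root.conjugate()))
    return best
```

Calling both functions with the same `(a, b, r)` and comparing the results exercises each branch of the case analysis: the first branch when the root pair lies far from the circle along the real axis, the second when the critical value of $\cos\theta$ falls inside $[-1, 1]$.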
Consequently Rouché's theorem applies, and $Q(z)$ has exactly one root $\rho_0$ inside $|z| = r$. This root is real, and Newton's method converges quickly to $\rho_0 = 0.57614876914275660229786$. The contour integral $\frac{1}{2\pi i}\oint \frac{P(z)\,dz}{Q(z)\,z^{n+1}}$ is $O(r^{-n})$, and the sum of residues inside is

$$f_n + \frac{P(\rho_0)}{\rho_0^{\,n+1}\, Q'(\rho_0)}.$$

Hence we have $f_n = c_0\rho_0^{-n} + O(r^{-n})$, where $c_0 = -P(\rho_0)/(\rho_0 Q'(\rho_0))$; numerically

$1/\rho_0 = 1.7356628245303472565826$; $c_0 = 0.312363324596741453066279$.

It turns out that the next largest root of $Q(z)$ is also real; it is $\rho_1 = .81559980$; $c_1 = -P(\rho_1)/(\rho_1 Q'(\rho_1)) = .03795269$. The graph of $Q(z)$ looks like this for $\ldots \le z \le .9$:

[Figure: graph of Q(z); vertical axis marked +0.1, 0, -0.1]

There is another root between .88 and .89.

To check, Odlyzko computed $f_{120} = 17002133686539084706594617194$, and found that $f_{120} - c_0/\rho_0^{120} \approx 1.6 \times 10^9$. If we subtract $c_1/\rho_1^{120}$ the error goes down to $1.3 \times 10^5$. (Odlyzko's work was published in [Odlyzko 88] after this exam was given.)

This is MACSYMA 304

(C1) t(k,z):=z↑(k↑2)/prod(1-z↑j,j,1,k);
(C2) q(n,z):=sum((-1)↑k*t(k,z),k,0,n);
(C3) a:num(factor(q(3,z)));
(D3) Z↑9 + Z↑7 + 2 Z↑6 - Z↑5 - 3 Z↑4 - Z↑3 + Z↑2 + 2 Z - 1
(C4) allroots(a);
(C5) for n thru 9 print(n,r[n]:rhs(part(d4,n)),abs(r[n]));
1   0.575774066   0.575774066
2   0.81792161 %I - 0.469966464   0.94332615
3   - 0.81792161 %I - 0.469966464   0.94332615
4   0.07522564 %I + 0.74832744   0.75209896
5   0.74832744 - 0.07522564 %I   0.75209896
6   0.36716983 %I - 1.05926119   1.1210923
7   - 0.36716983 %I - 1.05926119   1.1210923
8   1.58184962 %I + 0.493013173   1.65689777
9   0.493013173 - 1.58184962 %I   1.65689777
(C6) rmax(r):=r↑16/(1-r↑4)/(1-r↑9/(1-r↑5));
(C7) bound1(a,b,r):=block([t,s],s:a↑2+b↑2,t:(r-abs(a))↑2+b↑2, if a*(r↑2+s)>2*r*s then t else min(t,abs(b*(r↑2-s))/sqrt(s)));
(C8) bound…(a,b,r):=block( … if s … .5 then t else max(t,(s-.5)↑2));
(C10) amin1(r):=(r-r[1])*prod(bound2(realpart(r[2*k]),imagpart(r[2*k]),r),k,1,4) +0*"a lower bound for all z such that |z-r|>=.5";
(C11) amin2(r):=(r-r[1])*prod(bound3(realpart(r[2*k]),imagpart(r[2*k]),r),k,1,4) +0*"a lower bound for all z such that |z-r|<=.5";
(C12) amin(r):=min(amin1(r),amin2(r));
(C13) for n:58 thru 70 print(n,rmax(n*.01),amin(n*.01));
58   …   …
59   2.4762821E-4   4.7992996E-4
60   3.2769739E-4   8.2895893E-4
61   4.320998E-4   1.18362144E-3
62   5.6784198E-4   1.54014562E-3
63   7.438718E-4   1.89452055E-3
64   9.7160927E-4   2.24249464E-3
65   1.26562865E-3   2.57957187E-3
66   1.6445353E-3   2.90100428E-3
67   2.13209912E-3   3.19357002E-3
68   2.75873208E-3   3.17922997E-3
69   3.56342027E-3   3.1048643E-3
70   4.596277E-3   2.92984536E-3
(C14) qprime(n,z):=sum((-1)↑k*t(k,z)*logtprime(k,z),k,0,n);
(C15) logtprime(k,z):=k↑2/z+sum(j*z↑(j-1)/(1-z↑j),j,1,k);
(C16) loop(z):=block([zo,zn],zo:0,zn:z, while abs(zo-zn)>10↑-10 do (zo:zn,print(zn:iterate(zo))),zo);
(C17) t(8,.59)+0*"an upper bound on the alternating sum Q(.59)-Q(7,.59)";
(D17) 1.3545675E-14
(C18) iterate(z):=bfloat(z-q(7,z)/qprime(7,z));
(C19) loop(5B-1);
5.761132798756077B-1
5.761487662923891B-1
5.761487691427566B-1
5.761487691427566B-1
(D19) 5.761487691427566B-1
(C20) p(n,z):=sum((-1)↑k*t(k,z)*z↑k,k,0,n);
(C21) c(rho):=-p(7,rho)/(rho*qprime(7,rho));
(C22) c(d19);
(D22) 3.123633245967415B-1
(C23) expand(prod(z-r[k],k,1,9)-d3);
(D23) … (a polynomial in Z whose surviving coefficients range in magnitude from 7.4505806E-9 to 2.98023224E-7)
(C24) "The sum of the absolute values of those coefficients is an upper bound on the difference between the true A(z) and the polynomial that is bounded by amin";

Appendix I: A Qualifying Exam Problem and Solution

Qual Problem

The result of a recent midterm problem was to analyze LBTs and to show that their average path length is about the same as that of ordinary binary search trees. But shortly after the midterm was graded, our sources discovered that Quick was undaunted by that analysis. According to reliable reports, he has recently decided to try salvaging his idea by including new information in each node.

The nodes in Quick's new data structures, which he calls ILBTs (Improved Late Binding Trees), contain a size field that tells how many leaves are in the subtree rooted at that node. Step (4) on page 102 is now replaced by a new step: When a branch node is being split, the insertion continues in whichever subtree is currently smaller. (If the subtree sizes are equal, a random decision is made as before.)

The purpose of this problem is to carry out a "top level" analysis of Quick's new algorithm. Let $p_{nk}$ be the probability that the root is $(k\,..\,k+1)$ after inserting a random permutation of $\{1, \ldots, n\}$. (We assume that all permutations of the $x$'s are equally likely; first $x_1$ is made into an ILBT by itself, then $x_2$ through $x_n$ are inserted one by one.) Let $P_{nk} = n!\,p_{nk}$. Then it can be verified that we have the following values of $P_{nk}$ for $1 \le k < n$ … and … $n - \ldots - k$ or ($k \ldots n - \ldots - k$ and a random coin flip comes up tails). (4) $x_n > k + \ldots$ and $x_1 \ldots x_{n-1}$ leads to the root $(k\,..\,k+1)$. Therefore we find, for $1 \le k < n$ and $n > 2$,

$$P_{nk} = P_{(n-1)(k-1)}\bigl(k - \ldots + [n + \ldots > 2k] + \tfrac12[n + \ldots = 2k]\bigr) + P_{(n-1)k}\bigl(n - k - \ldots + [n - \ldots < 2k] + \tfrac12[n - \ldots = 2k]\bigr).$$

(b) It is easy to see that $P_{nk} = P_{n(n-k)}$, so $Q_{nk} = Q_{n(n-k)}$. Thus it suffices to consider k
Pnk T h e n it can be verified that we have the following values of Pnk for < k < n a n d l _ n - - k or (k - n - - k a n d a r a n d o m coin flip comes up tails) (4) xn > k + and X l x n - leads to the root ( k k + 1) Therefore we find, for _< k < n and n > 2, Pnk P(n-1)(k-1)(k - + [n + > 2k] + l[n + = 2k]) + P(n-1)k (n k - + I n - < 2k] + = 2k]) (b) It is easy to see t h a t Pnk = Pn(n-k), SO Qnk Qn(n-k) T h u s it suffices to consider k