Thanks to everyone for the informative comments on my previous post.
I found a bug in my simulation code: many of the tests were being performed on arrays that were half-full of zeroes (oops). This distorted the stats a little, and more so for larger N, because I tested fewer distinct arrays at large N. The bug caused the apparent increase in s.d. for large N: the corrected code has the same s.d. for all N. It also inflated the maximum number of iterations required to find an entry. After the fix, the maximum number of iterations drops to log(log(N))+12 for the pure secant method, and (log(N)+14)/2 for Junio's mixed secant/bisection method. Altogether much better behaved.
no subject
Date: 2009-07-30 19:17 (UTC)

Does your simulation bear this out? (The relative costs of division and memory lookup have changed since Knuth was writing §6.2.1, so it might not be true any more.)
no subject
Date: 2009-07-30 23:19 (UTC)

no subject
Date: 2009-07-31 11:30 (UTC)

In git, there's a 256-entry meta-index which reduces the number of probes in a bisection search by 8, but is probably less useful than a single iteration of the secant method when N > 2^16. Each index entry is 24 bytes, so 168 of them fit in a page. This implies that git won't see a clear benefit from the secant method until you get packs of 350 million objects, which is not really feasible when pack files are limited to 4GB :-) But this does explain why it isn't a win when large repositories currently have one or two million objects.
no subject
Date: 2009-07-30 21:34 (UTC)

no subject
Date: 2009-07-30 22:54 (UTC)