benf.org : excel stuff : registration cost |

I've recently been registering large amounts of addin functions using the excel4/12 C api (xlfRegister)

It's become apparent that registration, once you're over about 15,000 functions, starts to become expensive, so I've done a little digging to determine why, and what (if anything can be done about it)

- Function name
- Function pointer

The below code snippet registers 300,000 addin functions (all pointing to the same symbol for simplicity) - In the example below, the function names alternate lower/upper case, and are registered in reverse case-insensitive lexicographic order, it should be simple to adapt the code for the various cases I describe below if you care to verify.

static LPSTR rgFuncs |

Playing with various permutations of this registration, I've produced the following graphs, where each data point (as can be seen from the code) is the cost of registering 100 functions, in seconds. Note that the data's so striking I've not bothered providing data from multiple runs, though I did verify it. ;)

Also, the figures I've provided are for excel 2007. The data had an identical shape on 2003, though costs were slightly different. (2003 was cheaper, for me.)

Total time 1285.2s. *(thus total cost is O(N^2)).*

Total time 6.4s. *(thus total cost is O(N)).*

Total time 430.5s

Total time 6.6s

The fact that (2) and (4) have identical performance characteristics strongly implies that excel is, upon registration, storing these functions in ascending case insensitive order. As there is no increase in cost for larger numbers of functions, this implies no buffer move, hence a linked list. (Though there are spikes every 32k-ish functions, which might indicate a slab alloc?)

The profile of (3) bears this out - for the first half, the insertion is done at the front of the list (cheap), for the second half, the insertion is done moving from the back to the front of the list, i.e. scanning less and less.

I briefly verifed that categories are (again, in 2k7) stored *unsorted*, and a linear scan is performed to lookup the internal category id. However, categories are clamped at 255, so this isn't such an issue.

The above was all performed by registering pointers to *the same* target pointer. Unfortunately, that's not liable to be very useful in the real world...

As of excel 2007, Excel performs a linear scan of previously registered functions to find if the function you are registering has been addressed before.

This makes function registration *significantly* more expensive! Below are a few graphs to demonstrate the cost. (30k functions, y (times) are in seconds).

*Note also that I've graphed cumulative times (rather than individual as above), so (of course) a graph of Y=nX indicates O(1) cost.*

In each case, I register 30,000 functions, always with the names dyn_99999 -> dyn_70000. (so optimally named, as determined above.)

We see the *massive* difference between excel 2003 and 2007. For sake of completion (they were available to me), I've show numbers for excel 95, and 2010 also.

2010 takes *almost 10 times* longer to register functions than excel 95 - approximately 2 minutes!

Here, I'm verifying that we see the same behaviour as in the incrementing case - i.e. that the actual order of pointers doesn't matter. I infer that this is because the internal function list is stored once, sorted on function name, therefore the pointer order is not relevant.

As we might expect, we see that the cost of registering a function which is already *at the front of*
the list (as the list is sorted alphabetically) is the time it takes to find it, i.e. 0(1). Therefore total cost
increases linearly with number of functions registered, as soon as we start repeating. Excel 2007 cost is the same as 2010, incase you can't see the plot!

I performed this test twice, once repeating the *first* address registred, once repeating the *last*
unique address (i.e. the 10,000th registered).
Essentially they were the same, since as soon as we've re-registered one function, it is at the start of the function list.

This is the more interesting/depressing one. I would have expected the cost of this to only involve searching the first two elements of the list of functions,
as alternating points means we are never searching beyond the first two. However, unfortunately this has identical behaviour to adding new functions, which indicates a full list scan, *i.e.* only if a pointer is the same as *the last registered* function do we optimise away the list scan.

While the costs involved are not particularly painful for excel <=2003, registering functions rapidly becomes extremely expensive in 2007+, with each successive version of excel appearing to become more expensive.

Unfortunately, it does not appear that there are any orderings for registration which will ameliorate this; As a personal opinion, I have trouble seeing what benefit there is to aliasing multiple registrations of the same pointer - it seems like a premature optimisation to me, which for large scales causes a lot more problems than it might fix.

Please note of course that this is using entirely synthetic data, and your mileage is sure to vary in the real world.

I have used the excel4 sdk, however the excel12 sdk does not improve matters in excel 2007+, as the interfacing costs for backwards compatibility are extremely cheap.

I'd be happy to discuss this; feel free to drop me an email.

Last updated 10/2011 |