Save performance of spill prevention (Excel 365)

benf.org : excel stuff : Save performance of spill prevention (Excel 365)

TL;DR;

Save performance in excel 365 can be terrible, because the parser uses an O(n) lookup, and the parser is invoked when saving spill restrictions.

We can fix this, by introducing a memoizer!

Overview

Prior to Excel 365, (added in a release some time in 2020), if you returned a range from an addin function, you would have to use an array formula to get it to display. Excel 365 now has 'spilling' functionality. This will, when given an array return, automatically consume as many cells as required to display all of the result. (or fail, if there's not enough space available :( )

If you use the 'implicit intersection' or 'spill prevention' operator '@' in Excel365 (to disable spilling, and return to the behaviour previously had), the performance of saving sheets which involve calls to UDFs is.... terrible. I've seen spreadsheets take over an hour to save.....

It's clear that when using spill prevention, excel invokes parsing code, which does a linear scan of the list of registered UDFs, meaning the time taken to save your formula depends on how many type UDFs are registered, and where the one you're calling comes, alphabetically!

Performance

I use an Excel addin which registers 10,000 functions, named dummyFunc00000 to named dummyFunc09999 (see resources at bottom).

These functions are dynamically generated, but are equivalent to

__declspec(cdecl) extern "C" LPXLOPER12 dummyFunc00001(LPXLOPER12* pargs) {
	static XLOPER12 res;
	res.xltype = xltypeInt;
	res.val.w = 1;
	return &res;
}

.... and so on. (This isn't strictly necessary - we could just have 10000 functions all pointing to the same code.)

I then generate a simple sheet, which calls a given function 8192 times, either using spilling....

....or using the intersection operator '@', which disables spilling, and restores pre-spill behaviour

Note that none of my functions ACTUALLY spill. Indeed, (see below), you don't even have to use a type that could spill. If you return an int, but mark it with the spill restriction operator, you still pay the cost.

In these screenshots, I've used dummyFunc09500. Below is the cost, in milliseconds, to save a sheet composed of 8192 calls to dummyFuncXXXXX, where I measure XXXXX every 500.

That is, we save once with 8k calls to dummyFunc00000, once with 8k calls to dummyFunc00500, and so on.

The sheet to generate these timings is in the resources below, but, the key bit is literally just the below:

   Dim strfrm As String
   For x = 0 To 9999 Step 500
        strfrm = "=@dummyFunc" & Format(Str(x), "00000") & "(1,2)"
        ' Use /formula2/ because formula prefixes @ implicitly as it's a compatibility property
        rng.Formula2 = strfrm
        Call qpc(startTime)
        Call ActiveWorkbook.Save
        Call qpc(endTime)
        
        elapsed = (endTime - startTime) / perSecond
        Debug.Print "" & x & "|" & (elapsed * 1000)
    Next x

As you can see, the cost to save increases perfectly linearly with the name of the function being used. When we're saving 8k calls to @dummyFunc00000, it costs 189ms. When we're saving 8k calls to @dummyFunc9500, it costs 4148ms.

This is a toy sheet. In practice, it's easy to make a sheet which takes many minutes/hours to save. :(

Is it all UDFs?

Unfortuantely, yes!

The only type of UDF which could conceivably spill is one returning an XLOPER type (Q/R) (i.e. it could return a range.) An addin function returning a single int couldn't spill, so the spill prevention is meaningless. If our addin function looks like this:

__declspec(cdecl) extern "C" int myIdentity(int x) {
	return x;
}

There's simply no way it could spill. However, it's still legal to mark it with a spill prevention operator.

So.... why?

Given the graph above, we're reasonably confident there's a linear scan going on. At this point we trot out IDA Free.....

If we attach to a process with the sheet/addin described above, and search for "dummyFunc09000", assuming that we're using 16bit characters....

We find two instances in memory; looking at one, we can see it's a length (circled) prefixed wide string - let's set a hardware breakpoint on each.

We then try to save a simple sheet containing only =@dummyFunc09999(1,2) in a single cell.

Bang - we access our unrelated function name. And we can see, that, oddly, it's being considered vs the name we're trying to save....

Ok, that's pretty odd - why on earth would we need to look at dummyFunc09000 when we're saving a sheet that doesn't go near it? And what else are we searching for?

If we repeat this process with dummyFunc00000, and look around the area of the comparison, we quickly see this...

I've commented the line, but it's clearly advancing to the next item in a linked list. In fact, the type of the data that's being walked looks like this

struct func_item {
  int flags;
  func_item *pNext;
  BSTR name; // length prefixed, wide string.
  ... others.
}

We can try this out in the IDC REPL.

... but it's nicer to usa a tiny little IDC script (no IDA Python, we're using IDA Free!)

static printAddinFunctions(ea) {
   auto curr = ea;
   while (curr) {
     print(get_strlit_contents(read_dbg_dword(curr+8), -1, 9));
     curr =  read_dbg_dword(curr+4);  
   }
}

And run it over our register containing the start of the linked list (ebx).. And, lo.....

IDC>printAddinFunctions(ebx)
"dummyFunc00000"
"dummyFunc00001"
"dummyFunc00002"
"dummyFunc00003"
"dummyFunc00004"
"dummyFunc00005"
"dummyFunc00006"
....

... yup. This is a sorted linked list containing all our functions. (I previously discussed the ordering of registered functions here).

Ok, so why is it fast when not spilling?

At this point, with my breakpoints in place, I switched to saving a sheet which didn't use spill restriction. The code path wasn't called at all.

But. I could trigger the same code paths when editing formulae, and again observe the linear scan of functions.

It's clear that the problem is the parsing code, which is O(N), due to using a linked list of addin functions. The parsing code (for some reason!) is invoked when we save cells that use '@'.

.... Ok - can we fix this?

at this point, I've switched to the 64 bit version of Excel 365

It's hard to improve the data structure Excel's devs have (for some reason) chosen to use, given that all we have is the binary....

But given that we've found a region in the binary where we navigate a linked list to find a pointer to a (excel) function, what we can do is memoize it!

Essentially, we want to replace (arguments tweaked for brevity)

int lookup_addin_function(char *search_term, func_item *plist_head, func_item **ppresult) {
	// do horribly expensive thing.
}

with (replacing all calls to lookup_addin_function with calls to cached_lookup_addin_function)

int cached_lookup_addin_function(char *search_term, func_item *plist_head, func_item **ppresult) {
	.... look up in cache
	.... return if found
	auto res = lookup_addin_function(search_term, plist_head, ppresult);
	.... cache result.
	return res;
}

int lookup_addin_function(char *search_term, func_item *plist_head, func_item **ppresult) {
	// do horribly expensive thing.
}

We can do this by adding a trampoline at the start of the original 'lookup_addin_function', which redirects to our replacement, meaning what we produce is more like this:

int lookup_addin_function(char *search_term, func_item *plist_head, func_item **ppresult) {
    goto cached_lookup_addin_function
original:
	// do horribly expensive thing.
}


int cached_lookup_addin_function(char *search_term, func_item *plist_head, func_item **ppresult) {
	.... look up in cache
	.... return if found
	//auto res = original(search_term, plist_head, ppresult);
	call original
	.... cache result.
	return res;
}

Building the trampoline can be a pain, but fortunately microsoft research published (20+ years ago!) the excellent Detours package, which will do this all for us! (and the original detours paper gives a way better description of this than I have....) so we just need to find the right function, and replace it.

Finding our function

All we need to do here is find a distinctive assembly pattern in the function that walks the linked list (the one we want to detour). Fortunately, there's a very distinctive pattern...

.... meaning we just need to walk all the pages of excel.exe looking for our needle. yara is great for binary pattern matching, but what I'm looking for is so simple I've just written a tiny matcher against the raw opcodes - eg 8b c2 is 'mov eax, edx' (try disassembling it here)

mov     eax, edx              // 0x8b, 0xc2,
and     eax, 2                // 0x83, 0xe0, 0x02
mov     [rbp+60h+var_D0], eax // 0x89, ? ?  // don't hardcode the address, use wildcard
mov     eax, edx              // 0x8b, 0xc2
and     eax, 4                // 0x83, 0xe0, 0x04
mov     [rbp+60h+var_D8], eax // 0x89, ? ?

this finds us our function, then we walk back to the function prelude, and bang, we've got an address to detour!

Ok - let's do it

Here's the code...... so - what does it do to save times?

This gives us two addin functions (because I might as well deliver it as an addin! - _spillFix(enable, debug, verify), and _spillFixStats()

Debug, verification info, and stats are dumped to the debug console, so if you haven't got debugview (why?) get it now!

We run the simple bit of VBA again (just against the expensive '@' functions), turning the patch on and off....

That'll do pig.

Resources

The xll used dynamically generates dummyFuncXXXXX entry points dummyAddin.xll.
Sheet (with simple vba to generate and time saves of different functions: dummyFunc_sheet.xlsm.
Tiny IDC script to dump addin functions.
A git repository containing the icky workaround described here.
Binaries (compiled in debug mode) fixExcelSave_32.xll.
... and fixExcelSave_64.xll.

Last updated 06/2021