|Matthew Garrett (mjg59) wrote,|
@ 2012-07-02 06:05 pm UTC
|Entry tags:||advogato, fedora|
The red zone is part of the AMD64 System V ABI and consists of 128 bytes at the bottom of the stack frame. This region is guaranteed to be left untouched by any interrupt or signal handlers, so it's available for functions to use as scratch space. For the code in question, gcc was copying the registers onto the stack before executing the function. Something then trod on those registers. This also explained why I wasn't seeing the problem when I added debug print statements - signal and interrupt handlers won't touch it, but other functions might, so it tends to only be used by functions that don't call any other functions. Adding a print statement meant that the compiler left the red zone alone.
But why was it getting corrupted? UEFI uses the Microsoft AMD64 ABI rather than the System V one, and the Microsoft ABI doesn't define a red zone. UEFI executables built on Linux tend to use System V because that means we can link in static libraries built for Linux, rather than needing an entirely separate toolchain and libraries to build UEFI executables. The problem arises when we run one of these executables and there's a UEFI interrupt handler still running. Take an interrupt, Microsoft ABI interrupt handler gets called and the red zone gets overwritten.
There's a simple fix - we can just tell gcc not to assume that the red zone is there by passing -mno-red-zone. I did that and suddenly everything was stable. Frustrating, but at least it works now.