[personal profile] mjg59
I spent a bunch of the past week trying to figure out why my UEFI bootloader code was crashing in bizarre ways. It turns out to be a known issue that various people have hit in the past. After a lot of staring (and swearing) I finally got to the point where I could run a debugger against a running UEFI binary under qemu and found that I was entering a function with valid arguments but executing it with invalid ones. After spending a while crying on IRC people suggested that maybe the red zone was getting corrupted, and it turned out that this was the problem.

The red zone is part of the AMD64 System V ABI and consists of 128 bytes at the bottom of the stack frame. This region is guaranteed to be left untouched by any interrupt or signal handlers, so it's available for functions to use as scratch space. For the code in question, gcc was copying the registers onto the stack before executing the function. Something then trod on those registers. This also explained why I wasn't seeing the problem when I added debug print statements - signal and interrupt handlers won't touch it, but other functions might, so it tends to only be used by functions that don't call any other functions. Adding a print statement meant that the compiler left the red zone alone.

But why was it getting corrupted? UEFI uses the Microsoft AMD64 ABI rather than the System V one, and the Microsoft ABI doesn't define a red zone. UEFI executables built on Linux tend to use System V because that means we can link in static libraries built for Linux, rather than needing an entirely separate toolchain and libraries to build UEFI executables. The problem arises when we run one of these executables and there's a UEFI interrupt handler still running. Take an interrupt, Microsoft ABI interrupt handler gets called and the red zone gets overwritten.

There's a simple fix - we can just tell gcc not to assume that the red zone is there by passing -mno-red-zone. I did that and suddenly everything was stable. Frustrating, but at least it works now.

Assuming...

Date: 2012-07-03 04:05 am (UTC)
From: (Anonymous)
Assuming you don't link in any libraries compiled without that flag...

Date: 2012-07-03 08:20 am (UTC)
From: (Anonymous)
i love your work man!

Date: 2012-07-03 07:07 pm (UTC)
From: (Anonymous)
I remember that scene from Airplane:

System V AMD64 ABI: The red zone is for immediate loading and unloading only. There is no stomping on the red zone.

Microsoft AMD64 ABI: Listen SysV, don't start up with your red zone shit again.

System V AMD64 ABI: Oh, really, Microsoft? Why pretend, when we both know perfectly well what this is about. You want me to have data corruption.

Profile

Matthew Garrett

About Matthew

Power management, mobile and firmware developer on Linux. Security developer at Aurora. Ex-biologist. [personal profile] mjg59 on Twitter. Content here should not be interpreted as the opinion of my employer. Also on Mastodon.

Page Summary

Expand Cut Tags

No cut tags