May. 31st, 2011 12:43 pm
[personal profile] mjg59
You'd think it'd be easy to reboot a PC, wouldn't you? But then you'd also think that it'd be straightforward to convince people that at least making some effort to be nice to each other would be a mutually beneficial proposal, and look how well that's worked for us.

Linux has a bunch of different ways to reset an x86. Some of them are 32-bit only and so I'm just going to ignore them because honestly just what are you doing with your life. Also, they're horrible. So, that leaves us with five of them.
  • kbd - reboot via the keyboard controller. The original IBM PC had the CPU reset line tied to the keyboard controller. Writing the appropriate magic value pulses the line and the machine resets. This is all very straightforward, except for the fact that modern machines don't have keyboard controllers (they're actually part of the embedded controller) and even more modern machines don't even pretend to have a keyboard controller. Now, embedded controllers run software. And, as we all know, software is dreadful. But, worse, the software on the embedded controller has been written by BIOS authors. So clearly any pretence that this ever works is some kind of elaborate fiction. Some machines are very picky about hardware being in the exact state that Windows would program. Some machines work 9 times out of 10 and then lock up due to some odd timing issue. And others simply don't work at all. Hurrah!
  • triple - attempt to generate a triple fault. This is done by loading an empty interrupt descriptor table and then calling int(3). The interrupt fails (there's no IDT), the fault handler fails (there's no IDT) and the CPU enters a condition which should, in theory, then trigger a reset. Except there doesn't seem to be a requirement that this happen and it just doesn't work on a bunch of machines.
  • pci - not actually pci. Traditional PCI config space access is achieved by writing a 32 bit value to io port 0xcf8 to identify the bus, device, function and config register. Port 0xcfc then contains the register in question. But if you write the appropriate pair of magic values to 0xcf9, the machine will reboot. Spectacular! And not standardised in any way (certainly not part of the PCI spec), so different chipsets may have different requirements. Booo.
  • efi - EFI runtime services provide an entry point to reboot the machine. It usually even works! As long as EFI runtime services are working at all, which may be a stretch.
  • acpi - Recent versions of the ACPI spec let you provide an address (typically memory or system IO space) and a value to write there. The idea is that writing the value to the address resets the system. It turns out that doing so often fails. It's also impossible to represent the PCI reboot method via ACPI, because the PCI reboot method requires a pair of values and ACPI only gives you one.
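To make the 0xcf8 addressing in the pci entry a little more concrete, here is a minimal Python sketch of how the 32-bit value for the traditional config access mechanism is put together. The function name is made up for this example, and it only computes a number; it does not touch any hardware. The bit layout (enable in bit 31, bus in bits 16-23, device in bits 11-15, function in bits 8-10, dword-aligned register in the low byte) is the conventional one.

```python
def pci_config_address(bus: int, device: int, function: int, register: int) -> int:
    """Build the 32-bit value written to I/O port 0xcf8 (hypothetical helper)."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("bus/device/function out of range")
    if not 0 <= register < 256:
        raise ValueError("legacy config space only covers registers 0-255")
    return (
        0x80000000            # bit 31: enable configuration access
        | (bus << 16)         # bits 16-23: bus number
        | (device << 11)      # bits 11-15: device number
        | (function << 8)     # bits 8-10: function number
        | (register & 0xFC)   # bits 2-7: register, rounded down to a dword
    )


# Example: bus 0, device 31, function 0, register 0x40
print(hex(pci_config_address(0, 31, 0, 0x40)))
```

Note that 0xcf9 sits inside the 0xcf8-0xcfb dword, which is presumably where the "pci" name for the reboot method comes from, even though the reset register itself has nothing to do with config space.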

Now, I'll admit that this all sounds pretty depressing. But people clearly sell computers with the expectation that they'll reboot correctly, so what's going on here?

A while back I did some tests with Windows running on top of qemu. This is a great way to evaluate OS behaviour, because you've got complete control of what's handed to the OS and what the OS tries to do to the hardware. And what I discovered was a little surprising. In the absence of an ACPI reboot vector, Windows will hit the keyboard controller, wait a while, hit it again and then give up. If an ACPI reboot vector is present, Windows will poke it, try the keyboard controller, poke the ACPI vector again and try the keyboard controller one more time.

This turns out to be important. The first thing it means is that it generates two writes to the ACPI reboot vector. The second is that it leaves a gap between them while it's fiddling with the keyboard controller. And, shockingly, it turns out that on most systems the ACPI reboot vector points at 0xcf9 in system IO space. Even though most implementations nominally require two different values be written, it seems that this isn't a strict requirement and the ACPI method works.
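The ordering described above can be sketched as a small simulation. This is not kernel code and not what Windows literally runs; the names `windows_like_order`, `try_reboot` and `TwoWriteChipset` are invented for illustration, and the fake chipset is an assumption standing in for hardware that only resets on the second write to its reset register.

```python
from typing import Callable, Dict, List, Optional


def windows_like_order(has_acpi_reset: bool) -> List[str]:
    """Order of attempts as observed in the qemu tests described above."""
    if has_acpi_reset:
        return ["acpi", "kbd", "acpi", "kbd"]
    return ["kbd", "kbd"]


def try_reboot(order: List[str], methods: Dict[str, Callable[[], bool]]) -> Optional[int]:
    """Run each attempt in turn; return the index of the one that reset, or None."""
    for index, name in enumerate(order):
        if methods[name]():
            return index
    return None


class TwoWriteChipset:
    """Fake machine: resets on the second reset-register write, keyboard reset is dead."""

    def __init__(self) -> None:
        self.acpi_writes = 0

    def acpi(self) -> bool:
        self.acpi_writes += 1
        return self.acpi_writes >= 2

    def kbd(self) -> bool:
        return False


fake = TwoWriteChipset()
winner = try_reboot(windows_like_order(True), {"acpi": fake.acpi, "kbd": fake.kbd})
print(winner)  # index into the order list of the attempt that worked
```

On this fake machine a single ACPI write followed by giving up would never reboot, while the interleaved sequence succeeds on the third attempt, which is the behaviour the paragraph above relies on.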

Linux 3.0 will ship with this behaviour by default. It makes various machines work (some Apples, for instance), improves things on some others (some Thinkpads seem to sit around for extended periods of time otherwise) and hopefully avoids the need to add any more machine-specific quirks to the reboot code. There's still some divergence between us and Windows (mostly in how often we write to the keyboard controller), which can be cleaned up if it turns out to make a difference anywhere.

Now. Back to EFI bugs.


Date: 2011-05-31 08:07 pm (UTC)
From: (Anonymous)
What about programming the watchdog to very short timeout and disabling the polling? It would trigger a quite safe reboot, no?

Re: Watchdog?

Date: 2011-05-31 11:46 pm (UTC)
From: (Anonymous)
And it would hang some other boxes that deeply dislike whatever housekeeping the BIOS does during shutdown/reboot (e.g. some IBM ThinkPads) :-)

Links to patches?

Date: 2011-05-31 09:49 pm (UTC)
From: (Anonymous)
I honestly spent at least 10 minutes trying to locate your patches in Linus' tree but failed (git/gitweb makes history reading impossible), and searching for your name in gitweb didn't return anything. Do you have any hints on how to locate gitweb links to the patches?



Date: 2011-05-31 11:43 pm (UTC)
From: (Anonymous)
So, what does Linux in its current incarnations do when you run the "reboot" command? It generally seems to work well, but I haven't munged into the source to see what it does, and how it interacts with the hardware.
From: (Anonymous)

YAY maybe then my hp 6920p will stop hanging in the bios on reboot.
tho I probably should upgrade the bios anyway.
From: (Anonymous)
FWIW, my HP 8510w does the same thing...the last BIOS message is something about network controllers and then it hangs (only on reboot).

Date: 2011-06-01 12:29 pm (UTC)
From: (Anonymous)
Linux copying from Windows? I'm not sure what to make of this ;)

Date: 2011-06-01 12:52 pm (UTC)
From: [identity profile]
Windows is the only thing OEMs test with before shipping. You can either do what Windows does, or not work. Your call.

Date: 2011-06-03 07:51 pm (UTC)
From: (Anonymous)
Not entirely true. The last place I worked, every BIOS revision was tested on a few specific versions of Red Hat and openSUSE that we claimed to support. Old ones, sure, but better than nothing.

cf9 resets

Date: 2011-06-01 12:56 pm (UTC)
From: (Anonymous)
It depends on chipsets. Common usage of cf9:

AMD SB7xxx:
bit2 is the reset trigger; bit1 and bit3 select what happens
bit1 - 0: send HT INIT; 1: defer to bit3
bit3 - 0: assert resets; 1: put the system into S5 for a few seconds

VIA has a similar mechanism, bit2 is reset trigger, bit1 selects a PCIRST or INIT.

Look to sources in for the hard_reset and soft reset sequences.

I think the two-phase write exists because PIIX4 had some bug where there had to be a transition. Newer chipsets seem to require only one write.
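A minimal sketch of the SB7xx bit description in this comment, written as a Python decoder. It is based only on what the comment says, has not been checked against a datasheet, and the function name is invented for this example.

```python
def describe_sb7xx_cf9(value: int) -> str:
    """Interpret a byte written to 0xcf9, following the comment's description only."""
    if not value & 0x04:          # bit2: reset trigger
        return "no reset triggered"
    if not value & 0x02:          # bit1 clear: HT INIT
        return "send HT INIT"
    if value & 0x08:              # bit1 set, bit3 set
        return "put system into S5 for a few seconds"
    return "assert resets"        # bit1 set, bit3 clear


for value in (0x02, 0x04, 0x06, 0x0E):
    print(f"{value:#04x}: {describe_sb7xx_cf9(value)}")
```

Read this way, a two-step sequence such as 0x02 followed by 0x06 first selects the reset type without triggering it and then sets the trigger bit, which fits the "there had to be a transition" explanation.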

How about Halt ?

Date: 2011-06-01 02:12 pm (UTC)
From: (Anonymous)
How about just powering off the machine? Are there as many different ways as there are for rebooting?

I was thinking with regard to this bug :

Re: How about Halt ?

Date: 2011-06-01 04:36 pm (UTC)
From: (Anonymous)
The funny thing is, Windows has the exact same problem. Some HP EliteBooks here come back to life after "shutdown". And since we have PXE boot on the DHCP server, they boot Gentoo. You should see the faces :-)

Re: How about Halt ?

Date: 2011-06-02 09:53 am (UTC)
From: (Anonymous)
One way to powerdown a PC is by setting the SLP_TYP bits in the PM1a_CNT_BLK. This has been described here:
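As a rough sketch of the SLP_TYP approach: in the ACPI PM1 control register, SLP_TYPx occupies bits 10-12 and SLP_EN is bit 13. The Python below only computes the value that would be written; the helper name is invented, and the SLP_TYP number for S5 is an assumption in the example, since the real value has to be read from the firmware's \_S5 object.

```python
SLP_TYP_SHIFT = 10          # SLP_TYPx lives in bits 10-12 of PM1x_CNT
SLP_TYP_MASK = 0x7 << SLP_TYP_SHIFT
SLP_EN = 1 << 13            # setting this bit starts the transition


def pm1_cnt_sleep_value(current: int, slp_typ: int) -> int:
    """Return the PM1x_CNT value requesting sleep type `slp_typ` (hypothetical helper)."""
    if not 0 <= slp_typ <= 7:
        raise ValueError("SLP_TYP is a 3-bit field")
    preserved = current & ~(SLP_TYP_MASK | SLP_EN)   # keep unrelated bits such as SCI_EN
    return preserved | (slp_typ << SLP_TYP_SHIFT) | SLP_EN


# Assumed example: register currently reads 0x0001 (SCI_EN set), firmware says S5 is type 5
print(hex(pm1_cnt_sleep_value(0x0001, 5)))
```

On a real system the result would go to the I/O port given by the FADT's PM1a control block (and PM1b if present), after the usual preparation for sleep.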

Don't forget the jmp far ffff:0000 in 16-bit mode

Date: 2011-06-01 03:52 pm (UTC)
From: (Anonymous)
In the days of Windows 3.1, in 16-bit non-protected mode, dropping to a DOS shell and typing the following would restart the computer, as it instructed execution to begin where the machine's address lines default to on power-up.

C:\> debug
- jmp far ffff:0000
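For reference, the arithmetic behind that address: in real mode the physical address is segment × 16 + offset, so ffff:0000 is 0xFFFF0, sixteen bytes below the top of the first megabyte, which is where 8086-class CPUs begin executing after reset. A small Python sketch (the function name is just for illustration):

```python
def real_mode_physical(segment: int, offset: int) -> int:
    """Real-mode address translation on a CPU with 20 address lines."""
    return ((segment << 4) + offset) & 0xFFFFF   # wraps at 1 MiB, as on the 8086


print(hex(real_mode_physical(0xFFFF, 0x0000)))   # the reset vector
```

Jumping there re-enters the BIOS entry point without resetting any hardware, which is part of why it stopped being a reliable way to reboot.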

From: [identity profile]
Is there any particular reason a 64-bit kernel couldn't switch back to 32-bit mode and trigger the same vector?

Rebooting a PC

Date: 2011-06-01 04:12 pm (UTC)
From: (Anonymous)
This is also documented here:

Date: 2011-06-01 04:21 pm (UTC)
From: (Anonymous)
On topic of EFI reboot bugs ;)

Date: 2011-06-01 06:10 pm (UTC)
From: (Anonymous)
Just out of curiosity, does anyone know how FreeBSD does this?

FreeBSD's x86 reboot code

Date: 2011-06-27 12:58 pm (UTC)
From: (Anonymous)
FreeBSD does the following:

- ACPI reset
- Keyboard reset, then wait half a second to see if that worked.
- Two writes to the reset control register (0xcf9), first trying a "soft" reset, then trying a "hard" reset, and then wait half a second to see if that worked.
- Use the Fast A20 and Init register (0x92) if it exists and give it a half-second.
- Triple fault.
- Hang.

The ACPI bits are in the ACPI code, the rest is in cpu_reset_real() in sys/{amd64,i386}/{amd64,i386}/vm_machdep.c:

Can't reboot HP6420

Date: 2011-06-01 06:55 pm (UTC)
From: [personal profile] rogercruz
I'm having the same problem with reboot hanging on an HP6420 when running Ubuntu 11.04 and Xen 4.0.1. Both hang after the ACPI I/O port write is made. Switching to KBD or TRIPLE has no effect. Is there any other way to get these systems to reboot? Windows has no problems on it.

Roger R. Cruz

Date: 2011-06-01 07:07 pm (UTC)
From: [personal profile] rogercruz
I'll see what I can do about building and installing 3.0 but I'm not holding out hope for a fix. I downloaded the code already and looked at the reboot path, and it is pretty much the same.

I've added debug code to Xen and I can see the BIOS is reporting I/O port address 0x64 as the reset port, with a value of 0xFE. However, once that port write is issued, the system does not return. I also tried keyboard and triple fault and neither caused a reset. Shutdown works fine.

(XEN) Virgin FADT table
(XEN) 0000: 46 41 43 50 F4 00 00 00 04 6F 44 45 4C 4C 20 20
(XEN) 0010: 43 42 58 33 20 20 20 00 04 20 22 06 4D 53 46 54
(XEN) 0020: 13 00 01 00 40 4E FE 78 18 D0 F6 78 01 02 09 00
(XEN) 0030: B2 00 00 00 A0 A1 00 80 00 04 00 00 00 00 00 00
(XEN) 0040: 04 04 00 00 00 00 00 00 50 04 00 00 08 04 00 00
(XEN) 0050: 20 04 00 00 00 00 00 00 04 02 01 04 10 00 00 85
(XEN) 0060: 65 00 E9 03 00 04 10 00 01 03 7D 7E 32 13 00 00
(XEN) 0070: A5 86 03 00 01 08 00 00 64 00 00 00 00 00 00 00
(XEN) 0080: FE 00 00 00 40 4D FE 78 00 00 00 00 18 D0 F6 78
(XEN) 0090: 00 00 00 00 01 20 00 00 00 04 00 00 00 00 00 00
(XEN) 00A0: 01 00 00 00 00 00 00 00 00 00 00 00 01 10 00 00
(XEN) 00B0: 04 04 00 00 00 00 00 00 01 00 00 00 00 00 00 00
(XEN) 00C0: 00 00 00 00 01 08 00 00 50 04 00 00 00 00 00 00
(XEN) 00D0: 01 20 00 00 08 04 00 00 00 00 00 00 01 80 00 00
(XEN) 00E0: 20 04 00 00 00 00 00 00 01 00 00 00 00 00 00 00
(XEN) 00F0: 00 00 00 00
(XEN) signature: FACP
(XEN) length = 0x000000f4
(XEN) revision = 0x04
(XEN) checksum = 0x6foem_id: DELL oem_table_id: CBX3
(XEN) oem_revision = 0x06222004MSFT
(XEN) asl_compiler_revision = 0x00010013
(XEN) facs = 0x78fe4e40
(XEN) dsdt = 0x78f6d018
(XEN) model = 0x01
(XEN) preferred_profile = 0x02
(XEN) sci_interrupt = 0x0009
(XEN) smi_command = 0x000000b2
(XEN) acpi_enable = 0xa0
(XEN) acpi_disable = 0xa1
(XEN) S4bios_request = 0x00
(XEN) pstate_control = 0x80
(XEN) pm1a_event_block = 0x00000400
(XEN) pm1b_event_block = 0x00000000
(XEN) pm1a_control_block = 0x00000404
(XEN) pm1b_control_block = 0x00000000
(XEN) pm_timer_block = 0x00000408
(XEN) gpe0_block = 0x00000420
(XEN) gpe1_block = 0x00000000
(XEN) pm1_event_length = 0x04
(XEN) pm1_control_length = 0x02
(XEN) pm2_control_length = 0x01
(XEN) pm_timer_length = 0x04
(XEN) gpe0_block_length = 0x10
(XEN) gpe1_block_length = 0x00
(XEN) gpe1_base = 0x00
(XEN) cst_control = 0x85
(XEN) C2latency = 0x0065
(XEN) C3latency = 0x03e9
(XEN) flush_size = 0x0400
(XEN) flush_stride = 0x0010
(XEN) duty_offset = 0x01
(XEN) duty_width = 0x03
(XEN) day_alarm = 0x7d
(XEN) month_alarm = 0x7e
(XEN) century = 0x32
(XEN) boot_flags = 0x0013
(XEN) reserved = 0x00
(XEN) flags = 0x000386a5
(XEN) space_id = 0x01
(XEN) bit_width = 0x08
(XEN) bit_offset = 0x00
(XEN) access_width = 0x00
(XEN) address = 0x0000000000000064
(XEN) reset_value = 0xfe
(XEN) reserved4[0] = 0x00
(XEN) reserved4[1] = 0x00
(XEN) reserved4[2] = 0x00
(XEN) Xfacs = 0x0000000078fe4d40
(XEN) Xdsdt = 0x0000000078f6d018
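As a cross-check of the decoded fields, the reset register can be pulled straight out of the raw bytes. In the FADT the flags sit at offset 0x70, the 12-byte RESET_REG generic address structure at 0x74 and RESET_VALUE at 0x80. The Python sketch below copies rows 0x70 and 0x80 from the dump above; the variable names are made up for this example.

```python
import struct

# Rows 0x0070 and 0x0080 of the FADT dump above
rows = bytes.fromhex(
    "A5 86 03 00 01 08 00 00 64 00 00 00 00 00 00 00 "
    "FE 00 00 00 40 4D FE 78 00 00 00 00 18 D0 F6 78"
)

# Little-endian: flags (u32), then the generic address structure
# (space id, bit width, bit offset, access size, 64-bit address), then reset value
flags, space_id, bit_width, bit_offset, access_size, address, reset_value = (
    struct.unpack_from("<IBBBBQB", rows, 0)
)

RESET_REG_SUP = 1 << 10                 # FADT flag: reset register is supported
SPACE_NAMES = {0: "system memory", 1: "system I/O"}

print(f"reset supported: {bool(flags & RESET_REG_SUP)}")
print(f"write {reset_value:#04x} to {SPACE_NAMES.get(space_id, space_id)} address {address:#x}")
```

This agrees with the fields Xen printed: the firmware's ACPI reset vector is simply the keyboard controller command port, so on this machine the "acpi" and "kbd" methods end up doing the same write.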

Date: 2011-06-01 07:08 pm (UTC)
From: [personal profile] rogercruz
This is my debug code in Xen when the reboot command is issued.

(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
(XEN) acpi_hw_low_level_write: w=8, value=254
(XEN) acpi_hw_low_level_write: address=0000000000000064
(XEN) acpi_hw_low_level_write: address space is sytem io
(XEN) acpi_os_write_port: port 100, value=254, width=8

Date: 2011-06-01 07:34 pm (UTC)
From: [personal profile] rogercruz
oh.. that is a good hint about the controller being in a particular state. Let me see if I can trace what Windows does during its reboot.

Date: 2011-06-01 09:31 pm (UTC)
From: [personal profile] rogercruz

I traced what Windows does using XenTrace's capability (not the same as a debugger but close enough for most debugging). This is what I see as the last instructions issued by Win7. These addresses appear to be the Local APIC

* Timer (0xfee00320)
* Thermal (0xfee00330)
* PMC (0xfee00340)
* LINT0 (0xfee00350)
* LINT1 (0xfee00360)
* Error (0xfee00370)

I need to get the local APIC spec to figure out if any of these writes cause a system reset.

CPU1 6095373341843 (+ 0) MMIO_WRITE [ addr = 0xfee00320, data = 0x3001f ]
CPU1 6095373366469 (+ 0) MMIO_WRITE [ addr = 0xfee00350, data = 0x1001f ]
CPU1 6095373372685 (+ 0) MMIO_WRITE [ addr = 0xfee00360, data = 0x184ff ]
CPU1 6095373376701 (+ 0) MMIO_WRITE [ addr = 0xfee00370, data = 0x100e3 ]
CPU1 6095373380580 (+ 0) MMIO_WRITE [ addr = 0xfee000f0, data = 0x001f ]
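A quick way to read those values: in each local vector table entry bit 16 is the mask bit, and in the spurious interrupt vector register (offset 0xf0) bit 8 is the APIC software-enable bit. The Python sketch below decodes the five traced writes on that basis; the table and function names are invented for illustration.

```python
APIC_BASE = 0xFEE00000            # default local APIC MMIO base
REGISTERS = {
    0x0F0: "Spurious Interrupt Vector",
    0x320: "LVT Timer",
    0x330: "LVT Thermal",
    0x340: "LVT Performance Counter",
    0x350: "LVT LINT0",
    0x360: "LVT LINT1",
    0x370: "LVT Error",
}
LVT_MASKED = 1 << 16              # LVT entries: interrupt masked
SVR_APIC_ENABLE = 1 << 8          # SVR: APIC software enable

# (address, data) pairs copied from the trace above
writes = [
    (0xFEE00320, 0x3001F),
    (0xFEE00350, 0x1001F),
    (0xFEE00360, 0x184FF),
    (0xFEE00370, 0x100E3),
    (0xFEE000F0, 0x001F),
]


def describe(addr: int, data: int) -> str:
    offset = addr - APIC_BASE
    name = REGISTERS.get(offset, f"unknown offset {offset:#x}")
    if offset == 0x0F0:
        state = "APIC enabled" if data & SVR_APIC_ENABLE else "APIC software-disabled"
    else:
        state = "masked" if data & LVT_MASKED else "unmasked"
    return f"{name}: vector {data & 0xFF:#04x}, {state}"


for addr, data in writes:
    print(describe(addr, data))
```

Every LVT write sets the mask bit and the final write clears the enable bit, so this looks like the local APIC being quiesced before the reset rather than the reset itself.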

Date: 2011-06-01 11:48 pm (UTC)
From: [personal profile] rogercruz
I think I may have missed the Windows reset in my previous trace. A new one shows subsequent activity to the above APIC writes. This is on a VM guest running on top of Xen. The last write is the IOPORT write to 0x64.

CPU0 857940486111 (+ 0) CPUID [ func = 0x00000001, eax = 0x000206a7, ebx = 00200800, ecx = 8e982201, edx = 0x1789fbff ]
CPU0 857940488848 (+ 0) CPUID [ func = 0x40000001, eax = 0x31237648, ebx = 00000000, ecx = 00000000, edx = 0x00000000 ]
CPU0 857940491234 (+ 0) CPUID [ func = 0x40000004, eax = 0x00000028, ebx = 000007ff, ecx = 00000000, edx = 0x00000000 ]
CPU0 857940493712 (+ 0) IOPORT_WRT [ port = 0x00000064, data = 0x00fe ]

Date: 2011-06-01 10:24 pm (UTC)
From: [personal profile] reddragdiva
Now I want to hear about 32 bit just to hear you telling horror stories.

what about...

Date: 2011-06-02 01:50 pm (UTC)
From: (Anonymous)

Date: 2011-06-02 07:20 pm (UTC)
From: [personal profile] reddragdiva
You know, I've only just been reminded that your first LiveJournal name was [ profile] sys64738 :-D


Date: 2011-06-06 03:12 pm (UTC)
From: (Anonymous)
I designed a motherboard and had to add hardware to make the keyboard controller reset work. The problem was that the reset pulse from the keyboard controller was only 6 µs wide, whereas most modern chipsets have debounce circuitry (filters) on their reset pins to make sure noise on the line doesn't trigger a reset. The debounce circuitry on the chipset of the board I designed ignored anything shorter than 14 ms. Therefore I added an edge-detector/pulse-stretcher.
It sure would be nice if industry would recognize the need for a standard here.

original IBM PC...

Date: 2011-06-11 01:13 pm (UTC)
From: [identity profile]
A side note:

The original IBM PC didn't have a keyboard controller.
Many things changed when the second generation of PCs came out. The IBM AT with its 286 CPU introduced the keyboard controller, and that was IBM's hack for switching from protected to real mode. Communication between the keyboard itself and the computer also changed. Before the AT/286 there wasn't any possibility for the computer to control keyboard LEDs. Some third-party 100+ key keyboards for PC/XT/8088s had problems with losing track of numlock state sometimes, ending up with the cursor keys and pgup/pgdn/end/home/delete/insert always sending numerical keypad numbers no matter what state numlock was in...
If you find an old keyboard there is a good chance it has a "XT/AT" switch somewhere...

Date: 2011-06-27 01:02 pm (UTC)
From: (Anonymous)
Very Old ex bare-metal coder & chip design guy here :-)

Wayyy Back to DOS / cleanroom non-IBM BIOs "PC-Compatibles"
The most assured way to get a GENUINE power up reset state for Intel 80x86
class chips was to collapse the stack pointer to zero and hit NMI

Got you a power-up assumed state full reset without the electrical stress
and delay of the Big Red Switch. 16 bytes or so coded directly to a .com
file in DOS debug.

Does this still work on modern Intel ix86 arch chips? It was in the specs.

How to rebooting

Date: 2011-08-03 04:58 am (UTC)
From: (Anonymous)
Great posting ,nice sharing of this article.

On keyboard controllers...

Date: 2013-04-11 02:17 am (UTC)
From: [identity profile]
On keyboard controllers, Ray Trent of Synaptics has a comment on the oldnewthing blog about this exact topic:

How do you change this behavior?

Date: 2017-06-17 04:08 pm (UTC)
From: (Anonymous)
How do you change the reboot behavior of FreeBSD? I have a system that hangs on reboot (won't reboot, will not reboot) and I'm sure this is the issue.

Re: How do you change this behavior?

Date: 2017-06-17 04:10 pm (UTC)
From: (Anonymous)
Here is how it is done on Ubuntu


Matthew Garrett

About Matthew

Power management, mobile and firmware developer on Linux. Security developer at Google. Ex-biologist. @mjg59 on Twitter. Content here should not be interpreted as the opinion of my employer.
