You'd think it'd be easy to reboot a PC, wouldn't you? But then you'd also think that it'd be straightforward to convince people that at least making some effort to be nice to each other would be a mutually beneficial proposal, and look how well that's worked for us.
Linux has a bunch of different ways to reset an x86. Some of them are 32-bit only and so I'm just going to ignore them because honestly just what are you doing with your life. Also, they're horrible. So, that leaves us with five of them.
- kbd - reboot via the keyboard controller. The original IBM PC had the CPU reset line tied to the keyboard controller. Writing the appropriate magic value pulses the line and the machine resets. This is all very straightforward, except for the fact that modern machines don't have keyboard controllers (they're actually part of the embedded controller) and even more modern machines don't even pretend to have a keyboard controller. Now, embedded controllers run software. And, as we all know, software is dreadful. But, worse, the software on the embedded controller has been written by BIOS authors. So clearly any pretence that this ever works is some kind of elaborate fiction. Some machines are very picky about hardware being in the exact state that Windows would program. Some machines work 9 times out of 10 and then lock up due to some odd timing issue. And others simply don't work at all. Hurrah!
- triple - attempt to generate a triple fault. This is done by loading an empty interrupt descriptor table and then calling int(3). The interrupt fails (there's no IDT), the fault handler fails (there's no IDT) and the CPU enters a condition which should, in theory, then trigger a reset. Except there doesn't seem to be a requirement that this happen and it just doesn't work on a bunch of machines.
- pci - not actually pci. Traditional PCI config space access is achieved by writing a 32-bit value to I/O port 0xcf8 to identify the bus, device, function and config register. Port 0xcfc then contains the register in question. But if you write the appropriate pair of magic values to 0xcf9, the machine will reboot. Spectacular! And not standardised in any way (certainly not part of the PCI spec), so different chipsets may have different requirements. Booo.
- efi - EFI runtime services provide an entry point to reboot the machine. It usually even works! As long as EFI runtime services are working at all, which may be a stretch.
- acpi - Recent versions of the ACPI spec let you provide an address (typically memory or system IO space) and a value to write there. The idea is that writing the value to the address resets the system. It turns out that doing so often fails. It's also impossible to represent the PCI reboot method via ACPI, because the PCI reboot method requires a pair of values and ACPI only gives you one.
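For the curious, the 32-bit value written to 0xcf8 in the pci entry above is just a packed bit field. Here's a minimal sketch of that encoding, in Python purely for readability. The function name `pci_config_address` is made up for illustration and nothing here touches real hardware.

```python
def pci_config_address(bus, device, function, register):
    """Build the value written to port 0xcf8 to select a config register.

    Layout: bit 31 enable, bits 23-16 bus, bits 15-11 device,
    bits 10-8 function, bits 7-2 register (dword aligned).
    """
    if not (0 <= bus < 256 and 0 <= device < 32
            and 0 <= function < 8 and 0 <= register < 256):
        raise ValueError("bus/device/function/register out of range")
    return (0x80000000
            | (bus << 16)
            | (device << 11)
            | (function << 8)
            | (register & 0xFC))  # low two bits are always zero


# Example: bus 0, device 0x1f, function 0, register 0x40
print(hex(pci_config_address(0, 0x1F, 0, 0x40)))
```

The selected register would then be read or written through port 0xcfc. The reset trick at 0xcf9 uses none of this encoding, which is why the method is "not actually pci".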
Now, I'll admit that this all sounds pretty depressing. But people clearly sell computers with the expectation that they'll reboot correctly, so what's going on here?
A while back I did some tests with Windows running on top of qemu. This is a great way to evaluate OS behaviour, because you've got complete control of what's handed to the OS and what the OS tries to do to the hardware. And what I discovered was a little surprising. In the absence of an ACPI reboot vector, Windows will hit the keyboard controller, wait a while, hit it again and then give up. If an ACPI reboot vector is present, Windows will poke it, try the keyboard controller, poke the ACPI vector again and try the keyboard controller one more time.
This turns out to be important. The first thing it means is that it generates two writes to the ACPI reboot vector. The second is that it leaves a gap between them while it's fiddling with the keyboard controller. And, shockingly, it turns out that on most systems the ACPI reboot vector points at 0xcf9 in system IO space. Even though most implementations nominally require two different values be written, it seems that this isn't a strict requirement and the ACPI method works.
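To see why the repeated write matters, here's a toy model of the sequence described above. It's a sketch under loud assumptions: `SecondWriteChipset` is a hypothetical machine whose ACPI reboot vector points at 0xcf9 and which only resets on the second write, and its keyboard controller does nothing at all. No real chipset is claimed to behave exactly like this.

```python
def reset_attempts(has_acpi_reset_vector):
    """Order of reset attempts described above for Windows under qemu."""
    if has_acpi_reset_vector:
        return ["acpi", "kbd", "acpi", "kbd"]
    return ["kbd", "kbd"]


class SecondWriteChipset:
    """Hypothetical machine: the ACPI vector points at 0xcf9 and only the
    second write to it resets. The keyboard controller is a no-op."""

    def __init__(self):
        self.cf9_writes = 0

    def attempt(self, method):
        if method == "acpi":
            self.cf9_writes += 1
        return self.cf9_writes >= 2  # True means the machine reset


def run_until_reset(machine, attempts):
    """Try each method in order; return (methods tried, whether one worked)."""
    tried = []
    for method in attempts:
        tried.append(method)
        if machine.attempt(method):
            return tried, True
    return tried, False


print(run_until_reset(SecondWriteChipset(), reset_attempts(True)))
print(run_until_reset(SecondWriteChipset(), ["acpi"]))
```

On this imaginary machine a single ACPI write leaves you sitting there, while the Windows ordering resets on the third step.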
Linux 3.0 will ship with this behaviour by default. It makes various machines work (some Apples, for instance), improves things on some others (some Thinkpads seem to sit around for extended periods of time otherwise) and hopefully avoids the need to add any more machine-specific quirks to the reboot code. There's still some divergence between us and Windows (mostly in how often we write to the keyboard controller), which can be cleaned up if it turns out to make a difference anywhere.
Now. Back to EFI bugs.
Watchdog?
Date: 2011-05-31 08:07 pm (UTC)
Re: Watchdog?
Date: 2011-05-31 08:11 pm (UTC)
Re: Watchdog?
Date: 2011-05-31 11:46 pm (UTC)
Links to patches?
Date: 2011-05-31 09:49 pm (UTC)
Thanks.
Re: Links to patches?
Date: 2011-05-31 10:05 pm (UTC)
f17d9cbf20c4734c4199caa6dee87047f2f8278f
6734fe57a07b2dd23ef1ef2ac1f790747e53eefc
95cf3e12e7f659e536215b37c67d46f3e2ce95cc
660e34cebf0a11d54f2d5dd8838607452355f321
Just sticking them into gitweb should bring the patches up.
Rebooting
Date: 2011-05-31 11:43 pm (UTC)
Re: Rebooting
Date: 2011-06-01 12:24 am (UTC)
My laptop hangs at the bios splash when I run reboot.
Date: 2011-06-01 05:58 am (UTC)
YAY maybe then my hp 6920p will stop hanging in the bios on reboot.
tho I probably should upgrade the bios anyway.
Re: My laptop hangs at the bios splash when I run reboot.
Date: 2011-06-01 08:25 am (UTC)
no subject
Date: 2011-06-01 12:29 pm (UTC)
no subject
Date: 2011-06-01 12:52 pm (UTC)
no subject
Date: 2011-06-03 07:51 pm (UTC)
cf9 resets
Date: 2011-06-01 12:56 pm (UTC)
AMD SB7xxx:
bit2 is the trigger; bit1 and bit3 select what happens
bit1 - 0 sends HT INIT, 1 defers to bit3
bit3 - 0 asserts resets, 1 puts the system into S5 for a few seconds
VIA has a similar mechanism: bit2 is the reset trigger, bit1 selects PCIRST or INIT.
Look at the sources on coreboot.org for the hard_reset and soft reset sequences.
I think the two-phase write exists because PIIX4 had some bug where there had to be a transition. Newer chipsets seem to require only one write.
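Read literally, the SB7xx bit description in this comment decodes as below. This is only a sketch of the comment's wording, not checked against any datasheet, and the function name `decode_cf9` is invented for illustration.

```python
def decode_cf9(value):
    """Decode a byte written to 0xcf9, following the comment's description.

    bit2 (0x04): trigger; nothing happens unless it is set
    bit1 (0x02): 0 sends HT INIT, 1 defers to bit3
    bit3 (0x08): 0 asserts resets, 1 puts the system into S5 briefly
    """
    if not value & 0x04:
        return "no reset"
    if not value & 0x02:
        return "HT INIT"
    if value & 0x08:
        return "S5 for a few seconds"
    return "assert resets"


for v in (0x02, 0x04, 0x06, 0x0E):
    print(hex(v), decode_cf9(v))
```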
Re: cf9 resets
Date: 2011-06-01 01:09 pm (UTC)
How about Halt ?
Date: 2011-06-01 02:12 pm (UTC)
I was thinking with regard to this bug : https://bugzilla.kernel.org/show_bug.cgi?id=35262
Re: How about Halt ?
Date: 2011-06-01 02:18 pm (UTC)
Unless you're booting via EFI (which you're almost certainly not), we'll be using ACPI to shut down. The problem here isn't that we're failing to shut down properly, the problem is that somehow we're leaving a wakeup source active and the machine powers itself back up the moment it's been shut down. The GPE rework generated a few of these bugs, but I'd hoped they were fixed now. Apparently not.
Re: How about Halt ?
Date: 2011-06-01 04:36 pm (UTC)
Re: How about Halt ?
Date: 2011-06-02 09:53 am (UTC)
http://smackerelofopinion.blogspot.com/2011/02/turning-off-pc-using-intel-82801-io.html
Re: How about Halt ?
Date: 2011-06-02 12:54 pm (UTC)
Dont forget the jmp far ffff:0000 in 16-bit mode
Date: 2011-06-01 03:52 pm (UTC)
C:\> debug
- jmp far ffff:0000
Re: Dont forget the jmp far ffff:0000 in 16-bit mode
Date: 2011-06-01 03:54 pm (UTC)
Re: Dont forget the jmp far ffff:0000 in 16-bit mode
Date: 2011-06-01 08:40 pm (UTC)
Re: Dont forget the jmp far ffff:0000 in 16-bit mode
Date: 2011-06-01 08:57 pm (UTC)
Rebooting a PC
Date: 2011-06-01 04:12 pm (UTC)
http://smackerelofopinion.blogspot.com/2009/06/rebooting-pc.html
no subject
Date: 2011-06-01 04:21 pm (UTC)
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/721576
no subject
Date: 2011-06-01 06:10 pm (UTC)
FreeBSD's x86 reboot code
Date: 2011-06-27 12:58 pm (UTC)
- ACPI reset
- Keyboard reset, then wait half a second to see if that worked.
- Two writes to the reset control register (0xcf9), first trying a "soft" reset, then trying a "hard" reset, and then wait half a second to see if that worked.
- Use the Fast A20 and Init register (0x92) if it exists and give it a half-second.
- Triple fault.
- Hang.
The ACPI bits are in the ACPI code, the rest is in cpu_reset_real() in sys/{amd64,i386}/{amd64,i386}/vm_machdep.c:
http://svnweb.freebsd.org/base/head/sys/amd64/amd64/vm_machdep.c?revision=222813&view=markup#l577
Can't reboot HP6420
Date: 2011-06-01 06:55 pm (UTC)
Roger R. Cruz
Re: Can't reboot HP6420
Date: 2011-06-01 06:58 pm (UTC)
no subject
Date: 2011-06-01 07:07 pm (UTC)
I've added debug code to Xen and I can see the BIOS is reporting I/O port address 0x64 as the reset port with a value of 0xFE. However, once that port write is issued, the system does not return. I also tried keyboard and triple fault and neither caused a reset. Shutdown works fine.
(XEN) Virgin FADT table
(XEN) 0000: 46 41 43 50 F4 00 00 00 04 6F 44 45 4C 4C 20 20
(XEN) 0010: 43 42 58 33 20 20 20 00 04 20 22 06 4D 53 46 54
(XEN) 0020: 13 00 01 00 40 4E FE 78 18 D0 F6 78 01 02 09 00
(XEN) 0030: B2 00 00 00 A0 A1 00 80 00 04 00 00 00 00 00 00
(XEN) 0040: 04 04 00 00 00 00 00 00 50 04 00 00 08 04 00 00
(XEN) 0050: 20 04 00 00 00 00 00 00 04 02 01 04 10 00 00 85
(XEN) 0060: 65 00 E9 03 00 04 10 00 01 03 7D 7E 32 13 00 00
(XEN) 0070: A5 86 03 00 01 08 00 00 64 00 00 00 00 00 00 00
(XEN) 0080: FE 00 00 00 40 4D FE 78 00 00 00 00 18 D0 F6 78
(XEN) 0090: 00 00 00 00 01 20 00 00 00 04 00 00 00 00 00 00
(XEN) 00A0: 01 00 00 00 00 00 00 00 00 00 00 00 01 10 00 00
(XEN) 00B0: 04 04 00 00 00 00 00 00 01 00 00 00 00 00 00 00
(XEN) 00C0: 00 00 00 00 01 08 00 00 50 04 00 00 00 00 00 00
(XEN) 00D0: 01 20 00 00 08 04 00 00 00 00 00 00 01 80 00 00
(XEN) 00E0: 20 04 00 00 00 00 00 00 01 00 00 00 00 00 00 00
(XEN) 00F0: 00 00 00 00
(XEN)
(XEN) signature: FACP
(XEN) length = 0x000000f4
(XEN) revision = 0x04
(XEN) checksum = 0x6foem_id: DELL oem_table_id: CBX3
(XEN) oem_revision = 0x06222004MSFT
(XEN) asl_compiler_revision = 0x00010013
(XEN) facs = 0x78fe4e40
(XEN) dsdt = 0x78f6d018
(XEN) model = 0x01
(XEN) preferred_profile = 0x02
(XEN) sci_interrupt = 0x0009
(XEN) smi_command = 0x000000b2
(XEN) acpi_enable = 0xa0
(XEN) acpi_disable = 0xa1
(XEN) S4bios_request = 0x00
(XEN) pstate_control = 0x80
(XEN) pm1a_event_block = 0x00000400
(XEN) pm1b_event_block = 0x00000000
(XEN) pm1a_control_block = 0x00000404
(XEN) pm1b_control_block = 0x00000000
(XEN) pm_timer_block = 0x00000408
(XEN) gpe0_block = 0x00000420
(XEN) gpe1_block = 0x00000000
(XEN) pm1_event_length = 0x04
(XEN) pm1_control_length = 0x02
(XEN) pm2_control_length = 0x01
(XEN) pm_timer_length = 0x04
(XEN) gpe0_block_length = 0x10
(XEN) gpe1_block_length = 0x00
(XEN) gpe1_base = 0x00
(XEN) cst_control = 0x85
(XEN) C2latency = 0x0065
(XEN) C3latency = 0x03e9
(XEN) flush_size = 0x0400
(XEN) flush_stride = 0x0010
(XEN) duty_offset = 0x01
(XEN) duty_width = 0x03
(XEN) day_alarm = 0x7d
(XEN) month_alarm = 0x7e
(XEN) century = 0x32
(XEN) boot_flags = 0x0013
(XEN) reserved = 0x00
(XEN) flags = 0x000386a5
(XEN) space_id = 0x01
(XEN) bit_width = 0x08
(XEN) bit_offset = 0x00
(XEN) access_width = 0x00
(XEN) address = 0x0000000000000064
(XEN) reset_value = 0xfe
(XEN) reserved4[0] = 0x00
(XEN) reserved4[1] = 0x00
(XEN) reserved4[2] = 0x00
(XEN) Xfacs = 0x0000000078fe4d40
(XEN) Xdsdt = 0x0000000078f6d018
no subject
Date: 2011-06-01 07:08 pm (UTC)
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
(XEN) acpi_hw_low_level_write: w=8, value=254
(XEN) acpi_hw_low_level_write: address=0000000000000064
(XEN) acpi_hw_low_level_write: address space is sytem io
(XEN) acpi_os_write_port: port 100, value=254, width=8
no subject
Date: 2011-06-01 07:13 pm (UTC)
(XEN) reset_value = 0xfe
I swear to all that is holy.
What I'd suspect in this case is that the keyboard controller needs to be in a specific state before this works. I've got no idea what that state is. I'll see if I can pull up the Windows config I have again and see what it does to the i8042 on shutdown.
no subject
Date: 2011-06-01 07:34 pm (UTC)
no subject
Date: 2011-06-01 09:31 pm (UTC)
I traced what Windows does using XenTrace's capability (not the same as a debugger but close enough for most debugging). This is what I see as the last instructions issued by Win7. These addresses appear to be the Local APIC:
* Timer (0xfee00320)
* Thermal (0xfee00330)
* PMC (0xfee00340)
* LINT0 (0xfee00350)
* LINT1 (0xfee00360)
* Error (0xfee00370)
I need to get the local APIC spec to figure out if any of these writes cause a system reset.
CPU1 6095373341843 (+ 0) MMIO_WRITE [ addr = 0xfee00320, data = 0x3001f ]
CPU1 6095373366469 (+ 0) MMIO_WRITE [ addr = 0xfee00350, data = 0x1001f ]
CPU1 6095373372685 (+ 0) MMIO_WRITE [ addr = 0xfee00360, data = 0x184ff ]
CPU1 6095373376701 (+ 0) MMIO_WRITE [ addr = 0xfee00370, data = 0x100e3 ]
CPU1 6095373380580 (+ 0) MMIO_WRITE [ addr = 0xfee000f0, data = 0x001f ]
no subject
Date: 2011-06-01 11:48 pm (UTC)
CPU0 857940486111 (+ 0) CPUID [ func = 0x00000001, eax = 0x000206a7, ebx = 00200800, ecx = 8e982201, edx = 0x1789fbff ]
CPU0 857940488848 (+ 0) CPUID [ func = 0x40000001, eax = 0x31237648, ebx = 00000000, ecx = 00000000, edx = 0x00000000 ]
CPU0 857940491234 (+ 0) CPUID [ func = 0x40000004, eax = 0x00000028, ebx = 000007ff, ecx = 00000000, edx = 0x00000000 ]
CPU0 857940493712 (+ 0) IOPORT_WRT [ port = 0x00000064, data = 0x00fe ]
no subject
Date: 2011-06-01 10:24 pm (UTC)
what about...
Date: 2011-06-02 01:50 pm (UTC)
Re: what about...
Date: 2011-06-02 02:17 pm (UTC)
no subject
Date: 2011-06-02 07:20 pm (UTC)
kbd
Date: 2011-06-06 03:12 pm (UTC)
It sure would be nice if industry would recognize the need for a standard here.
original IBM PC...
Date: 2011-06-11 01:13 pm (UTC)
The original IBM PC didn't have a keyboard controller.
Many things changed when the second-generation PCs came out. The IBM AT, with its 286 CPU, introduced the keyboard controller, and that was IBM's hack for switching from protected to real mode. Communication between the keyboard itself and the computer also changed. Before the AT/286 there wasn't any possibility for the computer to control keyboard LEDs. Some third-party 100+ key keyboards for PC/XT/8088s had problems with losing track of numlock state sometimes, ending up with the cursor keys and pgup/pgdn/end/home/delete/insert always sending numerical keypad numbers no matter what state numlock was in...
If you find an old keyboard there is a good chance it has a "XT/AT" switch somewhere...
no subject
Date: 2011-06-27 01:02 pm (UTC)
Wayyy Back to DOS / cleanroom non-IBM BIOS "PC-Compatibles"
The most assured way to get a GENUINE power up reset state for Intel 80x86
class chips was to collapse the stack pointer to zero and hit NMI.
Got you a power-up assumed state full reset without the electrical stress
and delay of the Big Red Switch. 16 bytes or so coded directly to a .com
file in DOS debug.
Does this still work on modern Intel ix86 arch chips? It was in the specs.
How to rebooting
Date: 2011-08-03 04:58 am (UTC)
On keyboard controllers...
Date: 2013-04-11 02:17 am (UTC)
http://blogs.msdn.com/b/oldnewthing/archive/2004/09/17/230839.aspx#230905
How do you change this behavior?
Date: 2017-06-17 04:08 pm (UTC)
Re: How do you change this behavior?
Date: 2017-06-17 04:10 pm (UTC)
http://michalorman.com/2013/10/fix-ubuntu-freeze-during-restart/