Samsung laptop bug is not Linux specific
Feb. 8th, 2013 10:41 pm
I bricked a Samsung laptop today. Unlike most of the reported cases of Samsung laptops refusing to boot, I never booted Linux on it - all experimentation was performed under Windows. It seems that the bug we've been seeing is simultaneously simpler in some ways and more complicated in others than we'd previously realised.
So, some background. The original belief was that the samsung-laptop driver was doing something that caused the system to stop working. This driver was coded to a Samsung specification in order to support certain laptop features that weren't accessible via any standardised mechanism. It works by searching a specific area of memory for a Samsung-specific signature. If it finds it, it follows a pointer to a table that contains various magic values that need to be written in order to trigger some system management code that actually performs the requested change. This is unusual in this day and age, but not unique. The problem is that the magic signature is still present on UEFI systems, but attempting to use the data contained in the table causes problems.
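The lookup described above can be sketched as follows. This is only an illustration of the scan-then-follow-pointer pattern on a fake byte buffer: the signature bytes, the pointer format and the table layout are all hypothetical stand-ins, since the real values come from Samsung's specification and are not given in this post.

```python
import struct

# Hypothetical values for illustration only; the real signature and table
# layout are defined by Samsung's specification, not by this sketch.
SIGNATURE = b"EXAMPLE!"
POINTER_FORMAT = "<H"  # assumed: 16-bit little-endian offset right after the signature


def find_table_offset(region: bytes, signature: bytes = SIGNATURE):
    """Return the table offset stored after the signature, or None if absent."""
    pos = region.find(signature)
    if pos == -1:
        return None  # signature missing: treat the interface as unsupported
    ptr_start = pos + len(signature)
    ptr_end = ptr_start + struct.calcsize(POINTER_FORMAT)
    if ptr_end > len(region):
        return None  # region is truncated, there is no pointer to follow
    (offset,) = struct.unpack_from(POINTER_FORMAT, region, ptr_start)
    return offset


def read_magic_values(region: bytes, offset: int, count: int = 4):
    """Read `count` one-byte magic values from the table (layout assumed)."""
    return list(region[offset:offset + count])


# A fake 64-byte "memory region" standing in for the real firmware area.
fake = bytearray(64)
fake[8:16] = SIGNATURE
struct.pack_into(POINTER_FORMAT, fake, 16, 32)  # pointer -> table at offset 32
fake[32:36] = bytes([0x10, 0x20, 0x30, 0x40])   # pretend magic values

table = find_table_offset(bytes(fake))
magic = read_magic_values(bytes(fake), table)
```

Nothing here writes the values anywhere; the step that actually triggers the system management code is deliberately left out.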
We're not quite sure what those problems are yet. Originally we assumed that the magic values we wrote were causing the problem, so the samsung-laptop driver was patched to disable it on UEFI systems. Unfortunately, this doesn't actually fix the problem - it just avoids the easiest way of triggering it. It turns out that it wasn't the writes that caused the problem, it was what happened next. Performing the writes triggered a hardware error of some description. The Linux kernel caught and logged this. In the old days, people would often never see these logs - the system would then be frozen and it would be impossible to access the hard drive, so they never got written to disk. There's code in the kernel to make this easier on UEFI systems. Whenever a severe error is encountered, the kernel copies recent messages to the UEFI variable storage space. They're then available to userspace after a reboot, allowing more accurate diagnostics of what caused the crash.
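As a rough sketch of the "available to userspace after a reboot" part: on Linux these saved messages are normally exposed through the pstore filesystem. The mount point `/sys/fs/pstore` and the `dmesg-efi-` file name prefix used below are assumptions about a typical setup, not something stated in this post, so check your own system before relying on them.

```python
import tempfile
from pathlib import Path


def list_efi_crash_dumps(pstore_dir="/sys/fs/pstore"):
    """Return the names of EFI-backed dmesg records found in pstore_dir.

    Both the default directory and the "dmesg-efi-" prefix are assumptions
    about a typical Linux setup.
    """
    root = Path(pstore_dir)
    if not root.is_dir():
        return []  # pstore not mounted (or not a Linux system)
    return sorted(p.name for p in root.iterdir() if p.name.startswith("dmesg-efi-"))


# Demonstration against a temporary directory rather than the real pstore.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("dmesg-efi-2", "dmesg-efi-1", "console-ramoops-0"):
        (Path(tmp) / name).write_text("example log line\n")
    demo = list_efi_crash_dumps(tmp)
```

Reading the listed files is harmless; deleting them is what asks the backend to free the corresponding storage, so that part is left out of the sketch.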
That crash dump takes about 10K of UEFI storage space. Microsoft require that Windows 8 systems have at least 64K of storage space available. We only keep one crash dump - if the system crashes again it'll simply overwrite the existing one rather than creating another. This is all completely compatible with the UEFI specification, and Apple actually do something very similar on their hardware. Unfortunately, it turns out that some Samsung laptops will fail to boot if too much of the variable storage space is used. We don't know what "too much" is yet, but writing a bunch of variables from Windows is enough to trigger it. I put some sample code here - it writes out 36 variables each containing a kilobyte of random data. I ran this as an administrator under Windows and then rebooted the system. It never came back.
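To put the numbers above side by side, here is a toy model of a variable store. It makes no firmware calls at all. The 64K capacity is simply the Windows 8 minimum quoted above (the real size of the store on the affected laptops is not known), and per-variable overhead such as names and headers is ignored, so the percentages are only indicative.

```python
import os

STORE_BYTES = 64 * 1024  # assumed capacity: the Windows 8 minimum, not a measured value
DUMP_BYTES = 10 * 1024   # approximate crash dump size quoted in the post


class ToyVariableStore:
    """In-memory stand-in for UEFI variable storage (payload bytes only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.variables = {}

    def used(self):
        return sum(len(data) for data in self.variables.values())

    def set_variable(self, name, data):
        # Overwriting an existing variable frees its old payload first.
        new_used = self.used() - len(self.variables.get(name, b"")) + len(data)
        if new_used > self.capacity:
            raise MemoryError("variable store full")
        self.variables[name] = data


# Scenario 1: two crashes in a row. The second dump overwrites the first.
dump_store = ToyVariableStore(STORE_BYTES)
dump_store.set_variable("crash-dump", bytes(DUMP_BYTES))
dump_store.set_variable("crash-dump", bytes(DUMP_BYTES))
dump_used = dump_store.used()

# Scenario 2: the test described above, 36 variables of 1 KB random data each.
test_store = ToyVariableStore(STORE_BYTES)
for i in range(36):
    test_store.set_variable(f"test-{i}", os.urandom(1024))
test_used = test_store.used()
test_fraction = test_used / STORE_BYTES
```

Under these assumptions a single crash dump occupies about 16% of a 64K store and the 36 test variables about 56%, so in this simplified model the failure appears well before the store is nominally full.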
This is pretty obviously a firmware bug. Writing UEFI variables is expressly permitted by the specification, and there should never be a situation in which an OS can fill the variable store in such a way that the firmware refuses to boot the system. We've seen similar bugs in Intel's reference code in the past, but they were all fixed early last year. For now the safest thing to do is not to use UEFI on any Samsung laptops. Unfortunately, if you're using Windows, that'll require you to reinstall it from scratch.
no subject
Date: 2013-02-09 06:00 am (UTC)
no subject
Date: 2013-02-09 08:56 am (UTC)
UEFI data in NAND Flash on motherboard
Date: 2013-02-09 09:32 am (UTC)
no subject
Date: 2013-02-09 02:29 pm (UTC)
CMOS Battery?
Date: 2013-02-09 11:11 am (UTC)
Great writeup as always. Can you comment on the rumors going around that removing the CMOS NVRAM battery will make the board bootable again? Obviously that would mean taking apart the laptop to get at the motherboard, which voids the warranty. But for testing the fix, a developer might be willing to give up their warranty to iterate faster.
https://bugs.launchpad.net/ubuntu-cdimage/+bug/1040557/comments/23 on https://bugs.launchpad.net/ubuntu-cdimage/+bug/1040557 is one example of this.
UEFI to BIOS
Date: 2013-02-09 11:56 am (UTC)
Re: UEFI to BIOS
Date: 2013-02-09 02:31 pm (UTC)
Seriously, why?
Date: 2013-02-09 01:02 pm (UTC)
Seriously. If it's price you are worried about, I'll gladly sell you a brick of wood for a VERY good price if you are willing to believe that it's an equivalent piece of hardware and the only difference is price. You get what you pay for, people.
OK was that predictable enough? Well I'm sorry about that. But come on, when are people going to learn?
Re: Seriously, why?
Date: 2013-02-09 01:44 pm (UTC)
no subject
Date: 2013-02-09 03:16 pm (UTC)
Regarding Windows, I recently discovered that you can actually migrate a Windows 8 install from MBR+BIOS to GPT+UEFI. It's not straightforward, but it's possible (using bootrec and bcdboot). I haven't tried the other way around or with Windows 7, but I think that should be doable as well.
no subject
Date: 2013-02-09 03:28 pm (UTC)
Do you know *what* is broken?
Date: 2013-02-09 05:38 pm (UTC)
QueryVariableInfo()?
Date: 2013-02-09 09:51 pm (UTC)
Re: QueryVariableInfo()?
Date: 2013-02-09 09:53 pm (UTC)
geez what a surprise........
Date: 2013-02-10 04:12 am (UTC)
How to prevent bricking laptop?
Date: 2013-02-10 07:53 am (UTC)
Is there any setting to prevent running into this bug accidentally?
To be clear, I run Linux (xubuntu-12.10) exclusively (Windows 8 has been wiped) and UEFI is disabled in the BIOS.
Re: How to prevent bricking laptop?
Date: 2013-02-12 02:51 pm (UTC)
I would do the same, i.e. wipe Windows, if I didn't need it sometimes. But I guess there's no way to install Ubuntu in CSM mode without first removing the Windows partitions that occupy the beginning of the hard drive, as it uses the new GPT partition table.
36?
Date: 2013-02-10 02:18 pm (UTC)
Thanks for the insights! One small and insignificant question: how do you figure that your code writes 36 variables? The maximum of the loop is 48 and I don't see how you could notice from the error code that it stopped in iteration 36.
Have a great day!
Re: 36?
Date: 2013-02-10 03:52 pm (UTC)
Bios mode may also have problems
Date: 2013-02-10 04:37 pm (UTC)
no subject
Date: 2013-02-10 09:54 pm (UTC)
Worst case scenario take legal action against the creator.
Can the system boot or show a bios splash screen upon powering it on?
no subject
Date: 2013-02-10 09:56 pm (UTC)
No. It's bricked.
Way to recover
Date: 2013-02-11 08:21 am (UTC)
http://flashrom.org/FT2232SPI_Programmer
If you need to send it back to Samsung for each test you want to run, it would most likely take rather long.
Re: Way to recover
Date: 2013-02-11 03:56 pm (UTC)
How does one recover from this?
Date: 2013-02-24 05:19 pm (UTC)
Re: How does one recover from this?
Date: 2013-06-21 03:13 am (UTC)
no subject
Date: 2013-02-26 06:25 pm (UTC)
Should the proper solution for using Linux in UEFI/Secure Boot mode on such a Samsung laptop be to use a Linux kernel with the bug fix below, i.e.
commit 266c43c175a51002b04c18a453a39708d1775ced
Author: Satoru Takeuchi
Date: Thu Feb 14 09:12:52 2013 +0900
efi: Clear EFI_RUNTIME_SERVICES rather than EFI_BOOT by "noefi" boot parameter
And in turn pass the noefi boot param to the kernel while booting into Linux.
The reason for my assumption is that passing noefi to the Linux kernel as a boot parameter will, I assume, disable the use of EFI runtime services by the kernel and its modules. Thus under no circumstances (including a kernel crash) will the Linux kernel use the EFI runtime services to write to the EFI storage. (I am also assuming that it will not allow any other logic to use EFI services, by telling EFI that it is relinquishing use of the EFI runtime services for this boot.) That would be the 100% sure way of ensuring that one can't trigger this bug under Linux in the normal sense. (Whether it is 100% safe from a security perspective I am not sure, in case the Samsung EFI logic has loopholes that allow one to call EFI services even after relinquishing them. I am talking logically here; I haven't looked into EFI in detail, so I am making some/many assumptions.)
So if one wants to dual boot a system with Win8 already installed in Secure Boot UEFI mode and Linux (in Secure Boot/UEFI mode), then one should use a Linux distro with a kernel later than Feb 15 that includes the above-mentioned noefi bug fix, and boot that distro with the noefi boot parameter to ensure that the EFI bug on these Samsung laptops can't be triggered from Linux during that boot.
Is my understanding above correct?
NOTE: I am not sure the Linux kernel handles the transition from EFI to no-EFI runtime mode gracefully if noefi is passed as an argument and the system is already in UEFI boot mode. For now I am assuming that the kernel handles this situation properly, and that if it is required to handle it in a specific manner, it does so. This is only an assumption, because I haven't looked into the EFI specs at any level yet.
NOTE: I have posted a related query on the Ubuntu Launchpad bug tracking this.
Also, does anyone know when Samsung will release a fixed EFI firmware?
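For anyone trying the approach in this comment, a small sketch of how one might confirm that the parameter was at least passed to the running kernel. It only inspects the command line (read from `/proc/cmdline` on Linux); it does not prove that the kernel in use contains the fix mentioned above or that runtime services are really unused. The helper names are made up for this example.

```python
from pathlib import Path


def has_boot_param(cmdline: str, param: str = "noefi") -> bool:
    """True if `param` appears as a standalone word on the kernel command line."""
    return param in cmdline.split()


def read_cmdline(path="/proc/cmdline") -> str:
    """Return the running kernel's command line, or "" if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return ""


# Example strings standing in for real command lines.
with_noefi = "BOOT_IMAGE=/vmlinuz root=/dev/sda2 ro noefi quiet"
without_noefi = "BOOT_IMAGE=/vmlinuz root=/dev/sda2 ro quiet"

found = has_boot_param(with_noefi)
missing = has_boot_param(without_noefi)
```

On a live system, `has_boot_param(read_cmdline())` gives the same check for the current boot.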
Test
Date: 2013-02-27 03:11 pm (UTC)
New Samsung BIOS ver P05ABK
Date: 2013-03-01 09:49 pm (UTC)
I just can't find release notes for the update on the Samsung site, which means that I still don't have the courage to install Linux on this machine.
Is there any chance that this firmware upgrade could correct the error reported?
Re: New Samsung BIOS ver P05ABK
Date: 2013-03-03 10:22 am (UTC)
I just hope they fix things before some script kiddie with a grudge against Samsung starts exploiting it.
no subject
Date: 2013-03-03 12:30 pm (UTC)
no subject
Date: 2013-03-09 06:22 am (UTC)
ThinkPad Bricked... related?
Date: 2013-03-13 03:31 am (UTC)
I have an Edge E430 (3254-CTO). I was booting Arch on kernel 3.8.2 (CK patchset) when it stalled. This wasn't unusual, as there have been major changes in LVM2 with Arch that I am still trying to figure out. So I manually scanned for the PV, at which point I had a kernel panic.
The kernel panic was really funky looking, with very little info having been dumped (maybe 7-10 short jumbled lines). Though this may have been because I was still in the initrd.
After that I was not able to boot whatsoever. No POST or anything. The fan spins like it is going to do something, but nothing after. I even put the optical drive back in so that I could see if it saw that. It spins like it always does when power is applied, but nothing else.
Since you are the only person who seems to be an authority on this particular problem, I am trying to contact you.
Hopefully I will get a reply from you here. I will check periodically. My computer has been sent for repair, but I should contact you at the very least.
Re: ThinkPad Bricked... related?
Date: 2013-03-13 03:32 am (UTC)
CSM Ubuntu
Date: 2013-04-18 11:46 pm (UTC)
Can I install Ubuntu on it in CSM mode? I will format the whole system. I hope you can help me. Thanks a lot.
Greetings
Re: CSM Ubuntu
Date: 2013-04-26 11:31 am (UTC)
I am writing from this machine... Yes, you can, but it is a little bit difficult. You must format the whole disk, patch the bootloader (see wiki), and create /boot on sda (the HDD) with ext2 format.
The rest goes in ext4, and enjoy Ubuntu. The only problem is that the Linux kernel does not use your ATI graphics card ... the hardware ID is not in the current kernel. But the primary Intel one is good enough for work and mid-level simulations.
Greetings
bricked samsung laptop np300 in SA
Date: 2014-06-20 08:00 am (UTC)
Same issue here with my NP300E5C; it's one of the affected ones.
Mine started slowly at first: it would take long to shut down, show no boot devices at startup, start up and immediately switch off, and also show no drive plugged in even though one was. At first I thought it was the initial BIOS firmware I put on, but then I read more about what affects it.
I took it to Samsung SA and they replaced my board. That seems to have fixed it temporarily, but I'm not sure how long it will last, as it's starting to act up again.
What else can I do to prevent this? I am using Win 7 with Secure Boot off, CSM mode on and UEFI disabled. Is there any way to flash a different BIOS firmware instead of the Phoenix SecureCore Tiano BIOS firmware? Are there any BIOS alternatives I can flash that don't have this bug?
no subject
Date: 2021-10-07 07:04 pm (UTC)
1. in the motherboard section
Motherboard Slots: 2xPCI Express x1, 1xPCI Express x8, 1xPCI Express x16
2. in the bus section
PCI Express x8 Bus #1 [J6B2]
NVIDIA GeForce GTX 1050 Ti [HP]
[General Information]
Device Name: NVIDIA GeForce GTX 1050 Ti [HP]
Original Device Name: NVIDIA GeForce GTX 1050 Ti (GP107M)
Device Class: VGA Compatible Adapter
Revision ID: A1
PCI Address (Bus:Device:Function) Number: 1:0:0
PCI Latency Timer: 0
Hardware ID: PCI\VEN_10DE&DEV_1C8C&SUBSYS_84ED103C&REV_A1
[PCI Express]
Version: 3.0
Maximum Link Width: 16x
Current Link Width: 8x
Maximum Link Speed: 8.0 GT/s
Current Link Speed: 2.5 GT/s
Device/Port Type: PCI Express Endpoint
Slot Implemented: No
Emergency Power Reduction: Not Supported
Active State Power Management (ASPM) Support: L0s and L1
Active State Power Management (ASPM) Status: L0s and L1 Entry
L0s Exit Latency: 256 - 512 ns
L1 Exit Latency: 2 - 4 us
Maximum Payload Size Supported: 256 bytes
Maximum Payload Size: 256 bytes
Resizable BAR Support: Not Supported
[System Resources]
Interrupt Line: N/A
Interrupt Pin: INTA#
Memory Base Address 0 63000000
Memory Base Address 1 50000000
Memory Base Address 3 60000000
I/O Base Address 5 0
[Features]
Bus Mastering: Enabled
Running At 66 MHz: Not Capable
Fast Back-to-Back Transactions: Not Capable
[Driver Information]
Driver Manufacturer: NVIDIA
Driver Description: NVIDIA GeForce GTX 1050 Ti
Driver Provider: NVIDIA
Driver Version: 27.21.14.6627 (GeForce 466.27)
Driver Date: 23-Apr-2021
DCH/UWD Driver: Capable
DeviceInstanceId PCI\VEN_10DE&DEV_1C8C&SUBSYS_84ED103C&REV_A1\4&33ECA368&0&0008
Location Paths PCIROOT(0)#PCI(0100)#PCI(0000)
The PCI Express bus enumeration ends with #3 and the NVMe PCIe SSD controller is not listed.
However, a Hwinfo64 report of a comparable board can be found on the Internet.
There, an NVMe PCIe SSD controller is listed in the Bus section (see below) under PCI Express x4.
PCI Express x4 Bus #4
Samsung NVMe PCIe SSD Controller
[General Information]
Device Name: Samsung NVMe PCIe SSD Controller
Original Device Name: Samsung Electronics NVMe PCIe SSD Controller
Device Class: NVMe Controller
Revision ID: 0
PCI Address (Bus: Device: Function) Number: 4:0:0
PCI Latency Timer: 0
Hardware ID: PCI\VEN_144D&DEV_A808&SUBSYS_A801144D&REV_00
[PCI Express]
Version: 3.0
Maximum Link Width: 4x
Current Link Width: 4x
Maximum Link Speed: 8.0 GT/s
Current Link Speed: 8.0 GT/s
Device/Port Type: PCI Express Endpoint
Slot Implemented: No
Emergency Power Reduction: Not Supported
Active State Power Management (ASPM) Support: L1
Active State Power Management (ASPM) Status: L1 Entry
L0s Exit Latency: >4 us
L1 Exit Latency: 32 - 64 us
Maximum Payload Size Supported: 256 bytes
Maximum Payload Size: 256 bytes
[System Resources]
Interrupt Line: N/A
Interrupt Pin: INTA#
Memory Base Address 0 AD200000
[Features]
Bus Mastering: Enabled
Running At 66 MHz: Not Capable
Fast Back-to-Back Transactions: Not Capable
[Driver Information]
Driver Manufacturer: Standard NVM Express Controller
Driver Description: Standard NVM Express Controller
Driver Provider: Microsoft
Driver Version: 10.0.19041.488
Driver Date: 21-Jun-2006
DeviceInstanceId PCI\VEN_144D&DEV_A808&SUBSYS_A801144D&REV_00\4&CDF9F35&0&00DC
Location Paths PCIROOT(0)#PCI(1B04)#PCI(0000)
My HP Pavilion, on the other hand, does not have a PCI Express x4 slot, but it does have a PCI Express x8 slot. According to the Hwinfo64 report of my HP Pavilion notebook, the graphics card NVIDIA GeForce GTX 1050 Ti (GP107M) sits in the PCI Express x8 slot. Specified for this graphics card is: x4.
Maximum Link Width: 16x
In the Hwinfo64 report of the board used for comparison, the graphics card is attached to the PCI Express x16 bus #1 (see below) as listed in the Bus section, i.e. in the correct place (16 lanes).
PCI Express x16 Bus #1
NVIDIA GeForce GTX 1060 [GIGABYTE]
[General Information]
Device Name: NVIDIA GeForce GTX 1060 [GIGABYTE]
Original Device Name: NVIDIA GeForce GTX 1060 (GP106M/N17P-G1)
Device Class: VGA Compatible Adapter
Revision ID: A1
PCI Address (Bus:Device:Function) Number: 1:0:0
PCI Latency Timer: 0
Hardware ID: PCI\VEN_10DE&DEV_1C20&SUBSYS_16521458&REV_A1
Version: 3.0
Maximum Link Width: 16x
Current Link Width: 16x
Maximum Link Speed: 8.0 GT/s
Current Link Speed: 2.5 GT/s
Device/Port Type: PCI Express Endpoint
Slot Implemented: No
Emergency Power Reduction: Not Supported
Active State Power Management (ASPM) Support: L0s and L1
Active State Power Management (ASPM) Status: L0s and L1 Entry
L0s Exit Latency: 256 - 512 ns
L1 Exit Latency: 8 - 16 us
Maximum Payload Size Supported: 256 bytes
Maximum Payload Size: 256 bytes
[System Resources]
Interrupt Line: N/A
Interrupt Pin: INTA#
Memory Base Address 0 AC000000
Memory Base Address 1 80000000
Memory Base Address 3 90000000
I/O Base Address 5 0
[Features]
Bus Mastering: Enabled
Running At 66 MHz: Not Capable
Fast Back-to-Back Transactions: Not Capable
[Driver Information]
Driver Manufacturer: NVIDIA
Driver Description: NVIDIA GeForce GTX 1060
Driver Provider: NVIDIA
Driver Version: 27.21.14.5167 (GeForce 451.67)
Driver Date: 05-Jul-2020
DCH/UWD Driver: Capable
DeviceInstanceId PCI\VEN_10DE&DEV_1C20&SUBSYS_16521458&REV_A1\4&DAED9F9&0&0008
Location Paths PCIROOT(0)#PCI(0100)#PCI(0000)
In the System Buses section of the SiSoft Sandra report of my Pavilion notebook (see below), there is a warning 1204 at the end. Possibly, the reference there to the higher maximum supported speed is due to the lower number of lanes (8 in the bus instead of 16).
SiSoftware Sandra System Buses
System Buses
Interface Version: 2.30
PCI Buses: 4
PCIe Buses: 3
System Bus
Type: PCI
Device Number: 0
Multiplier: 1x
System Bus
Type: PCIe 3.0 x8 2.5Gbps
Device Number: 1
Multiplier: 3x
Bridge: Intel Core6 (Skylake) PCIe Controller (x16)
System Bus
Type: PCIe 3.0 x1 2.5Gbps
Device Number: 2
Multiplier: 3x
Bridge: Intel ICH300 (Cannon Point) PCI Express Root Port #14
Device Connected to Port: HP RTL8168/8111 PCI-E Gigabit Ethernet NIC
System Bus
Type: PCIe 3.0 x1 2.5Gbps
Device Number: 3
Multiplier: 3x
Bridge: Intel ICH300 (Cannon Point) PCI Express Root Port #16
Device Connected to Port: HP RTS522A PCI Express Card Reader
Performance Tips
Warning 1204: Speed is below the maximum supported speed. The device may be in power-saving mode.
Tip 3: Press Enter or double-click on a tip to find out more information.
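To give a sense of scale for warning 1204, here is a rough calculation of theoretical link bandwidth from the figures in the reports above (2.5 GT/s at x8 currently, 8.0 GT/s at x16 maximum). It uses the standard PCIe line encodings (8b/10b at 2.5 and 5.0 GT/s, 128b/130b at 8.0 GT/s) and ignores protocol overhead, so the results are upper bounds rather than measurements. The function name is made up for this example.

```python
# Fraction of transferred bits that carry data, per PCIe signalling rate.
ENCODING_EFFICIENCY = {
    2.5: 8 / 10,     # Gen1, 8b/10b
    5.0: 8 / 10,     # Gen2, 8b/10b
    8.0: 128 / 130,  # Gen3, 128b/130b
}


def link_bandwidth_mb_s(gt_per_s: float, lanes: int) -> float:
    """Theoretical one-direction bandwidth in MB/s (1 MB = 10**6 bytes)."""
    data_bits_per_s = gt_per_s * 1e9 * ENCODING_EFFICIENCY[gt_per_s]
    return data_bits_per_s / 8 / 1e6 * lanes


current = link_bandwidth_mb_s(2.5, 8)   # link state shown in the HP report
maximum = link_bandwidth_mb_s(8.0, 16)  # maximum speed and width of the card
```

This gives about 2000 MB/s for the current link state against roughly 15750 MB/s at the card's maximum speed and width. Idle GPUs commonly drop to a lower link speed to save power, which is consistent with the wording of the warning.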
If there is a possibility to disconnect the NVIDIA graphics card electrically, I could unscrew the notebook, do the disconnection and boot the Windows partition (In the default setting, another graphics card is active, after all). The hardware initialization should then assign the NVMe SSD to the correct port, since the NVIDIA graphics card is no longer present.
What could be the reason that the PCI Express x16 Bus is not listed by HWINFO in the BUS section of the HP notebook and why the NVIDIA Graphics card is assigned to the PCI Express x8 Bus of the HP notebook instead of the PCI Express x16 Bus?
Thank you very much.