[personal profile] mjg59
(Edit to add: this issue is restricted to the mobile SKUs. Desktop parts have very different power management behaviour)

Linux 4.5 seems to have got Intel's Skylake platform (ie, 6th-generation Core CPUs) to the point where graphics work pretty reliably, which is great progress (4.4 tended to lose all my windows every so often, especially over suspend/resume). I'm even running Wayland happily. Unfortunately one of the reasons I have a laptop is that I want to be able to do things like use it on battery, and power consumption's an important part of that. Skylake continues the trend from Haswell of moving to an SoC-type model where clock and power domains are shared between components that were previously entirely independent, and so you can't enter deep power saving states unless multiple components all have the correct power management configuration. On Haswell/Broadwell this manifested in the form of Serial ATA link power management being involved in preventing the package from going into deep power saving states - setting that up correctly resulted in a reduction in full-system power consumption of about 40%[1].
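
For anyone wanting to see what their own SATA ports are currently set to, here is a minimal Python sketch that only reads the per-host link power management policy from sysfs. It assumes the usual /sys/class/scsi_host/hostN/link_power_management_policy layout; the function name and the root parameter are made up for illustration. It does not reproduce whatever exact configuration gave the 40% figure above, and changing the policy is a separate decision that can upset some SSDs.

```python
from pathlib import Path


def sata_link_policies(root="/sys/class/scsi_host"):
    """Return {host name: policy string} for hosts that expose the LPM knob.

    `root` is a parameter only so the function can be pointed at a test
    directory; on a real system the default sysfs path is assumed.
    """
    policies = {}
    base = Path(root)
    if not base.is_dir():
        return policies
    for host in sorted(base.iterdir()):
        knob = host / "link_power_management_policy"
        if knob.is_file():
            # Typical values include max_performance, medium_power, min_power.
            policies[host.name] = knob.read_text().strip()
    return policies


if __name__ == "__main__":
    found = sata_link_policies()
    if not found:
        print("no SATA hosts with a link power management knob found")
    for host, policy in found.items():
        print(f"{host}: {policy}")
```

On a machine with no SATA controller at all (like the Skylake system described next) this simply reports that nothing was found.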

I've now got a Skylake platform with a nice shiny NVMe device, so Serial ATA policy isn't relevant (the platform doesn't even expose a SATA controller). The deepest power saving state I can get into is PC3, despite Skylake supporting PC8 - so I'm probably consuming about 40% more power than I should be. And nobody seems to know what needs to be done to fix this. I've found no public documentation on the power management dependencies on Skylake. Turning on everything in Powertop doesn't improve anything. My battery life is pretty poor and the system is pretty warm.
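
One way to check which package states a machine actually reaches, without Powertop, is to sample the package residency MSRs directly. The Python sketch below is a rough illustration, not a replacement for turbostat: it assumes the msr kernel module is loaded, that it runs as root, that the residency counters tick at the TSC rate (the same assumption turbostat makes), and that the register addresses match those in the kernel's msr-index.h. The function names are my own.

```python
import os
import struct
import time

MSR_TSC = 0x10
# Package C-state residency counters (addresses as listed in msr-index.h).
PKG_RESIDENCY_MSRS = {
    "pc2": 0x60D, "pc3": 0x3F8, "pc6": 0x3F9, "pc7": 0x3FA,
    "pc8": 0x630, "pc9": 0x631, "pc10": 0x632,
}


def read_msr(fd, reg):
    """Read one 64-bit MSR; the file offset selects the register."""
    return struct.unpack("<Q", os.pread(fd, 8, reg))[0]


def snapshot(fd):
    """Sample the TSC and every residency counter this CPU implements."""
    snap = {}
    for name, reg in {"tsc": MSR_TSC, **PKG_RESIDENCY_MSRS}.items():
        try:
            snap[name] = read_msr(fd, reg)
        except OSError:
            pass  # register not implemented on this CPU model
    return snap


def residency_percent(before, after):
    """Percentage of elapsed TSC ticks spent in each package state."""
    elapsed = after.get("tsc", 0) - before.get("tsc", 0)
    if elapsed <= 0:
        return {}
    return {
        name: 100.0 * (after[name] - before[name]) / elapsed
        for name in before
        if name != "tsc" and name in after
    }


if __name__ == "__main__":
    try:
        fd = os.open("/dev/cpu/0/msr", os.O_RDONLY)
    except OSError as exc:
        print(f"cannot open /dev/cpu/0/msr (root and the msr module needed): {exc}")
    else:
        try:
            before = snapshot(fd)
            time.sleep(1)
            after = snapshot(fd)
        finally:
            os.close(fd)
        for name, pct in residency_percent(before, after).items():
            print(f"{name}: {pct:.1f}%")
```

If everything deeper than pc3 stays at 0.0% on an otherwise idle system, that matches the behaviour described here. Counter wrap-around is ignored for simplicity.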

The best thing about this is the following statement from page 64 of the 6th Generation Intel® Processor Datasheet for U-Platforms:

Caution: Long term reliability cannot be assured unless all the Low-Power Idle States are enabled.

which is pretty concerning. Without support for states deeper than PC3, Linux is running in a configuration that Intel imply may trigger premature failure. That's obviously not good. Until this situation is improved, you probably shouldn't buy any Skylake systems if you're planning on running Linux.

[1] These patches never went upstream. Someone reported that they resulted in their SSD throwing errors and I couldn't find anybody with deeper levels of SATA experience who was interested in working on the problem. Intel's AHCI drivers for Windows do the right thing, but I couldn't find anybody at Intel who could get any information from their Windows driver team.

Date: 2016-04-13 09:32 pm (UTC)
From: [personal profile] edmonds
By working reliably under 4.5, do you mean you don't have to use any of the i915 module parameter workarounds like enable_rc6=0 ?

Been (mostly) fixed for me.

Date: 2016-04-13 10:41 pm (UTC)
From: [personal profile] gourdcaptain
If it's the issue involving limiting it to C6 or lower, that actually got fixed two weeks back and is just now being pushed to stable kernels - I'm running 4.4.7 no problems on Skylake hardware (Lenovo Yoga 700 (11-inch) with an Intel Core m5 6Y54 cpu) with all C-States enabled and powertop telling me it's spending time in C10 even. (Before 4.4.7 got released and packaged on Arch (3rd party repo, not main yet), I was running 4.6rc1 and rc2 to get this working.)

https://bugzilla.kernel.org/show_bug.cgi?id=109081 - The bug report of the issue.
Edited Date: 2016-04-13 10:52 pm (UTC)

Re: Actually been fixed.

Date: 2016-04-13 10:52 pm (UTC)
From: [personal profile] edmonds
No, IIUC, this is related to *R*C6, which is a power saving state on the GPU. Not C6.

I think this is the actual bug report: https://bugs.freedesktop.org/show_bug.cgi?id=94161.

Re: Actually been fixed.

Date: 2016-04-13 10:58 pm (UTC)
From: [personal profile] gourdcaptain
Huh. Haven't had that issue either (and I've used this system for long stretches of moderate use in the week and a half I've had it). Have had random ACPI-related crashes at boot (~50% of the time) unless I increase the wait time in systemd-boot to ten seconds, weirdly enough, but that's more of a lousy BIOS/UEFI issue (given that on successful boots, it logs a bunch of ACPI table errors in dmesg). (Still trying to figure out how to report that, given that the kernel panic messages vary a lot between crashes, scroll mostly off the screen, and the system completely freezes up after one without letting me use any of the stuff I read about online to capture it.)

Re: Actually been fixed.

Date: 2016-04-14 01:08 am (UTC)
From: (Anonymous)
This actually reeks of a bunch of microcode and firmware issues that got fixed in the last months. Ensure you have microcode 0x73 or later, that's actually a good hint both the microcode and the PCH firmware are not crash-prone buggy crap.

As far as I am concerned, the kernel should refuse to boot on any Skylake box with a BIOS older than 2016 or running a microcode revision earlier than 0x73. That would certainly be a lot more truthful to everyone involved.

If a UEFI update is not available yet from your vendor, ask for your money back. A properly up-to-date UEFI for Skylake with SGX support will have microcode 0x83 or higher. If it has SGX support permanently disabled by UEFI, 0x76 is enough.

Re: Actually been fixed.

Date: 2016-04-14 03:13 am (UTC)
From: [personal profile] gourdcaptain
0x74 microcode, released 3/15/16, and SGX disabled. Unfortunately, I can't flash newer UEFI if they put one out (still the most recent for it as of this posting) because the updater is Windows only (although I did flash the most recent one before wiping the drive, and have a Clonezilla backup of the Windows install if I absolutely have to). At least hopefully there'll eventually be microcode files I can early boot load.

EDIT: At least this is less bad than when I got a Broadwell i7 5700hq laptop last year and the microcode-based TSX issues were so bad I could only boot Fedora 22 for a month stably (it would crash under any load) until MSI (the ones I bought it from) were the first out with a fixed microcode update. (And their updater actually works from the UEFI loading off a USB stick). Intel's just awful anymore, but not like we really have any alternatives, given how bad AMD CPUs are for a lot of things anymore.

EDIT: Seriously, TSX was properly disabled under Haswell for a year at that point! Why was Broadwell shipping with it enabled and faulty? Did they not even check?
Edited Date: 2016-04-14 03:23 am (UTC)

EFI updates

Date: 2016-04-14 07:37 am (UTC)
From: (Anonymous)
FYI: Since EFI can directly run Portable Executables (.exe) you can just drop the .exe from your vendor in your EFI system partition and run it from the EFI menu, no need to boot windows. I've done this on my Dell XPS 13 system multiple times now.

Re: EFI updates

Date: 2016-04-14 07:42 am (UTC)
From: [personal profile] gourdcaptain
That's a thing you can do? I've been digging everywhere on ways to install UEFI updates on this thing, and it hasn't come up. Not that I'm disbelieving you, it just seems amazingly poorly documented. And it's the same EXE update files they have for Windows?

Re: EFI updates

Date: 2016-04-14 09:19 am (UTC)
From: [personal profile] gourdcaptain
Ah, then that's probably not going to fly with the Lenovo ones. (In my defense for buying it, finding good 11-inch laptops these days is hard: netbooks have mostly died out in favor of tablets and such. Plus, I needed Skylake for hardware HEVC decoding since nothing in the relatively cheap laptop range is going to have a discrete card that can do that, and 1080p HEVC takes a fair chunk of CPU to decode.)

Re: EFI updates

Date: 2016-04-15 06:34 am (UTC)
From: (Anonymous)
Lenovo provides CD images that boot and flash the firmware without Windows.
https://download.lenovo.com/pccbbs/mobiles/n1gur08w.txt

Re: EFI updates

Date: 2016-04-15 07:29 am (UTC)
From: [personal profile] gourdcaptain
Yeah, that's for a model that cost twice as much as the dinky little thing I'm using right now. I'm not seeing one listed for the Yoga 700 11-inch: http://support.lenovo.com/us/en/products/Laptops-and-netbooks/Yoga-Series/yoga-700-11isk?linkTrack=Homepage:Body_Search%20Products&beta=false

Re: EFI updates

Date: 2016-05-01 07:43 am (UTC)
From: [personal profile] gourdcaptain
Just to make google searches for this a bit more helpful, some experimentation later seems to link these random boot crashes to the i2c bus going funky when a bunch of stuff tries to hit it all at once during boot: https://bugzilla.kernel.org/show_bug.cgi?id=105251

Doing a weird workaround involving delaying loading hid_multitouch seems to cut down the boot failures by a fair amount: https://bugzilla.redhat.com/show_bug.cgi?id=1297188#c13
(NOTE: Still seem to happen a fair amount, did some statistical testing but still might be the placebo effect.)
Edited Date: 2016-05-09 08:36 pm (UTC)

Re: EFI updates

Date: 2016-05-09 08:38 pm (UTC)
From: [personal profile] gourdcaptain
There was a new UEFI update for the laptop released a few days ago. Windows install only. So I went to the trouble of installing Windows 10 to an external USB drive to boot off of and run the update (I really didn't feel like repartitioning the drive or having to image and restore it), and that actually worked.

But it's still 0x74 microcode. (facepalm)

Re: EFI updates

Date: 2016-04-14 07:27 pm (UTC)
From: [personal profile] mikeymop
On the XPS 13, how are you dropping into the efi shell to execute these?
I always imagined they'd have windows software that would hamper this, even if it can run .exe

can you write/share a relevant tutorial for this process, as I will need to do this until Dell publishes the files for fwupdate.

Re: EFI updates

Date: 2016-04-14 07:47 pm (UTC)
From: (Anonymous)
in this generation:
xps 9350: https://secure-lvfs.rhcloud.com/lvfs/device/33773727-8ee7-4d81-9fa0-57e8d889e1fa
precision 5510: https://secure-lvfs.rhcloud.com/lvfs/device/124c207d-5db8-4d95-bd31-34fd971b34f9

Otherwise put the .EXE from support.dell.com on a FAT32 USB key or on the ESP and select flash BIOS from the F12 POST menu.

Re: EFI updates

Date: 2016-04-18 07:56 pm (UTC)
From: (Anonymous)
Thank you,

I found instructions for flashing the .exe, but you gave me .cab files. How can I flash .cabs?

Can I still do the .exe method on the 9350?

Re: EFI updates

Date: 2016-04-14 07:31 pm (UTC)
From: [personal profile] mikeymop
I found what he's talking about. It's on the Arch wiki for the curious

http://hgdev.co/install-bios-update-under-linux-on-the-dell-xps-13-9343-2015/

I'll have to try this on my 9350 when I get it tomorrow.
Does anyone know how I can check the microcode version?
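
On the microcode question: on x86 Linux the running revision shows up as a "microcode" field in /proc/cpuinfo. Here is a small Python sketch that pulls it out; the function name is made up, and it assumes the field is printed as a hex value such as 0x74, which is how current kernels format it.

```python
from pathlib import Path


def microcode_revisions(cpuinfo_text):
    """Collect the distinct microcode revisions found in /proc/cpuinfo text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            # The kernel prints the revision in hex, e.g. "0x74".
            revisions.add(int(value.strip(), 16))
    return revisions


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        for rev in sorted(microcode_revisions(path.read_text())):
            print(hex(rev))
```

All cores normally report the same revision, so a single value is the expected result; more than one would suggest an update was applied to only some cores.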

Re: EFI updates

Date: 2016-04-14 07:46 pm (UTC)
From: (Anonymous)
9350 has updates at LVFS that can be applied as capsules.
https://secure-lvfs.rhcloud.com/lvfs/device/33773727-8ee7-4d81-9fa0-57e8d889e1fa

Re: Actually been fixed.

Date: 2016-04-15 07:38 pm (UTC)
From: [personal profile] kensey

At least this is less bad than when I got a Broadwell i7 5700hq laptop last year and the microcode-based TSX issues were so bad I could only boot Fedora 22 for a month stably (it would crash under any load) until MSI (the ones I bought it from) were the first out with a fixed microcode update.

Funny you should mention -- my Sager work laptop has an i7-5700HQ, and Fedora 22 runs fine on it, but 23 crashes within seconds to minutes. For now I'm just continuing to run F22, but I also can't run any VMs or containers that contain a libc that tickles the TSX issue or my laptop reboots!

Eventually I'll have to buckle down and figure out how to apply one of the firmware updates floating around out there that supposedly fix this (I think actually the one most commonly used came from MSI's updater), because neither Sager nor Clevo (the hardware OEM) has put out any firmware updates for it, and I don't want to be stuck running F22 past its end-of-support.

Re: Actually been fixed.

Date: 2016-04-15 09:31 pm (UTC)
From: [personal profile] gourdcaptain
Yeah, I had that issue, although luckily I had the MSI laptop those updates came from.
https://github.com/bgw/bdw-ucode-update-tool - Someone's attempt to hack together an updater for those.
Unfortunately, all my experience with messing with microcode packages is on Arch where I can just stick it in my systemd-boot config as another initrd before the main one. Which I know you can do with GRUB as well, it's just GRUB's config files are hilariously complicated, IMHO.

Honestly, the lack of updates is a shame upon your hardware vendor, given that it even affects things under Windows - apparently Office 2016's installer, even.

EDIT: Nothing against Fedora, but I had to get off it as soon as possible because while I like a lot of the stuff it does as a distro, nobody'd packaged Bumblebee and CUDA in a way where you could get both on the same system (and I needed both at the time for work urgently) - all the CUDA packages had a hard dependency on a normal NVIDIA driver install. (Primarily because the guy doing it sees Bumblebee as a "dirty hack" that shouldn't be supported. Okay, buddy, you got any other options for making this hardware work in the meantime?) Arch is the only distro I've found which DOESN'T have a NVIDIA driver dependency for the CUDA package, which is pretty handy for being able to run the CUDA debugger on a laptop remotely connected to your system with an NVIDIA card.
Edited Date: 2016-04-15 09:36 pm (UTC)

Not fixed for yoga 900

Date: 2016-04-16 01:40 pm (UTC)
From: (Anonymous)
On Lenovo Yoga 900 with 4.6.0 rc3, the C-states (beyond C2) seem not to get enabled. Still considerable battery lifetime.

➜ ~ powertop

Package
C2 (pc2) 53.0% |
C3 (pc3) 0.0% |
C6 (pc6) 0.0% |
C7 (pc7) 0.0% |
C8 (pc8) 0.0% |
C9 (pc9) 0.0% |
C10 (pc10) 0.0% |

➜ ~ grep . /sys/devices/system/cpu/cpuidle/*
/sys/devices/system/cpu/cpuidle/current_driver:intel_idle
/sys/devices/system/cpu/cpuidle/current_governor_ro:menu

➜ ~ sudo rdmsr 0xE2
1e008006

➜ ~ dmesg | grep -i error
[ 2.894522] EXT4-fs (sda9): re-mounted. Opts: errors=remount-ro
[ 2.958327] tpm_crb: probe of MSFT0101:00 failed with error -16
[ 3.069000] iwlwifi 0000:01:00.0: Direct firmware load for iwlwifi-8000C-21.ucode failed with error -2
[ 3.069357] iwlwifi 0000:01:00.0: Direct firmware load for iwlwifi-8000C-20.ucode failed with error -2
[ 3.069371] iwlwifi 0000:01:00.0: Direct firmware load for iwlwifi-8000C-19.ucode failed with error -2
[ 3.069380] iwlwifi 0000:01:00.0: Direct firmware load for iwlwifi-8000C-18.ucode failed with error -2
[ 3.069390] iwlwifi 0000:01:00.0: Direct firmware load for iwlwifi-8000C-17.ucode failed with error -2
[ 3.404045] i2c_hid i2c-ITE8396:00: error in i2c_hid_init_report size:19 / ret_size:18
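
For reading the rdmsr 0xE2 value above: 0xE2 is MSR_PKG_CST_CONFIG_CONTROL, and the Python sketch below splits it into the fields I believe the Intel SDM documents for recent Core parts (package C-state limit in the low bits, CFG lock at bit 15, demotion controls at bits 25 to 28). Treat the bit positions as an assumption to verify against the SDM for your model; the translation from the limit code to an actual package state is model-specific and is deliberately not attempted here.

```python
def decode_pkg_cst_config(value):
    """Split an MSR 0xE2 value into its commonly documented fields.

    Bit positions are assumed from the Intel SDM; check them for your CPU.
    """
    def bit(n):
        return bool((value >> n) & 1)

    return {
        # Raw limit code; which package state it means depends on the model.
        "pkg_cstate_limit_code": value & 0xF,
        "io_mwait_redirection": bit(10),
        # When set, firmware has locked the register against further writes.
        "cfg_lock": bit(15),
        "c3_auto_demotion": bit(25),
        "c1_auto_demotion": bit(26),
        "c3_undemotion": bit(27),
        "c1_undemotion": bit(28),
    }


if __name__ == "__main__":
    for field, setting in decode_pkg_cst_config(0x1E008006).items():
        print(f"{field}: {setting}")
```

For 1e008006 this gives a limit code of 6 with the lock bit set, so if the limit is the problem it would have to be changed by the firmware rather than from Linux.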

Profile

Matthew Garrett

About Matthew

Power management, mobile and firmware developer on Linux. Security developer at Aurora. Ex-biologist. [personal profile] mjg59 on Twitter. Content here should not be interpreted as the opinion of my employer. Also on Mastodon.
