[personal profile] mjg59
Around a year ago I wrote some patches in an attempt to improve power management on Haswell and Broadwell systems by configuring Serial ATA power management appropriately. I got a couple of reports of them triggering SATA errors for some users, couldn't reproduce them myself and so didn't have a lot of confidence in them. Time passed.

I've been working on power management stuff again this week, so it seemed like a good opportunity to revisit these. I've made a few changes and pushed a couple of trees - one against master and one against 4.5.

First, these probably only have relevance to users of mobile Intel parts in the U or S range (/proc/cpuinfo will tell you - you're looking for a four-digit number that starts with 4 (Haswell), 5 (Broadwell) or 6 (Skylake) and ends with U or S), and won't do anything unless you have SATA drives (including PCI-based SATA). To test them, first disable anything like TLP that might alter your SATA link power management policy. Then check powertop - you should only be getting to PC3 at best. Build a kernel with these patches and boot it. /sys/class/scsi_host/*/link_power_management_policy should read "firmware". Check powertop and see whether you're getting into deeper PC states. Now run your system for a while and check the kernel log for any SATA errors that you didn't see before.
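
For readers who would rather script the /proc/cpuinfo check than eyeball it, here is a minimal Python sketch of the rule of thumb above. The function names and the regular expression are illustrative assumptions, not part of the patches, and, as a commenter further down notes, the rule only holds for Core-branded parts and not for Pentium or Celeron models.

```python
import re

# Rule of thumb from the post: a four-digit model number that starts with
# 4 (Haswell), 5 (Broadwell) or 6 (Skylake) and ends with U or S.
GENERATIONS = {"4": "Haswell", "5": "Broadwell", "6": "Skylake"}

# The lookbehind rejects longer digit runs; \b rejects suffixes such as "MQ".
MODEL_RE = re.compile(r"(?<![0-9A-Za-z])([456])\d{3}([US])\b")


def classify_cpu(model_name):
    """Return (generation, suffix) if the name matches the rule, else None."""
    match = MODEL_RE.search(model_name)
    if match is None:
        return None
    return GENERATIONS[match.group(1)], match.group(2)


def cpu_model_names(cpuinfo_text):
    """Collect the distinct 'model name' values from /proc/cpuinfo text."""
    names = []
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        value = value.strip()
        if key.strip() == "model name" and value not in names:
            names.append(value)
    return names


# Example with a model string as it would appear in /proc/cpuinfo.
print(classify_cpu("Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz"))
```

On a real system you would pass the contents of /proc/cpuinfo to `cpu_model_names` and classify each name it returns; a result of `None` means the rule of thumb does not cover that CPU.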

Let me know if you see SATA errors and are willing to help debug this, and leave a comment if you don't see any improvement in PC states.

Awesome!

Date: 2016-04-18 04:01 am (UTC)
From: (Anonymous)
After you mentioned that these hadn't made it upstream in your last post, I was hoping you'd give them another shot. Are you planning to push these upstream? Other than the errors, Tejun's comments seemed to be positive.

just tried...

Date: 2016-04-18 10:10 am (UTC)
From: (Anonymous)
First of all, thanks for continuing to work in this direction! I've applied the patches to 4.5.1 and don't notice any difference in powertop:
- cat /sys/class/scsi_host/*/link_power_management_policy
firmware
firmware
firmware
firmware
- powertop indicates non-zero values only for pc2 and pc3.
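
The same sysfs check can be scripted. The following is a small Python sketch, with hypothetical helper names, that reads the policy of every SCSI host and lists the hosts that are not on the expected one. The `root` parameter exists only so the function can be pointed at a test directory.

```python
from pathlib import Path


def read_lpm_policies(root="/sys/class/scsi_host"):
    """Map each SCSI host name to its current link power management policy."""
    policies = {}
    pattern = "host*/link_power_management_policy"
    for policy_file in sorted(Path(root).glob(pattern)):
        policies[policy_file.parent.name] = policy_file.read_text().strip()
    return policies


def hosts_not_using(policies, expected="firmware"):
    """Return the hosts whose policy differs from the expected one."""
    return [host for host, policy in policies.items() if policy != expected]
```

Calling `hosts_not_using(read_lpm_policies())` on a patched kernel should return an empty list if every host reports "firmware", as in the output quoted above.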

Last patch might want to be more restrictive

Date: 2016-04-18 11:01 am (UTC)
From: (Anonymous)
The last patch, which changes all Intel SATA controllers to firmware mode, might want to restrict itself to mobile parts, possibly with a cut-date.

IMHO that patch is bound to cause issues on desktops and servers like my supermicro board, which are not sold as a complete system with storage. You can barely trust the firmware to keep things together... hopefully it defaults to alpm disabled, but I wouldn't trust it that much: these boards never have up-to-date firmware components, from microcode to option ROMs.

SATA ALPM is not going to be nearly as important in non-mobile parts, or on older mobile parts. There is no reason to risk data loss by a *default* configuration change there...

the other patches in the series look good.

T440p

Date: 2016-04-18 11:07 am (UTC)
From: (Anonymous)
>and ends with U or S
Is your patch appropriate for the M series of mobile Haswell processors, like the i5-4300M?
I succeeded in reaching pc7 on my T440p with an i5-4300M by booting with i915.enable_psr=1 (which is going to be enabled by default in the 4.6 kernel anyway) and setting link_power_management_policy to min_power. But the video driver is currently unstable in this configuration: https://bugs.freedesktop.org/show_bug.cgi?id=94985

Re: T440p

Date: 2016-04-18 02:48 pm (UTC)
From: (Anonymous)
Do you have both the laptop screen and an external monitor running? With that configuration I was never able to get a halfway flicker-free system, not even with an eDP/HDMI-connected monitor.
Try disabling your internal screen with arandr or xrandr.
Btw, even with no external screen connected I still get an occasional short flicker on my laptop screen (IPS) with PSR enabled, but only when idle. At least my system goes into pc7, its lowest possible state, for a highly variable 10 or 20% of the time :(

Date: 2016-04-18 02:44 pm (UTC)
From: (Anonymous)
I too have a T440p, with an i7-4600M. Too bad those patches won't do anything on mobile full-voltage CPUs. Or do they?

Date: 2016-04-18 04:06 pm (UTC)
From: (Anonymous)
i5-4210Y here. The reverse is true here, my Haswell processor is even lower voltage than the U series. Am I affected?

Regarding HKEY patch

Date: 2016-04-18 07:32 pm (UTC)
From: (Anonymous)
Hi, Matthew! Sorry for contacting you here - I cannot find any of your email addresses. So, my question is about this patch: http://www.gossamer-threads.com/lists/linux/kernel/1901769. Why was it not merged? It is essential for using most of the function keys on new Thinkpads. Here is a bug as well: https://bugzilla.kernel.org/show_bug.cgi?id=114731
Thank you!

Firmware mode doesn't help

Date: 2016-04-18 07:53 pm (UTC)
From: [personal profile] maleadt
Testing on a Thinkpad T440s (i7-4600U) the firmware-stashed settings seem unhelpful: whereas on 4.4 both min and medium_power allow entering pc7 up to 90% of the time, 4.5 patched + firmware mode puts the system back in pc2 all the time. Can't seem to spot a difference between medium_power mode on 4.4 or 4.5 patched.

Haven't seen SATA errors with the new medium_power mode, but those aren't easy to trigger (I definitely get errors once in a while using min_power, most of the time /dev/sda falling off the bus).

Date: 2016-04-18 08:01 pm (UTC)
From: (Anonymous)
Is there any indication that testing SATA link power management on Intel's desktop platforms would be of use? I have Haswell and other desktop platforms and a near-comprehensive library of SATA SSDs. I've tested their ability to enter low-power modes but haven't subjected them to any sustained use with link power management enabled, so I've never encountered those errors, but I'd be willing to try.
From: (Anonymous)
I tried your patches even though my CPU is neither U nor S. Turning tlp off had a very bad effect on my power consumption: it stayed at 23W, jumped to 33W and finally settled at 16W. CPU PC3 residency dropped to 0% and PC2 fell to 10%. Apparently the firmware did a very bad job of configuring my system for power saving.

I reconfigured tlp to use medium_power policy for SATA and started the daemon. The power consumption returned to my usual 14W. I see my usual 70% PC2 and 15% PC3 states.

I could see no change whatsoever, but I'm willing to help with debugging.
I see no SATA errors at all in my journal.
From: (Anonymous)
I have additionally checked the CPU PC states of my notebook under Windows 10. The package doesn't go below pc2 - i.e. it is worse than under linux.

Re: Thinkpad T540p i7-4700MQ didn't show any improvement

From: (Anonymous) - Date: 2016-04-19 11:42 am (UTC) - Expand

Re: Thinkpad T540p i7-4700MQ didn't show any improvement

From: (Anonymous) - Date: 2016-04-19 01:44 pm (UTC) - Expand

Re: Thinkpad T540p i7-4700MQ didn't show any improvement

From: (Anonymous) - Date: 2016-04-19 02:07 pm (UTC) - Expand

No deep C-states on 4.6-git

Date: 2016-04-19 12:32 pm (UTC)
From: (Anonymous)
Using "firmware" as the LPM policy on the 4.6-git kernel, I only get PC2 on my ThinkPad T450s (Core i5-5200U). "medium_power" or "min_power" both unlock C6 (I have never seen the package hit C7 on this hardware)

On 4.5 though, "firmware" does get C6. I personally haven't seen any SATA errors either before or now, even on min_power.

Re: No deep C-states on 4.6-git

Date: 2016-04-19 12:40 pm (UTC)
From: (Anonymous)
Might be related to this bug:
https://bugzilla.kernel.org/show_bug.cgi?id=115771#c106

Lenovo T550 - i5-5200U

Date: 2016-04-19 07:18 pm (UTC)
From: (Anonymous)
With this patchset on top of 4.6-rc4, I get no improvement in firmware mode over max_performance (nothing deeper than PC2).
Switching to min_power allows me to reach PC7.
Maybe Lenovo is very conservative on this setting in their BIOSes?

NUC results

Date: 2016-04-19 07:38 pm (UTC)
From: (Anonymous)
NUC with an i5-4250U: doesn't reach any package C-state without the patch. With the patch and the firmware policy it consistently shows 90%+ in pc2, and nothing deeper than pc2. No errors in brief testing.

Only reaching package C3

Date: 2016-04-20 07:33 pm (UTC)
From: (Anonymous)
Hello Matthew,

could you give a little advice? Haswell system here (T540p), every "experimental" power-saving knob that I know of is turned on (incl. pcie_aspm=powersave, link_power_management_policy=min_power, etc), but I'm not getting any deeper than package C3 state. Yet all cores are in C7 (as reported by powertop).

How can I detect the "offending" device?

Date: 2016-04-20 11:06 pm (UTC)
From: (Anonymous)
I am currently trying your patches on a Broadwell machine (ThinkPad T450s). I am normally running RC kernels, up until now 4.6rc2. I've compiled your 4.6rc3 patch locally, installed, and booted into it.

After booting up mjg59-4.6-rc3 my /sys/class/scsi_host/*/link_power_management_policy reads min_power. If I 'echo -n firmware | sudo tee */link_power_management_policy', it does not complain, and catting the files shows that the driver is switched.

Whereas previously I was able to enter pc6, I can now reach pc2 at most.

If I switch them back to min_power, the machine will resume going into pc6 state. This seems broken to me. I'm going to go back to my rc kernels until these issues are fixed.

Please let me know if you'd like more information, or you'd like me to pull and run additional changes.

----
bkero


no difference for me

Date: 2016-04-21 10:48 pm (UTC)
From: [personal profile] cmichal
On a dell xps13 I don't see that this makes anything any better. I could get into pc7 without the patches as long as I set the link power management to min_power. With the patches, and 'firmware' pc2 is the best it will do.

One thing that drives me a bit crazy on this system is that the actual power consumption drops significantly following a suspend/resume cycle. After resume from hibernate, the best power consumption will be ~3.5-4W, but if I then do a suspend/resume, that will drop to ~2.5W. This seems to happen with or without the patches. It looks like the firmware is tweaking some knob that powertop doesn't know about. Any ideas on how to find that knob would be appreciated.

NVMe?

Date: 2016-04-22 01:31 am (UTC)
From: (Anonymous)
By "PCI-based SATA" do you mean that the patches are also relevant to NVMe?

Broadwell X1 Carbon

Date: 2016-04-22 01:32 pm (UTC)
From: (Anonymous)
Thanks for posting this!

I've been running these patches for most of this week, and haven't seen any SATA errors in dmesg.

I've got an X1 Carbon with an i7-5600U.

I've run `powertop --auto-tune`, and when /sys/class/scsi_host/*/link_power_management_policy is set to firmware, I don't see the package drop below C2. If I set link_power_management_policy to medium_power, I do see significant time spent in C6 and (mostly) C7.

i5-6200U, no success

Date: 2016-04-25 09:37 am (UTC)
From: (Anonymous)
http://pastebin.com/aM6SgBNU

I'm using LG Gram 14 with Linux 4.5.2.

Using "min_power" allowed me to enter pc3, but after the patch and echoing "firmware", it goes no higher than pc2.

:(

Lenovo X250 i7-5600U, Samsung 850 Evo

Date: 2016-04-25 11:34 am (UTC)
From: (Anonymous)
System didn't go to anything beyond C2 with "firmware", goes down to C6 with "min_power". Note that the SSD is not original. Fingerprint and Smartcard readers are disabled via firmware, Bluetooth is off and USB autosuspends enabled etc., I haven't tweaked the SD card reader.

Kernel Version Linux version 4.5.2-custom+
System Name LENOVO20CLS06D00ThinkPad X250
CPU Information 4 Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz

Thanks for your effort!

Nitpick

Date: 2016-04-27 02:58 pm (UTC)
From: (Anonymous)
Your rule of thumb regarding the first digit of the four-digit number applies to Core i3/i5/i7 and Core M/m3/m5/m7, but only to some Xeon processors and to no Pentium or Celeron processors.

http://ark.intel.com/products/codename/42174/Haswell
http://ark.intel.com/products/codename/38530/Broadwell
http://ark.intel.com/products/codename/37572/Skylake

Thank you for all your work!

PC6

Date: 2016-04-29 08:45 pm (UTC)
From: (Anonymous)
Device: Dell Inc. XPS 13 9343/0TM99H, BIOS A07 11/11/2015

I patched your changes into arch linux's linux-ck version package 4.5.2.

With /sys/class/scsi_host/*/link_power_management_policy = firmware I get down to PC6.
With /sys/class/scsi_host/*/link_power_management_policy = min_power I get down to PC7

nothing on t460s, i5-6200U

Date: 2016-05-30 08:36 pm (UTC)
From: [personal profile] maaax
I tested your 5 commits applied to current 4.7.0-rc1+ but regardless of the policy only pc2 is possible. The good: no sata-errors.

Tell me if I can test anything or if you need more info.
Edited Date: 2016-05-30 08:59 pm (UTC)

Re: nothing on t460s, i5-6200U

Date: 2016-08-29 04:32 pm (UTC)
From: [personal profile] maaax
After upgrading the IME firmware, pc7 is now possible on my t460s

Macbook Pro continuous errors

Date: 2016-09-21 02:06 am (UTC)
From: [personal profile] lkcl
hiya matthew,

i'm tied to using 3.16 at the moment due to having a rootfs partition that's overgrown to the point where literally nothing will fit (certainly not another linux kernel) - i have to do a total OS wipe and reformat which i'm putting off.

macbook pros are known for having continuous (once per second) errors like this:

[915140.664521] ata1: SError: { PHYRdyChg }
[915140.664524] ata1: hard resetting link
[915141.387232] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[915141.387657] ata1.00: unexpected _GTF length (8)
[915141.388159] ata1.00: unexpected _GTF length (8)
[915141.388203] ata1.00: configured for UDMA/33
[915141.388275] ata1: EH complete
[915141.488383] ata1: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen
[915141.488388] ata1: irq_stat 0x00400000, PHY RDY changed
[915141.488390] ata1: SError: { PHYRdyChg }
[915141.488393] ata1: hard resetting link
[915142.211871] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[915142.212285] ata1.00: unexpected _GTF length (8)
[915142.212786] ata1.00: unexpected _GTF length (8)
[915142.212831] ata1.00: configured for UDMA/33
[915142.212897] ata1: EH complete
[915142.313041] ata1: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen
[915142.313044] ata1: irq_stat 0x00400000, PHY RDY changed
[915142.313045] ata1: SError: { PHYRdyChg }
[915142.313048] ata1: hard resetting link
[915143.036521] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[915143.036921] ata1.00: unexpected _GTF length (8)
[915143.037424] ata1.00: unexpected _GTF length (8)
[915143.037507] ata1.00: configured for UDMA/33
[915143.037570] ata1: EH complete
[915250.985136] ata1.00: exception Emask 0x10 SAct 0x1e000000 SErr 0x4040000 action 0xe frozen
[915250.985140] ata1.00: irq_stat 0x80000040, connection status changed
[915250.985142] ata1: SError: { CommWake DevExch }
[915250.985145] ata1.00: failed command: WRITE FPDMA QUEUED
[915250.985150] ata1.00: cmd 61/00:c8:d0:ff:53/04:00:0b:00:00/40 tag 25 ncq 524288 out
res 40/00:c4:70:6c:52/00:00:07:00:00/40 Emask 0x10 (ATA bus error)
[915250.985152] ata1.00: status: { DRDY }
[915250.985154] ata1.00: failed command: WRITE FPDMA QUEUED
[915250.985157] ata1.00: cmd 61/00:d0:d0:03:54/04:00:0b:00:00/40 tag 26 ncq 524288 out
res 40/00:c4:70:6c:52/00:00:07:00:00/40 Emask 0x10 (ATA bus error)
[915250.985159] ata1.00: status: { DRDY }
[915250.985161] ata1.00: failed command: WRITE FPDMA QUEUED
[915250.985164] ata1.00: cmd 61/00:d8:d0:07:54/04:00:0b:00:00/40 tag 27 ncq 524288 out
res 40/00:c4:70:6c:52/00:00:07:00:00/40 Emask 0x10 (ATA bus error)
[915250.985166] ata1.00: status: { DRDY }
[915250.985167] ata1.00: failed command: WRITE FPDMA QUEUED
[915250.985170] ata1.00: cmd 61/e8:e0:d0:0b:54/03:00:0b:00:00/40 tag 28 ncq 512000 out
res 40/00:c4:70:6c:52/00:00:07:00:00/40 Emask 0x10 (ATA bus error)
[915250.985171] ata1.00: status: { DRDY }
[915250.985175] ata1: hard resetting link
[915251.709502] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

pretty soon /var/log/syslog is a couple of gigabytes in length.
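
When the log grows this large, a summary is easier to work with than the raw text. Here is a minimal Python sketch that counts link resets and failed commands per ATA port; the regular expressions are assumptions based only on the message formats shown in the excerpt above, and other kernel versions may word these lines differently.

```python
import re
from collections import Counter

# Patterns modelled on the excerpt above, e.g.
#   "ata1: hard resetting link"
#   "ata1.00: failed command: WRITE FPDMA QUEUED"
RESET_RE = re.compile(r"\b(ata\d+): hard resetting link")
FAILED_RE = re.compile(r"\b(ata\d+)\.\d+: failed command: (.+)$")


def summarize_sata_errors(log_text):
    """Return (resets per port, failed commands per (port, command))."""
    resets = Counter()
    failed = Counter()
    for line in log_text.splitlines():
        match = RESET_RE.search(line)
        if match:
            resets[match.group(1)] += 1
            continue
        match = FAILED_RE.search(line)
        if match:
            failed[(match.group(1), match.group(2).strip())] += 1
    return resets, failed
```

Feeding it the output of dmesg (or the syslog file) before and after a policy change gives a quick way to see whether the reset rate actually changed.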

this is "solved" by running:

echo min_power > /sys/class/scsi_host/host0/link_power_management_policy


*however*....

the problem is that any kind of suspend/resume, or even removal of the power cord, or insertion of the power cord, or removal of a USB device, or insertion of a USB device, or basically *anything* event-driven... results in the policy being changed without authorisation.

what i've had to do is to add a cron script which simply sets that policy to min_power literally every minute.
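
The cron script itself is not shown, so the following Python sketch is only a guess at an equivalent. It rewrites the policy for every host that has drifted and reports which ones it changed. The helper name is hypothetical, the list of policy names is just the set mentioned in this thread ("firmware" exists only with the patches applied), and writing to sysfs requires root.

```python
from pathlib import Path

# Policy names that appear in this thread; not an authoritative kernel list.
KNOWN_POLICIES = {"min_power", "medium_power", "max_performance", "firmware"}


def set_lpm_policy(policy, root="/sys/class/scsi_host"):
    """Write `policy` to every host that differs; return the hosts changed."""
    if policy not in KNOWN_POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    changed = []
    pattern = "host*/link_power_management_policy"
    for policy_file in sorted(Path(root).glob(pattern)):
        if policy_file.read_text().strip() != policy:
            # Trailing newline mirrors what `echo min_power > file` writes.
            policy_file.write_text(policy + "\n")
            changed.append(policy_file.parent.name)
    return changed
```

Logging the returned list on each run would also show how often something else is resetting the policy, which may help track down what is changing it.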

it's a fairly ridiculous situation.

Profile

Matthew Garrett

About Matthew

Power management, mobile and firmware developer on Linux. Security developer at Google. Ex-biologist. @mjg59 on Twitter. Content here should not be interpreted as the opinion of my employer.
