Bug #12218

AMD graphics regression since Tails 2.10

Added by intrigeri over 2 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: Hardware support
Target version: -
Start date: 02/09/2017
Due date:
% Done: 100%
QA Check:
Feature Branch:
Type of work: Research
Blueprint:
Starter:
Affected tool:

Description

This was reported against Tails 2.10 on #11850. According to https://bugs.debian.org/838858 and its duplicates, this should be fixed in firmware-amd-graphics 20161130-1. IMO we should upgrade to the latest version of that package in Tails 2.11.


Related issues

Related to Tails - Bug #11850: Only software rendering (no hardware acceleration) on some AMD GPUs since 2.4 Resolved 09/28/2016
Blocked by Tails - Feature #12122: Upgrade Linux to 4.9 Resolved 12/27/2016
Blocked by Tails - Bug #12298: Can't build the devel branch due to virtualbox-guest-dkms incompatibility with Linux 4.9 Resolved 03/05/2017

Associated revisions

Revision 2b1a9a33 (diff)
Added by intrigeri about 2 years ago

Include the amdgpu module in the initramfs (refs: #12218).

This might help ensure this driver is used, instead of the radeon module,
on hardware that it supports better than the radeon one.

Revision dffc0b2c (diff)
Added by intrigeri about 2 years ago

Document problem and workaround wrt. AMD Radeon R9 390 (Closes: #12218).

History

#1 Updated by intrigeri over 2 years ago

  • Related to Bug #11850: Only software rendering (no hardware acceleration) on some AMD GPUs since 2.4 added

#2 Updated by intrigeri over 2 years ago

  • Status changed from Confirmed to In Progress
  • % Done changed from 0 to 10
  • Feature Branch set to bugfix/12218-upgrade-amd-graphics-firmware

... and actually, I'll upgrade all packages built from src:firmware-nonfree: it has been in Stretch for 1.5 months, fixes a number of hardware compatibility problems and has not introduced any RC bug. Also, our import-package script works this way.

#3 Updated by intrigeri over 2 years ago

  • Assignee changed from intrigeri to anonym
  • % Done changed from 10 to 50
  • QA Check set to Ready for QA

anonym: please review'n'merge.

Help desk: please point people who experience this problem to the corresponding nightly built ISO, and report back here.

Resulting diff between .packages (stable vs. this branch):

 file    1:5.22+15-2+deb8u3
 file-roller    3.14.1-1
 findutils    4.4.2-9+b1
-firmware-amd-graphics    20160824-1~bpo8+1
-firmware-atheros    20160824-1~bpo8+1
+firmware-amd-graphics    20161130-2
+firmware-atheros    20161130-2
 firmware-b43-installer    1:019-3
 firmware-b43legacy-installer    1:019-3
-firmware-brcm80211    20160824-1~bpo8+1
-firmware-intel-sound    20160824-1~bpo8+1
-firmware-ipw2x00    20160824-1~bpo8+1
-firmware-iwlwifi    20160824-1~bpo8+1
-firmware-libertas    20160824-1~bpo8+1
-firmware-linux    20160824-1~bpo8+1
+firmware-brcm80211    20161130-2
+firmware-intel-sound    20161130-2
+firmware-ipw2x00    20161130-2
+firmware-iwlwifi    20161130-2
+firmware-libertas    20161130-2
+firmware-linux    20161130-2
 firmware-linux-free    3.4
-firmware-linux-nonfree    20160824-1~bpo8+1
-firmware-misc-nonfree    20160824-1~bpo8+1
-firmware-realtek    20160824-1~bpo8+1
-firmware-ti-connectivity    20160824-1~bpo8+1
+firmware-linux-nonfree    20161130-2
+firmware-misc-nonfree    20161130-2
+firmware-realtek    20161130-2
+firmware-ti-connectivity    20161130-2
 firmware-zd1211    1:1.5-4
 florence    0.6.2-2
 fontconfig    2.11.0-6.3+deb8u1

... as expected.
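For reference, a comparison like the one above can be reproduced mechanically. Below is a minimal Python sketch, assuming a .packages file is plain text with one `name version` pair per line (as the excerpt suggests). `parse_packages` and `changed_packages` are hypothetical helpers, not part of the Tails build scripts.

```python
def parse_packages(text):
    """Parse a .packages-style list: one 'name version' pair per line (assumed format)."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            result[fields[0]] = fields[1]
    return result


def changed_packages(old_text, new_text):
    """Return {name: (old_version, new_version)} for every package whose version differs.

    A package missing on one side shows up with None as its version there.
    """
    old, new = parse_packages(old_text), parse_packages(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }


stable = """firmware-amd-graphics    20160824-1~bpo8+1
firmware-linux-free    3.4
"""
branch = """firmware-amd-graphics    20161130-2
firmware-linux-free    3.4
"""

if __name__ == "__main__":
    for name, (before, after) in changed_packages(stable, branch).items():
        print(f"{name}: {before} -> {after}")
```

This only reports that a version changed; it deliberately does not try to order Debian version strings.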

#4 Updated by Wurd over 2 years ago

Thanks!

Is this the correct ISO to test? https://nightly.tails.boum.org/build_Tails_ISO_bugfix-12218-upgrade-amd-graphics-firmware/lastStable/archive/latest.iso

I have tested that, and it doesn't work. It freezes before I can do anything, directly after the mouse cursor appears for the first time. Tested a few times, but all I see is a black cursor on a grey rectangle with no text, while the background is blue.

#5 Updated by intrigeri over 2 years ago

> Is this the correct ISO to test? https://nightly.tails.boum.org/build_Tails_ISO_bugfix-12218-upgrade-amd-graphics-firmware/lastStable/archive/latest.iso

Yes, assuming you downloaded it in the last 4 days. Generally, please prefer the ISO image from the build-artifacts sub-directory (https://nightly.tails.boum.org/build_Tails_ISO_bugfix-12218-upgrade-amd-graphics-firmware/lastStable/archive/build-artifacts/) that has a more useful filename :)

> I have tested that, and it doesn't work. It freezes before I can do anything, directly after the mouse cursor appears for the first time. Tested a few times, but all I see is a black cursor on a grey rectangle with no text, while the background is blue.

Is that an improvement or a regression compared to how Tails 2.10 behaves?

What's the exact graphics hardware?

#6 Updated by intrigeri over 2 years ago

  • QA Check changed from Ready for QA to Info Needed

#7 Updated by Wurd over 2 years ago

Yes, I downloaded it roughly 1 hour before I commented here, so within the last 4 days.

Tails 2.10 "worked", meaning it didn't freeze. It runs with software rendering, so it runs at 1 fps and is quite unusable, but it does not freeze, so in theory you could use it.

With this ISO it's not even possible to get to the desktop, because it just freezes. It does not lag: before any UI can be displayed, it freezes and a hard reset is needed. I would not consider it an improvement :)

My GPU is an AMD R9 390 with 8 GB of VRAM.

#8 Updated by intrigeri about 2 years ago

  • Assignee changed from anonym to intrigeri
  • Target version changed from Tails_2.11 to Tails_2.12
  • % Done changed from 50 to 10
  • QA Check deleted (Info Needed)
  • Feature Branch deleted (bugfix/12218-upgrade-amd-graphics-firmware)

I got no data from our help desk, and so far the only test report about the proposed branch tells me it's actually introducing a regression, so I don't think it would be wise to merge it in a point release.

In Tails 2.12 we will upgrade at least three parts of the graphics stack:

  • the firmware this ticket is about (our pinning pulls them from jessie-backports, which now has 20161130-2~bpo8+1)
  • Linux 4.9 (#12122)
  • mesa 13.0.4-1~bpo8+1

So I'll come back to it after the 2.11 release, once #12122 is done, and will ping everyone affected + help desk to try and gather some data wrt. how the upgraded graphics stack behaves.

#9 Updated by intrigeri about 2 years ago

#10 Updated by intrigeri about 2 years ago

  • Blocked by Bug #12298: Can't build the devel branch due to virtualbox-guest-dkms incompatibility with Linux 4.9 added

#11 Updated by intrigeri about 2 years ago

  • % Done changed from 10 to 20

Anyone affected, please try an ISO from https://nightly.tails.boum.org/build_Tails_ISO_devel/lastSuccessful/archive/build-artifacts/ and report back how it behaves compared to 2.11 and the last 3.0~beta.

Help desk: please relay the above to whoever is affected. Thanks!

#12 Updated by intrigeri about 2 years ago

> Help desk: please relay the above to whoever is affected. Thanks!

... and it would be excellent if one of you took the lead on this info gathering, and assigned this ticket to themselves (at this point it makes little sense to leave it assigned to me).

#13 Updated by Kaffka2 about 2 years ago

Tested with 'tails-i386-devel-2.12-20170308T1018Z-9f9da8a.iso'; it seems to be working for my 'ATI Radeon R9 390 (Fury)' card: the correct driver is loaded, and the desktop as well as video decoding are running smoothly.

Scrolling in Firefox seems to be a bit 'sluggish', but I think I had the same minor gripe 'back in the days' with Ubuntu too, so this is probably unrelated, although Tor/Firefox runs fine on my main system (Arch Linux) nowadays.

TL;DR:
All in all I'd say it's fixed. ;)

#14 Updated by intrigeri about 2 years ago

Kaffka2 wrote:

> Tested with 'tails-i386-devel-2.12-20170308T1018Z-9f9da8a.iso'; it seems to be working for my 'ATI Radeon R9 390 (Fury)' card: the correct driver is loaded, and the desktop as well as video decoding are running smoothly.

Great news, thanks!

I'll wait for more test results from users / help desk before I call this fixed in 2.12.

#15 Updated by Wurd about 2 years ago

I have tested both 2.12 and 3.0.

With tails-i386-devel-2.12-20170311T0923Z-969031a.iso, the issue is exactly the same as with 2.11. Tails starts until you see the mouse cursor and the blue background, I can move the cursor for ~0.5 seconds, and then everything freezes and I have to hard reset the PC. So no improvement, unfortunately.

With tails-amd64-3.0-beta2.iso, the first time I tried, I was able to boot into Tails, log in and use the desktop. It was all smooth, so it used the GPU correctly. The issue seemed to be fixed.
Then I restarted Tails, and it no longer worked. It boots until the mouse cursor is visible (on a black background), then I can move the mouse cursor for a few seconds, and then Tails seems to shut itself down. The monitors all turn off, so I think Tails completely stops doing anything, and I have to hard reset the PC in this case too. I then tried 8 or 9 more times, including turning the power off completely for a while and waiting before launching Tails again, but every time I only got as far as the mouse cursor; then Tails shut down and I had to hard reset the PC again.

So for some reason it worked once, and then it stopped working.

I am using an AMD R9 390, so the same as Kaffka2. Except that Kaffka2 wrote "R9 390 (Fury)", and no such card exists. Either he has an "AMD R9 390" (same as mine) or an "AMD R9 Fury"; those are different GPUs.

#16 Updated by Kaffka2 about 2 years ago

I'm terribly sorry, for some reason I had 'R9 390 Fury' stuck in my head. Wurd is right: it is actually the 'ATI Radeon R9 Fury' (this one: https://www.amazon.de/dp/B019DMB3QK?m=A3JWKAKR8XB7XF&tag=idealode-am-pk-21&ascsubtag=uvZ49X_5DIYVuzvvptKW4A).

I also tested with 'tails-amd64-3.0-beta2.iso' over the weekend, which was also working for my R9 Fury.

#17 Updated by Wurd about 2 years ago

I have also tested tails-amd64-3.0-beta3.iso now, no difference to beta2.

#18 Updated by Wurd about 2 years ago

Is there anything else I can test to help you fix this issue, like a way to get any info about why Tails freezes? Are there logfiles somewhere that can be accessed later (after a restart)?

#19 Updated by intrigeri about 2 years ago

I'll first sum up what we know already. Compared to Tails 2.11, the current devel branch upgrades:

  • firmware-amd-graphics to 20161130-2~bpo8+1
  • Linux to 4.9.13
  • mesa to 13.0.5-1~bpo8+1
  • xserver-xorg-video-amdgpu to 1.2.0-1~bpo8+1

This was already the case when Wurd and Kaffka2 tested and reported back above:

  • R9 Fury (Kaffka2): works
  • R9 390 (Wurd): "the issue is exactly the same as with 2.11. Tails starts until you see the mouse cursor and the blue background, I can move the cursor for ~0.5 seconds, and then everything freezes and I have to hard reset the PC"

And on 3.0~beta2 and 3.0~beta3, the R9 390 worked once, but subsequent boots exposed symptoms similar to those of the current devel branch.

#20 Updated by intrigeri about 2 years ago

  • Subject changed from AMD graphics regression caused by wrong/missing firmware to AMD graphics regression since Tails 2.10

#21 Updated by intrigeri about 2 years ago

Possible future improvements:

  • mesa 13.0.6 (in testing/sid, not in jessie-backports) has some bugfixes that might be related; this can be tested with a nightly build of feature/stretch-unfrozen
  • Linux 4.9.18: currently in Debian unstable; if the above doesn't work, I'll prepare an ISO with this update
  • libdrm 2.4.76, Linux 4.10.x, mesa 17.0.2: currently in Debian experimental; if none of the above works, I'll prepare an ISO with these updates

Other things worth testing:

  • add debug to the kernel command line, hoping it will produce more useful error messages

Now, there are two other potential issues:

#22 Updated by intrigeri about 2 years ago

intrigeri wrote:

> Possible future improvements:

mesa 13.0.6-1~bpo8+1 was uploaded to jessie-backports today (at my request). It's stuck in https://ftp-master.debian.org/backports-new.html for now and should be accepted within a week, hopefully before the Tails 2.12 freeze.

#23 Updated by Wurd about 2 years ago

Thanks!

> mesa 13.0.6 (in testing/sid, not in jessie-backports) has some bugfixes that might be related; this can be tested with a nightly build of feature/stretch-unfrozen

I have tested that; it behaves like the Tails 3.0 beta. It starts until the mouse cursor is visible, then all monitors turn off and it looks like Tails is completely dead.

> add debug to the kernel command line, hoping it will produce more useful error messages

I'm not really seeing anything different after adding "debug" there: it shows some text for a second before the 3-dot animation starts, but after that everything is the same.

> these 2 ISOs with amdgpu in the initramfs:
> https://nightly.tails.boum.org/build_Tails_ISO_feature-stretch-unfrozen/lastSuccessful/archive/build-artifacts/
> https://nightly.tails.boum.org/build_Tails_ISO_feature-stretch/lastSuccessful/archive/build-artifacts/

The first ISO you linked there is exactly the same ISO as the one you linked above (the mesa 13.0.6 one).

I have tested the second one, and it has the exact same issue, no difference there.

> adding xorg-driver=amdgpu radeon.blacklist=yes on the kernel command line
> adding xorg-driver=radeon amdgpu.blacklist=yes on the kernel command line

I have not tested that yet, because I'm not sure: with which ISO should I test it?

There are quite a few different ISOs I could test it with.

In the 4.4 to 4.10 isos, the issue is that all the graphics run on the CPU only, so everything is laggy, but it runs and doesn't crash.

In the 4.11 isos the problem is that once the mouse cursor is visible, it freezes and needs to be reset.

And in the 3.0 beta ISOs and all the new ISOs you linked 2 days ago, Tails seems to completely turn off after the mouse cursor is visible, so it doesn't freeze, but the monitors don't get any signal any more.

Which of these issues are those command-line options supposed to fix, i.e. with which ISO should I test them?

#24 Updated by intrigeri about 2 years ago

Hi,

Wurd:

>> mesa 13.0.6 (in testing/sid, not in jessie-backports) has some bugfixes that might be related; this can be tested with a nightly build of feature/stretch-unfrozen

> I have tested that; it behaves like the Tails 3.0 beta. It starts until the mouse cursor is visible, then all monitors turn off and it looks like Tails is completely dead.

OK, thanks. Too bad.

>> these 2 ISOs with amdgpu in the initramfs:
>> https://nightly.tails.boum.org/build_Tails_ISO_feature-stretch-unfrozen/lastSuccessful/archive/build-artifacts/
>> https://nightly.tails.boum.org/build_Tails_ISO_feature-stretch/lastSuccessful/archive/build-artifacts/

> The first ISO you linked there is exactly the same ISO as the one you linked above (the mesa 13.0.6 one).

Right.

> I have tested the second one, and it has the exact same issue, no difference there.

OK. I wasn't too hopeful, but well…

>> adding xorg-driver=amdgpu radeon.blacklist=yes on the kernel command line
>> adding xorg-driver=radeon amdgpu.blacklist=yes on the kernel command line

> I have not tested that yet, because I'm not sure: with which ISO should I test it?

https://nightly.tails.boum.org/build_Tails_ISO_feature-stretch-unfrozen/lastSuccessful/archive/build-artifacts/

> In the 4.4 to 4.10 isos, the issue is that all the graphics run on the CPU only, so everything is laggy, but it runs and doesn't crash.

Assuming you mean "2.4 to 2.10": so, it seems the problem you see is different from what this ticket was initially about (that one was about regressions since 2.10).

> In the 4.11 isos the problem is that once the mouse cursor is visible, it freezes and needs to be reset.

Assuming you mean "2.11": the only relevant change between Tails 2.10 and 2.11 is the upgrade of the Linux kernel from 4.8.11-1~bpo8+1 to 4.8.15-2~bpo8+2. Interesting!

The only relevant changes between 4.8.11 and 4.8.15 I can find in the changelog are:

- [x86] drm/amdgpu: fix power state when port pm is unavailable
- drm/radeon: fix power state when port pm is unavailable
- [x86] drm/amdgpu: fix check for port PM availability
- drm/radeon: fix check for port PM availability

So perhaps it would be worth testing different values for:

  • radeon.dpm, radeon.aspm, radeon.runpm and radeon.bapm (with xorg-driver=radeon)
  • amdgpu.dpm, amdgpu.aspm, amdgpu.runpm and amdgpu.bapm (with xorg-driver=amdgpu)
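Since each boot only tests one combination, it may help to enumerate the candidates up front. A minimal Python sketch, assuming each of these module parameters accepts -1 (auto), 0 (disable) and 1 (enable); that assumption should be checked with `modinfo radeon` / `modinfo amdgpu` on the kernel being tested. `candidate_cmdlines` is a hypothetical helper. It varies one parameter at a time rather than trying the full cross product, which would be 3^4 = 81 boots per driver.

```python
# Hypothetical helper: list the extra boot options to try, one parameter at a time.
# Assumption: every parameter accepts -1 (auto), 0 (disable) and 1 (enable).
PARAMS = ("dpm", "aspm", "runpm", "bapm")
VALUES = (-1, 0, 1)


def candidate_cmdlines(driver):
    """Yield extra kernel command-line strings for driver "radeon" or "amdgpu"."""
    for param in PARAMS:
        for value in VALUES:
            yield f"xorg-driver={driver} {driver}.{param}={value}"


if __name__ == "__main__":
    for options in candidate_cmdlines("radeon"):
        print(options)
```

Each printed line would be appended to the boot options for one test boot, noting whether the desktop comes up.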

> And in the 3.0 beta ISOs and all the new ISOs you linked 2 days ago, Tails seems to completely turn off after the mouse cursor is visible, so it doesn't freeze, but the monitors don't get any signal any more.

> Which of these issues are those command-line options supposed to fix, i.e. with which ISO should I test them?

Let's now focus on the 3.0~betaN ones, since 2.12 (which will be frozen in a few days) will likely be the last release in the 2.x series.

It would be interesting to see if you can reproduce this on regular (non-Tails) Debian Stretch. This would make it vastly easier to report bugs upstream and have them fixed there :)

#25 Updated by intrigeri about 2 years ago

Also, if you can, please try plugging your monitor(s) into different ports.

#26 Updated by Wurd about 2 years ago

It probably makes a lot of sense to test regular Debian Stretch, but I wasn't able to find any live ISOs of Debian Stretch; everything I found has to be installed. Could you link me to some Debian Stretch ISO that I can test without having to install it?

#27 Updated by intrigeri about 2 years ago

> It probably makes a lot of sense to test regular Debian Stretch, but I wasn't able to find any live ISOs of Debian Stretch; everything I found has to be installed. Could you link me to some Debian Stretch ISO that I can test without having to install it?

There are none AFAIK, sorry :/

#28 Updated by intrigeri about 2 years ago

  • Target version changed from Tails_2.12 to Tails_3.0

Wurd: sorry if I was unclear, but even if you can't test regular Debian Stretch, I'm still waiting for your results wrt. the other tests I've requested.

#29 Updated by intrigeri about 2 years ago

  • Assignee deleted (intrigeri)
  • Priority changed from Elevated to Normal
  • Target version deleted (Tails_3.0)

There has been no word from help desk about this kind of issue for months, so I'll assume that the affected hardware is rather rare. Adjusting priority and target version accordingly. Dear help desk: please let the Foundations Team know if you have more or different info.

I did my homework and the maintainer for mesa in Debian was kind enough to have a look. Here's a slightly edited version of his summary:

  • The R9 Fury (aka Fiji, member of the Volcanic Islands family, GCN 1.2), which is only supported by the amdgpu stack, seems to be fixed.
  • The R9 390 (also known as Hawaii, member of the Sea Islands (CIK) family, GCN 1.1 [1][2]) is only supported by the radeon stack by default, so enabling amdgpu can't break support for this hardware, as long as CONFIG_DRM_AMDGPU_SI=Y and CONFIG_DRM_AMDGPU_CIK=Y are not set (otherwise the card would be supported by both drivers, and radeon would have to be blacklisted).
  • The remaining issue for the R9 390 sounds like a GPU hang, which could be related to the kernel, firmware, mesa, llvm or some other package such as xserver-xorg-core, since it's using glamor for 2D acceleration via OpenGL (3D). Generally, it's hard to say more without seeing some logs (dmesg and Xorg.log).

[1] https://wiki.freedesktop.org/xorg/RadeonFeature/
[2] https://en.wikipedia.org/wiki/AMD_Radeon_Rx_300_series

So I think that forcing the amdgpu driver (xorg-driver=amdgpu) and its various options won't be useful as the corresponding support for the R9 390 is not in our kernel anyway.
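Whether a given kernel was built with that support can be checked from its build configuration. A minimal Python sketch, assuming the configuration is installed at /boot/config-<release> as on a typical Debian system (not verified for Tails specifically); `amdgpu_legacy_support` is a hypothetical helper.

```python
import platform
from pathlib import Path

OPTIONS = ("CONFIG_DRM_AMDGPU_SI", "CONFIG_DRM_AMDGPU_CIK")


def parse_kernel_config(text):
    """Map option name -> value for lines like 'CONFIG_FOO=y'.

    Options written as '# CONFIG_FOO is not set' are simply absent from the result.
    """
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            values[name] = value
    return values


def amdgpu_legacy_support(config_text):
    """Report, for each SI/CIK amdgpu option, whether it is built in (set to y)."""
    values = parse_kernel_config(config_text)
    return {option: values.get(option) == "y" for option in OPTIONS}


if __name__ == "__main__":
    # Assumed Debian convention: the running kernel's config is /boot/config-$(uname -r).
    config_path = Path("/boot") / f"config-{platform.release()}"
    if config_path.exists():
        print(amdgpu_legacy_support(config_path.read_text()))
    else:
        print(f"{config_path} not found")
```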

But testing different values for radeon.dpm, radeon.aspm, radeon.runpm and radeon.bapm (with xorg-driver=radeon) can still be useful.

And ideally, affected people would redirect the Journal to another machine on the LAN, so that we get logs of what's going on. Otherwise I feel we're fiddling in the dark and won't ever be able to request proper help from those who could actually fix the problem. So I'm unassigning this from myself for now: I've already spent more than 3 hours on it, and really have no clue how I can help more without logs.

#30 Updated by intrigeri about 2 years ago

I forgot: in case only the GUI crashes, but not the entire system, you can gather logs this way. Set a root password on the kernel command line with rootpw=YOUR_PASSWORD. Do this only for testing purposes, as it's unsafe: all processes will be able to learn that password. This should allow you to log in as root via a text console (CTRL+ALT+F3) and save the output of journalctl -a to another USB stick, so you can share it here.
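Once the log is saved (for example by redirecting the output of journalctl -a to a file on the other USB stick), the interesting lines still have to be found among thousands. A minimal Python sketch that filters a saved log for lines mentioning the graphics stack; the keyword list is an assumption, not an authoritative catalogue of kernel messages, and `gpu_lines` is a hypothetical helper.

```python
import re
import sys

# Assumed keywords: the two driver names, DRM core messages ("[drm]") and
# wording commonly seen around GPU hangs. Illustrative, not exhaustive.
GPU_PATTERN = re.compile(r"radeon|amdgpu|\bdrm\b|gpu lockup|gpu hang", re.IGNORECASE)


def gpu_lines(lines):
    """Return the lines that mention the graphics stack, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if GPU_PATTERN.search(line)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names): python3 gpu_lines.py journal.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for line in gpu_lines(log):
            print(line)
```

The full log is still what should be attached here; the filter only helps spot where to look.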

#31 Updated by intrigeri about 2 years ago

  • Type of work changed from Code to Research

#32 Updated by intrigeri about 2 years ago

Another way to gather logs: try the latest Fedora release live CD (or, even better, a Fedora Rawhide live CD if there's any such thing), gather logs, and report this upstream so they have all the info they need to improve things or suggest a workaround.

#33 Updated by Wurd about 2 years ago

Thanks very much, intrigeri!

So, I tested Fedora Rawhide: same issue as in Tails. I tested Fedora 25 too; it also doesn't work.
I tested Debian 8.7.1, that works perfectly without any issues.

Then I went to Tails 3.0 beta 3 and tested with the arguments you mentioned. I first tested with "xorg-driver=amdgpu" just to see what happens, and it just stops booting after a while; it doesn't crash, though.

I continued with "xorg-driver=radeon radeon.dpm=1": the same issue as always, so no difference.
Then I tested "xorg-driver=radeon radeon.dpm=0", and that booted to the Tails greeter without any issues. I logged in, and all graphics were smooth without any issues. I even started a 1080p YouTube video, and that played fine. All monitors worked fine too.
I restarted Tails and did the same again: it still worked. I shut my PC down, turned the power off, turned it on again and did the same again: it still worked without any issue.

Then I tested it without the command again, and it failed with the usual issue.

Then I was interested in how Tails 2.12 RC1 would work with "xorg-driver=radeon radeon.dpm=0", so I tested it. It also boots fine and has smooth graphics on the desktop. I tried the same YouTube video again, and this time it looked laggy, probably something like 10 fps.

So 2.12 is not as good as 3.0, but it works and it's smooth :) 3.0 is perfect.

So I'm happy now. It would be good if Tails set that argument by default when it detects a Hawaii GPU. Entering it manually is not an issue if you know that you have to do it, but most affected users will likely not find this website, so they won't know about that fix.

So again, thanks very much for your effort! :) If I should test anything else, just tell me.

#34 Updated by intrigeri about 2 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 20 to 100

#35 Updated by intrigeri about 2 years ago

> So, I tested Fedora Rawhide: same issue as in Tails. I tested Fedora 25 too; it also doesn't work.

Very useful: at least we know that even the latest version of relevant software (mostly the Linux kernel I guess) is affected.

> Then I tested "xorg-driver=radeon radeon.dpm=0", and that booted to the Tails greeter without any issues. I logged in, and all graphics were smooth without any issues. I even started a 1080p YouTube video, and that played fine. All monitors worked fine too.
> [...]
> Then I was interested in how Tails 2.12 RC1 would work with "xorg-driver=radeon radeon.dpm=0", so I tested it. It also boots fine and has smooth graphics on the desktop.

Great news! After confirming this via web searches:

I've just documented this on our Known Issues page. So case closed as far as Tails is concerned.

… but I was not able to find any proper upstream bug report about it.
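To double-check after boot that the documented workaround options actually reached the kernel command line, /proc/cmdline can be inspected. A minimal Python sketch; `workaround_active` is a hypothetical helper, and it only checks that the options were passed, not that they fixed anything.

```python
from pathlib import Path


def cmdline_params(cmdline):
    """Split a kernel command line into {key: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def workaround_active(cmdline):
    """True if both options of the documented R9 390 workaround are present."""
    params = cmdline_params(cmdline)
    return params.get("xorg-driver") == "radeon" and params.get("radeon.dpm") == "0"


if __name__ == "__main__":
    cmdline_file = Path("/proc/cmdline")
    if cmdline_file.exists():
        print(workaround_active(cmdline_file.read_text()))
```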

> So I'm happy now. It would be good if Tails set that argument by default when it detects a Hawaii GPU.

We're not in the business of applying Tails-specific quirks that require patching the Linux kernel, sorry :/
So the next step for anyone who wants to see this fixed where it should be: please gather enough info and report the bug to the Linux kernel upstream.

Thanks for all your testing, glad we found a workaround :)

#36 Updated by intrigeri about 2 years ago

intrigeri wrote:

> So the next step for anyone who wants to see this fixed where it should be: please gather enough info and report the bug to the Linux kernel upstream.

… although perhaps https://www.x.org/wiki/RadeonFeature/#index11h2 is a better place to start than the Linux kernel.

Also, this might be magically fixed once https://bugs.debian.org/847570 is resolved and this card is supported by the amdgpu driver instead of the radeon one, who knows.

#37 Updated by intrigeri over 1 year ago

intrigeri wrote:

> Also, this might be magically fixed once https://bugs.debian.org/847570 is resolved and this card is supported by the amdgpu driver instead of the radeon one, who knows.

This was fixed in the last upload of Linux in Debian. We should have it in Tails 3.4.
