Bug #16224

Black screen after the boot menu with Intel GPU (i915)

Added by goupille 2 months ago. Updated 21 days ago.

Status: Resolved
Priority: Elevated
Assignee: -
Category: Hardware support
Target version:
Start date: 12/13/2018
Due date:
% Done: 100%
QA Check: Pass
Feature Branch:
Type of work: End-user documentation
Blueprint:
Starter:
Affected tool:

Description

Several users reported that since upgrading to 3.11, Tails no longer boots: in normal mode it displays an empty black screen after the boot menu, and in troubleshooting mode it ends up with the following message:

Error starting GDM with your graphics card: Intel Corporation Core Processor Integrated Graphics Controller [8086:0046] (rev 02).

Adding xorg-driver=intel to the startup options does not solve the issue.


Related issues

Related to Tails - Bug #16145: Upgrade Linux to 4.18.20 Resolved 11/22/2018
Related to Tails - Bug #16447: Regression on some Intel GPU (Braswell, Kaby Lake) In Progress 02/08/2019
Blocks Tails - Feature #15941: Core work 2018Q4 → 2019Q2: Technical writing Confirmed 09/11/2018
Blocks Tails - Feature #15507: Core work 2019Q1: Foundations Team Confirmed 04/08/2018
Blocked by Tails - Bug #16073: Upgrade Linux to 4.19 Resolved 10/25/2018

Associated revisions

Revision 4a9f5562 (diff)
Added by intrigeri 2 months ago

Document workaround for regression on some Intel graphics cards (refs: #16224)

Revision 7608e2ba (diff)
Added by intrigeri 2 months ago

Also document the workaround for some Intel graphics cards on the general graphics issues page (refs: #16224)

This workaround can fix other kinds of present or future issues.
Let's document it so next time we face such problems, we can
identify a suitable workaround faster and more easily.

Revision 2ff903bc (diff)
Added by intrigeri 2 months ago

Document another workaround for Intel graphics cards (refs: #16224)

For some graphics cards, most notably 8086:0046, we force the Intel X.Org driver
(config/chroot_local-includes/usr/share/live/config/xserver-xorg/intel.ids) so
adding only nomodeset is not sufficient: the Intel driver won't work in this
context. So for those, add nomodeset and force X.Org to use the vesa driver,
which it would have automatically picked if we were not forcing the Intel one.
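The reasoning in this commit message can be illustrated with a small model of the driver choice. This is a minimal sketch under stated assumptions, not the actual Tails or live-config code: it assumes a simplified order of precedence (an explicit xorg-driver= boot option, then the intel.ids list, then vesa under nomodeset), and FORCED_INTEL_IDS, parse_cmdline and xorg_driver are hypothetical names. FORCED_INTEL_IDS only contains the one ID this ticket names as forced.

```python
from typing import Dict, Optional

# Hypothetical model of the X.Org driver selection described in revision
# 2ff903bc; it is NOT the actual Tails/live-config code.
# FORCED_INTEL_IDS stands in for intel.ids (illustrative subset only).
FORCED_INTEL_IDS = {"8086:0046"}


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {option: value}; bare flags map to None."""
    opts: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        opts[key] = value if sep else None
    return opts


def xorg_driver(pci_id: str, cmdline: str) -> str:
    """Return the X.Org driver this simplified model predicts."""
    opts = parse_cmdline(cmdline)
    explicit = opts.get("xorg-driver")
    if explicit:
        return explicit  # an explicit xorg-driver= boot option wins
    if pci_id in FORCED_INTEL_IDS:
        return "intel"  # forced via intel.ids, even with nomodeset
    if "nomodeset" in opts:
        return "vesa"  # no kernel modesetting: X.Org falls back to vesa
    return "autodetected"  # normal case: X.Org picks a driver by itself


if __name__ == "__main__":
    # nomodeset alone leaves 8086:0046 on the Intel driver...
    print(xorg_driver("8086:0046", "nomodeset"))
    # ...while adding xorg-driver=vesa moves it to vesa, like 8086:2a42.
    print(xorg_driver("8086:0046", "nomodeset xorg-driver=vesa"))
    print(xorg_driver("8086:2a42", "nomodeset"))
```

In this model, nomodeset alone changes nothing for a forced ID, which is why the documented workaround for those cards needs both options.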

History

#1 Updated by goupille 2 months ago

To be clearer, that's the GPU in the ThinkPad X201.

#2 Updated by goupille 2 months ago

same issue with

Intel HD Graphics [8086:0046] (rev18)

(intel core i3 - M380)

#3 Updated by goupille 2 months ago

Two anonymous users reported the same problem (black screen) with the following card:

Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (primary) [8086:2a02] (rev 0c)

#4 Updated by goupille 2 months ago

#5 Updated by goupille 2 months ago

  • Subject changed from Black screen after the boot menu with Intel HD GPU first generation (Westmere) to Black screen after the boot menu with Intel GPU (i915)

#6 Updated by goupille 2 months ago

  • Priority changed from Normal to Elevated

I set the priority to "elevated", given the number of users reporting this

#7 Updated by goupille 2 months ago

  • Assignee changed from intrigeri to CyrilBrulebois

Adding this bug to 3.11's known issues (with an anchor) could be a good thing for us (helpdesk)...

#8 Updated by emmapeel 2 months ago

A user on XMPP reports that, by following the links in the Debian bug report, they found this patch:

https://patchwork.freedesktop.org/patch/265653/

The user volunteered to test Tails ISO images with the patch on their laptops Thinkpad X200, X200s, T500, T400s, X301, and T400. Tails 3.11 currently doesn't work correctly on any of them.

#9 Updated by CyrilBrulebois 2 months ago

I've just pushed a commit to the master branch adding this under “Known issues”. Feel free to adjust the wording if needed before calling for translations (if that isn't done automatically):

kibi@armor:~/work/clients/tails/tails.git$ git show -- wiki/src/news/version_3.11.mdwn
commit 7523fcc7e35c002e2ebaf4b00660c5dd293d16f4
Author: Cyril Brulebois <ckb@riseup.net>
Date:   Sat Dec 15 14:48:36 2018 +0100

    Document #16224 as a known issue.

    Requested-by: goupille (for frontdesk).

diff --git a/wiki/src/news/version_3.11.mdwn b/wiki/src/news/version_3.11.mdwn
index fd0aa3c806..112f3e3914 100644
--- a/wiki/src/news/version_3.11.mdwn
+++ b/wiki/src/news/version_3.11.mdwn
@@ -50,7 +50,11 @@ For more details, read our [[!tails_gitweb debian/changelog desc="changelog"]].

 # Known issues

-None specific to this release.
+- Tails may fail to start on some computers with Intel graphical
+  hardware: a regression in the i915 Linux kernel module can lead
+  to a black screen when trying to boot this Tails version
+  ([[!tails_ticket 16224]], [[!debbug 914980]]). Users may want
+  to delay upgrading until a solution has been identified.

 See the list of [[long-standing issues|support/known_issues]].

(This only shows the actual change in the MDWN file; PO files were updated as well in the same commit.)

#10 Updated by CyrilBrulebois 2 months ago

I'll check what happened in the upstream (mainline) and downstream (debian) kernels, and see whether I can build a patched kernel and then an ISO, that users could try. This might be material for an emergency release given the prominence of Intel GPUs…

#11 Updated by intrigeri 2 months ago

  • Related to Bug #16145: Upgrade Linux to 4.18.20 added

#12 Updated by CyrilBrulebois 2 months ago

  • Assignee changed from CyrilBrulebois to anonym

No luck upstream, so I've tried to assess the situation on the Debian side, and came up with this suggestion: <https://bugs.debian.org/914980#50>

My patch against upstream had this commit message:

Revert "drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5" 

This reverts commit 06e562e7f515292ea7721475950f23554214adde.

v4.18.20 regresses at least on gen4 as seen in these bug reports:
  https://bugs.freedesktop.org/108850
  https://bugs.freedesktop.org/108984
  https://bugs.debian.org/914980
  https://redmine.tails.boum.org/code/issues/16224

This patch landed in various drm-intel branches but hasn't found its way
to linux-4.18.y yet:
  https://patchwork.freedesktop.org/patch/265653/

Trying to apply it on top of v4.18.20 triggers several conflicts, so it
seems safer to just revert what seems to be the culprit, as confirmed by
a user reporting this revert fixes the problem for them, and by this
part of the commit message for the actual fix in drm-intel:

    commit 5179749925933575a67f9d8f16d0cc204f98a29f
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date:   Tue Dec 4 14:15:16 2018 +0000

        drm/i915: Allocate a common scratch page
    […]
        Fixes: 06e562e7f515 ("drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5") # v4.18.20
        Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108850
    […]

Signed-off-by: Cyril Brulebois <cyril@debamax.com>

As the person responsible for releasing 3.11 with this regression, I meant to prepare a fix against the Debian package and get a test ISO built with it, so that testers could report whether the patch did its job. The end goal was to find a short-term solution, which would let us assess the feasibility of an emergency release.

Since then, I've learnt that 4.18.y is EOL, that the next upload to Debian is going to be 4.19-based anyway, and that the obvious way to deal with such an issue is to revert to the previous kernel, instead of building our own kernel…

Reassigning to anonym for input (as RM and as 4.18.20 merge submitter). I'm fine with doing the needed release work (if we end up doing an emergency release) once a solution has been found, but also fine with letting someone else handle it if that's not desirable.

Meanwhile, I'll be dealing with the last post-release (3.11) steps, which I had postponed for personal reasons; sorry for the breakage in Jenkins etc. in the meantime (#16226).

#13 Updated by intrigeri 2 months ago

  • Category set to Hardware support
  • Assignee changed from anonym to intrigeri
  • Priority changed from Elevated to High
  • Target version set to Tails_3.12

Dear kibi,

> As the person responsible for releasing 3.11 with this regression,

I greatly appreciate the work you've put into this: it feels very good to see that there are people around to tackle such issues when I'm away! Thank you :)

I think I understand why you've felt personally responsible:

  • You were the one who pushed the big red "release" button.
  • In other projects (e.g. Debian), dealing with such post-release fallout is on the release managers' plate, be it formally or de facto.

Now, I'd like to provide another perspective:

  • I've started preparing a timeline for a blameless postmortem process, and tl;dr: we've been very unlucky and nobody did anything very wrong. Whoever pushed the big red "release" button is a mere detail in a long series of unfortunate events, coincidences, small mistakes, and missing info/communication that led to releasing with this regression.
  • In Tails, the RM is not responsible for the actual code that's in the release nor for such regressions. Dealing with such fallout is the FT's job. In this specific case, well, perhaps you were the only active FT person last week (I, for one, was not) so it did not make a big difference in practice. But I think it's worth clarifying that you did not have to handle this with your RM hat on: it's great that you did it (and please report this work as part of your FT work!) but that's not part of the expectations.

> […] and that the obvious way to deal with such an issue is to revert to the previous kernel

First I'll quickly try to find a workaround we could document so that affected users can use Tails 3.11 and we don't have to put out an emergency release. That would be ideal but I'm not holding my breath. Help desk, if anyone has already done this work, please tell me what's known not to work.

Then if I find no workaround, I'll investigate the possibility of downgrading the kernel: on Saturday I only took a very quick look at the CVEs fixed by the kernel upgrade brought by Tails 3.11 and at first glance it did not seem too unreasonable to downgrade; but I need to look a bit closer before concluding that this is a viable option. If it turns out it's not, then building our own kernel might be the best option on the table (upgrading to 4.19 should be considered too though).

> I'm fine with doing the needed release work (if we end up doing an emergency release) once a solution has been found, but also fine with letting someone else handle it

Excellent :)

Cheers!

#14 Updated by intrigeri 2 months ago

#15 Updated by intrigeri 2 months ago

  • Type of work changed from Research to End-user documentation

> First I'll quickly try to find a workaround we could document so that affected users can use Tails 3.11

I could reproduce this bug on an X200 (8086:2a42 rev 07, for which we don't force the intel X.Org driver in config/chroot_local-includes/usr/share/live/config/xserver-xorg/intel.ids).

Then I've tested some workarounds:

  • modprobe.blacklist=i915: OK (native resolution, vesa driver)
  • nomodeset: OK (native resolution, vesa driver)
  • nofb: crash in early boot
  • modprobe.blacklist=i915 xorg-driver=intel: GDM fails to start
  • nomodeset xorg-driver=intel: GDM fails to start
  • nofb xorg-driver=intel: crash in early boot

So I'll document the workaround that's easiest to type: nomodeset.
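The choice above boils down to keeping the option sets that worked and picking the shortest one to type. A minimal sketch using only the X200 results listed above (treating both "crash in early boot" and "GDM fails to start" as failures); X200_RESULTS and easiest_workaround are hypothetical names:

```python
from typing import Dict, Optional

# Results from the X200 (8086:2a42 rev 07) tests listed above:
# boot options -> did Tails start with a usable desktop?
X200_RESULTS: Dict[str, bool] = {
    "modprobe.blacklist=i915": True,
    "nomodeset": True,
    "nofb": False,
    "modprobe.blacklist=i915 xorg-driver=intel": False,
    "nomodeset xorg-driver=intel": False,
    "nofb xorg-driver=intel": False,
}


def easiest_workaround(results: Dict[str, bool]) -> Optional[str]:
    """Return the working boot options with the fewest characters to type."""
    working = [options for options, ok in results.items() if ok]
    return min(working, key=len) if working else None


if __name__ == "__main__":
    print(easiest_workaround(X200_RESULTS))
```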

Help desk, please ask affected users to add the nomodeset option in the boot menu and report back if that's enough to fix their problem. I expect that on some hardware, Tails won't work as well as usual but I hope it'll at least start and fulfil basic needs; expected issues: sluggish graphics performance (in particular with high screen resolutions), smaller resolution than the native one.

Unless we get reports that this workaround is not sufficient on a broad set of hardware, that'll be good enough and we don't need to put out an emergency release (which is good because all our options have issues: either kernel security regressions, or non-trivial initial dev costs + increased maintenance costs, or big risk of introducing other regressions).

#16 Updated by intrigeri 2 months ago

  • Status changed from Confirmed to In Progress

#17 Updated by intrigeri 2 months ago

  • Assignee changed from intrigeri to sajolida
  • % Done changed from 0 to 50
  • QA Check set to Ready for QA

Hi sajolida! I've documented the workaround, trying to stick to the style we use in similar text elsewhere. Please review the 2 commits listed in "Associated revisions" above, that I've pushed straight to master given the pretty bad impact and scope of this regression. Thanks in advance!

(BTW, somewhat off-topic: https://tails.boum.org/doc/first_steps/startup_options/#boot_menu does not mention that the Boot Loader Menu uses the US QWERTY keyboard layout. I suspect this will make it hard for many users to add the options we document here and there. If you agree, I'm happy to check whether we already have a ticket about that and file one if not. I guess we could include a picture of a US QWERTY keyboard layout on that page.)

#18 Updated by goupille 2 months ago

the workaround doesn't solve the issue with the Ironlake-Arrandale GPU

Intel Corporation Core Processor Integrated Graphics Controller [8086:0046] (rev 02)

#19 Updated by mercedes508 2 months ago

goupille wrote:

> the workaround doesn't solve the issue with the Ironlake-Arrandale GPU
>
> [...]

Received at least 4 bug reports today confirming this :)

#20 Updated by sajolida 2 months ago

  • Blocks Feature #15941: Core work 2018Q4 → 2019Q2: Technical writing added

#21 Updated by sajolida 2 months ago

  • Assignee changed from sajolida to intrigeri
  • QA Check changed from Ready for QA to Info Needed

Both revisions look really fine!

#22 Updated by intrigeri 2 months ago

  • Assignee changed from intrigeri to mercedes508

> goupille wrote:
>
> > the workaround doesn't solve the issue with the Ironlake-Arrandale GPU
> >
> > [...]
>
> Received at least 4 bug reports today confirming this :)

Are they really all on Ironlake-Arrandale?

#23 Updated by intrigeri 2 months ago

intrigeri wrote:

> > goupille wrote:
> >
> > > the workaround doesn't solve the issue with the Ironlake-Arrandale GPU
> > >
> > > [...]
> >
> > Received at least 4 bug reports today confirming this :)
>
> Are they really all on Ironlake-Arrandale?

Apparently not: I've been forwarded a report that the workaround does not work on 8086:0046 (rev 02) either.

Help desk, please give us aggregated data: ideally the list of affected GPUs, and at least the subset of those where the workaround is reported not to work. Thanks!
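To build such a list, each report has to be reduced to a PCI vendor:device ID and revision, and reports can spell the revision inconsistently (for example "(rev 02)", "(rev18)", "(Rev 18)" or "(rev 2)"). A minimal sketch, assuming each report contains an lspci-style "[vvvv:dddd] (rev NN)" fragment like the GDM error quoted in the description; GPU_RE and parse_gpu are hypothetical names:

```python
import re
from typing import Optional, Tuple

# Matches an lspci-style "[8086:0046]" optionally followed by a revision,
# tolerating spellings such as "(rev 02)", "(rev18)", "(Rev 18)" and
# "(rev 2)". Revisions are hexadecimal (e.g. "0c").
GPU_RE = re.compile(
    r"\[(?P<vendor>[0-9a-fA-F]{4}):(?P<device>[0-9a-fA-F]{4})\]"
    r"(?:\s*\([Rr]ev\s*(?P<rev>[0-9a-fA-F]{1,2})\))?"
)


def parse_gpu(report: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (pci_id, revision) in lowercase, with the revision zero-padded."""
    match = GPU_RE.search(report)
    if match is None:
        return None
    pci_id = f"{match['vendor']}:{match['device']}".lower()
    rev = match["rev"]
    normalized_rev = rev.lower().zfill(2) if rev else None
    return pci_id, normalized_rev


if __name__ == "__main__":
    print(parse_gpu("Intel HD Graphics [8086:0046] (rev18)"))
    print(parse_gpu("[8086:0046] (rev 2)"))
```

Normalizing before counting keeps "(rev 2)" and "(rev 02)" from showing up as two different GPUs in the aggregated list.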

#24 Updated by mercedes508 2 months ago

intrigeri wrote:

> intrigeri wrote:
>
> > > goupille wrote:
> > >
> > > > the workaround doesn't solve the issue with the Ironlake-Arrandale GPU
> > > >
> > > > [...]
> > >
> > > Received at least 4 bug reports today confirming this :)
> >
> > Are they really all on Ironlake-Arrandale?
>
> Apparently not: I've been forwarded a report that the workaround does not work on 8086:0046 (rev 02) either.
>
> Help desk, please give us aggregated data: ideally the list of affected GPUs, and at least the subset of those where the workaround is reported not to work. Thanks!

Hey, well the 4 reports from yesterday are all for 8086:0046 (rev 02) which is the one described by goupille in comment #18 as well or am I missing something?

#25 Updated by intrigeri 2 months ago

> Hey, well the 4 reports from yesterday are all for 8086:0046 (rev 02) which is the one described by goupille in comment #18 as well or am I missing something?

Thanks for the clarification :)

If any other GPU is affected, please let us know.

#26 Updated by intrigeri 2 months ago

For GPUs where nomodeset is not enough, try: nomodeset xorg-driver=vesa (we're forcing the intel driver there and that may not work with nomodeset).

#27 Updated by mercedes508 2 months ago

Some basic stats from the last 3 days bug reports:

  • [8086:0046] (rev 02): 7 reports, nomodeset doesn't work
  • Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07): 1 report, nomodeset works
  • [8086:0046] (rev 12): 4 reports, nomodeset doesn't work
  • [8086:0046] (rev 18): 1 report, nomodeset doesn't work
  • [8086:2a42] (rev 07): 1 report, nomodeset works
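The aggregation requested in #23 can be done by grouping these counts per GPU. A minimal sketch using only the figures listed above; REPORTS and summarize are hypothetical names, and the Mobile 4 Series entry is kept under the name it was reported with since no PCI ID was given:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (GPU as named in the reports, revision, number of reports, nomodeset works?)
# Copied from the help desk summary above; no other data.
REPORTS: List[Tuple[str, str, int, bool]] = [
    ("8086:0046", "02", 7, False),
    ("Mobile 4 Series Chipset", "07", 1, True),
    ("8086:0046", "12", 4, False),
    ("8086:0046", "18", 1, False),
    ("8086:2a42", "07", 1, True),
]


def summarize(reports: List[Tuple[str, str, int, bool]]) -> Dict[str, Tuple[int, int]]:
    """Map each GPU to (reports where nomodeset works, reports where it fails)."""
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for gpu, _rev, count, works in reports:
        totals[gpu][0 if works else 1] += count
    return {gpu: (ok, failed) for gpu, (ok, failed) in totals.items()}


if __name__ == "__main__":
    for gpu, (ok, failed) in summarize(REPORTS).items():
        print(f"{gpu}: nomodeset works in {ok}, fails in {failed}")
```

Grouping the figures this way shows that all 12 reports where nomodeset alone fails are on 8086:0046, one of the IDs for which we force the Intel X.Org driver, which is consistent with the explanation in revision 2ff903bc.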

#28 Updated by CyrilBrulebois 2 months ago

In #16226#note-15 we were wondering whether staying at/downgrading to 3.10.1 is documented as a workaround for this bug; if it is, we should keep the image around; otherwise we should delete the relevant files.

#29 Updated by mercedes508 2 months ago

  • Assignee changed from mercedes508 to intrigeri

intrigeri wrote:

> For GPUs where nomodeset is not enough, try: nomodeset xorg-driver=vesa (we're forcing the intel driver there and that may not work with nomodeset).

Just got the first 2 positive reports for this workaround on [8086:0046] (rev 02). Will let you know later if I get more.

#30 Updated by intrigeri 2 months ago

  • Assignee changed from intrigeri to mercedes508

> intrigeri wrote:
>
> > For GPUs where nomodeset is not enough, try: nomodeset xorg-driver=vesa (we're forcing the intel driver there and that may not work with nomodeset).
>
> Just got a first positive report for this workaround on [8086:0046] (rev 2).

Thanks, documented!

#31 Updated by mercedes508 2 months ago

OK so today:

  • [8086:0046] (rev 02): nomodeset xorg-driver=vesa works (5 reports)
  • [8086:0046] (rev 12): nomodeset xorg-driver=vesa doesn't work (1 report)

#32 Updated by mercedes508 about 2 months ago

OK, so it basically works for everyone now: I didn't receive any reports about nomodeset xorg-driver=vesa not working, even though people complain a bit about the graphics quality.

#33 Updated by intrigeri about 2 months ago

  • Assignee changed from mercedes508 to intrigeri
  • QA Check changed from Info Needed to Ready for QA

OK, issue mitigated then. Great! :) Next step: test on a build from the devel branch to confirm that the problem is indeed solved there (without the workarounds).

#34 Updated by intrigeri about 2 months ago

#35 Updated by intrigeri about 2 months ago

#36 Updated by intrigeri about 2 months ago

  • Blocked by Bug #16073: Upgrade Linux to 4.19 added

#37 Updated by intrigeri about 2 months ago

  • Priority changed from High to Elevated

#38 Updated by anonym about 1 month ago

I think we need a blameless postmortem analysis for this issue. It might be as simple as this: whoever did the "bare metal" manual tests didn't do a thorough enough job to catch this serious problem. Of course, whoever did it followed our current instructions, so it is not their fault; rather, our manual tests are clearly insufficient.

We need somewhat more rigorous hardware testing when bumping kernels (which should also be done during the merge request's QA, not only during release QA, since that's a bit late), like a list of very common hardware to test, and asking tails-testers@ for help. Modern Intel GPUs naturally belong on that list, considering that most Intel systems use one. Which reminds me that we should probably test on AMD hardware too, since we developers mostly (only?) use Intel hardware so far. And so on.

#39 Updated by intrigeri about 1 month ago

> I think we need a blameless postmortem analysis for this issue.

Yes! Feel free to initiate it somewhere (I'd rather do it privately). A good way to start is to cooperatively build a timeline of facts.

#40 Updated by intrigeri about 1 month ago

  • Status changed from In Progress to Fix committed
  • Assignee deleted (intrigeri)
  • % Done changed from 50 to 100
  • QA Check changed from Ready for QA to Pass

I confirm this is fixed on devel (ThinkPad X200) since #16073 was merged.

#41 Updated by anonym 21 days ago

  • Status changed from Fix committed to Resolved

#42 Updated by mercedes508 12 days ago

  • Related to Bug #16447: Regression on some Intel GPU (Braswell, Kaby Lake) added
