Check getrandom() status and potential problems
Look into potential regressions caused by the getrandom() fix for https://security-tracker.debian.org/tracker/CVE-2018-1108.
Related bugs (some mostly as examples, but some have fixes we might want):
- plymouth: https://bugs.debian.org/897572
- slow boot blocking startup of the GNOME session: https://bugs.debian.org/897632
- arc4random_buf() bug that affects many programs: https://bugs.debian.org/898088
- some services cannot start: https://bugs.debian.org/897599, https://bugs.debian.org/897917
- util-linux 2.32 should help: https://bugs.debian.org/897572#184
- the "Fixing Linux getrandom() in stable" thread on debian-devel@ has more examples, context and explanations
#5 Updated by intrigeri over 1 year ago
- Description updated (diff)
- Status changed from Confirmed to In Progress
I've compared boot time between devel and 3.8 (tested in the same VM, with no virtio-rng device). The test suite runs in about the same time (and it enables no virtio-rng device either). devel takes ~10% longer to boot from syslinux to the Greeter; this minor difference might be explained away by an outdated SquashFS sort file (not very convincing though). In all my tests I've been careful not to use input devices, which could skew the results by making the rng init faster.
FTR on devel, I see the "random: crng init done" message in the Journal as soon as haveged has started, i.e. ~15s after the kernel was loaded; immediately after that, the "Show Plymouth Boot Screen" service is started. This matches the time when I see the Plymouth splash screen appear but may be purely coincidental as the splash screen is shown at the exact same time on stable.
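For anyone who wants to repeat this check on another build, here is a minimal sketch of pulling the crng init timestamp out of kernel log lines. It assumes the usual "[ seconds.micro] message" prefix; the function name and the sample lines are made up for illustration and are not taken from an actual boot log.

```python
import re

# Matches a kernel log line prefixed with a monotonic timestamp,
# e.g. "[   15.000000] random: crng init done".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")

def crng_ready_time(lines):
    """Return the timestamp (in seconds) of the first
    "random: crng init done" line, or None if there is none."""
    for line in lines:
        m = LINE_RE.match(line)
        if m and "random: crng init done" in m.group(2):
            return float(m.group(1))
    return None

# Made-up sample lines, only to show the expected input shape.
sample = [
    "[    1.250000] random: fast init done",
    "[   15.000000] random: crng init done",
]
print(crng_ready_time(sample))
```

The same function should work on lines from `dmesg` or from `journalctl -k -o short-monotonic`, since it only looks for the bracketed timestamp and the message substring.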
So at first glance, we're not affected. Given haveged was mentioned as a possible workaround for this problem, I guess it's what saves us here.
It might be interesting to test whether applying https://salsa.debian.org/debian/plymouth/commit/3f818f2f3e8ccf5789f53a293f0eb439733704a4#567819342d86b4fd114233c980808d6fb7281046_108_110 on devel changes boot time at all, but at this point I'm inclined to just close this ticket as resolved and spend my time on actual problems instead :)
#7 Updated by CyrilBrulebois over 1 year ago
- Status changed from In Progress to Resolved
I've just read through your conclusions, and they look rather solid to me.
I was about to mention the fc-cache thing in plymouth[*] that could help, but as you mentioned, having haveged kick in during the boot sequence should make sure we don't hit this issue in practice.
As you mentioned, if we have timings reported by the test suite, and if they don't double or so all of a sudden, there's not much to fix; if we get such a behaviour later on, we already know where we should start digging…
Marking as resolved accordingly.
[*] I had to borrow it for use in the debian-installer build, to avoid similar issues. We were getting a black screen until many fontconfig timeouts had happened, due to the font cache getting built at run-time, which leverages getrandom() and retries a few times with a delay before falling back to using /dev/*random. (I should follow up on https://debamax.com/blog/2018/05/25/debugging-black-screen-in-debian-installer/ by the way. ;))
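To illustrate the "try getrandom(), retry with a delay, then fall back" pattern from the footnote, here is a rough sketch. It is not the code used by fontconfig or any library involved; the function name, retry count and delay are hypothetical, and /dev/urandom is picked here as one possible fallback device.

```python
import os
import time

def random_bytes(n, retries=3, delay=0.1):
    """Return n random bytes.

    Tries a non-blocking getrandom() a few times, sleeping between
    attempts, then falls back to reading /dev/urandom.
    retries and delay are illustrative values only.
    """
    getrandom = getattr(os, "getrandom", None)   # Linux-only in Python
    nonblock = getattr(os, "GRND_NONBLOCK", 0)
    if getrandom is not None:
        for _ in range(retries):
            try:
                data = getrandom(n, nonblock)
                if len(data) == n:
                    return data
            except BlockingIOError:
                # Entropy pool not initialised yet: wait and retry.
                pass
            time.sleep(delay)
    # Fallback path: read from the device node instead.
    with open("/dev/urandom", "rb") as f:
        return f.read(n)

print(len(random_bytes(16)))
```

On a system where the crng is not yet initialised, each failed attempt costs one `delay`, which is how many small timeouts can add up to a visibly stalled boot.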