Project

General

Profile

Bug #9012

Network is sometimes not unblocked post-Greeter in Jessie

Added by intrigeri over 4 years ago. Updated about 3 years ago.

Status:
Resolved
Priority:
Elevated
Assignee:
-
Category:
Spoof MAC
Target version:
Start date:
03/04/2015
Due date:
% Done:

50%

Feature Branch:
bugfix/9012-unblock-network
Type of work:
Research
Blueprint:
Starter:
Affected tool:

Description

On an ISO built today from feature/jessie at a48119d41f0083b22210b59c62ebace02ab9c7f0, after logging in the only network adapter I can see in ip a is lo. If I manually run /usr/local/sbin/tails-unblock-network in a root terminal in the GNOME session, then the network comes up.

I cannot reproduce this with an ISO from a week ago, built from 6e49989443c85c3cfb69fefac6b4534f601f2e96. The Git diff doesn't ring a bell, so likely the culprit is one of:

  • systemd/udev upgrade from 215-11 to 215-12
  • Linux upgrade from 3.16.7-ckt4-3 to 3.16.7-ckt7-1

journalctl-al.log View (1.41 MB) anonym, 06/12/2015 06:26 PM

journalctl-al-2.log View (1.17 MB) anonym, 12/21/2015 10:29 AM

dconf-dump (3.05 KB) anonym, 12/21/2015 10:34 AM

tails-debugging-info (1.37 MB) anonym, 12/21/2015 10:34 AM


Related issues

Blocked by Tails - Feature #10583: Release a first Jessie-based beta Resolved 11/19/2015

Associated revisions

Revision 271e43fa (diff)
Added by intrigeri almost 4 years ago

tails-unblock-network: sleep a bit after unblocking network devices, and before triggering udev events.

That's ugly but I wasn't able to find anything better. We'll see if it fixes
the problem everywhere, or if it merely makes it less frequent..

Closes: #9012

Revision fb534852 (diff)
Added by intrigeri over 3 years ago

tails-unblock-network: sleep a bit longer between deleting the blacklist, and triggering udev.

This might help cure refs: #9012. It should not slow down the login process as
long as we run tails-unblock-network in the background from PostLogin.

Revision 3d732f77 (diff)
Added by intrigeri over 3 years ago

tails-unblock-network.service: touch /etc/modprobe.d/ after deleting the blacklist.

This might help, in case there's some aufs bug/optimization that prevents the
parent directory to be seen as updated by whatever udev uses to watch it.
In particular, I'm wary of the effects of the noxino option we're using.

refs: #9012

Revision 744ad738 (diff)
Added by intrigeri over 3 years ago

Increase logging for tails-unblock-network.

refs: #9012

Revision 6a712dca (diff)
Added by intrigeri over 3 years ago

Add known issues.

refs: #10576, #10807, #9012

Revision b0a2d99d
Added by anonym over 3 years ago

Merge remote-tracking branch 'origin/bugfix/9012-unblock-network' into devel

Fix-committed: #9012

Revision 30431d85 (diff)
Added by intrigeri over 3 years ago

Re-add lost known issues.

refs: #10576, #10807, #9012

Commit 6a712dca8fd8fcfd113034504a7a31f6707e106e was lost somehow,
probably in a buggy merge resolution.

History

#1 Updated by intrigeri over 4 years ago

  • Description updated (diff)

#2 Updated by intrigeri over 4 years ago

  • Description updated (diff)

#3 Updated by intrigeri over 4 years ago

  • Subject changed from Network is not unblocked post-Greeter in Jessie to Network is sometimes not unblocked post-Greeter in Jessie
  • Assignee set to intrigeri

Well, now I can't reproduce this failure anymore. I suspect there's a race condition somewhere. I should look into the hacks I had to do to adapt MAC spoofing / network unblocking to Jessie and systemd: there were quite a few bugfixes in this area in Jessie's systemd since then, and perhaps our hacks can now be simplified, and are causing this bug in their current state.

#4 Updated by anonym about 4 years ago

I've seen this now in a build from commit: 6ff01bd. I've attached the output of journalctl -al, as requested over IRC.

#5 Updated by intrigeri almost 4 years ago

I think I've just reproduced it with the automated test suite: no NM icon appeared.

#6 Updated by intrigeri almost 4 years ago

anonym wrote:

I've seen this now in a build from commit: 6ff01bd. I've attached the output of journalctl -al, as requested over IRC.

That log shows that NM has successfully acquired a DHCP lease, and everything seems to be working, so I don't understand how it is related to this bug => anonym, next time you see it, please send the same log again, and explain a bit more what makes you think it's this bug. Thanks!

#7 Updated by intrigeri almost 4 years ago

I've just seen it. No time / too lazy to debug, but running udevadm trigger was enough to unblock the network. Redmine won't let me upload the Journal, so I'll keep it around locally.

#8 Updated by intrigeri almost 4 years ago

intrigeri wrote:

I've just seen it. No time / too lazy to debug, but running udevadm trigger was enough to unblock the network.

/bin/udevadm trigger --type=devices --action=add is actually enough to unblock the network. Note that tails-unblock-network exits successfully when the unblocking fails.

Potentially relevant pieces of log (during the time when tails-unblock-network is running, but doesn't actually unblock the network:

  • sh[2222]: Failed to get realtime timestamp: Cannot assign requested address -- that's in tails-restricted-network-detector.service, which shouldn't affect the next operations
  • systemd[1]: Failed to reset devices.list on /system.slice: Invalid argument, but this seems to be neglictible

#9 Updated by intrigeri almost 4 years ago

  • Status changed from Confirmed to Resolved
  • % Done changed from 0 to 100

#10 Updated by intrigeri almost 4 years ago

  • Assignee deleted (intrigeri)

#11 Updated by intrigeri over 3 years ago

  • Status changed from Resolved to In Progress
  • % Done changed from 100 to 50

I've seen it again :/

#12 Updated by intrigeri over 3 years ago

  • Assignee set to intrigeri

#13 Updated by intrigeri over 3 years ago

I've not seen it since my Jessie sprint has started. I could not reproduce it today when booting from USB on: ThinkPad X61, ThinkPad X200, MacBook Pro 8,1 13-inch. At this point I think it'll have to wait for the first beta release of Tails 2.0, or until we see it again in the test suite.

#14 Updated by intrigeri over 3 years ago

#15 Updated by anonym over 3 years ago

I saw it today on an image built from the same commit as 2.0-beta1.

#16 Updated by anonym over 3 years ago

anonym wrote:

I saw it today on an image built from the same commit as 2.0-beta1.

See attached file for the output of journalctl -al (just noticed that -l is the default nowadays). Random things to notice:

  • at Dec 21 15:55:30 the GNOME desktop has started and I run sudo su in gnome-terminal
  • at Dec 21 15:57:38 (or slightly before) I run tails-unblock-network manually
  • it seems I got a default GNOME Shell desktop (there's no panel at the bottom, the Activities button instead of Applications) so GNOME Classic failed some how
  • this was on a Tails 1.8 USB installation upgraded (Upgrade by cloning) to Tails/Jessie 2.0~beta1
  • I had enabled read-write persistence (migrated from Tails 1.8)

#17 Updated by anonym over 3 years ago

More debugging info attached.

#18 Updated by anonym over 3 years ago

After a lot of rebooting, roughly half with persistence enabled and half with it disabled (obviously) I have seen a few cases of the failure in the former case, but none whatsoever in the latter. Coincidence, or could enabling persistence makes the unblocking fail more often? OTOH, I'm pretty sure I did not use persistence when I had the issue six months ago (see command 4).

Just a thought: in the Greeter's PostLogin script we start tails-unblock-network in the background => in parallel with the GNOME login process. Perhaps the udev triggering (which seem to cause devices to be removed and then re-added) is not such a great idea at this critical time for GNOME? Perhaps we should remove the "&"?

#19 Updated by intrigeri over 3 years ago

anonym wrote:

  • it seems I got a default GNOME Shell desktop (there's no panel at the bottom, the Activities button instead of Applications) so GNOME Classic failed some how

I've seen that occasionally too (but much later post-login, like hours). I wonder if our custom value for enabled-extensions (that lacks the extensions enabled by Classic mode) is at fault.

#20 Updated by intrigeri over 3 years ago

I looked at journalctl-al-2.log in details.

The network adapter is: 00:03.0 Ethernet controller: Red Hat, Inc Virtio network device.

Relevant bits of logs:

Dec 21 15:51:51 localhost.localdomain kernel: input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input2
Dec 21 15:51:51 localhost.localdomain kernel: ACPI: Power Button [PWRF]

Dec 21 15:52:22 amnesia systemd-logind[1814]: Watching system buttons on /dev/input/event1 (Power Button)

Dec 21 15:52:24 amnesia gdm-Xorg-:0[1926]: (II) config/udev: Adding input device Power Button (/dev/input/event1)
Dec 21 15:52:24 amnesia gdm-Xorg-:0[1926]: (**) Power Button: Applying InputClass "evdev keyboard catchall" 

Dec 21 15:53:39 amnesia gdm-session-worker[3502]: Running /usr/local/sbin/tails-unblock-network...
Dec 21 15:53:39 amnesia gdm-session-worker[3502]: + systemctl start tails-unblock-network.service

[at this point, the blacklist file is supposed to have been deleted]

Dec 21 15:53:39 amnesia gdm-session-worker[3502]: + echo Sleeping...
Dec 21 15:53:39 amnesia gdm-session-worker[3502]: Sleeping...
Dec 21 15:53:39 amnesia gdm-session-worker[3502]: + sleep 2

Dec 21 15:53:42 amnesia gdm-session-worker[3502]: systemd-udev-trigger.service restarted.
Dec 21 15:53:42 amnesia gdm-session-worker[3502]: Restarting systemd-udev-settle.service...
Dec 21 15:53:42 amnesia gdm-session-worker[3502]: + systemctl restart systemd-udev-settle.service

Dec 21 15:53:42 amnesia gdm-Xorg-:0[1926]: (II) config/udev: removing device Power Button
Dec 21 15:53:42 amnesia systemd-logind[1814]: Watching system buttons on /dev/input/event1 (Power Button)
Dec 21 15:53:42 amnesia gdm-Xorg-:0[1926]: (II) evdev: Power Button: Close
Dec 21 15:53:42 amnesia gdm-Xorg-:0[1926]: (II) UnloadModule: "evdev" 
Dec 21 15:53:42 amnesia gdm-Xorg-:0[1926]: (II) config/udev: Adding input device Power Button (/dev/input/event1)

Dec 21 15:53:42 amnesia gdm-session-worker[3502]: systemd-udev-settle.service restarted.

Dec 21 15:53:42 amnesia gdm-session-worker[3502]: + systemctl enable NetworkManager.service NetworkManager-wait-online.service

Dec 21 15:53:42 amnesia gdm-session-worker[3502]: + systemctl start NetworkManager.service NetworkManager-dispatcher.service
Dec 21 15:53:42 amnesia systemd[1]: Failed to reset devices.list on /system.slice: Invalid argument

Dec 21 15:53:42 amnesia kernel: input: spice vdagent tablet as /devices/virtual/input/input6

Dec 21 15:55:30 amnesia sudo[4819]: amnesia : TTY=pts/0 ; PWD=/home/amnesia ; USER=root ; COMMAND=/bin/su
Dec 21 15:55:30 amnesia sudo[4819]: pam_unix(sudo:session): session opened for user root by amnesia(uid=0)

[user runs tails-unblock-network manually]

[nothing in the Journal for two minutes, strange?]

[Dec 21 15:57:38 (or slightly before), user runs tails-unblock-network manually]

Dec 21 15:57:38 amnesia gdm-Xorg-:0[1926]: (II) config/udev: removing device Power Button
Dec 21 15:57:38 amnesia systemd-logind[1814]: Watching system buttons on /dev/input/event1 (Power Button)

[and here comes the network adapter, finally:]

Dec 21 15:57:38 amnesia kernel: virtio-pci 0000:00:03.0: irq 45 for MSI/MSI-X
Dec 21 15:57:38 amnesia kernel: virtio-pci 0000:00:03.0: irq 46 for MSI/MSI-X
Dec 21 15:57:38 amnesia kernel: virtio-pci 0000:00:03.0: irq 47 for MSI/MSI-X

Dec 21 15:57:38 amnesia spoof-mac[5361]: Trying to spoof MAC address of NIC eth0...

So in short, deleting the blacklist file + sync in tails-unblock-network.service, then sleeping 2 seconds and triggering udev, was not enough for udev and the kernel to pick up the network adapter.

From here, here's what I would like to do for 2.0~rc1:

  • I'm preparing a branch that increases logging in tails-unblock-network, and makes udevd pass modprobe the --verbose option. I'd like to see it merged into devel, so that next time someone sees this bug I get more information. In particular, I'm curious to know if modprobe virtio_net is run at all, the first time we (unsuccessfully) trigger udev.
  • I think it's not too risky to bet that there's a race condition between the time we delete the blacklist file, and the time we trigger udev. Raising the sleep time (urgl) might help, and should not slow down the login process as long as we run tails-unblock-network in the background from PostLogin. I'll try to do that for 2.0~rc1.

Longer-term:

  • I suspect that an aufs bug prevents udev from picking up the change in /etc/modprobe.d; if it's the case, then switching to overlayfs (#8415) will fix that. I've looked into the aufs documentation quite a bit, searched for information wrt. whether unlink/rm is atomic. I haven't found anything conclusive. I'm concerned about our usage of noxino: "If you do not want to use xino, use noxino mount option. Use this option with care, since the inode number may be changed silently and unexpectedly anytime. For example, rmdir failure, recursive chmod/chown/etc to a large and deep directory or anything else. And some applications will not work correctly."
  • We're using a somewhat old kernel (3.16), I wonder if there might be a bug somewhere, that would be fixed by upgrading (#10298).

#21 Updated by intrigeri over 3 years ago

  • Assignee changed from intrigeri to anonym
  • QA Check set to Ready for QA
  • Feature Branch set to bugfix/9012-unblock-network

Hi,

I've done what I said (above) I would do for 2.0~rc1. Please review'n'merge, but I doubt it fixes the problem for real, so don't mark as fix committed, just reassign to me for 2.2.

#22 Updated by anonym over 3 years ago

  • Status changed from In Progress to Fix committed
  • % Done changed from 50 to 100

#23 Updated by anonym over 3 years ago

  • Assignee deleted (anonym)
  • QA Check changed from Ready for QA to Pass

Commits make sense. Let's hope they make a difference!

#24 Updated by intrigeri over 3 years ago

  • Status changed from Fix committed to In Progress
  • Assignee set to intrigeri
  • Target version changed from Tails_2.0 to Tails_2.2
  • % Done changed from 100 to 50
  • QA Check deleted (Pass)

Thanks! I said earlier:

I doubt it fixes the problem for real, so don't mark as fix committed, just reassign to me for 2.2.

... so I just did that.

#25 Updated by intrigeri over 3 years ago

  • Target version changed from Tails_2.2 to Tails_2.3

#26 Updated by intrigeri about 3 years ago

  • Target version changed from Tails_2.3 to Tails_2.4

#27 Updated by intrigeri about 3 years ago

anonym and frontdesk people: did you see this happen or reported since Tails 2.0 was released?

#28 Updated by intrigeri about 3 years ago

  • Assignee changed from intrigeri to anonym
  • QA Check set to Info Needed

#29 Updated by emmapeel about 3 years ago

Not that I remember....

#30 Updated by anonym about 3 years ago

  • Assignee changed from anonym to intrigeri

I cannot recall seeing it. I propose we close this ticket.

#31 Updated by intrigeri about 3 years ago

  • Status changed from In Progress to Resolved
  • Assignee deleted (intrigeri)

OK, tentatively closing then. Frontdesk, please keep an eye on it :)

Also available in: Atom PDF