Bug #17307

nginx "504 Gateway Time-out" while refreshing the website at Tails release time

Added by intrigeri 4 months ago. Updated 13 days ago.

Status: Resolved
Priority: Elevated
Assignee:
Category: Infrastructure
Target version:
Start date:
Due date:
% Done: 0%
Feature Branch:
Type of work: Sysadmin
Blueprint:
Starter:
Affected tool:
Description

This happened to kibi today while pushing the web/release-4.1 branch before publishing the release. Note that in theory, there's no reason why this push should trigger a website refresh.

This can also happen when pushing the master branch, in which case ikiwiki is left in a broken state that can only be recovered from by rebuilding the website, which requires sysadmin privileges. Can we increase this timeout?


Related issues

Related to Tails - Bug #17361: Streamline our release process Confirmed
Related to Tails - Bug #17363: Ensure only pushes to the master branch trigger a website refresh Resolved

History

#1 Updated by intrigeri 4 months ago

  • Description updated (diff)

#2 Updated by intrigeri 4 months ago

  • Related to Bug #17361: Streamline our release process added

#3 Updated by intrigeri 3 months ago

  • Related to Bug #17363: Ensure only pushes to the master branch trigger a website refresh added

#4 Updated by intrigeri 3 months ago

  • Status changed from Confirmed to Needs Validation
  • Assignee changed from intrigeri to zen

#5 Updated by CyrilBrulebois 3 months ago

  • Target version changed from Tails_4.2 to Tails_4.3

#6 Updated by anonym about 2 months ago

  • Target version changed from Tails_4.3 to Tails_4.4

#7 Updated by CyrilBrulebois 20 days ago

  • Target version changed from Tails_4.4 to Tails_4.5

#8 Updated by zen 15 days ago

  • Assignee changed from zen to intrigeri

I've never done this workflow myself, but from the context I'm assuming that what happens is:

  • Person pushes to Git repo.
  • Git hook uses curl to make HTTP request to ikiwiki.cgi triggering rebuild.
  • Ikiwiki is busy for some reason, user waits patiently.
  • HTTP request times out after 5 minutes.
  • Git hook gets a 504 from curl and shows it to the user.

Can you please confirm that this understanding is correct?

If that is correct, I agree that increasing the timeout would prevent an inconsistent state of the website, but I don't understand why turning off buffering would improve UX, as curl will still wait for the content of the request before returning anything to the user. Is it the case that the progress information will be shown to the user? Maybe we want to use -I as an option for curl?

It is also still not clear why a push to a non-master branch would trigger the rebuild, as the Git hook explicitly avoids that.
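
For reference, the kind of branch check mentioned above can be sketched as follows. This is a hypothetical hook, not the actual one deployed for the Tails website: the pushed_master helper, the PING_URL placeholder and the --max-time value are assumptions made for illustration. It relies only on the fact that Git passes the names of the updated refs as arguments to a post-update hook.

```shell
#!/bin/sh
# Hypothetical post-update hook sketch; NOT the actual hook used by Tails.
# Git calls post-update with the names of the refs that were updated.

# Assumption: placeholder URL; the real ping URL is not shown in this ticket.
PING_URL="https://example.org/ikiwiki.cgi?do=ping"

# Return 0 if any of the given refs is the master branch, 1 otherwise.
pushed_master() {
    for ref in "$@"; do
        if [ "$ref" = "refs/heads/master" ]; then
            return 0
        fi
    done
    return 1
}

if pushed_master "$@"; then
    echo "Requesting update of ${PING_URL}..."
    # Assumption: give up on the client side after 15 minutes.
    curl --silent --show-error --max-time 900 "$PING_URL"
fi
```

With a check like this, a push that only updates web/release-4.1 never reaches the curl call, which is why the reported behaviour is surprising.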

Maybe I need to better understand the details of how the problem expresses itself to be able to review the proposed solutions.

#9 Updated by intrigeri 13 days ago

  • Status changed from Needs Validation to Resolved

Hi!

> I've never done this workflow myself, but from the context I'm assuming that what happens is:
>
>   • Person pushes to Git repo.
>   • Git hook uses curl to make HTTP request to ikiwiki.cgi triggering rebuild.
>   • Ikiwiki is busy for some reason, user waits patiently.
>   • HTTP request times out after 5 minutes.
>   • Git hook gets a 504 from curl and shows it to the user.
>
> Can you please confirm that this understanding is correct?

Yep, I think that's it, from the PoV of the person who does the Git push.

And on top of that, I think that when the request times out, ikiwiki may get killed, which leaves the website in a broken state that can only be repaired by our sysadmins.

> If that is correct, I agree that increasing the timeout would prevent an inconsistent state of the website,

OK, I'm glad this part is validated :)
It was actually the most important aspect of this ticket ⇒ closing as resolved.
The UX part was a bonus, "while I'm at it" attempt.
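
The actual configuration change is not shown in this ticket. As a rough sketch only, assuming ikiwiki.cgi is served by nginx through a FastCGI wrapper such as fcgiwrap (an assumption about the setup, as are the socket path and the 900s value), raising the timeout and turning off buffering could look like this:

```nginx
location = /ikiwiki.cgi {
    include fastcgi_params;
    # Assumption: socket path of a local fcgiwrap instance.
    fastcgi_pass unix:/run/fcgiwrap.socket;
    # Maximum time nginx waits between two successive reads from the
    # backend before giving up with a 504 (nginx default: 60s).
    fastcgi_read_timeout 900s;
    # Pass the response on to the client as it arrives instead of buffering it.
    fastcgi_buffering off;
}
```

If the CGI is reached via proxy_pass instead, the equivalent directives would be proxy_read_timeout and proxy_buffering.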

> but I don't understand why turning off buffering would improve UX, as curl will still wait for the content of the request before returning anything to the user. Is it the case that the progress information will be shown to the user?

I've just tested it and it seems that you're right: empirically, it seems that the output is stuck on "Requesting update of https://tails.boum.org/..." until ikiwiki has finished refreshing the website, at which point I see the output. It's not 100% clear to me at this point if that's caused by:

  1. the way we use curl
  2. how we pipe the output to perl
  3. how ikiwiki.cgi behaves
  4. how nginx behaves

In order to dismiss one of the curl-related potential culprits, I've passed it the --no-buffer option. This did not change anything (empirically).
I've also verified that when piping something through perl -p, perl processes lines one after the other and outputs the result incrementally, so that's not it either.
That's not much progress but it narrows the scope of the investigation a little :)

> Maybe we want to use -I as an option for curl?

I was not sure:

  • Why -I aka. --head would change anything: the reply header includes the success/error HTTP code, which presumably is unknown until the ikiwiki operation completes, so the same problems (lack of progress output to the user) should occur.
  • Whether ikiwiki.cgi?do=ping would do anything when it receives a HEAD HTTP command (as opposed to a GET).

Also, I would find it sad to hide the non-header part of the HTTP response, which is sometimes useful.

But anyway, I tested it. The good news is that ikiwiki.cgi does its job; the bad news is that the output is still displayed in one batch at the end, so this does not help wrt. UX ⇒ reverted.

If you have other cheap ideas to try & improve the UX aspect, let's try them. But IMO this is not important enough to warrant tracking this as an issue on Redmine.

> It is also still not clear why a push to a non-master branch would trigger the rebuild, as the Git hook explicitly avoids that.

I think you mean files/gitolite/hooks/www_website_ping-post-update.hook, and I agree. Either something else that I don't understand yet is going on, or the "while pushing the web/release-4.1 branch" part of the bug report was incorrect. Without more info, I think we should close this issue and ask kibi to report back next time this happens.
