Feature #8650

Feature #5734: Monitor servers

Feature #9482: Create a monitoring setup prototype

Configure monitoring for the most critical services

Added by intrigeri over 4 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: Infrastructure
Target version:
Start date: 01/09/2015
Due date:
% Done: 100%
Feature Branch:
Type of work: Sysadmin
Blueprint:
Starter:
Affected tool:

Description

That is, those with "CRITICAL" priority on the blueprint.


Related issues

Blocked by Tails - Feature #8648: Initial set up of the monitoring software Resolved 03/07/2016
Blocks Tails - Feature #8652: Evaluate how the initial monitoring setup behaves and adjust things accordingly Resolved 01/09/2015

History

#1 Updated by intrigeri over 4 years ago

  • Blocked by Feature #8649: Specify our monitoring needs and build an inventory of the services that need monitoring added

#2 Updated by intrigeri over 4 years ago

  • Blocked by Feature #8648: Initial set up of the monitoring software added

#3 Updated by intrigeri over 4 years ago

  • Blocked by deleted (Feature #8649: Specify our monitoring needs and build an inventory of the services that need monitoring)

#4 Updated by intrigeri over 4 years ago

  • Blocked by deleted (Feature #8648: Initial set up of the monitoring software)

#5 Updated by intrigeri over 4 years ago

  • Target version changed from Tails_1.8 to Tails_1.5
  • Parent task changed from #5734 to #9482

#6 Updated by intrigeri over 4 years ago

  • Blocked by Feature #8648: Initial set up of the monitoring software added

#8 Updated by intrigeri over 4 years ago

  • Blocks Feature #8652: Evaluate how the initial monitoring setup behaves and adjust things accordingly added

#9 Updated by intrigeri over 4 years ago

  • Target version changed from Tails_1.5 to Tails_1.6

#10 Updated by Dr_Whax about 4 years ago

  • Target version changed from Tails_1.6 to Tails_1.5

#11 Updated by intrigeri about 4 years ago

  • Target version changed from Tails_1.5 to Tails_1.6

#12 Updated by bertagaz almost 4 years ago

  • Target version changed from Tails_1.6 to Tails_1.7

#13 Updated by intrigeri almost 4 years ago

  • Description updated (diff)

#14 Updated by intrigeri almost 4 years ago

  • Due date set to 10/26/2015

#15 Updated by intrigeri almost 4 years ago

  • Due date deleted (10/26/2015)
  • Assignee changed from Dr_Whax to bertagaz
  • Target version changed from Tails_1.7 to Tails_2.0

#16 Updated by bertagaz over 3 years ago

  • Target version changed from Tails_2.0 to Tails_2.2

#17 Updated by bertagaz over 3 years ago

  • Status changed from Confirmed to In Progress
  • % Done changed from 0 to 20

I've written the configuration for all these checks. They can all be done with the plugins shipped in the monitoring-plugins-* Debian packages, apart from the rsync one, which requires a plugin that can be found on the Nagios Exchange website, with minor adaptations.

#18 Updated by bertagaz over 3 years ago

  • Target version changed from Tails_2.2 to Tails_2.3

#19 Updated by bertagaz over 3 years ago

  • % Done changed from 20 to 30

I've deployed what is the skeleton for service checks. That's tails::monitoring::service, and a first APT check has been deployed.

Now as a reminder, the services marked as CRITICAL in the blueprint are:

#20 Updated by intrigeri over 3 years ago

Hi! Commit 695ac501582cc771440368d837574bf76d950942 in puppet-tails introduces a buggy regexp that clearly doesn't do what you believe it does (in practice it makes the validation much more lax than intended). I think you rather mean something like '^\w+(?:\w|\.)+$'@ (untested). Please do test such regexps when unsure, both with strings that are supposed to match, and with strings that are not supposed to match, especially when it's about validating stuff :)

#21 Updated by bertagaz over 3 years ago

intrigeri wrote:

Hi! Commit 695ac501582cc771440368d837574bf76d950942 in puppet-tails introduces a buggy regexp that clearly doesn't do what you believe it does (in practice it makes the validation much more lax than intended). I think you rather mean something like '^\w+(?:\w|\.)+$'@ (untested). Please do test such regexps when unsure, both with strings that are supposed to match, and with strings that are not supposed to match, especially when it's about validating stuff :)

I did test it, and it did seem to work, even with invalid input. Your proposal doesn't seem to work to me otoh: it seems you misplaced/forgot the '@'. I've pushed another version, which is maybe a bit more strict.

#22 Updated by intrigeri over 3 years ago

intrigeri wrote:

Hi! Commit 695ac501582cc771440368d837574bf76d950942 in puppet-tails introduces a buggy regexp that clearly doesn't do what you believe it does (in practice it makes the validation much more lax than intended). I think you rather mean something like '^\w+(?:\w|\.)+$'@ (untested). Please do test such regexps when unsure, both with strings that are supposed to match, and with strings that are not supposed to match, especially when it's about validating stuff :)

I did test it, and it did seem to work, even with invalid input.

OK, so you were unlucky and did not hit any of the false negatives (cases when it would validate buggy input). FYI it would have (erroneously) validated for example that string:

abc@a%_»$PATH

... which, I believe, was not intended :)

Your proposal doesn't seem to work to me otoh: it seems you misplaced/forgot the '@'.

Redmine mangled my comment (or rather, I got it wrong how to include the "@" char in inline code that's formatted with "@"). Sorry about the confusion! In such cases, when what you see is obviously wrong, you may want to pretend you're going to edit my comment (the pen icon next to it), so you can see what exactly I have typed :) And then of course you can cancel the edit.

I've pushed another version, which is maybe a bit more strict.

I think you misunderstand a little bit how a character set (in square brackets) works. E.g. those strings are valid according to the current validation regexp:

a|@b
a@b|

... which feels quite wrong.
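To illustrate the character-set point: inside square brackets, "|" is a literal character rather than an alternation operator. The sketch below uses Python's re module and a hypothetical pattern, LAX, invented for this illustration (it is not the regexp from the puppet-tails commit), to show how a pattern written that way ends up accepting the two strings above.

```python
import re

# Hypothetical pattern, invented for this illustration only. The intent is
# presumably "word characters or dashes", but inside [...] the "|" is an
# ordinary character, so it silently becomes part of the allowed set.
LAX = re.compile(r"^[\w|-]+@[\w|.|-]+$")


def is_valid_lax(value: str) -> bool:
    """Return True if value matches the (deliberately too lax) pattern."""
    return LAX.search(value) is not None


for candidate in ("a|@b", "a@b|", "user@host.example.org"):
    print(candidate, is_valid_lax(candidate))
```

All three candidates are accepted by this pattern, including the two that contain a stray "|".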

The regexp I've proposed has none of those problems. But apparently you now want (or need) to allow dashes in both the left-hand and right-hand side of the "@", so my proposal is outdated. Here's an updated (and simpler) one:

^[\w-]+@[\w.-]+$

I suggest you either take that one as-is, or learn some basics about regexps, before submitting another proposal you don't fully understand the meaning of. Fair enough?
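As a sanity check of the kind suggested earlier in this thread, the proposed regexp can be exercised against strings that should match and strings that should not. This is a minimal sketch in Python; the real validation runs in Puppet with Ruby regexps, whose "\w" and anchor semantics differ somewhat from Python's, so the results are indicative only. The helper name is_valid and the sample strings not quoted in this ticket are made up for the example.

```python
import re

# Regexp proposed in this ticket, copied verbatim.
PROPOSED = re.compile(r"^[\w-]+@[\w.-]+$")


def is_valid(value: str) -> bool:
    """Return True if value matches the proposed validation regexp."""
    return PROPOSED.search(value) is not None


# Made-up examples that should be accepted.
should_match = ["apt@host1", "nightly-stable@ecours.tails.boum.org"]

# Strings quoted in this ticket as wrongly accepted by earlier versions,
# plus a few made-up malformed ones.
should_not_match = ["abc@a%_»$PATH", "a|@b", "a@b|", "@host", "name@", "a b@host"]

for value in should_match:
    assert is_valid(value), value
for value in should_not_match:
    assert not is_valid(value), value
print("all checks passed")
```

Keeping both lists next to the pattern makes it cheap to re-run the check whenever the regexp changes.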

#23 Updated by bertagaz over 3 years ago

intrigeri wrote:

I suggest you either take that one as-is, or learn some basics about regexps, before submitting another proposal you don't fully understand the meaning of. Fair enough?

Yep, thx for the lengthy explanation. I've fixed that with your own regexp in commit puppet-tails:9007871

#24 Updated by bertagaz over 3 years ago

  • Assignee changed from bertagaz to intrigeri
  • % Done changed from 30 to 70
  • QA Check set to Ready for QA

bertagaz wrote:

OK, I've deployed all these checks. Most of them are HTTP checks and use the same tails::monitoring::service::http manifest.

#25 Updated by intrigeri over 3 years ago

  • Assignee changed from intrigeri to bertagaz
  • QA Check changed from Ready for QA to Dev Needed

OK, I've deployed all these checks. Most of them are HTTP checks and use the same tails::monitoring::service::http manifest.

Thanks. I've had a look from the web interface PoV (I didn't look at the code this time, let's move on and I'll review it all at the same time when you're done with the other checks).

https://icingaweb2.tails.boum.org/monitoring/service/history?host=ecours.tails.boum.org&service=nightly_stable and https://icingaweb2.tails.boum.org/monitoring/service/history?host=ecours.tails.boum.org&service=nightly_devel result in 404 errors and socket timeouts too often; I realize that this very ticket is not about fixing problems identified by monitoring, and the majority of these problems are probably on the monitored systems' side (and not problems with the monitoring system); but let's not close another QA work chapter by leaving it in a shape in which we ignore false positives that are too numerous ⇒ please file a subtask of #9484 to investigate and fix these flaky checks (and probably their root cause, most of the time; I can help with some of those, we can share them as part of sysadmin shifts).

Same for:

Once these robustness issues are well tracked in a way that explicitly blocks #9484, please close this ticket as resolved. Congrats!

#26 Updated by bertagaz over 3 years ago

  • Status changed from In Progress to Resolved
  • Assignee deleted (bertagaz)
  • % Done changed from 70 to 100

intrigeri wrote:

https://icingaweb2.tails.boum.org/monitoring/service/history?host=ecours.tails.boum.org&service=nightly_stable and https://icingaweb2.tails.boum.org/monitoring/service/history?host=ecours.tails.boum.org&service=nightly_devel result in 404 errors and socket timeouts too often; I realize that this very ticket is not about fixing problems identified by monitoring, and the majority of these problems are probably on the monitored systems' side (and not problems with the monitoring system); but let's not close another QA work chapter by leaving it in a shape in which we ignore false positives that are too numerous ⇒ please file a subtask of #9484 to investigate and fix these flaky checks (and probably their root cause, most of the time; I can help with some of those, we can share them as part of sysadmin shifts).

Same for:

Once these robustness issues are well tracked in a way that explicitly blocks #9484, please close this ticket as resolved. Congrats!

I've made #11358 block #9484 as a way to track this. It might be that running our various HTTP checks concurrently and often (at least every minute for each, every 30s if the check is flapping) was a bit too intense, and being less aggressive there might help. If not, it will deserve another ticket.
