Bug #16126

no alerts when icinga2 is down

Added by groente 10 months ago. Updated 4 months ago.

Status: Confirmed
Priority: Elevated
Assignee: -
Category: -
Target version: -
Start date: 11/14/2018
Due date:
% Done: 0%
Feature Branch:
Type of work: Sysadmin
Blueprint:
Starter:
Affected tool:

Description

icinga2 on monitor was down for several days without us noticing. the web frontend showed no indication of the backend being down, and there seem to be no other checks outside of icinga to keep an eye on whether our monitoring is still actually functioning.

let's set an hourly cron for a simple script that attempts to connect to monitor on port 5665 and mails tails-sysadmins on failure. i'd propose running this script on ecours, what do you think?
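
A minimal sketch of what such a script could look like, in Python. Only port 5665 comes from this ticket; the hostname, the mail addresses, and the SMTP relay on localhost are placeholders, and `check_and_alert` is a hypothetical name.

```python
import smtplib
import socket
from email.message import EmailMessage

# Placeholders, not values from this ticket (only the port is).
MONITOR_HOST = "monitor.example.org"
MONITOR_PORT = 5665
ALERT_FROM = "watchdog@example.org"
ALERT_TO = "sysadmins@example.org"


def port_is_open(host, port, timeout=10.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


def build_alert(host, port):
    """Build the mail sent when the port cannot be reached."""
    msg = EmailMessage()
    msg["Subject"] = f"icinga2 unreachable on {host}:{port}"
    msg["From"] = ALERT_FROM
    msg["To"] = ALERT_TO
    msg.set_content(f"Could not open a TCP connection to {host}:{port}.\n")
    return msg


def send_mail(msg):
    """Hand the message to an SMTP server assumed to listen on localhost."""
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)


def check_and_alert(host=MONITOR_HOST, port=MONITOR_PORT, send=send_mail):
    """Probe the port and mail an alert on failure. Return True if reachable."""
    if port_is_open(host, port):
        return True
    send(build_alert(host, port))
    return False
```

An hourly cron entry would run a small wrapper that calls `check_and_alert()`. Note that this only shows the port accepts TCP connections, not that icinga2 is actually processing check results.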

History

#1 Updated by intrigeri 10 months ago

> there seem to be no other checks outside of icinga to keep an eye on whether our monitoring is still actually functioning.

Assuming that icinga2.service works decently well, the systemd check for monitor.lizard should tell us if icinga2.service is not in good shape on that host. Now of course, if that service is down, Icinga2 won't report about itself being down. But I would expect one could teach the central monitoring aggregator (Icinga2 on ecours) to treat "no recent results from check X" as a check failure. Had we had that in place, we would have seen that something was wrong.
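
To make the "no recent results means failure" idea concrete, here is a small Python sketch of the decision rule only. It does not show how Icinga2 itself would be configured for this; `freshness_state` and the 30-minute threshold are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone


def freshness_state(last_result_at, max_age, now=None):
    """Treat a missing or too-old check result as a failure.

    last_result_at: timezone-aware datetime of the last result, or None
    max_age: timedelta after which a result counts as stale
    """
    if now is None:
        now = datetime.now(timezone.utc)
    if last_result_at is None or now - last_result_at > max_age:
        return "CRITICAL"
    return "OK"


# Example: a result from two hours ago with a 30-minute threshold is stale.
now = datetime.now(timezone.utc)
state = freshness_state(now - timedelta(hours=2), timedelta(minutes=30), now=now)
```

The point of the rule is that silence from the monitored side is itself reported, instead of the last known good state being shown indefinitely.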

> let's set an hourly cron for a simple script that attempts to connect to monitor on port 5665 and mails tails-sysadmins on failure. i'd propose running this script on ecours, what do you think?

I'm all for adding an external check for that service. Any reason we can't do this with an icinga check (running on ecours) instead of a cronjob?
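
For comparison, a rough Python sketch of the same probe written as a check plugin. It follows the usual monitoring-plugin convention of one status line on stdout plus an exit code (0 OK, 2 CRITICAL, 3 UNKNOWN). The function names are hypothetical, and the stock `check_tcp` plugin would likely cover this without any custom code.

```python
import socket

# Exit codes used by Nagios-compatible check plugins.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def probe(host, port, timeout=10.0):
    """Return (exit_code, status_line) for a TCP connection attempt."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"TCP OK - {host}:{port} accepts connections"
    except OSError as exc:
        return CRITICAL, f"TCP CRITICAL - {host}:{port} unreachable: {exc}"


def main(argv):
    """Expect HOST and PORT as arguments; print one status line, return the code."""
    if len(argv) != 2 or not argv[1].isdigit():
        print("TCP UNKNOWN - usage: HOST PORT")
        return UNKNOWN
    code, line = probe(argv[0], int(argv[1]))
    print(line)
    return code
```

To run it as a plugin, the file would end with `sys.exit(main(sys.argv[1:]))` after importing `sys`. Compared with the cronjob, alerting then goes through the notification setup Icinga already has on ecours.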

#2 Updated by intrigeri 4 months ago

  • Assignee deleted (bertagaz)
  • QA Check deleted (Info Needed)
