Configuring Monitoring Correctly

We get a lot of questions about how to set up Netumo correctly, and our teams are there to advise our customers on the best monitoring setup for their different sites.

Ultimately, you find out whether you set up a monitoring solution properly during an unplanned outage. The setup was right if:

  1. It was the first to notify you about the outage.
  2. It grabbed your attention.

Let’s go through some tips to ensure the above happens.

Create clever monitors

You can define lots of monitors, but think about what you are monitoring and how you want to monitor it. Think holistically too: you might have lots of sites on the same provider/host.

Ensure that you monitor some unique content on the page. It’s not just about frequency. A monitor that runs every 5 minutes and verifies the content returned can be more effective than a simple ping monitor checking every minute.
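To illustrate why a content check catches more than a ping, here is a minimal Python sketch. It is not Netumo's implementation or API; the function names, the marker text, and the 10-second timeout are assumptions made for illustration. The check passes only when the response succeeds and the page contains a piece of text unique to the site.

```python
from urllib.request import urlopen


def content_check(status: int, body: str, marker: str) -> bool:
    """Return True only if the response succeeded AND contains the marker text."""
    return 200 <= status < 300 and marker in body


def check_url(url: str, marker: str, timeout: float = 10.0) -> bool:
    """Fetch the page and apply the content check.

    Most network failures (DNS errors, refused connections, timeouts,
    HTTP error statuses) surface as OSError and are treated as "down".
    """
    try:
        with urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return content_check(response.status, body, marker)
    except OSError:
        return False
```

A server that answers with a default placeholder page would pass a ping or a status-only check, but `content_check(200, placeholder_html, "Example Shop")` would fail because the marker is missing.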

For example, if you have multiple websites on the same host, then they don’t all need to be monitored every minute. Simply set one to 1-minute interval checks and the rest to 5 or 10 minutes. The one checked every minute will alert you if the host goes down, while the rest will monitor the individual sites.
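The interval plan described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a Netumo feature: the function name, the host and site names, and the default of 1 and 5 minutes are assumptions.

```python
def assign_intervals(sites_by_host, fast=1, slow=5):
    """Map each site to a check interval in minutes.

    The first site listed for each host gets the fast interval and acts
    as the early warning for the host; the rest get the slow interval.
    """
    intervals = {}
    for sites in sites_by_host.values():
        for index, site in enumerate(sites):
            intervals[site] = fast if index == 0 else slow
    return intervals


# Hypothetical inventory: three sites share host-a, one site is on host-b.
plan = assign_intervals({
    "host-a": ["a.example", "b.example", "c.example"],
    "host-b": ["d.example"],
})
```

Here `plan` gives `a.example` and `d.example` a 1-minute interval and the remaining sites a 5-minute interval.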

Monitor Everything

Create steps in your processes so that you never miss adding a domain/website to be monitored. We’ve been there ourselves: a new portal or site is deployed and we forget to set up monitoring. Having clearly documented going-to-production playbooks ensures that you never forget this.
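One way to back up a playbook step is an automated comparison between what is deployed and what is monitored. The sketch below assumes you can export both lists yourself (for example from a deployment inventory and from your monitoring account); the function name and the domain names are hypothetical.

```python
def unmonitored(deployed, monitored):
    """Return the deployed domains that have no monitor, sorted for stable output."""
    return sorted(set(deployed) - set(monitored))


# Hypothetical example: the new portal was deployed but never added to monitoring.
missing = unmonitored(
    deployed=["www.example.com", "portal.example.com", "api.example.com"],
    monitored=["www.example.com", "api.example.com"],
)
```

Running a check like this as part of a release checklist would flag `portal.example.com` before anyone has to remember it.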

Notify the right people

Notifications depend on how your teams are set up. If you have a 24/7 NOC/DevOps team, things are easier because you always have somebody on standby at the office, on the alert for such issues. If you are a small company with no round-the-clock coverage, you might require a different approach.

Use channels such as SMS, Twitter, Telegram, Slack, and Microsoft Teams. Ensure that all team members have them set up on their phones. This ensures that the right people are notified.

Notify in Advance

After each outage, one should ask: could this have been avoided? Could I have been advised in advance?

Pre-outage checks exist, such as domain and certificate expiry checks, but don’t wait until the last 3 to 5 days to alert your teams. What happens if the alert is triggered on a Friday afternoon or, worse, during a long weekend? Make sure that there is ample time for your teams to complete the job.
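The Friday-afternoon problem becomes clearer when the lead time is counted in working days rather than calendar days. The Python sketch below is an illustration only: the function names and the threshold of 10 working days are assumptions, and public holidays are not taken into account.

```python
from datetime import date, timedelta


def working_days_left(today: date, expiry: date) -> int:
    """Count Monday-to-Friday days from today up to, but not including, expiry."""
    days = 0
    current = today
    while current < expiry:
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            days += 1
        current += timedelta(days=1)
    return days


def should_alert(today: date, expiry: date, min_working_days: int = 10) -> bool:
    """Alert once the remaining working days drop to the threshold or below."""
    return working_days_left(today, expiry) <= min_working_days
```

For a certificate expiring on Wednesday 6 March 2024, an alert raised on Friday 1 March looks like 5 days of notice, but `working_days_left(date(2024, 3, 1), date(2024, 3, 6))` counts only 3 working days, one of which is the Friday itself.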
