Click the plus symbol on the page to view the alert creation form. Then choose the type of action to perform (Slack, PagerDuty, webhook request, or email), as shown here:
Slack and PagerDuty integrations redirect you to the corresponding site to complete the integration, while webhook and email actions require a value, in this example the email(s) you’d like to notify:
Choose whether the alert applies to every check or to a single check.
Select a metric to alert on. If your only concern is that the website, API, or application is up, select “Downtime”. You may also wish to report on unresponsive endpoints using “Total Response Time”.
Enter the number of minutes of downtime within the time window (15 minutes here) that is acceptable before an alert is triggered. The right value depends greatly on the nature of your service.
For less obtrusive actions such as Slack or email, it may be fine to trigger an alert when downtime is above a minute or two. Triggering a PagerDuty incident for an intermittent problem, however, is less desirable.
Click save and you’re good to go!
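The threshold in the steps above can be read as a simple rule: count the minutes of downtime inside the time window and alert when the count exceeds the acceptable value. The following is a minimal Python sketch of that rule, not Apex Ping's implementation. The function names and the per-minute boolean history are assumptions made for illustration.

```python
from typing import Sequence


def downtime_in_window(minute_is_down: Sequence[bool], window_minutes: int) -> int:
    """Count the down minutes among the most recent `window_minutes` samples.

    `minute_is_down` is a hypothetical history with one entry per minute,
    oldest first, where True means the check was down for that minute.
    """
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    recent = minute_is_down[-window_minutes:]
    return sum(recent)


def should_alert(
    minute_is_down: Sequence[bool], window_minutes: int, threshold_minutes: int
) -> bool:
    """Alert only when downtime in the window is *above* the threshold."""
    return downtime_in_window(minute_is_down, window_minutes) > threshold_minutes


# 12 healthy minutes followed by 3 down minutes, evaluated over a 15 minute window.
history = [False] * 12 + [True] * 3
print(should_alert(history, window_minutes=15, threshold_minutes=2))  # True
print(should_alert(history, window_minutes=15, threshold_minutes=5))  # False
```

With a threshold of 2 the three down minutes trigger the alert, while a threshold of 5 tolerates them, which is the trade-off to weigh when picking a value for your service.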
Apex Ping supports a variety of metrics to alert on. You’ll likely want a “Downtime” alert and a “Total Response Time” alert to ensure both availability and responsiveness. Some alerts, such as “Total Response Time”, allow you to alert against stats such as min, max, average, or percentiles for additional control.
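To illustrate why the choice of stat matters, the sketch below computes min, max, average, and percentiles over a made-up set of response times. It is only an illustration: the sample values are invented, and the nearest-rank percentile method is an assumption, since the exact method Apex Ping uses is not stated here.

```python
import math
from statistics import mean
from typing import Sequence


def percentile(values: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of
    samples at or below it. `p` is an integer from 1 to 100."""
    if not values:
        raise ValueError("values must not be empty")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical total response times in milliseconds, with one slow outlier.
samples_ms = [120, 135, 150, 160, 180, 210, 240, 300, 450, 2400]

print("min:", min(samples_ms))
print("max:", max(samples_ms))
print("avg:", mean(samples_ms))
print("p50:", percentile(samples_ms, 50))
print("p95:", percentile(samples_ms, 95))
```

In this sample the single slow request dominates the max and the 95th percentile and pulls the average well above the median, so an alert on the average or on a high percentile reacts to occasional slow requests that a median-based alert would ignore.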
A typical setup has a “Downtime” alert of “above 2 minutes” that fires a Slack or email alert, with a second, less sensitive alert of “above 5 minutes in the last 30 minutes” that notifies the on-call team member via PagerDuty.
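The two-tier setup described above can be sketched as a pair of rules evaluated against the same downtime history. This is a hedged illustration rather than Apex Ping's configuration format: the `AlertRule` type, the action names, and the per-minute history are hypothetical, and the first rule assumes the default 5 minute window.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class AlertRule:
    action: str             # hypothetical label for where the alert is sent
    threshold_minutes: int  # alert when downtime is above this value
    window_minutes: int     # size of the time window being evaluated


RULES = [
    AlertRule(action="slack", threshold_minutes=2, window_minutes=5),
    AlertRule(action="pagerduty", threshold_minutes=5, window_minutes=30),
]


def triggered_actions(
    minute_is_down: Sequence[bool], rules: Sequence[AlertRule]
) -> List[str]:
    """Return the actions whose downtime threshold is exceeded.

    `minute_is_down` holds one entry per minute, oldest first.
    """
    fired = []
    for rule in rules:
        recent = minute_is_down[-rule.window_minutes:]
        if sum(recent) > rule.threshold_minutes:
            fired.append(rule.action)
    return fired


brief_outage = [False] * 27 + [True] * 3      # 3 down minutes
sustained_outage = [False] * 22 + [True] * 8  # 8 down minutes

print(triggered_actions(brief_outage, RULES))      # ['slack']
print(triggered_actions(sustained_outage, RULES))  # ['slack', 'pagerduty']
```

A brief outage reaches only the low-friction channel, while a sustained one also pages the person on call.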
If you’re receiving too many notifications, you may want to consider increasing the time window to 15, 30, or 60 minutes instead of the default of 5.
For example, changing an alert from “downtime above 1 minute in the last 5 minutes” to “downtime above 5 minutes in the last 15 minutes” decreases the likelihood of a triggered alert, and notifies you of changes once every 60 minutes.
Apex Ping currently confirms downtime in three additional locations to ensure that the failure is not intermittent. This helps reduce false positives, but note that a single “minute” of downtime is therefore effectively equivalent to four HTTP requests failing: the original request plus the three confirmations.
Requests that take longer than 10 seconds time out and are treated as errors, which contribute to downtime.
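The confirmation and timeout behavior can be sketched as follows. This is an illustrative model only, built on the assumption that a minute counts as down when the original request and all three confirmation requests fail. The `CheckResult` type, its fields, and the location names are hypothetical and do not describe Apex Ping's internals.

```python
from dataclasses import dataclass
from typing import Sequence

TIMEOUT_SECONDS = 10.0      # requests slower than this are treated as errors
CONFIRMATION_LOCATIONS = 3  # additional locations used to confirm downtime


@dataclass(frozen=True)
class CheckResult:
    location: str           # hypothetical location label
    elapsed_seconds: float  # how long the request took
    errored: bool           # True if the request failed outright


def is_failure(result: CheckResult) -> bool:
    """A request fails if it errored or took longer than the timeout."""
    return result.errored or result.elapsed_seconds > TIMEOUT_SECONDS


def minute_is_down(primary: CheckResult, confirmations: Sequence[CheckResult]) -> bool:
    """Count the minute as down only if all four requests fail (assumption)."""
    if len(confirmations) != CONFIRMATION_LOCATIONS:
        raise ValueError("expected exactly three confirmation results")
    if not is_failure(primary):
        return False
    return all(is_failure(c) for c in confirmations)


primary = CheckResult("location-a", elapsed_seconds=12.4, errored=False)  # timed out
confirmations = [
    CheckResult("location-b", elapsed_seconds=0.3, errored=True),
    CheckResult("location-c", elapsed_seconds=11.0, errored=False),
    CheckResult("location-d", elapsed_seconds=0.2, errored=False),  # healthy
]
print(minute_is_down(primary, confirmations))  # False
```

Because one confirmation location still gets a healthy response, the failure is treated as intermittent and the minute is not counted as downtime.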
Alert emails are delivered from the firstname.lastname@example.org address. Subjects are formatted as follows, which may be useful for integration with services such as OpsGenie: