How to Create PagerDuty Alerts
Resoto constantly monitors your infrastructure and can alert you to any issues it detects. PagerDuty is the de facto standard for escalating alerts. In this guide, we will configure Resoto to send alerts to PagerDuty using a custom command.
You will need a valid routing key for your PagerDuty account.
Open the relevant service in PagerDuty and click Integrations. Then, click the Add new integration button.
Expand Events API V2 and copy the integration key that appears.
Note: We will refer to this key as the "routing key" for the remainder of these instructions.
Next, open the Resoto command configuration for editing:
> config edit resoto.core.commands
Add the routing key you copied earlier as the default value of the routing_key parameter in the pagerduty section. This allows you to execute the pagerduty command without specifying the routing key each time.
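The pagerduty command has the following parameters, all of which are required. A parameter that has a default value (for example, routing_key after the configuration change above) does not need to be passed on the command line.
- routing_key: Events API V2 integration key
- dedup_key: String identifier that PagerDuty will use to ensure that only a single alert is active at a time
- Alert severity (critical, error, warning, or info)
- Location of the affected system (preferably a hostname or FQDN)
- Alert action (trigger, acknowledge, or resolve)
- Name of the monitoring client submitting the event
- URL to the monitoring client
- PagerDuty events API URL endpoint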
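To see how these parameters fit together, the following Python sketch assembles the JSON body of a PagerDuty Events API V2 "trigger" event, which is POSTed to PagerDuty's public endpoint https://events.pagerduty.com/v2/enqueue. It follows PagerDuty's published Events API V2 format; it is not Resoto's own implementation of the pagerduty command, and the helper name build_trigger_event and the source value are hypothetical.

```python
import json

# Severity values accepted by the PagerDuty Events API V2.
VALID_SEVERITIES = {"critical", "error", "warning", "info"}


def build_trigger_event(routing_key, summary, dedup_key, source, severity,
                        client=None, client_url=None):
    """Hypothetical helper: assemble an Events API V2 trigger body as a dict."""
    if severity not in VALID_SEVERITIES:
        raise ValueError(f"invalid severity: {severity!r}")
    event = {
        "routing_key": routing_key,  # integration key of the PagerDuty service
        "event_action": "trigger",   # open (or add to) an alert
        "dedup_key": dedup_key,      # at most one active alert per key
        "payload": {
            "summary": summary,
            "source": source,        # affected system, e.g. a hostname or FQDN
            "severity": severity,
        },
    }
    # client and client_url are optional top-level fields.
    if client is not None:
        event["client"] = client
    if client_url is not None:
        event["client_url"] = client_url
    return event


event = build_trigger_event(
    routing_key="YOUR_ROUTING_KEY",
    summary="Pods are restarting too often!",
    dedup_key="Resoto::PodRestartedTooOften",
    source="k8s-prod",  # hypothetical source value
    severity="warning",
)
print(json.dumps(event, indent=2))
```

Keeping summary, source, and severity inside "payload" while routing_key, event_action, and dedup_key sit at the top level mirrors the Events API V2 layout.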
Define the search criteria that will trigger an alert. For example, let's say we want to send alerts whenever we find a Kubernetes Pod updated in the last hour with a restart count greater than 20:
> search is(kubernetes_pod) and pod_status.container_statuses[*].restart_count > 20 and last_update<1h
kind=kubernetes_pod, name=db-operator-mcd4g, restart_count=, age=2mo5d, last_update=23m, cloud=k8s, account=prod, region=kube-system
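The [*] in the search above selects every element of the container_statuses array, so a pod matches if any of its containers has more than 20 restarts. The following Python sketch mimics that filter over hypothetical, simplified pod records (plain dicts with made-up names); it illustrates the query semantics under that assumption and is not how Resoto actually evaluates searches.

```python
from datetime import datetime, timedelta, timezone


def matches_alert(pod, now, max_restarts=20, window=timedelta(hours=1)):
    """Simplified stand-in for:
    is(kubernetes_pod) and pod_status.container_statuses[*].restart_count > 20
    and last_update<1h
    """
    if pod.get("kind") != "kubernetes_pod":
        return False
    statuses = pod.get("pod_status", {}).get("container_statuses", [])
    # [*]: true if ANY container exceeds the restart threshold.
    restarted = any(s.get("restart_count", 0) > max_restarts for s in statuses)
    # last_update<1h: the resource was updated less than an hour ago.
    recent = now - pod["last_update"] < window
    return restarted and recent


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
pods = [  # hypothetical sample data
    {"kind": "kubernetes_pod", "name": "pod-a",
     "pod_status": {"container_statuses": [{"restart_count": 3}, {"restart_count": 42}]},
     "last_update": now - timedelta(minutes=23)},
    {"kind": "kubernetes_pod", "name": "pod-b",
     "pod_status": {"container_statuses": [{"restart_count": 42}]},
     "last_update": now - timedelta(hours=5)},
]
print([p["name"] for p in pods if matches_alert(p, now)])  # ['pod-a']
```

pod-b has enough restarts but was last updated five hours ago, so the last_update<1h clause excludes it.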
Now that we've defined the alert trigger, we simply pipe the result of the search query to the pagerduty command, replacing the example summary and dedup_key values with your own:
> search is(kubernetes_pod) and pod_status.container_statuses[*].restart_count > 20 and last_update<1h | pagerduty summary="Pods are restarting too often!" dedup_key="Resoto::PodRestartedTooOften"
If the defined condition is currently true, you should see a new alert in PagerDuty.
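An alert opened this way stays active until it is resolved. In the Events API V2, sending an event with event_action set to "resolve" and the same dedup_key closes it. The Python sketch below only builds such a request with the standard library's urllib.request.Request and does not send it; the helper name build_resolve_request is hypothetical.

```python
import json
import urllib.request

# Public PagerDuty Events API V2 endpoint.
EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"


def build_resolve_request(routing_key, dedup_key, url=EVENTS_URL):
    """Hypothetical helper: build (but do not send) a resolve request."""
    body = {
        "routing_key": routing_key,
        "event_action": "resolve",
        # Must match the dedup_key used when the alert was triggered.
        "dedup_key": dedup_key,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_resolve_request("YOUR_ROUTING_KEY", "Resoto::PodRestartedTooOften")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```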
To check the condition automatically, add a job that runs the command after every collect cycle (the post_collect event):
> jobs add --id alert_on_pod_failure --wait-for-event post_collect 'search is(kubernetes_pod) and pod_status.container_statuses[*].restart_count > 20 and last_update<1h | pagerduty summary="Pods are restarting too often!" dedup_key="Resoto::PodRestartedTooOften"'
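Because a job waiting for post_collect runs after every collect cycle, the same alert can be sent many times while the condition stays true. The fixed dedup_key is what keeps this from opening a new PagerDuty alert on each run. The following Python sketch is a deliberately simplified model of that behavior (a trigger with an already-active key adds nothing new, a resolve clears the key, and acknowledge is ignored); it is an illustration only, not PagerDuty's implementation, and the class name AlertModel is hypothetical.

```python
class AlertModel:
    """Simplified model of dedup_key handling in the Events API V2."""

    def __init__(self):
        self.open_alerts = set()  # dedup keys that currently have an active alert
        self.created = 0          # number of distinct alerts opened so far

    def send(self, event_action, dedup_key):
        if event_action == "trigger":
            # A trigger for an already-active key does not open a new alert.
            if dedup_key not in self.open_alerts:
                self.open_alerts.add(dedup_key)
                self.created += 1
        elif event_action == "resolve":
            self.open_alerts.discard(dedup_key)


model = AlertModel()
key = "Resoto::PodRestartedTooOften"
# The job fires after three consecutive collect cycles with the same key.
for _ in range(3):
    model.send("trigger", key)
print(model.created)  # 1
```

If each run used a different dedup_key (for example, one that includes a timestamp), this model would count one alert per collect cycle instead.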