Handling alerts

This topic is part of the manual Operations for version Current of Genesys Multicloud CX Private Edition.

Introduction

Alerts notify you when certain metrics exceed specified thresholds. In some services, alerting is enabled by default; in others, you must enable alerting when you deploy the service. See the respective service guides (listed here) for details about service-specific support for alerting.

Alert rules

By default, most services define alerts for certain key operational parameters. The alerts are PrometheusRule objects that are defined in a YAML file. The metrics collected from the applicable service are evaluated based on the expression specified in the rule. An alert is triggered if the value of the expression is true.

Private edition does not support custom alerts triggered by rules you define yourself. However, some services (for example, Designer) let you customize the predefined alerts by changing, in the values.yaml file, the parameter values that trigger each alert. See the respective service-level guides for information about the limited customization each service might support.

Prometheus / Alertmanager

Enable ServiceMonitor or PodMonitor to scrape metrics from the cluster. To import custom alerts or notification configurations, follow these steps.

Alerting Rules

This section describes how to create alert rules and import custom rules.

Create alert rules. These rules trigger alerts based on the values of the metrics they evaluate.

apiVersion: "monitoring.coreos.com/v1"
kind: PrometheusRule
metadata:
  name: <name>-alertrules
  labels:
    genesysengage/monitoring: prometheus
    service: <service>
    servicename: <servicename>
    tenant: <tenant>  # for example: shared
spec:
  groups:
  - name: <name>-alert
    rules:
    - alert: <alert-name>
      expr: <expression>
      for: <time>  # for example: 5m
      labels:
        severity: <severity>  # for example: critical
        service: <service>
        servicename: <servicename>
      annotations:
        summary: "<description>"

Import the custom rule.

kubectl apply -f <rules.yaml> -n monitoring
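The two steps above can also be scripted. The following is a minimal sketch, not a Genesys-provided tool: it fills in the template with hypothetical values (the `example` names, the `example_errors_total` metric, and the threshold are all assumptions for illustration) and serializes the result as JSON, which kubectl accepts as a manifest format alongside YAML.

```python
import json


def build_prometheus_rule(name, service, servicename, tenant,
                          alert, expr, duration, severity, summary):
    """Return a PrometheusRule manifest (as a dict) that follows the template above."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {
            "name": f"{name}-alertrules",
            "labels": {
                "genesysengage/monitoring": "prometheus",
                "service": service,
                "servicename": servicename,
                "tenant": tenant,
            },
        },
        "spec": {
            "groups": [
                {
                    "name": f"{name}-alert",
                    "rules": [
                        {
                            "alert": alert,
                            "expr": expr,
                            "for": duration,
                            "labels": {
                                "severity": severity,
                                "service": service,
                                "servicename": servicename,
                            },
                            "annotations": {"summary": summary},
                        }
                    ],
                }
            ]
        },
    }


# All values below are hypothetical placeholders, not real Genesys metrics or services.
rule = build_prometheus_rule(
    name="example",
    service="example-service",
    servicename="example-servicename",
    tenant="shared",
    alert="ExampleHighErrorRate",
    expr="rate(example_errors_total[5m]) > 0.05",
    duration="5m",
    severity="critical",
    summary="Example service is logging more than 0.05 errors per second",
)

manifest_json = json.dumps(rule, indent=2)
print(manifest_json)
```

Saving the printed output to a file and passing that file to the kubectl apply command shown above should have the same effect as applying a hand-written YAML file.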

Customizing Alertmanager configuration for notifications

Alertmanager sends notifications to a notification provider, such as email or a webhook (for example, PagerDuty), when an alert is triggered.

To add notification configuration, edit alertmanager.yaml using the following steps:

Load the configuration map into a file using the following command.

kubectl get configmap prometheus-alertmanager --namespace=monitoring -o yaml > alertmanager.yaml

Add the configuration in alertmanager.yaml.

global:
  resolve_timeout: 5m
route:
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: default
  routes:
  - match:
      alertname: Watchdog
    repeat_interval: 5m
    receiver: watchdog
  - match:
      service: <your_service>
    routes:
    - match:
        <your_matching_rules>
      receiver: <receiver>
receivers:
- name: default
- name: watchdog
- name: <receiver>
  <receiver_configuration>

Save the changes in the file and replace the configuration map.

kubectl replace configmaps prometheus-alertmanager --namespace=monitoring -f alertmanager.yaml

For more details about configuring receivers for alert notification and how the receiver types are created/configured, refer to Configuring alert notifications.
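To see how the route tree in the configuration above selects a receiver, the following simplified sketch may help. It is an illustration only, not Alertmanager's implementation: it handles only equality `match` blocks, takes the first matching child route, and lets a child route inherit its parent's receiver when it does not set one. The service name `example-service`, the `severity` matching rule, and the receiver `example-pager` are hypothetical stand-ins for the placeholders in the configuration.

```python
def resolve_receiver(route, labels, inherited=None):
    """Walk the route tree and return the receiver name for an alert's labels."""
    # A route without its own receiver inherits the parent's receiver.
    receiver = route.get("receiver", inherited)
    for child in route.get("routes", []):
        match = child.get("match", {})
        # A child route matches when every label in its match block is equal.
        if all(labels.get(key) == value for key, value in match.items()):
            return resolve_receiver(child, labels, receiver)
    # No child matched, so this route's receiver handles the alert.
    return receiver


# Mirrors the structure of the configuration above, with hypothetical values.
ROUTE = {
    "receiver": "default",
    "routes": [
        {"match": {"alertname": "Watchdog"}, "receiver": "watchdog"},
        {
            "match": {"service": "example-service"},
            "routes": [
                {"match": {"severity": "critical"}, "receiver": "example-pager"},
            ],
        },
    ],
}

watchdog = resolve_receiver(ROUTE, {"alertname": "Watchdog"})
critical = resolve_receiver(
    ROUTE, {"service": "example-service", "severity": "critical"}
)
warning = resolve_receiver(
    ROUTE, {"service": "example-service", "severity": "warning"}
)
print(watchdog, critical, warning)
```

In this sketch, a warning-level alert for the example service falls back to the default receiver, because the service-level route defines no receiver of its own and only the nested critical rule names one.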

GKE platform

Google Cloud operations suite – Alerting

Google Cloud operations suite is backed by Stackdriver, which ingests and processes alerts based on predefined policy configuration.

Stackdriver uses the Google Cloud Monitoring API to manage metric and alert policies within the operations suite.

Here are some key features provided by Google Cloud operations suite:

Google Cloud API supports over 1,500 Cloud Monitoring metrics.
Alert policies are configured as resource objects in the Cloud Monitoring API.
Unlike with Alertmanager, policies are defined directly through the GCP Cloud Monitoring API using REST or gRPC requests. There are no custom resource objects in Kubernetes for alert policies in GKE.
Alert policies let you define specific conditions, and the actions to take, in response to key metrics and other criteria.
Notification channels are used to specify where alerts should be sent when an incident occurs. For example:
- Webhook
- Email
- PagerDuty

For more details, refer to the following Google documentation pages:

Google Cloud Monitoring API - Alert Policy

Alert Policy REST API

All requests to the Google Cloud Monitoring API must be authenticated before you can query or apply configuration.

See Google authentication for further details.

The following methods are available for creating and managing custom alert policies.

projects.alertPolicies.create

POST https://monitoring.googleapis.com/v3/{name}/alertPolicies

projects.alertPolicies.delete

DELETE https://monitoring.googleapis.com/v3/{name}

projects.alertPolicies.get

GET https://monitoring.googleapis.com/v3/{name}

projects.alertPolicies.list

GET https://monitoring.googleapis.com/v3/{name}/alertPolicies

projects.alertPolicies.patch

PATCH https://monitoring.googleapis.com/v3/{alertPolicy.name}
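As a rough illustration of how the methods listed above map onto HTTP requests, the sketch below builds (but does not send) requests with Python's standard library. It assumes that `{name}` is `projects/<project ID>` for create and list, and `projects/<project ID>/alertPolicies/<policy ID>` for get and delete. The project ID `example-project`, the policy ID, and the token value are hypothetical; a real access token must be obtained through Google authentication.

```python
import json
import urllib.request

BASE_URL = "https://monitoring.googleapis.com/v3"


def alert_policy_request(method, name, token, body=None, collection=False):
    """Build an authenticated request for the alertPolicies methods listed above."""
    # create and list act on the collection; get, delete, and patch act on one policy.
    url = f"{BASE_URL}/{name}" + ("/alertPolicies" if collection else "")
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Authorization": f"Bearer {token}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# Hypothetical values for illustration only.
project = "projects/example-project"
token = "EXAMPLE_ACCESS_TOKEN"

create_req = alert_policy_request(
    "POST", project, token, body={"displayName": "example"}, collection=True
)
list_req = alert_policy_request("GET", project, token, collection=True)
get_req = alert_policy_request("GET", f"{project}/alertPolicies/1234567890", token)

for req in (create_req, list_req, get_req):
    print(req.get_method(), req.full_url)
```

Passing one of these request objects to `urllib.request.urlopen` would perform the call; that step is left out here because it requires valid credentials and a real project.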

Alert Policy example

This example assumes that you created a notification channel and an uptime check before deployment.

AlertPolicy - NGINX Ingress Uptime Check

{
  "displayName": "Uptime-Test uptime failure- Ingress",
  "documentation": {
    "content": "Indicates issue with NGINX Ingress availability.  Check ingress-nginx-controller-* in the 'ingress-nginx' namespace",
    "mimeType": "text/markdown"
  },
  "conditions": [
    {
      "displayName": "Failure of uptime check_id uptime-test",
      "conditionThreshold": {
        "aggregations": [
          {
            "alignmentPeriod": "1200s",
            "crossSeriesReducer": "REDUCE_COUNT_FALSE",
            "groupByFields": [
              "resource.label.*"
            ],
            "perSeriesAligner": "ALIGN_NEXT_OLDER"
          }
        ],
        "comparison": "COMPARISON_GT",
        "duration": "60s",
        "filter": "metric.type=\"monitoring.googleapis.com/uptime_check/check_passed\" AND metric.label.check_id=\"<uptimeCheckConfigs ID>\" AND resource.type=\"k8s_service\"",
        "thresholdValue": 1,
        "trigger": {
          "count": 1
        }
      }
    }
  ],
  "combiner": "OR",
  "enabled": true,
  "notificationChannels": [
    "projects/gcpe0001/notificationChannels/<notificationChannel ID>"
  ]
}
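The filter value in a policy like this one embeds double quotes, each of which must be escaped as \" inside a JSON string. One way to avoid escaping by hand is to build the condition in code and let a JSON serializer do it. The sketch below is an assumption-laden illustration: it uses `uptime-test` as the check ID (taken from the display name in the example) and covers only a few fields of the condition.

```python
import json

# Hypothetical check ID, borrowed from the display name in the example above.
check_id = "uptime-test"

# Write the filter with plain quotes; json.dumps adds the JSON escaping.
filter_expr = (
    'metric.type="monitoring.googleapis.com/uptime_check/check_passed" '
    f'AND metric.label.check_id="{check_id}" '
    'AND resource.type="k8s_service"'
)

condition = {
    "displayName": f"Failure of uptime check_id {check_id}",
    "conditionThreshold": {
        "comparison": "COMPARISON_GT",
        "duration": "60s",
        "filter": filter_expr,
        "thresholdValue": 1,
        "trigger": {"count": 1},
    },
}

serialized = json.dumps(condition, indent=2)
print(serialized)
```

Parsing the serialized text back with `json.loads` returns the original filter string, which is a quick check that the escaping is intact before the policy is submitted.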
