Metrics and Alerts

From Genesys Documentation
Revision as of 06:37, December 13, 2021 by Marudhu.pandian@genesys.com

Learn which metrics you should monitor for Designer and DAS and when to sound the alarm.

Metrics

Designer supports the Prometheus monitoring system. Designer and DAS expose application-related metrics at the /metric API endpoint in the standard Prometheus client format.
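
As a rough illustration of what a scrape of that endpoint returns, the sketch below parses a small sample in the Prometheus text exposition format. The metric names and label values in SAMPLE are invented for the example and are not taken from Designer or DAS; only the line layout (a metric name, optional labels in braces, then a value, with # HELP and # TYPE comment lines) is standard. The parser is deliberately simplified and skips anything it does not recognize, such as samples that carry a timestamp.

```python
import re

# Hypothetical scrape output; metric names are placeholders, not real Designer/DAS metrics.
SAMPLE = """\
# HELP example_health Health status (1 = healthy, 0 = unhealthy)
# TYPE example_health gauge
example_health{service="designer"} 1
example_http_requests_total{service="das",code="500"} 12
example_uptime_seconds 3600.5
"""

LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples, skipping comments and blanks."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


parsed = parse_metrics(SAMPLE)
for name, labels, value in parsed:
    print(name, labels, value)
```

In practice you would let Prometheus scrape the endpoint; a parser like this is only useful for quick ad hoc checks of what a pod is exposing.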

Designer metrics

Some of the metrics exposed by Designer are as follows:

DAS metrics

Some of the metrics exposed by DAS are as follows:
Important
In addition to the above metrics, you can obtain infrastructure-related metrics by installing standard Prometheus clients in the Kubernetes cluster.

Alerts

This section provides a list of available alerts and information on enabling, disabling, and updating alerts.

Available Alerts

Microservice | Alarm | Description | Alert Name | Default Threshold | Default Interval (seconds) | Default Alert Severity
DES and DAS | Pod CPU Usage | Triggered when a pod's CPU utilization is beyond the threshold. | CPUUtilization | 75% | 180 | CRITICAL
DES and DAS | Pod Memory Usage | Triggered when a pod's memory utilization is beyond the threshold. | MemoryUtilization | 75% | 180 | CRITICAL
DES and DAS | Pod Restarts Count | Triggered when a pod's restart count is beyond the threshold. | containerRestartAlert | 5 | 180 | CRITICAL
DES and DAS | Pod Ready Count | Triggered when a pod's ready count is less than the threshold (1). | containerReadyAlert | 1 | 60 | CRITICAL
DES and DAS | Deployment availability | Triggered when Designer/DAS pod metrics are unavailable. | AbsentAlert | 1 | 60 | CRITICAL
DES and DAS | Azure Fileshare PVC Usage | Triggered when file share usage is greater than the threshold. | WorkspaceUtilization | 80% | 180 | HIGH
DES and DAS | Health Status | Triggered when Designer/DAS health status is 0. | Health | 0 | 60 | CRITICAL
DES and DAS | Workspace Health Status | Triggered when Designer/DAS is not able to communicate with the workspace. | WorkspaceHealth | 0 | 60 | CRITICAL
DES | ElasticSearch Health Status | Triggered when Designer/DAS is not able to reach the Elasticsearch server. | ESHealth | 0 | 60 | CRITICAL
DES | GWS Health Status | Triggered when Designer/DAS is not able to reach the GWS server. | GWSHealth | 0 | 60 | CRITICAL
DAS | PHP Health Status | Triggered when Designer/DAS experiences a PHP Health check failure. | PHPHealth | 0 | 60 | CRITICAL
DAS | Proxy Health Status | Triggered when Designer/DAS experiences a Proxy Health check failure. | ProxyHealth | 0 | 60 | CRITICAL
DAS | Application 5XX Error Alarm | Triggered when DAS exceeds the allowed 5xx error count threshold specified here. | HTTP5XXCount | 10 | 180 | HIGH
DAS | Application 4XX Error Alarm | Triggered when DAS exceeds the 4xx error count threshold specified here. | HTTP4XXCount | 100 | 180 | HIGH
DAS | DAS PHP Latency Alert | Triggered when the average time taken by a PHP request is greater than the threshold (in seconds) specified here. | PhpLatency | 10 seconds | 180 | HIGH
DAS | DAS HTTP Latency Alert | Triggered when the average time taken by an HTTP request is greater than the threshold (in seconds) specified here. | HTTPLatency | 10 seconds | 180 | HIGH
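
The Default Threshold and Default Interval columns are easiest to read together: the condition has to hold for the whole interval before the alert fires. The sketch below simulates that for a "greater than" alert over a series of (timestamp, value) samples. It is a simplified model for illustration, not Prometheus's actual evaluation logic, and it assumes the built-in alerts treat the interval the same way the custom alert template later on this page does (as the rule's for: duration). The function name and sample data are invented.

```python
def fires(samples, threshold, interval):
    """Return True if value > threshold held continuously for at least `interval` seconds.

    samples: list of (timestamp_seconds, value) pairs in ascending time order.
    """
    breach_start = None  # timestamp at which the current breach began
    for timestamp, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = timestamp
            if timestamp - breach_start >= interval:
                return True
        else:
            breach_start = None  # condition cleared, so the timer resets
    return False


# CPU utilization sampled every 60 s against the table defaults (75%, 180 s).
sustained = [(0, 80), (60, 82), (120, 90), (180, 85)]
spike = [(0, 80), (60, 82), (120, 70), (180, 85)]

print(fires(sustained, threshold=75, interval=180))
print(fires(spike, threshold=75, interval=180))
```

The second series dips below the threshold at 120 s, which restarts the timer, so a short spike on its own does not raise the alert.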

Enable alerts in Designer

To enable alerts in Designer, use either of the following methods:

Method 1: Enable Prometheus alerts in the values.yaml file.

designer:
    prometheus:
        alerts:
            enabled: true # this will be false by default.

Method 2: Identify the active deployment color and run the following command against the corresponding deployment:

helm upgrade --install designer-blue -f designer-values.yaml designer-9.0.xx.tgz --set designer.deployment.strategy=blue-green --set designer.prometheus.alerts.enabled=true

Disable alerts in Designer

To disable or delete alerts, use either of the following methods:

Method 1: Disable Prometheus alerts in the values.yaml file.

designer:
    prometheus:
        alerts:
            enabled: false # false by default

Method 2: Pass the following parameter with the helm upgrade command:

helm upgrade --install designer-blue -f designer-values.yaml designer-9.0.xx.tgz --set designer.deployment.strategy=blue-green --set designer.prometheus.alerts.enabled=false

Enable alerts in DAS

To enable alerts, use either of the following methods:

Method 1: Enable Prometheus alerts in the values.yaml file.

das:
    prometheus:
        alerts:
            enabled: true # false by default

Method 2: Pass the following parameter with the helm upgrade command:

helm upgrade --install designer-das-blue -f designer-values.yaml designer-das-9.0.xx.tgz --set das.deployment.strategy=blue-green --set das.prometheus.alerts.enabled=true

Disable alerts in DAS

To disable or delete alerts, use either of the following methods:

Method 1: Disable Prometheus alerts in the values.yaml file.

das:
    prometheus:
        alerts:
            enabled: false # false by default

Method 2: Pass the following parameter with the helm upgrade command:

helm upgrade --install designer-das-blue -f designer-values.yaml designer-das-9.0.xx.tgz --set das.deployment.strategy=blue-green --set das.prometheus.alerts.enabled=false
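
The enable and disable commands above differ only in release name, values file, chart archive, component key, and the boolean flag. The sketch below assembles the same argument list programmatically; it only builds and prints the command and does not run it. The helper name alert_toggle_command is made up for this example, and the chart file name keeps the 9.0.xx placeholder from the commands above.

```python
import shlex


def alert_toggle_command(release, values_file, chart, component, enabled):
    """Build the `helm upgrade` argument list used above to switch alerts on or off.

    component is the top-level values key: "designer" or "das".
    """
    flag = "true" if enabled else "false"
    return [
        "helm", "upgrade", "--install", release,
        "-f", values_file, chart,
        "--set", f"{component}.deployment.strategy=blue-green",
        "--set", f"{component}.prometheus.alerts.enabled={flag}",
    ]


cmd = alert_toggle_command(
    release="designer-das-blue",
    values_file="designer-values.yaml",
    chart="designer-das-9.0.xx.tgz",  # placeholder version, as in the text
    component="das",
    enabled=False,
)
print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, which avoids quoting mistakes if you later pass it to a process runner.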

Update alert parameters

The following alert parameters can be updated:

  • Alert Threshold (ALERT_PARAMETER_NAME: threshold)
  • Alert Interval (ALERT_PARAMETER_NAME: interval)
  • Alert Severity (ALERT_PARAMETER_NAME: AlertPriority)

Perform the following steps to update these parameters for an alert:

  1. Refer to the list of alerts and identify the name of the alert you want to update or modify.
  2. Update the alert by adding a parameter in the below format in the values.yaml file:
    designer:
        prometheus:
            alerts:
                <ALERT_NAME>:
                    <ALERT_PARAMETER_NAME1>: <ALERT_PARAMETER_VALUE1>
                    <ALERT_PARAMETER_NAME2>: <ALERT_PARAMETER_VALUE2>

For example, consider the CPU utilization alert. Its alert name is CPUUtilization, with a default threshold of 75, a severity of CRITICAL, and an interval of 180 seconds. To change the threshold to 80, the severity to HIGH, and the interval to 120 seconds, make the following changes in the values.yaml file:

designer:
    prometheus:
        alerts:
            CPUUtilization:
                threshold: 80
                interval: 120
                AlertPriority: HIGH
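
To make the effect of such an override concrete, the sketch below merges per-alert overrides onto defaults copied from the Available Alerts table. It assumes that any parameter you do not override keeps its default, which is how Helm values normally merge but is not stated explicitly on this page. The DEFAULTS dictionary and the function name are illustrative only.

```python
# Defaults copied from the Available Alerts table (two rows only, for brevity).
DEFAULTS = {
    "CPUUtilization": {"threshold": 75, "interval": 180, "AlertPriority": "CRITICAL"},
    "HTTP5XXCount": {"threshold": 10, "interval": 180, "AlertPriority": "HIGH"},
}


def effective_alerts(defaults, overrides):
    """Return a new dict where each alert's overrides replace its default parameters."""
    merged = {name: dict(params) for name, params in defaults.items()}
    for name, params in overrides.items():
        merged.setdefault(name, {}).update(params)
    return merged


# Equivalent of the values.yaml example above, plus a partial override.
overrides = {
    "CPUUtilization": {"threshold": 80, "interval": 120, "AlertPriority": "HIGH"},
    "HTTP5XXCount": {"threshold": 25},
}

result = effective_alerts(DEFAULTS, overrides)
for name, params in result.items():
    print(name, params)
```

Under that assumption, HTTP5XXCount ends up with the new threshold while keeping its default interval and severity.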

Add new Prometheus alerts

If you want to add new alerts for the metrics available in the Prometheus server, you can use a custom alert block.
Important
Currently, custom alert blocks support only simple PromQL expressions.
A simple PromQL expression contains the below elements:

[Figure: PromQLElements.png, the elements of a simple PromQL expression]

  • To create custom alerts, define the alerts using the below format in the values.yaml file:
    designer(das):
        prometheus:
            alerts:
                customalerts:
                    - <CUSTOM_ALERT_1>
                      enabled: true # must be true for the custom alert to be created
                      name: <CUSTOM_ALERT_NAME>
                      expr:
                        metric: <METRIC_NAME>
                        labels:
                            <LABEL_NAME1> : <LABEL_VALUE1>
                            <LABEL_NAME2> : <LABEL_VALUE2>
                      operator: <OPERATOR> # '<' or '>' or '=' or '!='
                      interval: <ALERT_INTERVAL>
                      threshold: <ALERT_THRESHOLD>
                      AlertPriority: <ALERT_SEVERITY>
  • customalerts is a list, so you can add any number of alerts to it. Each custom alert appears as follows in the alertfile.yaml file:
    groups:
        - name:
          rules:
            - alert: DESIGNER<CUSTOM_ALERT_NAME>
              expr: <ALERT_METRIC_NAME> { <LABEL_NAME_1>=<LABEL_VALUE_1>, <LABEL_NAME_2>=<LABEL_VALUE_2>} OPERATOR <ALERT_THRESHOLD>
              for: <ALERT_INTERVAL>
              labels:
                  severity: <ALERT_SEVERITY>
              annotations:
                  summary: DESIGNER<CUSTOM_ALERT_NAME> has crossed the threshold of <ALERT_THRESHOLD>
                  information: DESIGNER<CUSTOM_ALERT_NAME> has crossed the threshold of <ALERT_THRESHOLD> for <ALERT_INTERVAL>
  • To disable custom alerts, set designer.prometheus.alerts.customalerts.<CUSTOM_ALERT_NAME>.enabled to false or remove the custom alerts from the list.
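
The mapping from a custom alert entry to the generated rule can be sketched as follows. This mirrors the two templates above and is not the chart's actual template code. The function name, the example metric example_queue_depth, and its label are invented. Two choices here are assumptions rather than documented behavior: label values are wrapped in double quotes, because PromQL label matchers take string literals, and the interval is treated as seconds (as in the Available Alerts table) and rendered with an s suffix.

```python
def render_custom_alert(alert, prefix="DESIGNER"):
    """Render one customalerts entry as a Prometheus alerting-rule dict.

    Returns None when the entry is disabled, mirroring `enabled: false`.
    """
    if not alert.get("enabled", False):
        return None
    labels = alert["expr"].get("labels", {})
    # Label values are quoted here because PromQL matchers take string literals.
    selector = ", ".join(f'{key}="{value}"' for key, value in labels.items())
    metric = alert["expr"]["metric"]
    if selector:
        metric = f"{metric}{{{selector}}}"
    name = f"{prefix}{alert['name']}"
    threshold = alert["threshold"]
    duration = f"{alert['interval']}s"  # assumption: interval is given in seconds
    return {
        "alert": name,
        # The operator is passed through unchanged from the values entry.
        "expr": f"{metric} {alert['operator']} {threshold}",
        "for": duration,
        "labels": {"severity": alert["AlertPriority"]},
        "annotations": {
            "summary": f"{name} has crossed the threshold of {threshold}",
            "information": f"{name} has crossed the threshold of {threshold} for {duration}",
        },
    }


# Hypothetical custom alert; the metric and label are placeholders.
custom = {
    "enabled": True,
    "name": "QueueDepth",
    "expr": {"metric": "example_queue_depth", "labels": {"service": "designer"}},
    "operator": ">",
    "interval": 120,
    "threshold": 50,
    "AlertPriority": "HIGH",
}

rule = render_custom_alert(custom)
print(rule["expr"])
```

Comparing the printed expression with what you would type into the Prometheus expression browser is a quick way to confirm that a custom alert selects the series you intend before you deploy it.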

Expected output

The above instructions create a Kubernetes custom resource object, PrometheusRule, and add its name to the Helm chart. After completing the steps, you can check whether the PrometheusRule resource object was created for Designer and DAS by using the following kubectl command:

kubectl get prometheusrule <resource name> # designer-prometheus-alerts or designer-das-prometheus-alerts

Grafana dashboard

To create a Grafana dashboard, you must run the Designer/DAS Helm chart with the deployment strategy set to grafana. The default Grafana dashboard has a graph for almost all of the alerts listed above, plus a few graphs for metrics exposed by the Designer and DAS applications.

Enable Grafana dashboard

To enable the Grafana dashboard, execute the following commands:

Designer

helm upgrade --install designer-dashboard -f designer-values.yaml designer-9.0.xx.tgz --set designer.deployment.strategy=grafana --set designer.grafana.enabled=true

DAS

helm upgrade --install designer-das-dashboard -f designer-das-values.yaml designer-das-9.0.xx.tgz --set das.deployment.strategy=grafana --set das.grafana.enabled=true

Disable Grafana dashboard

To disable the Grafana dashboard, execute the following commands:

Designer

helm upgrade --install designer-dashboard -f designer-values.yaml designer-9.0.xx.tgz --set designer.deployment.strategy=grafana --set designer.grafana.enabled=false

DAS

helm upgrade --install designer-das-dashboard -f designer-das-values.yaml designer-das-9.0.xx.tgz --set das.deployment.strategy=grafana --set das.grafana.enabled=false

Expected output

The above steps will create a ConfigMap resource containing the Grafana dashboard.json file. The monitoring service needs to be configured to read the ConfigMap and create the Grafana dashboard.