DAS metrics and alerts

This topic is part of the manual Designer Private Edition Guide for version Current of Designer.

Service	CRD or annotations?	Port	Endpoint/Selector	Metrics update interval
DAS	ServiceMonitor	8081	selector: matchLabels: {{- include "das.serviceSelectorLabels" . | nindent 6 }} The labels that identify which service to communicate with depend on a unique label applicable to DAS. Path: `/metrics`	10 seconds
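
As an illustration of the table above, a rendered ServiceMonitor might look roughly like the following sketch. The resource name, namespace, selector label, and Service port name are assumptions made for this example; in an actual deployment the selector labels come from the `das.serviceSelectorLabels` Helm template, and the port entry must match whatever name the DAS Service gives to port 8081.

```yaml
# Minimal sketch of a ServiceMonitor for DAS (not the chart's actual output).
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: das-servicemonitor          # assumed name
  namespace: designer               # assumed namespace
spec:
  selector:
    matchLabels:
      # Stand-in for the labels rendered by "das.serviceSelectorLabels"
      app.kubernetes.io/name: das   # assumed label
  endpoints:
    - port: metrics                 # assumed name of the Service port mapped to 8081
      path: /metrics                # metrics path from the table above
      interval: 10s                 # metrics update interval from the table above
```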

Metrics

The following are some of the metrics exposed by the DAS service:

Important

DAS exposes many Genesys-defined metrics as well as system metrics. You can query Prometheus directly to see all the available metrics. The metrics documented on this page are likely to be particularly useful. Genesys does not commit to maintaining other currently available DAS metrics that are not documented on this page.

Metric and description	Metric details	Indicator of
sdr_requests_received: Number of requests received since DAS started (provided for each CCID).	Type: Counter; Sample value: 1998352
sdr_requests_rejected: Number of requests rejected since DAS started (provided for each CCID).	Type: Counter
data_tables_requests_failures: Number of failed data table requests since DAS started (provided for each CCID).	Type: Counter; Sample value: 80
data_tables_request_duration: Data table request latency, in seconds, since DAS started (provided for each CCID).	Unit: seconds; Type: Histogram; Sample value: 189
business_hours_requests_failures: Number of failed business hours requests since DAS started.	Type: Counter
business_hours_request_duration: Business hours request latency, in seconds, since DAS started (provided for each CCID).	Unit: seconds; Type: Histogram; Sample value: 26
special_days_requests_failures: Number of failed special days requests since DAS started.	Type: Counter
special_days_request_duration: Special days request latency, in seconds, since DAS started (provided for each CCID).	Unit: seconds; Type: Histogram; Sample value: 34
external_requests_failures: Number of failed external requests since DAS started.	Type: Counter
external_requests_timedout: Number of timed-out external requests since DAS started.	Type: Counter
external_requests_duration: External request latency, in seconds, since DAS started.	Unit: seconds; Type: Histogram
das_http_request_duration_seconds: HTTP request latency, in seconds (provided for each request type and CCID).	Unit: seconds; Type: Histogram; Sample value: 40
das_http_requests_total: Number of HTTP requests (provided for each request type and CCID).	Type: Counter; Sample value: 40
nginx_metric_errors_total: Number of nginx-lua-prometheus errors.	Type: Counter; Sample value: 2
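
Because the counters above only ever increase while DAS is running, they are usually more informative as rates, and the histograms as quantiles. The following sketch shows one way to express that with Prometheus Operator recording rules. The rule names, group name, and time windows are assumptions made for this example, and the quantile rule assumes the histogram exposes the conventional `_bucket` series with an `le` label. No per-CCID label is used because the label names are not documented on this page.

```yaml
# Minimal sketch of recording rules over DAS metrics (illustrative only).
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: das-example-recording-rules   # assumed name
spec:
  groups:
    - name: das.example.rules         # assumed group name
      rules:
        # Per-second rate of received requests over the last 5 minutes.
        - record: das:sdr_requests_received:rate5m
          expr: sum(rate(sdr_requests_received[5m]))
        # Failed data table requests over the last hour.
        - record: das:data_tables_requests_failures:increase1h
          expr: sum(increase(data_tables_requests_failures[1h]))
        # Approximate 95th percentile of HTTP request latency, in seconds.
        - record: das:das_http_request_duration_seconds:p95_5m
          expr: >-
            histogram_quantile(0.95,
            sum by (le) (rate(das_http_request_duration_seconds_bucket[5m])))
```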

Alerts

The following alerts are defined for DAS.

Alert	Severity	Description	Threshold
CPUUtilization (Alarm: Pod CPU Usage)	CRITICAL	Triggered when a pod's CPU utilization is beyond the threshold.	75% Default interval: 180s
MemoryUtilization (Alarm: Pod Memory Usage)	CRITICAL	Triggered when a pod's memory utilization is beyond the threshold.	75% Default interval: 180s
containerRestartAlert (Alarm: Pod Restarts Count)	CRITICAL	Triggered when a pod's restart count is beyond the threshold.	5 Default interval: 180s
containerReadyAlert (Alarm: Pod Ready Count)	CRITICAL	Triggered when a pod's ready count is less than the threshold (1).	1 Default interval: 60s
AbsentAlert (Alarm: Deployment availability)	CRITICAL	Triggered when DAS pod metrics are unavailable.	1 Default interval: 60s
WorkspaceUtilization (Alarm: Azure Fileshare PVC Usage)	HIGH	Triggered when file share usage is greater than the threshold.	80% Default interval: 180s
Health (Alarm: Health Status)	CRITICAL	Triggered when DAS health status is 0.	0 Default interval: 60s
WorkspaceHealth (Alarm: Workspace Health Status)	CRITICAL	Triggered when DAS is not able to communicate with the workspace.	0 Default interval: 60s
PHPHealth (Alarm: PHP Health Status)	CRITICAL	Triggered when Designer/DAS experiences a PHP health check failure.	0 Default interval: 60s
ProxyHealth (Alarm: Proxy Health Status)	CRITICAL	Triggered when Designer/DAS experiences a proxy health check failure.	0 Default interval: 60s
HTTP5XXCount (Alarm: Application 5XX Error)	HIGH	Triggered when DAS exceeds the allowed 5xx error count threshold specified here.	10 Default interval: 180s
HTTP4XXCount (Alarm: Application 4XX Error)	HIGH	Triggered when DAS exceeds the 4xx error count threshold specified here.	100 Default interval: 180s
PhpLatency (Alarm: DAS PHP Latency Alert)	HIGH	Triggered when the average time taken by a PHP request is greater than the threshold (in seconds) specified here.	10s Default interval: 180s
HTTPLatency (Alarm: DAS HTTP Latency Alert)	HIGH	Triggered when the average time taken by an HTTP request is greater than the threshold (in seconds) specified here.	10s Default interval: 180s
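
To show how an alert in this table could relate to the documented metrics, here is a rough sketch of an HTTPLatency-style rule. It is not the rule shipped with DAS: the expression, resource name, and group name are assumptions, and treating the 180s default interval as the rule's `for` duration is an interpretation made for this example. The expression assumes the histogram exposes the conventional `_sum` and `_count` series.

```yaml
# Minimal sketch of a latency alert (illustrative; the shipped rule may differ).
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: das-example-alerts            # assumed name
spec:
  groups:
    - name: das.example.alerts        # assumed group name
      rules:
        - alert: HTTPLatency
          # Average HTTP request time: total seconds spent divided by request count.
          expr: |
            sum(rate(das_http_request_duration_seconds_sum[3m]))
              /
            sum(rate(das_http_request_duration_seconds_count[3m]))
              > 10
          for: 180s                   # assumed mapping of "Default interval: 180s"
          labels:
            severity: HIGH
          annotations:
            summary: DAS HTTP Latency Alert
```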
