DAS metrics and alerts

This topic is part of the manual Designer Private Edition Guide for version Current of Designer.

Find the metrics DAS exposes and the alerts defined for DAS.

Service: DAS
CRD or annotations?: ServiceMonitor
Port: 8081
Endpoint/Selector:
selector:
    matchLabels:
      {{- include "das.serviceSelectorLabels" . | nindent 6 }}
The labels that identify which service to communicate with depend on a unique label applicable to DAS.

Path: /metrics

Metrics update interval: 10 seconds
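
As a rough illustration, the settings above could correspond to a ServiceMonitor along the following lines. This is a sketch, not the manifest that the DAS Helm chart renders: the resource name and the `service: das` label are hypothetical placeholders (the chart fills the selector from `das.serviceSelectorLabels`), and addressing the port by number through `targetPort` is an assumption.

```yaml
# Illustrative sketch only; not the manifest rendered by the DAS chart.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: das-servicemonitor   # hypothetical name
spec:
  selector:
    matchLabels:
      service: das           # hypothetical label; the chart injects das.serviceSelectorLabels here
  endpoints:
    - targetPort: 8081       # DAS metrics port (assumes the port is addressed by number)
      path: /metrics         # metrics path
      interval: 10s          # metrics update interval
```

If the deployed Service exposes a named port instead, the endpoint would reference that name with `port:` rather than `targetPort:`.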

Metrics

The following are some of the metrics that the DAS service exposes:

Important
DAS exposes many Genesys-defined metrics as well as system metrics. You can query Prometheus directly to see all the available metrics. The metrics documented on this page are likely to be particularly useful. Genesys does not commit to maintaining other currently available DAS metrics that are not documented on this page.
Each entry gives the metric name and description, followed by the metric details: unit, type, label, and sample value.

sdr_requests_received

Number of requests received since DAS started running (provided for each CCID).

Unit:

Type: Counter
Label:
Sample value: 1998352

sdr_requests_rejected

Number of requests rejected since DAS started running (provided for each CCID).

Unit:

Type: Counter
Label:
Sample value:

data_tables_requests_failures

Number of failed data table requests since DAS started running (provided for each CCID).

Unit:

Type: Counter
Label:
Sample value: 80

data_tables_request_duration

Latency of data table requests, in seconds, since DAS started running (provided for each CCID).

Unit: seconds

Type: Histogram
Label:
Sample value: 189

business_hours_requests_failures

Number of failed business hours requests since DAS started running.

Unit:

Type: Counter
Label:
Sample value:

business_hours_request_duration

Latency of business hours requests, in seconds, since DAS started running (provided for each CCID).

Unit: seconds

Type: Histogram
Label:
Sample value: 26

special_days_requests_failures

Number of failed special days requests since DAS started running.

Unit:

Type: Counter
Label:
Sample value:

special_days_request_duration

Latency of special days requests, in seconds, since DAS started running (provided for each CCID).

Unit: seconds

Type: Histogram
Label:
Sample value: 34

external_requests_failures

Number of failed external requests since DAS started running.

Unit:

Type: Counter
Label:
Sample value:

external_requests_timedout

Number of timed-out external requests since DAS started running.

Unit:

Type: Counter
Label:
Sample value:

external_requests_duration

Latency of external requests, in seconds, since DAS started running.

Unit: seconds

Type: Histogram
Label:
Sample value:

das_http_request_duration_seconds

HTTP request latency in seconds (provided for each request type and CCID).

Unit: seconds

Type: Histogram
Label:
Sample value: 40

das_http_requests_total

Number of HTTP requests (provided for each request type and CCID).

Unit:

Type: Counter
Label:
Sample value: 40

nginx_metric_errors_total

Number of nginx-lua-prometheus errors.

Unit:

Type: Counter
Label:
Sample value: 2
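
Because counters only ever increase while DAS runs, and histograms are exposed as cumulative series, these metrics are usually read through rates rather than raw values. The following sketch shows one possible way to derive such values with Prometheus recording rules. It assumes the Prometheus Operator `PrometheusRule` resource is available and that the histograms follow the classic Prometheus convention of `_bucket`, `_sum`, and `_count` series; the resource name, group name, recorded series names, and 5-minute window are hypothetical choices, not part of DAS.

```yaml
# Illustrative sketch only; names and windows are hypothetical.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: das-example-recording-rules      # hypothetical name
spec:
  groups:
    - name: das.example.recording        # hypothetical group name
      rules:
        # Per-second rate of received SDR requests over the last 5 minutes.
        - record: das:sdr_requests_received:rate5m
          expr: sum(rate(sdr_requests_received[5m]))
        # Average HTTP request latency in seconds over the last 5 minutes
        # (assumes classic histogram _sum and _count series).
        - record: das:http_request_duration_seconds:avg5m
          expr: |
            sum(rate(das_http_request_duration_seconds_sum[5m]))
            /
            sum(rate(das_http_request_duration_seconds_count[5m]))
        # Estimated 95th percentile HTTP latency (assumes _bucket series with an le label).
        - record: das:http_request_duration_seconds:p95_5m
          expr: |
            histogram_quantile(
              0.95,
              sum by (le) (rate(das_http_request_duration_seconds_bucket[5m]))
            )
```

The same expressions can be pasted into the Prometheus query UI for ad hoc inspection; grouping by a per-CCID label is left out because this page does not list the label names.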

Alerts

The following alerts are defined for DAS.

Each entry gives the alert name and alarm, followed by its severity, description, threshold, and default interval.

CPUUtilization
(Alarm: Pod CPU Usage)
Severity: CRITICAL
Description: Triggered when a pod's CPU utilization exceeds the threshold.
Threshold: 75%
Default interval: 180s

MemoryUtilization
(Alarm: Pod Memory Usage)
Severity: CRITICAL
Description: Triggered when a pod's memory utilization exceeds the threshold.
Threshold: 75%
Default interval: 180s

containerRestartAlert
(Alarm: Pod Restarts Count)
Severity: CRITICAL
Description: Triggered when a pod's restart count exceeds the threshold.
Threshold: 5
Default interval: 180s

containerReadyAlert
(Alarm: Pod Ready Count)
Severity: CRITICAL
Description: Triggered when a pod's ready count is less than the threshold (1).
Threshold: 1
Default interval: 60s

AbsentAlert
(Alarm: Deployment availability)
Severity: CRITICAL
Description: Triggered when DAS pod metrics are unavailable.
Threshold: 1
Default interval: 60s

WorkspaceUtilization
(Alarm: Azure Fileshare PVC Usage)
Severity: HIGH
Description: Triggered when file share usage is greater than the threshold.
Threshold: 80%
Default interval: 180s

Health
(Alarm: Health Status)
Severity: CRITICAL
Description: Triggered when DAS health status is 0.
Threshold: 0
Default interval: 60s

WorkspaceHealth
(Alarm: Workspace Health Status)
Severity: CRITICAL
Description: Triggered when DAS is not able to communicate with the workspace.
Threshold: 0
Default interval: 60s

PHPHealth
(Alarm: PHP Health Status)
Severity: CRITICAL
Description: Triggered when Designer/DAS experiences a PHP health check failure.
Threshold: 0
Default interval: 60s

ProxyHealth
(Alarm: Proxy Health Status)
Severity: CRITICAL
Description: Triggered when Designer/DAS experiences a proxy health check failure.
Threshold: 0
Default interval: 60s

HTTP5XXCount
(Alarm: Application 5XX Error)
Severity: HIGH
Description: Triggered when DAS exceeds the allowed 5xx error count threshold.
Threshold: 10
Default interval: 180s

HTTP4XXCount
(Alarm: Application 4XX Error)
Severity: HIGH
Description: Triggered when DAS exceeds the allowed 4xx error count threshold.
Threshold: 100
Default interval: 180s

PhpLatency
(Alarm: DAS PHP Latency Alert)
Severity: HIGH
Description: Triggered when the average time taken by a PHP request is greater than the threshold (in seconds).
Threshold: 10s
Default interval: 180s

HTTPLatency
(Alarm: DAS HTTP Latency Alert)
Severity: HIGH
Description: Triggered when the average time taken by an HTTP request is greater than the threshold (in seconds).
Threshold: 10s
Default interval: 180s
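
For orientation, an alert such as HTTPLatency could be expressed as a Prometheus alerting rule roughly as follows. The rule expressions that ship with DAS are not shown on this page, so this is only a sketch under stated assumptions: the resource, group, and alert names are hypothetical; the average is derived from the `_sum` and `_count` series of `das_http_request_duration_seconds`; and the 180s default interval is treated as the evaluation window, which may not match how DAS applies it.

```yaml
# Illustrative sketch only; not the rule definition shipped with DAS.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: das-example-alerts               # hypothetical name
spec:
  groups:
    - name: das.example.alerts           # hypothetical group name
      rules:
        - alert: HTTPLatencyExample      # hypothetical alert name
          # Average seconds per HTTP request over the last 180s,
          # compared against the 10s threshold (window mapping is an assumption).
          expr: |
            sum(rate(das_http_request_duration_seconds_sum[180s]))
            /
            sum(rate(das_http_request_duration_seconds_count[180s]))
            > 10
          labels:
            severity: HIGH
          annotations:
            summary: Average DAS HTTP request latency is above 10 seconds.
```

When no requests arrive during the window, the division yields no result and the alert stays silent, which is usually the desired behavior for a latency rule.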