DAS metrics and alerts
Find the metrics DAS exposes and the alerts defined for DAS.
| Service | CRD or annotations? | Port | Endpoint/Selector | Metrics update interval |
|---|---|---|---|---|
| DAS | ServiceMonitor | 8081 | selector: matchLabels: {{- include "das.serviceSelectorLabels" . \| nindent 6 }} Path: | 10 seconds |
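
A ServiceMonitor is the Prometheus Operator resource for scrape targets, so the data served on port 8081 is presumably in the Prometheus text exposition format. The sketch below shows one way to read such a payload into Python tuples. The metric name, the `ccid` label values, and the sample payload are made up for illustration; the real scrape path is not given above, so no HTTP request is attempted. Escaped quotes inside label values are not handled.

```python
import re

# One sample line: metric name, optional {labels}, then a numeric value.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if match is None:
            continue  # ignore anything that is not a plain sample line
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical payload: the metric name and CCID values are invented.
EXAMPLE = """\
# TYPE example_requests_total counter
example_requests_total{ccid="tenant-a"} 42
example_requests_total{ccid="tenant-b"} 7
"""

for name, labels, value in parse_samples(EXAMPLE):
    print(name, labels, value)
```

With a 10-second update interval, two payloads parsed this way are at most one refresh apart, which is the granularity any derived rate or average inherits.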
Metrics
The DAS service exposes the following metrics, among others:
| Metric and description | Metric details | Indicator of |
|---|---|---|
| sdr_ Number of requests received since DAS started (provided for each CCID). | Type: Counter | |
| sdr_ Number of requests rejected since DAS started (provided for each CCID). | Type: Counter | |
| data_ Number of failed data table requests since DAS started (provided for each CCID). | Type: Counter | |
| data_ Data table request latency in seconds since DAS started (provided for each CCID). | Unit: seconds, Type: Histogram | |
| business_ Number of failed business hours requests since DAS started. | Type: Counter | |
| business_ Business hours request latency in seconds since DAS started (provided for each CCID). | Unit: seconds, Type: Histogram | |
| special_ Number of failed special days requests since DAS started. | Type: Counter | |
| special_ Special days request latency in seconds since DAS started (provided for each CCID). | Unit: seconds, Type: Histogram | |
| external_ Number of failed external requests since DAS started. | Type: Counter | |
| external_ Number of timed-out external requests since DAS started. | Type: Counter | |
| external_ External request latency in seconds since DAS started. | Unit: seconds, Type: Histogram | |
| das_ HTTP request latency in seconds (provided for each request type and CCID). | Unit: seconds, Type: Histogram | |
| das_ Number of HTTP requests (provided for each request type and CCID). | Type: Counter | |
| nginx_ Number of nginx-lua-prometheus errors. | Type: Counter | |
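
Because the counters and histograms above accumulate from the moment DAS starts, their raw values are less useful than the change between two scrapes. The sketch below illustrates one way to derive an increase and an average latency, assuming the histograms follow the usual Prometheus convention of exposing a running sum of observed seconds and a running count of observations. The function names and the sample numbers are hypothetical.

```python
def counter_increase(previous, current):
    """Increase of a counter between two scrapes, tolerating a restart."""
    if current < previous:
        # The counter went backwards, so assume DAS restarted and the
        # counter began again from zero.
        return current
    return current - previous


def average_latency(before, after):
    """Mean latency in seconds between two scrapes of one histogram.

    `before` and `after` are (sum_seconds, count) pairs. Returns None
    when no request was observed in between.
    """
    sum_before, count_before = before
    sum_after, count_after = after
    if count_after < count_before:
        # Restart detected: everything in `after` happened since the reset.
        sum_before, count_before = 0.0, 0
    requests = count_after - count_before
    if requests == 0:
        return None
    return (sum_after - sum_before) / requests


# Hypothetical values: 10 requests took 5 seconds in total.
print(average_latency((10.0, 100), (15.0, 110)))  # 0.5
```

Detecting the restart on the count and applying it to both values keeps the sum and the count consistent with each other.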
Alerts
The following alerts are defined for DAS.
| Alert | Severity | Description | Based on | Threshold |
|---|---|---|---|---|
| CPUUtilization (Alarm: Pod CPU Usage) | CRITICAL | Triggered when a pod's CPU utilization is beyond the threshold. | | 75%. Default interval: 180s |
| MemoryUtilization (Alarm: Pod Memory Usage) | CRITICAL | Triggered when a pod's memory utilization is beyond the threshold. | | 75%. Default interval: 180s |
| containerRestartAlert (Alarm: Pod Restarts Count) | CRITICAL | Triggered when a pod's restart count is beyond the threshold. | | 5. Default interval: 180s |
| containerReadyAlert (Alarm: Pod Ready Count) | CRITICAL | Triggered when a pod's ready count is less than the threshold (1). | | 1. Default interval: 60s |
| AbsentAlert (Alarm: Deployment availability) | CRITICAL | Triggered when DAS pod metrics are unavailable. | | 1. Default interval: 60s |
| WorkspaceUtilization (Alarm: Azure Fileshare PVC Usage) | HIGH | Triggered when file share usage is greater than the threshold. | | 80%. Default interval: 180s |
| Health (Alarm: Health Status) | CRITICAL | Triggered when DAS health status is 0. | | 0. Default interval: 60s |
| WorkspaceHealth (Alarm: Workspace Health Status) | CRITICAL | Triggered when DAS is not able to communicate with the workspace. | | 0. Default interval: 60s |
| PHPHealth (Alarm: PHP Health Status) | CRITICAL | Triggered when Designer/DAS experiences a PHP health check failure. | | 0. Default interval: 60s |
| ProxyHealth (Alarm: Proxy Health Status) | CRITICAL | Triggered when Designer/DAS experiences a proxy health check failure. | | 0. Default interval: 60s |
| HTTP5XXCount (Alarm: Application 5XX Error) | HIGH | Triggered when DAS exceeds the allowed 5xx error count threshold specified here. | | 10. Default interval: 180s |
| HTTP4XXCount (Alarm: Application 4XX Error) | HIGH | Triggered when DAS exceeds the 4xx error count threshold specified here. | | 100. Default interval: 180s |
| PhpLatency (Alarm: DAS PHP Latency Alert) | HIGH | Triggered when the average time taken by a PHP request is greater than the threshold (in seconds) specified here. | | 10s. Default interval: 180s |
| HTTPLatency (Alarm: DAS HTTP Latency Alert) | HIGH | Triggered when the average time taken by an HTTP request is greater than the threshold (in seconds) specified here. | | 10s. Default interval: 180s |
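
Each alert above pairs a threshold with a default interval. The table does not say how the interval is applied, so the sketch below assumes one common reading: the alert fires only if the value stayed above the threshold for the whole trailing interval. The function name, the sample layout, and the numbers are hypothetical, and the actual alert rules may be evaluated differently.

```python
def alert_fires(samples, threshold, interval_seconds):
    """Return True if every sample in the trailing interval exceeds threshold.

    `samples` is a list of (timestamp_seconds, value) pairs, oldest first.
    Assumption: the interval is the time the condition must hold.
    """
    if not samples:
        return False
    latest = samples[-1][0]
    window_start = latest - interval_seconds
    if samples[0][0] > window_start:
        return False  # not enough history to cover the whole interval
    window = [value for timestamp, value in samples if timestamp >= window_start]
    return all(value > threshold for value in window)


# Hypothetical CPU readings (percent), one every 10 seconds for 180 seconds.
steady = [(t, 80.0) for t in range(0, 190, 10)]
dipped = [(t, 70.0 if t == 90 else 80.0) for t in range(0, 190, 10)]

print(alert_fires(steady, 75.0, 180))
print(alert_fires(dipped, 75.0, 180))
```

Under this reading, a single reading at or below the threshold inside the window is enough to keep the alert from firing.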