DES/Current/DESPEGuide/DAS Metrics
Latest revision as of 20:08, February 15, 2022
Find the metrics DAS exposes and the alerts defined for DAS.
Service | CRD or annotations? | Port | Endpoint/Selector | Metrics update interval |
---|---|---|---|---|
DAS | ServiceMonitor | 8081 | selector: matchLabels: {{- include "das.serviceSelectorLabels" . \| nindent 6 }} Path: | 10 seconds |
Metrics
The DAS service exposes the following metrics, among others:
Metric and description | Metric details | Indicator of |
---|---|---|
sdr_ Number of requests received since DAS started running (provided for each CCID). | Type: Counter | |
sdr_ Number of requests rejected since DAS started running (provided for each CCID). | Type: Counter | |
data_ Number of failed data table requests since DAS started running (provided for each CCID). | Type: Counter | |
data_tables_request_duration Data table request latency in seconds, since DAS started running (provided for each CCID). | Unit: seconds; Type: Histogram | |
business_ Number of failed business hours requests since DAS started running. | Type: Counter | |
business_hours_request_duration Business hours request latency in seconds, since DAS started running (provided for each CCID). | Unit: seconds; Type: Histogram | |
special_ Number of failed special days requests since DAS started running. | Type: Counter | |
special_days_request_duration Special days request latency in seconds, since DAS started running (provided for each CCID). | Unit: seconds; Type: Histogram | |
external_ Number of failed external requests since DAS started running. | Type: Counter | |
external_ Number of timed-out external requests since DAS started running. | Type: Counter | |
external_requests_duration External request latency in seconds, since DAS started running. | Unit: seconds; Type: Histogram | |
das_http_request_duration_seconds HTTP request latency in seconds (provided for each request type and CCID). | Unit: seconds; Type: Histogram | |
das_ Number of HTTP requests (provided for each request type and CCID). | Type: Counter | |
nginx_ Number of nginx-lua-prometheus errors. | Type: Counter | |
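The histogram metrics above accumulate from the moment DAS starts, so a single scrape only gives lifetime totals. As a minimal sketch, assuming DAS follows the usual Prometheus convention of exposing `_sum` and `_count` series for each histogram, the lifetime average latency per label set could be derived as shown here. The label names (`ccid`, `type`), the tenant names, and the sample values are hypothetical and not taken from the DAS documentation; the parser is deliberately simplified and ignores lines that carry a timestamp.

```python
import re

# Hypothetical scrape output: label names and values are invented for illustration.
SAMPLE = """\
# TYPE das_http_request_duration_seconds histogram
das_http_request_duration_seconds_sum{ccid="tenant-a",type="GET"} 12.5
das_http_request_duration_seconds_count{ccid="tenant-a",type="GET"} 50
das_http_request_duration_seconds_sum{ccid="tenant-b",type="GET"} 3.0
das_http_request_duration_seconds_count{ccid="tenant-b",type="GET"} 4
"""

# name, optional {labels}, then a single value (no timestamp support).
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Yield (name, labels, value) per sample line, skipping comments and blanks."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue  # unsupported line shape, for example one with a timestamp
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        yield match.group("name"), labels, float(match.group("value"))


def average_latency(text, metric):
    """Return {label set: _sum / _count} for the named histogram."""
    sums, counts = {}, {}
    for name, labels, value in parse_samples(text):
        key = tuple(sorted(labels.items()))
        if name == metric + "_sum":
            sums[key] = value
        elif name == metric + "_count":
            counts[key] = value
    # Skip label sets with no observations to avoid dividing by zero.
    return {k: sums[k] / counts[k] for k in sums if counts.get(k, 0) > 0}


averages = average_latency(SAMPLE, "das_http_request_duration_seconds")
for key, avg in sorted(averages.items()):
    print(dict(key), avg)
```

Because the totals are cumulative, this average covers the whole lifetime of the DAS process; an average for a recent interval requires the difference between two scrapes.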
Alerts
The following alerts are defined for DAS.
Alert | Severity | Description | Based on | Threshold |
---|---|---|---|---|
CPUUtilization (Alarm: Pod CPU Usage) | CRITICAL | Triggered when a pod's CPU utilization exceeds the threshold. | | 75% (default interval: 180s) |
MemoryUtilization (Alarm: Pod Memory Usage) | CRITICAL | Triggered when a pod's memory utilization exceeds the threshold. | | 75% (default interval: 180s) |
containerRestartAlert (Alarm: Pod Restarts Count) | CRITICAL | Triggered when a pod's restart count exceeds the threshold. | | 5 (default interval: 180s) |
containerReadyAlert (Alarm: Pod Ready Count) | CRITICAL | Triggered when a pod's ready count is less than the threshold (1). | | 1 (default interval: 60s) |
AbsentAlert (Alarm: Deployment availability) | CRITICAL | Triggered when DAS pod metrics are unavailable. | | 1 (default interval: 60s) |
WorkspaceUtilization (Alarm: Azure Fileshare PVC Usage) | HIGH | Triggered when file share usage is greater than the threshold. | | 80% (default interval: 180s) |
Health (Alarm: Health Status) | CRITICAL | Triggered when the DAS health status is 0. | | 0 (default interval: 60s) |
WorkspaceHealth (Alarm: Workspace Health Status) | CRITICAL | Triggered when DAS is not able to communicate with the workspace. | | 0 (default interval: 60s) |
PHPHealth (Alarm: PHP Health Status) | CRITICAL | Triggered when Designer/DAS experiences a PHP health check failure. | | 0 (default interval: 60s) |
ProxyHealth (Alarm: Proxy Health Status) | CRITICAL | Triggered when Designer/DAS experiences a proxy health check failure. | | 0 (default interval: 60s) |
HTTP5XXCount (Alarm: Application 5XX Error) | HIGH | Triggered when the DAS 5xx error count exceeds the allowed threshold. | | 10 (default interval: 180s) |
HTTP4XXCount (Alarm: Application 4XX Error) | HIGH | Triggered when the DAS 4xx error count exceeds the allowed threshold. | | 100 (default interval: 180s) |
PhpLatency (Alarm: DAS PHP Latency Alert) | HIGH | Triggered when the average time taken by a PHP request is greater than the threshold (in seconds). | | 10s (default interval: 180s) |
HTTPLatency (Alarm: DAS HTTP Latency Alert) | HIGH | Triggered when the average time taken by an HTTP request is greater than the threshold (in seconds). | | 10s (default interval: 180s) |
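
The table does not give the expressions behind these alerts. As a rough sketch of how a latency alert such as HTTPLatency might be evaluated, the following assumes the average is taken over the requests observed between two scrapes of a cumulative histogram, one alert interval apart. The `Snapshot` type and both function names are hypothetical; only the 10-second threshold comes from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snapshot:
    """Cumulative histogram totals read at one scrape (hypothetical helper type)."""
    duration_sum: float   # value of the _sum series, in seconds
    request_count: float  # value of the _count series


def average_over_window(earlier: Snapshot, later: Snapshot) -> Optional[float]:
    """Average latency of requests seen between two scrapes, or None if there were none."""
    requests = later.request_count - earlier.request_count
    if requests <= 0:
        # No traffic in the window, or the counters were reset by a restart.
        return None
    return (later.duration_sum - earlier.duration_sum) / requests


def latency_alert_fires(earlier: Snapshot, later: Snapshot,
                        threshold_seconds: float = 10.0) -> bool:
    """True when the windowed average latency is greater than the threshold."""
    average = average_over_window(earlier, later)
    return average is not None and average > threshold_seconds


# Two scrapes taken one interval apart: 20 new requests took 300 s in total.
start = Snapshot(duration_sum=100.0, request_count=50)
slow = Snapshot(duration_sum=400.0, request_count=70)
fast = Snapshot(duration_sum=160.0, request_count=70)

print(latency_alert_fires(start, slow))  # average 15 s per request
print(latency_alert_fires(start, fast))  # average 3 s per request
```

Working from the difference between two scrapes rather than the raw totals keeps a long history of fast requests from masking a recent slowdown.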