Difference between revisions of "PEC-OU/Current/CXCPEGuide/APIAMetrics"
(Published) |
(Published) |
||
Line 6: | Line 6: | ||
|Endpoint=/metrics | |Endpoint=/metrics | ||
|MetricsUpdateInterval=15 seconds | |MetricsUpdateInterval=15 seconds | ||
− | |MetricsIntro= | + | |MetricsDefined=Yes |
+ | |MetricsIntro=Here are some of the metrics exposed by API aggregator. | ||
|PEMetric={{PEMetric | |PEMetric={{PEMetric | ||
|Metric=cxc_api_aggregator_schedules_created_total | |Metric=cxc_api_aggregator_schedules_created_total | ||
|Type=Counter | |Type=Counter | ||
− | |||
|MetricDescription=Total schedules created. | |MetricDescription=Total schedules created. | ||
|SampleValue=42 | |SampleValue=42 | ||
Line 16: | Line 16: | ||
|Metric=cxc_api_aggregator_schedules_removed_total | |Metric=cxc_api_aggregator_schedules_removed_total | ||
|Type=Counter | |Type=Counter | ||
− | |||
|MetricDescription=Total schedules removed. | |MetricDescription=Total schedules removed. | ||
|SampleValue=42 | |SampleValue=42 | ||
Line 22: | Line 21: | ||
|Metric=cxc_api_aggregator_campaign_template_created_total | |Metric=cxc_api_aggregator_campaign_template_created_total | ||
|Type=Counter | |Type=Counter | ||
− | |||
|MetricDescription=Total campaign templates created. | |MetricDescription=Total campaign templates created. | ||
|SampleValue=42 | |SampleValue=42 | ||
Line 33: | Line 31: | ||
|Metric=cxc_api_aggregator_users_logged_in_total | |Metric=cxc_api_aggregator_users_logged_in_total | ||
|Type=Gauge | |Type=Gauge | ||
− | + | |MetricDescription=Total number of users who are logged in. | |
− | |MetricDescription=Total logged in | ||
|SampleValue=4.2 | |SampleValue=4.2 | ||
}}{{PEMetric | }}{{PEMetric | ||
|Metric=cxc_api_aggregator_users_logged_out_total | |Metric=cxc_api_aggregator_users_logged_out_total | ||
|Type=Gauge | |Type=Gauge | ||
− | + | |MetricDescription=Total number of users who are logged out. | |
− | |MetricDescription=Total logged out | ||
|SampleValue=4.2 | |SampleValue=4.2 | ||
}}{{PEMetric | }}{{PEMetric | ||
Line 104: | Line 100: | ||
}} | }} | ||
|AlertsDefined=Yes | |AlertsDefined=Yes | ||
+ | |PEAlert={{PEAlert | ||
+ | |Alert=CXC-API-LatencyHigh | ||
+ | |Severity=HIGH | ||
+ | |AlertDescription=Triggered when the latency for API responses is beyond the defined threshold. | ||
+ | |Threshold=2500ms for 5m | ||
+ | }}{{PEAlert | ||
+ | |Alert=CXC-API-Redis-Connection-Failed | ||
+ | |Severity=HIGH | ||
+ | |AlertDescription=Triggered when the connection to redis fails for more than 1 minute. | ||
+ | |Threshold=1m | ||
+ | }}{{PEAlert | ||
+ | |Alert=CXC-EXT-Ingress-Error-Rate | ||
+ | |Severity=HIGH | ||
+ | |AlertDescription=Triggered when the Ingress error rate is above the specified threshold. | ||
+ | |Threshold=20% for 5m | ||
+ | }}{{PEAlert | ||
+ | |Alert=cxc_api_too_many_errors_from_auth | ||
+ | |Severity=HIGH | ||
+ | |AlertDescription=Triggered when there are too many error responses from the auth service for more than the specified time threshold. | ||
+ | |Threshold=1m | ||
+ | }}{{PEAlert | ||
+ | |Alert=CXC-CPUUsage | ||
+ | |Severity=HIGH | ||
+ | |AlertDescription=Triggered when the CPU utilization of a pod is beyond the threshold | ||
+ | |Threshold=300% for 5m | ||
+ | }}{{PEAlert | ||
+ | |Alert=CXC-MemoryUsage | ||
+ | |Severity=HIGH | ||
+ | |AlertDescription=Triggered when the memory utilization of a pod is beyond the threshold. | ||
+ | |Threshold=70% for 5m | ||
+ | }}{{PEAlert | ||
+ | |Alert=CXC-PodNotReadyCount | ||
+ | |Severity=HIGH | ||
+ | |AlertDescription=Triggered when the number of pods ready for a CX Contact deployment is less than or equal to the threshold. | ||
+ | |Threshold=1 for 5m | ||
+ | }}{{PEAlert | ||
+ | |Alert=CXC-PodRestartsCount | ||
+ | |Severity=HIGH | ||
+ | |AlertDescription=Triggered when the restart count for a pod is beyond the threshold. | ||
+ | |Threshold=1 for 5m | ||
+ | }}{{PEAlert | ||
+ | |Alert=CXC-MemoryUsagePD | ||
+ | |Severity=HIGH | ||
+ | |AlertDescription=Triggered when the memory usage of a pod is above the critical threshold. | ||
+ | |Threshold=90% for 5m | ||
+ | }}{{PEAlert | ||
+ | |Alert=CXC-PodRestartsCountPD | ||
+ | |Severity=HIGH | ||
+ | |AlertDescription=Triggered when the restart count is beyond the critical threshold. | ||
+ | |Threshold=5 for 5m | ||
+ | }}{{PEAlert | ||
+ | |Alert=CXC-PodsNotReadyPD | ||
+ | |Severity=HIGH | ||
+ | |AlertDescription=Triggered when there are no pods ready for CX Contact deployment. | ||
+ | |Threshold=0 for 1m | ||
+ | }} | ||
}} | }} |
Latest revision as of 14:23, February 7, 2022
Find the metrics that API Aggregator (APIA) exposes and the alerts defined for it.
Service | CRD or annotations? | Port | Endpoint/Selector | Metrics update interval |
---|---|---|---|---|
API Aggregator | ServiceMonitor | 9102 | /metrics | 15 seconds |
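As a rough illustration of what a scraper does with the port and endpoint in the table above, the following Python sketch fetches `/metrics` on port 9102 and parses the Prometheus text format into a dictionary. The function names `parse_metrics` and `scrape` and the `host` argument are hypothetical and not part of the product. The sketch assumes samples carry no trailing timestamp.

```python
from urllib.request import urlopen


def parse_metrics(text):
    """Parse Prometheus text-format samples into {name_with_labels: value}.

    Assumption: sample lines have no trailing timestamp, so the value is
    the last space-separated field on the line.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


def scrape(host, port=9102, path="/metrics", timeout=5):
    """Fetch and parse the metrics endpoint (host is a hypothetical address)."""
    with urlopen(f"http://{host}:{port}{path}", timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Calling `scrape` every 15 seconds would match the metrics update interval listed in the table. In a cluster, the ServiceMonitor normally configures Prometheus to do this for you.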
See details about the metrics and alerts in the sections below.
Metrics
Here are some of the metrics exposed by API Aggregator.
Metric and description | Metric details | Indicator of |
---|---|---|
cxc_ Total schedules created. | Unit: Type: Counter | |
cxc_ Total schedules removed. | Unit: Type: Counter | |
cxc_ Total campaign templates created. | Unit: Type: Counter | |
cxc_ Total campaign templates removed. | Unit: Type: Counter | |
cxc_ Total number of users who are logged in. | Unit: Type: Gauge | |
cxc_ Total number of users who are logged out. | Unit: Type: Gauge | |
cxc_ Total count of requests. | Unit: Type: Counter | |
cxc_ Healthy instance. | Unit: Type: Gauge | |
cxc_ Total count of successful requests. | Unit: Type: Counter | |
cxc_ Top API requests. | Unit: Type: Counter | |
cxc_ Failed Redis connection. | Unit: Type: Gauge | |
cxc_ Total requests by verb and code. | Unit: Type: Counter | |
cxc_ Request latencies histogram by verb, in milliseconds. | Unit: Type: Histogram | |
cxc_ Total out requests by verb, destination and code. | Unit: Type: Counter | |
cxc_ Out request latencies histogram by verb, destination and code, in milliseconds. | Unit: Type: Histogram | |
cxc_ Elasticsearch request latencies histogram by verb, destination and code, in milliseconds. | Unit: Type: Histogram | |
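Counter metrics such as the schedule and request totals above only ever increase, so they are usually read as a rate rather than as a raw value. The following Python sketch shows one way to turn consecutive counter samples into a per-second rate. It assumes samples arrive at the 15-second update interval and that a drop in value means the process restarted. The function names are hypothetical, and the logic is a simplification of what a monitoring system does.

```python
def counter_increase(samples):
    """Return the total increase across consecutive counter samples."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # A drop is treated as a counter reset: the counter restarted
            # from zero, so the new value is the increase since the reset.
            total += cur
    return total


def per_second_rate(samples, interval_seconds=15):
    """Average per-second rate over the window covered by the samples."""
    if len(samples) < 2:
        return 0.0
    window = interval_seconds * (len(samples) - 1)
    return counter_increase(samples) / window
```

Gauge metrics, such as the logged-in user count, can go up or down and are read directly, so this calculation does not apply to them.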
Alerts
The following alerts are defined for API Aggregator.
Alert | Severity | Description | Based on | Threshold |
---|---|---|---|---|
CXC-API-LatencyHigh | HIGH | Triggered when the latency for API responses is beyond the defined threshold. | | 2500ms for 5m |
CXC-API-Redis-Connection-Failed | HIGH | Triggered when the connection to Redis fails for more than 1 minute. | | 1m |
CXC-EXT-Ingress-Error-Rate | HIGH | Triggered when the Ingress error rate is above the specified threshold. | | 20% for 5m |
cxc_api_too_many_errors_from_auth | HIGH | Triggered when there are too many error responses from the auth service for more than the specified time threshold. | | 1m |
CXC-CPUUsage | HIGH | Triggered when the CPU utilization of a pod is beyond the threshold. | | 300% for 5m |
CXC-MemoryUsage | HIGH | Triggered when the memory utilization of a pod is beyond the threshold. | | 70% for 5m |
CXC-PodNotReadyCount | HIGH | Triggered when the number of pods ready for a CX Contact deployment is less than or equal to the threshold. | | 1 for 5m |
CXC-PodRestartsCount | HIGH | Triggered when the restart count for a pod is beyond the threshold. | | 1 for 5m |
CXC-MemoryUsagePD | HIGH | Triggered when the memory usage of a pod is above the critical threshold. | | 90% for 5m |
CXC-PodRestartsCountPD | HIGH | Triggered when the restart count is beyond the critical threshold. | | 5 for 5m |
CXC-PodsNotReadyPD | HIGH | Triggered when there are no pods ready for CX Contact deployment. | | 0 for 1m |
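Most thresholds in the table combine a limit with a duration, for example "2500ms for 5m". The following Python sketch illustrates how such a rule can be read: the alert fires only when every sample stays above the limit for the whole duration. The function name `alert_fires` is hypothetical, and the sketch assumes one sample per 15-second update interval. It does not reproduce the actual alert expressions, which are not shown on this page.

```python
def alert_fires(samples, threshold, for_seconds, interval_seconds=15):
    """Return True if samples stay above threshold for at least for_seconds.

    Assumption: samples are evenly spaced, interval_seconds apart.
    """
    streak = 0  # number of consecutive samples above the threshold
    for value in samples:
        streak = streak + 1 if value > threshold else 0
        # A streak of n samples spans (n - 1) intervals of time.
        if streak and (streak - 1) * interval_seconds >= for_seconds:
            return True
    return False
```

Under these assumptions, a latency above 2500 ms must be seen in 21 consecutive samples (20 intervals of 15 seconds) before a "for 5m" rule fires. A single sample below the limit resets the count.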