Difference between revisions of "PEC-OU/Current/CXCPEGuide/APIAMetrics"

Latest revision as of 14:23, February 7, 2022

This topic is part of the manual Outbound (CX Contact) Private Edition Guide for version Current of Outbound (CX Contact).

Metrics

Here are some of the metrics exposed by the API Aggregator.

Metric	Description	Type	Labels	Sample value
cxc_api_aggregator_schedules_created_total	Total schedules created.	Counter		42
cxc_api_aggregator_schedules_removed_total	Total schedules removed.	Counter		42
cxc_api_aggregator_campaign_template_created_total	Total campaign templates created.	Counter		42
cxc_api_aggregator_campaign_template_removed_total	Total campaign templates removed.	Counter		42
cxc_api_aggregator_users_logged_in_total	Total number of users who are logged in.	Gauge		4.2
cxc_api_aggregator_users_logged_out_total	Total number of users who are logged out.	Gauge		4.2
cxc_api_aggregator_api_requests_total	Total count of requests.	Counter	'ccid', 'tenant_name'	42
cxc_api_healthy_instance	Healthy instance.	Gauge	'ccid', 'tenant_name'	4.2
cxc_api_aggregator_api_requests_processed_success	Total count of successful requests.	Counter	'ccid', 'tenant_name'	42
cxc_api_aggregator_top_api_requests	Top API requests.	Counter	'path', 'method', 'id', 'name', 'ccid', 'tenant_name', 'code'	42
cxc_api_aggregator_redis_connection_failed	Failed Redis connection.	Gauge	'ccid', 'tenant_name'	4.2
cxc_api_aggregator_request_count	Total requests by verb and code.	Counter	'method', 'path', 'code'	42
cxc_api_aggregator_request_latencies_ms	Request latencies histogram by verb, in milliseconds.	Histogram	'method', 'path', 'code'	[1, 2, 3]
cxc_api_aggregator_request_out_count	Total outbound requests by verb, destination, and code.	Counter	'method', 'destination', 'code'	42
cxc_api_aggregator_request_out_latencies_ms	Outbound request latencies histogram by verb, destination, and code, in milliseconds.	Histogram	'method', 'destination', 'code'	[1, 2, 3]
cxc_api_aggregator_elasticsearch_service_latencies_ms	Elasticsearch request latencies histogram by verb, destination, and code, in milliseconds.	Histogram	'method', 'destination', 'code'	[1, 2, 3]
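
To illustrate how the counters above might be combined, the following Python sketch parses a small sample of text in the Prometheus exposition format and derives a per-tenant success ratio from cxc_api_aggregator_api_requests_processed_success and cxc_api_aggregator_api_requests_total. It assumes the /metrics endpoint returns that text format; the sample payload, the label values, and the helper names parse_metrics and success_ratio_by_tenant are all hypothetical and are not part of API Aggregator.

```python
import re
from collections import defaultdict

# Hypothetical sample of what the /metrics endpoint might return.
# Label values and numbers are invented for illustration only.
SAMPLE = """\
# TYPE cxc_api_aggregator_api_requests_total counter
cxc_api_aggregator_api_requests_total{ccid="c1",tenant_name="acme"} 40
cxc_api_aggregator_api_requests_total{ccid="c2",tenant_name="beta"} 10
# TYPE cxc_api_aggregator_api_requests_processed_success counter
cxc_api_aggregator_api_requests_processed_success{ccid="c1",tenant_name="acme"} 38
cxc_api_aggregator_api_requests_processed_success{ccid="c2",tenant_name="beta"} 7
"""

# Matches "name{labels} value" or "name value"; samples that carry a
# trailing timestamp do not match and are skipped by this simplified parser.
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return {metric_name: [(labels_dict, value), ...]} for simple samples."""
    samples = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip anything this sketch does not understand
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples[name].append((labels, float(value)))
    return dict(samples)


def success_ratio_by_tenant(samples):
    """Divide successful requests by total requests for each tenant_name."""
    total = {labels["tenant_name"]: value
             for labels, value in samples["cxc_api_aggregator_api_requests_total"]}
    ok = {labels["tenant_name"]: value
          for labels, value in samples["cxc_api_aggregator_api_requests_processed_success"]}
    return {tenant: ok.get(tenant, 0.0) / count
            for tenant, count in total.items() if count > 0}


ratios = success_ratio_by_tenant(parse_metrics(SAMPLE))
```

Because both metrics are counters, a monitoring system would normally compare their rates of increase over a time window rather than their raw values; the raw division here only keeps the sketch short.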

Alerts

The following alerts are defined for API Aggregator.

Alert	Severity	Description	Threshold
CXC-API-LatencyHigh	HIGH	Triggered when the latency for API responses is beyond the defined threshold.	2500ms for 5m
CXC-API-Redis-Connection-Failed	HIGH	Triggered when the connection to Redis fails for more than 1 minute.	1m
CXC-EXT-Ingress-Error-Rate	HIGH	Triggered when the Ingress error rate is above the specified threshold.	20% for 5m
cxc_api_too_many_errors_from_auth	HIGH	Triggered when there are too many error responses from the auth service for more than the specified time threshold.	1m
CXC-CPUUsage	HIGH	Triggered when the CPU utilization of a pod is beyond the threshold.	300% for 5m
CXC-MemoryUsage	HIGH	Triggered when the memory utilization of a pod is beyond the threshold.	70% for 5m
CXC-PodNotReadyCount	HIGH	Triggered when the number of pods ready for a CX Contact deployment is less than or equal to the threshold.	1 for 5m
CXC-PodRestartsCount	HIGH	Triggered when the restart count for a pod is beyond the threshold.	1 for 5m
CXC-MemoryUsagePD	HIGH	Triggered when the memory usage of a pod is above the critical threshold.	90% for 5m
CXC-PodRestartsCountPD	HIGH	Triggered when the restart count is beyond the critical threshold.	5 for 5m
CXC-PodsNotReadyPD	HIGH	Triggered when there are no pods ready for CX Contact deployment.	0 for 1m
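
Thresholds such as "2500ms for 5m" combine a limit with a duration. The Python sketch below shows one plausible reading: the alert fires only if every sample stayed above the limit for at least the stated duration, ending at the most recent sample. This interpretation is an assumption, since the actual alert rule expressions are not shown in this topic; the Sample class and the breached_for function are hypothetical names used only for illustration. The 15-second spacing of the example samples mirrors the metrics update interval stated for this service.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp: float  # seconds since an arbitrary start
    value: float      # observed value, e.g. latency in milliseconds


def breached_for(samples, threshold, duration_s):
    """Return True if the latest unbroken run of samples above `threshold`
    spans at least `duration_s` seconds and reaches the newest sample."""
    breach_start = None
    for sample in sorted(samples, key=lambda s: s.timestamp):
        if sample.value > threshold:
            if breach_start is None:
                breach_start = sample.timestamp  # a new run begins here
        else:
            breach_start = None  # any sample at or below the limit resets the run
    if breach_start is None:
        return False
    latest = max(s.timestamp for s in samples)
    return latest - breach_start >= duration_s


# One sample every 15 seconds over five minutes, all above 2500 ms.
steady = [Sample(t, 3000.0) for t in range(0, 301, 15)]

# Same series, but latency recovers briefly at t = 150 s.
dipped = [Sample(t, 1000.0 if t == 150 else 3000.0) for t in range(0, 301, 15)]

steady_fires = breached_for(steady, threshold=2500.0, duration_s=300)
dipped_fires = breached_for(dipped, threshold=2500.0, duration_s=300)
```

Under this reading, the steady series fires the alert, while the series with a single recovery does not, because the run above the limit restarts after the dip and has not yet lasted five minutes.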

@@ Line 6: / Line 6: @@
 |Endpoint=/metrics
 |MetricsUpdateInterval=15 seconds
-|MetricsIntro=Add some introductory text... TBD.
+|MetricsDefined=Yes
+|MetricsIntro=Here are some of the metrics exposed by API aggregator.
 |PEMetric={{PEMetric
 |Metric=cxc_api_aggregator_schedules_created_total
 |Type=Counter
-|Label="'ccid', 'tenant_name'"
 |MetricDescription=Total schedules created.
 |SampleValue=42
@@ Line 16: / Line 16: @@
 |Metric=cxc_api_aggregator_schedules_removed_total
 |Type=Counter
-|Label="'ccid', 'tenant_name'"
 |MetricDescription=Total schedules removed.
 |SampleValue=42
@@ Line 22: / Line 21: @@
 |Metric=cxc_api_aggregator_campaign_template_created_total
 |Type=Counter
-|Label="'ccid', 'tenant_name'"
 |MetricDescription=Total campaign templates created.
 |SampleValue=42
@@ Line 33: / Line 31: @@
 |Metric=cxc_api_aggregator_users_logged_in_total
 |Type=Gauge
-|Label="'ccid', 'tenant_name'"
+|MetricDescription=Total number of users who are logged in.
-|MetricDescription=Total logged in users.
 |SampleValue=4.2
 }}{{PEMetric
 |Metric=cxc_api_aggregator_users_logged_out_total
 |Type=Gauge
-|Label="'ccid', 'tenant_name', 'service_name'"
+|MetricDescription=Total number of users who are logged out.
-|MetricDescription=Total logged out users.
 |SampleValue=4.2
 }}{{PEMetric
@@ Line 104: / Line 100: @@
 }}
 |AlertsDefined=Yes
+|PEAlert={{PEAlert
+|Alert=CXC-API-LatencyHigh
+|Severity=HIGH
+|AlertDescription=Triggered when the latency for API responses is beyond the defined threshold.
+|Threshold=2500ms for 5m
+}}{{PEAlert
+|Alert=CXC-API-Redis-Connection-Failed
+|Severity=HIGH
+|AlertDescription=Triggered when the connection to redis fails for more than 1 minute.
+|Threshold=1m
+}}{{PEAlert
+|Alert=CXC-EXT-Ingress-Error-Rate
+|Severity=HIGH
+|AlertDescription=Triggered when the Ingress error rate is above the specified threshold.
+|Threshold=20% for 5m
+}}{{PEAlert
+|Alert=cxc_api_too_many_errors_from_auth
+|Severity=HIGH
+|AlertDescription=Triggered when there are too many error responses from the auth service for more than the specified time threshold.
+|Threshold=1m
+}}{{PEAlert
+|Alert=CXC-CPUUsage
+|Severity=HIGH
+|AlertDescription=Triggered when the CPU utilization of a pod is beyond the threshold
+|Threshold=300% for 5m
+}}{{PEAlert
+|Alert=CXC-MemoryUsage
+|Severity=HIGH
+|AlertDescription=Triggered when the memory utilization of a pod is beyond the threshold.
+|Threshold=70% for 5m
+}}{{PEAlert
+|Alert=CXC-PodNotReadyCount
+|Severity=HIGH
+|AlertDescription=Triggered when the number of pods ready for a CX Contact deployment is less than or equal to the threshold.
+|Threshold=1 for 5m
+}}{{PEAlert
+|Alert=CXC-PodRestartsCount
+|Severity=HIGH
+|AlertDescription=Triggered when the restart count for a pod is beyond the threshold.
+|Threshold=1 for 5m
+}}{{PEAlert
+|Alert=CXC-MemoryUsagePD
+|Severity=HIGH
+|AlertDescription=Triggered when the memory usage of a pod is above the critical threshold.
+|Threshold=90% for 5m
+}}{{PEAlert
+|Alert=CXC-PodRestartsCountPD
+|Severity=HIGH
+|AlertDescription=Triggered when the restart count is beyond the critical threshold.
+|Threshold=5 for 5m
+}}{{PEAlert
+|Alert=CXC-PodsNotReadyPD
+|Severity=HIGH
+|AlertDescription=Triggered when there are no pods ready for CX Contact deployment.
+|Threshold=0 for 1m
+}}
 }}
