Call State Service metrics and alerts
Find the metrics Call State Service exposes and the alerts defined for Call State Service.
Service | CRD or annotations? | Port | Endpoint/Selector | Metrics update interval |
---|---|---|---|---|
Call State Service | Supports both CRD and annotations | 11900 | http://<pod-ipaddress>:11900/metrics | 30 seconds |
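To make the endpoint in the table above concrete, the following Python sketch fetches /metrics from a pod on port 11900 and parses the Prometheus text exposition format into (name, labels, value) tuples. The pod IP is a placeholder. The parser is deliberately minimal: it skips # HELP / # TYPE comments, ignores sample timestamps, and does not unescape label values. For anything beyond ad-hoc inspection, a maintained parser such as prometheus_client.parser.text_string_to_metric_families is a safer choice.

```python
import re
import urllib.request

# Matches "name{labels} value [timestamp]" or "name value [timestamp]".
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)(?:\s+\S+)?$'
)
# Matches one label pair; escaped characters inside the value are kept as-is.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Return a list of (metric_name, labels_dict, float_value) tuples.

    Minimal sketch: comment lines and blank lines are skipped, and lines that
    do not look like samples are ignored rather than raising an error.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def scrape(pod_ip, port=11900, timeout=5.0):
    """Fetch http://<pod_ip>:<port>/metrics and parse it (requires network access)."""
    url = f"http://{pod_ip}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_exposition(resp.read().decode("utf-8"))


# Example (hypothetical pod IP):
# for name, labels, value in scrape("10.0.0.12"):
#     print(name, labels, value)
```

Because float() accepts "+Inf" and "NaN", histogram buckets with le="+Inf" and non-finite sample values parse without special handling.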
Metrics
Voice Call State Service exposes Genesys-defined metrics that are specific to Call State Service, as well as some standard Kafka metrics. You can query Prometheus directly to see all the metrics that Call State Service exposes. The following metrics are likely to be particularly useful. Genesys does not commit to maintaining other currently available Call State Service metrics that are not documented on this page.
Metric | Description | Unit | Type | Indicator of |
---|---|---|---|---|
callthread_ | Number of monitored call threads. | N/A | counter | Saturation |
callthread_ | Status of the envoy proxy: -1 - error | N/A | gauge | |
callthread_ | Health level of the agent node: -1 - error | N/A | gauge | |
callthread_ | Generic error during health check. | N/A | gauge | |
callthread_ | Current Redis connection state: -1 - error | N/A | gauge | Errors |
http_ | HTTP client time from request to response, in seconds. | seconds | histogram | |
http_ | The number of HTTP client responses received. | N/A | counter | |
kafka_ | Number of messages received from Kafka. | N/A | counter | Traffic |
kafka_ | Number of Kafka consumer errors. | N/A | counter | Errors |
kafka_ | Consumer latency: the time difference between when a message is produced and when it is consumed, that is, the time when the consumer received the message minus the time when the producer produced it. | not specified | histogram | Latency |
kafka_ | Number of Kafka consumer rebalance events. | N/A | counter | |
kafka_ | Current state of the Kafka consumer. | N/A | gauge | |
kafka_ | Number of messages received from Kafka. | N/A | counter | Traffic |
kafka_ | Number of Kafka producer pending events. | N/A | gauge | Saturation |
kafka_ | Age of the oldest producer pending event, in seconds. | seconds | gauge | |
kafka_ | Number of Kafka producer errors. | N/A | counter | Errors |
kafka_ | Current state of the Kafka producer. | N/A | gauge | |
log_ | Total amount of log output, in bytes. | bytes | counter | |
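Because the table lists only metric-name prefixes, one way to see the full names in your own environment is the Prometheus HTTP API endpoint /api/v1/label/__name__/values, which returns every metric name Prometheus has stored and accepts an optional match[] series selector. The Python sketch below assumes a reachable Prometheus server; the server URL and any selector you pass (for example, one matching your Call State Service pods) are hypothetical and depend on your deployment. Prefixes such as kafka_ and http_ are generic, so without a selector the filtered list may also contain metrics from other services.

```python
import json
import urllib.parse
import urllib.request

# Prefixes taken from the metrics table above (full names are deployment-specific).
CSS_PREFIXES = ("callthread_", "http_", "kafka_", "log_")


def build_names_url(prometheus_url, selector=None):
    """Build the label-values URL for __name__, optionally restricted by match[]."""
    url = prometheus_url.rstrip("/") + "/api/v1/label/__name__/values"
    if selector:
        url += "?" + urllib.parse.urlencode({"match[]": selector})
    return url


def list_metric_names(prometheus_url, selector=None, timeout=10.0):
    """Return the metric names Prometheus reports (requires network access)."""
    url = build_names_url(prometheus_url, selector)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload}")
    return payload["data"]


def filter_metric_names(names, prefixes=CSS_PREFIXES):
    """Keep names starting with one of the given prefixes, sorted alphabetically."""
    return sorted(n for n in names if n.startswith(prefixes))


# Example (hypothetical server URL and pod-name pattern):
# names = list_metric_names("http://prometheus:9090", '{pod=~"call-state-.*"}')
# print(filter_metric_names(names))
```

Keeping URL construction and filtering separate from the network call makes both easy to check offline.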
Alerts
The following alerts are defined for Call State Service.
Alert | Severity | Description | Based on | Threshold |
---|---|---|---|---|
Kafka events latency is too high | Critical | Actions: | kafka_consumer_latency_bucket | Latency for more than 5% of messages is more than 0.5 seconds for topic {{ $labels.topic }}. |
Too many Kafka consumer failed health checks | Warning | Actions: | kafka_consumer_error_total | Health check failed more than 10 times in 5 minutes for Kafka consumer for topic {{ $labels.topic }}. |
Too many Kafka consumer request timeouts | Warning | Actions: | kafka_consumer_error_total | More than 10 request timeouts appeared in 5 minutes for Kafka consumer for topic {{ $labels.topic }}. |
Too many Kafka consumer crashes | Critical | Actions: | kafka_consumer_error_total | More than 3 Kafka consumer crashes in 5 minutes for topic {{ $labels.topic }}. |
Pod status Failed | Warning | Actions: | kube_pod_status_phase | Pod {{ $labels.pod }} is in Failed state. |
Pod status Unknown | Warning | Actions: | kube_pod_status_phase | Pod {{ $labels.pod }} is in Unknown state for 5 minutes. |
Pod status Pending | Warning | Actions: | kube_pod_status_phase | Pod {{ $labels.pod }} is in Pending state for 5 minutes. |
Pod status NotReady | Critical | Actions: | kube_pod_status_ready | Pod {{ $labels.pod }} is in NotReady status for 5 minutes. |
Container restarted repeatedly | Critical | Actions: | kube_pod_container_status_restarts_total | Container {{ $labels.container }} was restarted 5 or more times within 15 minutes. |
Max replicas is not sufficient for 5 mins | Critical | The desired number of replicas is higher than the current available replicas for the past 5 minutes. | kube_statefulset_replicas, kube_statefulset_status_replicas | The desired number of replicas is higher than the current available replicas for the past 5 minutes. |
Kafka not available | Critical | Actions: | kafka_producer_state, kafka_consumer_state | Kafka is not available for pod {{ $labels.pod }} for 5 consecutive minutes. |
Redis not available | Critical | Actions: | callthread_redis_state | Redis is not available for pod {{ $labels.pod }} for 5 consecutive minutes. |
Pod CPU greater than 65% | Warning | High CPU load for pod {{ $labels.pod }}. | container_cpu_usage_seconds_total, container_spec_cpu_period | Container {{ $labels.container }} CPU usage exceeded 65% for 5 minutes. |
Pod CPU greater than 80% | Critical | Critical CPU load for pod {{ $labels.pod }}. | container_cpu_usage_seconds_total, container_spec_cpu_period | Container {{ $labels.container }} CPU usage exceeded 80% for 5 minutes. |
Pod memory greater than 65% | Warning | High memory usage for pod {{ $labels.pod }}. | container_memory_working_set_bytes, kube_pod_container_resource_requests_memory_bytes | Container {{ $labels.container }} memory usage exceeded 65% for 5 minutes. |
Pod memory greater than 80% | Critical | Critical memory usage for pod {{ $labels.pod }}. | container_memory_working_set_bytes, kube_pod_container_resource_requests_memory_bytes | Container {{ $labels.container }} memory usage exceeded 80% for 5 minutes. |
Too many Kafka pending events | Critical | Actions: | kafka_producer_queue_depth | Too many Kafka producer pending events for service {{ $labels.container }} (more than 100 in 5 minutes). |
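The "Kafka events latency is too high" alert is based on kafka_consumer_latency_bucket, a cumulative histogram: each bucket counts messages whose latency is less than or equal to its upper bound (le). The share of messages slower than 0.5 seconds is therefore one minus the ratio of the le="0.5" bucket to the le="+Inf" bucket over some window. The Python sketch below shows that arithmetic under two assumptions that this page does not state: the histogram has a bucket boundary at exactly 0.5 seconds, and the evaluation window is 5 minutes. The PromQL string is an illustrative approximation, not the rule that Genesys ships.

```python
import math

# Thresholds from the alert description above.
LATENCY_LIMIT_S = 0.5       # "more than 0.5 seconds"
SLOW_FRACTION_LIMIT = 0.05  # "more than 5% of messages"

# Illustrative PromQL (assumed 5m window and an le="0.5" bucket boundary).
LATENCY_ALERT_PROMQL = (
    '1 - ('
    'sum by (topic) (rate(kafka_consumer_latency_bucket{le="0.5"}[5m]))'
    ' / '
    'sum by (topic) (rate(kafka_consumer_latency_bucket{le="+Inf"}[5m]))'
    f') > {SLOW_FRACTION_LIMIT}'
)


def slow_fraction(bucket_increase, limit=LATENCY_LIMIT_S):
    """Fraction of messages whose latency exceeded `limit` seconds.

    `bucket_increase` maps each bucket upper bound (float, math.inf for the
    +Inf bucket) to the increase of its cumulative count over the window.
    """
    total = bucket_increase[math.inf]
    if total == 0:
        return 0.0  # no messages in the window, so nothing was slow
    within_limit = bucket_increase[limit]  # cumulative: latency <= limit
    return 1.0 - within_limit / total


def latency_alert_fires(bucket_increase):
    """True when more than 5% of messages took longer than 0.5 seconds."""
    return slow_fraction(bucket_increase) > SLOW_FRACTION_LIMIT


# Example: 90 of 100 messages arrived within 0.5 s, so 10% were slow.
# latency_alert_fires({0.1: 50, 0.5: 90, 1.0: 98, math.inf: 100})
```

Working from bucket ratios rather than histogram_quantile avoids interpolation inside buckets, which is why an exact bucket boundary at the threshold matters.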