Voice RQ Service metrics and alerts

From Genesys Documentation
This topic is part of the manual Voice Microservices Private Edition Guide for the Current version of Voice Microservices.


Find the metrics Voice RQ Service exposes and the alerts defined for Voice RQ Service.

Service: Voice RQ Service
CRD or annotations? Supports both CRD and annotations
Port: 12000
Endpoint/Selector: http://<pod-ipaddress>:12000/metrics
Metrics update interval: 30 seconds
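If you scrape with annotations rather than the CRD, a pod annotation block along the following lines points Prometheus at the endpoint above. This is an illustrative sketch: the annotation keys assume the common prometheus.io convention used by many scrape configurations, not values confirmed for this chart.

```yaml
# Illustrative only: assumes your Prometheus scrape config honors
# the common prometheus.io annotation convention.
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "12000"
    prometheus.io/path: "/metrics"
```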

See details about:

  • Metrics
  • Alerts

Metrics

You can query Prometheus directly to see all the metrics that the Voice RQ Service exposes. The following metrics are likely to be particularly useful. Genesys does not commit to maintaining other currently available Voice RQ Service metrics that are not documented on this page.

rqnode_clients
Number of clients connected.
Unit: N/A
Type: gauge
Indicator of: Traffic

rqnode_streams
Number of active streams.
Unit: N/A
Type: gauge
Indicator of: Traffic

rqnode_xreads
Number of XREAD requests received.
Unit: N/A
Type: counter
Indicator of: Traffic

rqnode_xadds
Number of XADD requests received.
Unit: N/A
Type: counter
Indicator of: Traffic

rqnode_redis_state
Current Redis connection state.
Unit: N/A
Type: gauge
Indicator of: Errors

rqnode_redis_disconnects
Number of Redis disconnects that occurred for the RQ node.
Type: counter
Indicator of: Errors

rqnode_consul_leader_error
Number of errors received from Consul during the leadership process.
Unit: N/A
Type: counter
Indicator of: Errors

rqnode_active_master
Indicates whether the service master role is active.
Unit: N/A
Type: gauge
Indicator of: Saturation

rqnode_active_backup
Indicates whether the service backup role is active.
Unit: N/A
Type: gauge
Indicator of: Saturation

rqnode_read_latency
RQ latency: the time between when an event is added to Redis and when it is read via XREAD.
Type: histogram
Label: le, healthcheck
Indicator of: Latency

rqnode_add_latency
RQ latency: the time between when a message is received and when it is added to the list.
Type: histogram
Label: le, healthcheck
Indicator of: Latency

rqnode_redis_latency
Latency caused by Redis reads and writes.
Type: histogram
Label: le
Indicator of: Latency
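As an illustration of querying these metrics directly in Prometheus, the PromQL sketches below use the metric names documented above; the rate windows and the `_bucket` suffix (the standard Prometheus histogram convention) are assumptions, not values prescribed by the service.

```promql
# Current number of active streams across all RQ pods
sum(rqnode_streams)

# Per-pod XADD request rate over the last 5 minutes
rate(rqnode_xadds[5m])

# Approximate 95th-percentile read latency from the histogram buckets
histogram_quantile(0.95, sum(rate(rqnode_read_latency_bucket[5m])) by (le))
```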


Alerts

The following alerts are defined for Voice RQ Service.

Alert: Number of Redis streams is too high
Severity: Warning
Description: Too many active sessions.
Actions:
  • Check whether the horizontal pod autoscaler has triggered and whether the maximum number of pods has been reached.
  • Check the number of voice, digital, and callback calls in the system.
Based on: rqnode_streams
Threshold: More than 10000 active streams running.


Alert: Redis disconnected for 5 minutes
Severity: Warning
Description: Redis is not available for the pod {{ $labels.pod }}.
Actions:
  • If the alarm is triggered for multiple services, make sure there are no issues with Redis, and then restart Redis.
  • If the alarm is triggered only for the pod {{ $labels.pod }}, check whether there is an issue with the pod.
Based on: redis_state
Threshold: Redis is not available for the pod {{ $labels.pod }} for 5 minutes.


Alert: Redis disconnected for 10 minutes
Severity: Critical
Description: Redis is not available for the pod {{ $labels.pod }}.
Actions:
  • If the alarm is triggered for multiple services, make sure there are no issues with Redis, and then restart Redis.
  • If the alarm is triggered only for the pod {{ $labels.pod }}, check whether there is an issue with the pod.
Based on: redis_state
Threshold: Redis is not available for the pod {{ $labels.pod }} for 10 minutes.


Alert: Pod failed
Severity: Warning
Description: Pod {{ $labels.pod }} failed.
Actions:
  • One of the containers in the pod has entered the Failed state. Check the Kibana logs for the reason.
Based on: kube_pod_status_phase
Threshold: Pod {{ $labels.pod }} is in the Failed state.


Alert: Pod Unknown state
Severity: Warning
Description: Pod {{ $labels.pod }} is in the Unknown state.
Actions:
  • If the alarm is triggered for multiple services, make sure there are no issues with the Kubernetes cluster.
  • If the alarm is triggered only for the pod {{ $labels.pod }}, check whether the image is correct and whether the container is starting up.
Based on: kube_pod_status_phase
Threshold: Pod {{ $labels.pod }} is in the Unknown state for 5 minutes.


Alert: Pod Pending state
Severity: Warning
Description: Pod {{ $labels.pod }} is in the Pending state.
Actions:
  • If the alarm is triggered for multiple services, make sure the Kubernetes nodes where the pod is running are alive in the cluster.
  • If the alarm is triggered only for the pod {{ $labels.pod }}, check the health of the pod.
Based on: kube_pod_status_phase
Threshold: Pod {{ $labels.pod }} is in the Pending state for 5 minutes.


Alert: Pod not ready for 10 minutes
Severity: Critical
Description: Pod {{ $labels.pod }} is in the NotReady state.
Actions:
  • If this alarm is triggered, check whether CPU is available for the pods.
  • Check whether the pod's port is up and serving requests.
Based on: kube_pod_status_ready
Threshold: Pod {{ $labels.pod }} is in the NotReady state for 10 minutes.


Alert: Container restarted repeatedly
Severity: Critical
Description: Container {{ $labels.container }} was repeatedly restarted.
Actions:
  • One of the containers in the pod has entered the Failed state. Check the Kibana logs for the reason.
Based on: kube_pod_container_status_restarts_total
Threshold: Container {{ $labels.container }} was restarted 5 or more times within 15 minutes.


Alert: Pod memory greater than 65%
Severity: Warning
Description: High memory usage for pod {{ $labels.pod }}.
Actions:
  • Check whether the horizontal pod autoscaler has triggered and whether the maximum number of pods has been reached.
  • Check Grafana for abnormal load.
  • Collect the service logs and raise an investigation ticket.
Based on: container_memory_working_set_bytes, kube_pod_container_resource_requests_memory_bytes
Threshold: Container {{ $labels.container }} memory usage exceeded 65% for 5 minutes.


Alert: Pod memory greater than 80%
Severity: Critical
Description: Critical memory usage for pod {{ $labels.pod }}.
Actions:
  • Check whether the horizontal pod autoscaler has triggered and whether the maximum number of pods has been reached.
  • Check Grafana for abnormal load.
  • Restart the service.
  • Collect the service logs and raise an investigation ticket.
Based on: container_memory_working_set_bytes, kube_pod_container_resource_requests_memory_bytes
Threshold: Container {{ $labels.container }} memory usage exceeded 80% for 5 minutes.


Alert: Pod CPU greater than 65%
Severity: Warning
Description: High CPU load for pod {{ $labels.pod }}.
Actions:
  • Check whether the horizontal pod autoscaler has triggered and whether the maximum number of pods has been reached.
  • Check Grafana for abnormal load.
  • Collect the service logs and raise an investigation ticket.
Based on: container_cpu_usage_seconds_total, container_spec_cpu_period
Threshold: Container {{ $labels.container }} CPU usage exceeded 65% for 5 minutes.


Alert: Pod CPU greater than 80%
Severity: Critical
Description: Critical CPU load for pod {{ $labels.pod }}.
Actions:
  • Check whether the horizontal pod autoscaler has triggered and whether the maximum number of pods has been reached.
  • Check Grafana for abnormal load.
  • Restart the service.
  • Collect the service logs and raise an investigation ticket.
Based on: container_cpu_usage_seconds_total, container_spec_cpu_period
Threshold: Container {{ $labels.container }} CPU usage exceeded 80% for 5 minutes.
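As a sketch of how the first alert in this table might be expressed as a Prometheus alerting rule: the group name, alert name, `for` duration, and annotation text below are illustrative assumptions, not the rule shipped with Voice RQ Service; only the expression and severity come from the table above.

```yaml
# Illustrative sketch only; not the rule shipped with Voice RQ Service.
groups:
  - name: voice-rq-service        # hypothetical group name
    rules:
      - alert: RQStreamsTooHigh   # hypothetical alert name
        expr: rqnode_streams > 10000
        for: 5m                   # assumed evaluation duration
        labels:
          severity: warning
        annotations:
          summary: "Too many active streams ({{ $value }}) on {{ $labels.pod }}"
```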