Cargo query
Page | Alert | Severity | AlertDescription | BasedOn | Threshold |
---|---|---|---|---|---|
Draft:GVP/Current/GVPPEGuide/GVP MCP Metrics | NGI_LOG_FETCH_RESOURCE_TIMEOUT | MEDIUM | Number of VXMLi fetch timeouts exceeded limit | gvp_mcp_log_parser_eror_total {LogID="40026",endpoint="mcplog"...} | 1min |
Draft:GVP/Current/GVPPEGuide/GVP MCP Metrics | NGI_LOG_PARSE_ERROR | WARNING | Number of VXMLi parse errors exceeded limit | gvp_mcp_log_parser_eror_total {LogID="40028",endpoint="mcplog"...} | 1min |
Draft:GVP/Current/GVPPEGuide/Reporting Server Metrics | ContainerCPUreached80percent | HIGH | The trigger will flag an alarm when the RS container CPU utilization goes beyond 80% for 15 mins | container_cpu_usage_seconds_total, container_spec_cpu_quota, container_spec_cpu_period | 15mins |
Draft:GVP/Current/GVPPEGuide/Reporting Server Metrics | ContainerMemoryUsage80percent | HIGH | The trigger will flag an alarm when the RS container Memory utilization goes beyond 80% for 15 mins | container_memory_usage_bytes, kube_pod_container_resource_limits_memory_bytes | 15mins |
Draft:GVP/Current/GVPPEGuide/Reporting Server Metrics | ContainerRestartedRepeatedly | CRITICAL | The trigger will flag an alarm when the RS or RS SNMP container gets restarted 5 or more times within 15 mins | kube_pod_container_status_restarts_total | 15mins |
Draft:GVP/Current/GVPPEGuide/Reporting Server Metrics | InitContainerFailingRepeatedly | CRITICAL | The trigger will flag an alarm when the RS init container fails 5 or more times within 15 mins | kube_pod_init_container_status_restarts_total | 15mins |
Draft:GVP/Current/GVPPEGuide/Reporting Server Metrics | PodStatusNotReady | CRITICAL | The trigger will flag an alarm when the RS pod status is Not ready for 30 mins; this is controlled through the override-value.yaml file. | kube_pod_status_ready | 30mins |
Draft:GVP/Current/GVPPEGuide/Reporting Server Metrics | PVC50PercentFilled | HIGH | This trigger will flag an alarm when the RS PVC size is 50% filled | kubelet_volume_stats_used_bytes, kubelet_volume_stats_capacity_bytes | 15mins |
Draft:GVP/Current/GVPPEGuide/Reporting Server Metrics | PVC80PercentFilled | CRITICAL | This trigger will flag an alarm when the RS PVC size is 80% filled | kubelet_volume_stats_used_bytes, kubelet_volume_stats_capacity_bytes | 5mins |
Draft:GVP/Current/GVPPEGuide/Reporting Server Metrics | RSQueueSizeCritical | HIGH | The trigger will flag an alarm when RS JMS message queue size goes beyond 15000 (3GB approx. backlog) for 15 mins | rsQueueSize | 15mins |
Draft:GVP/Current/GVPPEGuide/Resource Manager Metrics | ContainerCPUreached80percentForRM0 | HIGH | The trigger will flag an alarm when the RM container CPU utilization goes beyond 80% for 15 mins | container_cpu_usage_seconds_total, container_spec_cpu_quota, container_spec_cpu_period | 15mins |
Draft:GVP/Current/GVPPEGuide/Resource Manager Metrics | ContainerCPUreached80percentForRM1 | HIGH | The trigger will flag an alarm when the RM container CPU utilization goes beyond 80% for 15 mins | container_cpu_usage_seconds_total, container_spec_cpu_quota, container_spec_cpu_period | 15mins |
Draft:GVP/Current/GVPPEGuide/Resource Manager Metrics | ContainerMemoryUsage80percentForRM0 | HIGH | The trigger will flag an alarm when the RM container Memory utilization goes beyond 80% for 15 mins | container_memory_rss, kube_pod_container_resource_limits_memory_bytes | 15mins |
Draft:GVP/Current/GVPPEGuide/Resource Manager Metrics | ContainerMemoryUsage80percentForRM1 | HIGH | The trigger will flag an alarm when the RM container Memory utilization goes beyond 80% for 15 mins | container_memory_rss, kube_pod_container_resource_limits_memory_bytes | 15mins |
Draft:GVP/Current/GVPPEGuide/Resource Manager Metrics | ContainerRestartedRepeatedly | CRITICAL | The trigger will flag an alarm when the RM or RM SNMP container gets restarted 5 or more times within 15 mins | kube_pod_container_status_restarts_total | 15 mins |
Draft:GVP/Current/GVPPEGuide/Resource Manager Metrics | InitContainerFailingRepeatedly | CRITICAL | The trigger will flag an alarm when the RM init container fails 5 or more times within 15 mins. | kube_pod_init_container_status_restarts_total | 15 mins |
Draft:GVP/Current/GVPPEGuide/Resource Manager Metrics | MCPPortsExceeded | HIGH | All the MCP ports in the MCP LRG have been exhausted | gvp_rm_log_parser_eror_total | 1min |
Draft:GVP/Current/GVPPEGuide/Resource Manager Metrics | PodStatusNotReady | CRITICAL | The trigger will flag an alarm when the RM pod status is Not ready for 30 mins; this is controlled by override-value.yaml. | kube_pod_status_ready | 30mins |
Draft:GVP/Current/GVPPEGuide/Resource Manager Metrics | RM Service Down | CRITICAL | RM pods are not in the Ready state and the RM service is not available | kube_pod_container_status_running | 0 |
Draft:GVP/Current/GVPPEGuide/Resource Manager Metrics | RMConfigServerConnectionLost | HIGH | RM lost connection to GVP Configuration Server for 5mins. | gvp_rm_log_parser_warn_total | 5 mins |
Draft:GVP/Current/GVPPEGuide/Resource Manager Metrics | RMInterNodeConnectivityBroken | HIGH | Inter-node connectivity between RM nodes is lost for 5mins. | gvp_rm_log_parser_warn_total | 5 mins |
Draft:GVP/Current/GVPPEGuide/Resource Manager Metrics | RMMatchingIVRTenantNotFound | MEDIUM | Matching IVR profile tenant could not be found for 2mins | gvp_rm_log_parser_eror_total | 2mins |
Draft:GVP/Current/GVPPEGuide/Resource Manager Metrics | RMResourceAllocationFailed | MEDIUM | RM resource allocation failed for 1 min | gvp_rm_log_parser_eror_total | 1min |
Draft:GVP/Current/GVPPEGuide/Resource Manager Metrics | RMServiceDegradedTo50Percentage | HIGH | One of the RM containers is not in a running state for 5 mins | kube_pod_container_status_running | 5mins |
Draft:GVP/Current/GVPPEGuide/Resource Manager Metrics | RMSocketInterNodeError | HIGH | RM Inter node Socket Error for 5mins. | gvp_rm_log_parser_eror_total | 5mins |
Draft:GVP/Current/GVPPEGuide/Resource Manager Metrics | RMTotal4XXErrorForINVITE | MEDIUM | The RM MIB counter stats are collected every 60 seconds; if the counter total4xxInviteSent increments from its previous value by 10 within 60 seconds, the trigger flags an alarm. | rmTotal4xxInviteSent | 1min |
Draft:GVP/Current/GVPPEGuide/Resource Manager Metrics | RMTotal5XXErrorForINVITE | HIGH | The RM MIB counter stats are collected every 30 seconds; if the counter total5xxInviteSent increments from its previous value by 5 within 5 minutes, the trigger flags an alarm. | rmTotal5xxInviteSent | 5 mins |
Draft:GWS/Current/GWSPEGuide/GWSMetrics | CPUThrottling | Critical | Containers are being throttled more than 1 time per second. | container_cpu_cfs_throttled_periods_total | 1 |
Draft:GWS/Current/GWSPEGuide/GWSMetrics | gws_high_500_responces_java | Critical | Too many 500 responses. | gws_responses_total | 10 |
Draft:GWS/Current/GWSPEGuide/GWSMetrics | gws_high_5xx_responces_count | Critical | Too many 5xx responses. | gws_responses_total | 60 |
Draft:GWS/Current/GWSPEGuide/GWSMetrics | gws_high_cpu_usage | Warning | High container CPU usage. | container_cpu_usage_seconds_total | 300% |
Draft:GWS/Current/GWSPEGuide/GWSMetrics | gws_high_jvm_gc_pause_seconds_count | Critical | JVM garbage collection occurs too often. | jvm_gc_pause_seconds_count | 10 |
Draft:GWS/Current/GWSPEGuide/GWSMetrics | gws_jvm_threads_deadlocked | Critical | Deadlocked JVM threads exist. | jvm_threads_deadlocked | 0 |
Draft:GWS/Current/GWSPEGuide/GWSMetrics | netstat_Tcp_RetransSegs | Warning | High number of TCP RetransSegs (retransmitted segments). | node_netstat_Tcp_RetransSegs | 2000 |
Draft:GWS/Current/GWSPEGuide/GWSMetrics | total_count_of_errors_during_context_initialization | Warning | Total count of errors during context initialization. | gws_context_error_total | 1200 |
Draft:GWS/Current/GWSPEGuide/GWSMetrics | total_count_of_errors_in_PSDK_connections | Warning | Total count of errors in PSDK connections. | psdk_conn_error_total | 3 |
Draft:GWS/Current/GWSPEGuide/WorkspaceMetrics | DesiredPodsDontMatchSpec | Critical | The Workspace Service deployment doesn't have the desired number of replicas. | kube_deployment_status_replicas_available, kube_deployment_spec_replicas | Fired when the number of available replicas does not equal the configured number. |
Draft:GWS/Current/GWSPEGuide/WorkspaceMetrics | gws_app_workspace_incoming_requests | Critical | High rate of incoming requests from Workspace Web Edition. | gws_app_workspace_incoming_requests | 10 |
Draft:GWS/Current/GWSPEGuide/WorkspaceMetrics | gws_high_500_responces_workspace | Critical | The Workspace Service has too many 500 responses. | gws_app_workspace_requests | 10 |
Draft:GWS/Current/GWSPEGuide/WorkspaceMetrics | gws_high_cpu_usage | Warning | High container CPU usage. | container_cpu_usage_seconds_total | 300% |
Draft:GWS/Current/GWSPEGuide/WorkspaceMetrics | gws_high_nodejs_eventloop_lag_seconds | Critical | The Node.js event loop is too slow. | nodejs_eventloop_lag_seconds | 0.2 |
Draft:PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES-NODE-JS-DELAY-WARNING | Warning | Triggers if the NodeJS event loop lag becomes excessive. This indicates significant resource and performance issues with the deployment. | application_ccecp_nodejs_eventloop_lag_seconds | Triggered when the event loop lag is greater than 5 milliseconds for a period exceeding 5 minutes. |
Draft:PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_CB_ENQUEUE_LIMIT_REACHED | Info | GES is throttling callbacks to a given phone number. | CB_ENQUEUE_LIMIT_REACHED | Triggered when GES has begun throttling callbacks to a given number within the past 2 minutes. |
Draft:PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_CB_SUBMIT_FAILED | Info | GES has failed to submit a callback to ORS. | CB_SUBMIT_FAILED | Triggered when GES has failed to submit a callback to ORS in the past 2 minutes for any reason. |
Draft:PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_CB_TTL_LIMIT_REACHED | Info | GES is throttling callbacks for a specific tenant. | CB_TTL_LIMIT_REACHED | Triggered when GES has started throttling callbacks within the past 2 minutes. |
Draft:PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_CPU_USAGE | Info | GES has high CPU usage for 1 minute. | ges_process_cpu_seconds_total | Triggered when the average CPU usage (measured by ges_process_cpu_seconds_total) is greater than 90% for 1 minute. |
Draft:PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_DNS_FAILURE | Warning | A GES pod has encountered difficulty resolving DNS requests. | DNS_FAILURE | Triggered when GES encounters any DNS failures within the last 30 minutes. |
Draft:PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_GWS_AUTH_DOWN | Warning | Connection to the Genesys Authentication Service is down. | GWS_AUTH_STATUS | Triggered when the connection to the Genesys Authentication Service is down for 5 minutes. |
Draft:PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_GWS_CONFIG_DOWN | Warning | Connection to the GWS Configuration Service is down. | GWS_CONFIG_STATUS | Triggered when the connection to the GWS Configuration Service is down. |
Draft:PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_GWS_ENVIRONMENT_DOWN | Warning | Connection to the GWS Environment Service is down. | GWS_ENV_STATUS | Triggered when the connection to the GWS Environment Service is down. |
Draft:PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_GWS_INCORRECT_CLIENT_CREDENTIALS | Warning | The GWS client credentials provided to GES are incorrect. | GWS_INCORRECT_CLIENT_CREDENTIALS | Triggered when GWS has had any issue with the GES client credentials in the last 5 minutes. |
Draft:PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_GWS_SERVER_ERROR | Warning | GES has encountered server or connection errors with GWS. | GWS_SERVER_ERROR | Triggered when there has been a GWS server error in the past 5 minutes. |
Draft:PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_HEALTH | Critical | One or more downstream components (PostGres, Config Server, GWS, ORS) are down. '''Note:''' Because GES goes into a crash loop when Redis is down, this does not fire when Redis is down. | GES_HEALTH | Triggered when any component is down for any length of time. |
Draft:PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_HTTP_400_POD | Info | An individual GES pod is returning excessive HTTP 400 results. | ges_http_failed_requests_total, http_400_tolerance | Triggered when two or more HTTP 400 results are returned from a pod within a 5-minute period. |
Draft:PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_HTTP_401_POD | Info | An individual GES pod is returning excessive HTTP 401 results. | ges_http_failed_requests_total, http_401_tolerance | Triggered when two or more HTTP 401 results are returned from a pod within a 5-minute period. |
Draft:PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_HTTP_404_POD | Info | An individual GES pod is returning excessive HTTP 404 results. | ges_http_failed_requests_total, http_404_tolerance | Triggered when two or more HTTP 404 results are returned from a pod within a 5-minute period. |
Draft:PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_HTTP_500_POD | Info | An individual GES pod is returning excessive HTTP 500 results. | ges_http_failed_requests_total, http_500_tolerance | Triggered when two or more HTTP 500 results are returned from a pod within a 5-minute period. |
Draft:PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_INVALID_CONTENT_LENGTH | Info | Fires if GES encounters any incoming requests that exceed the maximum content length of 10 MB on the internal port or 500 KB on the external, public-facing port. | INVALID_CONTENT_LENGTH, invalid_content_length_tolerance | Triggered when one instance of a message with an invalid length is received. Silenced after 2 minutes. |
Draft:PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_LOGGING_FAILURE | Warning | GES has failed to write a message to the log. | LOGGING_FAILURE | Triggered when there are any failures writing to the logs. Silenced after 1 minute. |
Draft:PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_MEMORY_USAGE | Info | GES has high memory usage for a period of 90 seconds. | ges_nodejs_heap_space_size_used_bytes, ges_nodejs_heap_space_size_available_bytes | Triggered when memory usage (measured as a ratio of Used Heap Space vs Available Heap Space) is above 80% for a 90-second interval. |
Draft:PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_NEXUS_ACCESS_FAILURE | Warning | GES has been having difficulties contacting Nexus. This alert is only relevant for customers who leverage the Push Notification feature in Genesys Callback. | NEXUS_ACCESS_FAILURE | Triggered when GES has failed to connect or communicate with Nexus more than 30 times over the last hour. |
Draft:PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_NOT_READY_CRITICAL | Critical | GES pods are not in the Ready state. Indicative of issues with the Redis connection or other problems with the Helm deployment. | kube_pod_container_status_ready | Triggered when more than 50% of GES pods have not been in a Ready state for 5 minutes. |
Draft:PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_NOT_READY_WARNING | Warning | GES pods are not in the Ready state. Indicative of issues with the Redis connection or other problems with the Helm deployment. | kube_pod_container_status_ready | Triggered when 25% (or more) of GES pods have not been in a Ready state for 10 minutes. |
Draft:PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_ORS_REDIS_DOWN | Critical | Connection to ORS_REDIS is down. | ORS_REDIS_STATUS | Triggered when the ORS_REDIS connection is down for 5 consecutive minutes. |
Draft:PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_PODS_RESTART | Critical | GES pods have been excessively crashing and restarting. | kube_pod_container_status_restarts_total | Triggered when there have been more than five pod restarts in the past 15 minutes. |
Draft:PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_RBAC_CREATE_VQ_PROXY_ERROR | Info | Fires if there are issues with GES managing VQ Proxy Objects. | RBAC_CREATE_VQ_PROXY_ERROR, rbac_create_vq_proxy_error_tolerance | Triggered when there are at least 1000 instances of issues managing VQ Proxy objects within a 10-minute period. |
Draft:PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_SLOW_HTTP_RESPONSE_TIME | Warning | Fired if the average response time for incoming requests begins to lag. | ges_http_request_duration_seconds_sum, ges_http_request_duration_seconds_count | Triggered when the average response time for incoming requests is above 1.5 seconds for a sustained period of 15 minutes. |
Draft:PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_UNCAUGHT_EXCEPTION | Warning | There has been an uncaught exception within GES. | UNCAUGHT_EXCEPTION | Triggered when GES encounters any uncaught exceptions. Silenced after 1 minute. |
Draft:PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_UP | Critical | Fires when fewer than two GES pods have been up for the last 15 minutes. | | Triggered when fewer than two GES pods are up for 15 consecutive minutes. |
Draft:PEC-DC/Current/DCPEGuide/DCMetrics | Memory usage is above 3000 Mb | Critical | Triggered when the memory usage on this pod is above 3000 Mb for 15 minutes. | nexus_process_resident_memory_bytes | For 15 minutes |
Draft:PEC-DC/Current/DCPEGuide/DCMetrics | Nexus error rate | Critical | Triggered when the error rate on this pod is greater than 20% for 15 minutes. | nexus_errors_total, nexus_request_total | For 15 minutes |
Draft:PEC-IWD/Current/IWDPEGuide/IWD metrics and alerts | Database connections above 75 | HIGH | Triggered when the number of pod database connections is above 75. | Default number of connections: 75 | |
Draft:PEC-IWD/Current/IWDPEGuide/IWD metrics and alerts | IWD DB errors | CRITICAL | Triggered when IWD experiences more than 2 errors within 1 minute during database operations. | Default number of errors: 2 | |
Draft:PEC-IWD/Current/IWDPEGuide/IWD metrics and alerts | IWD error rate | CRITICAL | Triggered when the number of errors in IWD exceeds the threshold over a 15-minute period. | Default number of errors: 2 | |
Draft:PEC-IWD/Current/IWDPEGuide/IWD metrics and alerts | Memory usage is above 3000 Mb | CRITICAL | Triggered when the pod memory usage is above 3000 MB. | Default memory usage: 3000 MB | |
Draft:PEC-OU/Current/CXCPEGuide/APIAMetrics | CXC-API-LatencyHigh | HIGH | Triggered when the latency for API responses is beyond the defined threshold. | 2500ms for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/APIAMetrics | CXC-API-Redis-Connection-Failed | HIGH | Triggered when the connection to redis fails for more than 1 minute. | 1m | |
Draft:PEC-OU/Current/CXCPEGuide/APIAMetrics | CXC-CPUUsage | HIGH | Triggered when the CPU utilization of a pod is beyond the threshold | 300% for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/APIAMetrics | CXC-EXT-Ingress-Error-Rate | HIGH | Triggered when the Ingress error rate is above the specified threshold. | 20% for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/APIAMetrics | CXC-MemoryUsage | HIGH | Triggered when the memory utilization of a pod is beyond the threshold. | 70% for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/APIAMetrics | CXC-MemoryUsagePD | HIGH | Triggered when the memory usage of a pod is above the critical threshold. | 90% for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/APIAMetrics | CXC-PodNotReadyCount | HIGH | Triggered when the number of pods ready for a CX Contact deployment is less than or equal to the threshold. | 1 for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/APIAMetrics | CXC-PodRestartsCount | HIGH | Triggered when the restart count for a pod is beyond the threshold. | 1 for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/APIAMetrics | CXC-PodRestartsCountPD | HIGH | Triggered when the restart count is beyond the critical threshold. | 5 for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/APIAMetrics | CXC-PodsNotReadyPD | HIGH | Triggered when there are no pods ready for CX Contact deployment. | 0 for 1m | |
Draft:PEC-OU/Current/CXCPEGuide/APIAMetrics | cxc_api_too_many_errors_from_auth | HIGH | Triggered when there are too many error responses from the auth service for more than the specified time threshold. | 1m | |
Draft:PEC-OU/Current/CXCPEGuide/CPGMMetrics | CXC-CM-Redis-Connection-Failed | HIGH | Triggered when the connection to redis fails for more than 1 minute. | 1m | |
Draft:PEC-OU/Current/CXCPEGuide/CPGMMetrics | CXC-CPUUsage | HIGH | Triggered when the CPU utilization of a pod is beyond the threshold | 300% for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/CPGMMetrics | CXC-MemoryUsage | HIGH | Triggered when the memory utilization of a pod is beyond the threshold. | 70% for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/CPGMMetrics | CXC-MemoryUsagePD | HIGH | Triggered when the memory usage of a pod is above the critical threshold. | 90% for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/CPGMMetrics | CXC-PodNotReadyCount | HIGH | Triggered when the number of pods ready for a CX Contact deployment is less than or equal to the threshold. | 1 for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/CPGMMetrics | CXC-PodRestartsCount | HIGH | Triggered when the restart count for a pod is beyond the threshold. | 1 for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/CPGMMetrics | CXC-PodRestartsCountPD | HIGH | Triggered when the restart count is beyond the critical threshold. | 5 for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/CPGMMetrics | CXC-PodsNotReadyPD | HIGH | Triggered when there are no pods ready for CX Contact deployment. | 0 for 1m | |
Draft:PEC-OU/Current/CXCPEGuide/CPLMMetrics | CXC-CoM-Redis-no-active-connections | HIGH | Triggered when CX Contact compliance has no active redis connection for 2 minutes | 2m | |
Draft:PEC-OU/Current/CXCPEGuide/CPLMMetrics | CXC-Compliance-LatencyHigh | HIGH | Triggered when the latency for API responses is beyond the defined threshold. | 5000ms for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/CPLMMetrics | CXC-CPUUsage | HIGH | Triggered when the CPU utilization of a pod is beyond the threshold. | 300% for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/CPLMMetrics | CXC-MemoryUsage | HIGH | Triggered when the memory utilization of a pod is beyond the threshold. | 70% for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/CPLMMetrics | CXC-MemoryUsagePD | HIGH | Triggered when the memory usage of a pod is above the critical threshold. | 90% for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/CPLMMetrics | CXC-PodNotReadyCount | HIGH | Triggered when the number of pods ready for a CX Contact deployment is less than or equal to the threshold. | 1 for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/CPLMMetrics | CXC-PodRestartsCount | HIGH | Triggered when the restart count for a pod is beyond the threshold. | 1 for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/CPLMMetrics | CXC-PodRestartsCountPD | HIGH | Triggered when the restart count is beyond the critical threshold. | 5 for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/CPLMMetrics | CXC-PodsNotReadyPD | HIGH | Triggered when there are no pods ready for CX Contact deployment. | 0 for 1m | |
Draft:PEC-OU/Current/CXCPEGuide/DMMetrics | CXC-CPUUsage | HIGH | Triggered when the CPU utilization of a pod is beyond the threshold | 300% for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/DMMetrics | CXC-DM-LatencyHigh | HIGH | Triggered when the latency for dial manager is above the defined threshold. | 5000ms for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/DMMetrics | CXC-MemoryUsage | HIGH | Triggered when the memory utilization of a pod is beyond the threshold. | 70% for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/DMMetrics | CXC-MemoryUsagePD | HIGH | Triggered when the memory usage of a pod is above the critical threshold. | 90% for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/DMMetrics | CXC-PodNotReadyCount | HIGH | Triggered when the number of pods ready for a CX Contact deployment is less than or equal to the threshold. | 1 for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/DMMetrics | CXC-PodRestartsCount | HIGH | Triggered when the restart count for a pod is beyond the threshold. | 1 for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/DMMetrics | CXC-PodRestartsCountPD | HIGH | Triggered when the restart count is beyond the critical threshold. | 5 for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/DMMetrics | CXC-PodsNotReadyPD | HIGH | Triggered when there are no pods ready for CX Contact deployment. | 0 for 1m | |
Draft:PEC-OU/Current/CXCPEGuide/JSMetrics | CXC-CPUUsage | HIGH | Triggered when the CPU utilization of a pod is beyond the threshold | 300% for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/JSMetrics | CXC-JS-LatencyHigh | HIGH | Triggered when the latency for job scheduler is above the defined threshold. | 5000ms for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/JSMetrics | CXC-MemoryUsage | HIGH | Triggered when the memory utilization of a pod is beyond the threshold. | 70% for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/JSMetrics | CXC-MemoryUsagePD | HIGH | Triggered when the memory usage of a pod is above the critical threshold. | 90% for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/JSMetrics | CXC-PodNotReadyCount | HIGH | Triggered when the number of pods ready for a CX Contact deployment is less than or equal to the threshold. | 1 for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/JSMetrics | CXC-PodRestartsCount | HIGH | Triggered when the restart count for a pod is beyond the threshold. | 1 for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/JSMetrics | CXC-PodRestartsCountPD | HIGH | Triggered when the restart count is beyond the critical threshold. | 5 for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/JSMetrics | CXC-PodsNotReadyPD | HIGH | Triggered when there are no pods ready for CX Contact deployment. | 0 for 1m | |
Draft:PEC-OU/Current/CXCPEGuide/LBMetrics | CXC-CPUUsage | HIGH | Triggered when the CPU utilization of a pod is beyond the threshold | 300% for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/LBMetrics | CXC-LB-LatencyHigh | HIGH | Triggered when the latency for list builder is above the defined threshold. | 5000ms for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/LBMetrics | CXC-MemoryUsage | HIGH | Triggered when the memory utilization of a pod is beyond the threshold. | 70% for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/LBMetrics | CXC-MemoryUsagePD | HIGH | Triggered when the memory usage of a pod is above the critical threshold. | 90% for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/LBMetrics | CXC-PodNotReadyCount | HIGH | Triggered when the number of pods ready for a CX Contact deployment is less than or equal to the threshold. | 1 for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/LBMetrics | CXC-PodRestartsCount | HIGH | Triggered when the restart count for a pod is beyond the threshold. | 1 for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/LBMetrics | CXC-PodRestartsCountPD | HIGH | Triggered when the restart count is beyond the critical threshold. | 5 for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/LBMetrics | CXC-PodsNotReadyPD | HIGH | Triggered when there are no pods ready for CX Contact deployment. | 0 for 1m | |
Draft:PEC-OU/Current/CXCPEGuide/LMMetrics | CXC-CPUUsage | HIGH | Triggered when the CPU utilization of a pod is beyond the threshold | 300% for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/LMMetrics | CXC-LM-LatencyHigh | HIGH | Triggered when the latency for list manager is above the defined threshold | 5000ms for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/LMMetrics | CXC-MemoryUsage | HIGH | Triggered when the memory utilization of a pod is beyond the threshold. | 70% for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/LMMetrics | CXC-MemoryUsagePD | HIGH | Triggered when the memory usage of a pod is above the critical threshold. | 90% for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/LMMetrics | CXC-PodNotReadyCount | HIGH | Triggered when the number of pods ready for a CX Contact deployment is less than or equal to the threshold. | 1 for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/LMMetrics | CXC-PodRestartsCount | HIGH | Triggered when the restart count for a pod is beyond the threshold. | 1 for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/LMMetrics | CXC-PodRestartsCountPD | HIGH | Triggered when the restart count is beyond the critical threshold. | 5 for 5m | |
Draft:PEC-OU/Current/CXCPEGuide/LMMetrics | CXC-PodsNotReadyPD | HIGH | Triggered when there are no pods ready for CX Contact deployment. | 0 for 1m | |
Draft:PEC-OU/Current/CXCPEGuide/LMMetrics | cxc_list_manager_too_many_errors_from_auth | HIGH | Triggered when there are too many error responses from the auth service (list manager) for more than the specified time threshold. | 1m | |
Draft:PEC-REP/Current/GCXIPEGuide/GCXIMetrics | gcxi__cluster__info | | This alert indicates problems with the cluster states. Applicable only if you have two or more nodes in a cluster. | gcxi__cluster__info | |
Draft:PEC-REP/Current/GCXIPEGuide/GCXIMetrics | gcxi__projects__status | | If the value of gcxi__projects__status is greater than 0, this alarm is set, indicating that reporting is not functioning properly. | gcxi__projects__status | > 0 |
Draft:PEC-REP/Current/GCXIPEGuide/RAAMetrics | raa-errors | '''Specified by''': raa. '''Recommended value''': warning | A nonzero value indicates that errors have been logged during the scrape interval. | gcxi_raa_error_count | >0 |
Draft:PEC-REP/Current/GCXIPEGuide/RAAMetrics | raa-health | '''Specified by''': raa. '''Recommended value''': severe | A zero value for a recent period (several scrape intervals) indicates that RAA is not operating. | gcxi_raa_health_level | '''Specified by''': raa. '''Recommended value''': 30m |
Draft:PEC-REP/Current/GCXIPEGuide/RAAMetrics | raa-long-aggregation | '''Specified by''': raa. '''Recommended value''': warning | Indicates that the average duration of aggregation queries specified by the hierarchy, level, and mediaType labels is greater than the deadlock-threshold. | gcxi_raa_aggregated_duration_ms / gcxi_raa_aggregated_count | Greater than the value (seconds) of raa.prometheusRule.alerts.longAggregation.thresholdSec in values.yaml. '''Recommended value''': 300 |
Draft:PEC-REP/Current/GIMPEGuide/GCAMetrics | GcaOOMKilled | Critical | Triggered when a GCA pod is restarted because of OOMKilled. | kube_pod_container_status_restarts_total and kube_pod_container_status_last_terminated_reason | 1 |
Draft:PEC-REP/Current/GIMPEGuide/GCAMetrics | GcaPodCrashLooping | Critical | Triggered when a GCA pod is crash looping. | kube_pod_container_status_restarts_total | The restart rate is greater than 0 for 5 minutes |
Draft:PEC-REP/Current/GIMPEGuide/GSPMetrics | GspFlinkJobDown | Critical | Triggered when the GSP Flink job is not running (the number of running jobs equals 0 or the metric is not available) | flink_jobmanager_numRunningJobs | For 5 minutes |
Draft:PEC-REP/Current/GIMPEGuide/GSPMetrics | GspNoTmRegistered | Critical | Triggered when there are no registered TaskManagers (or metric not available) | flink_jobmanager_numRegisteredTaskManagers | For 5 minutes |
Draft:PEC-REP/Current/GIMPEGuide/GSPMetrics | GspOOMKilled | Critical | Triggered when a GSP pod is restarted because of OOMKilled | kube_pod_container_status_restarts_total | 0 |
Draft:PEC-REP/Current/GIMPEGuide/GSPMetrics | GspUnknownPerson | High | Triggered when GSP encounters unknown person(s) | flink_ | For 5 minutes |
Draft:PEC-REP/Current/PulsePEGuide/dcuMetrics | pulse_dcu_critical_col_connected_configservers | Critical | Pulse DCU Collector is not connected to ConfigServer. | pulse_collector_connection_status | for 15 minutes |
Draft:PEC-REP/Current/PulsePEGuide/dcuMetrics | pulse_dcu_critical_col_connected_dbservers | Critical | Pulse DCU Collector is not connected to DbServer. | pulse_collector_connection_status | for 15 minutes |
Draft:PEC-REP/Current/PulsePEGuide/dcuMetrics | pulse_dcu_critical_col_connected_statservers | Critical | Pulse DCU Collector is not connected to Stat Server. | pulse_collector_connection_status | for 15 minutes |
Draft:PEC-REP/Current/PulsePEGuide/dcuMetrics | pulse_dcu_critical_col_snapshot_writing | Critical | Pulse DCU Collector does not write snapshots. | pulse_collector_snapshot_writing_status | for 15 minutes |
Draft:PEC-REP/Current/PulsePEGuide/dcuMetrics | pulse_dcu_critical_cpu | Critical | Detected critical CPU usage by Pulse DCU Pod. | container_cpu_usage_seconds_total, kube_pod_container_resource_limits | 90% |
Draft:PEC-REP/Current/PulsePEGuide/dcuMetrics | pulse_dcu_critical_disk | Critical | Detected critical disk usage by Pulse DCU Pod. | kubelet_volume_stats_available_bytes, kubelet_volume_stats_capacity_bytes | 90% |
Draft:PEC-REP/Current/PulsePEGuide/dcuMetrics | pulse_dcu_critical_memory | Critical | Detected critical memory usage by Pulse DCU Pod. | container_memory_working_set_bytes, kube_pod_container_resource_limits | 90% |
Draft:PEC-REP/Current/PulsePEGuide/dcuMetrics | pulse_dcu_critical_nonrunning_instances | Critical | Triggered when Pulse DCU instances are down. | kube_statefulset_status_replicas_ready, kube_statefulset_status_replicas | for 15 minutes |
Draft:PEC-REP/Current/PulsePEGuide/dcuMetrics | pulse_dcu_critical_ss_connected_configservers | Critical | Pulse DCU Stat Server is not connected to ConfigServer. | pulse_statserver_server_connected_seconds | for 15 minutes |
Draft:PEC-REP/Current/PulsePEGuide/dcuMetrics | pulse_dcu_critical_ss_connected_ixnservers | Critical | Pulse DCU Stat Server is not connected to IxnServers. | pulse_statserver_server_connected_seconds | 2 |
Draft:PEC-REP/Current/PulsePEGuide/dcuMetrics | pulse_dcu_critical_ss_connected_tservers | Critical | Pulse DCU Stat Server is not connected to T-Servers. | pulse_statserver_server_connected_number | 2 |
Draft:PEC-REP/Current/PulsePEGuide/dcuMetrics | pulse_dcu_critical_ss_failed_dn_registrations | Critical | Detected critical DN registration failures on Pulse DCU Stat Server. | pulse_statserver_dn_failed, pulse_statserver_dn_registered | 0.5% |
Draft:PEC-REP/Current/PulsePEGuide/dcuMetrics | pulse_dcu_monitor_data_unavailable | Critical | Pulse DCU Monitor Agents do not provide data. | pulse_monitor_check_duration_seconds, kube_statefulset_replicas | for 15 minutes |
Draft:PEC-REP/Current/PulsePEGuide/dcuMetrics | pulse_dcu_too_frequent_restarts | Critical | Detected too frequent restarts of DCU Pod container. | kube_pod_container_status_restarts_total | 2 for 1 hour |
Draft:PEC-REP/Current/PulsePEGuide/ldsMetrics | pulse_lds_critical_cpu | Critical | Detected critical CPU usage by Pulse LDS Pod. | container_cpu_usage_seconds_total, kube_pod_container_resource_limits | 90% |
Draft:PEC-REP/Current/PulsePEGuide/ldsMetrics | pulse_lds_critical_memory | Critical | Detected critical memory usage by Pulse LDS Pod. | container_memory_working_set_bytes, kube_pod_container_resource_limits | 90% |
Draft:PEC-REP/Current/PulsePEGuide/ldsMetrics | pulse_lds_critical_nonrunning_instances | Critical | Triggered when Pulse LDS instances are down. | kube_statefulset_status_replicas_ready, kube_statefulset_status_replicas | for 15 minutes |
Draft:PEC-REP/Current/PulsePEGuide/ldsMetrics | pulse_lds_monitor_data_unavailable | Critical | Pulse LDS Monitor Agents do not provide data. | pulse_monitor_check_duration_seconds, kube_statefulset_replicas | for 15 minutes |
Draft:PEC-REP/Current/PulsePEGuide/ldsMetrics | pulse_lds_no_connected_senders | Critical | Pulse LDS is not connected to upstream servers. | pulse_lds_senders_number | for 15 minutes |
Draft:PEC-REP/Current/PulsePEGuide/ldsMetrics | pulse_lds_no_registered_dns | Critical | No DNs are registered on Pulse LDS. | pulse_lds_sender_registered_dns_number | for 30 minutes |
Draft:PEC-REP/Current/PulsePEGuide/ldsMetrics | pulse_lds_too_frequent_restarts | Critical | Detected too frequent restarts of LDS Pod container. | kube_pod_container_status_restarts_total | 2 for 1 hour |
Draft:PEC-REP/Current/PulsePEGuide/PulseMetrics | pulse_critical_5xx | Critical | Detected critical 5xx errors per second for Pulse container. | http_server_requests_seconds_count | 15% |
Draft:PEC-REP/Current/PulsePEGuide/PulseMetrics | pulse_critical_cpu | Critical | Detected critical CPU usage by Pulse Pod. | container_cpu_usage_seconds_total, kube_pod_container_resource_limits | 90% |
Draft:PEC-REP/Current/PulsePEGuide/PulseMetrics | pulse_critical_hikari_cp | Critical | Detected critical Hikari connections pool usage by Pulse container. | hikaricp_connections_active, hikaricp_connections | 90% |
Draft:PEC-REP/Current/PulsePEGuide/PulseMetrics | pulse_critical_memory | Critical | Detected critical memory usage by Pulse Pod. | container_memory_working_set_bytes, kube_pod_container_resource_limits | 90% |
Draft:PEC-REP/Current/PulsePEGuide/PulseMetrics | pulse_critical_pulse_health | Critical | Detected critical number of healthy Pulse containers. | pulse_health_all_Boolean | 50% |
Draft:PEC-REP/Current/PulsePEGuide/PulseMetrics | pulse_critical_running_instances | Critical | Triggered when Pulse instances are down. | kube_deployment_status_replicas_available, kube_deployment_status_replicas | 75% |
Draft:PEC-REP/Current/PulsePEGuide/PulseMetrics | pulse_service_down | Critical | All Pulse instances are down. | up | for 15 minutes |
Draft:PEC-REP/Current/PulsePEGuide/PulseMetrics | pulse_too_frequent_restarts | Critical | Detected too frequent restarts of Pulse Pod container. | kube_pod_container_status_restarts_total | 2 for 1 hour |
Draft:PEC-REP/Current/PulsePEGuide/PulsePermissionsMetrics | pulse_permissions_critical_cpu | Critical | Detected critical CPU usage by Pulse Permissions Pod. | container_cpu_usage_seconds_total, kube_pod_container_resource_limits | 90% |
Draft:PEC-REP/Current/PulsePEGuide/PulsePermissionsMetrics | pulse_permissions_critical_memory | Critical | Detected critical memory usage by Pulse Permissions Pod. | container_memory_working_set_bytes, kube_pod_container_resource_limits | 90% |
Draft:PEC-REP/Current/PulsePEGuide/PulsePermissionsMetrics | pulse_permissions_critical_running_instances | Critical | Triggered when Pulse Permissions instances are down. | kube_deployment_status_replicas_available, kube_deployment_status_replicas | 75% |
Draft:PEC-REP/Current/PulsePEGuide/PulsePermissionsMetrics | pulse_permissions_too_frequent_restarts | Critical | Detected too frequent restarts of Permissions Pod container. | kube_pod_container_status_restarts_total | 2 for 1 hour |
Draft:STRMS/Current/STRMSPEGuide/ServiceMetrics | streams_GWS_AUTH_DOWN | critical | Unable to connect to GWS auth service | gws_auth_down | 10 seconds |
Draft:STRMS/Current/STRMSPEGuide/STRMSMetrics | streams_BATCH_LAG_TIME | warning | Message handling exceeds 2 secs | | 30 seconds |
Draft:STRMS/Current/STRMSPEGuide/STRMSMetrics | streams_DOWN | critical | The number of running instances is 0 | sum(up) < 1 | 10 seconds |
Draft:STRMS/Current/STRMSPEGuide/STRMSMetrics | streams_ENDPOINT_CONNECTION_DOWN | warning | Unable to connect to a customer endpoint | endpoint_connection_down | 30 seconds |
Draft:STRMS/Current/STRMSPEGuide/STRMSMetrics | streams_ENGAGE_KAFKA_CONNECTION_DOWN | critical | Unable to connect to Engage Kafka | engage_kafka_main_connection_down | 10 seconds |
Draft:STRMS/Current/STRMSPEGuide/STRMSMetrics | streams_GWS_AUTH_DOWN | Critical | Unable to connect to GWS auth service | gws_auth_down | 30 seconds |
Draft:STRMS/Current/STRMSPEGuide/STRMSMetrics | streams_GWS_CONFIG_DOWN | critical | Unable to connect to GWS config service | gws_config_down | |
Draft:STRMS/Current/STRMSPEGuide/STRMSMetrics | streams_GWS_ENV_DOWN | critical | Unable to connect to GWS environment service | gws_env_down | 30 seconds |
Draft:STRMS/Current/STRMSPEGuide/STRMSMetrics | streams_INIT_ERROR | critical | Aborted due to an initialization error (for example, KAFKA_FQDN is not defined) | application_streams_init_error > 0 | 10 seconds |
Draft:STRMS/Current/STRMSPEGuide/STRMSMetrics | streams_REDIS_DOWN | critical | Unable to connect to Redis | redis_connection_down | 10 seconds |
Draft:TLM/Current/TLMPEGuide/TLMMetrics | Http Errors Occurrences Exceeded Threshold | High | Triggered when the number of HTTP errors exceeds 500 responses in 5 minutes | telemetry_events{eventName=~"http_error_.*", eventName!="http_error_404"} | >500 in 5 minutes |
Draft:TLM/Current/TLMPEGuide/TLMMetrics | Telemetry CPU Utilization is Greater Than Threshold | High | Triggered when average CPU usage is more than 60% | node_cpu_seconds_total | >60% |
Draft:TLM/Current/TLMPEGuide/TLMMetrics | Telemetry Dependency Status | Low | Triggered when there is no connection to one of the dependent services - GAuth, Config, Prometheus | telemetry_dependency_status | <80 |
Draft:TLM/Current/TLMPEGuide/TLMMetrics | Telemetry GAuth Time Alert | High | Triggered when there is no connection to the GAuth service | telemetry_gws_auth_req_time | >10000 |
Draft:TLM/Current/TLMPEGuide/TLMMetrics | Telemetry Healthy Pod Count Alert | High | Triggered when the number of healthy pods drops to critical level | kube_pod_container_status_ready | <2 |
Draft:TLM/Current/TLMPEGuide/TLMMetrics | Telemetry High Network Traffic | High | Triggered when network traffic exceeds 10MB/second for 5 minutes | node_network_transmit_bytes_total, node_network_receive_bytes_total | >10MBps |
Draft:TLM/Current/TLMPEGuide/TLMMetrics | Telemetry Memory Usage is Greater Than Threshold | High | Triggered when average memory usage is more than 60% | container_cpu_usage_seconds_total, kube_pod_container_resource_limits_cpu_cores | >60% |
Draft:UCS/Current/UCSPEGuide/UCSMetrics | ucsx_elasticsearch_health_status | critical | Triggered when there is no connection to ElasticSearch | ucsx_elasticsearch_health_status | 2 minutes |
Draft:UCS/Current/UCSPEGuide/UCSMetrics | ucsx_elasticsearch_slow_processing_time | critical | Triggered when Elasticsearch internal processing time > 500 ms | ucsx_elastic_search_sum, ucsx_elastic_search_count | 5 minutes |
Draft:UCS/Current/UCSPEGuide/UCSMetrics | ucsx_instance_high_cpu_utilization | warning | Triggered when average CPU usage is more than 80% | ucsx_performance | 5 minutes |
Draft:UCS/Current/UCSPEGuide/UCSMetrics | ucsx_instance_high_http_request_rate | warning | Triggered when the request rate is more than 120 requests per second on one UCS-X instance | ucsx_http_request_duration_count | 30 minutes |
Draft:UCS/Current/UCSPEGuide/UCSMetrics | ucsx_instance_high_memory_usage | warning | Triggered when average memory usage is more than 800 MB | ucsx_memory | 5 minutes |
Draft:UCS/Current/UCSPEGuide/UCSMetrics | ucsx_instance_overloaded | warning | Triggered when overload protection rate is more than 0 | ucsx_overload_protection_count | 5 minutes |
Draft:UCS/Current/UCSPEGuide/UCSMetrics | ucsx_instance_slow_http_response | critical | Triggered when average http response time > 500 ms | ucsx_http_request_duration_sum, ucsx_http_request_duration_count | 5 minutes |
Draft:UCS/Current/UCSPEGuide/UCSMetrics | ucsx_masterdb_health_status | warning | Triggered when there is no connection to master DB | ucsx_masterdb_health_status | 2 minutes |
Draft:UCS/Current/UCSPEGuide/UCSMetrics | ucsx_tenantdb_health_status | critical | Triggered when there is no connection to tenant DB | ucsx_tenantdb_health_status | 2 minutes |
Draft:VM/Current/VMPEGuide/VoiceAgentStateServiceMetrics | Agent service fail | Critical | Actions: *Check if there is any problem with the pod, then restart the pod. | agent_health_level | Agent health level is Fail for the pod for 5 consecutive minutes. |
Draft:VM/Current/VMPEGuide/VoiceAgentStateServiceMetrics | Config node fail | Warning | Actions: *Check if there is any problem with the pod and the config node. | http_client_response_count | Requests to the config node fail for 5 consecutive minutes. |
Draft:VM/Current/VMPEGuide/VoiceAgentStateServiceMetrics | Container restarted repeatedly | Critical | Actions: *Check if the new version of the image was deployed. *Check for issues with the Kubernetes cluster. | kube_pod_container_status_restarts_total | The container was restarted 5 or more times within 15 minutes. |
Draft:VM/Current/VMPEGuide/VoiceAgentStateServiceMetrics | Kafka events latency is too high | Warning | Actions: *If the alarm is triggered for multiple topics, ensure there are no issues with Kafka (CPU, memory, or network overload). *If the alarm is triggered only for one topic, check if there is an issue with the service related to that topic (CPU, memory, or network overload). | kafka_consumer_latency_bucket | Latency for more than 5% of messages is more than 0.5 seconds for the topic. |
Draft:VM/Current/VMPEGuide/VoiceAgentStateServiceMetrics | Kafka not available | Critical | Actions: *If the alarm is triggered for multiple services, ensure there are no issues with Kafka. Restart Kafka. *If the alarm is triggered only for one pod, check if there is an issue with the pod. | kafka_producer_state, kafka_consumer_state | Kafka is not available for the pod for 5 consecutive minutes. |
Draft:VM/Current/VMPEGuide/VoiceAgentStateServiceMetrics | Max replicas is not sufficient for 5 mins | Critical | The desired number of replicas is higher than the current available replicas for the past 5 minutes. | kube_statefulset_replicas, kube_statefulset_status_replicas | The desired number of replicas is higher than the current available replicas for the past 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceAgentStateServiceMetrics | Pod CPU greater than 65% | Warning | High CPU load for the pod. | container_cpu_usage_seconds_total, container_spec_cpu_period | Container CPU usage exceeded 65% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceAgentStateServiceMetrics | Pod CPU greater than 80% | Critical | Critical CPU load for the pod. | container_cpu_usage_seconds_total, container_spec_cpu_period | Container CPU usage exceeded 80% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceAgentStateServiceMetrics | Pod memory greater than 65% | Warning | High memory usage for the pod. | container_memory_working_set_bytes, kube_pod_container_resource_requests_memory_bytes | Container memory usage exceeded 65% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceAgentStateServiceMetrics | Pod memory greater than 80% | Critical | Critical memory usage for the pod. | container_memory_working_set_bytes, kube_pod_container_resource_requests_memory_bytes | Container memory usage exceeded 80% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceAgentStateServiceMetrics | Pod status Failed | Warning | Actions: *Restart the pod. Check if there are any issues with the pod after restart. | kube_pod_status_phase | The pod is in Failed state. |
Draft:VM/Current/VMPEGuide/VoiceAgentStateServiceMetrics | Pod status NotReady | Critical | Actions: *Restart the pod. Check if there are any issues with the pod after restart. | kube_pod_status_ready | The pod is in NotReady status for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceAgentStateServiceMetrics | Pod status Pending | Warning | Actions: *Restart the pod. Check if there are any issues with the pod after restart. | kube_pod_status_phase | The pod is in Pending state for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceAgentStateServiceMetrics | Pod status Unknown | Warning | Actions: *Restart the pod. Check if there are any issues with the pod after restart. | kube_pod_status_phase | The pod is in Unknown state for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceAgentStateServiceMetrics | Possible messages lost | Critical | Actions: *Check for Kafka or service overload and network degradation. | kafka_consumer_recv_messages_total, kafka_producer_sent_messages_total | The number of sent requests is two times higher than the number received for the topic. |
Draft:VM/Current/VMPEGuide/VoiceAgentStateServiceMetrics | Redis not available | Critical | Actions: *If the alarm is triggered for multiple services, ensure there are no issues with Redis. Restart Redis. *If the alarm is triggered only for one pod, check if there is an issue with the pod. | agent_redis_state, agent_stream_redis_state | Redis is not available for the pod for 5 consecutive minutes. |
Draft:VM/Current/VMPEGuide/VoiceAgentStateServiceMetrics | Too many Kafka consumer crashes | Critical | Actions: *If the alarm is triggered for multiple services, ensure there are no issues with Kafka. Restart Kafka. *If the alarm is triggered only for one container, check if there is an issue with the service. | kafka_consumer_error_total | More than 3 Kafka consumer crashes in 5 minutes for the service. |
Draft:VM/Current/VMPEGuide/VoiceAgentStateServiceMetrics | Too many Kafka consumer failed health checks | Warning | Actions: *If the alarm is triggered for multiple services, ensure there are no issues with Kafka. Restart Kafka. *If the alarm is triggered only for one container, check if there is an issue with the service. | kafka_consumer_error_total | Health check failed more than 10 times in 5 minutes for the Kafka consumer for the topic. |
Draft:VM/Current/VMPEGuide/VoiceAgentStateServiceMetrics | Too many Kafka consumer request timeouts | Warning | Actions: *If the alarm is triggered for multiple services, ensure there are no issues with Kafka. Restart Kafka. *If the alarm is triggered only for one container, check if there is an issue with the service. | kafka_consumer_error_total | More than 10 request timeouts appeared in 5 minutes for the Kafka consumer for the topic. |
Draft:VM/Current/VMPEGuide/VoiceAgentStateServiceMetrics | Too many Kafka pending events | Critical | Actions: *Ensure there are no issues with Kafka or with the pod's CPU and network. | kafka_producer_queue_depth | Too many Kafka producer pending events for the pod (more than 100 in 5 minutes). |
Draft:VM/Current/VMPEGuide/VoiceCallStateServiceMetrics | Container restarted repeatedly | Critical | Actions: *Check if the new version of the image was deployed. *Check for issues with the Kubernetes cluster. | kube_pod_container_status_restarts_total | The container was restarted 5 or more times within 15 minutes. |
Draft:VM/Current/VMPEGuide/VoiceCallStateServiceMetrics | Kafka events latency is too high | Critical | Actions: *If the alarm is triggered for multiple topics, ensure there are no issues with Kafka (CPU, memory, or network overload). *If the alarm is triggered only for one topic, check if there is an issue with the service related to that topic (CPU, memory, or network overload). | kafka_consumer_latency_bucket | Latency for more than 5% of messages is more than 0.5 seconds for the topic. |
Draft:VM/Current/VMPEGuide/VoiceCallStateServiceMetrics | Kafka not available | Critical | Actions: *If the alarm is triggered for multiple services, make sure there are no issues with Kafka, and then restart Kafka. *If the alarm is triggered only for one pod, check if there is an issue with the pod. | kafka_producer_state, kafka_consumer_state | Kafka is not available for the pod for 5 consecutive minutes. |
Draft:VM/Current/VMPEGuide/VoiceCallStateServiceMetrics | Max replicas is not sufficient for 5 mins | Critical | The desired number of replicas is higher than the current available replicas for the past 5 minutes. | kube_statefulset_replicas, kube_statefulset_status_replicas | The desired number of replicas is higher than the current available replicas for the past 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceCallStateServiceMetrics | Pod CPU greater than 65% | Warning | High CPU load for the pod. | container_cpu_usage_seconds_total, container_spec_cpu_period | Container CPU usage exceeded 65% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceCallStateServiceMetrics | Pod CPU greater than 80% | Critical | Critical CPU load for the pod. | container_cpu_usage_seconds_total, container_spec_cpu_period | Container CPU usage exceeded 80% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceCallStateServiceMetrics | Pod memory greater than 65% | Warning | High memory usage for the pod. | container_memory_working_set_bytes, kube_pod_container_resource_requests_memory_bytes | Container memory usage exceeded 65% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceCallStateServiceMetrics | Pod memory greater than 80% | Critical | Critical memory usage for the pod. | container_memory_working_set_bytes, kube_pod_container_resource_requests_memory_bytes | Container memory usage exceeded 80% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceCallStateServiceMetrics | Pod status Failed | Warning | Actions: *Restart the pod. Check if there are any issues with the pod after restart. | kube_pod_status_phase | The pod is in Failed state. |
Draft:VM/Current/VMPEGuide/VoiceCallStateServiceMetrics | Pod status NotReady | Critical | Actions: *Restart the pod. Check if there are any issues with the pod after restart. | kube_pod_status_ready | The pod is in NotReady status for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceCallStateServiceMetrics | Pod status Pending | Warning | Actions: *Restart the pod. Check if there are any issues with the pod after restart. | kube_pod_status_phase | The pod is in Pending state for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceCallStateServiceMetrics | Pod status Unknown | Warning | Actions: *Restart the pod. Check if there are any issues with the pod after restart. | kube_pod_status_phase | The pod is in Unknown state for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceCallStateServiceMetrics | Redis not available | Critical | Actions: *If the alarm is triggered for multiple services, make sure there are no issues with Redis, and then restart Redis. *If the alarm is triggered only for one pod, check if there is an issue with the pod. | callthread_redis_state | Redis is not available for the pod for 5 consecutive minutes. |
Draft:VM/Current/VMPEGuide/VoiceCallStateServiceMetrics | Too many Kafka consumer crashes | Critical | Actions: *If the alarm is triggered for multiple services, make sure there are no issues with Kafka, and then restart Kafka. *If the alarm is triggered only for one topic, check if there is an issue with the service. | kafka_consumer_error_total | More than 3 Kafka consumer crashes in 5 minutes for the topic. |
Draft:VM/Current/VMPEGuide/VoiceCallStateServiceMetrics | Too many Kafka consumer failed health checks | Warning | Actions: *If the alarm is triggered for multiple services, make sure there are no issues with Kafka, and then restart Kafka. *If the alarm is triggered only for one topic, check if there is an issue with the service. | kafka_consumer_error_total | Health check failed more than 10 times in 5 minutes for the Kafka consumer for the topic. |
Draft:VM/Current/VMPEGuide/VoiceCallStateServiceMetrics | Too many Kafka consumer request timeouts | Warning | Actions: *If the alarm is triggered for multiple services, make sure there are no issues with Kafka, and then restart Kafka. *If the alarm is triggered only for one topic, check if there is an issue with the service. | kafka_consumer_error_total | More than 10 request timeouts appeared in 5 minutes for the Kafka consumer for the topic. |
Draft:VM/Current/VMPEGuide/VoiceCallStateServiceMetrics | Too many Kafka pending events | Critical | Actions: *Ensure there are no issues with Kafka or with the service's CPU and network. | kafka_producer_queue_depth | Too many Kafka producer pending events for the service (more than 100 in 5 minutes). |
Draft:VM/Current/VMPEGuide/VoiceConfigServiceMetrics | Container restarted repeatedly | Critical | Actions: *One of the containers in the pod has entered a failed state. Check the Kibana logs for the reason. | kube_pod_container_status_restarts_total | The container was restarted 5 or more times within 15 minutes. |
Draft:VM/Current/VMPEGuide/VoiceConfigServiceMetrics | Pod CPU greater than 65% | Warning | High CPU load for the pod. Actions: *Check whether the horizontal pod autoscaler has triggered and if the maximum number of pods has been reached. *Check Grafana for abnormal load. *Collect the service logs; raise an investigation ticket. | container_cpu_usage_seconds_total, container_spec_cpu_period | Container CPU usage exceeded 65% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceConfigServiceMetrics | Pod CPU greater than 80% | Critical | Critical CPU load for the pod. Actions: *Check whether the horizontal pod autoscaler has triggered and if the maximum number of pods has been reached. *Check Grafana for abnormal load. *Restart the service. *Collect the service logs; raise an investigation ticket. | container_cpu_usage_seconds_total, container_spec_cpu_period | Container CPU usage exceeded 80% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceConfigServiceMetrics | Pod Failed | Warning | Actions: *One of the containers in the pod has entered a failed state. Check the Kibana logs for the reason. | kube_pod_status_phase | The pod failed. |
Draft:VM/Current/VMPEGuide/VoiceConfigServiceMetrics | Pod memory greater than 65% | Warning | High memory usage for the pod. Actions: *Check whether the horizontal pod autoscaler has triggered and if the maximum number of pods has been reached. *Check Grafana for abnormal load. *Collect the service logs; raise an investigation ticket. | container_memory_working_set_bytes, kube_pod_container_resource_requests_memory_bytes | Container memory usage exceeded 65% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceConfigServiceMetrics | Pod memory greater than 80% | Critical | Critical memory usage for the pod. Actions: *Check whether the horizontal pod autoscaler has triggered and if the maximum number of pods has been reached. *Check Grafana for abnormal load. *Restart the service. *Collect the service logs; raise an investigation ticket. | container_memory_working_set_bytes, kube_pod_container_resource_requests_memory_bytes | Container memory usage exceeded 80% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceConfigServiceMetrics | Pod Not ready for 10 minutes | Critical | Actions: *If this alarm is triggered, check whether the CPU is available for the pods. *Check whether the port of the pod is running and serving the request. | kube_pod_status_ready | The pod is in NotReady state for 10 minutes. |
Draft:VM/Current/VMPEGuide/VoiceConfigServiceMetrics | Pod Pending state | Warning | Actions: *If the alarm is triggered for multiple services, make sure the Kubernetes nodes where the pod is running are alive in the cluster. *If the alarm is triggered only for one pod, check the health of that pod. | kube_pod_status_phase | The pod is in Pending state for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceConfigServiceMetrics | Pod Unknown state | Warning | Actions: *If the alarm is triggered for multiple services, make sure there are no issues with the Kubernetes cluster. *If the alarm is triggered only for one pod, check to see whether the image is correct and if the container is starting up. | kube_pod_status_phase | The pod is in Unknown state for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceConfigServiceMetrics | Redis disconnected for 10 minutes | Critical | Actions: *If the alarm is triggered for multiple services, make sure there are no issues with Redis, then restart Redis. *If the alarm is triggered only for one pod, check to see if there is an issue with that pod. | redis_state | Redis is not available for the pod for 10 minutes. |
Draft:VM/Current/VMPEGuide/VoiceConfigServiceMetrics | Redis disconnected for 5 minutes | Warning | Actions: *If the alarm is triggered for multiple services, make sure there are no issues with Redis, then restart Redis. *If the alarm is triggered only for one pod, check to see if there is an issue with that pod. | redis_state | Redis is not available for the pod for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceDialPlanServiceMetrics | Aggregated service health failing for 5 minutes | Critical | Actions: *Check the dialplan dashboard for Aggregated Service Health errors and, in case of a Redis error, first check for any issues/crashes in the pod and then restart Redis. *In the case of an Envoy error, the dialplan container will be restarted by the liveness probe. If the issue still exists, collect the service logs and raise an investigation ticket. | dialplan_health_level | Dependent services or the Envoy sidecar is not available for 5 minutes in the pod. |
Draft:VM/Current/VMPEGuide/VoiceDialPlanServiceMetrics | DialPlan processing time > 0.5 seconds | Warning | Actions: *If the alarm is generated for all dialplan pods, then Redis or network delay might be the most probable cause. *If the alarm is generated in a single dialplan pod, then it might be due to Envoy or a network issue. | dialplan_response_time | This warning alarm is raised for the pod when the latency for 95% of the dial plan messages is more than 0.5 seconds for a duration of 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceDialPlanServiceMetrics | DialPlan processing time > 2 seconds | Critical | Actions: *If the alarm is generated for all dialplan pods, then Redis or network delay might be the most probable cause. *If the alarm is generated in a single dialplan pod, then it might be due to Envoy or a network issue. | dialplan_response_time | This critical alarm is raised for the pod when the latency for 95% of the dial plan messages is more than 2 seconds for a duration of 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceDialPlanServiceMetrics | Pod CPU greater than 65% | Warning | High CPU load for the pod. Actions: *Check whether the horizontal pod autoscaler has triggered and if the maximum number of pods has been reached. *Check Grafana for abnormal load. *Collect the service logs; raise an investigation ticket. | container_cpu_usage_seconds_total, kube_pod_container_resource_limits | Container CPU usage exceeded 65% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceDialPlanServiceMetrics | Pod CPU greater than 80% | Critical | Critical CPU load for the pod. Actions: *Check whether the horizontal pod autoscaler has triggered and if the maximum number of pods has been reached. *Check Grafana for abnormal load. *Restart the service. *Collect the service logs; raise an investigation ticket. | container_cpu_usage_seconds_total, kube_pod_container_resource_limits | Container CPU usage exceeded 80% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceDialPlanServiceMetrics | Pod Failed | Warning | Actions: *One of the containers in the pod has entered a failed state. Check the Kibana logs for the reason. | kube_pod_status_phase | The pod failed. |
Draft:VM/Current/VMPEGuide/VoiceDialPlanServiceMetrics | Pod memory greater than 65% | Warning | High memory usage for the pod. Actions: *Check whether the horizontal pod autoscaler has triggered and if the maximum number of pods has been reached. *Check Grafana for abnormal load. *Collect the service logs; raise an investigation ticket. | container_memory_working_set_bytes, kube_pod_container_resource_limits | Container memory usage exceeded 65% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceDialPlanServiceMetrics | Pod memory greater than 80% | Critical | Critical memory usage for the pod. Actions: *Check whether the horizontal pod autoscaler has triggered and if the maximum number of pods has been reached. *Check Grafana for abnormal load. *Restart the service. *Collect the service logs; raise an investigation ticket. | container_memory_working_set_bytes, kube_pod_container_resource_limits | Container memory usage exceeded 80% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceDialPlanServiceMetrics | Pod Not ready for 10 minutes | Critical | Actions: *If this alarm is triggered, check whether the CPU is available for the pods. *Check whether the port of the pod is running and serving the request. | kube_pod_status_ready | The pod is in the NotReady state for 10 minutes. |
Draft:VM/Current/VMPEGuide/VoiceDialPlanServiceMetrics | Pod Pending state | Warning | Actions: *If the alarm is triggered for multiple services, make sure the Kubernetes nodes where the pod is running are alive in the cluster. *If the alarm is triggered only for one pod, check the health of that pod. | kube_pod_status_phase | The pod is in the Pending state for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceDialPlanServiceMetrics | Pod Unknown state | Warning | Actions: *If the alarm is triggered for multiple services, make sure there are no issues with the Kubernetes cluster. *If the alarm is triggered only for one pod, check whether the image is correct and if the container is starting up. | kube_pod_status_phase | The pod is in Unknown state for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceDialPlanServiceMetrics | Redis disconnected for 10 minutes | Critical | Actions: *If the alarm is triggered for multiple services, make sure there are no issues with Redis and then restart Redis. *If the alarm is triggered only for one pod, check to see if there is an issue with that pod. | redis_state | Redis is not available for the pod for 10 minutes. |
Draft:VM/Current/VMPEGuide/VoiceDialPlanServiceMetrics | Redis disconnected for 5 minutes | Warning | Actions: *If the alarm is triggered for multiple services, make sure there are no issues with Redis and then restart Redis. *If the alarm is triggered only for one pod, check to see if there is an issue with that pod. | redis_state | Redis is not available for the pod for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceFrontEndServiceMetrics | Container restarted repeatedly | Critical | The container was restarted 5 or more times within 15 minutes. Actions: *Check if a new version of the image was deployed. *Check for issues with the Kubernetes cluster. | kube_pod_container_status_restarts_total | The container was restarted 5 or more times within 15 minutes. |
Draft:VM/Current/VMPEGuide/VoiceFrontEndServiceMetrics | Kafka not available | Critical | Kafka is not available for the pod. Actions: *If the alarm is triggered for multiple services, make sure there are no issues with Kafka, and then restart Kafka. *If the alarm is triggered only for one pod, check if there is an issue with that pod. | kafka_producer_state | Kafka is not available for the pod for 5 consecutive minutes. |
Draft:VM/Current/VMPEGuide/VoiceFrontEndServiceMetrics | Max replicas is not sufficient for 5 mins | Critical | For the past 5 minutes, the desired number of replicas is higher than the number of replicas currently available. Actions: *Check resources available for Kubernetes. Increase resources, if necessary. | kube_statefulset_replicas, kube_statefulset_status_replicas | The desired number of replicas has been higher than the number of currently available replicas for the past 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceFrontEndServiceMetrics | No requests received | Critical | No requests have been received by the pod. Actions: *Make sure there are no issues with Orchestration Service and Tenant Service or the network to them. | sipfe_requests_total | increase(sipfe_requests_total{pod=~"sipfe-.+"}[5m]) <= 0 and increase(sipfe_requests_total{pod=~"sipfe-.+"}[10m]) > 100 |
Draft:VM/Current/VMPEGuide/VoiceFrontEndServiceMetrics | Pod CPU greater than 65% | Warning | High CPU load for the pod. Actions: *Check whether the horizontal pod autoscaler has triggered and if the maximum number of pods has been reached. *Check Grafana for abnormal load. *Collect the service logs for the pod; raise an investigation ticket. | container_cpu_usage_seconds_total, container_spec_cpu_period | Container CPU usage exceeded 65% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceFrontEndServiceMetrics | Pod CPU greater than 80% | Critical | Critical CPU load for the pod. Actions: *Check whether the horizontal pod autoscaler has triggered and if the maximum number of pods has been reached. *Check Grafana for abnormal load. *Restart the service. | container_cpu_usage_seconds_total, container_spec_cpu_period | Container CPU usage exceeded 80% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceFrontEndServiceMetrics | Pod memory greater than 65% | Warning | High memory usage for the pod. Actions: *Check whether the horizontal pod autoscaler has triggered and if the maximum number of pods has been reached. *Check Grafana for abnormal load. *Collect the service logs for the pod; raise an investigation ticket. | container_memory_working_set_bytes, kube_pod_container_resource_requests_memory_bytes | Container memory usage exceeded 65% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceFrontEndServiceMetrics | Pod memory greater than 80% | Critical | Critical memory usage for the pod. Actions: *Check whether the horizontal pod autoscaler has triggered and if the maximum number of pods has been reached. *Check Grafana for abnormal load. *Restart the service for the pod. | container_memory_working_set_bytes, kube_pod_container_resource_requests_memory_bytes | Container memory usage exceeded 80% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceFrontEndServiceMetrics | Pod status Failed | Warning | The pod is in Failed state. Actions: *Restart the pod. Check to see if there are any issues with the pod after restart. | kube_pod_status_phase | The pod is in Failed state. |
Draft:VM/Current/VMPEGuide/VoiceFrontEndServiceMetrics | Pod status NotReady | Critical | The pod is in the NotReady state for 5 minutes. Actions: *Restart the pod. Check to see if there are any issues with the pod after restart. | kube_pod_status_ready | The pod is in the NotReady state for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceFrontEndServiceMetrics | Pod status Pending | Warning | The pod is in Pending state for 5 minutes. Actions: *Restart the pod. Check to see if there are any issues with the pod after restart. | kube_pod_status_phase | The pod is in Pending state for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceFrontEndServiceMetrics | Pod status Unknown | Warning | The pod is in Unknown state for 5 minutes. Actions: *Restart the pod. Check to see if there are any issues with the pod after restart. | kube_pod_status_phase | The pod is in Unknown state for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceFrontEndServiceMetrics | Pods less than Min Replicas | Critical | The current number of replicas is lower than the minimum number of replicas that should be available. Actions: *Check if Kubernetes cannot deploy new pods or if pods are failing to become active/ready. | kube_hpa_status_current_replicas, kube_hpa_spec_min_replicas | For the past 5 minutes, the current number of replicas is lower than the minimum number of replicas that should be available. |
Draft:VM/Current/VMPEGuide/VoiceFrontEndServiceMetrics | Pods scaled up greater than 80% | Critical | For the past 5 minutes, the desired number of replicas is greater than the number of replicas currently available. Actions: *Check resources available for Kubernetes. Increase resources, if necessary. | kube_hpa_status_current_replicas, kube_hpa_spec_max_replicas | (kube_hpa_status_current_replicas{namespace="voice",hpa="sipfe-node-hpa"} * 100) / kube_hpa_spec_max_replicas{namespace="voice",hpa="sipfe-node-hpa"} > 80 for: 5m |
Draft:VM/Current/VMPEGuide/VoiceFrontEndServiceMetrics | SIP Cluster Service response latency is too high | Critical | Actions: *If the alarm is triggered for multiple pods, make sure there are no issues with the SIP Cluster Service (CPU, memory, or network overload). *If the alarm is triggered only for one pod, check if there is an issue with that pod (CPU, memory, or network overload). | sipfe_sip_node_request_duration_seconds_bucket | Latency for 95% of messages is more than 0.5 seconds for the service. |
Draft:VM/Current/VMPEGuide/VoiceFrontEndServiceMetrics | SIP Node(s) is not available | Critical | No available SIP Nodes for the pod. Actions: *If the alarm is triggered for multiple services, make sure there are no issues with SIP Nodes, and then restart SIP Nodes. *If the alarm is triggered only for one pod, check if there is an issue with that pod. | sipfe_sip_nodes_total | No available SIP Nodes for the pod for 5 consecutive minutes. |
Draft:VM/Current/VMPEGuide/VoiceFrontEndServiceMetrics | Too many failure responses sent | Critical | Too many failure responses are sent by the Front End service at the pod. Actions: *Make sure the received requests are valid. | sipfe_responses_total | More than 100 failure responses in 5 consecutive minutes. |
Draft:VM/Current/VMPEGuide/VoiceFrontEndServiceMetrics | Too many Kafka pending producer events | Critical | Actions: *Make sure there are no issues with Kafka or with the pod's CPU and network. | kafka_producer_queue_depth | Too many Kafka producer pending events for the pod (more than 100 in 5 minutes). |
Draft:VM/Current/VMPEGuide/VoiceFrontEndServiceMetrics | Too many Kafka producer errors | Critical | Kafka responds with errors at the pod. Actions: *Make sure there are no issues with Kafka. | kafka_producer_error_total | More than 100 errors in 5 consecutive minutes. |
Draft:VM/Current/VMPEGuide/VoiceFrontEndServiceMetrics | Too many received requests without a response | Critical | Actions: *Collect the service logs for the pod; raise an investigation ticket. *Restart the service. | sipfe_requests_total | For too many requests, the Front End service at the pod did not send any response (more than 100 requests without a response, measured over 5 minutes). |
Draft:VM/Current/VMPEGuide/VoiceFrontEndServiceMetrics | Too many SIP Cluster Service error responses | Critical | SIP Cluster Service responds with errors at the pod. Actions: *If the alarm is triggered for multiple pods, make sure there are no issues with the SIP Cluster Service (CPU, memory, or network overload). *If the alarm is triggered only for one pod, check if there is an issue with that pod. | sipfe_sip_node_responses_total | More than 100 errors in 5 consecutive minutes. |
Draft:VM/Current/VMPEGuide/VoiceOrchestrationServiceMetrics | Container restarted repeatedly | Critical | Actions: *One of the containers in the pod has entered a failed state. Check the Kibana logs for the reason. | kube_pod_container_status_restarts_total | The container was restarted 5 or more times within 15 minutes. |
Draft:VM/Current/VMPEGuide/VoiceOrchestrationServiceMetrics | Number of running strategies is critical | Critical | Too many active sessions. Actions: *Check whether the horizontal pod autoscaler has triggered and if the maximum number of pods has been reached. *Check the number of voice, digital, and callback calls in the system. | orsnode_strategies | More than 600 strategies running for 5 consecutive seconds. |
Draft:VM/Current/VMPEGuide/VoiceOrchestrationServiceMetrics | Number of running strategies is too high | Warning | Too many active sessions. Actions: *Check whether the horizontal pod autoscaler has triggered and if the maximum number of pods has been reached. *Check the number of voice, digital, and callback calls in the system. | orsnode_strategies | More than 400 strategies running for 5 consecutive seconds. |
Draft:VM/Current/VMPEGuide/VoiceOrchestrationServiceMetrics | Pod CPU greater than 65% | Warning | High CPU load for the pod. Actions: *Check whether the horizontal pod autoscaler has triggered and if the maximum number of pods has been reached. *Check Grafana for abnormal load. *Collect the service logs; raise an investigation ticket. | container_cpu_usage_seconds_total, container_spec_cpu_period | Container CPU usage exceeded 65% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceOrchestrationServiceMetrics | Pod CPU greater than 80% | Critical | Critical CPU load for the pod. Actions: *Check whether the horizontal pod autoscaler has triggered and if the maximum number of pods has been reached. *Check Grafana for abnormal load. *Restart the service. *Collect the service logs; raise an investigation ticket. | container_cpu_usage_seconds_total, container_spec_cpu_period | Container CPU usage exceeded 80% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceOrchestrationServiceMetrics | Pod in Pending state | Warning | The pod is in Pending state. Actions: *If the alarm is triggered for multiple services, make sure the Kubernetes nodes where the pod is running are alive in the cluster. *If the alarm is triggered only for one pod, check the health of that pod. | kube_pod_status_phase | The pod is in Pending state for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceOrchestrationServiceMetrics | Pod in Unknown state | Warning | The pod is in Unknown state. Actions: *If the alarm is triggered for multiple services, make sure there are no issues with the Kubernetes cluster. *If the alarm is triggered only for one pod, check whether the image is correct and if the container is starting up. | kube_pod_status_phase | The pod is in Unknown state for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceOrchestrationServiceMetrics | Pod memory greater than 65% | Warning | High memory usage for the pod. Actions: *Check whether the horizontal pod autoscaler has triggered and if the maximum number of pods has been reached. *Check Grafana for abnormal load. *Collect the service logs; raise an investigation ticket. | container_memory_working_set_bytes, kube_pod_container_resource_requests_memory_bytes | Container memory usage exceeded 65% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceOrchestrationServiceMetrics | Pod memory greater than 80% | Critical | Critical memory usage for the pod. Actions: *Check whether the horizontal pod autoscaler has triggered and if the maximum number of pods has been reached. *Check Grafana for abnormal load. *Restart the service. *Collect the service logs; raise an investigation ticket. | container_memory_working_set_bytes, kube_pod_container_resource_requests_memory_bytes | Container memory usage exceeded 80% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceOrchestrationServiceMetrics | Pod Not ready for 10 minutes | Critical | The pod is in the NotReady state. Actions: *If this alarm is triggered, check whether the CPU is available for the pods. *Check whether the port of the pod is running and serving the request. | kube_pod_status_ready | The pod is in the NotReady state for 10 minutes. |
Draft:VM/Current/VMPEGuide/VoiceOrchestrationServiceMetrics | Pod status Failed | Warning | The pod failed. Actions: *One of the containers in the pod has entered a Failed state. Check the Kibana logs for the reason. | kube_pod_status_phase | The pod is in Failed state. |
Draft:VM/Current/VMPEGuide/VoiceOrchestrationServiceMetrics | Redis disconnected for 10 minutes | Critical | Actions: *If the alarm is triggered for multiple services, make sure there are no issues with Redis, and then restart Redis. *If the alarm is triggered only for one pod, check if there is an issue with that pod. | redis_state | Redis is not available for the pod for 10 minutes. |
Draft:VM/Current/VMPEGuide/VoiceOrchestrationServiceMetrics | Redis disconnected for 5 minutes | Warning | Actions: *If the alarm is triggered for multiple services, make sure there are no issues with Redis, and then restart Redis. *If the alarm is triggered only for one pod, check if there is an issue with that pod. | redis_state | Redis is not available for the pod for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceRegistrarServiceMetrics | Container restarted repeatedly | Critical | Actions: *One of the containers in the pod has entered a Failed state. Check the Kibana logs for the reason. | kube_pod_container_status_restarts_total | The container was restarted 5 or more times within 15 minutes. |
Draft:VM/Current/VMPEGuide/VoiceRegistrarServiceMetrics | Kafka events latency is too high | Warning | Actions: *If the alarm is triggered for multiple topics, make sure there are no issues with Kafka (CPU, memory, or network overload). *If the alarm is triggered only for one topic, check if there is an issue with the service related to that topic (CPU, memory, or network overload). | kafka_consumer_latency_bucket | Latency for more than 5% of messages is more than 0.5 seconds for the topic. |
Draft:VM/Current/VMPEGuide/VoiceRegistrarServiceMetrics | Kafka not available | Critical | Kafka is not available for the pod. Actions: *If the alarm is triggered for multiple services, make sure there are no issues with Kafka, and then restart Kafka. *If the alarm is triggered only for one pod, check if there is an issue with that pod. | kafka_producer_state, kafka_consumer_state | Kafka is not available for the pod for 5 consecutive minutes. |
Draft:VM/Current/VMPEGuide/VoiceRegistrarServiceMetrics | Pod CPU greater than 65% | Warning | High CPU load for the pod. Actions: *Check whether the horizontal pod autoscaler has triggered and if the maximum number of pods has been reached. *Check Grafana for abnormal load. *Collect the service logs; raise an investigation ticket. | container_cpu_usage_seconds_total, kube_pod_container_resource_limits | Container CPU usage exceeded 65% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceRegistrarServiceMetrics | Pod CPU greater than 80% | Critical | Critical CPU load for the pod. Actions: *Check whether the horizontal pod autoscaler has triggered and if the maximum number of pods has been reached. *Check Grafana for abnormal load. | container_cpu_usage_seconds_total, kube_pod_container_resource_limits | Container CPU usage exceeded 80% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceRegistrarServiceMetrics | Pod Failed | Warning | The pod failed. Actions: *One of the containers in the pod has entered a failed state. Check the Kibana logs for the reason. | kube_pod_status_phase | The pod is in Failed state. |
Draft:VM/Current/VMPEGuide/VoiceRegistrarServiceMetrics | Pod memory greater than 65% | Warning | High memory usage for the pod. Actions: *Check whether the horizontal pod autoscaler has triggered and if the maximum number of pods has been reached. *Check Grafana for abnormal load. *Collect the service logs; raise an investigation ticket. | container_memory_working_set_bytes, kube_pod_container_resource_limits | Container memory usage exceeded 65% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceRegistrarServiceMetrics | Pod memory greater than 80% | Critical | Critical memory usage for the pod. Actions: *Check whether the horizontal pod autoscaler has triggered and if the maximum number of pods has been reached. *Check Grafana for abnormal load. *Restart the service. *Collect the service logs; raise an investigation ticket. | container_memory_working_set_bytes, kube_pod_container_resource_limits | Container memory usage exceeded 80% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceRegistrarServiceMetrics | Pod Not ready for 10 minutes | Critical | Actions: *If this alarm is triggered, check whether the CPU is available for the pods. *Check whether the port of the pod is running and serving the request. | kube_pod_status_ready | The pod is in the NotReady state for 10 minutes. |
Draft:VM/Current/VMPEGuide/VoiceRegistrarServiceMetrics | Pod Pending state | Warning | The pod is in Pending state. Actions: *If the alarm is triggered for multiple services, make sure the Kubernetes nodes where the pod is running are alive in the cluster. *If the alarm is triggered only for one pod, check the health of that pod. | kube_pod_status_phase | The pod is in Pending state for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceRegistrarServiceMetrics | Pod Unknown state | Warning | The pod is in Unknown state. Actions: *If the alarm is triggered for multiple services, make sure there are no issues with the Kubernetes cluster. *If the alarm is triggered only for one pod, check whether the image is correct and if the container is starting up. | kube_pod_status_phase | The pod is in Unknown state for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceRegistrarServiceMetrics | Redis disconnected for 10 minutes | Critical | Actions: *If the alarm is triggered for multiple services, make sure there are no issues with Redis, and then restart Redis. *If the alarm is triggered only for one pod, check if there is an issue with that pod. | redis_state | Redis is not available for the pod for 10 minutes. |
Draft:VM/Current/VMPEGuide/VoiceRegistrarServiceMetrics | Redis disconnected for 5 minutes | Warning | Actions: *If the alarm is triggered for multiple services, make sure there are no issues with Redis, and then restart Redis. *If the alarm is triggered only for one pod, check if there is an issue with that pod. | redis_state | Redis is not available for the pod for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceRegistrarServiceMetrics | Too many Kafka consumer crashes | Critical | Actions: *If the alarm is triggered for multiple services, make sure there are no issues with Kafka, and then restart Kafka. *If the alarm is triggered only for one service, check if there is an issue with that service. | kafka_consumer_error_total | There were more than 3 Kafka consumer crashes within 5 minutes for the service. |
Draft:VM/Current/VMPEGuide/VoiceRegistrarServiceMetrics | Too many Kafka consumer failed health checks | Warning | Actions: *If the alarm is triggered for multiple services, make sure there are no issues with Kafka, and then restart Kafka. *If the alarm is triggered only for one service, check if there is an issue with that service. | kafka_consumer_error_total | Health check failed more than 10 times in 5 minutes for the Kafka consumer for the topic. |
Draft:VM/Current/VMPEGuide/VoiceRegistrarServiceMetrics | Too many Kafka consumer request timeouts | Warning | Actions: *If the alarm is triggered for multiple services, make sure there are no issues with Kafka, and then restart Kafka. *If the alarm is triggered only for one service, check if there is an issue with that service. | kafka_consumer_error_total | There were more than 10 request timeouts within 5 minutes for the Kafka consumer for the topic. |
Draft:VM/Current/VMPEGuide/VoiceRQServiceMetrics | Container restarted repeatedly | Critical | The container was repeatedly restarted. Actions: *One of the containers in the pod has entered a failed state. Check the Kibana logs for the reason. | kube_pod_container_status_restarts_total | The container was restarted 5 or more times within 15 minutes. |
Draft:VM/Current/VMPEGuide/VoiceRQServiceMetrics | Number of Redis streams is too high | Warning | Too many active sessions. Actions: *Check whether the horizontal pod autoscaler has triggered and if the maximum number of pods has been reached. *Check the number of voice, digital, and callback calls in the system. | rqnode_streams | More than 10000 active streams are running. |
Draft:VM/Current/VMPEGuide/VoiceRQServiceMetrics | Pod CPU greater than 65% | Warning | High CPU load for the pod. Actions: *Check whether the horizontal pod autoscaler has triggered and if the maximum number of pods has been reached. *Check Grafana for abnormal load. *Collect the service logs; raise an investigation ticket. | container_cpu_usage_seconds_total, container_spec_cpu_period | Container CPU usage exceeded 65% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceRQServiceMetrics | Pod CPU greater than 80% | Critical | Critical CPU load for the pod. Actions: *Check whether the horizontal pod autoscaler has triggered and if the maximum number of pods has been reached. *Check Grafana for abnormal load. *Restart the service. *Collect the service logs; raise an investigation ticket. | container_cpu_usage_seconds_total, container_spec_cpu_period | Container CPU usage exceeded 80% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceRQServiceMetrics | Pod failed | Warning | The pod failed. Actions: *One of the containers in the pod has entered a Failed state. Check the Kibana logs for the reason. | kube_pod_status_phase | The pod is in Failed state. |
Draft:VM/Current/VMPEGuide/VoiceRQServiceMetrics | Pod memory greater than 65% | Warning | High memory usage for the pod. Actions: *Check whether the horizontal pod autoscaler has triggered and if the maximum number of pods has been reached. *Check Grafana for abnormal load. *Collect the service logs; raise an investigation ticket. | container_memory_working_set_bytes, kube_pod_container_resource_requests_memory_bytes | Container memory usage exceeded 65% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceRQServiceMetrics | Pod memory greater than 80% | Critical | Critical memory usage for the pod. Actions: *Check whether the horizontal pod autoscaler has triggered and if the maximum number of pods has been reached. *Check Grafana for abnormal load. *Restart the service. *Collect the service logs; raise an investigation ticket. | container_memory_working_set_bytes, kube_pod_container_resource_requests_memory_bytes | Container memory usage exceeded 80% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceRQServiceMetrics | Pod not ready for 10 minutes | Critical | The pod is in NotReady state. Actions: *If this alarm is triggered, check whether the CPU is available for the pods. *Check whether the port of the pod is running and serving the request. | kube_pod_status_ready | The pod is in NotReady state for 10 minutes. |
Draft:VM/Current/VMPEGuide/VoiceRQServiceMetrics | Pod Pending state | Warning | The pod is in the Pending state. Actions: *If the alarm is triggered for multiple services, make sure the Kubernetes nodes where the pod is running are alive in the cluster. *If the alarm is triggered only for one pod, check the health of that pod. | kube_pod_status_phase | The pod is in the Pending state for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceRQServiceMetrics | Pod Unknown state | Warning | The pod is in Unknown state. Actions: *If the alarm is triggered for multiple services, make sure there are no issues with the Kubernetes cluster. *If the alarm is triggered only for one pod, check whether the image is correct and if the container is starting up. | kube_pod_status_phase | The pod is in Unknown state for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceRQServiceMetrics | Redis disconnected for 10 minutes | Critical | Redis is not available for the pod. Actions: *If the alarm is triggered for multiple services, make sure there are no issues with Redis, and then restart Redis. *If the alarm is triggered only for one pod, check to see if there is an issue with that pod. | redis_state | Redis is not available for the pod for 10 minutes. |
Draft:VM/Current/VMPEGuide/VoiceRQServiceMetrics | Redis disconnected for 5 minutes | Warning | Redis is not available for the pod. Actions: *If the alarm is triggered for multiple services, make sure there are no issues with Redis, and then restart Redis. *If the alarm is triggered only for one pod, check to see if there is any issue with that pod. | redis_state | Redis is not available for the pod for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceSIPClusterServiceMetrics | Calls activity drop | Warning | A noticeable reduction in the number of active calls on a specific SIP Server and no new calls are arriving for processing. Actions: *If the problematic SIP Server is primary, do a switchover, and then restart the former primary server. *If the problematic SIP Server is backup, restart the backup server. | sips_calls, sips_calls_created | The absolute value of active calls on a specific SIP Server dropped by more than 30 calls in 2 minutes and no new calls are arriving at the SIP Server for processing. |
Draft:VM/Current/VMPEGuide/VoiceSIPClusterServiceMetrics | Container Restarted Repeatedly | Critical | The container was repeatedly restarted. Actions: *Check if a new version of the image was deployed. *Check for issues with the Kubernetes cluster. | kube_pod_container_status_restarts_total | The container was restarted 5 or more times within 15 minutes. |
Draft:VM/Current/VMPEGuide/VoiceSIPClusterServiceMetrics | Dial Plan Node Down | Critical | No Dial Plan nodes are reachable from SIP Server and all connections to Dial Plan nodes are down. Actions: *Check the network connection between SIP Server and the Dial Plan node host. *Check the Dial Plan node CPU and memory usage. | sips_dp_active_connections | All connections to Dial Plan nodes are down. |
Draft:VM/Current/VMPEGuide/VoiceSIPClusterServiceMetrics | Dial Plan node is overloaded | Critical | Dial Plan node is overloaded as the response latency increases. Actions: *Check that the inbound call rate to SIP Server is not too high. *Check the Dial Plan node CPU and memory usage. *Check the network connection between SIP Server and Dial Plan nodes. | sips_dp_average_response_latency | Dial Plan node is overloaded as the response latency increases (more than 1000). |
Draft:VM/Current/VMPEGuide/VoiceSIPClusterServiceMetrics | Dial Plan Queue Increase | Critical | The processing queue grows when Dial Plan requests are very large or there is a connection issue with the Dial Plan node. Actions: *Check the SIP Server inbound call rate. *Check the connection between SIP Server and the Dial Plan node. | sips_dp_queue_size | The processing queue size is greater than 10 requests for 1 minute. |
Draft:VM/Current/VMPEGuide/VoiceSIPClusterServiceMetrics | Dialplan Node problem | Warning | The Dial Plan node rejects requests with an error, or it does not respond and requests time out. Actions: *Check the network connection between SIP Server and the Dial Plan host. *Check that Dial Plan nodes are running. | sips_dp_timeouts | During 1 minute, the Dial Plan node rejects more than 5 requests with an error or more than 5 requests time out because the Dial Plan node fails to respond. |
Draft:VM/Current/VMPEGuide/VoiceSIPClusterServiceMetrics | Kafka not available | Critical | Kafka is not available for the pod. Actions: *If the alarm is triggered for multiple services, ensure there are no issues with Kafka. Restart Kafka. *If the alarm is triggered only for one pod, check if there is an issue with that pod. | kafka_producer_state | Kafka is not available for the pod for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceSIPClusterServiceMetrics | Media service is out of service | Critical | Media service is out of service. Actions: *Troubleshoot the SIP Server-to-Resource Manager (RM) network connection. Collect network stats and escalate to the Network team to resolve network issues, if necessary. *Troubleshoot RM, consider RM restart. *After 5 minutes, redirect traffic to another site. | sips_msml_in_service | Media service is out of service for more than 1 minute. |
Draft:VM/Current/VMPEGuide/VoiceSIPClusterServiceMetrics | Pod CPU greater than 65% | Warning | High CPU load for the pod. Actions: *Check whether the horizontal pod autoscaler has triggered and if the maximum number of pods has been reached. *Check Grafana for abnormal load. *Collect the service logs for the pod; raise an investigation ticket. | container_cpu_usage_seconds_total, container_spec_cpu_period | Container CPU usage exceeded 65% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceSIPClusterServiceMetrics | Pod CPU greater than 80% | Critical | Critical CPU load for the pod. Actions: *Check whether the horizontal pod autoscaler has triggered and if the maximum number of pods has been reached. *Check Grafana for abnormal load. *Collect the service logs for the pod; raise an investigation ticket. | container_cpu_usage_seconds_total, container_spec_cpu_period | Container CPU usage exceeded 80% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceSIPClusterServiceMetrics | Pod memory greater than 65% | Warning | High memory usage for the pod. Actions: *Check whether the horizontal pod autoscaler has triggered and if the maximum number of pods has been reached. *Check Grafana for abnormal load. *Collect the service logs for the pod; raise an investigation ticket. | container_memory_working_set_bytes, kube_pod_container_resource_requests_memory_bytes | Container memory usage exceeded 65% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceSIPClusterServiceMetrics | Pod memory greater than 80% | Critical | Critical memory usage for the pod. Actions: *Check whether the horizontal pod autoscaler has triggered and if the maximum number of pods has been reached. *Check Grafana for abnormal load. *Restart the service for the pod. | container_memory_working_set_bytes, kube_pod_container_resource_requests_memory_bytes | Container memory usage exceeded 80% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceSIPClusterServiceMetrics | Pod Status Error | Warning | Actions: *Restart the pod. Check if there are any issues with the pod after restart. | kube_pod_status_phase | The pod is in Failed, Unknown, or Pending state. |
Draft:VM/Current/VMPEGuide/VoiceSIPClusterServiceMetrics | Pod Status NotReady | Warning | The pod is in NotReady state. Actions: *Restart the pod. Check if there are any issues with the pod after restart. | kube_pod_status_ready | The pod is in NotReady state for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceSIPClusterServiceMetrics | Pods less than Min Replicas | Critical | The current number of replicas is less than the minimum replicas that should be available. This might be because Kubernetes cannot deploy a new pod or pods are failing to be active/ready. Actions: *If all services have the same issue, then check Kubernetes nodes and Consul health. *If the issue is specific to this service, check the health of its pods. | kube_hpa_status_current_replicas, kube_hpa_spec_min_replicas | For 5 consecutive minutes, the number of replicas is less than the minimum replicas that should be available. |
Draft:VM/Current/VMPEGuide/VoiceSIPClusterServiceMetrics | Pods scaled up greater than 80% | Critical | The current number of replicas is more than 80% of the maximum number of replicas. Actions: *Check if max replicas must be modified based on load. | kube_hpa_status_current_replicas, kube_hpa_spec_max_replicas | For 5 consecutive minutes, the number of replicas is more than 80% of the maximum number of replicas. |
Draft:VM/Current/VMPEGuide/VoiceSIPClusterServiceMetrics | Ready Pods below 60% | Critical | The number of statefulset pods in the Ready state has dropped below 60%. Actions: *Check if the new version of the image was deployed. *Check for issues with the Kubernetes cluster. | kube_statefulset_status_replicas_ready, kube_statefulset_status_replicas_current | For the last 5 minutes, fewer than 60% of the currently available statefulset pods have been in the Ready state. |
Draft:VM/Current/VMPEGuide/VoiceSIPClusterServiceMetrics | Redis not available | Critical | Redis is not available for the pod. Actions: *If the alarm is triggered for multiple services, ensure there are no issues with Redis. Restart Redis. *If the alarm is triggered only for one pod, check if there is an issue with that pod. | redis_state | Redis is not available for the pod for 5 consecutive minutes. |
Draft:VM/Current/VMPEGuide/VoiceSIPClusterServiceMetrics | Routing timeout counter growth | Warning | The trigger detects that routing timeouts are increasing. Actions: *Check the URS_RESPONSE_MORE5SEC stat value. If it's increasing, then investigate why URS doesn't respond to SIP Server in time. *Check SIPS-to-URS network connectivity. | sips_routing_timeouts | The absolute value of NROUTINGTIMEOUTS on a specific SIP Server increased by more than 20 in 2 minutes. |
Draft:VM/Current/VMPEGuide/VoiceSIPClusterServiceMetrics | SIP Node HealthCheck Fail | Critical | SIP Node health level fails for the pod. Actions: *Check for failure of dependent services (Redis/Kafka/SIP Proxy/GVP/Dial Plan). *Check for Envoy proxy failure, then restart the pod. | sipnode_health_level | SIP Node health level fails for the pod for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceSIPClusterServiceMetrics | SIP Proxy is out of service | Critical | Actions: *Troubleshoot the SIP Server-to-SIP Proxy nodes network connections. Collect network stats and escalate to the Network team to resolve network issues, if necessary. *Troubleshoot SIP Proxy nodes. | sips_sipproxy_in_service | SIP Proxy is out of service. |
Draft:VM/Current/VMPEGuide/VoiceSIPClusterServiceMetrics | SIP Proxy overloaded | Critical | SIP Proxy is overloaded. Actions: *Check SIP Proxy nodes for CPU and memory usage. *If SIP Proxy nodes have acceptable CPU and memory usage, then check for errors or a "hang-up" state which could delay SIP Proxy in forwarding. *Check the SBC side for network delays. | sips_sip_response_time_ms_sum, sips_sip_response_time_ms_count | Response time is greater than 20 milliseconds for 1 minute. |
Draft:VM/Current/VMPEGuide/VoiceSIPClusterServiceMetrics | SIP Server main thread consuming more than 65% CPU for 5 mins | Warning | Main thread consumes too much CPU. Actions: *Collect SIP Server Main thread logs; that is, log files without index in the file name (appname_date.log files). Raise an investigation ticket. | sips_cpu_usage_main | Main thread consumes too much CPU (more than 65% for 5 consecutive minutes). |
Draft:VM/Current/VMPEGuide/VoiceSIPClusterServiceMetrics | SIP softswitch is out of service | Critical | Actions: *Troubleshoot the SIP Server-to-SBC network connection. Collect network stats and escalate to the Network team to resolve network issues, if necessary. *Troubleshoot the SBC. | sips_softswitch_in_service | SIP softswitch is out of service. |
Draft:VM/Current/VMPEGuide/VoiceSIPClusterServiceMetrics | SIP trunk is out of service | Critical | SIP trunk is out of service. Actions: *For Primary and Secondary trunks: **Troubleshoot the SIP Server-to-SBC network connection. Collect network stats and escalate to the Network team to resolve network issues, if necessary. **Troubleshoot the SBC. *For Inter-SIP Server trunks: troubleshoot the SIP Server-to-SIP Server network connection. | sips_trunk_in_service | SIP trunk is out of service for more than 1 minute. |
Draft:VM/Current/VMPEGuide/VoiceSIPClusterServiceMetrics | Too many Kafka pending events | Critical | Too many Kafka producer pending events for the pod. Actions: *Ensure there are no issues with Kafka, the pod's CPU, and the network. | kafka_producer_queue_depth | Too many Kafka producer pending events for the service (more than 100 in 5 minutes). |
Draft:VM/Current/VMPEGuide/VoiceSIPClusterServiceMetrics | Too many Kafka producer errors | Critical | Kafka responds with errors at the pod. Actions: *Ensure there are no issues with Kafka. | kafka_producer_error_total | More than 100 errors for 5 consecutive minutes. |
Draft:VM/Current/VMPEGuide/VoiceSIPProxyServiceMetrics | Config node fail | Warning | The request to the config node failed. Action: *Check if there is any problem with the pod or the config node. | http_client_response_count | Requests to the config node fail for 5 consecutive minutes. |
Draft:VM/Current/VMPEGuide/VoiceSIPProxyServiceMetrics | Container restarted repeatedly | Critical | The container was repeatedly restarted. Actions: *Check to see if a new version of the image was deployed. Also check for issues with the Kubernetes cluster. | kube_pod_container_status_restarts_total | The container was restarted 5 or more times within 15 minutes. |
Draft:VM/Current/VMPEGuide/VoiceSIPProxyServiceMetrics | No sip-nodes available for 2 minutes | Critical | No sip-nodes are available for the pod. Actions: *If the alarm is triggered for multiple services, make sure there are no issues with sip-nodes. *If the alarm is triggered only for one pod, check to see if there are any issues with that pod. | sipproxy_active_sip_nodes_count | No sip-nodes are available for the pod for 2 minutes. |
Draft:VM/Current/VMPEGuide/VoiceSIPProxyServiceMetrics | Pod CPU greater than 65% | Warning | High CPU load for the pod. Actions: *Check whether the horizontal pod autoscaler has triggered and the maximum number of pods has been reached. *Check Grafana for abnormal load. *Collect the service logs for the pod and raise an investigation ticket. | container_cpu_usage_seconds_total, container_spec_cpu_period | Container CPU usage exceeded 65% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceSIPProxyServiceMetrics | Pod CPU greater than 80% | Critical | Critical CPU load for the pod. Actions: *Check whether the horizontal pod autoscaler has triggered and the maximum number of pods has been reached. *Check Grafana for abnormal load. *Collect the service logs for the pod and raise an investigation ticket. | container_cpu_usage_seconds_total, container_spec_cpu_period | Container CPU usage exceeded 80% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceSIPProxyServiceMetrics | Pod memory greater than 65% | Warning | The pod has high memory usage. Actions: *Check whether the horizontal pod autoscaler has triggered and the maximum number of pods has been reached. *Check Grafana for abnormal load. *Collect the service logs for the pod and raise an investigation ticket. | container_memory_working_set_bytes, kube_pod_container_resource_requests_memory_bytes | Container memory usage exceeded 65% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceSIPProxyServiceMetrics | Pod memory greater than 80% | Critical | Critical memory usage for the pod. Actions: *Check whether the horizontal pod autoscaler has triggered and the maximum number of pods has been reached. *Check Grafana for abnormal load. *Restart the service for the pod. | container_memory_working_set_bytes, kube_pod_container_resource_requests_memory_bytes | Container memory usage exceeded 80% for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceSIPProxyServiceMetrics | Pod status failed | Warning | Actions: *Restart the pod and check to see if there are any issues with the pod after restart. | kube_pod_status_phase | The pod is in Failed state. |
Draft:VM/Current/VMPEGuide/VoiceSIPProxyServiceMetrics | Pod status NotReady | Critical | The pod is in NotReady state. Actions: *Restart the pod and check to see if there are any issues with the pod after restart. | kube_pod_status_ready | The pod is in NotReady state for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceSIPProxyServiceMetrics | Pod status Pending | Warning | The pod is in Pending state. Actions: *Restart the pod and check to see if there are any issues with the pod after restart. | kube_pod_status_phase | The pod is in Pending state for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceSIPProxyServiceMetrics | Pod status Unknown | Warning | The pod is in Unknown state. Actions: *Restart the pod and check to see if there are any issues with the pod after restart. | kube_pod_status_phase | The pod is in Unknown state for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceSIPProxyServiceMetrics | SIP server response time too high | Warning | Actions: *If the alarm is triggered for multiple sipproxy-nodes, make sure there are no issues on the SIP Server. *If the alarm is triggered only for one sipproxy-node, check to see if there is an issue with the service on that node (CPU, memory, and so on). | sipproxy_response_latency_bucket | SIP response latency for more than 95% of messages forwarded to the SIP Server is more than 1 second for the sipproxy-node. |
Draft:VM/Current/VMPEGuide/VoiceSIPProxyServiceMetrics | sip-node capacity limit reached | Warning | The sip-node hit its capacity limit. Actions: *If the alarm is triggered for multiple services, make sure there are no issues with the sip-node. *If the alarm is triggered only for one pod, check to see if there are any issues with that pod. | sipproxy_sip_node_is_capacity_available | The sip-node hit its capacity limit for 3 consecutive minutes. |
Draft:VM/Current/VMPEGuide/VoiceSIPProxyServiceMetrics | Too many Kafka pending events | Critical | Too many Kafka producer pending events for the pod. This alert means there are issues with SIP REGISTER processing on this voice-sipproxy. Actions: *Make sure there are no issues with Kafka or with the pod's CPU and network. | kafka_producer_queue_depth | Too many Kafka producer pending events for the service (more than 100 in 5 minutes). |
Draft:VM/Current/VMPEGuide/VoiceVoicemailServiceMetrics | ContainerRestartedRepeatedly | Critical | The Voicemail pod restarts repeatedly. | kube_pod_container_status_restarts_total | The container was restarted 5 or more times within 15 minutes. |
Draft:VM/Current/VMPEGuide/VoiceVoicemailServiceMetrics | PodStatusNotReadyfor10mins | Critical | The Voicemail pod is down. | kube_pod_status_ready | The Voicemail pod is down for more than 10 minutes. |
Draft:VM/Current/VMPEGuide/VoiceVoicemailServiceMetrics | VoicemailConfigHealthFailedCritical | Critical | The Voicemail Service Config node is not available. | voicemail_config_node_status | The Voicemail Service Config node is not available for 10 minutes. |
Draft:VM/Current/VMPEGuide/VoiceVoicemailServiceMetrics | VoicemailConfigRequestFailureCritical | Critical | The Voicemail Service is unable to connect to the Config Node. | voicemail_config_request_failed_total | At least 6 requests failed per minute for the past 10 minutes. |
Draft:VM/Current/VMPEGuide/VoiceVoicemailServiceMetrics | VoicemailEnvoyHealthFailedCritical | Critical | The Voicemail Service Envoy service is not available. | voicemail_envoy_proxy_status | The Voicemail Service Envoy service is not available for 10 minutes. |
Draft:VM/Current/VMPEGuide/VoiceVoicemailServiceMetrics | VoicemailGWSHealthFailedCritical | Critical | The Voicemail Service GWS service is not available. | voicemail_gws_status | The Voicemail Service GWS service is not available for 15 minutes. |
Draft:VM/Current/VMPEGuide/VoiceVoicemailServiceMetrics | VoicemailRedisConnectionDown | Critical | The Voicemail Service is unable to connect to the Redis cluster. | voicemail_redis_connection_failure | At least 6 requests failed per minute for the past 10 minutes. |
Draft:VM/Current/VMPEGuide/VoiceVoicemailServiceMetrics | voicemail_node_cpu_usage_80 | Critical | Critical CPU load for the Voicemail pod. | container_cpu_usage_seconds_total, kube_pod_container_resource_requests_cpu_cores | The Voicemail pod exceeded 80% CPU usage for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceVoicemailServiceMetrics | voicemail_node_memory_usage_80 | Critical | Critical memory usage for the Voicemail pod. | container_memory_working_set_bytes, kube_pod_container_resource_requests_memory_bytes | The Voicemail pod exceeded 80% memory usage for 5 minutes. |
Draft:VM/Current/VMPEGuide/VoiceVoicemailServiceMetrics | voicemail_storage_failed_account | Outage | The Storage account is down and, as a result, the service will not be able to fetch the data. | voicemail_storage_failed_account | The Storage account is down. |
Draft:WebRTC/Current/WebRTCPEGuide/WebRTC Metrics | webrtc-gateway-es | warning | Specifies that the Gateway Pod has lost connection to Elasticsearch | wrtc_system_error | Need input |
Draft:WebRTC/Current/WebRTCPEGuide/WebRTC Metrics | webrtc-gateway-gauth | warning | Specifies that the Gateway Pod has lost connection to the Auth service | wrtc_system_error | Need input |
Draft:WebRTC/Current/WebRTCPEGuide/WebRTC Metrics | webrtc-gateway-gws | warning | Specifies that the Gateway Pod has lost connection to the Environment Service | wrtc_system_error | Need input |
Draft:WebRTC/Current/WebRTCPEGuide/WebRTC Metrics | webrtc-gateway-signins | warning | Specifies the number of sign-ins | wrtc_current_signins | 15mins |
GVP/Current/GVPPEGuide/GVP Configuration Server Metrics | ContainerCPUreached70percentForConfigserver | HIGH | The trigger will flag an alarm when the Configserver container CPU utilization goes beyond 70% for 15 mins | container_cpu_usage_seconds_total, container_spec_cpu_quota, container_spec_cpu_period | 15mins |
GVP/Current/GVPPEGuide/GVP Configuration Server Metrics | ContainerMemoryUseOver1GBForConfigserver | HIGH | The trigger will flag an alarm when the Configserver container working memory has exceeded 1GB for 15 mins | container_memory_working_set_bytes | 15mins |
GVP/Current/GVPPEGuide/GVP Configuration Server Metrics | ContainerMemoryUseOver90PercentForConfigserver | HIGH | The trigger will flag an alarm when the Configserver container working memory use is over 90% of the limit for 15 mins | container_memory_working_set_bytes, kube_pod_container_resource_limits_memory_bytes | 15mins |
GVP/Current/GVPPEGuide/GVP Configuration Server Metrics | ContainerNotRunningForConfigserver | HIGH | This alert is triggered when the Configserver container has not been running for 15 minutes | kube_pod_container_status_running | 15mins |
GVP/Current/GVPPEGuide/GVP Configuration Server Metrics | ContainerNotRunningForServiceHandler | MEDIUM | This alert is triggered when the service-handler container has not been running for 15 minutes | kube_pod_container_status_running | 15mins |
GVP/Current/GVPPEGuide/GVP Configuration Server Metrics | ContainerRestartsOver4ForConfigserver | HIGH | This alert is triggered when the Configserver container restart count exceeds 4 within 15 mins | kube_pod_container_status_restarts_total | 15mins |
GVP/Current/GVPPEGuide/GVP Configuration Server Metrics | ContainerRestartsOver4ForServiceHandler | MEDIUM | This alert is triggered when the service-handler container restart count exceeds 4 within 15 mins | kube_pod_container_status_running | 15mins |
GVP/Current/GVPPEGuide/GVP MCP Metrics | ContainerCPUreached70percentForMCP | HIGH | The trigger will flag an alarm when the MCP container CPU utilization goes beyond 70% for 5 mins | container_cpu_usage_seconds_total, container_spec_cpu_quota, container_spec_cpu_period | 15mins |
GVP/Current/GVPPEGuide/GVP MCP Metrics | ContainerMemoryUseOver7GBForMCP | HIGH | The trigger will flag an alarm when the MCP container working memory has exceeded 7GB for 5 mins | container_memory_working_set_bytes | 15mins |
GVP/Current/GVPPEGuide/GVP MCP Metrics | ContainerMemoryUseOver90PercentForMCP | HIGH | The trigger will flag an alarm when the MCP container working memory use is over 90% of the limit for 5 mins | container_memory_working_set_bytes, kube_pod_container_resource_limits_memory_bytes | 15mins |
GVP/Current/GVPPEGuide/GVP MCP Metrics | ContainerRestartsOver2ForMCP | HIGH | The trigger will flag an alarm when the MCP container restart count exceeds 2 within 15 mins | kube_pod_container_status_restarts_total | 15mins |
GVP/Current/GVPPEGuide/GVP MCP Metrics | MCP_MEDIA_ERROR_CRITICAL | CRITICAL | Number of LMSIP media errors exceeded critical limit | gvp_mcp_log_parser_eror_total {LogID="33008",endpoint="mcplog"...} | 30mins |
GVP/Current/GVPPEGuide/GVP MCP Metrics | MCP_SDP_PARSE_ERROR | WARNING | Number of SDP parse errors exceeded limit | gvp_mcp_log_parser_eror_total {LogID="33006",endpoint="mcplog"...} | N/A |
GVP/Current/GVPPEGuide/GVP MCP Metrics | MCP_WEBSOCKET_CLIENT_OPEN_ERROR | HIGH | There are errors opening a session with a websocket client | gvp_mcp_log_parser_eror_total {LogID="40026",endpoint="mcplog"...} | N/A |
GVP/Current/GVPPEGuide/GVP MCP Metrics | MCP_WEBSOCKET_CLIENT_PROTOCOL_ERROR | HIGH | There are protocol errors with a websocket client | gvp_mcp_log_parser_eror_total {LogID="40026",endpoint="mcplog"...} | N/A |
GVP/Current/GVPPEGuide/GVP MCP Metrics | MCP_WEBSOCKET_TOKEN_CONFIG_ERROR | HIGH | There are errors getting information for Auth token with a websocket client | gvp_mcp_log_parser_eror_total {LogID="40026",endpoint="mcplog"...} | N/A |
GVP/Current/GVPPEGuide/GVP MCP Metrics | MCP_WEBSOCKET_TOKEN_CREATE_ERROR | HIGH | There are errors creating a JWT token with a websocket client | gvp_mcp_log_parser_eror_total {LogID="40026",endpoint="mcplog"...} | N/A |
GVP/Current/GVPPEGuide/GVP MCP Metrics | MCP_WEBSOCKET_TOKEN_FETCH_ERROR | HIGH | There are errors fetching Auth token with a websocket client | gvp_mcp_log_parser_eror_total {LogID="40026",endpoint="mcplog"...} | N/A |
GVP/Current/GVPPEGuide/GVP MCP Metrics | NGI_LOG_FETCH_RESOURCE_ERROR | MEDIUM | Number of VXMLi fetch errors exceeded limit | gvp_mcp_log_parser_eror_total {LogID="40026",endpoint="mcplog"...} | 1min |
GVP/Current/GVPPEGuide/GVP MCP Metrics | NGI_LOG_FETCH_RESOURCE_ERROR_4XX | WARNING | Number of VXMLi 4xx fetch errors exceeded limit | gvp_mcp_log_parser_eror_total {LogID="40032",endpoint="mcplog"...} | 1min |
GVP/Current/GVPPEGuide/GVP MCP Metrics | NGI_LOG_FETCH_RESOURCE_TIMEOUT | MEDIUM | Number of VXMLi fetch timeouts exceeded limit | gvp_mcp_log_parser_eror_total {LogID="40026",endpoint="mcplog"...} | 1min |
GVP/Current/GVPPEGuide/GVP MCP Metrics | NGI_LOG_PARSE_ERROR | WARNING | Number of VXMLi parse errors exceeded limit | gvp_mcp_log_parser_eror_total {LogID="40028",endpoint="mcplog"...} | 1min |
GVP/Current/GVPPEGuide/Reporting Server Metrics | ContainerCPUreached80percent | HIGH | The trigger will flag an alarm when the RS container CPU utilization goes beyond 80% for 15 mins | container_cpu_usage_seconds_total, container_spec_cpu_quota, container_spec_cpu_period | 15mins |
GVP/Current/GVPPEGuide/Reporting Server Metrics | ContainerMemoryUsage80percent | HIGH | The trigger will flag an alarm when the RS container Memory utilization goes beyond 80% for 15 mins | container_memory_usage_bytes, kube_pod_container_resource_limits_memory_bytes | 15mins |
GVP/Current/GVPPEGuide/Reporting Server Metrics | ContainerRestartedRepeatedly | CRITICAL | The trigger will flag an alarm when the RS or RS SNMP container gets restarted 5 or more times within 15 mins | kube_pod_container_status_restarts_total | 15mins |
GVP/Current/GVPPEGuide/Reporting Server Metrics | InitContainerFailingRepeatedly | CRITICAL | The trigger will flag an alarm when the RS init container fails 5 or more times within 15 mins | kube_pod_init_container_status_restarts_total | 15mins |
GVP/Current/GVPPEGuide/Reporting Server Metrics | PodStatusNotReady | CRITICAL | The trigger will flag an alarm when the RS pod status is Not Ready for 30 mins; this is controlled through the override-value.yaml file. | kube_pod_status_ready | 30mins |
GVP/Current/GVPPEGuide/Reporting Server Metrics | PVC50PercentFilled | HIGH | This trigger will flag an alarm when the RS PVC size is 50% filled | kubelet_volume_stats_used_bytes, kubelet_volume_stats_capacity_bytes | 15mins |
GVP/Current/GVPPEGuide/Reporting Server Metrics | PVC80PercentFilled | CRITICAL | This trigger will flag an alarm when the RS PVC size is 80% filled | kubelet_volume_stats_used_bytes, kubelet_volume_stats_capacity_bytes | 5mins |
GVP/Current/GVPPEGuide/Reporting Server Metrics | RSQueueSizeCritical | HIGH | The trigger will flag an alarm when RS JMS message queue size goes beyond 15000 (3GB approx. backlog) for 15 mins | rsQueueSize | 15mins |
GVP/Current/GVPPEGuide/Resource Manager Metrics | ContainerCPUreached80percentForRM0 | HIGH | The trigger will flag an alarm when the RM container CPU utilization goes beyond 80% for 15 mins | container_cpu_usage_seconds_total, container_spec_cpu_quota, container_spec_cpu_period | 15mins |
GVP/Current/GVPPEGuide/Resource Manager Metrics | ContainerCPUreached80percentForRM1 | HIGH | The trigger will flag an alarm when the RM container CPU utilization goes beyond 80% for 15 mins | container_cpu_usage_seconds_total, container_spec_cpu_quota, container_spec_cpu_period | 15mins |
GVP/Current/GVPPEGuide/Resource Manager Metrics | ContainerMemoryUsage80percentForRM0 | HIGH | The trigger will flag an alarm when the RM container Memory utilization goes beyond 80% for 15 mins | container_memory_rss, kube_pod_container_resource_limits_memory_bytes | 15mins |
GVP/Current/GVPPEGuide/Resource Manager Metrics | ContainerMemoryUsage80percentForRM1 | HIGH | The trigger will flag an alarm when the RM container Memory utilization goes beyond 80% for 15 mins | container_memory_rss, kube_pod_container_resource_limits_memory_bytes | 15mins |
GVP/Current/GVPPEGuide/Resource Manager Metrics | ContainerRestartedRepeatedly | CRITICAL | The trigger will flag an alarm when the RM or RM SNMP container gets restarted 5 or more times within 15 mins | kube_pod_container_status_restarts_total | 15 mins |
GVP/Current/GVPPEGuide/Resource Manager Metrics | InitContainerFailingRepeatedly | CRITICAL | The trigger will flag an alarm when the RM init container fails 5 or more times within 15 mins. | kube_pod_init_container_status_restarts_total | 15 mins |
GVP/Current/GVPPEGuide/Resource Manager Metrics | MCPPortsExceeded | HIGH | All the MCP ports in the MCP LRG have been exhausted | gvp_rm_log_parser_eror_total | 1min |
GVP/Current/GVPPEGuide/Resource Manager Metrics | PodStatusNotReady | CRITICAL | The trigger will flag an alarm when the RM pod status is Not Ready for 30 mins; this is controlled by override-value.yaml. | kube_pod_status_ready | 30mins |
GVP/Current/GVPPEGuide/Resource Manager Metrics | RM Service Down | CRITICAL | RM pods are not in the Ready state and the RM service is not available | kube_pod_container_status_running | 0 |
GVP/Current/GVPPEGuide/Resource Manager Metrics | RMConfigServerConnectionLost | HIGH | RM lost connection to the GVP Configuration Server for 5 mins. | gvp_rm_log_parser_warn_total | 5 mins |
GVP/Current/GVPPEGuide/Resource Manager Metrics | RMInterNodeConnectivityBroken | HIGH | Inter-node connectivity between RM nodes is lost for 5 mins. | gvp_rm_log_parser_warn_total | 5 mins |
GVP/Current/GVPPEGuide/Resource Manager Metrics | RMMatchingIVRTenantNotFound | MEDIUM | A matching IVR profile tenant could not be found for 2 mins | gvp_rm_log_parser_eror_total | 2mins |
GVP/Current/GVPPEGuide/Resource Manager Metrics | RMResourceAllocationFailed | MEDIUM | RM resource allocation failed for 1 min | gvp_rm_log_parser_eror_total | 1min |
GVP/Current/GVPPEGuide/Resource Manager Metrics | RMServiceDegradedTo50Percentage | HIGH | One of the RM containers is not in the running state for 5 mins | kube_pod_container_status_running | 5mins |
GVP/Current/GVPPEGuide/Resource Manager Metrics | RMSocketInterNodeError | HIGH | RM inter-node socket error for 5 mins. | gvp_rm_log_parser_eror_total | 5mins |
GVP/Current/GVPPEGuide/Resource Manager Metrics | RMTotal4XXErrorForINVITE | MEDIUM | The RM MIB counter stats are collected every 60 seconds; if the MIB counter total4xxInviteSent increments from its previous value by 10 within 60 seconds, the trigger will flag an alarm. | rmTotal4xxInviteSent | 1min |
GVP/Current/GVPPEGuide/Resource Manager Metrics | RMTotal5XXErrorForINVITE | HIGH | The RM MIB counter stats are collected every 30 seconds; if the MIB counter total5xxInviteSent increments from its previous value by 5 within 5 minutes, the trigger will flag an alarm. | rmTotal5xxInviteSent | 5 mins |
PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES-NODE-JS-DELAY-WARNING | Warning | Triggers if the base NodeJS event loop lag becomes excessive. This indicates significant resource and performance issues with the deployment. | application_ccecp_nodejs_eventloop_lag_seconds | Triggered when the event loop lag is greater than 5 milliseconds for a period exceeding 5 minutes. |
PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_CB_ENQUEUE_LIMIT_REACHED | Info | GES is throttling callbacks to a given phone number. | CB_ENQUEUE_LIMIT_REACHED | Triggered when GES has begun throttling callbacks to a given number within the past 2 minutes. |
PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_CB_SUBMIT_FAILED | Info | GES has failed to submit a callback to ORS. | CB_SUBMIT_FAILED | Triggered when GES has failed to submit a callback to ORS in the past 2 minutes for any reason. |
PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_CB_TTL_LIMIT_REACHED | Info | GES is throttling callbacks for a specific tenant. | CB_TTL_LIMIT_REACHED | Triggered when GES has started throttling callbacks within the past 2 minutes. |
PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_CPU_USAGE | Info | GES has high CPU usage for 1 minute. | ges_process_cpu_seconds_total | Triggered when the average CPU usage (measured by ges_process_cpu_seconds_total) is greater than 90% for 1 minute. |
PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_DNS_FAILURE | Warning | A GES pod has encountered difficulty resolving DNS requests. | DNS_FAILURE | Triggered when GES encounters any DNS failures within the last 30 minutes. |
PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_GWS_AUTH_DOWN | Warning | Connection to the Genesys Authentication Service is down. | GWS_AUTH_STATUS | Triggered when the connection to the Genesys Authentication Service is down for 5 minutes. |
PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_GWS_CONFIG_DOWN | Warning | Connection to the GWS Configuration Service is down. | GWS_CONFIG_STATUS | Triggered when the connection to the GWS Configuration Service is down. |
PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_GWS_ENVIRONMENT_DOWN | Warning | Connection to the GWS Environment Service is down. | GWS_ENV_STATUS | Triggered when the connection to the GWS Environment Service is down. |
PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_GWS_INCORRECT_CLIENT_CREDENTIALS | Warning | The GWS client credentials provided to GES are incorrect. | GWS_INCORRECT_CLIENT_CREDENTIALS | Triggered when GWS has had any issue with the GES client credentials in the last 5 minutes. |
PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_GWS_SERVER_ERROR | Warning | GES has encountered server or connection errors with GWS. | GWS_SERVER_ERROR | Triggered when there has been a GWS server error in the past 5 minutes. |
PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_HEALTH | Critical | One or more downstream components (Postgres, Config Server, GWS, ORS) are down. '''Note:''' Because GES goes into a crash loop when Redis is down, this alert does not fire when Redis is down. | GES_HEALTH | Triggered when any component is down for any length of time. |
PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_HTTP_400_POD | Info | An individual GES pod is returning excessive HTTP 400 results. | ges_http_failed_requests_total, http_400_tolerance | Triggered when two or more HTTP 400 results are returned from a pod within a 5-minute period. |
PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_HTTP_401_POD | Info | An individual GES pod is returning excessive HTTP 401 results. | ges_http_failed_requests_total, http_401_tolerance | Triggered when two or more HTTP 401 results are returned from a pod within a 5-minute period. |
PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_HTTP_404_POD | Info | An individual GES pod is returning excessive HTTP 404 results. | ges_http_failed_requests_total, http_404_tolerance | Triggered when two or more HTTP 404 results are returned from a pod within a 5-minute period. |
PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_HTTP_500_POD | Info | An individual GES pod is returning excessive HTTP 500 results. | ges_http_failed_requests_total, http_500_tolerance | Triggered when two or more HTTP 500 results are returned from a pod within a 5-minute period. |
PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_INVALID_CONTENT_LENGTH | Info | Fires if GES encounters any incoming requests that have exceeded the maximum content length of 10 MB on the internal port and 500 KB for the external, public-facing port. | INVALID_CONTENT_LENGTH, invalid_content_length_tolerance | Triggered when one instance of a message with an invalid length is received. Silenced after 2 minutes. |
PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_LOGGING_FAILURE | Warning | GES has failed to write a message to the log. | LOGGING_FAILURE | Triggered when there are any failures writing to the logs. Silenced after 1 minute. |
PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_MEMORY_USAGE | Info | GES has high memory usage for a period of 90 seconds. | ges_nodejs_heap_space_size_used_bytes, ges_nodejs_heap_space_size_available_bytes | Triggered when memory usage (measured as a ratio of Used Heap Space vs Available Heap Space) is above 80% for a 90-second interval. |
PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_NEXUS_ACCESS_FAILURE | Warning | GES has been having difficulties contacting Nexus. This alert is only relevant for customers who leverage the Push Notification feature in Genesys Callback. | NEXUS_ACCESS_FAILURE | Triggered when GES has failed to connect or communicate with Nexus more than 30 times over the last hour. |
PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_NOT_READY_CRITICAL | Critical | GES pods are not in the Ready state. Indicative of issues with the Redis connection or other problems with the Helm deployment. | kube_pod_container_status_ready | Triggered when more than 50% of GES pods have not been in a Ready state for 5 minutes. |
PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_NOT_READY_WARNING | Warning | GES pods are not in the Ready state. Indicative of issues with the Redis connection or other problems with the Helm deployment. | kube_pod_container_status_ready | Triggered when 25% (or more) of GES pods have not been in a Ready state for 10 minutes. |
PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_ORS_REDIS_DOWN | Critical | Connection to ORS_REDIS is down. | ORS_REDIS_STATUS | Triggered when the ORS_REDIS connection is down for 5 consecutive minutes. |
PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_PODS_RESTART | Critical | GES pods have been excessively crashing and restarting. | kube_pod_container_status_restarts_total | Triggered when there have been more than five pod restarts in the past 15 minutes. |
PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_RBAC_CREATE_VQ_PROXY_ERROR | Info | Fires if there are issues with GES managing VQ Proxy Objects. | RBAC_CREATE_VQ_PROXY_ERROR, rbac_create_vq_proxy_error_tolerance | Triggered when there are at least 1000 instances of issues managing VQ Proxy objects within a 10-minute period. |
PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_SLOW_HTTP_RESPONSE_TIME | Warning | Fired if the average response time for incoming requests begins to lag. | ges_http_request_duration_seconds_sum, ges_http_request_duration_seconds_count | Triggered when the average response time for incoming requests is above 1.5 seconds for a sustained period of 15 minutes. |
PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_UNCAUGHT_EXCEPTION | Warning | There has been an uncaught exception within GES. | UNCAUGHT_EXCEPTION | Triggered when GES encounters any uncaught exceptions. Silenced after 1 minute. |
PEC-CAB/Current/CABPEGuide/CallbackMetrics | GES_UP | Critical | Fires when fewer than two GES pods have been up for the last 15 minutes. | | Triggered when fewer than two GES pods are up for 15 consecutive minutes. |
PEC-DC/Current/DCPEGuide/DCMetrics | Memory usage is above 3000 Mb | Critical | Triggered when the memory usage on this pod is above 3000 MB for 15 minutes. | nexus_process_resident_memory_bytes | For 15 minutes |
PEC-DC/Current/DCPEGuide/DCMetrics | Nexus error rate | Critical | Triggered when the error rate on this pod is greater than 20% for 15 minutes. | nexus_errors_total, nexus_request_total | For 15 minutes |
PEC-IWD/Current/IWDPEGuide/IWD metrics and alerts | Database connections above 75 | HIGH | Triggered when the number of pod database connections is above 75. | | Default number of connections: 75 |
PEC-IWD/Current/IWDPEGuide/IWD metrics and alerts | IWD DB errors | CRITICAL | Triggered when IWD experiences more than 2 errors within 1 minute during database operations. | | Default number of errors: 2 |
PEC-IWD/Current/IWDPEGuide/IWD metrics and alerts | IWD error rate | CRITICAL | Triggered when the number of errors in IWD exceeds the threshold over a 15-minute period. | | Default number of errors: 2 |
PEC-IWD/Current/IWDPEGuide/IWD metrics and alerts | Memory usage is above 3000 Mb | CRITICAL | Triggered when the pod memory usage is above 3000 MB. | | Default memory usage: 3000 MB |
PEC-OU/Current/CXCPEGuide/APIAMetrics | CXC-API-LatencyHigh | HIGH | Triggered when the latency for API responses is beyond the defined threshold. | | 2500ms for 5m |
PEC-OU/Current/CXCPEGuide/APIAMetrics | CXC-API-Redis-Connection-Failed | HIGH | Triggered when the connection to Redis fails for more than 1 minute. | | 1m |
PEC-OU/Current/CXCPEGuide/APIAMetrics | CXC-CPUUsage | HIGH | Triggered when the CPU utilization of a pod is beyond the threshold. | | 300% for 5m |
PEC-OU/Current/CXCPEGuide/APIAMetrics | CXC-EXT-Ingress-Error-Rate | HIGH | Triggered when the Ingress error rate is above the specified threshold. | | 20% for 5m |
PEC-OU/Current/CXCPEGuide/APIAMetrics | CXC-MemoryUsage | HIGH | Triggered when the memory utilization of a pod is beyond the threshold. | | 70% for 5m |
PEC-OU/Current/CXCPEGuide/APIAMetrics | CXC-MemoryUsagePD | HIGH | Triggered when the memory usage of a pod is above the critical threshold. | | 90% for 5m |
PEC-OU/Current/CXCPEGuide/APIAMetrics | CXC-PodNotReadyCount | HIGH | Triggered when the number of pods ready for a CX Contact deployment is less than or equal to the threshold. | | 1 for 5m |
PEC-OU/Current/CXCPEGuide/APIAMetrics | CXC-PodRestartsCount | HIGH | Triggered when the restart count for a pod is beyond the threshold. | | 1 for 5m |
PEC-OU/Current/CXCPEGuide/APIAMetrics | CXC-PodRestartsCountPD | HIGH | Triggered when the restart count is beyond the critical threshold. | | 5 for 5m |
PEC-OU/Current/CXCPEGuide/APIAMetrics | CXC-PodsNotReadyPD | HIGH | Triggered when there are no pods ready for the CX Contact deployment. | | 0 for 1m |
PEC-OU/Current/CXCPEGuide/APIAMetrics | cxc_api_too_many_errors_from_auth | HIGH | Triggered when there are too many error responses from the auth service for more than the specified time threshold. | | 1m |
PEC-OU/Current/CXCPEGuide/CPGMMetrics | CXC-CM-Redis-Connection-Failed | HIGH | Triggered when the connection to Redis fails for more than 1 minute. | | 1m |
PEC-OU/Current/CXCPEGuide/CPGMMetrics | CXC-CPUUsage | HIGH | Triggered when the CPU utilization of a pod is beyond the threshold. | | 300% for 5m |
PEC-OU/Current/CXCPEGuide/CPGMMetrics | CXC-MemoryUsage | HIGH | Triggered when the memory utilization of a pod is beyond the threshold. | | 70% for 5m |
PEC-OU/Current/CXCPEGuide/CPGMMetrics | CXC-MemoryUsagePD | HIGH | Triggered when the memory usage of a pod is above the critical threshold. | | 90% for 5m |
PEC-OU/Current/CXCPEGuide/CPGMMetrics | CXC-PodNotReadyCount | HIGH | Triggered when the number of pods ready for a CX Contact deployment is less than or equal to the threshold. | | 1 for 5m |
PEC-OU/Current/CXCPEGuide/CPGMMetrics | CXC-PodRestartsCount | HIGH | Triggered when the restart count for a pod is beyond the threshold. | | 1 for 5m |
PEC-OU/Current/CXCPEGuide/CPGMMetrics | CXC-PodRestartsCountPD | HIGH | Triggered when the restart count is beyond the critical threshold. | | 5 for 5m |
PEC-OU/Current/CXCPEGuide/CPGMMetrics | CXC-PodsNotReadyPD | HIGH | Triggered when there are no pods ready for the CX Contact deployment. | | 0 for 1m |
PEC-OU/Current/CXCPEGuide/CPLMMetrics | CXC-CoM-Redis-no-active-connections | HIGH | Triggered when CX Contact compliance has no active Redis connection for 2 minutes. | | 2m |
PEC-OU/Current/CXCPEGuide/CPLMMetrics | CXC-Compliance-LatencyHigh | HIGH | Triggered when the latency for API responses is beyond the defined threshold. | | 5000ms for 5m |
PEC-OU/Current/CXCPEGuide/CPLMMetrics | CXC-CPUUsage | HIGH | Triggered when the CPU utilization of a pod is beyond the threshold. | | 300% for 5m |
PEC-OU/Current/CXCPEGuide/CPLMMetrics | CXC-MemoryUsage | HIGH | Triggered when the memory utilization of a pod is beyond the threshold. | | 70% for 5m |
PEC-OU/Current/CXCPEGuide/CPLMMetrics | CXC-MemoryUsagePD | HIGH | Triggered when the memory usage of a pod is above the critical threshold. | | 90% for 5m |
PEC-OU/Current/CXCPEGuide/CPLMMetrics | CXC-PodNotReadyCount | HIGH | Triggered when the number of pods ready for a CX Contact deployment is less than or equal to the threshold. | | 1 for 5m |
PEC-OU/Current/CXCPEGuide/CPLMMetrics | CXC-PodRestartsCount | HIGH | Triggered when the restart count for a pod is beyond the threshold. | | 1 for 5m |
PEC-OU/Current/CXCPEGuide/CPLMMetrics | CXC-PodRestartsCountPD | HIGH | Triggered when the restart count is beyond the critical threshold. | | 5 for 5m |
PEC-OU/Current/CXCPEGuide/CPLMMetrics | CXC-PodsNotReadyPD | HIGH | Triggered when there are no pods ready for the CX Contact deployment. | | 0 for 1m |
PEC-OU/Current/CXCPEGuide/DMMetrics | CXC-CPUUsage | HIGH | Triggered when the CPU utilization of a pod is beyond the threshold. | | 300% for 5m |
PEC-OU/Current/CXCPEGuide/DMMetrics | CXC-DM-LatencyHigh | HIGH | Triggered when the latency for dial manager is above the defined threshold. | | 5000ms for 5m |
PEC-OU/Current/CXCPEGuide/DMMetrics | CXC-MemoryUsage | HIGH | Triggered when the memory utilization of a pod is beyond the threshold. | | 70% for 5m |
PEC-OU/Current/CXCPEGuide/DMMetrics | CXC-MemoryUsagePD | HIGH | Triggered when the memory usage of a pod is above the critical threshold. | | 90% for 5m |
PEC-OU/Current/CXCPEGuide/DMMetrics | CXC-PodNotReadyCount | HIGH | Triggered when the number of pods ready for a CX Contact deployment is less than or equal to the threshold. | | 1 for 5m |
PEC-OU/Current/CXCPEGuide/DMMetrics | CXC-PodRestartsCount | HIGH | Triggered when the restart count for a pod is beyond the threshold. | | 1 for 5m |
PEC-OU/Current/CXCPEGuide/DMMetrics | CXC-PodRestartsCountPD | HIGH | Triggered when the restart count is beyond the critical threshold. | | 5 for 5m |
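Many of the CPU alerts listed above (for example, the ContainerCPUreached80percent rows) combine container_cpu_usage_seconds_total with container_spec_cpu_quota and container_spec_cpu_period to express usage as a fraction of the container's CPU limit. The following is a minimal sketch, assuming a reachable Prometheus server and a namespace label value of "gvp" (both hypothetical placeholders), of how such a condition could be checked through the Prometheus HTTP API; the exact expressions used by the product's alerting rules may differ.

<syntaxhighlight lang="python">
# Minimal sketch (not the product's actual alerting rule): evaluates a
# "CPU above 80% of the container limit" condition against the Prometheus
# HTTP API. The server URL and namespace label value are placeholders.
import requests

PROMETHEUS_URL = "http://prometheus.example.local:9090"  # hypothetical

# Ratio of the observed CPU rate to the container CPU limit (quota/period),
# aggregated per pod; "> 0.80" keeps only pods above 80% of their limit.
# In a real Prometheus alerting rule, the "for N minutes" part of the
# threshold would normally be expressed with the rule's `for:` clause.
QUERY = (
    'sum by (pod) (rate(container_cpu_usage_seconds_total{namespace="gvp"}[5m]))'
    ' / sum by (pod) (container_spec_cpu_quota{namespace="gvp"}'
    '                 / container_spec_cpu_period{namespace="gvp"}) > 0.80'
)

def pods_over_cpu_limit():
    """Return the names of pods currently above 80% of their CPU limit."""
    resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query", params={"query": QUERY})
    resp.raise_for_status()
    samples = resp.json()["data"]["result"]
    return [s["metric"].get("pod", "<unknown>") for s in samples]

if __name__ == "__main__":
    for pod in pods_over_cpu_limit():
        print(f"ALERT: {pod} CPU usage is above 80% of its limit")
</syntaxhighlight>

In practice these conditions live in Prometheus alerting rules (with the sustained-duration requirement expressed in the rule's for: clause) rather than in ad-hoc scripts; the sketch is only meant to make the metric arithmetic behind the thresholds concrete.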