Voice Platform Resource Manager metrics and alerts
Find the metrics Voice Platform Resource Manager exposes and the alerts defined for Voice Platform Resource Manager.
Service | CRD or annotations? | Port | Endpoint/Selector | Metrics update interval |
---|---|---|---|---|
Voice Platform Resource Manager | ServiceMonitor / PodMonitor | 9116, 8200 |
Metrics endpoints:
curl -v "http://<RM_POD_IP>:9116/snmp?target=127.0.0.1%3A1161&module=if_mib"
curl -v "http://<RM_POD_IP>:8200/log"
Enable metrics: Service/Pod Monitoring Settings
prometheus:
  enabled: true
  metric:
    port: 9116
  log:
    port: 8200
  podMonitor:
    enabled: true
    metric:
      path: /snmp
      module: [ if_mib ]
      target: [ 127.0.0.1:1161 ]
    log:
      path: /log
monitor:
|
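For orientation, the settings in this row correspond roughly to a Prometheus Operator PodMonitor like the one below. This is a minimal sketch under stated assumptions, not the manifest the RM chart actually renders: the resource name, namespace, pod label, port names (`metrics`, `log`), and the 30s scrape interval are hypothetical values chosen for illustration.

```yaml
# Hypothetical PodMonitor sketch. Names, labels, port names, and the
# scrape interval are assumptions, not values taken from the RM chart.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: gvp-rm-example          # hypothetical resource name
  namespace: gvp                # hypothetical namespace
spec:
  selector:
    matchLabels:
      app: gvp-rm               # assumed label on the RM pods
  podMetricsEndpoints:
    # SNMP exporter endpoint (container port 9116)
    - port: metrics             # assumed name of container port 9116
      path: /snmp
      params:
        module: [if_mib]
        target: ["127.0.0.1:1161"]
      interval: 30s             # assumed scrape interval
    # Log parser endpoint (container port 8200)
    - port: log                 # assumed name of container port 8200
      path: /log
      interval: 30s             # assumed scrape interval
```

The `params` entries become the same query string used in the curl command for port 9116, so a scrape of the first endpoint requests `/snmp?module=if_mib&target=127.0.0.1:1161`.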
Metrics
Metric and description | Metric details | Indicator of |
---|---|---|
rmTotal5xxInviteSent
Number of 5xx responses received for INVITE requests sent by RM |
Unit: Unsigned32 Type: Gauge |
Error |
rmTotal4xxInviteSent
Number of 4xx responses received for INVITE requests sent by RM |
Unit: Unsigned32 Type: Gauge |
Error |
rmPRStatus
GVP_RM_PhysicalResourceTable.Status: Current state of the resource (rmPRStatus{gvpConfigDBID="174",rmPRName="mcp-10.206.5.89",rmPRStatus="AVAILABLE"}) |
Unit: DisplayString Type: Gauge |
Information |
rmPRActiveCalls
GVP_RM_PhysicalResourceTable.ActiveCalls: Number of calls that currently are handled by the resource. (rmPRActiveCalls{gvpConfigDBID="174",rmPRName="mcp-10.206.5.89"}) |
Unit: Unsigned32 Type: Gauge |
Traffic |
rmPRTotalCalls
GVP_RM_PhysicalResourceTable.TotalCalls: Total number of calls that have been handled by this resource since it was connected to RM. (rmPRTotalCalls{gvpConfigDBID="174",rmPRName="mcp-10.206.5.89"}) |
Unit: Unsigned32 Type: Gauge |
Traffic |
rmTenantCurrentInboundCalls
Number of active inbound calls that use this tenant. |
Unit: Unsigned32 Type: Gauge |
Traffic |
rmTenantPeakCalls
Maximum number of concurrent calls to this tenant since it became active. |
Unit: Type: Gauge |
Traffic |
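The per-resource gauges above are often easier to use once aggregated. The following is a minimal sketch of Prometheus Operator recording rules built on `rmPRActiveCalls` and `rmPRStatus`; the rule resource name, group name, and recorded series names are hypothetical, and only the two metric names and the `rmPRStatus` label come from the table.

```yaml
# Hypothetical recording rules. Resource, group, and record names are
# assumptions made for illustration.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: gvp-rm-example-recording        # hypothetical resource name
spec:
  groups:
    - name: gvp-rm-example.recording    # hypothetical group name
      rules:
        # Active calls summed over every physical resource series scraped.
        - record: gvp_rm:active_calls:sum
          expr: sum(rmPRActiveCalls)
        # Number of physical resource series per reported status value,
        # for example rmPRStatus="AVAILABLE".
        - record: gvp_rm:resources_by_status:count
          expr: count by (rmPRStatus) (rmPRStatus)
```

If more than one RM pod reports the same physical resource, these expressions count that resource once per reporting pod, so an extra grouping or filter may be needed in practice.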
Alerts
The following alerts are defined for Voice Platform Resource Manager.
Alert | Severity | Description | Based on | Threshold |
---|---|---|---|---|
RM Service Down | CRITICAL | RM pods are not in the ready state and the RM service is unavailable. | kube_pod_container_status_running | 0
|
InitContainerFailingRepeatedly | CRITICAL | Triggers when the RM init container fails 5 or more times within 15 minutes. | kube_pod_init_container_status_restarts_total | 15 mins
|
ContainerRestartedRepeatedly | CRITICAL | Triggers when the RM or RM SNMP container restarts 5 or more times within 15 minutes. | kube_pod_container_status_restarts_total | 15 mins
|
PodStatusNotReady | CRITICAL | Triggers when the RM pod status is not ready for 30 minutes; this is controlled by override-value.yaml. | kube_pod_status_ready | 30 mins
|
RMTotal5XXErrorForINVITE | HIGH | RM MIB counter stats are collected every 30 seconds. Triggers if the MIB counter total5xxInviteSent increases by 5 from its previous value within 5 minutes. | rmTotal5xxInviteSent | 5 mins
|
RMTotal4XXErrorForINVITE | MEDIUM | RM MIB counter stats are collected every 60 seconds. Triggers if the MIB counter total4xxInviteSent increases by 10 from its previous value within 60 seconds. | rmTotal4xxInviteSent | 1 min
|
RMInterNodeConnectivityBroken | HIGH | Inter-node connectivity between RM nodes is lost for 5 minutes. | gvp_rm_log_parser_warn_total | 5 mins
|
RMConfigServerConnectionLost | HIGH | RM lost its connection to GVP Configuration Server for 5 minutes. | gvp_rm_log_parser_warn_total | 5 mins
|
RMSocketInterNodeError | HIGH | RM inter-node socket error persists for 5 minutes. | gvp_rm_log_parser_eror_total | 5 mins
|
ContainerCPUreached80percentForRM0 | HIGH | Triggers when the RM container CPU utilization exceeds 80% for 15 minutes. | container_cpu_usage_seconds_total, container_spec_cpu_quota, container_spec_cpu_period | 15 mins
|
ContainerCPUreached80percentForRM1 | HIGH | Triggers when the RM container CPU utilization exceeds 80% for 15 minutes. | container_cpu_usage_seconds_total, container_spec_cpu_quota, container_spec_cpu_period | 15 mins
|
ContainerMemoryUsage80percentForRM0 | HIGH | Triggers when the RM container memory utilization exceeds 80% for 15 minutes. | container_memory_rss, kube_pod_container_resource_limits_memory_bytes | 15 mins
|
ContainerMemoryUsage80percentForRM1 | HIGH | Triggers when the RM container memory utilization exceeds 80% for 15 minutes. | container_memory_rss, kube_pod_container_resource_limits_memory_bytes | 15 mins
|
MCPPortsExceeded | HIGH | All the MCP ports in MCP LRG are exceeded | gvp_rm_log_parser_eror_total | 1min
|
RMServiceDegradedTo50Percentage | HIGH | One of the RM containers is not in the running state for 5 minutes. | kube_pod_container_status_running | 5 mins
|
RMMatchingIVRTenantNotFound | MEDIUM | Matching IVR profile tenant could not be found for 2 minutes. | gvp_rm_log_parser_eror_total | 2 mins
|
RMResourceAllocationFailed | MEDIUM | RM resource allocation failed for 1 minute. | gvp_rm_log_parser_eror_total | 1 min |
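To make the table concrete, two of these alerts could be expressed as Prometheus Operator rules along the following lines. This is a sketch under stated assumptions and not the rule set shipped with RM: the resource and group names, the pod name `gvp-rm-0`, the container name `gvp-rm`, the 5m rate window, and the annotation text are hypothetical. Only the alert names, severities, metric names, and thresholds come from the table.

```yaml
# Hypothetical alert rules. Expressions are one possible reading of the
# table above; pod and container names are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: gvp-rm-example-alerts           # hypothetical resource name
spec:
  groups:
    - name: gvp-rm-example.alerts       # hypothetical group name
      rules:
        - alert: RMTotal5XXErrorForINVITE
          # The metric is typed as a gauge, so compare the current value
          # with the value 5 minutes earlier instead of using increase().
          expr: (rmTotal5xxInviteSent - rmTotal5xxInviteSent offset 5m) >= 5
          labels:
            severity: HIGH
          annotations:
            summary: "5xx responses to RM INVITEs rose by 5 or more in 5 minutes"
        - alert: ContainerCPUreached80percentForRM0
          # CPU used per second divided by the CPU limit (quota / period).
          expr: |
            sum by (pod, container) (
              rate(container_cpu_usage_seconds_total{pod="gvp-rm-0", container="gvp-rm"}[5m])
            )
              /
            sum by (pod, container) (
              container_spec_cpu_quota{pod="gvp-rm-0", container="gvp-rm"}
                /
              container_spec_cpu_period{pod="gvp-rm-0", container="gvp-rm"}
            )
              > 0.8
          for: 15m
          labels:
            severity: HIGH
          annotations:
            summary: "RM container CPU utilization above 80% for 15 minutes"
```

The `for: 15m` clause mirrors the 15-minute threshold column: the expression must stay true for the whole period before the alert fires. The 5xx rule has no `for` clause because the 5-minute window is already part of the expression.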