FrontEnd Service metrics and alerts
Find the metrics FrontEnd Service exposes and the alerts defined for FrontEnd Service.
Service | CRD or annotations? | Port | Endpoint/Selector | Metrics update interval |
---|---|---|---|---|
FrontEnd Service | Supports both CRD and annotations | 9101 | http://<pod-ipaddress>:9101/metrics | 30 seconds |
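To check what a pod actually serves on this endpoint, you can fetch it and list the metric families it declares. The following Python sketch assumes the endpoint returns the standard Prometheus text exposition format. The helper names and the sample payload are hypothetical, and the pod IP address is a placeholder that you supply.

```python
import urllib.request


def fetch_metrics_text(pod_ip: str, port: int = 9101, timeout: float = 5.0) -> str:
    """Fetch the raw payload from http://<pod-ipaddress>:9101/metrics."""
    url = f"http://{pod_ip}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def metric_types(exposition_text: str) -> dict:
    """Map each metric family name to its declared type using '# TYPE' lines."""
    types = {}
    for line in exposition_text.splitlines():
        parts = line.split()
        # A type declaration looks like: "# TYPE <name> <type>"
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "TYPE":
            types[parts[2]] = parts[3]
    return types


# Hypothetical payload for illustration only; real output has many more lines.
SAMPLE = """\
# HELP sipfe_requests_total Number of requests.
# TYPE sipfe_requests_total counter
sipfe_requests_total{pod="sipfe-0"} 42
# TYPE kafka_producer_queue_depth gauge
kafka_producer_queue_depth 0
"""

print(metric_types(SAMPLE))
```

Against a running pod, `metric_types(fetch_metrics_text("<pod-ipaddress>"))` would give the same kind of mapping for the live payload.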
Metrics
Voice FrontEnd Service exposes Genesys-defined, FrontEnd Service–specific metrics as well as some standard Kafka metrics. You can query Prometheus directly to see all the metrics that the FrontEnd Service exposes. The metrics listed below are likely to be the most useful. Genesys does not commit to maintaining currently available FrontEnd Service metrics that are not documented on this page.
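One way to query Prometheus directly for metric names is its HTTP API, which returns all known values of the `__name__` label. The sketch below is a minimal example, not a supported tool: the function names are hypothetical, and the prefix filter is only a heuristic based on the prefixes shown on this page, so it can also match metrics from other services that use the same prefixes (for example `kafka_`).

```python
import json
import urllib.request

# Prefixes of the metrics documented on this page.
DOCUMENTED_PREFIXES = ("kafka_", "log_", "sipfe_", "service_")


def list_metric_names(prometheus_url: str, timeout: float = 5.0) -> list:
    """Return every metric name known to Prometheus via the label-values API."""
    url = prometheus_url.rstrip("/") + "/api/v1/label/__name__/values"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload}")
    return payload["data"]


def frontend_candidates(names: list) -> list:
    """Keep only names that start with one of the documented prefixes."""
    return sorted(n for n in names if n.startswith(DOCUMENTED_PREFIXES))
```

To use it, pass the base URL of your own Prometheus server to `list_metric_names` and feed the result to `frontend_candidates`.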
Metric and description | Metric details | Indicator of |
---|---|---|
kafka_ Number of Kafka producer pending events. | Unit: N/A; Type: gauge | |
kafka_ Age of the oldest producer pending event, in seconds. | Unit: seconds; Type: gauge | |
kafka_ Number of Kafka producer errors. | Unit: N/A; Type: counter | |
kafka_ Current state of the Kafka producer. | Unit: N/A; Type: gauge | |
kafka_ Biggest event size so far. | Type: gauge | |
kafka_ Exposed config to compare with biggest event size. | Type: gauge | |
log_ Total amount of log output, in bytes. | Unit: bytes; Type: counter | |
sipfe_ Number of requests. | Unit: N/A; Type: counter | Traffic |
sipfe_ Number of responses for the requests. | Unit: N/A; Type: counter | Traffic |
sipfe_ Number of SIP nodes that are alive. | Unit: N/A; Type: gauge | |
sipfe_ Number of requests to the SIP nodes. | Unit: N/A; Type: counter | |
sipfe_ Number of responses from the SIP nodes for the requests. | Unit: N/A; Type: counter | |
sipfe_ The duration of time between the SIP node request and the response, measured in seconds. | Unit: seconds; Type: histogram | Latency |
service_ Displays the version of Voice FrontEnd Service that is currently running. In the case of this metric, the labels provide the important information. The metric value is always 1 and does not provide any information. | Type: gauge | |
sipfe_ Health level of the sipfe node: -1 – fail | Unit: N/A; Type: gauge | Errors |
sipfe_ Health check errors for the sipfe node: 1 – has error | Unit: N/A; Type: gauge | Errors |
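The latency metric in the table above is a histogram, which Prometheus stores as cumulative bucket counters, each with an upper bound. The sketch below shows how a percentile can be estimated from such buckets by linear interpolation, the same general idea that PromQL's `histogram_quantile` function uses. It is a simplified illustration: the function name, the bucket boundaries, and the counts are all made up, and some edge cases that Prometheus handles are not covered.

```python
import math


def estimate_quantile(q: float, buckets: list) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound and end with the +Inf bucket.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # The quantile falls in the +Inf bucket: report the highest finite bound.
                return lower_bound
            if count == lower_count:
                # Empty bucket at the boundary: avoid dividing by zero.
                return upper_bound
            # Interpolate linearly inside the bucket that contains the rank.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical bucket counts for illustration only.
SAMPLE_BUCKETS = [(0.1, 50), (0.25, 80), (0.5, 90), (1.0, 100), (math.inf, 100)]
p95 = estimate_quantile(0.95, SAMPLE_BUCKETS)
print(p95, p95 > 0.5)
```

With these made-up counts the 95th percentile estimate is 0.75 seconds, because the rank of 95 falls halfway through the bucket that spans 0.5 to 1.0 seconds.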
Alerts
The following alerts are defined for FrontEnd Service.
Alert | Severity | Description | Based on | Threshold |
---|---|---|---|---|
Too many Kafka pending producer events | Critical | | kafka_producer_queue_depth | Too many Kafka producer pending events for pod {{ $labels.pod }} (more than 100 in 5 minutes). |
Too many received requests without a response | Critical | | sipfe_requests_total | For too many requests, the Front End service at pod {{ $labels.pod }} did not send any response (more than 100 requests without a response, measured over 5 minutes). |
SIP Cluster Service response latency is too high | Critical | | sipfe_sip_node_request_duration_seconds_bucket | Latency for 95% of messages is more than 0.5 seconds for service {{ $labels.container }}. |
No requests received | Critical | Absence of received requests for pod {{ $labels.pod }}. | sipfe_requests_total | increase(sipfe_requests_total{pod=~"sipfe-.+"}[5m]) <= 0 and increase(sipfe_requests_total{pod=~"sipfe-.+"}[10m]) > 100 |
Too many failure responses sent | Critical | Too many failure responses are sent by the Front End service at pod {{ $labels.pod }}. | sipfe_responses_total | More than 100 failure responses in 5 consecutive minutes. |
Too many Kafka producer errors | Critical | Kafka responds with errors at pod {{ $labels.pod }}. | kafka_producer_error_total | More than 100 errors in 5 consecutive minutes. |
Too many SIP Cluster Service error responses | Critical | SIP Cluster Service responds with errors at pod {{ $labels.pod }}. | sipfe_sip_node_responses_total | More than 100 errors in 5 consecutive minutes. |
Kafka not available | Critical | Kafka is not available for pod {{ $labels.pod }}. | kafka_producer_state | Kafka is not available for pod {{ $labels.pod }} for 5 consecutive minutes. |
SIP Node(s) is not available | Critical | No available SIP Nodes for pod {{ $labels.pod }}. | sipfe_sip_nodes_total | No available SIP Nodes for pod {{ $labels.pod }} for 5 consecutive minutes. |
Pod status Failed | Warning | Pod {{ $labels.pod }} is in Failed state. | kube_pod_status_phase | Pod {{ $labels.pod }} is in Failed state. |
Pod status Unknown | Warning | Pod {{ $labels.pod }} is in Unknown state for 5 minutes. | kube_pod_status_phase | Pod {{ $labels.pod }} is in Unknown state for 5 minutes. |
Pod status Pending | Warning | Pod {{ $labels.pod }} is in Pending state for 5 minutes. | kube_pod_status_phase | Pod {{ $labels.pod }} is in Pending state for 5 minutes. |
Pod status NotReady | Critical | Pod {{ $labels.pod }} is in the NotReady state for 5 minutes. | kube_pod_status_ready | Pod {{ $labels.pod }} is in the NotReady state for 5 minutes. |
Container restarted repeatedly | Critical | Container {{ $labels.container }} was restarted 5 or more times within 15 minutes. | kube_pod_container_status_restarts_total | Container {{ $labels.container }} was restarted 5 or more times within 15 minutes. |
Max replicas is not sufficient for 5 mins | Critical | For the past 5 minutes, the desired number of replicas is higher than the number of replicas currently available. | kube_statefulset_replicas, kube_statefulset_status_replicas | Desired number of replicas is higher than current available replicas for the past 5 minutes. |
Pods scaled up greater than 80% | Critical | For the past 5 minutes, the current number of replicas is greater than 80% of the maximum number of replicas. | kube_hpa_status_current_replicas, kube_hpa_spec_max_replicas | (kube_hpa_status_current_replicas{namespace="voice",hpa="sipfe-node-hpa"} * 100) / kube_hpa_spec_max_replicas{namespace="voice",hpa="sipfe-node-hpa"} > 80 for: 5m |
Pods less than Min Replicas | Critical | The current number of replicas is lower than the minimum number of replicas that should be available. | kube_hpa_status_current_replicas, kube_hpa_spec_min_replicas | For the past 5 minutes, the current number of replicas is lower than the minimum number of replicas that should be available. |
Pod CPU greater than 65% | Warning | High CPU load for pod {{ $labels.pod }}. | container_cpu_usage_seconds_total, container_spec_cpu_period | Container {{ $labels.container }} CPU usage exceeded 65% for 5 minutes. |
Pod CPU greater than 80% | Critical | Critical CPU load for pod {{ $labels.pod }}. | container_cpu_usage_seconds_total, container_spec_cpu_period | Container {{ $labels.container }} CPU usage exceeded 80% for 5 minutes. |
Pod memory greater than 65% | Warning | High memory usage for pod {{ $labels.pod }}. | container_memory_working_set_bytes, kube_pod_container_resource_requests_memory_bytes | Container {{ $labels.container }} memory usage exceeded 65% for 5 minutes. |
Pod memory greater than 80% | Critical | Critical memory usage for pod {{ $labels.pod }}. | container_memory_working_set_bytes, kube_pod_container_resource_requests_memory_bytes | Container {{ $labels.container }} memory usage exceeded 80% for 5 minutes. |
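Two of the thresholds above are given as PromQL expressions, and their logic can be easier to follow as plain conditions. The Python sketch below mirrors only the comparisons in those two expressions. The function names are hypothetical, the inputs stand for values that Prometheus would compute, and the `for: 5m` hold time is not modeled.

```python
def no_requests_received(increase_5m: float, increase_10m: float) -> bool:
    """Mirror of: increase(...[5m]) <= 0 and increase(...[10m]) > 100.

    The second clause requires more than 100 requests within the last
    10 minutes, so a pod that has been idle all along does not match.
    """
    return increase_5m <= 0 and increase_10m > 100


def hpa_scaled_above_80_percent(current_replicas: int, max_replicas: int) -> bool:
    """Mirror of: (current_replicas * 100) / max_replicas > 80."""
    if max_replicas <= 0:
        raise ValueError("max_replicas must be positive")
    return (current_replicas * 100) / max_replicas > 80
```

For example, a pod that handled 150 requests in the last 10 minutes but none in the last 5 satisfies the first condition, and 9 current replicas out of a maximum of 10 satisfies the second, while 8 out of 10 does not because the comparison is strict.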