Telemetry Service metrics and alerts

From Genesys Documentation
Revision as of 15:55, March 30, 2022 by Xavier (talk | contribs) (Published)
This topic is part of the manual Telemetry Service Private Edition Guide for version Current of Telemetry Service.

Find the metrics Telemetry Service exposes and the alerts defined for Telemetry Service.

Service CRD or annotations? Port Endpoint/Selector Metrics update interval
Telemetry Service n/a
All the Telemetry Service metrics are standard Kubernetes metrics as delivered by a standard Kubernetes metrics service.

See details about the metrics and alerts in the sections below.

Metrics

Use standard Kubernetes metrics, as delivered by a standard Kubernetes metrics service (such as cAdvisor), to monitor the Telemetry Service. For information about standard system metrics to use to monitor services, see System metrics.
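As an illustration of reading one of these standard metrics for a single pod, the sketch below builds an instant-query URL. It assumes the metrics are scraped into a Prometheus-compatible server that exposes the standard `/api/v1/query` HTTP endpoint; the host `prometheus.example:9090`, the pod name `telemetry-0`, and the helper `build_query_url` are hypothetical and not part of Telemetry Service.

```python
from urllib.parse import urlencode


def build_query_url(base_url: str, metric: str, pod: str) -> str:
    """Return an instant-query URL for one metric filtered by its pod label."""
    # PromQL selector of the form metric{pod="..."}
    promql = f'{metric}{{pod="{pod}"}}'
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Hypothetical server address and pod name, for illustration only.
url = build_query_url(
    "http://prometheus.example:9090",
    "container_cpu_usage_seconds_total",
    "telemetry-0",
)
print(url)
```

The URL can then be fetched with any HTTP client; the sketch stops short of sending the request so that it runs without a live server.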

The following standard Kubernetes metrics are likely to be most relevant.

| Metric | Description | Unit | Type | Label | Sample value | Indicator of |
| --- | --- | --- | --- | --- | --- | --- |
| container_cpu_usage_seconds_total | Cumulative CPU time consumed | seconds | Counter | pod="podId" | 7000 | Monitoring CPU usage |
| container_fs_reads_bytes_total | Cumulative count of bytes read | bytes | Counter | pod="podId" | 900 | Monitoring filesystem usage |
| container_network_receive_bytes_total | Cumulative count of bytes received | bytes | Counter | pod="podId" | 3000 | Monitoring incoming network traffic |
| container_network_transmit_bytes_total | Cumulative count of bytes transmitted | bytes | Counter | pod="podId" | 5000 | Monitoring outgoing network traffic |
| kube_pod_container_status_ready | Describes whether the container's readiness check succeeded | integer | Gauge | pod="podId" | 2 | Monitoring healthy pods |
| kube_pod_container_status_restarts_total | The number of container restarts per container | integer | Counter | pod="podId" | 0 | Monitoring pod restarts |
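Most of the metrics above are counters, so a single sample says little on its own; what usually matters is how fast the value grows. The following is a minimal sketch of turning two counter samples into a per-second rate. The function name `counter_rate` and the sample values are hypothetical, and treating a drop in value as a counter reset is an assumption modeled on common practice, not something this guide specifies.

```python
def counter_rate(prev_value, prev_ts, curr_value, curr_ts):
    """Per-second rate between two samples of a cumulative counter."""
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr_value - prev_value
    if delta < 0:
        # Assumed counter reset (for example, after a container restart):
        # count the growth since the reset, starting from zero.
        delta = curr_value
    return delta / elapsed


# Hypothetical samples of container_cpu_usage_seconds_total taken 60 s apart.
cpu_cores_used = counter_rate(7000.0, 0.0, 7030.0, 60.0)
print(cpu_cores_used)  # 0.5
```

For a CPU counter measured in seconds, the resulting rate reads as the average number of CPU cores in use over the interval.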

Alerts

The following alerts are defined for Telemetry Service.

| Alert | Severity | Description | Based on | Threshold |
| --- | --- | --- | --- | --- |
| Telemetry CPU Utilization is Greater Than Threshold | High | Triggered when average CPU usage is more than 60% | node_cpu_seconds_total | >60% |
| Telemetry Memory Usage is Greater Than Threshold | High | Triggered when average memory usage is more than 60% | container_cpu_usage_seconds_total, kube_pod_container_resource_limits_cpu_cores | >60% |
| Telemetry High Network Traffic | High | Triggered when network traffic exceeds 10MB/second for 5 minutes | node_network_transmit_bytes_total, node_network_receive_bytes_total | >10MBps |
| Http Errors Occurrences Exceeded Threshold | High | Triggered when the number of HTTP errors exceeds 500 responses in 5 minutes | telemetry_events{eventName=~"http_error_.*", eventName!="http_error_404"} | >500 in 5 minutes |
| Telemetry Dependency Status | Low | Triggered when there is no connection to one of the dependent services - GAuth, Config, Prometheus | telemetry_dependency_status | <80 |
| Telemetry Healthy Pod Count Alert | High | Triggered when the number of healthy pods drops to critical level | kube_pod_container_status_ready | <2 |
| Telemetry GAuth Time Alert | High | Triggered when there is no connection to the GAuth service | telemetry_gws_auth_req_time | >10000 |
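To make the threshold column concrete, the sketch below checks a set of readings against the thresholds listed above. It is an illustration only, not the alert rules shipped with Telemetry Service: the reading keys (`cpu_percent`, `healthy_pods`, and so on) and the function `firing_alerts` are hypothetical, the readings are assumed to be already aggregated into the units the table uses, and the "for 5 minutes" duration on the network alert is not modeled.

```python
import operator

# Hypothetical reading key -> (alert name, comparison, threshold).
# Names and thresholds are copied from the alerts table.
RULES = {
    "cpu_percent": ("Telemetry CPU Utilization is Greater Than Threshold", operator.gt, 60),
    "memory_percent": ("Telemetry Memory Usage is Greater Than Threshold", operator.gt, 60),
    "network_mb_per_s": ("Telemetry High Network Traffic", operator.gt, 10),
    "http_errors_5m": ("Http Errors Occurrences Exceeded Threshold", operator.gt, 500),
    "dependency_status": ("Telemetry Dependency Status", operator.lt, 80),
    "healthy_pods": ("Telemetry Healthy Pod Count Alert", operator.lt, 2),
    "gauth_req_time": ("Telemetry GAuth Time Alert", operator.gt, 10000),
}


def firing_alerts(readings):
    """Return the names of alerts whose threshold is crossed by the readings."""
    fired = []
    for key, value in readings.items():
        name, compare, threshold = RULES[key]
        if compare(value, threshold):
            fired.append(name)
    return fired


# Hypothetical readings: CPU is over 60%, the other two sit within limits.
sample = {"cpu_percent": 72.5, "healthy_pods": 3, "http_errors_5m": 500}
print(firing_alerts(sample))
```

Note that every comparison is strict, matching the `>` and `<` signs in the table, so a reading exactly at the threshold (such as 500 HTTP errors) does not fire.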