Tenant Load Distribution Server (LDS) metrics and alerts
Find the metrics that Tenant Load Distribution Server (LDS) exposes and the alerts defined for it.
Service | CRD or annotations? | Port | Endpoint/Selector | Metrics update interval |
---|---|---|---|---|
Tenant Load Distribution Server (LDS) | PodMonitor | 9091 | selector:
  matchLabels:
    app.kubernetes.io/name: {{include "common.util.chart.name" . }}
    app.kubernetes.io/instance: {{include "common.util.chart.fullname" . }}
    service: {{.Release.Namespace }}
    servicename: {{include "common.util.chart.name" . }}
    tenant: {{.Values.tenant.sid }}
Endpoints to query: /metrics/ | 30 seconds |
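As a rough illustration of what a scrape of this endpoint returns, the following Python sketch parses Prometheus text-format output. The helper names (`fetch_metrics`, `parse_metrics`), the `host` argument, and the sample values are assumptions made for illustration; only port 9091, the `/metrics/` path, and the two metric names (taken from the alerts table on this page) come from the source.

```python
from urllib.request import urlopen


def fetch_metrics(host, port=9091, path="/metrics/", timeout=5.0):
    """Return the raw metrics text from an LDS pod (host is a placeholder)."""
    with urlopen(f"http://{host}:{port}{path}", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text):
    """Map each series (name plus any labels) to its float value.

    Assumes samples carry no trailing timestamp, so the value is the
    last space-separated token on the line.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


# Hypothetical scrape output; the values are made up for illustration.
SAMPLE = """\
# TYPE pulse_lds_senders_number gauge
pulse_lds_senders_number 2
# TYPE pulse_monitor_check_duration_seconds gauge
pulse_monitor_check_duration_seconds 0.015
"""

if __name__ == "__main__":
    for series, value in parse_metrics(SAMPLE).items():
        print(series, value)
```

In a cluster, `fetch_metrics` would typically be pointed at a pod IP or a port-forwarded local address; the parser itself only needs the text.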
Metrics
Metric and description | Metric details | Indicator of |
---|---|---|
pulse_ The duration in seconds of the last health check performed by Monitor Agent. | Unit: seconds, Type: Gauge | Error |
pulse_ The LDS container uptime in seconds. | Unit: seconds, Type: Gauge | Error |
pulse_ The number of upstream servers to which the LDS is connected. | Unit: N/A, Type: Gauge | Error |
pulse_ The number of clients connected to the LDS. | Unit: N/A, Type: Gauge | Error |
pulse_ Duration in seconds of connection to the upstream server. | Unit: seconds, Type: Gauge | Error |
pulse_ Duration in seconds of disconnection from the upstream server. | Unit: seconds, Type: Gauge | Error |
pulse_ The number of DNs registered on the upstream server. | Unit: N/A, Type: Gauge | Saturation |
pulse_ The number of failed registrations of DNs on the upstream server. | Unit: N/A, Type: Gauge | Error |
pulse_ Duration in seconds of client connection to the LDS. | Unit: seconds, Type: Gauge | Error |
pulse_ The number of DNs registered by the client. | Unit: N/A, Type: Gauge | Saturation |
pulse_ The number of failed registrations of DNs received from the client. | Unit: N/A, Type: Gauge | Error |
Alerts
Alerts are based on LDS and Kubernetes cluster metrics.
The following alerts are defined for Tenant Load Distribution Server (LDS).
Alert | Severity | Description | Based on | Threshold |
---|---|---|---|---|
pulse_lds_monitor_data_unavailable | Critical | Pulse LDS Monitor Agents do not provide data. | pulse_monitor_check_duration_seconds, kube_statefulset_replicas | for 15 minutes |
pulse_lds_critical_nonrunning_instances | Critical | Triggered when Pulse LDS instances are down. | kube_statefulset_status_replicas_ready, kube_statefulset_status_replicas | for 15 minutes |
pulse_lds_too_frequent_restarts | Critical | Detected too frequent restarts of the LDS Pod container. | kube_pod_container_status_restarts_total | 2 for 1 hour |
pulse_lds_critical_cpu | Critical | Detected critical CPU usage by Pulse LDS Pod. | container_cpu_usage_seconds_total, kube_pod_container_resource_limits | 90% |
pulse_lds_critical_memory | Critical | Detected critical memory usage by Pulse LDS Pod. | container_memory_working_set_bytes, kube_pod_container_resource_limits | 90% |
pulse_lds_no_connected_senders | Critical | Pulse LDS is not connected to upstream servers. | pulse_lds_senders_number | for 15 minutes |
pulse_lds_no_registered_dns | Critical | No DNs are registered on Pulse LDS. | pulse_lds_sender_registered_dns_number | for 30 minutes |
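
To make the numeric thresholds in the table concrete, here is a minimal Python sketch of how the 90% resource checks and the restart check could be evaluated. The function names and the comparison operators (strictly greater than) are assumptions; the table gives only the threshold values, not the exact rule expressions, and the sketch ignores counter resets.

```python
def usage_ratio(used, limit):
    """Return resource usage as a fraction of the configured limit."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return used / limit


def exceeds_threshold(used, limit, threshold=0.90):
    """True when usage is above the threshold (90% in the table).

    The strict '>' comparison is an assumption.
    """
    return usage_ratio(used, limit) > threshold


def too_frequent_restarts(restart_counts, max_restarts=2):
    """True when the restart counter grew by more than max_restarts.

    restart_counts holds samples of the restart counter taken over a
    1-hour window, oldest first. Assumes the counter did not reset
    within the window.
    """
    return restart_counts[-1] - restart_counts[0] > max_restarts


if __name__ == "__main__":
    # Hypothetical values: 0.95 CPU cores used against a 1-core limit.
    print(exceeds_threshold(0.95, 1.0))
    # Hypothetical restart counter samples over one hour.
    print(too_frequent_restarts([3, 4, 6]))
```

The alerts listed with only a duration ("for 15 minutes", "for 30 minutes") depend on a condition holding continuously for that period, which this sketch does not model.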