Pulse metrics and alerts

Metrics

The pulse_*_Boolean metrics are readable only from Prometheus directly. You cannot read them using the cURL command-line tool.
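Because these metrics are read from Prometheus rather than fetched with cURL, the sketch below shows one way to run an instant query against the Prometheus HTTP API endpoint `/api/v1/query` using only the Python standard library. The server address `PROMETHEUS_URL` is a hypothetical placeholder, and the helper names are illustrative rather than part of Pulse.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical address of the Prometheus server that scrapes Pulse.
PROMETHEUS_URL = "http://prometheus.example.local:9090"


def build_query_url(base_url: str, promql: str) -> str:
    """Return the /api/v1/query (instant query) URL for a PromQL expression."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def extract_result(body: dict) -> list:
    """Return data.result from a decoded Prometheus API response."""
    if body.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {body.get('error')}")
    return body["data"]["result"]


def query_prometheus(base_url: str, promql: str, timeout: float = 10.0) -> list:
    """Run an instant query over HTTP and return its result samples."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return extract_result(json.load(resp))
```

For example, `query_prometheus(PROMETHEUS_URL, "pulse_health_all_Boolean")` returns one entry per time series, each holding a `metric` label map and a `[timestamp, "value"]` pair, with the value encoded as a string.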

Metric and description	Metric details	Indicator of
pulse_health_all_Boolean: Overall Pulse application status.	Type: Gauge; Sample value: 0.5	Error
pulse_health_connections_Boolean: Status of the connections to the external services (Auth, GWS, Redis, and DB).	Type: Gauge; Label: connection; Sample value: 0	Error
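The `connection` label on `pulse_health_connections_Boolean` identifies which external service a sample describes. The sketch below lists the connections whose gauge reads 0, assuming (this is not stated in the table) that 1 means healthy and 0 means failed. The input mimics the `data.result` list of a Prometheus instant query, and the label values in the example are hypothetical.

```python
from typing import Any


def failed_connections(result: list[dict[str, Any]]) -> list[str]:
    """Return the sorted, de-duplicated connection names whose gauge reads 0.

    `result` is the data.result list from an instant query for
    pulse_health_connections_Boolean. Assumes 0 = failed, 1 = healthy.
    """
    failed = set()
    for sample in result:
        # Prometheus encodes each sample as [timestamp, "value"] (value is a string).
        if float(sample["value"][1]) == 0:
            failed.add(sample["metric"].get("connection", "<unlabeled>"))
    return sorted(failed)


# Illustrative response; the connection label values are hypothetical.
example_result = [
    {"metric": {"connection": "redis"}, "value": [1700000000, "1"]},
    {"metric": {"connection": "db"}, "value": [1700000000, "0"]},
    {"metric": {"connection": "gws"}, "value": [1700000000, "0"]},
]
print(failed_connections(example_result))  # ['db', 'gws']
```

De-duplicating with a set keeps the output short when several Pulse instances report the same failed connection.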

Alerts

Alerts are based on Pulse, Java, and Kubernetes cluster metrics.

The following alerts are defined for Pulse.

Alert	Severity	Description	Based on	Threshold
pulse_service_down	Critical	All Pulse instances are down.	up	for 15 minutes
pulse_critical_pulse_health	Critical	Detected a critically low number of healthy Pulse containers.	pulse_health_all_Boolean	50%
pulse_critical_running_instances	Critical	Detected a critical number of unavailable Pulse instances.	kube_deployment_status_replicas_available, kube_deployment_status_replicas	75%
pulse_too_frequent_restarts	Critical	Detected too-frequent restarts of the Pulse Pod container.	kube_pod_container_status_restarts_total	2 in 1 hour
pulse_critical_cpu	Critical	Detected critical CPU usage by the Pulse Pod.	container_cpu_usage_seconds_total, kube_pod_container_resource_limits	90%
pulse_critical_memory	Critical	Detected critical memory usage by the Pulse Pod.	container_memory_working_set_bytes, kube_pod_container_resource_limits	90%
pulse_critical_hikari_cp	Critical	Detected critical HikariCP connection pool usage by the Pulse container.	hikaricp_connections_active, hikaricp_connections	90%
pulse_critical_5xx	Critical	Detected a critical rate of 5xx errors for the Pulse container.	http_server_requests_seconds_count	15%
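Most thresholds in the table are ratios of the listed metrics. The Python sketch below evaluates several of these conditions from raw readings to show the arithmetic. The comparison direction (above or below), the treatment of values exactly at the threshold, and the lack of counter-reset handling are assumptions; the actual Pulse alert rules (their PromQL expressions and evaluation windows) are not given here and may differ. All function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """One reading of a monotonically increasing Prometheus counter."""
    timestamp: float  # seconds since the epoch
    value: float


def per_second_rate(older: CounterSample, newer: CounterSample) -> float:
    """Approximate PromQL rate() from two samples (assumes no counter reset)."""
    elapsed = newer.timestamp - older.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (newer.value - older.value) / elapsed


def ratio_exceeds(numerator: float, denominator: float, threshold: float) -> bool:
    """True when numerator/denominator is strictly above threshold; False if denominator is 0."""
    if denominator == 0:
        return False
    return numerator / denominator > threshold


def critical_running_instances(available: int, desired: int) -> bool:
    # Assumed reading of 75%: fire when fewer than 75% of desired replicas are available.
    return desired > 0 and available / desired < 0.75


def critical_cpu(cpu_old: CounterSample, cpu_new: CounterSample, limit_cores: float) -> bool:
    # container_cpu_usage_seconds_total counts CPU-seconds, so its rate is in cores.
    return ratio_exceeds(per_second_rate(cpu_old, cpu_new), limit_cores, 0.90)


def critical_memory(working_set_bytes: float, limit_bytes: float) -> bool:
    # Working-set bytes compared with the container memory limit.
    return ratio_exceeds(working_set_bytes, limit_bytes, 0.90)


def critical_hikari_cp(active: float, total: float) -> bool:
    # hikaricp_connections_active as a share of hikaricp_connections.
    return ratio_exceeds(active, total, 0.90)


def critical_5xx(err_old: CounterSample, err_new: CounterSample,
                 all_old: CounterSample, all_new: CounterSample) -> bool:
    # Share of requests answered with a 5xx status over the sampling window.
    return ratio_exceeds(per_second_rate(err_old, err_new),
                         per_second_rate(all_old, all_new), 0.15)


def too_frequent_restarts(restarts_hour_ago: float, restarts_now: float) -> bool:
    # Assumed reading of "2 in 1 hour": more than 2 restarts within the hour.
    return restarts_now - restarts_hour_ago > 2
```

For instance, a Pod that used 57 CPU-seconds over 60 seconds against a 1-core limit runs at 95% of its limit, so `critical_cpu` returns `True`.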