Pulse metrics and alerts

From Genesys Documentation
Revision as of 20:00, December 21, 2021 by Tony.gilchrist@genesys.com (talk | contribs) (Published)
This topic is part of the manual Genesys Pulse Private Edition Guide for version Current of Reporting.

Find the metrics Pulse exposes and the alerts defined for Pulse.

Service: Pulse
CRD or annotations?: ServiceMonitor
Port: 8090
Endpoint/Selector:

selector:
  matchLabels:
    app.kubernetes.io/name: {{ include "common.util.chart.name" . }}
    app.kubernetes.io/instance: {{ include "common.util.chart.fullname" . }}
    service: {{ .Release.Namespace }}
    servicename: {{ include "common.util.chart.name" . }}
    tenant: "shared"

Endpoints to query:

  • For a list of metrics: /actuator/metrics/
  • For metric output: /actuator/metrics/pulse.health.all and /actuator/metrics/pulse.health.connections

Metrics update interval: 30 seconds
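
The endpoint paths above can be assembled programmatically. The following is a minimal sketch, not a supported client: the host name is a placeholder, the helper names metric_url and fetch_json are hypothetical, and it assumes the endpoints return JSON, as actuator-style metrics endpoints commonly do. Only the port (8090) and the paths come from this page.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Port 8090 and the /actuator/metrics/ paths come from this page.
# The host name is a placeholder (assumption); substitute your own.
BASE_URL = "http://pulse.example.internal:8090"

# Metric names as they appear in the endpoint paths listed above.
METRIC_NAMES = ("pulse.health.all", "pulse.health.connections")


def metric_url(base_url: str, name: str = "") -> str:
    """Return the URL for one metric, or for the metric list if name is empty."""
    return urljoin(base_url.rstrip("/") + "/", "actuator/metrics/" + name)


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """GET a URL and decode the body as JSON (assumes a JSON response)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Print the URLs that would be queried; no network call is made here.
print(metric_url(BASE_URL))
for metric_name in METRIC_NAMES:
    print(metric_url(BASE_URL, metric_name))
```

To retrieve a value, pass one of the printed URLs to fetch_json from a host that can reach the Pulse service. Because the metrics update every 30 seconds, polling more often than that is unlikely to return fresher data.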

See details about the metrics and alerts in the sections below.

Metrics

The pulse_*_Boolean metrics are readable only from Prometheus directly. You cannot read them using the cURL command-line tool.

pulse_health_all_Boolean
  Description: Overall Pulse application status.
  Unit: (none given)
  Type: Gauge
  Label: (none given)
  Sample value: 0.5
  Indicator of: Error

pulse_health_connections_Boolean
  Description: Status of the connections to the external services (Auth, GWS, Redis, and DB).
  Unit: (none given)
  Type: Gauge
  Label: connection
  Sample value: 0
  Indicator of: Error
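
Because the pulse_*_Boolean metrics are readable only from Prometheus, one way to inspect them is through the Prometheus HTTP API instant-query endpoint (/api/v1/query). The sketch below is illustrative only: the Prometheus address and the helper names are assumptions, and the label value used in any sample data is hypothetical.

```python
import json
from typing import Dict, List, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

# Address of your Prometheus server (assumption; not given on this page).
PROMETHEUS_URL = "http://prometheus.example.internal:9090"


def instant_query_url(base_url: str, expr: str) -> str:
    """Build a Prometheus instant-query URL for a PromQL expression."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": expr})


def parse_vector(payload: dict) -> List[Tuple[Dict[str, str], float]]:
    """Convert an instant-vector response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise ValueError("query failed: %s" % payload.get("error", "unknown error"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected a vector, got %s" % data["resultType"])
    # Each sample is {"metric": {...labels...}, "value": [timestamp, "value"]}.
    return [(sample["metric"], float(sample["value"][1])) for sample in data["result"]]


def query(base_url: str, expr: str, timeout: float = 5.0):
    """Run an instant query and return the parsed samples."""
    with urlopen(instant_query_url(base_url, expr), timeout=timeout) as response:
        return parse_vector(json.load(response))
```

For example, query(PROMETHEUS_URL, "pulse_health_connections_Boolean") would return one sample per value of the connection label, provided Prometheus is reachable and is scraping Pulse.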

Alerts

Alerts are based on Pulse, Java, and Kubernetes cluster metrics.

The following alerts are defined for Pulse.

pulse_service_down
  Severity: Critical
  Description: All Pulse instances are down.
  Based on: up
  Threshold: for 15 minutes

pulse_critical_pulse_health
  Severity: Critical
  Description: Detected critical number of healthy Pulse containers.
  Based on: pulse_health_all_Boolean
  Threshold: 50%

pulse_critical_running_instances
  Severity: Critical
  Description: Triggered when Pulse instances are down.
  Based on: kube_deployment_status_replicas_available, kube_deployment_status_replicas
  Threshold: 75%

pulse_too_frequent_restarts
  Severity: Critical
  Description: Detected too frequent restarts of Pulse Pod container.
  Based on: kube_pod_container_status_restarts_total
  Threshold: 2 for 1 hour

pulse_critical_cpu
  Severity: Critical
  Description: Detected critical CPU usage by Pulse Pod.
  Based on: container_cpu_usage_seconds_total, kube_pod_container_resource_limits
  Threshold: 90%

pulse_critical_memory
  Severity: Critical
  Description: Detected critical memory usage by Pulse Pod.
  Based on: container_memory_working_set_bytes, kube_pod_container_resource_limits
  Threshold: 90%

pulse_critical_hikari_cp
  Severity: Critical
  Description: Detected critical Hikari connections pool usage by Pulse container.
  Based on: hikaricp_connections_active, hikaricp_connections
  Threshold: 90%

pulse_critical_5xx
  Severity: Critical
  Description: Detected critical 5xx errors per second for Pulse container.
  Based on: http_server_requests_seconds_count
  Threshold: 15%
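
Several of these alerts compare a ratio of two metrics against a percentage threshold. As a rough illustration, the sketch below evaluates the 75% threshold for pulse_critical_running_instances from the two metrics it is based on. The actual alert expression is not given on this page, so the comparison direction (alert when the available share drops below the threshold) and the function names are assumptions.

```python
def available_ratio(available: float, desired: float) -> float:
    """Share of desired replicas that are currently available.

    available: value of kube_deployment_status_replicas_available
    desired:   value of kube_deployment_status_replicas
    """
    if desired <= 0:
        raise ValueError("desired replica count must be positive")
    return available / desired


def running_instances_alert(available: float, desired: float,
                            threshold: float = 0.75) -> bool:
    """Return True when the available share is below the threshold.

    The 0.75 default mirrors the 75% threshold listed above; treating
    "below" as the firing condition is an assumption.
    """
    return available_ratio(available, desired) < threshold


# Two of four replicas available: 0.5 is below 0.75, so the check fires.
print(running_instances_alert(2, 4))
```

Under the same assumption, three of four replicas (exactly 0.75) would not fire, because the comparison is strict.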