Tenant Service metrics and alerts

From Genesys Documentation

Find the metrics Tenant Service exposes and the alerts defined for Tenant Service.

Service: Tenant Service
CRD or annotations: PodMonitor
Port: 15000
Endpoint/Selector: /metrics (http://<pod address>:15000/metrics)
Metrics update interval: 30 seconds. This applies to every metric that Tenant Service exposes. The update interval is not a property of the metric; it is a property of the optional PodMonitor that you can create.
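As an illustration of the optional PodMonitor mentioned above, here is a minimal sketch for the Prometheus Operator. The path and the 30-second interval come from this page; everything else is an assumption to adapt to your deployment. In particular, the resource name, the pod label used in the selector, and the container port name `metrics` are hypothetical. If the container port for 15000 is named differently in your pods, use that name instead.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  # Hypothetical resource name; choose your own.
  name: tenant-podmonitor
  namespace: "<namespace where tenant is deployed>"
spec:
  selector:
    matchLabels:
      # Assumed label; check the labels your tenant pods actually carry.
      app.kubernetes.io/instance: "<name of tenant helm release>"
  podMetricsEndpoints:
    # Assumed name of the container port that serves 15000.
    - port: metrics
      path: /metrics
      # Scrape interval, matching the 30-second update interval above.
      interval: 30s
```

Once Prometheus picks up the PodMonitor, tenant_service_health_level should be returned by queries within a scrape interval or two.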

Metrics

You can query Prometheus directly to see all the metrics that Tenant Service exposes. The following metrics are likely to be particularly useful. Genesys does not commit to maintaining other currently available Tenant Service metrics that are not documented on this page.

Metric: tenant_service_health_level

Description: Health level of the tenant node. Values are -1 (fail), 0 (starting), 1 (degraded), 2 (pass).

When the value is 2, the Tenant Service node is fully functional.

When the value is 1, the tenant might have issues with some of its internal functions and external dependencies, but it can still provide some services. A value of 1 calls for further investigation through the tenant logs to troubleshoot and recover.

A value of 0 or -1 indicates an inoperable node that is either pending start (0) or has failed (-1).

Metric details:

Unit: N/A
Type: gauge
Label: <tenant id>
Sample value: 2

Indicator of: Health

Alerts

If you enable a Tenant PodMonitor to expose the Tenant health metric, then you can create a basic alert rule for the Tenant Service using a template like the following:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: "custom-tenant-alert-rules"
spec:
  groups:
    # The group name is illustrative; choose any name.
    - name: tenant-health
      rules:
        - alert: HealthFailFor5min
          expr: (max by (tenant) (tenant_service_health_level{namespace="<namespace where tenant is deployed>",pod=~"<name of tenant helm release>"})) < 2
          for: 5m
          labels:
            severity: high
            category: tenant_pager
            servicename: "tenant"
          annotations:
            description: "Fires when the tenant health level stays below 2 (pass) for 5 minutes"
            summary: "Tenant health level has been below 2 (pass) for 5 minutes"

Enter your values where there are placeholders in the preceding template; the placeholders are:

  • <namespace where tenant is deployed>
  • <name of tenant helm release>

These values depend on how you deployed the tenant(s), that is, on the override values you used.
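For illustration only, this is how the expr line of the template might look with the placeholders filled in. The namespace `voice` and the Helm release name `tenant-t100` are hypothetical values, not defaults. Prometheus regex matchers are fully anchored, so this sketch assumes that pod names begin with the release name and appends `.*` to match whatever suffix follows it.

```yaml
# Hypothetical values: namespace "voice", Helm release "tenant-t100".
# The trailing .* lets the anchored regex match pod-name suffixes.
expr: (max by (tenant) (tenant_service_health_level{namespace="voice",pod=~"tenant-t100.*"})) < 2
```

Running the selector inside the braces as a query in Prometheus first is a quick way to confirm that it matches the pods you expect before you deploy the rule.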

No predefined alerts are provided for Tenant Service; the rule above is a template for an alert that you create yourself.