Tenant Service metrics and alerts
Find the metrics Tenant Service exposes and the alerts defined for Tenant Service.
Service | CRD or annotations? | Port | Endpoint/Selector | Metrics update interval |
---|---|---|---|---|
Tenant Service | PodMonitor | 15000 | /metrics (http://<pod address>:15000/metrics) | 30 seconds (Applicable for any metric(s) that Tenant Service exposes. The update interval is not a property of the metric; it is a property of the optional PodMonitor that you can create.) |
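The optional PodMonitor described in the table could look roughly like the following. This is a minimal sketch, not the definition shipped with Tenant Service: the resource name, the pod label in the selector, and the port name `metrics` are assumptions made for illustration, so check them against the labels and container ports of your own Tenant pods. Only the port number (15000), the path, and the 30-second interval come from the table above.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: tenant-podmonitor                      # hypothetical name
  namespace: <namespace where tenant is deployed>
spec:
  selector:
    matchLabels:
      # Assumed label; use whatever label actually identifies your Tenant pods
      app.kubernetes.io/instance: <name of tenant helm release>
  podMetricsEndpoints:
  - port: metrics      # assumed name of the container port that exposes 15000
    path: /metrics
    interval: 30s      # the update interval listed in the table
```

Because the interval is set on the PodMonitor, changing `interval` here changes how often every Tenant Service metric is scraped.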
Metrics
You can query Prometheus directly to see all the metrics that the Tenant Service exposes. The following metrics are likely to be particularly useful. Genesys does not commit to maintaining other currently available Tenant Service metrics that are not documented on this page.
Metric and description | Metric details | Indicator of |
---|---|---|
tenant_ Health level of the tenant node. Values are -1 (fail), 0 (starting), 1 (degraded), 2 (pass). When the value is 2, the Tenant Service node is fully functional. When the value is 1, the tenant might have issues with some of its internal functions or external dependencies, but is still capable of providing some services; investigate the tenant logs to troubleshoot and recover. A value of 0 or -1 indicates an inoperable node that is either pending start or has failed. | Unit: N/A Type: gauge | Health |
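The health levels in the table can be tested separately, for example to treat a degraded node differently from an inoperable one. The following is a sketch of a `rules` fragment for a PrometheusRule, assuming the metric name `tenant_service_health_level` that appears in the alert template on this page. The alert names, the `for` durations, and the severities are hypothetical.

```yaml
# Fragment of spec.groups[].rules; alert names and severities are illustrative only.
- alert: TenantHealthDegraded
  # Value 1: the node still serves requests but needs investigation
  expr: tenant_service_health_level == 1
  for: 5m
  labels:
    severity: warning
- alert: TenantHealthInoperable
  # Value 0 (starting) or -1 (fail): the node is not operable
  expr: tenant_service_health_level <= 0
  for: 5m
  labels:
    severity: critical
```

The `for: 5m` clause keeps a node that passes through 0 during a normal start from firing the second alert immediately.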
Alerts
If you enable a Tenant PodMonitor to expose the Tenant health metric, then you can create a basic alert rule for the Tenant Service using a template like the following:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: "custom-tenant-alert-rules"
spec:
  groups:
  - name: "custom-tenant-alert-rules"   # rule group name; any descriptive name works
    rules:
    - alert: HealthFailFor5min
      # Fires when the highest health level among the matched pods of a tenant stays below 2 (pass) for 5 minutes
      expr: (max by (tenant) (tenant_service_health_level{namespace="<namespace where tenant is deployed>",pod=~"<name of tenant helm release>"})) < 2
      for: 5m
      labels:
        severity: high
        category: tenant_pager
        servicename: "tenant"
      annotations:
        description: "The trigger will flag an alarm when tenant status health (any pod) is failed for 5 mins"
        summary: "Tenant pod status health is failed for 5 mins"
Replace the placeholders in the preceding template with your own values. The placeholders are:
- <namespace where tenant is deployed>
- <name of tenant helm release>
These values depend on how you deployed your tenant(s), that is, on the override values you supplied at deployment.
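As an illustration, a filled-in rule might look like the fragment below. The namespace `voice` and the release name `t100` are hypothetical and do not come from this page. Because the `=~` matcher in PromQL has to match the whole label value, the sketch appends `.*` on the assumption that pod names start with the release name followed by a suffix; verify this against your actual pod names.

```yaml
# Fragment of the alert rule with hypothetical values filled in.
- alert: HealthFailFor5min
  # "voice" and "t100" are example values only
  expr: (max by (tenant) (tenant_service_health_level{namespace="voice",pod=~"t100.*"})) < 2
  for: 5m
```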
No alerts are predefined for Tenant Service.