Corinneh: Published

2022-01-12T19:35:57Z

{{ArticlePEServiceObservability
|ConfigureMetrics=The metrics that are exposed by Genesys Engagement Service (GES) are available by default. No further configuration is required in order to define or expose these metrics. You cannot define your own custom metrics.

The Metrics page linked to {{Link-SomewhereInThisVersion|manual=CABPEGuide|topic=Observability|anchor=MetricsLinks|display text=above}} shows some of the metrics GES exposes. You can also query Prometheus directly or via a dashboard to see all the metrics available from GES.
|AboutMonitoring=The GES/Callback metrics are divided into three categories:

*Metrics that relate to the internal business logic of GES: These measure things like the number of callbacks booked, the number of click-to-call requests booked, callback monitor performance, and API usage.
*Metrics that measure the performance of GES: Disk, memory, and CPU usage, event loop lag, request handling times, and the health of connections to downstream services such as Redis, Postgres, GWS, ORS, and others.
*Alarm-type metrics, or alerts: These metrics are Boolean flags that are raised and lowered whenever certain conditions are met.

Watching how the metrics change over time helps you understand the performance of a given GES deployment or pod. The {{Link-SomewhereInThisVersion|manual=CABPEGuide|topic=CallbackMetrics|anchor=samplepromexpressions|display text=sample Prometheus expressions}} show you how to use the basic metrics to gain valuable insights into your callback-related activity.

====Health metrics====
Health metrics – that is, those metrics that report on the status of connections from GES to dependencies such as Tenant Service (ORS), GWS, Redis, and Postgres – are implemented as a gauge that toggles between "0" and "1". For information about gauges, see the [https://prometheus.io/docs/concepts/metric_types/ Prometheus Metric types documentation]. When the connection to a service is down, the metric is "1". When the service is up, the metric is "0". Also see {{Link-SomewhereInThisVersion|manual=CABPEGuide|topic=Observability|anchor=AlertsLinks|display text=Alerting}}.
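
As a minimal sketch of how a health gauge can be queried, the following Prometheus expressions use the <tt>ORS_REDIS_STATUS</tt> gauge that appears in the alert example later on this page. The time ranges are illustrative assumptions, not recommended values.

<source lang="text">
# Series (for example, one per pod) whose ORS Redis connection is currently down (gauge = 1)
ORS_REDIS_STATUS == 1

# Approximate fraction of the last hour that the connection was reported as down
# (average of the 0/1 samples; the 1h range is an assumed example value)
avg_over_time(ORS_REDIS_STATUS[1h])
</source>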
|AlertsDefined=Yes
|Alerting=In a Kubernetes deployment, GES relies on Prometheus and Alertmanager to generate alerts. These alerts can then be forwarded to a service of your choice (for example, PagerDuty).

One of the key things to understand about alerts in GES is that, while GES leverages Prometheus, the application manually triggers alerts when certain criteria are met. This internal alert is then turned into a counter that is incremented each time the conditions to trigger the alert are met. The counter is available on the <tt>/metrics</tt> endpoint. Prometheus rules capture the metric data and trigger the alert on Prometheus; also see {{Link-SomewhereInThisVersion|manual=CABPEGuide|topic=Observability|anchor=ConfigureAlerts|display text=Configure alerts}}. For more information about counters, see the [https://prometheus.io/docs/concepts/metric_types/ Prometheus Metric types documentation].

Because alerts are implemented as counters in GES, you can leverage metrics to analyze how the counters increase over time for a given deployment or pod. For a list of helpful Prometheus expressions to use for this purpose, see {{Link-SomewhereInThisVersion|manual=CABPEGuide|topic=CallbackMetrics|anchor=samplepromexpressions|display text=Sample Prometheus expressions}}.
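
For illustration, the following expressions sketch how such a counter might be examined, using the <tt>RBAC_CREATE_VQ_PROXY_ERROR</tt> counter from the alert example on this page. The time ranges are assumed example values, and the <tt>pod</tt> label is assumed to be present on the metric because the alert summaries on this page reference it.

<source lang="text">
# Number of times the alert condition was met during the last hour
increase(RBAC_CREATE_VQ_PROXY_ERROR[1h])

# Per-second rate at which the condition is being met, averaged over 5 minutes
rate(RBAC_CREATE_VQ_PROXY_ERROR[5m])

# Total occurrences over the last 24 hours, grouped by pod (assumes a "pod" label)
sum by (pod) (increase(RBAC_CREATE_VQ_PROXY_ERROR[24h]))
</source>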

The following example shows an alert used in an Azure deployment. The Prometheus alert fires when the counter increases by more than 5 within a 10-minute window.

<source lang="text">
- alert: GES_RBAC_CREATE_VQ_PROXY_ERROR
  annotations:
    summary: "There are issues managing VQ proxy objects on {{ $labels.pod }}"
  labels:
    severity: info
    action: email
    service: GES
  expr: increase(RBAC_CREATE_VQ_PROXY_ERROR[10m]) > 5
</source>

Health alerts in GES work a little differently: they are gauges rather than counters. The gauge toggles between "0" and "1"; "1" indicates that the service is down and "0" indicates that the service is up. GES runs an automatic health check approximately every 15-20 seconds, so a health alert is generated only when a connection has remained in the DOWN state for a given period of time (5 minutes in the following example, as set by the <tt>for</tt> clause). The following example shows the GES_ORS_REDIS_DOWN alert.

<source lang="text">
- alert: GES_ORS_REDIS_DOWN
  expr: ORS_REDIS_STATUS > 0
  for: 5m
  labels:
    severity: critical
    action: page
    service: GES
  annotations:
    summary: "ORS REDIS Connection down for {{ $labels.pod }}"
    dashboard: "See GES Performance > Health and Liveliness to track ORS Redis Health over time"
</source>
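
The <tt>action</tt> and <tt>service</tt> labels in the rules above could be used by Alertmanager to decide where an alert is forwarded. The following fragment is only a sketch of one possible routing configuration, not the configuration shipped with GES: the receiver names are hypothetical, the routing key is a placeholder, and it assumes an Alertmanager version that supports the <tt>matchers</tt> syntax.

<source lang="text">
route:
  receiver: default                # hypothetical catch-all receiver
  routes:
    - matchers:                    # forward only GES alerts labelled action: page
        - service = "GES"
        - action = "page"
      receiver: ges-pagerduty      # hypothetical receiver name
receivers:
  - name: default
  - name: ges-pagerduty
    pagerduty_configs:
      - routing_key: "<your PagerDuty integration key>"   # placeholder; supply your own
</source>

With routing of this kind, an alert such as GES_ORS_REDIS_DOWN (labelled <tt>action: page</tt>) would be forwarded to PagerDuty, while alerts labelled <tt>action: email</tt> would fall through to the default receiver unless you add a separate route for them.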
|ExtConfigAlertsBoilerplate=Yes
|Logging=For solution-level documentation about logging, see {{SuiteLevelLink|logging}}.

GES outputs logs to standard output (stdout).

GES log size is highly dependent on usage. A high-traffic enterprise that accepts many callbacks daily will have much larger log sizes than an enterprise with only a handful of callbacks each day. For your reference, a single Create Callback operation, invoked from the Callback UI, generates 16 log messages in 1 second at full trace level. Therefore, at a rate of 10 callbacks/second, there are 160 log messages per second.

You can make changes to logging settings using the Helm values file. For information about the <tt>log</tt> values included in the Helm charts, see the {{Link-SomewhereInThisVersion|manual=CABPEGuide|topic=Configure|anchor=configmap|display text=configMap section}}.
}}

PEC-CAB/Current/CABPEGuide/Observability - Revision history
