RAA metrics and alerts

This topic is part of the manual Genesys Customer Experience Insights Private Edition Guide for version Current of Reporting.

Service	CRD or annotations?	Port	Endpoint/Selector	Metrics update interval
RAA	PodMonitor and PrometheusRule	metrics: 9100, health: 9101	RAA builds the matched labels from the raa.statefulset.selector.matchLabels values specified in values.yaml (fragment below the table). By default, these contain a single raa-app item whose value is the raa.serviceName variable. The element raa.serviceName is a concatenation of parameters.	metrics: several seconds, health: up to 3 minutes

    ....
    statefulset:
      ## pod selector
      selector:
        matchLabels:
          raa-app: "{{ tpl $.Values.raa.serviceName $ }}"
      template:
        ## a map of pod specific labels to add to common labels
        labels:
          raa-app: "{{ tpl $.Values.raa.serviceName $ }}"
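
The PodMonitor finds RAA pods because the selector labels and the pod template labels are both rendered from the same raa.serviceName value. The following is a minimal Python sketch of Kubernetes matchLabels semantics (every key/value pair in the selector must be present on the pod). It assumes a hypothetical rendered service name, raa-example, and a hypothetical extra common label; neither comes from the chart itself.

```python
from typing import Mapping

# Hypothetical result of rendering "{{ tpl $.Values.raa.serviceName $ }}".
SERVICE_NAME = "raa-example"

# Rendered form of raa.statefulset.selector.matchLabels (default: one raa-app item).
MATCH_LABELS = {"raa-app": SERVICE_NAME}


def selector_matches(match_labels: Mapping[str, str], pod_labels: Mapping[str, str]) -> bool:
    """Return True if every key/value pair in match_labels is present on the pod."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


if __name__ == "__main__":
    # Pod labels = common labels (hypothetical here) plus raa.statefulset.template.labels.
    raa_pod = {"app.kubernetes.io/name": "raa", "raa-app": SERVICE_NAME}
    other_pod = {"raa-app": "some-other-service"}
    print(selector_matches(MATCH_LABELS, raa_pod))
    print(selector_matches(MATCH_LABELS, other_pod))
```

Because the selector and the template labels come from the same template expression, changing raa.serviceName keeps them in sync; editing only one of them would leave the PodMonitor matching no pods.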

See details about the metrics and alerts below.

Metrics

Metric and description	Metric details	Indicator of
gcxi_raa_health_level A health status metric extracted through a dedicated port on the monitor container. The metric value is the sum of two separate health checks: (1) A database-based health check. A result of 2 indicates that RAA is working or in maintenance (it has received a STOP command from Genesys Info Mart and will restart only after receiving the START command). A result of 0 indicates that RAA is not working according to this check. (2) A local health check. A result of 1 indicates that RAA is processing aggregation requests according to local health files. A result of 0 indicates that RAA is not processing aggregation requests according to local health files. The combined value therefore ranges from 0 (both checks fail) to 3 (both checks pass).	Unit: Type: Gauge Label: Sample value: 0	Health check
gcxi_raa_command_count The number of commands received from Genesys Info Mart since the previous scrape. The cmd label holds the command name. The supported commands are: START, QUIT, EXIT, UPDATE_CONFIG, REAGGREGATE	Unit: Type: Counter Label: cmd Sample value: 10	Traffic
gcxi_raa_dispatch_count The number of dispatch events (moving aggregation requests from AGR_NOTIFICATION to PENDING_AGR) since the previous scrape. Dispatch events typically occur every 15 seconds. These events feed the aggregation health check based on local files.	Unit: Type: Counter Label: Sample value: 100	Health check
gcxi_raa_heartbeat_count The number of heartbeats since the previous scrape. A heartbeat normally occurs once every five minutes and is used for the health check based on local files. The version label holds the current RAA version.	Unit: Type: Counter Label: version Sample value: 10	Health check
gcxi_raa_relaunched_count The number of times RAA was relaunched since the previous scrape. The aggregation process can exit when an error occurs. Genesys Info Mart sends a START command every 15 minutes during the aggregation period, which causes RAA to relaunch.	Unit: Type: Counter Label: Sample value: 1	Error
gcxi_raa_launched_count The number of times RAA was launched since the previous scrape.	Unit: Type: Counter Label: version Sample value: 1	Error
gcxi_raa_error_count The number of errors registered since the previous scrape.	Unit: Type: Counter Label: Sample value: 1	Error
gcxi_raa_notification_count The number of fact change notifications received from Genesys Info Mart since the previous scrape.	Unit: Type: Counter Label: fact Sample value: 10	Latency
gcxi_raa_notification_period_ms The total amount of time attributed to changed fact periods in notifications received from Genesys Info Mart since the previous scrape.	Unit: milliseconds Type: Counter Label: fact Sample value:	Latency
gcxi_raa_notification_delay_ms The total amount of time attributed to fact notification delays in notifications received from Genesys Info Mart since the previous scrape. Notification delay is calculated as the difference between the moment of notification and the start of the changed period.	Unit: milliseconds Type: Counter Label: fact Sample value:	Latency
gcxi_raa_aggregated_count The number of aggregations completed by RAA since the previous scrape. RAA groups the data by aggregation hierarchy name, materialized level (usually SUBHOUR, HOUR, DAY, MONTH), and media type (Online, Offline).	Unit: Type: Counter Label: hierarchy, level, mediaType Sample value:	Traffic
gcxi_raa_aggregated_period_ms The total length of the time periods aggregated by RAA since the previous scrape.	Unit: milliseconds Type: Counter Label: hierarchy, level, mediaType Sample value: 10	Traffic
gcxi_raa_aggregated_duration_ms The total time spent on aggregations completed by RAA since the previous scrape.	Unit: milliseconds Type: Counter Label: hierarchy, level, mediaType Sample value:	Traffic
gcxi_raa_aggregated_delay_ms The total duration of delays for aggregations completed by RAA since the previous scrape. Aggregation delay is calculated as the difference between the moment aggregation completes and the start of the aggregation range.	Unit: milliseconds Type: Counter Label: hierarchy, level, mediaType Sample value:	Latency
gcxi_raa_purged_count The number of records purged by RAA since the previous scrape. RAA groups the data by purged table name.	Unit: Type: Counter Label: table Sample value:	Traffic
gcxi_raa_purged_duration_ms The total amount of time spent on purging since the previous scrape.	Unit: milliseconds Type: Counter Label: table Sample value:	Traffic
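
Because the database check contributes 0 or 2 and the local check contributes 0 or 1, each component of gcxi_raa_health_level can be recovered from the sum. The following is a minimal Python sketch of that decoding; the RaaHealth type and decode_health_level function are hypothetical helpers, not part of RAA.

```python
from typing import NamedTuple


class RaaHealth(NamedTuple):
    database_check_ok: bool  # database-based check contributed 2 (working or in maintenance)
    local_check_ok: bool     # local-file check contributed 1 (processing aggregation requests)


def decode_health_level(value: float) -> RaaHealth:
    """Split a gcxi_raa_health_level sample into its two component checks.

    The gauge is documented as the sum of a database check (0 or 2) and a
    local check (0 or 1), so only 0, 1, 2, and 3 are valid values.
    """
    level = int(value)
    if level != value or level not in (0, 1, 2, 3):
        raise ValueError(f"unexpected gcxi_raa_health_level value: {value}")
    # 2 and 1 occupy different bits, so each check can be tested independently.
    return RaaHealth(database_check_ok=bool(level & 2), local_check_ok=bool(level & 1))


if __name__ == "__main__":
    for sample in (0, 1, 2, 3):
        print(sample, decode_health_level(sample))
```

For example, a value of 2 means the database check passes but local health files show no aggregation activity.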
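
Several metrics pair a millisecond total with a count over the same labels: gcxi_raa_notification_delay_ms with gcxi_raa_notification_count (label fact), and gcxi_raa_aggregated_duration_ms or gcxi_raa_aggregated_delay_ms with gcxi_raa_aggregated_count (labels hierarchy, level, mediaType). Dividing the total by the count gives the average for the interval. The following is an illustrative Python sketch that assumes one scrape's samples are already available as dicts keyed by label tuples; the per_label_average helper and the label values are hypothetical.

```python
from typing import Dict, Hashable, Optional


def per_label_average(totals_ms: Dict[Hashable, float],
                      counts: Dict[Hashable, float]) -> Dict[Hashable, Optional[float]]:
    """Divide a *_ms total by the matching *_count for each label set.

    Label sets with a zero count in the interval yield None rather than
    raising ZeroDivisionError.
    """
    averages: Dict[Hashable, Optional[float]] = {}
    for labels, count in counts.items():
        total = totals_ms.get(labels, 0.0)
        averages[labels] = total / count if count else None
    return averages


if __name__ == "__main__":
    # Hypothetical samples for one scrape, keyed by (hierarchy, level, mediaType).
    key = ("example_hierarchy", "HOUR", "Online")
    duration_ms = {key: 12_000.0}
    aggregated = {key: 4}
    print(per_label_average(duration_ms, aggregated))
```

The same helper applies to any total/count pair that shares labels, such as average notification delay per fact.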

Alerts

Parameters under raa.prometheusRule.alerts.* in the values.yaml file specify alert severities and some thresholds.

The following alerts are defined for RAA.

Alert	Severity	Description	Based on	Threshold
raa-health	Specified by: raa.prometheusRule.alerts.labels.severity Recommended value: severe	A zero value for a recent period (several scrape intervals) indicates that RAA is not operating.	gcxi_raa_health_level	Specified by: raa.prometheusRule.alerts.health.for Recommended value: 30m
raa-errors	Specified by: raa.prometheusRule.alerts.raa-errors.labels.severity in values.yaml. Recommended value: warning	A nonzero value indicates that errors have been logged during the scrape interval.	gcxi_raa_error_count	>0
raa-long-aggregation	Specified by: raa.prometheusRule.alerts.longAggregation.labels.severity in values.yaml. Recommended value: warning	Indicates that the average duration of aggregation queries for a given combination of the hierarchy, level, and mediaType labels is greater than the configured threshold.	gcxi_raa_aggregated_duration_ms / gcxi_raa_aggregated_count	Greater than the value (seconds) of raa.prometheusRule.alerts.longAggregation.thresholdSec in values.yaml. Recommended value: 300
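
The alert conditions above can be expressed as plain predicates. The following is a Python sketch of that logic only, not the PrometheusRule expressions the chart ships; the function names are hypothetical, the thresholds are the recommended values from the table, and the health window is approximated as a list of every scrape inside the `for` period.

```python
from typing import Dict, Hashable, List, Sequence

# Recommended value from the alerts table; in a deployment it comes from
# raa.prometheusRule.alerts.longAggregation.thresholdSec in values.yaml.
LONG_AGGREGATION_THRESHOLD_SEC = 300


def raa_health_firing(health_samples: Sequence[float]) -> bool:
    """raa-health: gcxi_raa_health_level stayed at zero across the window.

    health_samples is assumed to hold every scrape inside the `for` window
    (recommended 30m). An empty window does not fire.
    """
    return len(health_samples) > 0 and all(v == 0 for v in health_samples)


def raa_errors_firing(error_count: float) -> bool:
    """raa-errors: any errors logged in the scrape interval."""
    return error_count > 0


def long_aggregations(duration_ms: Dict[Hashable, float],
                      count: Dict[Hashable, float],
                      threshold_sec: float = LONG_AGGREGATION_THRESHOLD_SEC) -> List[Hashable]:
    """raa-long-aggregation: label sets whose average duration exceeds the threshold.

    The metric is in milliseconds and thresholdSec is in seconds, so the
    average is divided by 1000 before the comparison.
    """
    firing = []
    for labels, n in count.items():
        if n and duration_ms.get(labels, 0.0) / n / 1000 > threshold_sec:
            firing.append(labels)
    return firing


if __name__ == "__main__":
    key = ("example_hierarchy", "DAY", "Offline")  # hypothetical label values
    # Average of 500 s, which is above the 300 s threshold.
    print(long_aggregations({key: 1_500_000.0}, {key: 3}))
```

Note the unit conversion: comparing the millisecond average directly against thresholdSec would make the alert fire far too early.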
