Job Scheduler metrics and alerts

This topic is part of the Outbound (CX Contact) Private Edition Guide for the Current version of Outbound (CX Contact).

Metrics

Metric and description	Metric details	Indicator of
cxc_js_jobs_executed_total Total jobs executed.	Unit: Type: Counter Label: "'type', 'ccid', 'tenant_name'" Sample value: 42
cxc_js_jobs_failed_total Total failed jobs.	Unit: Type: Counter Label: "'type', 'ccid', 'tenant_name'" Sample value: 42
cxc_js_jobs_success_total Total successful jobs.	Unit: Type: Counter Label: "'type', 'ccid', 'tenant_name'" Sample value: 42
cxc_js_jobs_nothing_to_do_total Total jobs that ended with a Nothing To Do result.	Unit: Type: Counter Label: "'type', 'ccid', 'tenant_name'" Sample value: 42
cxc_js_jobs_run_now_total Total jobs that were started manually.	Unit: Type: Counter Label: "'ccid', 'tenant_name'" Sample value: 42
cxc_js_files_imported_total Total files imported.	Unit: Type: Counter Label: "'action', 'ccid', 'tenant_name'" Sample value: 42
cxc_js_jobs_ttl_exceeded_total Total jobs that exceeded their TTL.	Unit: Type: Counter Label: "'type', 'ccid', 'tenant_name'" Sample value: 42
cxc_js_jobs_running_total Number of currently active jobs.	Unit: Type: Gauge Label: "'type', 'ccid', 'tenant_name'" Sample value: 4.2
cxc_js_redis_connections Count of active connections to Redis server.	Unit: Type: Gauge Label: n/a Sample value: 4.2
cxc_js_job_duration_seconds Job duration histogram.	Unit: Type: Histogram Label: "'type', 'ccid', 'tenant_name'" Sample value: [1, 2, 3]
cxc_js_job_import_file_size_megabytes Job import file size histogram.	Unit: Type: Histogram Label: "'action', 'ccid', 'tenant_name'" Sample value: [1, 2, 3]
cxc_js_healthy_instance Healthy instance.	Unit: Type: Gauge Label: n/a Sample value: 4.2
cxc_js_request_count Total requests by verb and code.	Unit: Type: Counter Label: "'method', 'path', 'code'" Sample value: 42
cxc_js_request_latencies_ms Request latencies histogram by verb, in milliseconds.	Unit: Type: Histogram Label: "'method', 'path', 'code'" Sample value: [1, 2, 3]
cxc_js_request_out_count Total out requests by verb, destination, and code.	Unit: Type: Counter Label: "'method', 'destination', 'code'" Sample value: 42
cxc_js_request_out_latencies_ms Out request latencies histogram by verb, destination, and code, in milliseconds.	Unit: Type: Histogram Label: "'method', 'destination', 'code'" Sample value: [1, 2, 3]
cxc_js_healthy_tenants Healthy tenants.	Unit: Type: Gauge Label: "'ccid', 'tenant_name'" Sample value: 4.2
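
The counters above can be combined into derived indicators. The following is a minimal sketch, not part of the product, that computes a per-tenant job failure ratio from cxc_js_jobs_failed_total and cxc_js_jobs_executed_total. It assumes the metrics are scraped in the Prometheus text exposition format; the sample text, label values, and the helper names parse_samples and failure_ratio_by_tenant are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical scrape output; label values and numbers are invented
# for illustration and do not come from a real deployment.
SAMPLE = '''\
cxc_js_jobs_executed_total{type="import",ccid="c1",tenant_name="acme"} 40
cxc_js_jobs_failed_total{type="import",ccid="c1",tenant_name="acme"} 4
cxc_js_jobs_executed_total{type="export",ccid="c1",tenant_name="acme"} 10
cxc_js_jobs_failed_total{type="export",ccid="c1",tenant_name="acme"} 0
'''

LINE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Yield (metric_name, labels, value) for each labeled sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blanks and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # unlabeled samples are ignored in this sketch
        name, raw_labels, value = match.groups()
        yield name, dict(LABEL_RE.findall(raw_labels)), float(value)


def failure_ratio_by_tenant(text):
    """Return {tenant_name: failed / executed}, summed over job types."""
    executed = defaultdict(float)
    failed = defaultdict(float)
    for name, labels, value in parse_samples(text):
        tenant = labels.get("tenant_name", "")
        if name == "cxc_js_jobs_executed_total":
            executed[tenant] += value
        elif name == "cxc_js_jobs_failed_total":
            failed[tenant] += value
    # Tenants with no executed jobs are left out to avoid dividing by zero.
    return {t: failed[t] / n for t, n in executed.items() if n > 0}


ratios = failure_ratio_by_tenant(SAMPLE)
print(ratios)
```

Because these are counters, a production query would normally compare rates over a time window rather than raw totals; the sketch uses raw totals only to keep the arithmetic visible.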

Alerts

The following alerts are defined for Job Scheduler.

Alert	Severity	Description	Threshold
CXC-JS-LatencyHigh	HIGH	Triggered when the latency for Job Scheduler is above the defined threshold.	5000ms for 5m
CXC-CPUUsage	HIGH	Triggered when the CPU utilization of a pod is beyond the threshold.	300% for 5m
CXC-MemoryUsage	HIGH	Triggered when the memory utilization of a pod is beyond the threshold.	70% for 5m
CXC-PodNotReadyCount	HIGH	Triggered when the number of pods ready for a CX Contact deployment is less than or equal to the threshold.	1 for 5m
CXC-PodRestartsCount	HIGH	Triggered when the restart count for a pod is beyond the threshold.	1 for 5m
CXC-MemoryUsagePD	HIGH	Triggered when the memory usage of a pod is above the critical threshold.	90% for 5m
CXC-PodRestartsCountPD	HIGH	Triggered when the restart count is beyond the critical threshold.	5 for 5m
CXC-PodsNotReadyPD	HIGH	Triggered when there are no pods ready for a CX Contact deployment.	0 for 1m
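
Each threshold in the table pairs a value with a duration, such as 5000ms for 5m. The sketch below illustrates one plausible reading of that notation: the alert fires only when the condition holds continuously for the whole duration. This interpretation, the strict greater-than comparison, and the names ThresholdRule and fires are assumptions made for illustration; the actual alert rule definitions are not shown in this topic.

```python
from dataclasses import dataclass


@dataclass
class ThresholdRule:
    name: str
    threshold: float   # value the metric is compared against
    hold_seconds: int  # how long the breach must persist


def fires(rule, samples):
    """Return True if the value stays above the threshold for hold_seconds.

    samples is a list of (timestamp_seconds, value) in ascending time order.
    """
    breach_start = None
    for timestamp, value in samples:
        if value > rule.threshold:
            if breach_start is None:
                breach_start = timestamp  # first sample of a new breach
            if timestamp - breach_start >= rule.hold_seconds:
                return True
        else:
            breach_start = None  # any recovery resets the timer
    return False


# Assumed reading of "5000ms for 5m" for CXC-JS-LatencyHigh.
latency_rule = ThresholdRule("CXC-JS-LatencyHigh", 5000.0, 300)

# Invented samples, one per minute.
sustained = [(t, 6000.0) for t in range(0, 301, 60)]
spike = [(0, 6000.0), (60, 6000.0), (120, 1000.0),
         (180, 6000.0), (240, 6000.0), (300, 6000.0)]

print(fires(latency_rule, sustained))
print(fires(latency_rule, spike))
```

Under this reading, a short latency spike that recovers within the window does not raise the alert, while a breach sustained for the full five minutes does.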
