IWD metrics and alerts


Find the metrics IWD exposes and the alerts defined for IWD.

Service: IWD
CRD or annotations?: Both or either, depends on harvester
Port: Default is 4024 (overridden by values)
Endpoint/Selector: /iwd/v3/metrics
Metrics update interval: 15 sec recommended, depends on harvester
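
As a rough illustration of reading this endpoint, the sketch below builds the metrics URL from the default port and path listed above and parses the response into (name, labels, value) tuples. It assumes the endpoint returns the standard Prometheus text exposition format, which the metric types on this page suggest but do not state. The host name in the usage note and all helper names are hypothetical.

```python
import re
from urllib.request import urlopen

DEFAULT_PORT = 4024               # default from this page; may be overridden by values
METRICS_PATH = "/iwd/v3/metrics"  # endpoint from this page

# Metric name, optional {labels}, then the sample value.
_SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair; escaped characters inside values are kept as-is, not unescaped.
_LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def metrics_url(host, port=DEFAULT_PORT):
    """Build the metrics URL for a given host (plain HTTP is an assumption)."""
    return f"http://{host}:{port}{METRICS_PATH}"


def parse_metrics(text):
    """Parse exposition text into a list of (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simple parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def scrape(host, port=DEFAULT_PORT, timeout=5.0):
    """Fetch and parse the metrics endpoint once."""
    with urlopen(metrics_url(host, port), timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

For example, `scrape("iwd.example.internal")` would return the parsed samples from a pod reachable under that (made-up) host name. In practice the harvester does the scraping, so a script like this is mainly useful for spot checks.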

See the Metrics and Alerts sections below for details.

Metrics

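Each entry below lists the metric name and description, followed by the metric details (unit, type, label, sample value). The "Indicator of" column has no entries for these metrics.
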
iwd_redis_connections_established

Current number of established Redis connections

Unit:

Type: gauge
Label:
Sample value: 0

iwd_redis_connections_reconnecting

Current number of reconnecting Redis connections

Unit:

Type: gauge
Label:
Sample value: 0

iwd_redis_connections_ready

Current number of ready Redis connections

Unit:

Type: gauge
Label:
Sample value: 1

iwd_redis_duration_until_ready

Duration until ready state reached

Unit:

Type: histogram
Label: 'le'
Sample value: 0, 1, 39

iwd_redis_errors_total

Total number of Redis connection errors

Unit:

Type: counter
Label:
Sample value: 0

iwdTenantDB_db_connect_total

The total number of all database connection requests

Unit:

Type: counter
Label: 'db'
Sample value: 1252424, 1457770

iwdTenantDB_db_disconnect_total

The total number of all database disconnection requests

Unit:

Type: counter
Label: 'db'
Sample value: 1252424, 1457770

iwdTenantDB_db_request_total

The total number of all database requests sent

Unit:

Type: counter
Label: 'db'
Sample value: 4850730, 5056452

iwdTenantDB_db_success_total

The total number of all database requests executed successfully

Unit:

Type: counter
Label: 'db', 'command'
Sample value: 2307896, 2126805, 1221394, 1450355

iwdTenantDB_db_errors_total

The total number of all database errors

Unit:

Type: counter
Label: 'db', 'code'
Sample value: 131, 5, 4

iwdTenantDB_db_request_duration_milliseconds

Database transaction duration

Unit:

Type: histogram
Label: 'le', 'db', 'method'
Sample value: 2290844, 2306385, 2307241, 2307894

iwd_process_cpu_user_seconds_total

Total user CPU time spent in seconds.

Unit:

Type: counter
Label:
Sample value: 1634045655571

iwd_process_cpu_system_seconds_total

Total system CPU time spent in seconds.

Unit:

Type: counter
Label:
Sample value: 1634045655571

iwd_process_cpu_seconds_total

Total user and system CPU time spent in seconds.

Unit:

Type: counter
Label:
Sample value: 1634045655571

iwd_process_start_time_seconds

Start time of the process, in seconds since the Unix epoch.

Unit:

Type: gauge
Label:
Sample value: 1633992102

iwd_process_resident_memory_bytes

Resident memory size in bytes.

Unit:

Type: gauge
Label:
Sample value: 1634045655572

iwd_process_virtual_memory_bytes

Virtual memory size in bytes.

Unit:

Type: gauge
Label:
Sample value: 1634045655572

iwd_process_heap_bytes

Process heap size in bytes.

Unit:

Type: gauge
Label:
Sample value: 1634045655572

iwd_process_open_fds

Number of open file descriptors.

Unit:

Type: gauge
Label:
Sample value: 1634045655572

iwd_process_max_fds

Maximum number of open file descriptors.

Unit:

Type: gauge
Label:
Sample value: 197176

iwd_nodejs_eventloop_lag_seconds

Lag of event loop in seconds.

Unit:

Type: gauge
Label:
Sample value: 1634045655572

iwd_nodejs_active_handles

Number of active libuv handles grouped by handle type. Each handle type is a C++ class name.

Unit:

Type: gauge
Label: 'type'
Sample value: 17, 1, 69

iwd_nodejs_active_handles_total

Total number of active handles.

Unit:

Type: gauge
Label:
Sample value: 1634045655572

iwd_nodejs_active_requests

Number of active libuv requests grouped by request type. Each request type is a C++ class name.

Unit:

Type: gauge
Label: 'type'
Sample value: 2

iwd_nodejs_active_requests_total

Total number of active requests.

Unit:

Type: gauge
Label:
Sample value: 1634045655572

iwd_nodejs_heap_size_total_bytes

Process heap size from node.js in bytes.

Unit:

Type: gauge
Label:
Sample value: 1634045655572

iwd_nodejs_heap_size_used_bytes

Process heap size used from node.js in bytes.

Unit:

Type: gauge
Label:
Sample value: 1634045655572

iwd_nodejs_external_memory_bytes

Node.js external memory size in bytes.

Unit:

Type: gauge
Label:
Sample value: 1634045655572

iwd_nodejs_heap_space_size_total_bytes

Process heap space size total from node.js in bytes.

Unit:

Type: gauge
Label: 'space'
Sample value: 262144, 16777216, 130428928, 6721536

iwd_nodejs_heap_space_size_used_bytes

Process heap space size used from node.js in bytes.

Unit:

Type: gauge
Label: 'space'
Sample value: 32808, 1479672, 92634792, 4852384

iwd_nodejs_heap_space_size_available_bytes

Process heap space size available from node.js in bytes.

Unit:

Type: gauge
Label: 'space'
Sample value: 0, 6899976, 37040456, 1542496

iwd_nodejs_version_info

Node.js version info.

Unit:

Type: gauge
Label: 'version', 'major', 'minor', 'patch'
Sample value: 1

iwd_request_total

The total number of all API requests received

Unit:

Type: counter
Label:
Sample value: 177186

iwd_success_total

The total number of all API requests with success response

Unit:

Type: counter
Label: 'ccid'
Sample value: 21400, 46769, 48539

iwd_errors_total

The total number of all API requests with error response

Unit:

Type: counter
Label: 'ccid'
Sample value: 438, 49, 4

iwd_client_error_total

The total number of all API requests with client error response

Unit:

Type: counter
Label: 'ccid'
Sample value: 204, 49, 2

iwd_server_error_total

The total number of all API requests with server error response

Unit:

Type: counter
Label: 'ccid'
Sample value: 234, 2

iwd_api_request_total

The total number of all API requests

Unit:

Type: counter
Label: 'method', 'path', 'code', 'ccid'
Sample value: 3570, 3584, 25079, 19500

iwd_api_request_long

Number of API requests that took a long time to execute

Unit:

Type: counter
Label:
Sample value:

iwd_api_request_closed

Number of API requests that expired before a response was sent

Unit:

Type: counter
Label: 'method', 'path'
Sample value: 3, 14, 4, 9

iwd_api_request_duration_milliseconds

API requests duration

Unit:

Type: histogram
Label: 'le', 'method', 'path', 'code', 'ccid'
Sample value: 6, 2708, 3502, 3570

iwd_api_blacklist

Total number of blacklisted requests

Unit:

Type: counter
Label:
Sample value:

iwd_cometd_connections_total

The current number of client cometd connections to GWS

Unit:

Type: gauge
Label: 'type', 'ccid'
Sample value: 27, 1, 41

iwd_cometd_errors_total

The total number of client cometd errors

Unit:

Type: counter
Label: 'type', 'ccid'
Sample value: 1

iwd_cometd_request_errors_total

The total number of client cometd error responses from GWS

Unit:

Type: counter
Label: 'type', 'name', 'ccid', 'domain'
Sample value: 13026, 64, 1, 102

iwd_cometd_request_current

The current number of client cometd requests to GWS

Unit:

Type: gauge
Label: 'type', 'name', 'ccid', 'domain'
Sample value: -6318, -11825, 0

iwd_cometd_request_duration_milliseconds

The cometd request duration (to GWS)

Unit:

Type: histogram
Label: 'le', 'type', 'name', 'ccid', 'domain'
Sample value: 6298, 6320, 6345, 6395

iwd_cometd_request_duration_milliseconds_summary

The cometd request duration (to GWS): summary

Unit:

Type: summary
Label: 'quantile', 'type', 'name', 'ccid', 'domain'
Sample value: 0, 930700, 6577, 89959

iwd_cometd_events_total

The total number of client cometd events from GWS

Unit:

Type: counter
Label: 'type', 'name', 'ccid', 'domain'
Sample value: 80, 443, 346, 17
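
The histogram metrics above (those with an 'le' label) expose cumulative bucket counts rather than individual durations. As a minimal sketch of how such buckets can be turned into a percentile, the function below estimates a quantile by linear interpolation within the matching bucket, similar in spirit to Prometheus's `histogram_quantile`. The function name is hypothetical, and the bucket boundaries in the usage note are invented for illustration; this page does not list the actual boundaries.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative histogram buckets.

    buckets: (upper_bound, cumulative_count) pairs as exposed through the
    'le' label; the last bucket is expected to be +Inf.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(buckets)
    if not ordered:
        raise ValueError("no buckets")
    total = ordered[-1][1]
    if total == 0:
        return math.nan  # no observations yet
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in ordered:
        if count >= rank:
            if math.isinf(upper_bound):
                # Cannot interpolate inside the +Inf bucket;
                # fall back to the highest finite boundary.
                return lower_bound
            if count == lower_count:
                return upper_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound
```

For instance, `estimate_quantile(0.5, [(10, 6), (100, 2708), (1000, 3502), (math.inf, 3570)])` estimates a median duration in milliseconds from the sample counts shown for iwd_api_request_duration_milliseconds, under the assumed boundaries of 10, 100, and 1000 ms. The estimate is only as precise as the bucket layout allows.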


Alerts

The following alerts are defined for IWD.

Alert: IWD error rate
Severity: CRITICAL
Description: Triggered when the number of errors in IWD exceeds the threshold over a 15-minute period.
Based on:
Threshold: Default number of errors: 2

Alert: IWD DB errors
Severity: CRITICAL
Description: Triggered when IWD experiences more than 2 errors within 1 minute during database operations.
Based on:
Threshold: Default number of errors: 2

Alert: Memory usage is above 3000 Mb
Severity: CRITICAL
Description: Triggered when the pod memory usage is above 3000 MB.
Based on:
Threshold: Default memory usage: 3000 MB

Alert: Database connections above 75
Severity: HIGH
Description: Triggered when the number of pod database connections is above 75.
Based on:
Threshold: Default number of connections: 75
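
The default thresholds above can be sketched as a simple check. Because the "Based on" column is empty, this page does not say which metrics feed each alert, so the function below takes plain numbers as input rather than metric names. All class, function, and parameter names are hypothetical. The comparisons are strict, following the wording "exceeds", "more than", and "above". The `counter_increase` helper shows one common way to derive an error count for a window from two counter readings, treating a drop in value as a process restart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    errors_15m: int = 2         # IWD error rate: default number of errors
    db_errors_1m: int = 2       # IWD DB errors: default number of errors
    memory_mb: float = 3000.0   # pod memory usage, in MB
    db_connections: int = 75    # pod database connections


def counter_increase(previous, current):
    """Increase of a counter between two readings, treating a drop as a reset."""
    return current if current < previous else current - previous


def firing_alerts(errors_15m, db_errors_1m, memory_mb, db_connections,
                  thresholds=Thresholds()):
    """Return (alert name, severity) pairs whose threshold is exceeded."""
    checks = [
        ("IWD error rate", "CRITICAL",
         errors_15m > thresholds.errors_15m),
        ("IWD DB errors", "CRITICAL",
         db_errors_1m > thresholds.db_errors_1m),
        ("Memory usage is above 3000 Mb", "CRITICAL",
         memory_mb > thresholds.memory_mb),
        ("Database connections above 75", "HIGH",
         db_connections > thresholds.db_connections),
    ]
    return [(name, severity) for name, severity, exceeded in checks if exceeded]
```

With 3 errors in the last 15 minutes, no database errors, 2500 MB of memory in use, and 80 database connections, `firing_alerts(3, 0, 2500.0, 80)` reports the error rate and database connection alerts. Values exactly at a threshold do not fire under this reading of the descriptions.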