IWD metrics and alerts
Find the metrics IWD exposes and the alerts defined for IWD.
| Service | CRD or annotations? | Port | Endpoint/Selector | Metrics update interval |
|---|---|---|---|---|
| IWD | Both or either, depends on harvester | Default is 4024 (overridden by values) | /iwd/v3/metrics | 15 sec recommended, depends on harvester |
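As a rough illustration of the table above, the following Python sketch builds a scrape URL from the default port and endpoint and parses the text a harvester would receive. It assumes the endpoint returns the Prometheus text exposition format without timestamps. The host name passed by the caller and the helper names `metrics_url`, `parse_samples`, and `scrape` are hypothetical and not part of IWD.

```python
from urllib.request import urlopen

DEFAULT_PORT = 4024               # default from the table; may be overridden by values
METRICS_PATH = "/iwd/v3/metrics"  # endpoint from the table
SCRAPE_INTERVAL_SECONDS = 15      # recommended interval; depends on the harvester


def metrics_url(host, port=DEFAULT_PORT):
    """Build the scrape URL for a host supplied by the caller."""
    return f"http://{host}:{port}{METRICS_PATH}"


def parse_samples(text):
    """Parse `series value` lines, skipping comments and blank lines.

    Assumes no trailing timestamps. Returns a dict that maps the full
    series string (metric name plus any labels) to a float value.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def scrape(host, port=DEFAULT_PORT, timeout=5.0):
    """Fetch and parse one scrape; needs network access to the pod or service."""
    with urlopen(metrics_url(host, port), timeout=timeout) as response:
        return parse_samples(response.read().decode("utf-8"))
```

In practice a harvester performs this scrape on its own schedule; the sketch is only meant to make the port, endpoint, and interval columns concrete.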
See details about the metrics and alerts in the sections below.
Metrics
| Metric and description | Metric details | Indicator of |
|---|---|---|
| iwd_ Current number of established Redis connections | Unit: not specified; Type: gauge | |
| iwd_ Current number of reconnecting Redis connections | Unit: not specified; Type: gauge | |
| iwd_ Current number of ready Redis connections | Unit: not specified; Type: gauge | |
| iwd_ Duration until ready state reached | Unit: not specified; Type: histogram | |
| iwd_ Total number of Redis connection errors | Unit: not specified; Type: counter | |
| iwdTenantDB_ The total number of all database connection requests | Unit: not specified; Type: counter | |
| iwdTenantDB_ The total number of all database disconnection requests | Unit: not specified; Type: counter | |
| iwdTenantDB_ The total number of all database requests sent | Unit: not specified; Type: counter | |
| iwdTenantDB_ The total number of all database requests executed successfully | Unit: not specified; Type: counter | |
| iwdTenantDB_ The total number of all database errors | Unit: not specified; Type: counter | |
| iwdTenantDB_ Database transaction duration | Unit: not specified; Type: histogram | |
| iwd_ Total user CPU time spent in seconds. | Unit: not specified; Type: counter | |
| iwd_ Total system CPU time spent in seconds. | Unit: not specified; Type: counter | |
| iwd_ Total user and system CPU time spent in seconds. | Unit: not specified; Type: counter | |
| iwd_ Start time of the process since Unix epoch in seconds. | Unit: not specified; Type: gauge | |
| iwd_ Resident memory size in bytes. | Unit: not specified; Type: gauge | |
| iwd_ Virtual memory size in bytes. | Unit: not specified; Type: gauge | |
| iwd_ Process heap size in bytes. | Unit: not specified; Type: gauge | |
| iwd_ Number of open file descriptors. | Unit: not specified; Type: gauge | |
| iwd_ Maximum number of open file descriptors. | Unit: not specified; Type: gauge | |
| iwd_ Lag of event loop in seconds. | Unit: not specified; Type: gauge | |
| iwd_ Number of active libuv handles grouped by handle type. Each handle type is a C++ class name. | Unit: not specified; Type: gauge | |
| iwd_ Total number of active handles. | Unit: not specified; Type: gauge | |
| iwd_ Number of active libuv requests grouped by request type. Each request type is a C++ class name. | Unit: not specified; Type: gauge | |
| iwd_ Total number of active requests. | Unit: not specified; Type: gauge | |
| iwd_ Process heap size from Node.js in bytes. | Unit: not specified; Type: gauge | |
| iwd_ Process heap size used from Node.js in bytes. | Unit: not specified; Type: gauge | |
| iwd_ Node.js external memory size in bytes. | Unit: not specified; Type: gauge | |
| iwd_ Process heap space size total from Node.js in bytes. | Unit: not specified; Type: gauge | |
| iwd_ Process heap space size used from Node.js in bytes. | Unit: not specified; Type: gauge | |
| iwd_ Process heap space size available from Node.js in bytes. | Unit: not specified; Type: gauge | |
| iwd_ Node.js version info. | Unit: not specified; Type: gauge | |
| iwd_ The total number of all API requests received | Unit: not specified; Type: counter | |
| iwd_ The total number of all API requests with success response | Unit: not specified; Type: counter | |
| iwd_ The total number of all API requests with error response | Unit: not specified; Type: counter | |
| iwd_ The total number of all API requests with client error response | Unit: not specified; Type: counter | |
| iwd_ The total number of all API requests with server error response | Unit: not specified; Type: counter | |
| iwd_ The total number of all API requests | Unit: not specified; Type: counter | |
| iwd_ Number of API requests that took a long time to execute | Unit: not specified; Type: counter | |
| iwd_ Number of API requests that expired before a response was sent | Unit: not specified; Type: counter | |
| iwd_ API request duration | Unit: not specified; Type: histogram | |
| iwd_ Total number of blacklisted requests | Unit: not specified; Type: counter | |
| iwd_ The current number of client CometD connections to GWS | Unit: not specified; Type: gauge | |
| iwd_ The total number of client CometD errors | Unit: not specified; Type: counter | |
| iwd_ The total number of client CometD error responses from GWS | Unit: not specified; Type: counter | |
| iwd_ The current number of client CometD requests to GWS | Unit: not specified; Type: gauge | |
| iwd_ The CometD request duration (to GWS) | Unit: not specified; Type: histogram | |
| iwd_ The CometD request duration (to GWS): summary | Unit: not specified; Type: summary | |
| iwd_ The total number of client CometD events from GWS | Unit: not specified; Type: counter | |
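Most of the metrics listed above are counters, which are usually read as an increase or rate over a time window, whereas gauges can be read directly. The Python sketch below shows one way to compute a counter's increase across successive scrapes while tolerating a reset to zero, such as after a pod restart. The function names and sample values are hypothetical and are not taken from IWD; evenly spaced scrapes at the recommended 15-second interval are assumed.

```python
def counter_increase(samples):
    """Return the total increase across consecutive counter samples.

    A drop between two samples is treated as a counter reset, so the
    value seen after the reset counts as new increase.
    """
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            total += current  # counter restarted from zero
    return total


def per_second_rate(samples, interval_seconds=15):
    """Average per-second rate, assuming evenly spaced scrapes."""
    if len(samples) < 2:
        return 0.0
    window_seconds = interval_seconds * (len(samples) - 1)
    return counter_increase(samples) / window_seconds
```

A monitoring system normally performs this calculation itself; the sketch only illustrates why raw counter values are not compared against thresholds directly.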
Alerts
The following alerts are defined for IWD.
| Alert | Severity | Description | Based on | Threshold |
|---|---|---|---|---|
| IWD error rate | CRITICAL | Triggered when the number of errors in IWD exceeds the threshold over a 15-minute period. | Default number of errors: 2 | |
| IWD DB errors | CRITICAL | Triggered when IWD experiences more than 2 errors within 1 minute during database operations. | Default number of errors: 2 | |
| Memory usage is above 3000 Mb | CRITICAL | Triggered when the pod memory usage is above 3000 MB. | Default memory usage: 3000 MB | |
| Database connections above 75 | HIGH | Triggered when the number of pod database connections is above 75. | Default number of connections: 75 | |
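The default thresholds in the table can be read as simple strict comparisons. The following Python sketch is an illustration only: the `PodSnapshot` fields and the `firing_alerts` function are hypothetical, and the actual alerts are presumably defined as rules in the monitoring system rather than in application code.

```python
from dataclasses import dataclass


@dataclass
class PodSnapshot:
    """Hypothetical, pre-aggregated view of one IWD pod."""
    errors_last_15_min: int
    db_errors_last_1_min: int
    memory_mb: float
    db_connections: int


# Default thresholds taken from the alerts table.
ERROR_THRESHOLD = 2
DB_ERROR_THRESHOLD = 2
MEMORY_THRESHOLD_MB = 3000
DB_CONNECTION_THRESHOLD = 75


def firing_alerts(snapshot):
    """Return (alert name, severity) pairs whose threshold is exceeded."""
    alerts = []
    if snapshot.errors_last_15_min > ERROR_THRESHOLD:
        alerts.append(("IWD error rate", "CRITICAL"))
    if snapshot.db_errors_last_1_min > DB_ERROR_THRESHOLD:
        alerts.append(("IWD DB errors", "CRITICAL"))
    if snapshot.memory_mb > MEMORY_THRESHOLD_MB:
        alerts.append(("Memory usage is above 3000 Mb", "CRITICAL"))
    if snapshot.db_connections > DB_CONNECTION_THRESHOLD:
        alerts.append(("Database connections above 75", "HIGH"))
    return alerts
```

For example, a pod with 3 errors in the last 15 minutes and 3000.5 MB of memory in use, but only 2 database errors and 75 connections, would fire the first and third alerts, because each comparison is strict.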