IWD metrics and alerts
Find the metrics IWD exposes and the alerts defined for IWD.
Service | CRD or annotations? | Port | Endpoint/Selector | Metrics update interval |
---|---|---|---|---|
IWD | Either or both, depending on the harvester | 4024 by default (can be overridden in the Helm chart values) | /iwd/v3/metrics | 15 seconds recommended; depends on the harvester |
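The table gives only the default port and endpoint path. The following is a minimal sketch, assuming the endpoint returns the standard Prometheus text exposition format. It shows how a script could fetch and parse the endpoint. The host name and the helper names (`metrics_url`, `parse_samples`, `scrape`) are hypothetical, and the parser is deliberately simplified: it skips comment lines, ignores timestamps, and does not unescape label values or handle `}` inside them.

```python
import re
import urllib.request

# Defaults taken from the table above; the port may be overridden at deployment time.
DEFAULT_PORT = 4024
METRICS_PATH = "/iwd/v3/metrics"

# Matches "name{labels} value" or "name value"; a trailing timestamp is ignored.
SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>[^}]*)\})?"
    r"\s+(?P<value>\S+)"
)
# Matches label pairs such as le="0.5" (escapes are kept as-is, not decoded).
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def metrics_url(host: str, port: int = DEFAULT_PORT) -> str:
    """Build the scrape URL for a (hypothetical) IWD host."""
    return f"http://{host}:{port}{METRICS_PATH}"


def parse_samples(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Parse exposition text into (metric name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE metadata
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this simplified parser understands
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        # float() also accepts "NaN", "+Inf" and "-Inf" as used by Prometheus.
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def scrape(host: str, port: int = DEFAULT_PORT, timeout: float = 5.0):
    """Fetch the metrics endpoint over HTTP and return the parsed samples."""
    with urllib.request.urlopen(metrics_url(host, port), timeout=timeout) as resp:
        return parse_samples(resp.read().decode("utf-8"))
```

In practice a Prometheus server or another harvester does this scraping at the recommended 15-second interval; the sketch is only useful for a quick manual check of what a pod exposes.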
See details about Metrics and Alerts below.
Metrics
Metric and description | Metric details | Indicator of |
---|---|---|
iwd_ Current number of established Redis connections | Type: gauge | |
iwd_ Current number of reconnecting Redis connections | Type: gauge | |
iwd_ Current number of ready Redis connections | Type: gauge | |
iwd_ Duration until the ready state is reached | Type: histogram | |
iwd_ Total number of Redis connection errors | Type: counter | |
iwdTenantDB_ Total number of database connection requests | Type: counter | |
iwdTenantDB_ Total number of database disconnection requests | Type: counter | |
iwdTenantDB_ Total number of database requests sent | Type: counter | |
iwdTenantDB_ Total number of database requests executed successfully | Type: counter | |
iwdTenantDB_ Total number of database errors | Type: counter | |
iwdTenantDB_ Database transaction duration | Type: histogram | |
iwd_ Total user CPU time spent, in seconds | Type: counter | |
iwd_ Total system CPU time spent, in seconds | Type: counter | |
iwd_ Total user and system CPU time spent, in seconds | Type: counter | |
iwd_ Start time of the process since the Unix epoch, in seconds | Type: gauge | |
iwd_ Resident memory size, in bytes | Type: gauge | |
iwd_ Virtual memory size, in bytes | Type: gauge | |
iwd_ Process heap size, in bytes | Type: gauge | |
iwd_ Number of open file descriptors | Type: gauge | |
iwd_ Maximum number of open file descriptors | Type: gauge | |
iwd_ Event loop lag, in seconds | Type: gauge | |
iwd_ Number of active libuv handles, grouped by handle type. Each handle type is a C++ class name. | Type: gauge | |
iwd_ Total number of active handles | Type: gauge | |
iwd_ Number of active libuv requests, grouped by request type. Each request type is a C++ class name. | Type: gauge | |
iwd_ Total number of active requests | Type: gauge | |
iwd_ Process heap size reported by Node.js, in bytes | Type: gauge | |
iwd_ Process heap size used, reported by Node.js, in bytes | Type: gauge | |
iwd_ Node.js external memory size, in bytes | Type: gauge | |
iwd_ Total process heap space size, reported by Node.js, in bytes | Type: gauge | |
iwd_ Used process heap space size, reported by Node.js, in bytes | Type: gauge | |
iwd_ Available process heap space size, reported by Node.js, in bytes | Type: gauge | |
iwd_ Node.js version information | Type: gauge | |
iwd_ Total number of API requests received | Type: counter | |
iwd_ Total number of API requests with a success response | Type: counter | |
iwd_ Total number of API requests with an error response | Type: counter | |
iwd_ Total number of API requests with a client error response | Type: counter | |
iwd_ Total number of API requests with a server error response | Type: counter | |
iwd_ Total number of API requests | Type: counter | |
iwd_ Number of API requests that took a long time to execute | Type: counter | |
iwd_ Number of API requests that expired before a response was sent | Type: counter | |
iwd_ API request duration | Type: histogram | |
iwd_ Total number of blacklisted requests | Type: counter | |
iwd_ Current number of client CometD connections to GWS | Type: gauge | |
iwd_ Total number of client CometD errors | Type: counter | |
iwd_ Total number of client CometD error responses from GWS | Type: counter | |
iwd_ Current number of client CometD requests to GWS | Type: gauge | |
iwd_ CometD request duration (to GWS) | Type: histogram | |
iwd_ CometD request duration (to GWS), as a summary | Type: summary | |
iwd_ Total number of client CometD events from GWS | Type: counter | |
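Several metrics above are histograms, for example the API request duration and the database transaction duration. Assuming they follow the Prometheus histogram convention of cumulative `_bucket` series keyed by an `le` upper bound, a quantile such as median or p95 latency can be estimated with the linear interpolation that PromQL's `histogram_quantile()` applies. The sketch below is a simplified reimplementation for illustration only; the function name and the sample bucket values are hypothetical, and observations are assumed to be non-negative, as durations are.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with a +Inf bucket, mirroring
    Prometheus `_bucket{le="..."}` series. The first bucket is assumed to start at 0.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with a +Inf bucket")
    total = buckets[-1][1]  # the +Inf bucket holds the total observation count
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the open-ended bucket: use the last finite bound.
                return prev_bound
            if count == prev_count:
                return bound  # empty bucket, nothing to interpolate
            # Assume observations are spread linearly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound
```

For example, `histogram_quantile(0.5, [(0.1, 10), (0.5, 30), (1.0, 40), (math.inf, 40)])` returns about 0.3. The estimate is only as precise as the bucket boundaries, so a quantile that lands in a wide bucket should be read as approximate.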
Alerts
The following alerts are defined for IWD.
Alert | Severity | Description | Based on | Threshold |
---|---|---|---|---|
IWD error rate | CRITICAL | Triggered when the number of errors in IWD exceeds the threshold within a 15-minute period. | | Default number of errors: 2 |
IWD DB errors | CRITICAL | Triggered when IWD encounters more than 2 errors within 1 minute during database operations. | | Default number of errors: 2 |
Memory usage is above 3000 Mb | CRITICAL | Triggered when the pod's memory usage is above 3000 MB. | | Default memory usage: 3000 MB |
Database connections above 75 | HIGH | Triggered when the number of database connections for the pod is above 75. | | Default number of connections: 75 |
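This page does not publish the expressions behind these alerts or the exact metrics they are based on. As a rough sketch, assuming the error alerts count counter increases over the stated windows and the memory and connection alerts compare a current value against the threshold, the default thresholds could be evaluated as follows. `PodSnapshot`, `counter_increase`, and `firing_alerts` are hypothetical names, and treating MB as 1024 * 1024 bytes is an assumption.

```python
from dataclasses import dataclass

MB = 1024 * 1024  # assumption: "MB" in the alert is interpreted as MiB


def counter_increase(samples: list[float]) -> float:
    """Increase of a counter across consecutive samples, tolerating resets.

    A drop in value is treated as a counter reset (for example after a pod
    restart), so the post-reset value counts as new increase. This resembles
    Prometheus increase() but without its extrapolation to the window edges.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


@dataclass
class PodSnapshot:
    # Hypothetical inputs; the source does not name the underlying metrics.
    error_samples_15m: list[float]    # error counter samples over the last 15 minutes
    db_error_samples_1m: list[float]  # DB error counter samples over the last minute
    memory_bytes: float               # current pod memory usage
    db_connections: float             # current number of database connections


def firing_alerts(s: PodSnapshot) -> list[str]:
    """Return the names of alerts whose default thresholds are exceeded."""
    alerts = []
    if counter_increase(s.error_samples_15m) > 2:
        alerts.append("IWD error rate")
    if counter_increase(s.db_error_samples_1m) > 2:
        alerts.append("IWD DB errors")
    if s.memory_bytes > 3000 * MB:
        alerts.append("Memory usage is above 3000 Mb")
    if s.db_connections > 75:
        alerts.append("Database connections above 75")
    return alerts
```

Handling counter resets matters because a pod restart typically zeroes in-process counters; without it, a restart inside the window would produce a negative increase and could hide real errors.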