IWD metrics and alerts
Latest revision as of 14:59, March 30, 2022
Find the metrics IWD exposes and the alerts defined for IWD.
Service | CRD or annotations? | Port | Endpoint/Selector | Metrics update interval |
---|---|---|---|---|
IWD | Both or either, depends on harvester | Default is 4024 (overridden by values) | /iwd/v3/metrics | 15 sec recommended, depends on harvester |
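The port and endpoint in the table above are enough to read the metrics by hand. The following is a minimal sketch, assuming the endpoint is served over plain HTTP and returns the Prometheus text exposition format; neither is confirmed by this page. The host name, the `metrics_url`, `parse_samples`, and `scrape` helpers, and the `http://` scheme are illustrative assumptions.

```python
from urllib.request import urlopen

DEFAULT_PORT = 4024               # default from the table; may be overridden by values
METRICS_PATH = "/iwd/v3/metrics"  # endpoint from the table


def metrics_url(host: str, port: int = DEFAULT_PORT) -> str:
    """Build the scrape URL for an IWD pod (host is a placeholder, http is assumed)."""
    return f"http://{host}:{port}{METRICS_PATH}"


def parse_samples(text: str) -> dict[str, float]:
    """Parse simple text-format sample lines into {series: value}.

    Comment lines (# HELP, # TYPE) and blank lines are skipped. Labels stay
    part of the series key. Optional timestamps are not handled.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated token on the line.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def scrape(host: str, port: int = DEFAULT_PORT, timeout: float = 5.0) -> dict[str, float]:
    """Fetch the endpoint once and return the parsed samples."""
    with urlopen(metrics_url(host, port), timeout=timeout) as resp:
        return parse_samples(resp.read().decode("utf-8"))
```

Calling `scrape` every 15 seconds would match the recommended update interval, although in practice the harvester does the polling.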
See details about: Metrics, Alerts
Metrics
Metric and description | Metric details | Indicator of |
---|---|---|
iwd_ Current number of established Redis connections | Type: gauge | |
iwd_ Current number of reconnecting Redis connections | Type: gauge | |
iwd_ Current number of ready Redis connections | Type: gauge | |
iwd_ Duration until ready state reached | Type: histogram | |
iwd_ Total number of Redis connection errors | Type: counter | |
iwdTenantDB_ The total number of all database connection requests | Type: counter | |
iwdTenantDB_ The total number of all database disconnection requests | Type: counter | |
iwdTenantDB_ The total number of all database requests sent | Type: counter | |
iwdTenantDB_ The total number of all database requests executed successfully | Type: counter | |
iwdTenantDB_ The total number of all database errors | Type: counter | |
iwdTenantDB_ Database transaction duration | Type: histogram | |
iwd_ Total user CPU time spent in seconds. | Type: counter | |
iwd_ Total system CPU time spent in seconds. | Type: counter | |
iwd_ Total user and system CPU time spent in seconds. | Type: counter | |
iwd_ Start time of the process since Unix epoch in seconds. | Type: gauge | |
iwd_ Resident memory size in bytes. | Type: gauge | |
iwd_ Virtual memory size in bytes. | Type: gauge | |
iwd_ Process heap size in bytes. | Type: gauge | |
iwd_ Number of open file descriptors. | Type: gauge | |
iwd_ Maximum number of open file descriptors. | Type: gauge | |
iwd_ Lag of event loop in seconds. | Type: gauge | |
iwd_ Number of active libuv handles grouped by handle type. Each handle type is a C++ class name. | Type: gauge | |
iwd_ Total number of active handles. | Type: gauge | |
iwd_ Number of active libuv requests grouped by request type. Each request type is a C++ class name. | Type: gauge | |
iwd_ Total number of active requests. | Type: gauge | |
iwd_ Process heap size from Node.js in bytes. | Type: gauge | |
iwd_ Process heap size used from Node.js in bytes. | Type: gauge | |
iwd_ Node.js external memory size in bytes. | Type: gauge | |
iwd_ Process heap space size total from Node.js in bytes. | Type: gauge | |
iwd_ Process heap space size used from Node.js in bytes. | Type: gauge | |
iwd_ Process heap space size available from Node.js in bytes. | Type: gauge | |
iwd_ Node.js version info. | Type: gauge | |
iwd_ The total number of all API requests received | Type: counter | |
iwd_ The total number of all API requests with success response | Type: counter | |
iwd_ The total number of all API requests with error response | Type: counter | |
iwd_ The total number of all API requests with client error response | Type: counter | |
iwd_ The total number of all API requests with server error response | Type: counter | |
iwd_ The total number of all API requests | Type: counter | |
iwd_ Number of API requests that took a long time to execute | Type: counter | |
iwd_ Number of API requests that expired before a response was sent | Type: counter | |
iwd_ API requests duration | Type: histogram | |
iwd_ Total number of blacklisted requests | Type: counter | |
iwd_ The current number of client cometd connections to GWS | Type: gauge | |
iwd_ The total number of client cometd errors | Type: counter | |
iwd_ The total number of client cometd error responses from GWS | Type: counter | |
iwd_ The current number of client cometd requests to GWS | Type: gauge | |
iwd_ The cometd request duration (to GWS) | Type: histogram | |
iwd_ The cometd request duration (to GWS): summary | Type: summary | |
iwd_ The total number of client cometd events from GWS | Type: counter | |
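Most of the metrics above are counters, which only ever grow (and restart from zero when the pod restarts), so they are usually read as an increase over a window rather than as a raw value. The sketch below shows one way to turn successive counter samples into an increase and an error ratio. It is illustrative only: the `counter_increase` and `error_ratio` helpers are hypothetical, and the input lists stand in for scraped values of whichever request and error counters you choose from the table.

```python
def counter_increase(samples: list[float]) -> float:
    """Total increase across consecutive counter samples, tolerating resets."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # A drop means the process restarted and the counter began again
            # from zero, so the whole current value counts as new increase.
            total += cur
    return total


def error_ratio(requests: list[float], errors: list[float]) -> float:
    """Share of requests that ended in an error over the sampled window."""
    req = counter_increase(requests)
    return counter_increase(errors) / req if req else 0.0
```

Gauges, by contrast, can be read directly: the latest sample of a connection-count or memory gauge is already the current value.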
Alerts
The following alerts are defined for IWD.
Alert | Severity | Description | Based on | Threshold |
---|---|---|---|---|
IWD error rate | CRITICAL | Triggered when the number of errors in IWD exceeds the threshold for a 15-minute period. | | Default number of errors: 2 |
IWD DB errors | CRITICAL | Triggered when IWD experiences more than 2 errors within 1 minute during database operations. | | Default number of errors: 2 |
Memory usage is above 3000 Mb | CRITICAL | Triggered when the pod memory usage is above 3000 MB. | | Default memory usage: 3000 MB |
Database connections above 75 | HIGH | Triggered when the number of pod database connections is above 75. | | Default number of connections: 75 |
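Each alert in the table fires when an observed value is strictly above its default threshold. The following sketch encodes those defaults and checks a set of observations against them. It is not the actual alert rule definition shipped with IWD: the `AlertRule` class, the dictionary keys, and the `firing_alerts` helper are hypothetical names, and the observed values are assumed to have been aggregated over the right window (15 minutes, 1 minute) beforehand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    name: str
    severity: str
    threshold: float

    def fires(self, observed: float) -> bool:
        # Every alert in the table triggers when the value is strictly above its threshold.
        return observed > self.threshold


# Default thresholds from the alerts table; the keys are illustrative only.
RULES = {
    "errors_15m": AlertRule("IWD error rate", "CRITICAL", 2),
    "db_errors_1m": AlertRule("IWD DB errors", "CRITICAL", 2),
    "memory_mb": AlertRule("Memory usage is above 3000 Mb", "CRITICAL", 3000),
    "db_connections": AlertRule("Database connections above 75", "HIGH", 75),
}


def firing_alerts(observations: dict[str, float]) -> list[AlertRule]:
    """Return the rules whose observed value exceeds the default threshold."""
    return [
        RULES[key]
        for key, value in observations.items()
        if key in RULES and RULES[key].fires(value)
    ]
```

For example, `firing_alerts({"memory_mb": 3200, "db_connections": 75})` returns only the memory rule, because 75 connections is at the threshold, not above it.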