Cargo query
Showing below up to 100 results in range #201 to #300.
Page | Alert | Severity | AlertDescription | BasedOn | Threshold |
---|---|---|---|---|---|
Draft:PEC-OU/Current/CXCPEGuide/CPLMMetrics | CXC-PodRestartsCount | HIGH | Triggered when the restart count for a pod is beyond the threshold. | | 1 for 5m |
Draft:PEC-OU/Current/CXCPEGuide/CPLMMetrics | CXC-PodRestartsCountPD | HIGH | Triggered when the restart count is beyond the critical threshold. | | 5 for 5m |
Draft:PEC-OU/Current/CXCPEGuide/CPLMMetrics | CXC-PodsNotReadyPD | HIGH | Triggered when there are no pods ready for a CX Contact deployment. | | 0 for 1m |
Draft:PEC-OU/Current/CXCPEGuide/DMMetrics | CXC-CPUUsage | HIGH | Triggered when the CPU utilization of a pod is beyond the threshold. | | 300% for 5m |
Draft:PEC-OU/Current/CXCPEGuide/DMMetrics | CXC-DM-LatencyHigh | HIGH | Triggered when the latency for dial manager is above the defined threshold. | | 5000ms for 5m |
Draft:PEC-OU/Current/CXCPEGuide/DMMetrics | CXC-MemoryUsage | HIGH | Triggered when the memory utilization of a pod is beyond the threshold. | | 70% for 5m |
Draft:PEC-OU/Current/CXCPEGuide/DMMetrics | CXC-MemoryUsagePD | HIGH | Triggered when the memory usage of a pod is above the critical threshold. | | 90% for 5m |
Draft:PEC-OU/Current/CXCPEGuide/DMMetrics | CXC-PodNotReadyCount | HIGH | Triggered when the number of pods ready for a CX Contact deployment is less than or equal to the threshold. | | 1 for 5m |
Draft:PEC-OU/Current/CXCPEGuide/DMMetrics | CXC-PodRestartsCount | HIGH | Triggered when the restart count for a pod is beyond the threshold. | | 1 for 5m |
Draft:PEC-OU/Current/CXCPEGuide/DMMetrics | CXC-PodRestartsCountPD | HIGH | Triggered when the restart count is beyond the critical threshold. | | 5 for 5m |
Draft:PEC-OU/Current/CXCPEGuide/DMMetrics | CXC-PodsNotReadyPD | HIGH | Triggered when there are no pods ready for a CX Contact deployment. | | 0 for 1m |
Draft:PEC-OU/Current/CXCPEGuide/JSMetrics | CXC-CPUUsage | HIGH | Triggered when the CPU utilization of a pod is beyond the threshold. | | 300% for 5m |
Draft:PEC-OU/Current/CXCPEGuide/JSMetrics | CXC-JS-LatencyHigh | HIGH | Triggered when the latency for job scheduler is above the defined threshold. | | 5000ms for 5m |
Draft:PEC-OU/Current/CXCPEGuide/JSMetrics | CXC-MemoryUsage | HIGH | Triggered when the memory utilization of a pod is beyond the threshold. | | 70% for 5m |
Draft:PEC-OU/Current/CXCPEGuide/JSMetrics | CXC-MemoryUsagePD | HIGH | Triggered when the memory usage of a pod is above the critical threshold. | | 90% for 5m |
Draft:PEC-OU/Current/CXCPEGuide/JSMetrics | CXC-PodNotReadyCount | HIGH | Triggered when the number of pods ready for a CX Contact deployment is less than or equal to the threshold. | | 1 for 5m |
Draft:PEC-OU/Current/CXCPEGuide/JSMetrics | CXC-PodRestartsCount | HIGH | Triggered when the restart count for a pod is beyond the threshold. | | 1 for 5m |
Draft:PEC-OU/Current/CXCPEGuide/JSMetrics | CXC-PodRestartsCountPD | HIGH | Triggered when the restart count is beyond the critical threshold. | | 5 for 5m |
Draft:PEC-OU/Current/CXCPEGuide/JSMetrics | CXC-PodsNotReadyPD | HIGH | Triggered when there are no pods ready for a CX Contact deployment. | | 0 for 1m |
Draft:PEC-OU/Current/CXCPEGuide/LBMetrics | CXC-CPUUsage | HIGH | Triggered when the CPU utilization of a pod is beyond the threshold. | | 300% for 5m |
Draft:PEC-OU/Current/CXCPEGuide/LBMetrics | CXC-LB-LatencyHigh | HIGH | Triggered when the latency for list builder is above the defined threshold. | | 5000ms for 5m |
Draft:PEC-OU/Current/CXCPEGuide/LBMetrics | CXC-MemoryUsage | HIGH | Triggered when the memory utilization of a pod is beyond the threshold. | | 70% for 5m |
Draft:PEC-OU/Current/CXCPEGuide/LBMetrics | CXC-MemoryUsagePD | HIGH | Triggered when the memory usage of a pod is above the critical threshold. | | 90% for 5m |
Draft:PEC-OU/Current/CXCPEGuide/LBMetrics | CXC-PodNotReadyCount | HIGH | Triggered when the number of pods ready for a CX Contact deployment is less than or equal to the threshold. | | 1 for 5m |
Draft:PEC-OU/Current/CXCPEGuide/LBMetrics | CXC-PodRestartsCount | HIGH | Triggered when the restart count for a pod is beyond the threshold. | | 1 for 5m |
Draft:PEC-OU/Current/CXCPEGuide/LBMetrics | CXC-PodRestartsCountPD | HIGH | Triggered when the restart count is beyond the critical threshold. | | 5 for 5m |
Draft:PEC-OU/Current/CXCPEGuide/LBMetrics | CXC-PodsNotReadyPD | HIGH | Triggered when there are no pods ready for a CX Contact deployment. | | 0 for 1m |
Draft:PEC-OU/Current/CXCPEGuide/LMMetrics | CXC-CPUUsage | HIGH | Triggered when the CPU utilization of a pod is beyond the threshold. | | 300% for 5m |
Draft:PEC-OU/Current/CXCPEGuide/LMMetrics | CXC-LM-LatencyHigh | HIGH | Triggered when the latency for list manager is above the defined threshold. | | 5000ms for 5m |
Draft:PEC-OU/Current/CXCPEGuide/LMMetrics | CXC-MemoryUsage | HIGH | Triggered when the memory utilization of a pod is beyond the threshold. | | 70% for 5m |
Draft:PEC-OU/Current/CXCPEGuide/LMMetrics | CXC-MemoryUsagePD | HIGH | Triggered when the memory usage of a pod is above the critical threshold. | | 90% for 5m |
Draft:PEC-OU/Current/CXCPEGuide/LMMetrics | CXC-PodNotReadyCount | HIGH | Triggered when the number of pods ready for a CX Contact deployment is less than or equal to the threshold. | | 1 for 5m |
Draft:PEC-OU/Current/CXCPEGuide/LMMetrics | CXC-PodRestartsCount | HIGH | Triggered when the restart count for a pod is beyond the threshold. | | 1 for 5m |
Draft:PEC-OU/Current/CXCPEGuide/LMMetrics | CXC-PodRestartsCountPD | HIGH | Triggered when the restart count is beyond the critical threshold. | | 5 for 5m |
Draft:PEC-OU/Current/CXCPEGuide/LMMetrics | CXC-PodsNotReadyPD | HIGH | Triggered when there are no pods ready for a CX Contact deployment. | | 0 for 1m |
Draft:PEC-OU/Current/CXCPEGuide/LMMetrics | cxc_list_manager_too_many_errors_from_auth | HIGH | Triggered when there are too many error responses from the auth service (list manager) for longer than the specified time threshold. | | 1m |
Draft:PEC-REP/Current/GCXIPEGuide/GCXIMetrics | gcxi__cluster__info | | This alert indicates problems with the cluster state. Applicable only if you have two or more nodes in a cluster. | gcxi__cluster__info | |
Draft:PEC-REP/Current/GCXIPEGuide/GCXIMetrics | gcxi__projects__status | | If the value of gcxi__projects__status is greater than 0, this alarm is set, indicating that reporting is not functioning properly. | gcxi__projects__status | > 0 |
Draft:PEC-REP/Current/GCXIPEGuide/RAAMetrics | raa-errors | '''Specified by''': raa. '''Recommended value''': warning | A nonzero value indicates that errors have been logged during the scrape interval. | gcxi_raa_error_count | >0 |
Draft:PEC-REP/Current/GCXIPEGuide/RAAMetrics | raa-health | '''Specified by''': raa. '''Recommended value''': severe | A zero value for a recent period (several scrape intervals) indicates that RAA is not operating. | gcxi_raa_health_level | '''Specified by''': raa. '''Recommended value''': 30m |
Draft:PEC-REP/Current/GCXIPEGuide/RAAMetrics | raa-long-aggregation | '''Specified by''': raa. '''Recommended value''': warning | Indicates that the average duration of aggregation queries specified by the hierarchy, level, and mediaType labels is greater than the deadlock-threshold. | gcxi_raa_aggregated_duration_ms / gcxi_raa_aggregated_count | Greater than the value (seconds) of raa.prometheusRule.alerts.longAggregation.thresholdSec in values.yaml. '''Recommended value''': 300 |
Draft:PEC-REP/Current/GIMPEGuide/GCAMetrics | GcaOOMKilled | Critical | Triggered when a GCA pod is restarted because of OOMKilled. | kube_pod_container_status_restarts_total and kube_pod_container_status_last_terminated_reason | 1 |
Draft:PEC-REP/Current/GIMPEGuide/GCAMetrics | GcaPodCrashLooping | Critical | Triggered when a GCA pod is crash looping. | kube_pod_container_status_restarts_total | The restart rate is greater than 0 for 5 minutes |
Draft:PEC-REP/Current/GIMPEGuide/GSPMetrics | GspFlinkJobDown | Critical | Triggered when the GSP Flink job is not running (the number of running jobs equals 0 or the metric is not available). | flink_jobmanager_numRunningJobs | For 5 minutes |
Draft:PEC-REP/Current/GIMPEGuide/GSPMetrics | GspNoTmRegistered | Critical | Triggered when there are no registered TaskManagers (or the metric is not available). | flink_jobmanager_numRegisteredTaskManagers | For 5 minutes |
Draft:PEC-REP/Current/GIMPEGuide/GSPMetrics | GspOOMKilled | Critical | Triggered when a GSP pod is restarted because of OOMKilled. | kube_pod_container_status_restarts_total | 0 |
Draft:PEC-REP/Current/GIMPEGuide/GSPMetrics | GspUnknownPerson | High | Triggered when GSP encounters unknown person(s). | flink_ | For 5 minutes |
Draft:PEC-REP/Current/PulsePEGuide/dcuMetrics | pulse_dcu_critical_col_connected_configservers | Critical | Pulse DCU Collector is not connected to ConfigServer. | pulse_collector_connection_status | for 15 minutes |
Draft:PEC-REP/Current/PulsePEGuide/dcuMetrics | pulse_dcu_critical_col_connected_dbservers | Critical | Pulse DCU Collector is not connected to DbServer. | pulse_collector_connection_status | for 15 minutes |
Draft:PEC-REP/Current/PulsePEGuide/dcuMetrics | pulse_dcu_critical_col_connected_statservers | Critical | Pulse DCU Collector is not connected to Stat Server. | pulse_collector_connection_status | for 15 minutes |
Draft:PEC-REP/Current/PulsePEGuide/dcuMetrics | pulse_dcu_critical_col_snapshot_writing | Critical | Pulse DCU Collector does not write snapshots. | pulse_collector_snapshot_writing_status | for 15 minutes |
Draft:PEC-REP/Current/PulsePEGuide/dcuMetrics | pulse_dcu_critical_cpu | Critical | Detected critical CPU usage by Pulse DCU Pod. | container_cpu_usage_seconds_total, kube_pod_container_resource_limits | 90% |
Draft:PEC-REP/Current/PulsePEGuide/dcuMetrics | pulse_dcu_critical_disk | Critical | Detected critical disk usage by Pulse DCU Pod. | kubelet_volume_stats_available_bytes, kubelet_volume_stats_capacity_bytes | 90% |
Draft:PEC-REP/Current/PulsePEGuide/dcuMetrics | pulse_dcu_critical_memory | Critical | Detected critical memory usage by Pulse DCU Pod. | container_memory_working_set_bytes, kube_pod_container_resource_limits | 90% |
Draft:PEC-REP/Current/PulsePEGuide/dcuMetrics | pulse_dcu_critical_nonrunning_instances | Critical | Triggered when Pulse DCU instances are down. | kube_statefulset_status_replicas_ready, kube_statefulset_status_replicas | for 15 minutes |
Draft:PEC-REP/Current/PulsePEGuide/dcuMetrics | pulse_dcu_critical_ss_connected_configservers | Critical | Pulse DCU Stat Server is not connected to ConfigServer. | pulse_statserver_server_connected_seconds | for 15 minutes |
Draft:PEC-REP/Current/PulsePEGuide/dcuMetrics | pulse_dcu_critical_ss_connected_ixnservers | Critical | Pulse DCU Stat Server is not connected to IxnServers. | pulse_statserver_server_connected_seconds | 2 |
Draft:PEC-REP/Current/PulsePEGuide/dcuMetrics | pulse_dcu_critical_ss_connected_tservers | Critical | Pulse DCU Stat Server is not connected to T-Servers. | pulse_statserver_server_connected_number | 2 |
Draft:PEC-REP/Current/PulsePEGuide/dcuMetrics | pulse_dcu_critical_ss_failed_dn_registrations | Critical | Detected critical DN registration failures on Pulse DCU Stat Server. | pulse_statserver_dn_failed, pulse_statserver_dn_registered | 0.5% |
Draft:PEC-REP/Current/PulsePEGuide/dcuMetrics | pulse_dcu_monitor_data_unavailable | Critical | Pulse DCU Monitor Agents do not provide data. | pulse_monitor_check_duration_seconds, kube_statefulset_replicas | for 15 minutes |
Draft:PEC-REP/Current/PulsePEGuide/dcuMetrics | pulse_dcu_too_frequent_restarts | Critical | Detected too frequent restarts of DCU Pod container. | kube_pod_container_status_restarts_total | 2 for 1 hour |
Draft:PEC-REP/Current/PulsePEGuide/ldsMetrics | pulse_lds_critical_cpu | Critical | Detected critical CPU usage by Pulse LDS Pod. | container_cpu_usage_seconds_total, kube_pod_container_resource_limits | 90% |
Draft:PEC-REP/Current/PulsePEGuide/ldsMetrics | pulse_lds_critical_memory | Critical | Detected critical memory usage by Pulse LDS Pod. | container_memory_working_set_bytes, kube_pod_container_resource_limits | 90% |
Draft:PEC-REP/Current/PulsePEGuide/ldsMetrics | pulse_lds_critical_nonrunning_instances | Critical | Triggered when Pulse LDS instances are down. | kube_statefulset_status_replicas_ready, kube_statefulset_status_replicas | for 15 minutes |
Draft:PEC-REP/Current/PulsePEGuide/ldsMetrics | pulse_lds_monitor_data_unavailable | Critical | Pulse LDS Monitor Agents do not provide data. | pulse_monitor_check_duration_seconds, kube_statefulset_replicas | for 15 minutes |
Draft:PEC-REP/Current/PulsePEGuide/ldsMetrics | pulse_lds_no_connected_senders | Critical | Pulse LDS is not connected to upstream servers. | pulse_lds_senders_number | for 15 minutes |
Draft:PEC-REP/Current/PulsePEGuide/ldsMetrics | pulse_lds_no_registered_dns | Critical | No DNs are registered on Pulse LDS. | pulse_lds_sender_registered_dns_number | for 30 minutes |
Draft:PEC-REP/Current/PulsePEGuide/ldsMetrics | pulse_lds_too_frequent_restarts | Critical | Detected too frequent restarts of LDS Pod container. | kube_pod_container_status_restarts_total | 2 for 1 hour |
Draft:PEC-REP/Current/PulsePEGuide/PulseMetrics | pulse_critical_5xx | Critical | Detected a critical rate of 5xx errors per second for the Pulse container. | http_server_requests_seconds_count | 15% |
Draft:PEC-REP/Current/PulsePEGuide/PulseMetrics | pulse_critical_cpu | Critical | Detected critical CPU usage by Pulse Pod. | container_cpu_usage_seconds_total, kube_pod_container_resource_limits | 90% |
Draft:PEC-REP/Current/PulsePEGuide/PulseMetrics | pulse_critical_hikari_cp | Critical | Detected critical Hikari connections pool usage by Pulse container. | hikaricp_connections_active, hikaricp_connections | 90% |
Draft:PEC-REP/Current/PulsePEGuide/PulseMetrics | pulse_critical_memory | Critical | Detected critical memory usage by Pulse Pod. | container_memory_working_set_bytes, kube_pod_container_resource_limits | 90% |
Draft:PEC-REP/Current/PulsePEGuide/PulseMetrics | pulse_critical_pulse_health | Critical | Detected a critically low number of healthy Pulse containers. | pulse_health_all_Boolean | 50% |
Draft:PEC-REP/Current/PulsePEGuide/PulseMetrics | pulse_critical_running_instances | Critical | Triggered when Pulse instances are down. | kube_deployment_status_replicas_available, kube_deployment_status_replicas | 75% |
Draft:PEC-REP/Current/PulsePEGuide/PulseMetrics | pulse_service_down | Critical | All Pulse instances are down. | up | for 15 minutes |
Draft:PEC-REP/Current/PulsePEGuide/PulseMetrics | pulse_too_frequent_restarts | Critical | Detected too frequent restarts of Pulse Pod container. | kube_pod_container_status_restarts_total | 2 for 1 hour |
Draft:PEC-REP/Current/PulsePEGuide/PulsePermissionsMetrics | pulse_permissions_critical_cpu | Critical | Detected critical CPU usage by Pulse Permissions Pod. | container_cpu_usage_seconds_total, kube_pod_container_resource_limits | 90% |
Draft:PEC-REP/Current/PulsePEGuide/PulsePermissionsMetrics | pulse_permissions_critical_memory | Critical | Detected critical memory usage by Pulse Permissions Pod. | container_memory_working_set_bytes, kube_pod_container_resource_limits | 90% |
Draft:PEC-REP/Current/PulsePEGuide/PulsePermissionsMetrics | pulse_permissions_critical_running_instances | Critical | Triggered when Pulse Permissions instances are down. | kube_deployment_status_replicas_available, kube_deployment_status_replicas | 75% |
Draft:PEC-REP/Current/PulsePEGuide/PulsePermissionsMetrics | pulse_permissions_too_frequent_restarts | Critical | Detected too frequent restarts of Permissions Pod container. | kube_pod_container_status_restarts_total | 2 for 1 hour |
Draft:STRMS/Current/STRMSPEGuide/ServiceMetrics | streams_GWS_AUTH_DOWN | critical | Unable to connect to GWS auth service | gws_auth_down | 10 seconds |
Draft:STRMS/Current/STRMSPEGuide/STRMSMetrics | streams_BATCH_LAG_TIME | warning | Message handling exceeds 2 seconds | | 30 seconds |
Draft:STRMS/Current/STRMSPEGuide/STRMSMetrics | streams_DOWN | critical | The number of running instances is 0 | sum(up) < 1 | 10 seconds |
Draft:STRMS/Current/STRMSPEGuide/STRMSMetrics | streams_ENDPOINT_CONNECTION_DOWN | warning | Unable to connect to a customer endpoint | endpoint_connection_down | 30 seconds |
Draft:STRMS/Current/STRMSPEGuide/STRMSMetrics | streams_ENGAGE_KAFKA_CONNECTION_DOWN | critical | Unable to connect to Engage Kafka | engage_kafka_main_connection_down | 10 seconds |
Draft:STRMS/Current/STRMSPEGuide/STRMSMetrics | streams_GWS_AUTH_DOWN | critical | Unable to connect to GWS auth service | gws_auth_down | 30 seconds |
Draft:STRMS/Current/STRMSPEGuide/STRMSMetrics | streams_GWS_CONFIG_DOWN | critical | Unable to connect to GWS config service | gws_config_down | |
Draft:STRMS/Current/STRMSPEGuide/STRMSMetrics | streams_GWS_ENV_DOWN | critical | Unable to connect to GWS environment service | gws_env_down | 30 seconds |
Draft:STRMS/Current/STRMSPEGuide/STRMSMetrics | streams_INIT_ERROR | critical | Aborted due to an initialization error, e.g., KAFKA_FQDN is not defined | application_streams_init_error > 0 | 10 seconds |
Draft:STRMS/Current/STRMSPEGuide/STRMSMetrics | streams_REDIS_DOWN | critical | Unable to connect to Redis | redis_connection_down | 10 seconds |
Draft:TLM/Current/TLMPEGuide/TLMMetrics | Http Errors Occurrences Exceeded Threshold | High | Triggered when the number of HTTP errors exceeds 500 responses in 5 minutes | telemetry_events{eventName=~"http_error_.*", eventName!="http_error_404"} | >500 in 5 minutes |
Draft:TLM/Current/TLMPEGuide/TLMMetrics | Telemetry CPU Utilization is Greater Than Threshold | High | Triggered when average CPU usage is more than 60% | node_cpu_seconds_total | >60% |
Draft:TLM/Current/TLMPEGuide/TLMMetrics | Telemetry Dependency Status | Low | Triggered when there is no connection to one of the dependent services (GAuth, Config, or Prometheus) | telemetry_dependency_status | <80 |
Draft:TLM/Current/TLMPEGuide/TLMMetrics | Telemetry GAuth Time Alert | High | Triggered when there is no connection to the GAuth service | telemetry_gws_auth_req_time | >10000 |
Draft:TLM/Current/TLMPEGuide/TLMMetrics | Telemetry Healthy Pod Count Alert | High | Triggered when the number of healthy pods drops to a critical level | kube_pod_container_status_ready | <2 |
Draft:TLM/Current/TLMPEGuide/TLMMetrics | Telemetry High Network Traffic | High | Triggered when network traffic exceeds 10MB/second for 5 minutes | node_network_transmit_bytes_total, node_network_receive_bytes_total | >10MBps |
Draft:TLM/Current/TLMPEGuide/TLMMetrics | Telemetry Memory Usage is Greater Than Threshold | High | Triggered when average memory usage is more than 60% | container_cpu_usage_seconds_total, kube_pod_container_resource_limits_cpu_cores | >60% |
Draft:UCS/Current/UCSPEGuide/UCSMetrics | ucsx_elasticsearch_health_status | critical | Triggered when there is no connection to Elasticsearch | ucsx_elasticsearch_health_status | 2 minutes |
Draft:UCS/Current/UCSPEGuide/UCSMetrics | ucsx_elasticsearch_slow_processing_time | critical | Triggered when Elasticsearch internal processing time > 500 ms | ucsx_elastic_search_sum, ucsx_elastic_search_count | 5 minutes |
Draft:UCS/Current/UCSPEGuide/UCSMetrics | ucsx_instance_high_cpu_utilization | warning | Triggered when average CPU usage is more than 80% | ucsx_performance | 5 minutes |
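Many rows above (for example, CXC-PodRestartsCount, GcaPodCrashLooping, and the pulse_*_too_frequent_restarts alerts) pair kube_pod_container_status_restarts_total with a value and a sustained-for duration. As a minimal sketch only, such a "1 for 5m" threshold might be expressed as a Prometheus Operator PrometheusRule along the following lines; the rule name, namespace, and label values are illustrative assumptions, not the shipped configuration.
<syntaxhighlight lang="yaml">
# Sketch of a restart-count alert. Only the metric name, severity, and the
# "1 for 5m" shape come from the table; everything else is illustrative.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-pod-restarts      # hypothetical name
  namespace: monitoring           # hypothetical namespace
spec:
  groups:
    - name: example.rules
      rules:
        - alert: CXC-PodRestartsCount
          # Fires when a container restarted more than once over the last
          # 5 minutes and the condition holds for 5 minutes; this is one
          # plausible reading of "1 for 5m", not the verified rule body.
          expr: increase(kube_pod_container_status_restarts_total[5m]) > 1
          for: 5m
          labels:
            severity: high
          annotations:
            summary: Pod restart count is beyond the threshold.
</syntaxhighlight>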
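The Pulse CPU and memory rows compare a usage metric against the matching kube_pod_container_resource_limits value at 90%. A sketch of that ratio for memory, assuming kube-state-metrics v2 or later (which exposes limits through a resource label; older releases used per-resource metrics such as kube_pod_container_resource_limits_memory_bytes):
<syntaxhighlight lang="yaml">
# Sketch of the 90%-of-limit pattern behind pulse_dcu_critical_memory.
# Label matching and aggregation are assumptions; adjust to your
# kube-state-metrics version.
- alert: pulse_dcu_critical_memory
  expr: |
    sum by (namespace, pod) (container_memory_working_set_bytes{container!=""})
      /
    sum by (namespace, pod) (kube_pod_container_resource_limits{resource="memory"})
      > 0.90
  for: 5m          # assumption: the table does not state a duration for this row
  labels:
    severity: critical
</syntaxhighlight>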
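The raa-long-aggregation row divides gcxi_raa_aggregated_duration_ms by gcxi_raa_aggregated_count to obtain a mean aggregation-query duration in milliseconds, then compares it with a threshold configured in seconds (raa.prometheusRule.alerts.longAggregation.thresholdSec in values.yaml). A sketch of that arithmetic, assuming both metrics are counters; the rate() window and the omission of the hierarchy, level, and mediaType grouping labels are assumptions:
<syntaxhighlight lang="yaml">
# Sketch: mean duration in ms = total ms / count; divide by 1000 to compare
# against the threshold in seconds (recommended value 300, per the table).
- alert: raa-long-aggregation
  expr: |
    ( rate(gcxi_raa_aggregated_duration_ms[10m])
        / rate(gcxi_raa_aggregated_count[10m]) ) / 1000
      > 300
  labels:
    severity: warning
</syntaxhighlight>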
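Finally, the streams_DOWN row states its expression directly (sum(up) < 1). Wrapped in a rule it might look like the following, where the job selector is a placeholder and the 10-second duration comes from the Threshold column:
<syntaxhighlight lang="yaml">
# Sketch only; the job label value is a hypothetical selector.
- alert: streams_DOWN
  expr: sum(up{job="streams"}) < 1
  for: 10s
  labels:
    severity: critical
  annotations:
    summary: The number of running instances is 0.
</syntaxhighlight>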