User: Jose.druker@genesys.com/PEMetrics
flink_(job|task)manager_Status_JVM_Memory_Heap_Used
Type: Gauge
Unit:
Label: pod
Description:
SampleValue:
Indicator of: Saturation
flink_(job|task)manager_Status_JVM_Memory_Heap_Max
Type: Gauge
Unit:
Label: pod
Description:
SampleValue:
Indicator of: Saturation
flink_(job|task)manager_Status_JVM_Memory_NonHeap_Used
Type: Gauge
Unit:
Label: pod
Description:
SampleValue:
Indicator of: Saturation
flink_(job|task)manager_Status_JVM_Memory_NonHeap_Max
Type: Gauge
Unit:
Label: pod
Description:
SampleValue:
Indicator of: Saturation
flink_(job|task)manager_Status_JVM_Memory_Direct_MemoryUsed
Type: Gauge
Unit:
Label: pod
Description:
SampleValue:
Indicator of: Saturation
flink_(job|task)manager_Status_JVM_Memory_Direct_TotalCapacity
Type: Gauge
Unit:
Label: pod
Description:
SampleValue:
Indicator of: Saturation
flink_(job|task)manager_Status_JVM_CPU_Load
Type: Gauge
Unit:
Label: pod
Description:
SampleValue:
Indicator of: Saturation
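The Used and Max gauges above are usually read as a pair. The following is a minimal sketch in Python of turning such a pair into a saturation fraction. It assumes both gauges are reported in the same unit (the Unit fields above are not filled in) and that a non-positive maximum means no limit was reported. The function name and the sample numbers are hypothetical and do not come from a real deployment.

```python
from typing import Optional


def saturation(used: float, maximum: float) -> Optional[float]:
    """Return used/maximum as a fraction, or None when no usable maximum exists."""
    # Assumption: a non-positive maximum means the JVM did not report a limit.
    if maximum <= 0:
        return None
    return used / maximum


# Hypothetical sample values, assumed to be in bytes.
heap_used = 512 * 1024 * 1024
heap_max = 2 * 1024 * 1024 * 1024

ratio = saturation(heap_used, heap_max)
print(f"heap saturation: {ratio:.0%}")  # prints "heap saturation: 25%"
```

The same helper could be applied to the NonHeap and Direct pairs, provided each pair shares a unit.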
flink_taskmanager_job_task_operator_numChainsProcessedPerSecond
Type: Gauge
Unit:
Label:
Description: Number of EventOCSChainStartProcessing events per second (CPS).
SampleValue:
Indicator of: Traffic
flink_taskmanager_job_task_operator_numChainsProcessed
Type: Gauge
Unit:
Label:
Description: Total number of EventOCSChainStartProcessing events GSP received since it started processing.
SampleValue:
Indicator of: Traffic
flink_taskmanager_job_task_operator_numCallThreadsCreatedPerSecond
Type: Gauge
Unit:
Label:
Description: Number of CallThread creation events per second (CTHPS).
SampleValue:
Indicator of: Traffic
flink_taskmanager_job_task_operator_numThreadsCreated
Type: Gauge
Unit:
Label:
Description: Total number of CallThreads GSP received since it started processing.
SampleValue:
Indicator of: Traffic
flink_taskmanager_job_task_operator_numCallsCreatedPerSecond
Type: Gauge
Unit:
Label:
Description: Number of EventCallCreated events per second (CPS).
SampleValue:
Indicator of: Traffic
flink_taskmanager_job_task_operator_numCallsCreated
Type: Gauge
Unit:
Label:
Description: Total number of EventCallCreated events GSP received since it started processing.
SampleValue:
Indicator of: Traffic
flink_taskmanager_job_task_operator_records_consumed_rate
Type: Gauge
Unit:
Label:
Description:
SampleValue:
Indicator of: Traffic
flink_taskmanager_job_task_operator_records_lag_max
Type: Gauge
Unit:
Label:
Description:
SampleValue:
Indicator of: Latency
flink_taskmanager_job_task_operator_currentOutputWatermark
Type: Gauge
Unit:
Label: operator_name:
- Sink:_Agent_State_Facts
- Sink:_Interaction_Facts
Description:
SampleValue:
Indicator of: Latency
flink_taskmanager_job_task_operator_currentInputWatermark
Type: Gauge
Unit: milliseconds
Label: operator_name
Description: The last watermark received by this operator or task, in milliseconds.
Note: For operators or tasks with two inputs, this is the earlier of the last received watermarks.
SampleValue:
Indicator of: Latency
flink_taskmanager_job_task_operator_maxInteractionEndTs
Type: Gauge
Unit:
Label:
Description: Maximum event time seen.
SampleValue:
Indicator of: Latency
flink_taskmanager_job_task_operator_numForcefullyEndedReasons
Type: Gauge
Unit:
Label:
Description: Number of forcibly ended agent state reasons.
SampleValue:
Indicator of: Error
flink_taskmanager_job_task_operator_numForcefullyEndedStates
Type: Gauge
Unit:
Label:
Description: Number of forcibly ended agent states.
SampleValue: 1
Indicator of: Error
flink_taskmanager_job_task_operator_numForcefullyEndedSessions
Type: Gauge
Unit:
Label:
Description: Number of forcibly ended agent sessions.
SampleValue: 1
Indicator of: Error
flink_taskmanager_job_task_operator_user_errors_numOversizedMessages
Type: Gauge
Unit:
Label: operator_name
Description: Number of messages exceeding the max.request.size Kafka option.
SampleValue: 0
Indicator of: Error
flink_jobmanager_numRunningJobs
Type: Gauge
Unit:
Label:
Description: Number of running Flink jobs. If less than 1, there is a problem.
SampleValue: 1
Indicator of: Error
flink_taskmanager_job_task_operator_errors_numInvalidRecords
Type: Gauge
Unit:
Label:
Description: Number of invalid input records.
SampleValue: 0
Indicator of: Error
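Two of the descriptions above translate directly into checks: flink_jobmanager_numRunningJobs signals a problem when it is less than 1, and flink_taskmanager_job_task_operator_currentInputWatermark is a millisecond timestamp that can be compared with the current time. The sketch below, in Python, illustrates both. The function names are hypothetical, and the sentinel for "no watermark received yet" is an assumption about Flink's behaviour rather than something stated on this page.

```python
import time
from typing import Optional

# Assumption: Flink reports Long.MIN_VALUE until an operator has received a watermark.
NO_WATERMARK = -(2 ** 63)


def jobs_healthy(num_running_jobs: float) -> bool:
    """Apply the rule from the description: fewer than 1 running job is a problem."""
    return num_running_jobs >= 1


def watermark_lag_ms(watermark_ms: int, now_ms: Optional[int] = None) -> Optional[int]:
    """Return how far the watermark trails the given time, in milliseconds."""
    if watermark_ms == NO_WATERMARK:
        return None  # nothing to compare against yet
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms - watermark_ms


print(jobs_healthy(1))  # True
print(watermark_lag_ms(1_618_322_383_000, now_ms=1_618_322_388_000))  # 5000
```

What counts as an acceptable lag depends on the deployment, so no threshold is suggested here.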
Examples of alternative formatting for metrics, Options 1-3
Content is from GSP metrics and alerts.
Continued on another page:
Option 1: Table with lightbox
Option 2: List with lightbox
Option 3: Table
Metric | Type | Unit | Label | Description | SampleValue | Indicator of |
---|---|---|---|---|---|---|
flink_ |
Gauge | 0 | Error | |||
flink_ |
Gauge | 1 | Error | |||
flink_ |
Gauge | operator_name | 0 | Error | ||
flink_ |
Gauge | 1 | Error | |||
flink_ |
Gauge | 0 | Error | |||
flink_ |
Gauge | 1 | Error | |||
flink_ |
Gauge | operator_name | 0 | Error | ||
flink_ |
Gauge | 1 | Error | |||
iwddm_ |
gauge | tenant, job, execution_chain | Indicates that IWDDM is active
Values:
|
iwddm_job_active{execution_chain="TEST", job="iwddm_metrics", tenant="TEST"} 1 | Error | |
iwddm_ |
gauge | milliseconds | tenant, job, execution_chain | Indicates when the IWDDM job started
Value: Unix timestamp when job started |
iwddm_job_last_start{execution_chain="TEST", job="iwddm_metrics", tenant="TEST"} 1618322383 | Error |
iwddm_ |
gauge | milliseconds | tenant, job, execution_chain | Indicates when the IWDDM job succeeded
Value: Unix timestamp when job succeeded |
iwddm_job_last_success{execution_chain="TEST", job="iwddm_metrics", tenant="TEST"} 1618322383 | Error |
iwddm_ |
gauge | milliseconds | tenant, job, execution_chain | Indicates when the IWDDM job failed
Value: Unix timestamp when job failed |
iwddm_job_last_fail{execution_chain="TEST", job="iwddm_metrics", tenant="TEST"} 1618322383 | Error |
sdServiceCounter | Gauge | Unsigned32 | Consul and Configserver sync check counter | 131 | Useful for checking whether SD is stuck or deadlocked | |
sdServiceLastRun | Gauge | Unix Time | Last time the Consul and Configserver sync check ran | 1634071196.054 | Useful for checking whether SD is stuck or deadlocked | |
ixn_ |
Gauge | Amount | None | Indicates the number of clients that are connected to IXN at the moment. | 5 | Workload |
ixn_ |
Gauge | Amount | None | Indicates the number of 'connected to' IXN routers. | 1 | Workload, Operability |
ixn_ |
Gauge | Amount | client_type_name. See the metric description for more details. | Indicates the number of clients of the specified type that are connected to IXN.
Label descriptions:
|
101 | Workload |
ixn_ |
Counter | Amount | router_name. See the metric description for more details. | Indicates the total number of interactions that have been submitted to the router.
Label descriptions: router_name - the name of the router into which the interactions have been submitted. |
33 | Workload, Operability |
ixn_ |
Gauge | Amount | router_name, strategy_name, strategy_tenant. See the metric description for more details. | Indicates the number of strategies with specified name loaded into a specified router.
Label descriptions:
|
1 | Workload |
ixn_ |
Gauge | Amount | router_name. See the metric description for more details. | Indicates the current capacity of a specified router - the number of interactions, not including those already submitted, that can be submitted into the router.
Label descriptions: router_name - name of router. |
987 | Workload, Operability |
ixn_ |
Gauge | Amount | router_name. See the metric description for more details. | Indicates the number of interactions that are in a specified router.
Label descriptions: router_name - name of router. |
13 | Workload, Operability |
ixn_ |
Gauge | Amount | None | Indicates number of strategies which are associated with active submitters. | 11 | Workload |
ixn_ |
Gauge | Amount | router_name. See the metric description for more details. | Indicates the maximum capacity of the specified router: the number of interactions that can be submitted into the router.
Label descriptions:
ixn_health_info_router_max_submitted = ixn_health_info_router_currently_submitted + ixn_health_info_router_current_capacity |
1000 | Workload, Operability |
ixn_ |
Gauge | Amount | None | Indicates the current database requests queue length. | 0 | Workload, Operability |
ixn_ |
Counter | Amount | None | Indicates the number of database requests processed since the IXN application started. | 75 | Workload, Operability |
ixn_ |
Gauge | Amount | None | Indicates the current number of DB connections. | 5 | Workload, Operability |
ixn_ |
Counter | Amount | None | Indicates the total number of database queries that have ended in a deadlock since IXN started. | 0 | Workload, Operability |
ixn_ |
Gauge | Unix timestamp | router_name, strategy_name, strategy_tenant. See the metric description for more details. | Indicates the Unix timestamp at which the last interaction was submitted to the router for the specified strategy.
Label descriptions:
|
1618322383 | Workload, Operability |
ixn_ |
Gauge | Amount | queue_name, queue_tenant, media_name. See the metric description for more details. | Indicates the current number of interactions with specified media type that are waiting for processing in a specified queue.
Label descriptions:
|
0 | Workload |
ixn_ |
Gauge | Amount | agent_tenant. See the metric description for more details. | Indicates the current number of logged in agents.
Label descriptions: agent_tenant - tenant number. |
565 | Workload |
ixn_ |
Gauge | Amount | queue_name, queue_tenant, media_name. See the metric description for more details. | Indicates the number of the interactions with specified media type from a specified queue being routed.
Label descriptions:
|
10 | Workload, Operability |
ixn_ |
Gauge | Amount | queue_name, queue_tenant, media_name. See the metric description for more details. | Indicates the number of the interactions with specified media type from specified queue being handled by agents.
Label descriptions:
|
5 | Workload, Operability |
ixn_ |
Gauge | Amount | queue_name, queue_tenant, media_name. See the metric description for more details. | Indicates the number of interactions with the specified media type that are waiting for processing in the specified queue and were never delivered to an agent.
Label descriptions:
|
2 | Workload, Operability |
ixn_ |
Gauge | Amount | queue_name, queue_tenant, media_name. See the metric description for more details. | Indicates the sum of the interactions with specified media type from specified queue being routed by routers and being handled by agents.
Label descriptions:
ixn_health_info_queue_media_in_processing = ixn_health_info_queue_media_in_router + ixn_health_info_queue_media_on_agent Note: This value is provided in Pulse as well. |
15 | Workload, Operability |
ixn_ |
Gauge | Amount | router_name, strategy_name, strategy_tenant. See the metric description for more details. | Indicates the number of interactions which are submitted to specified router by specified strategy at the moment.
Label descriptions:
|
3 | Workload, Operability |
ixn_ |
Gauge | Amount | router_name, strategy_name, strategy_tenant. See the metric description for more details. | Indicates how many more interactions can be submitted to the specified router by the specified strategy.
Label descriptions:
|
197 | Workload, Operability |
ixn_ |
Counter | Amount | router_name, strategy_name, strategy_tenant. See the metric description for more details. | Indicates the number of interactions that have been submitted to the specified router by the specified strategy since the IXN application started.
Label descriptions:
|
9 | Workload, Operability |
ixnnode_ |
Counter | Amount | strategy. See the metric description for more details. | Indicates the total number of the interactions pulled for the specific strategy.
Label descriptions: strategy - The name of the strategy for which interactions are pulled. |
Workload, Operability | |
ixnnode_ |
Gauge | Amount | None | Indicates the current number of the routing sessions in routing. | Workload | |
ixnnode_ |
Counter | Amount | None | Indicates the total number of instructions (of any type) received from ORS service. |
Workload, Operability | |
ixnnode_ |
Counter | Amount | strategy, type. See the metric description for more details. | Indicates the total number of received routing instructions.
Label descriptions:
|
Workload, Operability | |
ixnnode_ |
Gauge | Status | redis_client. See the metric description for more details. | Indicates the status of Redis client.
Label descriptions: redis_client - The Redis client instance for which the metric is present. It takes values "reader" and "writer". Value: 0 - Not Ready 1 - Ready |
Operability | |
ixnnode_ |
Counter | Amount | redis_client. See the metric description for more details. | Indicates the total number of errors that occurred on the Redis client.
Label descriptions: redis_client - The Redis client instance for which the metric is present. It takes values "reader" and "writer". |
Error | |
ixnnode_ |
Gauge | Status | redis_client, node. See the metric description for more details. | Indicates the status of the connection to individual nodes of the Redis server (in singleton mode, this matches ixnnode_redis_client_status).
Label descriptions:
Value: 0 - Ready 1 - Not Ready 2 - Wait (so far there have been no connection attempts) |
Operability | |
ixnnode_ |
Counter | Amount | redis_client, node. See the metric description for more details. | Indicates the total number of errors that occurred on individual nodes of the Redis client (in singleton mode, this matches ixnnode_redis_client_errors_total).
Label descriptions:
|
Error | |
ixnnode_ |
Counter | Amount | redis_client, command. See the metric description for more details. | Indicates the total number of successfully completed redis commands.
Label descriptions:
|
Workload, Operability | |
ixnnode_ |
Counter | Amount | redis_client, command. See the metric description for more details. | Indicates the total number of failed redis commands.
Label descriptions:
|
Error | |
ixnnode_ |
Gauge | Status | rq_node. See the metric description for more details. | Indicates the status of connection to RQ Service nodes.
Label descriptions: rq_node - RQ Service node for which the metric is present. |
Operability | |
ixnnode_ |
Counter | Amount | type. See the metric description for more details. | Indicates the total number of failed requests to RQ Service.
Label descriptions: type - The type of the failed requests. It takes values "isp_event" - interaction protocol events and "ixn_ping" - health check messages. |
Error | |
ixnnode_ |
Gauge | Amount | None | Indicates the maximum number of routing instructions that can be processed in parallel. | n/a | |
ixnnode_ |
Gauge | Amount | type. See the metric description for more details. | Indicates the number of instructions received from ORS currently being processed.
Label descriptions: type - The type of the instruction. It takes values "isp_request" - routing instruction and "ixn_ping" - reply to health check message. |
Workload, Operability | |
ixnnode_ |
Counter | Amount | strategy. See the metric description for more details. | Indicates the total number of RequestPull requests successfully completed by InteractionServer.
Label descriptions: strategy - The strategy for which interactions are pulled. |
Workload, Operability | |
ixnnode_ |
Counter | Amount | strategy. See the metric description for more details. | Indicates the total number of route requests successfully sent to ORS.
Label descriptions: strategy - The strategy to which requests are sent. |
Workload, Operability | |
ixnnode_ |
Counter | Amount | strategy. See the metric description for more details. | Indicates the total number of route requests that failed to be sent to ORS.
Label descriptions: strategy - The strategy to which requests are sent. |
Error | |
ixnnode_ |
Gauge | Amount | None | Indicates the number of routing instructions currently being processed by IXN Server. | Workload | |
ixnnode_ |
Counter | Amount | reason, strategy. See the metric description for more details. | Indicates the total number of times an interaction was placed back in queue.
Label descriptions:
|
Workload, Error, ORS Error | |
ixnnode_ |
Gauge | Amount | None | Indicates the number of the strategies for which interactions currently are being pulled. | Operability | |
ixnnode_ |
Gauge | Amount | None | Indicates the number of the strategies read from configuration for which interactions should be pulled. | Operability | |
ixnnode_ |
Counter | Amount | None | Indicates the total number of errors that occurred while fetching configuration from Configuration Service. | Error | |
ixnnode_ |
Gauge | Timestamp | None | Indicates the last time the configuration was successfully fetched from Configuration Service as the number of seconds since January 1 1970 UTC. | Operability | |
wrtc_ |
Integer | Specifies the number of current registered DNs | 2 | Monitoring | ||
wrtc_ |
Integer | Specifies the number of current incoming calls | 2 | Monitoring | ||
wrtc_ |
Integer | Specifies the number of current outgoing calls | 5 | Monitoring | ||
wrtc_ |
Integer | Specifies the number of current audio calls | 5 | Monitoring | ||
wrtc_ |
Integer | Specifies the number of current video calls | 2 | Monitoring | ||
wrtc_ |
Integer | Specifies the number of current xcoding calls | 2 | Monitoring | ||
wrtc_ |
Integer | Specifies the maximum number of incoming calls | 50 | Monitoring | ||
wrtc_ |
Integer | Specifies the maximum number of outgoing calls | 50 | Monitoring | ||
wrtc_ |
Integer | Specifies the maximum number of audio calls | 50 | Monitoring | ||
wrtc_ |
Integer | Specifies the maximum number of video calls | 50 | Monitoring | ||
wrtc_ |
Integer | Specifies the maximum number of xcoding calls | 50 | Monitoring | ||
wrtc_ |
Counter | Specifies the total number of incoming calls | 100 | Monitoring | ||
wrtc_ |
Counter | Specifies the total number of outgoing calls | 100 | Monitoring | ||
wrtc_ |
Counter | Specifies the total number of audio calls | 100 | Monitoring | ||
wrtc_ |
Counter | Specifies the total number of video calls | 100 | Monitoring | ||
wrtc_ |
Counter | Specifies the total number of xcoding calls | 100 | Monitoring | ||
wrtc_ |
Counter | Specifies the number of unauthorized access attempts | 20 | Monitoring | ||
wrtc_ |
Counter | Specifies the number of unknown requests received | 20 | Monitoring | ||
wrtc_ |
Counter | Specifies the number of registration requests that were received for a registered DN | 20 | Monitoring | ||
wrtc_ |
Counter | Specifies the number of lost RTP packets | 20 | Monitoring | ||
wrtc_ |
Counter | Specifies the number of RTP receive errors | 2 | Monitoring | ||
wrtc_ |
Counter | {over="100"} | Audio quality monitoring metrics | Monitoring | ||
wrtc_ |
Counter | {over="300"} | Audio quality monitoring metrics | Monitoring | ||
wrtc_ |
Counter | {over="500"} | Audio quality monitoring metrics | Monitoring | ||
wrtc_ |
Counter | {over="100"} | Audio quality monitoring metrics | Monitoring | ||
wrtc_ |
Counter | {over="300"} | Audio quality monitoring metrics | Monitoring | ||
wrtc_ |
Counter | {over="500"} | Audio quality monitoring metrics | Monitoring | ||
wrtc_ |
Integer | {type="turn_errors"} | Specifies the number of failed ICE transactions | Error | ||
wrtc_ |
Integer | {type="sips", sip="<proxy address>"} | Specifies the number of registration transactions that timed out | 2 | Error | |
wrtc_ |
Integer | {type="es"} | Specifies if WebRTC is able to connect to Elasticsearch server or not | 1 or 0 | Error | |
wrtc_ |
Counter | {type="es_errors"} | Specifies the number of error responses received from Elasticsearch server | 2 | Error | |
wrtc_ |
Integer | {type="auth"} | Specifies if WebRTC is able to connect to GAuth service or not | 1 or 0 | Error | |
wrtc_ |
Counter | {type="gauth_errors"} | Specifies the number of error responses received from GAuth server | 2 | Error | |
wrtc_ |
Integer | {type="cfg"} | Specifies if WebRTC is able to connect to GWS Configuration service or not | 1 or 0 | Error | |
wrtc_ |
Counter | {type="cfg_errors"} | Specifies the number of error responses received from GWS Configuration server | 2 | Error | |
wrtc_ |
Integer | {type="env"} | Specifies if WebRTC is able to connect to GWS Environments service or not | 1 or 0 | Error |
+ many more...
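The SampleValue cells in the table use the `name{label="value", ...} value` shape, and two of the IXN descriptions state simple sum relationships between gauges. The sketch below, in Python, shows one way to split such a sample line into its parts and restates the two sums using the sample values listed in the table. The helper names are hypothetical. Reading the iwddm value as Unix seconds is an assumption: the Unit column says milliseconds, but the sample value has the magnitude of a seconds timestamp. Matching the sample values 13, 987, 10 and 5 to the metric names in the two formulas is also an assumption, because the metric names in the table are truncated.

```python
import re
from datetime import datetime, timezone

# Matches `name{labels} value` or `name value`, as in the sample values above.
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line):
    """Split one sample line into (metric name, labels dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


def to_utc(seconds):
    """Interpret a gauge value as Unix seconds (an assumption, see the text above)."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


line = 'iwddm_job_last_start{execution_chain="TEST", job="iwddm_metrics", tenant="TEST"} 1618322383'
name, labels, value = parse_sample(line)
print(name, labels["tenant"], to_utc(value).isoformat())

# ixn_health_info_router_max_submitted =
#   ixn_health_info_router_currently_submitted + ixn_health_info_router_current_capacity
router_max_submitted = 13 + 987  # sample values from the router rows

# ixn_health_info_queue_media_in_processing =
#   ixn_health_info_queue_media_in_router + ixn_health_info_queue_media_on_agent
queue_media_in_processing = 10 + 5  # sample values from the queue rows

print(router_max_submitted, queue_media_in_processing)  # 1000 15
```

The results agree with the sample values 1000 and 15 given in the corresponding rows.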