Monitoring pipeline metrics
You can monitor Amazon OpenSearch Ingestion pipelines using Amazon CloudWatch, which collects raw data and processes it into readable, near real-time metrics. These statistics are kept for 15 months, so that you can access historical information and gain a better perspective on how your web application or service is performing. You can also set alarms that watch for certain thresholds, and send notifications or take actions when those thresholds are met. For more information, see the Amazon CloudWatch User Guide.
The OpenSearch Ingestion console displays a series of charts based on the raw data from CloudWatch on the Performance tab for each pipeline.
OpenSearch Ingestion reports metrics from most supported plugins. If a plugin doesn't have its own table below, it doesn't report any plugin-specific metrics. Pipeline metrics are published in the AWS/OSIS namespace.
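As a rough sketch of how these metrics might be retrieved programmatically, the following Python builds the request parameters for the CloudWatch GetMetricStatistics operation against the AWS/OSIS namespace. The dimension name `PipelineName`, the pipeline name `my-ingestion-pipeline`, and the helper names are assumptions made for illustration, so check them against what your account actually shows in CloudWatch.

```python
from datetime import datetime, timedelta, timezone


def build_metric_query(pipeline_name, metric_name, statistic="Sum", hours=1, period=300):
    """Build keyword arguments for the CloudWatch GetMetricStatistics operation."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/OSIS",
        "MetricName": metric_name,
        # Assumption: metrics are dimensioned by pipeline under the name "PipelineName".
        "Dimensions": [{"Name": "PipelineName", "Value": pipeline_name}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period,  # seconds per data point
        "Statistics": [statistic],
    }


def fetch_datapoints(query):
    """Send the query with boto3 (needs boto3 installed and AWS credentials configured)."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("cloudwatch")
    return client.get_metric_statistics(**query)["Datapoints"]


# Hypothetical pipeline and metric names, for illustration only.
query = build_metric_query("my-ingestion-pipeline", "my-pipeline.date.recordsIn.count")
print(query["Namespace"], query["MetricName"])
```

Keeping the parameter builder separate from the network call makes the query easy to inspect or log before anything is sent to CloudWatch.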
Common metrics
The following metrics are common to all processors and sinks.
Each metric is prefixed by the sub-pipeline name and plugin name, in the format <sub_pipeline_name>.<plugin>.<metric_name>. For example, the full name of the recordsIn.count metric for a sub-pipeline named my-pipeline and the date processor would be my-pipeline.date.recordsIn.count.
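The naming rule above can be sketched in a few lines of Python. The function name `full_metric_name` is hypothetical and only illustrates how the three parts are joined with dots.

```python
def full_metric_name(sub_pipeline: str, component: str, suffix: str) -> str:
    """Join the sub-pipeline name, plugin (or buffer) name, and metric suffix with dots."""
    parts = [sub_pipeline, component, suffix]
    if not all(parts):
        raise ValueError("sub-pipeline, component, and suffix are all required")
    return ".".join(parts)


print(full_metric_name("my-pipeline", "date", "recordsIn.count"))
# my-pipeline.date.recordsIn.count
```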
Metric suffix | Description |
---|---|
recordsIn.count |
The ingress of records to a pipeline component. This metric applies to processors and sinks. Relevant statistics: Sum Dimension:
|
recordsOut.count |
The egress of records from a pipeline component. This metric applies to processors and sources. Relevant statistics: Sum Dimension:
|
timeElapsed.count |
A count of data points recorded during execution of a pipeline component. This metric applies to processors and sinks. Relevant statistics: Sum Dimension:
|
timeElapsed.sum |
The total time elapsed during execution of a pipeline component, in milliseconds. This metric applies to processors and sinks. Relevant statistics: Sum Dimension:
|
timeElapsed.max |
The maximum time elapsed during execution of a pipeline component, in milliseconds. This metric applies to processors and sinks. Relevant statistics: Max Dimension:
|
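Because timeElapsed is reported as separate .count, .sum, and .max series, a mean execution time per data point can be derived by dividing the summed time by the number of data points over the same period. The sketch below assumes both values were retrieved with the Sum statistic for the same time window; the function name is illustrative.

```python
from typing import Optional


def mean_time_elapsed_ms(sum_ms: float, count: float) -> Optional[float]:
    """Mean time per recorded data point, or None when nothing was recorded."""
    if count <= 0:
        return None
    return sum_ms / count


# Example: 1250 ms of total execution time across 50 data points.
print(mean_time_elapsed_ms(1250.0, 50))  # 25.0
```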
Buffer metrics
The following metrics apply to the default bounded blocking buffer.
Each metric is prefixed by the sub-pipeline name and buffer name, in the format <sub_pipeline_name>.<buffer_name>.<metric_name>. For example, the full name of the recordsWritten.count metric for a sub-pipeline named my-pipeline would be my-pipeline.BlockingBuffer.recordsWritten.count.
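Going the other way, a full metric name can be split back into its parts. This is a minimal sketch that assumes sub-pipeline and component names contain no dots, and that treats the system and metering metrics listed at the end of this page as unprefixed. The helper name `split_metric_name` is hypothetical.

```python
# System and metering metrics are published without a sub-pipeline prefix.
UNPREFIXED = {
    "system.cpu.usage.value",
    "system.cpu.count.value",
    "jvm.memory.max.value",
    "jvm.memory.used.value",
    "jvm.memory.committed.value",
    "computeUnits",
}


def split_metric_name(full_name: str):
    """Return (sub_pipeline, component, suffix); the first two are None if unprefixed."""
    if full_name in UNPREFIXED:
        return (None, None, full_name)
    # Assumption: neither the sub-pipeline name nor the component name contains a dot.
    parts = full_name.split(".", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected metric name: {full_name!r}")
    return tuple(parts)


print(split_metric_name("my-pipeline.BlockingBuffer.recordsWritten.count"))
# ('my-pipeline', 'BlockingBuffer', 'recordsWritten.count')
```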
Metric suffix | Description |
---|---|
recordsWritten.count |
The number of records written to a buffer. Relevant statistics: Sum Dimension:
|
recordsRead.count |
The number of records read from a buffer. Relevant statistics: Sum Dimension:
|
recordsInFlight.value |
The number of unchecked records read from a buffer. Relevant statistics: Average Dimension:
|
recordsInBuffer.value |
The number of records currently in a buffer. Relevant statistics: Average Dimension:
|
recordsProcessed.count |
The number of records read from a buffer and processed by a pipeline. Relevant statistics: Sum Dimension:
|
recordsWriteFailed.count |
The number of records that the pipeline failed to write to the sink. Relevant statistics: Sum Dimension:
|
writeTimeElapsed.count |
A count of data points recorded while writing to a buffer. Relevant statistics: Sum Dimension:
|
writeTimeElapsed.sum |
The total time elapsed while writing to a buffer, in milliseconds. Relevant statistics: Sum Dimension:
|
writeTimeElapsed.max |
The maximum time elapsed while writing to a buffer, in milliseconds. Relevant statistics: Max Dimension:
|
writeTimeouts.count |
The count of write timeouts to a buffer. Relevant statistics: Sum Dimension:
|
readTimeElapsed.count |
A count of data points recorded while reading from a buffer. Relevant statistics: Sum Dimension:
|
readTimeElapsed.sum |
The total time elapsed while reading from a buffer, in milliseconds. Relevant statistics: Sum Dimension:
|
readTimeElapsed.max |
The maximum time elapsed while reading from a buffer, in milliseconds. Relevant statistics: Max Dimension:
|
checkpointTimeElapsed.count |
A count of data points recorded while checkpointing. Relevant statistics: Sum Dimension:
|
checkpointTimeElapsed.sum |
The total time elapsed while checkpointing, in milliseconds. Relevant statistics: Sum Dimension:
|
checkpointTimeElapsed.max |
The maximum time elapsed while checkpointing, in milliseconds. Relevant statistics: Max Dimension:
|
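One way to use the buffer counters together is to compare how many records were written with how many were processed in each period. The following sketch assumes you have already retrieved per-period Sum values for recordsWritten.count and recordsProcessed.count; the function names and the sample numbers are made up for illustration.

```python
def backlog_growth(samples):
    """Net change in buffered records per period.

    Each sample is a pair: (recordsWritten.count Sum, recordsProcessed.count Sum).
    A positive value means more records entered the buffer than were processed.
    """
    return [written - processed for written, processed in samples]


def is_falling_behind(samples):
    """True only if the backlog grew in every sampled period."""
    growth = backlog_growth(samples)
    return bool(growth) and all(delta > 0 for delta in growth)


samples = [(1000, 900), (1200, 1000), (800, 800)]  # illustrative values
print(backlog_growth(samples))     # [100, 200, 0]
print(is_falling_behind(samples))  # False
```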
Signature V4 metrics
The following metrics apply to the ingestion endpoint for a pipeline and are associated with the source plugins (http, otel_trace, and otel_metrics). All requests to the ingestion endpoint must be signed using Signature Version 4. These metrics can help you identify authorization issues when connecting to your pipeline, or confirm that you're successfully authenticating.
Each metric is prefixed by the sub-pipeline name and osis_sigv4_auth. For example, <sub_pipeline_name>.osis_sigv4_auth.httpAuthSuccess.count.
Metric suffix | Description |
---|---|
httpAuthSuccess.count |
The number of successful Signature V4 requests to the pipeline. Relevant statistics: Sum Dimension:
|
httpAuthFailure.count |
The number of failed Signature V4 requests to the pipeline. Relevant statistics: Sum Dimension:
|
httpAuthServerError.count |
The number of Signature V4 requests to the pipeline that returned server errors. Relevant statistics: Sum Dimension:
|
Bounded blocking buffer metrics
The following metrics apply to the bounded blocking buffer. Each metric is prefixed by the sub-pipeline name and BlockingBuffer. For example, <sub_pipeline_name>.BlockingBuffer.bufferUsage.value.
Metric suffix | Description |
---|---|
bufferUsage.value |
Percent usage of the buffer. Relevant statistics: Average Dimension:
|
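Since bufferUsage.value is a percentage, it is a natural candidate for a CloudWatch alarm. The sketch below builds parameters for the CloudWatch PutMetricAlarm operation. The 80 percent threshold, the evaluation settings, the `PipelineName` dimension, and all names are assumptions chosen for illustration, not recommended values.

```python
def build_buffer_usage_alarm(pipeline_name, sub_pipeline, threshold_percent=80.0,
                             sns_topic_arn=None):
    """Build keyword arguments for the CloudWatch PutMetricAlarm operation."""
    alarm = {
        "AlarmName": f"{pipeline_name}-{sub_pipeline}-buffer-usage",
        "Namespace": "AWS/OSIS",
        "MetricName": f"{sub_pipeline}.BlockingBuffer.bufferUsage.value",
        # Assumption: metrics are dimensioned by pipeline under the name "PipelineName".
        "Dimensions": [{"Name": "PipelineName", "Value": pipeline_name}],
        "Statistic": "Average",
        "Period": 300,            # seconds per evaluation period
        "EvaluationPeriods": 3,   # alarm after three consecutive breaching periods
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
    }
    if sns_topic_arn:
        alarm["AlarmActions"] = [sns_topic_arn]
    return alarm


# Hypothetical names; pass the result to boto3.client("cloudwatch").put_metric_alarm(**alarm).
alarm = build_buffer_usage_alarm("my-ingestion-pipeline", "my-pipeline")
print(alarm["MetricName"])
# my-pipeline.BlockingBuffer.bufferUsage.value
```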
Otel trace source metrics
The following metrics apply to the OTel trace source. Each metric is prefixed by the sub-pipeline name and otel_trace_source. For example, <sub_pipeline_name>.otel_trace_source.requestTimeouts.count.
Metric suffix | Description |
---|---|
requestTimeouts.count |
The number of requests that timed out. Relevant statistics: Sum Dimension:
|
requestsReceived.count |
The number of requests received by the plugin. Relevant statistics: Sum Dimension:
|
successRequests.count |
The number of requests that were successfully processed by the plugin. Relevant statistics: Sum Dimension:
|
badRequests.count |
The number of requests with an invalid format that were processed by the plugin. Relevant statistics: Sum Dimension:
|
requestsTooLarge.count |
The number of requests in which the number of spans in the content exceeds the buffer capacity. Relevant statistics: Sum Dimension:
|
internalServerError.count |
The number of requests processed by the plugin with a custom exception type. Relevant statistics: Sum Dimension:
|
requestProcessDuration.count |
A count of data points recorded while processing requests by the plugin. Relevant statistics: Sum Dimension:
|
requestProcessDuration.sum |
The total latency of requests processed by the plugin, in milliseconds. Relevant statistics: Sum Dimension:
|
requestProcessDuration.max |
The maximum latency of requests processed by the plugin, in milliseconds. Relevant statistics: Max Dimension:
|
payloadSize.count |
A count of the distribution of payload sizes of incoming requests, in bytes. Relevant statistics: Sum Dimension:
|
payloadSize.sum |
The total distribution of the payload sizes of incoming requests, in bytes. Relevant statistics: Sum Dimension:
|
payloadSize.max |
The maximum distribution of payload sizes of incoming requests, in bytes. Relevant statistics: Max Dimension:
|
Otel metrics source metrics
The following metrics apply to the OTel metrics source. Each metric is prefixed by the sub-pipeline name and otel_metrics_source. For example, <sub_pipeline_name>.otel_metrics_source.requestTimeouts.count.
Metric suffix | Description |
---|---|
requestTimeouts.count |
The total number of requests to the plugin that time out. Relevant statistics: Sum Dimension:
|
requestsReceived.count |
The total number of requests received by the plugin. Relevant statistics: Sum Dimension:
|
successRequests.count |
The number of requests successfully processed (200 response status code) by the plugin. Relevant statistics: Sum Dimension:
|
requestProcessDuration.count |
A count of data points recorded while processing requests by the plugin. Relevant statistics: Sum Dimension:
|
requestProcessDuration.sum |
The total latency of requests processed by the plugin, in milliseconds. Relevant statistics: Sum Dimension:
|
requestProcessDuration.max |
The maximum latency of requests processed by the plugin, in milliseconds. Relevant statistics: Max Dimension:
|
payloadSize.count |
A count of the distribution of payload sizes of incoming requests, in bytes. Relevant statistics: Sum Dimension:
|
payloadSize.sum |
The total distribution of the payload sizes of incoming requests, in bytes. Relevant statistics: Sum Dimension:
|
payloadSize.max |
The maximum distribution of payload sizes of incoming requests, in bytes. Relevant statistics: Max Dimension:
|
Http metrics
The following metrics apply to the HTTP source. Each metric is prefixed by the sub-pipeline name and http. For example, <sub_pipeline_name>.http.requestsReceived.count.
Metric suffix | Description |
---|---|
requestsReceived.count |
The number of requests received by the plugin. Relevant statistics: Sum Dimension:
|
requestsRejected.count |
The number of requests rejected (429 response status code) by the plugin. Relevant statistics: Sum Dimension:
|
successRequests.count |
The number of requests successfully processed (200 response status code) by the plugin. Relevant statistics: Sum Dimension:
|
badRequests.count |
The number of requests with invalid content type or format (400 response status code) processed by the plugin. Relevant statistics: Sum Dimension:
|
requestTimeouts.count |
The number of requests that time out in the HTTP source server (408 response status code). Relevant statistics: Sum Dimension:
|
requestsTooLarge.count |
The number of requests in which the size of the events in the content exceeds the buffer capacity (413 response status code). Relevant statistics: Sum Dimension:
|
internalServerError.count |
The number of requests processed by the plugin with a custom exception type (500 response status code). Relevant statistics: Sum Dimension:
|
requestProcessDuration.count |
A count of data points recorded while processing requests by the plugin. Relevant statistics: Sum Dimension:
|
requestProcessDuration.sum |
The total latency of requests processed by the plugin, in milliseconds. Relevant statistics: Sum Dimension:
|
requestProcessDuration.max |
The maximum latency of requests processed by the plugin, in milliseconds. Relevant statistics: Max Dimension:
|
payloadSize.count |
A count of the distribution of payload sizes of incoming requests, in bytes. Relevant statistics: Sum Dimension:
|
payloadSize.sum |
The total distribution of the payload sizes of incoming requests, in bytes. Relevant statistics: Sum Dimension:
|
payloadSize.max |
The maximum distribution of payload sizes of incoming requests, in bytes. Relevant statistics: Max Dimension:
|
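To see how incoming HTTP requests break down by outcome, each outcome counter can be expressed as a share of requestsReceived.count. This is a minimal sketch that assumes all values are Sum statistics for the same period; the function name and the sample numbers are illustrative only.

```python
OUTCOME_SUFFIXES = [
    "successRequests.count",
    "requestsRejected.count",
    "badRequests.count",
    "requestTimeouts.count",
    "requestsTooLarge.count",
    "internalServerError.count",
]


def outcome_shares(sums):
    """Map each outcome suffix to its share of requestsReceived.count (0.0 to 1.0)."""
    received = sums.get("requestsReceived.count", 0)
    if received <= 0:
        return {}
    return {suffix: sums.get(suffix, 0) / received for suffix in OUTCOME_SUFFIXES}


# Illustrative Sum values for one period.
sums = {
    "requestsReceived.count": 200,
    "successRequests.count": 180,
    "requestsRejected.count": 10,
    "badRequests.count": 6,
    "requestTimeouts.count": 2,
    "requestsTooLarge.count": 1,
    "internalServerError.count": 1,
}
shares = outcome_shares(sums)
print(shares["successRequests.count"])  # 0.9
```

A rising share for requestsRejected.count or requestsTooLarge.count is usually worth reading alongside the buffer metrics for the same sub-pipeline.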
S3 metrics
The following metrics apply to the S3 source. Each metric is prefixed by the sub-pipeline name and s3. For example, <sub_pipeline_name>.s3.s3ObjectsFailed.count.
Metric suffix | Description |
---|---|
s3ObjectsFailed.count |
The total number of S3 objects that the plugin failed to read. Relevant statistics: Sum Dimension:
|
s3ObjectsNotFound.count |
The number of S3 objects that the plugin failed to read due to a Not Found error. Relevant statistics: Sum Dimension:
|
s3ObjectsAccessDenied.count |
The number of S3 objects that the plugin failed to read due to an Access Denied error. Relevant statistics: Sum Dimension:
|
s3ObjectReadTimeElapsed.count |
A count of data points recorded while the plugin performs a GET request for an S3 object, parses it, and writes events to the buffer. Relevant statistics: Sum Dimension:
|
s3ObjectReadTimeElapsed.sum |
The total amount of time that the plugin takes to perform a GET request for an S3 object, parse it, and write events to the buffer, in milliseconds. Relevant statistics: Sum Dimension:
|
s3ObjectReadTimeElapsed.max |
The maximum amount of time that the plugin takes to perform a GET request for an S3 object, parse it, and write events to the buffer, in milliseconds. Relevant statistics: Max Dimension:
|
s3ObjectSizeBytes.count |
The count of the distribution of S3 object sizes, in bytes. Relevant statistics: Sum Dimension:
|
s3ObjectSizeBytes.sum |
The total distribution of S3 object sizes, in bytes. Relevant statistics: Sum Dimension:
|
s3ObjectSizeBytes.max |
The maximum distribution of S3 object sizes, in bytes. Relevant statistics: Max Dimension:
|
s3ObjectProcessedBytes.count |
The count of the distribution of S3 objects processed by the plugin, in bytes. Relevant statistics: Sum Dimension:
|
s3ObjectProcessedBytes.sum |
The total distribution of S3 objects processed by the plugin, in bytes. Relevant statistics: Sum Dimension:
|
s3ObjectProcessedBytes.max |
The maximum distribution of S3 objects processed by the plugin, in bytes. Relevant statistics: Max Dimension:
|
s3ObjectsEvents.count |
The count of the distribution of S3 events received by the plugin. Relevant statistics: Sum Dimension:
|
s3ObjectsEvents.sum |
The total distribution of S3 events received by the plugin. Relevant statistics: Sum Dimension:
|
s3ObjectsEvents.max |
The maximum distribution of S3 events received by the plugin. Relevant statistics: Max Dimension:
|
sqsMessageDelay.count |
A count of data points recorded for the time between when S3 records an event time for the creation of an object and when it's fully parsed. Relevant statistics: Sum Dimension:
|
sqsMessageDelay.sum |
The total amount of time between when S3 records an event time for the creation of an object and when it's fully parsed, in milliseconds. Relevant statistics: Sum Dimension:
|
sqsMessageDelay.max |
The maximum amount of time between when S3 records an event time for the creation of an object and when it's fully parsed, in milliseconds. Relevant statistics: Max Dimension:
|
s3ObjectsSucceeded.count |
The number of S3 objects that the plugin successfully read. Relevant statistics: Sum Dimension:
|
sqsMessagesReceived.count |
The number of Amazon SQS messages received from the queue by the plugin. Relevant statistics: Sum Dimension:
|
sqsMessagesDeleted.count |
The number of Amazon SQS messages deleted from the queue by the plugin. Relevant statistics: Sum Dimension:
|
sqsMessagesFailed.count |
The number of Amazon SQS messages that the plugin failed to parse. Relevant statistics: Sum Dimension:
|
Aggregate metrics
The following metrics apply to the Aggregate processor. Each metric is prefixed by the sub-pipeline name and aggregate. For example, <sub_pipeline_name>.aggregate.actionHandleEventsOut.count.
Metric suffix | Description |
---|---|
actionHandleEventsOut.count |
The number of events that have been returned from the handleEvent call to the configured action. Relevant statistics: Sum Dimension:
|
actionHandleEventsDropped.count |
The number of events that have not been returned from the handleEvent call to the configured action. Relevant statistics: Sum Dimension:
|
actionHandleEventsProcessingErrors.count |
The number of calls made to handleEvent for the configured action that resulted in an error. Relevant statistics: Sum Dimension:
|
actionConcludeGroupEventsOut.count |
The number of events that have been returned from the concludeGroup call to the configured action. Relevant statistics: Sum Dimension:
|
actionConcludeGroupEventsDropped.count |
The number of events that have not been returned from the concludeGroup call to the configured action. Relevant statistics: Sum Dimension:
|
actionConcludeGroupEventsProcessingErrors.count |
The number of calls made to concludeGroup for the configured action that resulted in an error. Relevant statistics: Sum Dimension:
|
currentAggregateGroups.value |
The current number of groups. This gauge decreases when groups are concluded, and increases when an event initiates the creation of a new group. Relevant statistics: Average Dimension:
|
Date metrics
The following metrics apply to the Date processor. Each metric is prefixed by the sub-pipeline name and date. For example, <sub_pipeline_name>.date.dateProcessingMatchSuccess.count.
Metric suffix | Description |
---|---|
dateProcessingMatchSuccess.count |
The number of records that match at least one of the patterns specified in the match configuration option. Relevant statistics: Sum Dimension:
|
dateProcessingMatchFailure.count |
The number of records that didn't match any of the patterns specified in the match configuration option. Relevant statistics: Sum Dimension:
|
Grok metrics
The following metrics apply to the Grok processor. Each metric is prefixed by the sub-pipeline name and grok. For example, <sub_pipeline_name>.grok.grokProcessingMatch.count.
Metric suffix | Description |
---|---|
grokProcessingMatch.count |
The number of records that found at least one pattern match from the match configuration option. Relevant statistics: Sum Dimension:
|
grokProcessingMismatch.count |
The number of records that didn't match any of the patterns specified in the match configuration option. Relevant statistics: Sum Dimension:
|
grokProcessingErrors.count |
The number of record processing errors. Relevant statistics: Sum Dimension:
|
grokProcessingTimeouts.count |
The number of records that timed out while matching. Relevant statistics: Sum Dimension:
|
grokProcessingTime.count |
A count of data points recorded while an individual record matched against patterns from the match configuration option. Relevant statistics: Sum Dimension:
|
grokProcessingTime.sum |
The total amount of time that each individual record takes to match against patterns from the match configuration option. Relevant statistics: Sum Dimension:
|
grokProcessingTime.max |
The maximum amount of time that each individual record takes to match against patterns from the match configuration option. Relevant statistics: Max Dimension:
|
Otel trace raw metrics
The following metrics apply to the OTel trace raw processor. Each metric is prefixed by the sub-pipeline name and otel_trace_raw. For example, <sub_pipeline_name>.otel_trace_raw.traceGroupCacheCount.value.
Metric suffix | Description |
---|---|
traceGroupCacheCount.value |
The number of trace groups in the trace group cache. Relevant statistics: Sum Dimension:
|
spanSetCount.value |
The number of span sets in the span set collection. Relevant statistics: Sum Dimension:
|
Otel trace group metrics
The following metrics apply to the OTel trace group processor. Each metric is prefixed by the sub-pipeline name and otel_trace_group. For example, <sub_pipeline_name>.otel_trace_group.recordsInMissingTraceGroup.count.
Metric suffix | Description |
---|---|
recordsInMissingTraceGroup.count |
The number of ingress records missing trace group fields. Relevant statistics: Sum Dimension:
|
recordsOutFixedTraceGroup.count |
The number of egress records with trace group fields that were filled successfully. Relevant statistics: Sum Dimension:
|
recordsOutMissingTraceGroup.count |
The number of egress records missing trace group fields. Relevant statistics: Sum Dimension:
|
Service map stateful metrics
The following metrics apply to the Service-map stateful processor. Each metric is prefixed by the sub-pipeline name and service-map-stateful. For example, <sub_pipeline_name>.service-map-stateful.spansDbSize.value.
Metric suffix | Description |
---|---|
spansDbSize.value |
The in-memory byte sizes of spans in MapDB across the current and previous window durations. Relevant statistics: Average Dimension:
|
traceGroupDbSize.value |
The in-memory byte sizes of trace groups in MapDB across the current and previous window durations. Relevant statistics: Average Dimension:
|
spansDbCount.value |
The count of spans in MapDB across the current and previous window durations. Relevant statistics: Sum Dimension:
|
traceGroupDbCount.value |
The count of trace groups in MapDB across the current and previous window durations. Relevant statistics: Sum Dimension:
|
relationshipCount.value |
The count of relationships stored across the current and previous window durations. Relevant statistics: Sum Dimension:
|
OpenSearch metrics
The following metrics apply to the OpenSearch sink. Each metric is prefixed by the sub-pipeline name and opensearch. For example, <sub_pipeline_name>.opensearch.bulkRequestErrors.count.
Metric suffix | Description |
---|---|
bulkRequestErrors.count |
The total number of errors encountered while sending bulk requests. Relevant statistics: Sum Dimension:
|
documentsSuccess.count |
The number of documents successfully sent to the OpenSearch Service by bulk request, including retries. Relevant statistics: Sum Dimension:
|
documentsSuccessFirstAttempt.count |
The number of documents successfully sent to OpenSearch Service by bulk request on the first attempt. Relevant statistics: Sum Dimension:
|
documentErrors.count |
The number of documents that failed to be sent by bulk requests. Relevant statistics: Sum Dimension:
|
bulkRequestFailed.count |
The number of bulk requests that failed. Relevant statistics: Sum Dimension:
|
bulkRequestNumberOfRetries.count |
The number of retries of failed bulk requests. Relevant statistics: Sum Dimension:
|
bulkBadRequestErrors.count |
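The number of bad request errors encountered while sending bulk requests. Relevant statistics: Sum Dimension:
|
bulkRequestNotAllowedErrors.count |
The number of request not allowed errors encountered while sending bulk requests. Relevant statistics: Sum Dimension:
|
bulkRequestInvalidInputErrors.count |
The number of invalid input errors encountered while sending bulk requests. Relevant statistics: Sum Dimension:
|
bulkRequestNotFoundErrors.count |
The number of request not found errors encountered while sending bulk requests. Relevant statistics: Sum Dimension:
|
bulkRequestTimeoutErrors.count |
The number of request timeout errors encountered while sending bulk requests. Relevant statistics: Sum Dimension:
|
bulkRequestServerErrors.count |
The number of server errors encountered while sending bulk requests. Relevant statistics: Sum Dimension: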
|
bulkRequestSizeBytes.count |
A count of the distribution of payload sizes of bulk requests, in bytes. Relevant statistics: Sum Dimension:
|
bulkRequestSizeBytes.sum |
The total distribution of payload sizes of bulk requests, in bytes. Relevant statistics: Sum Dimension:
|
bulkRequestSizeBytes.max |
The maximum distribution of payload sizes of bulk requests, in bytes. Relevant statistics: Max Dimension:
|
bulkRequestLatency.count |
A count of data points recorded while requests are sent to the plugin, including retries. Relevant statistics: Sum Dimension:
|
bulkRequestLatency.sum |
The total latency of requests sent to the plugin, including retries, in milliseconds. Relevant statistics: Sum Dimension:
|
bulkRequestLatency.max |
The maximum latency of requests sent to the plugin, including retries, in milliseconds. Relevant statistics: Max Dimension:
|
s3.dlqS3RecordsSuccess.count |
The number of records successfully sent to the S3 dead letter queue. Relevant statistics: Sum Dimension:
|
s3.dlqS3RecordsFailed.count |
The number of records that failed to be sent to the S3 dead letter queue. Relevant statistics: Sum Dimension:
|
s3.dlqS3RequestSuccess.count |
The number of successful requests to the S3 dead letter queue. Relevant statistics: Sum Dimension:
|
s3.dlqS3RequestFailed.count |
The number of failed requests to the S3 dead letter queue. Relevant statistics: Sum Dimension:
|
s3.dlqS3RequestLatency.count |
A count of data points recorded while requests are sent to the S3 dead letter queue, including retries. Relevant statistics: Sum Dimension:
|
s3.dlqS3RequestLatency.sum |
The total latency of requests sent to the S3 dead letter queue, including retries, in milliseconds. Relevant statistics: Sum Dimension:
|
s3.dlqS3RequestLatency.max |
The maximum latency of requests sent to the S3 dead letter queue, including retries, in milliseconds. Relevant statistics: Max Dimension:
|
s3.dlqS3RequestSizeBytes.count |
A count of the distribution of payload sizes of requests to the S3 dead letter queue, in bytes. Relevant statistics: Sum Dimension:
|
s3.dlqS3RequestSizeBytes.sum |
The total distribution of payload sizes of requests to the S3 dead letter queue, in bytes. Relevant statistics: Sum Dimension:
|
s3.dlqS3RequestSizeBytes.max |
The maximum distribution of payload sizes of requests to the S3 dead letter queue, in bytes. Relevant statistics: Max Dimension:
|
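The document counters for the sink can be combined into a quick delivery summary. The sketch below assumes that documentsSuccess.count and documentErrors.count are disjoint and together cover every document the sink attempted, which this page doesn't state explicitly. The function name and sample values are illustrative.

```python
from typing import Optional


def sink_delivery_summary(documents_success: float,
                          documents_success_first_attempt: float,
                          document_errors: float) -> Optional[dict]:
    """Summarize sink delivery from Sum values taken over the same period."""
    # Assumption: successes and errors are disjoint and cover all attempted documents.
    attempted = documents_success + document_errors
    if attempted <= 0:
        return None
    first_attempt_share = (
        documents_success_first_attempt / documents_success if documents_success > 0 else 0.0
    )
    return {
        "error_rate": document_errors / attempted,
        "first_attempt_share": first_attempt_share,
        "needed_retry": documents_success - documents_success_first_attempt,
    }


summary = sink_delivery_summary(950, 900, 50)  # illustrative values
print(summary["needed_retry"])  # 50
```

A low first-attempt share with a low error rate suggests that retries are absorbing transient failures, while a rising error rate points to documents ending up in the dead letter queue metrics above.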
System and metering metrics
The following metrics apply to the overall OpenSearch Ingestion system. These metrics aren't prefixed by anything.
Metric | Description |
---|---|
system.cpu.usage.value |
The percentage of available CPU usage for all data nodes. Relevant statistics: Average Dimension:
|
system.cpu.count.value |
The total amount of CPU usage for all data nodes. Relevant statistics: Average Dimension:
|
jvm.memory.max.value |
The maximum amount of memory that can be used for memory management, in bytes. Relevant statistics: Average Dimension:
|
jvm.memory.used.value |
The total amount of memory used, in bytes. Relevant statistics: Average Dimension:
|
jvm.memory.committed.value |
The amount of memory that is committed for use by the Java virtual machine (JVM), in bytes. Relevant statistics: Average Dimension:
|
computeUnits |
The number of Ingestion OpenSearch Compute Units (Ingestion OCUs) in use by a pipeline. Relevant statistics: Max, Sum, Average Dimension:
|