Supported plugins and options for Amazon OpenSearch Ingestion pipelines
Amazon OpenSearch Ingestion supports a subset of sources, processors, and sinks compared to open source Data Prepper. In addition, there are some constraints that OpenSearch Ingestion places on the available options for each supported plugin. The following sections describe the plugins and associated options that OpenSearch Ingestion supports.
Note
OpenSearch Ingestion doesn't support any buffer plugins because it automatically configures a default buffer. You receive a validation error if you include a buffer in your pipeline configuration.
Supported plugins
OpenSearch Ingestion supports the following Data Prepper plugins:
Sources:
Processors:
- Mutate event (series of processors)
- Mutate string (series of processors)
Sinks:
- OpenSearch (supports OpenSearch Service, OpenSearch Serverless, and Elasticsearch 6.8 or later)
Sink codecs:
Stateless versus stateful processors
Stateless processors perform operations like transformations and filtering, while stateful processors perform operations like aggregations, which remember the result of the previous run. OpenSearch Ingestion supports stateful processors such as Aggregate.
For pipelines that contain only stateless processors, the maximum capacity limit is 96 Ingestion OCUs. If a pipeline contains any stateful processors, the maximum capacity limit is 48 Ingestion OCUs. However, if a pipeline has persistent buffering enabled, it can have a maximum of 384 Ingestion OCUs with only stateless processors, or 192 Ingestion OCUs if it contains any stateful processors. For more information, see Scaling pipelines.
End-to-end acknowledgment is only supported for stateless processors. For more information, see End-to-end acknowledgement.
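The capacity limits above reduce to a small lookup on two yes/no properties of a pipeline. The following is a minimal sketch of that lookup; the function name max_ingestion_ocus and its parameters are hypothetical and are not part of any OpenSearch Ingestion API.

```python
def max_ingestion_ocus(has_stateful_processor: bool, persistent_buffering: bool) -> int:
    """Return the maximum capacity limit, in Ingestion OCUs, described above.

    Hypothetical helper for illustration only; the limits are the ones
    stated in this section.
    """
    if persistent_buffering:
        # Persistent buffering raises both limits.
        return 192 if has_stateful_processor else 384
    return 48 if has_stateful_processor else 96


if __name__ == "__main__":
    for stateful in (False, True):
        for buffering in (False, True):
            limit = max_ingestion_ocus(stateful, buffering)
            print(f"stateful={stateful} persistent_buffering={buffering} limit={limit}")
```

A single stateful processor anywhere in the pipeline is enough to select the lower limit in each pair.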
Configuration requirements and constraints
Unless otherwise specified below, all options described in the Data Prepper configuration reference for the supported plugins listed above are allowed in OpenSearch Ingestion pipelines. The following sections explain the constraints that OpenSearch Ingestion places on certain plugin options.
Many options are configured and managed internally by OpenSearch Ingestion, such as authentication and acm_certificate_arn. Other options, such as thread_count and request_timeout, have performance impacts if changed manually. Therefore, these values are set internally to ensure optimal performance of your pipelines.
Lastly, some options can't be passed to OpenSearch Ingestion, such as ism_policy_file and sink_template, because they're local files when run in open source Data Prepper. These values aren't supported.
General pipeline options
The following general pipeline options are set by OpenSearch Ingestion and aren't supported in pipeline configurations:
- workers
- delay
Grok processor
The following Grok processor options aren't supported:
- patterns_directories
- patterns_files_glob
HTTP source
The HTTP source has the following requirements and constraints:
- The path option is required. The path is a string such as /log/ingest, which represents the URI path for log ingestion. This path defines the URI that you use to send data to the pipeline. For example, https://log-pipeline.us-west-2.osis.amazonaws.com/log/ingest. The path must start with a slash (/), and can contain the special characters '-', '_', '.', and '/', as well as the ${pipelineName} placeholder.
- The following HTTP source options are set by OpenSearch Ingestion and aren't supported in pipeline configurations:
  - port
  - ssl
  - ssl_key_file
  - ssl_certificate_file
  - aws_region
  - authentication
  - unauthenticated_health_check
  - use_acm_certificate_for_ssl
  - thread_count
  - request_timeout
  - max_connection_count
  - max_pending_requests
  - health_check_service
  - acm_private_key_password
  - acm_certificate_timeout_millis
  - acm_certificate_arn
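To make the HTTP source rules concrete, the following sketch checks a source configuration, represented as a Python dictionary, against them before you submit it. It is an illustration under stated assumptions, not the validation that OpenSearch Ingestion itself performs: the names is_valid_path and check_http_source are hypothetical, and the sketch assumes that letters and digits are allowed in the path alongside the listed special characters.

```python
import re

# Options that this section lists as set by OpenSearch Ingestion.
MANAGED_HTTP_OPTIONS = {
    "port", "ssl", "ssl_key_file", "ssl_certificate_file", "aws_region",
    "authentication", "unauthenticated_health_check",
    "use_acm_certificate_for_ssl", "thread_count", "request_timeout",
    "max_connection_count", "max_pending_requests", "health_check_service",
    "acm_private_key_password", "acm_certificate_timeout_millis",
    "acm_certificate_arn",
}

PLACEHOLDER = "${pipelineName}"
# Assumption: alphanumerics are allowed in addition to '-', '_', '.', and '/'.
_PATH_RE = re.compile(r"/[A-Za-z0-9\-_./]*")


def is_valid_path(path: str) -> bool:
    """Check that the path starts with '/' and uses only allowed characters."""
    # Swap the placeholder for a plain letter so that only the literal
    # characters of the path are matched against the pattern.
    literal = path.replace(PLACEHOLDER, "p")
    return _PATH_RE.fullmatch(literal) is not None


def check_http_source(options: dict) -> list[str]:
    """Return a list of problems found in an HTTP source option mapping."""
    problems = []
    path = options.get("path")
    if not isinstance(path, str):
        problems.append("path is required and must be a string")
    elif not is_valid_path(path):
        problems.append(f"invalid path: {path!r}")
    for name in sorted(MANAGED_HTTP_OPTIONS.intersection(options)):
        problems.append(f"{name} is set by OpenSearch Ingestion and isn't supported")
    return problems


if __name__ == "__main__":
    print(check_http_source({"path": "/${pipelineName}/logs"}))
    print(check_http_source({"path": "log/ingest", "port": 2021}))
```

An empty list means that the sketch found nothing to report; it doesn't guarantee that the service accepts the configuration.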
OpenSearch sink
The OpenSearch sink has the following requirements and limitations:
- The aws option is required, and must contain the following options:
  - sts_role_arn
  - region
  - hosts
  - serverless (if the sink is an OpenSearch Serverless collection)
- The sts_role_arn option must point to the same role for each sink within a YAML definition file.
- The hosts option must specify an OpenSearch Service domain endpoint or an OpenSearch Serverless collection endpoint. You can't specify a custom endpoint for a domain; it must be the standard endpoint.
- If the hosts option is a serverless collection endpoint, you must set the serverless option to true. In addition, if your YAML definition file contains the index_type option, it must be set to management_disabled; otherwise, validation fails.
- The following options aren't supported:
  - username
  - password
  - cert
  - proxy
  - dlq_file: If you want to offload failed events to a dead letter queue (DLQ), you must use the dlq option and specify an S3 bucket.
  - ism_policy_file
  - socket_timeout
  - template_file
  - insecure
  - bulk_size
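Several of the sink rules above span more than one sink, so a quick pre-submission check can help. The sketch below takes the option mappings of every OpenSearch sink in a definition file and reports violations. The function name check_opensearch_sinks is hypothetical. Two simplifications are assumptions of this sketch: it doesn't inspect hosts at all, and it uses the serverless flag as a stand-in for "hosts is a collection endpoint".

```python
# Options that this section lists as unsupported for the OpenSearch sink.
UNSUPPORTED_SINK_OPTIONS = {
    "username", "password", "cert", "proxy", "dlq_file", "ism_policy_file",
    "socket_timeout", "template_file", "insecure", "bulk_size",
}


def check_opensearch_sinks(sinks: list[dict]) -> list[str]:
    """Return problems found across all OpenSearch sink option mappings."""
    problems = []
    roles = set()
    for i, sink in enumerate(sinks):
        aws = sink.get("aws")
        if not isinstance(aws, dict):
            problems.append(f"sink {i}: the aws option is required")
            aws = {}
        else:
            for name in ("sts_role_arn", "region"):
                if name not in aws:
                    problems.append(f"sink {i}: aws.{name} is required")
        if "sts_role_arn" in aws:
            roles.add(aws["sts_role_arn"])
        # index_type is optional, but if present for a serverless sink it
        # must be management_disabled.
        index_type = sink.get("index_type", "management_disabled")
        if aws.get("serverless") is True and index_type != "management_disabled":
            problems.append(f"sink {i}: index_type must be management_disabled")
        for name in sorted(UNSUPPORTED_SINK_OPTIONS.intersection(sink)):
            problems.append(f"sink {i}: {name} isn't supported")
    if len(roles) > 1:
        problems.append("all sinks must use the same sts_role_arn")
    return problems


if __name__ == "__main__":
    role = "arn:aws:iam::123456789012:role/pipeline-role"  # placeholder ARN
    sink = {"index": "logs", "aws": {"sts_role_arn": role, "region": "us-west-2"}}
    print(check_opensearch_sinks([sink, dict(sink, username="admin")]))
```

Collecting the role ARNs in a set keeps the same-role rule independent of how many sinks the file defines.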
OTel metrics source, OTel trace source, and OTel logs source
The OTel metrics source, OTel trace source, and OTel logs source have the following requirements and constraints:
- The path option is required. The path is a string such as /log/ingest, which represents the URI path for log ingestion. This path defines the URI that you use to send data to the pipeline. For example, https://log-pipeline.us-west-2.osis.amazonaws.com/log/ingest. The path must start with a slash (/), and can contain the special characters '-', '_', '.', and '/', as well as the ${pipelineName} placeholder.
- The following options are set by OpenSearch Ingestion and aren't supported in pipeline configurations:
  - port
  - ssl
  - sslKeyFile
  - sslKeyCertChainFile
  - authentication
  - unauthenticated_health_check
  - useAcmCertForSSL
  - unframed_requests
  - proto_reflection_service
  - thread_count
  - request_timeout
  - max_connection_count
  - acmPrivateKeyPassword
  - acmCertIssueTimeOutMillis
  - health_check_service
  - acmCertificateArn
  - awsRegion
OTel trace group processor
The OTel trace group processor has the following requirements and constraints:
- The aws option is required, and must contain the following options:
  - sts_role_arn
  - region
  - hosts
- The sts_role_arn option must specify the same role as the pipeline role that you specify in the OpenSearch sink configuration.
- The username, password, cert, and insecure options aren't supported.
- The aws_sigv4 option is required and must be set to true.
- The serverless option within the OpenSearch sink plugin isn't supported. The OTel trace group processor doesn't currently work with OpenSearch Serverless collections.
- The number of otel_trace_group processors within the pipeline configuration body can't exceed 8.
OTel trace processor
The OTel trace processor has the following requirements and constraints:
- The value of the trace_flush_interval option can't exceed 300 seconds.
Service-map processor
The Service-map processor has the following requirements and constraints:
- The value of the window_duration option can't exceed 300 seconds.
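The numeric limits in the three preceding processor sections (at most 8 otel_trace_group processors, and at most 300 seconds for trace_flush_interval and window_duration) can be checked in a single pass over a pipeline's processor list. The sketch below is hypothetical: the function name check_processors is made up, the processor list is assumed to be a list of single-key mappings of the form {plugin_name: options}, and the interval values are assumed to be plain numbers of seconds. It matches on option names only, so it makes no claim about the plugin names that carry those options.

```python
MAX_TRACE_GROUP_PROCESSORS = 8
# Upper bounds, in seconds, for the options named in this section.
MAX_SECONDS = {"trace_flush_interval": 300, "window_duration": 300}


def check_processors(processors: list[dict]) -> list[str]:
    """Return problems found in a list of {plugin_name: options} mappings."""
    problems = []
    trace_groups = 0
    for entry in processors:
        for plugin, options in entry.items():
            if plugin == "otel_trace_group":
                trace_groups += 1
            # A processor with no options may be given as None.
            options = options or {}
            for name, limit in MAX_SECONDS.items():
                value = options.get(name)
                if value is not None and value > limit:
                    problems.append(f"{plugin}: {name} can't exceed {limit} seconds")
    if trace_groups > MAX_TRACE_GROUP_PROCESSORS:
        problems.append(
            f"at most {MAX_TRACE_GROUP_PROCESSORS} otel_trace_group processors are allowed"
        )
    return problems


if __name__ == "__main__":
    # "example_processor" is a placeholder plugin name, not a real plugin.
    print(check_processors([{"example_processor": {"window_duration": 600}}]))
```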
S3 source
The S3 source has the following requirements and constraints:
- The aws option is required, and must contain region and sts_role_arn options.
- The value of the records_to_accumulate option can't exceed 200.
- The value of the maximum_messages option can't exceed 10.
- If specified, the disable_bucket_ownership_validation option must be set to false.
- If specified, the input_serialization option must be set to parquet.
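The S3 source constraints combine a required block, two upper bounds, and two fixed values. The following sketch checks all of them against an option mapping. The names find_option and check_s3_source are hypothetical. Because this section doesn't say where each option is nested inside the source configuration, the sketch searches the whole mapping by option name instead of assuming a location.

```python
_MISSING = object()  # sentinel: distinguishes "not set" from a value of None


def find_option(mapping: dict, name: str):
    """Depth-first search for an option name in nested dictionaries."""
    if name in mapping:
        return mapping[name]
    for value in mapping.values():
        if isinstance(value, dict):
            found = find_option(value, name)
            if found is not _MISSING:
                return found
    return _MISSING


def check_s3_source(options: dict) -> list[str]:
    """Return a list of problems found in an S3 source option mapping."""
    problems = []
    aws = options.get("aws")
    if not isinstance(aws, dict):
        problems.append("the aws option is required")
    else:
        for name in ("region", "sts_role_arn"):
            if name not in aws:
                problems.append(f"aws.{name} is required")
    # Upper bounds stated in this section.
    for name, limit in (("records_to_accumulate", 200), ("maximum_messages", 10)):
        value = find_option(options, name)
        if value is not _MISSING and value > limit:
            problems.append(f"{name} can't exceed {limit}")
    # Options that are optional but accept only one value when specified.
    fixed = (("disable_bucket_ownership_validation", False),
             ("input_serialization", "parquet"))
    for name, required in fixed:
        value = find_option(options, name)
        if value is not _MISSING and value != required:
            problems.append(f"{name} must be set to {required!r}")
    return problems


if __name__ == "__main__":
    role = "arn:aws:iam::123456789012:role/pipeline-role"  # placeholder ARN
    source = {"aws": {"region": "us-west-2", "sts_role_arn": role},
              "records_to_accumulate": 500}
    print(check_s3_source(source))
```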