Key concepts
As you get started with OpenSearch Ingestion, you can benefit from understanding the following concepts:
- Pipeline
-
From an OpenSearch Ingestion perspective, a pipeline refers to a single provisioned data collector that you create within OpenSearch Service. You can think of it as the entire YAML configuration file, which includes one or more sub-pipelines. For steps to create an ingestion pipeline, see Creating pipelines.
- Sub-pipeline
-
You define sub-pipelines within a YAML configuration file. Each sub-pipeline is a combination of a source, a buffer, zero or more processors, and one or more sinks. You can define multiple sub-pipelines in a single YAML file, each with unique sources, processors, and sinks. To aid in monitoring with CloudWatch and other services, we recommend that you specify a pipeline name that's distinct from the names of all of its sub-pipelines.
You can string multiple sub-pipelines together within a single YAML file, so that the source for one sub-pipeline is another sub-pipeline, and its sink is a third sub-pipeline. For an example, see Using an OpenSearch Ingestion pipeline with OpenTelemetry Collector.
- Source
-
The input component of a sub-pipeline. It defines the mechanism through which a pipeline consumes records. The source can consume events either by receiving them over HTTPS, or by reading from external endpoints such as Amazon S3. There are two types of sources: push-based and pull-based. Push-based sources, such as HTTP and OTel logs, stream records to ingestion endpoints. Pull-based sources, such as OTel trace and S3, pull data from the source.
- Processors
-
Intermediate processing units that can filter, transform, and enrich records into a desired format before publishing them to the sink. The processor is an optional component of a pipeline. If you don't define a processor, records are published in the format defined in the source. You can have more than one processor. A pipeline runs processors in the order that you define them.
- Sink
-
The output component of a sub-pipeline. It defines one or more destinations that a sub-pipeline publishes records to. OpenSearch Ingestion supports OpenSearch Service domains as sinks. It also supports sub-pipelines as sinks. This means that you can string together multiple sub-pipelines within a single OpenSearch Ingestion pipeline (YAML file). Self-managed OpenSearch clusters aren't supported as sinks.
- Buffer
-
The component of a sub-pipeline that acts as the layer between the source and the sink. You can't manually configure a buffer within your pipeline. OpenSearch Ingestion uses a default buffer configuration.
- Route
-
The part of a sub-pipeline that lets pipeline authors send only the events that match certain conditions to specific sinks.
A valid sub-pipeline definition must contain a source and a sink. For more information about each of these pipeline elements, see the configuration reference.
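To make these concepts concrete, the following is a minimal sketch of a pipeline configuration with two sub-pipelines strung together. It is illustrative only. The sub-pipeline names, the ingestion path, the `log_type` field, the domain endpoint, and the IAM role ARN are all hypothetical placeholders. The option names follow Data Prepper conventions as we understand them, so verify each one against the configuration reference before you use it.

```yaml
version: "2"

# Sub-pipeline 1: a push-based HTTP source receives records,
# processors transform them, and the sink is another sub-pipeline.
log-entry-pipeline:
  source:
    http:
      path: "/logs/ingest"          # hypothetical ingestion path
  processor:
    # Processors run in the order that they are listed.
    - grok:
        match:
          log: ["%{COMMONAPACHELOG}"]
    - date:
        from_time_received: true
        destination: "@timestamp"
  sink:
    - pipeline:
        name: "log-routing-pipeline"

# Sub-pipeline 2: its source is sub-pipeline 1. Routes decide
# which sink receives each event.
log-routing-pipeline:
  source:
    pipeline:
      name: "log-entry-pipeline"
  route:
    # Assumes that incoming events carry a hypothetical log_type field.
    - application-logs: '/log_type == "application"'
    - web-logs: '/log_type == "web"'
  sink:
    - opensearch:
        hosts: ["https://search-example-domain.us-east-1.es.amazonaws.com"]
        index: "application-logs"
        routes: ["application-logs"]
        aws:
          sts_role_arn: "arn:aws:iam::123456789012:role/example-pipeline-role"
          region: "us-east-1"
    - opensearch:
        hosts: ["https://search-example-domain.us-east-1.es.amazonaws.com"]
        index: "web-logs"
        routes: ["web-logs"]
        aws:
          sts_role_arn: "arn:aws:iam::123456789012:role/example-pipeline-role"
          region: "us-east-1"
```

The sketch defines no buffer, because OpenSearch Ingestion applies its default buffer configuration. Each sub-pipeline contains the required source and sink, and only the first one defines processors.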