[Backport 2.9] [DOC] Add ingest processors documentation #4941

Merged
merged 1 commit into from
Aug 29, 2023
100 changes: 100 additions & 0 deletions _api-reference/ingest-apis/create-ingest.md
@@ -0,0 +1,100 @@
---
layout: default
title: Create pipeline
parent: Ingest pipelines
grand_parent: Ingest APIs
nav_order: 10
redirect_from:
- /opensearch/rest-api/ingest-apis/create-update-ingest/
---

# Create pipeline

Use the create pipeline API operation to create or update pipelines in OpenSearch. A pipeline must define at least one processor that specifies how documents are transformed.

## Path and HTTP method

Replace `<pipeline-id>` with your pipeline ID:

```json
PUT _ingest/pipeline/<pipeline-id>
```

#### Example request

The following example request creates an ingest pipeline with two `set` processors and one `uppercase` processor. The first `set` processor sets `grad_year` to `2023`, the second sets `graduated` to `true`, and the `uppercase` processor converts the value of the `name` field to uppercase.

```json
PUT _ingest/pipeline/my-pipeline
{
  "description": "This pipeline processes student data",
  "processors": [
    {
      "set": {
        "description": "Sets the graduation year to 2023",
        "field": "grad_year",
        "value": 2023
      }
    },
    {
      "set": {
        "description": "Sets graduated to true",
        "field": "graduated",
        "value": true
      }
    },
    {
      "uppercase": {
        "field": "name"
      }
    }
  ]
}
```
{% include copy-curl.html %}
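
If you create pipelines from a script rather than the console, the same request can be assembled with Python's standard library. This is a minimal sketch, not an official client: the cluster address `http://localhost:9200` and the helper name `build_create_pipeline_request` are assumptions for illustration, and the request is only constructed here, not sent.

```python
import json
import urllib.request

# Assumed cluster address; replace with your own endpoint.
BASE_URL = "http://localhost:9200"

# The same pipeline definition as the PUT request above.
PIPELINE = {
    "description": "This pipeline processes student data",
    "processors": [
        {"set": {"description": "Sets the graduation year to 2023",
                 "field": "grad_year", "value": 2023}},
        {"set": {"description": "Sets graduated to true",
                 "field": "graduated", "value": True}},
        {"uppercase": {"field": "name"}},
    ],
}


def build_create_pipeline_request(pipeline_id: str, body: dict) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) PUT _ingest/pipeline/<pipeline-id>."""
    return urllib.request.Request(
        url=f"{BASE_URL}/_ingest/pipeline/{pipeline_id}",
        data=json.dumps(body).encode("utf-8"),  # request body as JSON bytes
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_create_pipeline_request("my-pipeline", PIPELINE)
# To actually send it: urllib.request.urlopen(request)
print(request.get_method(), request.full_url)
```

Because `PUT` creates or replaces the pipeline, rerunning the script with an edited `PIPELINE` dictionary updates the existing configuration.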

To learn more about error handling, see [Handling pipeline failures]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/pipeline-failures/).

## Request body fields

The following table lists the request body fields used to create or update a pipeline.

Parameter | Required | Type | Description
:--- | :--- | :--- | :---
`processors` | Required | Array of processor objects | An array of processors, each of which transforms documents. Processors are run sequentially in the order specified.
`description` | Optional | String | A description of your ingest pipeline.
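
To make the sequential ordering concrete, the following Python sketch applies the example pipeline's processors to a document one after another, so each processor sees the output of the previous one. It is a hypothetical local emulation (the `apply_pipeline` function is not part of OpenSearch) that supports only the `set` and `uppercase` processors used above; it does not reflect how OpenSearch implements ingest processing.

```python
import copy

# Processors from the example pipeline, descriptions omitted.
PROCESSORS = [
    {"set": {"field": "grad_year", "value": 2023}},
    {"set": {"field": "graduated", "value": True}},
    {"uppercase": {"field": "name"}},
]


def apply_pipeline(doc: dict, processors: list) -> dict:
    """Run each processor in list order on a copy of the document."""
    result = copy.deepcopy(doc)  # leave the caller's document untouched
    for processor in processors:
        # Each processor object has exactly one key: the processor type.
        kind, config = next(iter(processor.items()))
        if kind == "set":
            result[config["field"]] = config["value"]
        elif kind == "uppercase":
            result[config["field"]] = result[config["field"]].upper()
        else:
            raise ValueError(f"processor not emulated in this sketch: {kind}")
    return result


print(apply_pipeline({"name": "John Doe"}, PROCESSORS))
# {'name': 'JOHN DOE', 'grad_year': 2023, 'graduated': True}
```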

## Path parameters

Parameter | Required | Type | Description
:--- | :--- | :--- | :---
`pipeline-id` | Required | String | The unique identifier, or pipeline ID, assigned to the ingest pipeline.

## Query parameters

Parameter | Required | Type | Description
:--- | :--- | :--- | :---
`cluster_manager_timeout` | Optional | Time | Period to wait for a connection to the cluster manager node. Defaults to 30 seconds.
`timeout` | Optional | Time | Period to wait for a response. Defaults to 30 seconds.
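
Both query parameters take OpenSearch time values such as `30s` or `1m`. As a sketch, assuming a cluster at `http://localhost:9200` and a hypothetical helper named `pipeline_url`, the parameters can be appended to the pipeline URL with Python's standard library:

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:9200"  # assumed cluster address


def pipeline_url(pipeline_id: str, **params: str) -> str:
    """Return the _ingest/pipeline URL for an ID, plus optional query parameters."""
    # quote() percent-encodes characters in the ID that are not URL-safe.
    url = f"{BASE_URL}/_ingest/pipeline/{quote(pipeline_id)}"
    if params:
        url += "?" + urlencode(params)
    return url


print(pipeline_url("my-pipeline", cluster_manager_timeout="1m", timeout="45s"))
# http://localhost:9200/_ingest/pipeline/my-pipeline?cluster_manager_timeout=1m&timeout=45s
```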

## Template snippets

Some processor parameters support [Mustache](https://mustache.github.io/) template snippets. To get the value of a field, surround the field name with three curly braces, for example, `{% raw %}{{{field-name}}}{% endraw %}`.

#### Example: `set` ingest processor using Mustache template snippet

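The following example uses template snippets in both parameters of a `set` processor: the name of the field to set is read from the document's `role` field (`{% raw %}{{{role}}}{% endraw %}`), and the value is read from the document's `tenure` field (`{% raw %}{{{tenure}}}{% endraw %}`):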

```json
PUT _ingest/pipeline/my-pipeline
{
  "processors": [
    {
      "set": {
        "field": "{% raw %}{{{role}}}{% endraw %}",
        "value": "{% raw %}{{{tenure}}}{% endraw %}"
      }
    }
  ]
}
```
{% include copy-curl.html %}
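
The following Python sketch mimics how the triple-brace snippets above are resolved against the incoming document before the `set` processor writes the field. It is a hypothetical emulation (the `render` and `set_with_templates` names and the sample document values are made up), handles only top-level field names, and is not OpenSearch's Mustache engine.

{% raw %}
```python
import re

# Matches a triple-brace snippet such as {{{role}}} and captures the field name.
SNIPPET = re.compile(r"\{\{\{\s*(\w+)\s*\}\}\}")


def render(template: str, doc: dict) -> str:
    """Replace each triple-brace snippet with the document's field value."""
    # Triple braces insert the value without HTML escaping;
    # Mustache renders a missing field as an empty string.
    return SNIPPET.sub(lambda m: str(doc.get(m.group(1), "")), template)


def set_with_templates(doc: dict, field: str, value: str) -> dict:
    """Emulate a `set` processor whose field and value are both templates."""
    result = dict(doc)
    result[render(field, doc)] = render(value, doc)
    return result


doc = {"role": "engineer", "tenure": "5 years"}
print(set_with_templates(doc, "{{{role}}}", "{{{tenure}}}"))
# {'role': 'engineer', 'tenure': '5 years', 'engineer': '5 years'}
```
{% endraw %}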
79 changes: 0 additions & 79 deletions _api-reference/ingest-apis/create-update-ingest.md

This file was deleted.

43 changes: 13 additions & 30 deletions _api-reference/ingest-apis/delete-ingest.md
@@ -1,44 +1,27 @@
---
layout: default
title: Delete a pipeline
parent: Ingest APIs
nav_order: 14
title: Delete pipeline
parent: Ingest pipelines
grand_parent: Ingest APIs
nav_order: 13
redirect_from:
- /opensearch/rest-api/ingest-apis/delete-ingest/
---

# Delete a pipeline
# Delete pipeline

If you no longer want to use an ingest pipeline, use the delete ingest pipeline API operation.
Use the following request to delete a pipeline.

## Example
To delete a specific pipeline, pass the pipeline ID as a parameter:

```
DELETE _ingest/pipeline/12345
```json
DELETE /_ingest/pipeline/<pipeline-id>
```
{% include copy-curl.html %}

## Path and HTTP methods

Delete an ingest pipeline based on that pipeline's ID.

```
DELETE _ingest/pipeline/
```

## URL parameters

All URL parameters are optional.

Parameter | Type | Description
:--- | :--- | :---
master_timeout | time | How long to wait for a connection to the master node.
timeout | time | How long to wait for the request to return.

## Response
To delete all pipelines in a cluster, use the wildcard character (`*`):

```json
{
"acknowledged" : true
}
```
DELETE /_ingest/pipeline/*
```
{% include copy-curl.html %}
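
Because the wildcard form removes every pipeline in the cluster, a script may want an explicit opt-in before issuing it. The following Python sketch shows one way to do that; the cluster address, the helper name `build_delete_pipeline_request`, and its `allow_all` flag are assumptions for illustration, and the requests are only built, not sent.

```python
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed cluster address


def build_delete_pipeline_request(pipeline_id: str, allow_all: bool = False) -> urllib.request.Request:
    """Build (but do not send) DELETE /_ingest/pipeline/<pipeline-id>."""
    # Refuse wildcard IDs unless the caller explicitly opts in.
    if "*" in pipeline_id and not allow_all:
        raise ValueError("wildcard deletes require allow_all=True")
    return urllib.request.Request(f"{BASE_URL}/_ingest/pipeline/{pipeline_id}", method="DELETE")


single = build_delete_pipeline_request("my-pipeline")
everything = build_delete_pipeline_request("*", allow_all=True)
print(single.get_method(), single.full_url)
print(everything.get_method(), everything.full_url)
```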
71 changes: 37 additions & 34 deletions _api-reference/ingest-apis/get-ingest.md
@@ -1,59 +1,62 @@
---
layout: default
title: Get ingest pipeline
parent: Ingest APIs
nav_order: 10
title: Get pipeline
parent: Ingest pipelines
grand_parent: Ingest APIs
nav_order: 12
redirect_from:
- /opensearch/rest-api/ingest-apis/get-ingest/
---

## Get ingest pipeline
# Get pipeline

After you create a pipeline, use the get ingest pipeline API operation to return all the information about a specific ingest pipeline.
Use the get pipeline API operation to retrieve the configuration of a specific ingest pipeline or of all ingest pipelines.

## Example
## Retrieving information about all pipelines

```
GET _ingest/pipeline/12345
The following example request returns information about all ingest pipelines:

```json
GET _ingest/pipeline/
```
{% include copy-curl.html %}

## Path and HTTP methods
## Retrieving information about a specific pipeline

Return all ingest pipelines.
The following example request returns information about the `my-pipeline` pipeline:

```json
GET _ingest/pipeline/my-pipeline
```
GET _ingest/pipeline
```

Returns a single ingest pipeline based on the pipeline's ID.

```
GET _ingest/pipeline/{id}
```

## URL parameters

All parameters are optional.

Parameter | Type | Description
:--- | :--- | :---
master_timeout | time | How long to wait for a connection to the master node.
{% include copy-curl.html %}

## Response
The response contains the pipeline information:

```json
{
"pipeline-id" : {
"description" : "A description for your pipeline",
"processors" : [
"my-pipeline": {
"description": "This pipeline processes student data",
"processors": [
{
"set" : {
"field" : "field-name",
"value" : "value"
"set": {
"description": "Sets the graduation year to 2023",
"field": "grad_year",
"value": 2023
}
},
{
"set": {
"description": "Sets graduated to true",
"field": "graduated",
"value": true
}
},
{
"uppercase": {
"field": "name"
}
}
]
}
}
```
```
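
The response is a JSON object keyed by pipeline ID, so a script can summarize it directly. This Python sketch (the `processor_types` function name is hypothetical) lists each pipeline's processor types in execution order, using an abbreviated copy of the example response above:

```python
import json

# Abbreviated copy of the example response (processor descriptions omitted).
RESPONSE = """
{
  "my-pipeline": {
    "description": "This pipeline processes student data",
    "processors": [
      {"set": {"field": "grad_year", "value": 2023}},
      {"set": {"field": "graduated", "value": true}},
      {"uppercase": {"field": "name"}}
    ]
  }
}
"""


def processor_types(response_body: str) -> dict:
    """Map each pipeline ID to the ordered list of its processor types."""
    pipelines = json.loads(response_body)
    return {
        # The single key of each processor object is its type.
        pipeline_id: [next(iter(p)) for p in definition.get("processors", [])]
        for pipeline_id, definition in pipelines.items()
    }


print(processor_types(RESPONSE))
# {'my-pipeline': ['set', 'set', 'uppercase']}
```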
11 changes: 9 additions & 2 deletions _api-reference/ingest-apis/index.md
@@ -9,6 +9,13 @@ redirect_from:

# Ingest APIs

Before you index your data, OpenSearch's ingest APIs help transform your data by creating and managing ingest pipelines. Pipelines consist of **processors**, customizable tasks that run in the order they appear in the request body. The transformed data appears in your index after each of the processors completes.
Ingest APIs are a valuable tool for loading data into a system. Ingest APIs work together with [ingest pipelines]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/) and [ingest processors]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-processors/) to process or transform data from a variety of sources and in a variety of formats.

Ingest pipelines in OpenSearch can only be managed using ingest API operations. When using ingest in production environments, your cluster should contain at least one node with the node roles permission set to `ingest`. For more information on setting up node roles within a cluster, see [Cluster Formation]({{site.url}}{{site.baseurl}}/opensearch/cluster/).
## Ingest pipeline APIs

Simplify, secure, and scale your OpenSearch data ingestion with the following APIs:

- [Create pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/create-ingest/): Use this API to create or update a pipeline configuration.
- [Get pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/get-ingest/): Use this API to retrieve a pipeline configuration.
- [Simulate pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/simulate-ingest/): Use this API to test a pipeline configuration.
- [Delete pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/delete-ingest/): Use this API to delete a pipeline configuration.
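
All four operations share the `_ingest/pipeline` path and differ mainly in HTTP method. The following Python map is a quick-reference sketch: the create, get, and delete entries match the requests shown in those pages, while the simulate path (`_ingest/pipeline/<pipeline-id>/_simulate`, which also accepts `GET`) comes from the OpenSearch simulate pipeline API rather than from this page. The `endpoint` helper is hypothetical.

```python
# Quick-reference map: operation -> (HTTP method, path template).
INGEST_PIPELINE_APIS = {
    "create": ("PUT", "/_ingest/pipeline/{pipeline_id}"),
    "get": ("GET", "/_ingest/pipeline/{pipeline_id}"),
    "simulate": ("POST", "/_ingest/pipeline/{pipeline_id}/_simulate"),
    "delete": ("DELETE", "/_ingest/pipeline/{pipeline_id}"),
}


def endpoint(operation: str, pipeline_id: str) -> str:
    """Return the request line for an operation, such as 'GET /_ingest/pipeline/x'."""
    method, path = INGEST_PIPELINE_APIS[operation]
    return f"{method} {path.format(pipeline_id=pipeline_id)}"


for op in INGEST_PIPELINE_APIS:
    print(endpoint(op, "my-pipeline"))
```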