[DOC]Add new dot expander processor doc #5631

Merged: 37 commits, merged on Jan 30, 2024

Changes shown from 6 commits

Commits:
- `1febfc9` Add new dot expander processor doc (vagimeli, Nov 18, 2023)
- `81eb142` Merge branch 'main' into dot-expander.md (vagimeli, Nov 28, 2023)
- `da2d4ad` Draft content for tech review (vagimeli, Nov 28, 2023)
- `e9b196f` Merge branch 'main' into dot-expander.md (vagimeli, Nov 28, 2023)
- `793b2ba` Merge branch 'main' into dot-expander.md (vagimeli, Dec 1, 2023)
- `7a7f886` Merge branch 'main' into dot-expander.md (vagimeli, Dec 12, 2023)
- `4406af3` Merge branch 'main' into dot-expander.md (vagimeli, Dec 21, 2023)
- `3e977ff` Address tech review feedback (vagimeli, Dec 21, 2023)
- `8184c9e` Merge branch 'main' into dot-expander.md (vagimeli, Dec 21, 2023)
- `b3f912a` Update _ingest-pipelines/processors/dot-expander.md (vagimeli, Dec 21, 2023)
- `5c85986` Update _ingest-pipelines/processors/dot-expander.md (vagimeli, Dec 21, 2023)
- `64b1001` Update _ingest-pipelines/processors/dot-expander.md (vagimeli, Dec 21, 2023)
- `96e0306` Update _ingest-pipelines/processors/dot-expander.md (vagimeli, Dec 21, 2023)
- `c5e32eb` Address doc review feedback (vagimeli, Dec 22, 2023)
- `d5042e1` Edit line 227 (vagimeli, Dec 22, 2023)
- `38c639b` Edit line 227 (vagimeli, Dec 22, 2023)
- `15006d6` Edit line 227 (vagimeli, Dec 22, 2023)
- `041423b` Address doc review comments (vagimeli, Jan 4, 2024)
- `340f72d` Update _ingest-pipelines/processors/dot-expander.md (vagimeli, Jan 4, 2024)
- `7319647` Merge branch 'main' into dot-expander.md (vagimeli, Jan 4, 2024)
- `f87d41c` Merge branch 'main' into dot-expander.md (vagimeli, Jan 10, 2024)
- `c003b67` Added path parameter and field name conflicts sections (kolchfa-aws, Jan 17, 2024)
- `ec4477c` Update _ingest-pipelines/processors/dot-expander.md (vagimeli, Jan 18, 2024)
- `dd9118a` Update _ingest-pipelines/processors/dot-expander.md (vagimeli, Jan 18, 2024)
- `586f704` Update _ingest-pipelines/processors/dot-expander.md (vagimeli, Jan 18, 2024)
- `29f3fb9` Update _ingest-pipelines/processors/dot-expander.md (vagimeli, Jan 18, 2024)
- `026af45` Update _ingest-pipelines/processors/dot-expander.md (vagimeli, Jan 18, 2024)
- `41dda97` Update _ingest-pipelines/processors/dot-expander.md (vagimeli, Jan 18, 2024)
- `feae233` Update _ingest-pipelines/processors/dot-expander.md (vagimeli, Jan 18, 2024)
- `53d0d0a` Update _ingest-pipelines/processors/dot-expander.md (vagimeli, Jan 18, 2024)
- `3dde50b` Update _ingest-pipelines/processors/dot-expander.md (vagimeli, Jan 18, 2024)
- `6a01fba` Update _ingest-pipelines/processors/dot-expander.md (vagimeli, Jan 18, 2024)
- `0d5c299` Address editorial review feedback (vagimeli, Jan 18, 2024)
- `b201c63` Merge branch 'main' into dot-expander.md (vagimeli, Jan 18, 2024)
- `fbb4146` Merge branch 'main' into dot-expander.md (vagimeli, Jan 18, 2024)
- `01d81b6` Merge branch 'main' into dot-expander.md (vagimeli, Jan 30, 2024)
- `6729218` Update dot-expander.md (vagimeli, Jan 30, 2024)
315 changes: 315 additions & 0 deletions _ingest-pipelines/processors/dot-expander.md
@@ -0,0 +1,315 @@
---
layout: default
title: Dot expander
parent: Ingest processors
nav_order: 65
---

# Dot expander

The `dot_expander` processor transforms fields whose names contain dots into object fields, making them accessible to other processors in the pipeline. Without this transformation, other processors cannot access these fields because they interpret the dots in a field name as a path to a nested field.

The following is the syntax for the `dot_expander` processor:

```json
{
  "dot_expander": {
    "field": "field.to.expand"
  }
}
```
{% include copy-curl.html %}
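As a minimal sketch of why expansion matters, the following pipeline expands `user.address.state` so that the `uppercase` processor that follows it can find the value. The pipeline name `expand-then-uppercase` is hypothetical. Without the `dot_expander` processor, the `uppercase` processor would look for a `user` object, not a top-level field named `user.address.state`, and would fail to find the field.

```json
PUT _ingest/pipeline/expand-then-uppercase
{
  "description": "Hypothetical example: expand a dotted field, then modify it",
  "processors": [
    {
      "dot_expander": {
        "field": "user.address.state"
      }
    },
    {
      "uppercase": {
        "field": "user.address.state"
      }
    }
  ]
}
```
{% include copy-curl.html %}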

## Configuration parameters

The following table lists the required and optional parameters for the `dot_expander` processor.

Parameter | Required/Optional | Description |
|-----------|-----------|-----------|
`field` | Required | The field to expand into an object field. If set to `*`, all top-level fields whose names contain dots are expanded. |
`path` | Optional | The object field that contains the field to expand. Required only if the field to expand is nested within another object field, because the `field` parameter only matches fields at the top level of the document or, if `path` is set, directly under the `path` object. |
`override` | Optional | Determines how the processor handles a conflict when the expanded field already exists as a nested field. If `false`, the processor merges the existing and expanded values into an array, preserving both. If `true`, the expanded field's value replaces the existing value. Default is `false`. |
`description` | Optional | A brief description of the processor. |
`if` | Optional | A condition for running this processor. |
`ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. |
`on_failure` | Optional | A list of processors to run if the processor fails. |
`tag` | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. |
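The following configuration is an illustrative sketch that combines several optional parameters. The `user` path, the `address.city` field, the tag value, and the `error_message` field are hypothetical. If expansion fails, the `on_failure` list runs a `set` processor that records the failure message in the document.

```json
{
  "dot_expander": {
    "description": "Hypothetical example: expand the dotted city field under the user object",
    "path": "user",
    "field": "address.city",
    "override": true,
    "tag": "expand-user-city",
    "on_failure": [
      {
        "set": {
          "field": "error_message",
          "value": "{{ _ingest.on_failure_message }}"
        }
      }
    ]
  }
}
```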

## Using the processor

Follow these steps to use the processor in a pipeline.

**Step 1: Create a pipeline.**

The following query creates a pipeline named `dot-expander` that expands the dotted `user.address.city` and `user.address.state` fields into a nested `user` object containing an `address` object with `city` and `state` fields:

```json
PUT /_ingest/pipeline/dot-expander
{
  "description": "Dot expander processor",
  "processors": [
    {
      "dot_expander": {
        "field": "user.address.city"
      }
    },
    {
      "dot_expander": {
        "field": "user.address.state"
      }
    }
  ]
}
```
{% include copy-curl.html %}
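To check that the pipeline was stored as expected, you can retrieve its definition by name:

```json
GET _ingest/pipeline/dot-expander
```
{% include copy-curl.html %}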

**Step 2 (Optional): Test the pipeline.**

It is recommended that you test your pipeline before you ingest documents.
{: .tip}

To test the pipeline, run the following query:

```json
POST _ingest/pipeline/dot-expander/_simulate
{
  "docs": [
    {
      "_index": "testindex1",
      "_id": "1",
      "_source": {
        "user.address.city": "Denver",
        "user.address.state": "CO"
      }
    }
  ]
}
```
{% include copy-curl.html %}

#### Response

The following example response confirms that the pipeline expands the dotted fields into nested objects:

```json
{
  "docs": [
    {
      "doc": {
        "_index": "testindex1",
        "_id": "1",
        "_source": {
          "user": {
            "address": {
              "city": "Denver",
              "state": "CO"
            }
          }
        },
        "_ingest": {
          "timestamp": "2023-11-17T23:49:27.597933805Z"
        }
      }
    }
  ]
}
```
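To see the effect of each `dot_expander` processor separately, you can add the `verbose` query parameter to the simulate request. The document in this sketch is an illustrative input containing both dotted fields; the verbose response lists the intermediate document produced by each processor in the pipeline.

```json
POST _ingest/pipeline/dot-expander/_simulate?verbose=true
{
  "docs": [
    {
      "_source": {
        "user.address.city": "Denver",
        "user.address.state": "CO"
      }
    }
  ]
}
```
{% include copy-curl.html %}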

**Step 3: Ingest a document.**

The following query ingests a document into an index named `testindex1`:

```json
PUT testindex1/_doc/1?pipeline=dot-expander
{
  "user.address.city": "Denver",
  "user.address.state": "CO"
}
```
{% include copy-curl.html %}

**Step 4 (Optional): Retrieve the document.**

To retrieve the document, run the following query:

```json
GET testindex1/_doc/1
```
{% include copy-curl.html %}

#### Response

The following response confirms that the document was indexed with the expanded fields:

```json
{
  "_index": "testindex1",
  "_id": "1",
  "_version": 58,
  "_seq_no": 57,
  "_primary_term": 30,
  "found": true,
  "_source": {
    "user": {
      "address": {
        "city": "Denver",
        "state": "CO"
      }
    }
  }
}
```

## Nested fields

If the expanded field already exists as a nested field, the processor merges the two values. By default (`override` set to `false`), the existing value and the expanded value are combined into an array; if the existing field holds a scalar value, it is converted into an array field. Consider the following document, which contains both a dotted `user.address.city` field and an existing `city` field nested under `user.address`:

```json
{
  "user.address.city": "Houston",
  "user": {
    "address": {
      "city": "Denver"
    }
  }
}
```

A `dot_expander` processor with `field` set to `user.address.city` transforms the document into the following, in which `city` is an array containing both values:

```json
{
  "user": {
    "address": {
      "city": ["Denver", "Houston"]
    }
  }
}
```

If you set the `override` parameter to `true`, the value of the expanded field replaces the existing value. Consider the following configuration:

```json
{
  "dot_expander": {
    "field": "user.address.city",
    "override": true
  }
}
```

Applied to the preceding document, this configuration produces the following result, in which the value of the expanded `user.address.city` field replaces the existing value:

```json
{
  "user": {
    "address": {
      "city": "Houston"
    }
  }
}
```
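To compare the merge and override behaviors yourself, you can run an inline simulation without creating a pipeline. The following request is a sketch using illustrative values: it applies `override: false` to a document that contains both a dotted `user.address.city` field and an existing nested `city` field, so the simulated document should contain both values in an array. Changing `override` to `true` in the same request should instead keep only the expanded value.

```json
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "dot_expander": {
          "field": "user.address.city",
          "override": false
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "user.address.city": "Houston",
        "user": {
          "address": {
            "city": "Denver"
          }
        }
      }
    }
  ]
}
```
{% include copy-curl.html %}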

If the `field` parameter is set to the wildcard `*`, the processor expands all top-level dotted field names, for example:

```json
{
  "dot_expander": {
    "field": "*"
  }
}
```

Consider the following document:

```json
{
  "user.address.city": "Denver",
  "user.address.state": "CO"
}
```

The `dot_expander` processor transforms the document into the following, merging both expanded fields into a single `user.address` object:

```json
{
  "user": {
    "address": {
      "city": "Denver",
      "state": "CO"
    }
  }
}
```

If the field to expand is nested within an object field whose name does not contain dots, use the `path` parameter to specify that object field, for example:

```json
{
  "dot_expander": {
    "path": "user.address",
    "field": "*"
  }
}
```

Consider the following document:

```json
{
  "user": {
    "address": {
      "city.one": "Denver",
      "city.two": "Houston",
      "state.one": "CO",
      "state.two": "TX"
    }
  }
}
```

The `dot_expander` processor transforms the document into the following:

```json
{
  "user": {
    "address": {
      "city": {
        "one": "Denver",
        "two": "Houston"
      },
      "state": {
        "one": "CO",
        "two": "TX"
      }
    }
  }
}
```
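To try the `path` parameter without creating a pipeline, you can simulate it inline. This sketch names a single dotted field relative to `path` instead of using `*`, and the field values are illustrative. In the simulated result, `city.one` should be expanded into a `city` object under `user.address`, while `city.two` keeps its dotted name because it is not the field named in `field`.

```json
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "dot_expander": {
          "path": "user.address",
          "field": "city.one"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "user": {
          "address": {
            "city.one": "Denver",
            "city.two": "Houston"
          }
        }
      }
    }
  ]
}
```
{% include copy-curl.html %}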

If any part of a dotted field name other than the final (leaf) part matches an existing field that holds a scalar value, you must rename the existing field before expanding the dotted field, because the ingest pipeline cannot automatically convert a scalar field into an object field. Consider the following document, in which `user.address` holds a string value, while the dotted `user.address.city` and `user.address.state` fields require `user.address` to be an object:

```json
{
  "user": {
    "address": "Denver, CO"
  },
  "user.address.city": "Denver",
  "user.address.state": "CO"
}
```

The following pipeline first uses the `rename` processor to move the scalar value to `user.address.original`, which turns `user.address` into an object, and then expands the `user.address.city` and `user.address.state` fields into that object. As a result, `user.address` contains the `original`, `city`, and `state` fields:

```json
{
  "processors": [
    {
      "rename": {
        "field": "user.address",
        "target_field": "user.address.original"
      }
    },
    {
      "dot_expander": {
        "field": "user.address.city"
      }
    },
    {
      "dot_expander": {
        "field": "user.address.state"
      }
    }
  ]
}
```
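Before relying on a conflict-resolving pipeline, you can simulate it inline. In this sketch, the document values are illustrative. With the `rename` processor in place, the simulated document should contain a `user.address` object with `original` and `city` fields. If you remove the `rename` processor from the request, the `dot_expander` processor should fail because `user.address` holds a string rather than an object.

```json
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "rename": {
          "field": "user.address",
          "target_field": "user.address.original"
        }
      },
      {
        "dot_expander": {
          "field": "user.address.city"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "user": {
          "address": "Denver, CO"
        },
        "user.address.city": "Denver"
      }
    }
  ]
}
```
{% include copy-curl.html %}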