From e2167571aac789b644a451e63b07fbc5725b16f9 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
Date: Thu, 7 Mar 2024 16:46:02 +0000
Subject: [PATCH] [DOC] Add drop processor (#5767)

* Add drop processor doc to address content gap

Signed-off-by: Melissa Vagi

* Address tech review feedback

Signed-off-by: Melissa Vagi

* Address tech review changes

Signed-off-by: Melissa Vagi

* Delete _ingest-pipelines/processors/date.md

Signed-off-by: Melissa Vagi
Signed-off-by: Melissa Vagi

* Revert "Delete _ingest-pipelines/processors/date.md"

This reverts commit 73296f59d9e6e1721aefcdcc49f7dcb3c6f735ad.

* Update _ingest-pipelines/processors/drop.md

Co-authored-by: Nathan Bower
Signed-off-by: Melissa Vagi

* Update _ingest-pipelines/processors/drop.md

Co-authored-by: Nathan Bower
Signed-off-by: Melissa Vagi

* Update _ingest-pipelines/processors/drop.md

Co-authored-by: Nathan Bower
Signed-off-by: Melissa Vagi

* Update _ingest-pipelines/processors/drop.md

Co-authored-by: Nathan Bower
Signed-off-by: Melissa Vagi

* Update _ingest-pipelines/processors/drop.md

Co-authored-by: Nathan Bower
Signed-off-by: Melissa Vagi

* Update drop.md

Signed-off-by: Melissa Vagi
Signed-off-by: Melissa Vagi

---------

Signed-off-by: Melissa Vagi
Co-authored-by: Nathan Bower
(cherry picked from commit b38ba75d8a80e7679fd36971d679ec9859bf3cce)
Signed-off-by: github-actions[bot]
---
 _ingest-pipelines/processors/date.md |   2 +-
 _ingest-pipelines/processors/drop.md | 123 +++++++++++++++++++++++++++
 2 files changed, 124 insertions(+), 1 deletion(-)
 create mode 100644 _ingest-pipelines/processors/drop.md

diff --git a/_ingest-pipelines/processors/date.md b/_ingest-pipelines/processors/date.md
index 1ebb8a1a59..364e6cce96 100644
--- a/_ingest-pipelines/processors/date.md
+++ b/_ingest-pipelines/processors/date.md
@@ -11,7 +11,7 @@ redirect_from:
 
 The `date` processor is used to parse dates from document fields and to add the parsed data to a new field. By default, the parsed data is stored in the `@timestamp` field.
 
-## Syntax
+## Syntax example
 
 The following is the syntax for the `date` processor:
 
diff --git a/_ingest-pipelines/processors/drop.md b/_ingest-pipelines/processors/drop.md
new file mode 100644
index 0000000000..1dd5fdb9d6
--- /dev/null
+++ b/_ingest-pipelines/processors/drop.md
@@ -0,0 +1,123 @@
+---
+layout: default
+title: Drop
+parent: Ingest processors
+nav_order: 70
+---
+
+# Drop processor
+
+The `drop` processor is used to discard documents without indexing them. This can be useful for preventing documents from being indexed based on certain conditions. For example, you might use a `drop` processor to prevent documents that are missing important fields or contain sensitive information from being indexed.
+
+The `drop` processor does not raise any errors when it discards documents, making it useful for preventing indexing problems without cluttering your OpenSearch logs with error messages.
+
+## Syntax example
+
+The following is the syntax for the `drop` processor:
+
+```json
+{
+  "drop": {
+    "if": "ctx.foo == 'bar'"
+  }
+}
+```
+{% include copy-curl.html %}
+
+## Configuration parameters
+
+The following table lists the required and optional parameters for the `drop` processor.
+
+Parameter | Required | Description |
+|-----------|-----------|-----------|
+`description` | Optional | A brief description of the processor. |
+`if` | Optional | A condition for running the processor. |
+`ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. See [Handling pipeline failures]({{site.url}}{{site.baseurl}}/ingest-pipelines/pipeline-failures/) for more information. |
+`on_failure` | Optional | A list of processors to run if the processor fails. See [Handling pipeline failures]({{site.url}}{{site.baseurl}}/ingest-pipelines/pipeline-failures/) for more information. |
+`tag` | Optional | An identifier tag for the processor. Useful for distinguishing between processors of the same type when debugging. |
+
+## Using the processor
+
+Follow these steps to use the processor in a pipeline.
+
+**Step 1: Create a pipeline**
+
+The following query creates a pipeline, named `drop-pii`, that uses the `drop` processor to prevent a document containing personally identifiable information (PII) from being indexed:
+
+```json
+PUT /_ingest/pipeline/drop-pii
+{
+  "description": "Pipeline that prevents PII from being indexed",
+  "processors": [
+    {
+      "drop": {
+        "if" : "ctx.user_info.contains('password') || ctx.user_info.contains('credit card')"
+      }
+    }
+  ]
+}
+```
+{% include copy-curl.html %}
+
+**Step 2 (Optional): Test the pipeline**
+
+It is recommended that you test your pipeline before ingesting documents.
+{: .tip}
+
+To test the pipeline, run the following query:
+
+```json
+POST _ingest/pipeline/drop-pii/_simulate
+{
+  "docs": [
+    {
+      "_index": "testindex1",
+      "_id": "1",
+      "_source": {
+        "user_info": "Sensitive information including credit card"
+      }
+    }
+  ]
+}
+```
+{% include copy-curl.html %}
+
+#### Response
+
+The following example response confirms that the pipeline is working as expected (the document has been dropped):
+
+```json
+{
+  "docs": [
+    null
+  ]
+}
+```
+{% include copy-curl.html %}
+
+**Step 3: Ingest a document**
+
+The following query ingests a document into an index named `testindex1`:
+
+```json
+PUT testindex1/_doc/1?pipeline=drop-pii
+{
+  "user_info": "Sensitive information including credit card"
+}
+```
+{% include copy-curl.html %}
+
+The following response confirms that the document with the ID of `1` was not indexed:
+
+{
+  "_index": "testindex1",
+  "_id": "1",
+  "_version": -3,
+  "result": "noop",
+  "_shards": {
+    "total": 0,
+    "successful": 0,
+    "failed": 0
+  }
+}
+{% include copy-curl.html %}
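
The optional `description` and `tag` parameters listed in the configuration table can sit alongside `if` in the same processor definition. The following is a minimal sketch, not part of the patch itself, assuming a hypothetical requirement that every document carry a `user_info` field; the description text and tag value are illustrative names only:

```json
{
  "drop": {
    "description": "Discard documents that have no user_info field",
    "tag": "drop-missing-user-info",
    "if": "ctx.user_info == null"
  }
}
```

Under that assumption, the condition is true when `user_info` is absent, which corresponds to the "missing important fields" use case described in the page introduction. The tag should make this processor easier to tell apart from other `drop` processors when debugging a pipeline.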