Add new documentation IP2Geo processor automatic updating feature #4095

Closed
wants to merge 80 commits into from
8a49f53
Content planning
vagimeli May 17, 2023
398eae6
Content planning
vagimeli May 17, 2023
74dd61d
Writing
vagimeli May 22, 2023
dbcec4e
Writing
vagimeli May 23, 2023
daee116
Writing
vagimeli May 23, 2023
2c1bf28
Writing
vagimeli May 23, 2023
989aa7c
Writing
vagimeli May 23, 2023
0822c04
Writing
vagimeli May 23, 2023
a659296
Update _api-reference/ingest-apis/geoip.md
vagimeli May 24, 2023
0742e27
Update _api-reference/ingest-apis/geoip.md
vagimeli May 24, 2023
185179c
Update _api-reference/ingest-apis/geoip.md
vagimeli May 24, 2023
fd4b645
Update _api-reference/ingest-apis/geoip.md
vagimeli May 24, 2023
a460a6e
Update _api-reference/ingest-apis/geoip.md
vagimeli May 24, 2023
a5cf0e0
Update _api-reference/ingest-apis/geoip.md
vagimeli May 24, 2023
a5d96dc
Update _api-reference/ingest-apis/geoip.md
vagimeli May 24, 2023
9b292ea
Update _api-reference/ingest-apis/geoip.md
vagimeli May 24, 2023
9655bb8
Update _api-reference/ingest-apis/geoip.md
vagimeli May 24, 2023
a448e40
Update _api-reference/ingest-apis/geoip.md
vagimeli May 24, 2023
d225a16
Update _api-reference/ingest-apis/geoip.md
vagimeli May 24, 2023
720c244
Update _api-reference/ingest-apis/geoip.md
vagimeli May 24, 2023
aba49db
Update _api-reference/ingest-apis/geoip.md
vagimeli May 24, 2023
1affbe6
Writing
vagimeli May 24, 2023
91cd80a
Writing
vagimeli May 24, 2023
85b845b
Writing
vagimeli May 24, 2023
23d8cd4
Writing
vagimeli May 24, 2023
edbb76f
Writing
vagimeli May 24, 2023
ed5ac30
Writing
vagimeli May 25, 2023
037dff5
Writing
vagimeli May 25, 2023
0b9b5e2
Address tech review feedback
vagimeli May 26, 2023
823f77a
Update processors.md
vagimeli May 26, 2023
7c0474f
Update processors.md
vagimeli May 26, 2023
c32a6b6
Writing
vagimeli May 30, 2023
baa6a38
Add processor index page
vagimeli Jun 2, 2023
28f5b6b
Update front matter
vagimeli Jun 22, 2023
7a5bdc8
Created new file under Ingest Processors TOC
vagimeli Jun 22, 2023
bed4f18
Update ip2geo.md
vagimeli Jun 28, 2023
75572e9
Update ip2geo.md
vagimeli Jun 28, 2023
3c2c1df
Update ip2geo.md
vagimeli Jun 28, 2023
985e111
Update ip2geo.md
vagimeli Jul 5, 2023
b718810
Update ip2geo.md
vagimeli Jul 5, 2023
8a48e7b
Update _api-reference/ingest-apis/ip2geo.md
vagimeli Aug 3, 2023
7e1b3af
Update _api-reference/ingest-apis/ip2geo.md
vagimeli Aug 3, 2023
85bbaa6
Update _api-reference/ingest-apis/ip2geo.md
vagimeli Aug 3, 2023
c5e30f4
Update _api-reference/ingest-apis/ip2geo.md
vagimeli Aug 3, 2023
9c42c20
Update _api-reference/ingest-apis/ip2geo.md
vagimeli Aug 3, 2023
cc86348
Update _api-reference/ingest-apis/ip2geo.md
vagimeli Aug 3, 2023
72c2091
Update _api-reference/ingest-apis/ip2geo.md
vagimeli Aug 3, 2023
8abfc1c
Update _api-reference/ingest-apis/ip2geo.md
vagimeli Aug 3, 2023
99b0600
Update _api-reference/ingest-apis/ip2geo.md
vagimeli Aug 3, 2023
280f71b
Update _api-reference/ingest-apis/ip2geo.md
vagimeli Aug 3, 2023
0ca4e2d
Update _api-reference/ingest-apis/ip2geo.md
vagimeli Aug 3, 2023
b6f406b
Update _api-reference/ingest-apis/ip2geo.md
vagimeli Aug 3, 2023
3fbeb91
Update _api-reference/ingest-apis/ip2geo.md
vagimeli Aug 3, 2023
71ae9d7
Update _api-reference/ingest-apis/ip2geo.md
vagimeli Aug 3, 2023
99b117b
Update ip2geo.md
vagimeli Aug 3, 2023
a6dc976
Update _api-reference/ingest-apis/ip2geo.md
vagimeli Aug 18, 2023
e381d1e
Address doc review feedback
vagimeli Aug 18, 2023
c64e3cc
Update ip2geo.md
vagimeli Aug 18, 2023
008a52b
Update ip2geo.md
vagimeli Aug 18, 2023
59ef808
Add copy labels
vagimeli Aug 22, 2023
e83bbd8
Update ip2geo.md
vagimeli Aug 22, 2023
e43415a
Update _api-reference/ingest-apis/ip2geo.md
vagimeli Aug 23, 2023
512cb4f
Update _api-reference/ingest-apis/ip2geo.md
vagimeli Aug 23, 2023
0af7ee4
Update _api-reference/ingest-apis/ip2geo.md
vagimeli Aug 23, 2023
073980c
Update _api-reference/ingest-apis/ip2geo.md
vagimeli Aug 23, 2023
167468f
Update _api-reference/ingest-apis/ip2geo.md
vagimeli Aug 23, 2023
8fee903
Update _api-reference/ingest-apis/ip2geo.md
vagimeli Aug 23, 2023
9730088
Update _api-reference/ingest-apis/ip2geo.md
vagimeli Aug 23, 2023
e2c2e01
Update _api-reference/ingest-apis/ip2geo.md
vagimeli Aug 23, 2023
933a0c4
Update _api-reference/ingest-apis/ip2geo.md
vagimeli Aug 23, 2023
1a357f3
Update _api-reference/ingest-apis/ip2geo.md
vagimeli Aug 23, 2023
ff19e23
Update _api-reference/ingest-apis/ip2geo.md
vagimeli Aug 23, 2023
17cc8a6
Update _api-reference/ingest-apis/ip2geo.md
vagimeli Aug 23, 2023
c84be08
Update _api-reference/ingest-apis/ip2geo.md
vagimeli Aug 23, 2023
3e3743a
Update _api-reference/ingest-apis/ip2geo.md
vagimeli Aug 23, 2023
936b757
Copy edits
vagimeli Aug 23, 2023
48d88cf
Address editorial feedback
vagimeli Aug 23, 2023
2322ee0
Address editorial feedback
vagimeli Aug 23, 2023
2e4eac0
Update ip2geo.md
vagimeli Aug 23, 2023
42ab53a
Copy edit to align format to processors template
vagimeli Sep 5, 2023
245 changes: 245 additions & 0 deletions _api-reference/ingest-apis/ip2geo.md
@@ -0,0 +1,245 @@
---
layout: default
title: IP2Geo
parent: Ingest processors
grand_parent: Ingest APIs
nav_order: 130
---

# IP2Geo
Introduced 2.10
{: .label .label-purple }

The `ip2geo` processor adds information about the geographical location of an IPv4 or IPv6 address. The `ip2geo` processor uses IP geolocation (GeoIP) data from an external endpoint and therefore requires an additional component, `datasource`, that defines where to download GeoIP data and how frequently to update it.

The `ip2geo` processor maintains the GeoIP data mapping in system indexes. During data ingestion, the processor retrieves the GeoIP mapping from these indexes to convert the IP addresses in incoming data to geolocations. For optimal performance, use nodes that have both the ingest and data roles; this configuration avoids internode calls, reducing latency. Also, because the `ip2geo` processor searches these indexes for GeoIP mapping data, search performance is affected.
{: .note}

## Getting started

To use the `ip2geo` processor, you must install the `opensearch-geospatial` plugin. See [Installing plugins]({{site.url}}{{site.baseurl}}/install-and-configure/plugins/) to learn more.
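
As an optional check, you can list the plugins installed on each node with the CAT plugins API and look for `opensearch-geospatial` in the `component` column of the response. This is a sketch of a verification step, not a required part of the setup:

```json
GET /_cat/plugins?v
```
{% include copy-curl.html %}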

## Cluster settings

The IP2Geo data source and `ip2geo` processor node settings are listed in the following table.

| Key | Description | Default |
|--------------------|-------------|---------|
| `plugins.geospatial.ip2geo.datasource.endpoint` | The default endpoint used by the create data source API. | `https://geoip.maps.opensearch.org/v1/geolite2-city/manifest.json` |
| `plugins.geospatial.ip2geo.datasource.update_interval_in_days` | The default update interval, in days, used by the create data source API. | 3 |
| `plugins.geospatial.ip2geo.datasource.batch_size` | The maximum number of documents to ingest in a bulk request during IP2Geo data source creation. | 10,000 |
| `plugins.geospatial.ip2geo.processor.cache_size` | The maximum number of results that can be cached. A single cache is shared by all `ip2geo` processors on each node. | 1,000 |
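
These settings can presumably be changed with the cluster settings API. The following sketch assumes that `plugins.geospatial.ip2geo.datasource.update_interval_in_days` is a dynamic setting in your version; the value `7` is only an illustration. Because the table describes this value as a default for the create data source API, it likely affects only data sources created after the change:

```json
PUT /_cluster/settings
{
  "persistent": {
    "plugins.geospatial.ip2geo.datasource.update_interval_in_days": 7
  }
}
```
{% include copy-curl.html %}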

## Creating the IP2Geo data source

Before creating a pipeline that uses the `ip2geo` processor, create the IP2Geo data source. The data source defines the endpoint from which to download GeoIP data and the interval at which to update it.

OpenSearch provides the following endpoints for GeoLite2 City, GeoLite2 Country, and GeoLite2 ASN databases from [MaxMind](https://dev.maxmind.com/geoip/geolite2-free-geolocation-data), which is shared under the [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/) license:

* GeoLite2 City: https://geoip.maps.opensearch.org/v1/geolite2-city/manifest.json
* GeoLite2 Country: https://geoip.maps.opensearch.org/v1/geolite2-country/manifest.json
* GeoLite2 ASN: https://geoip.maps.opensearch.org/v1/geolite2-asn/manifest.json

If an OpenSearch cluster cannot update a data source from its endpoint within 30 days, the cluster no longer adds GeoIP data to documents. Instead, it adds `"error":"ip2geo_data_expired"`.

### Data source options

The following table lists the data source options for the `ip2geo` processor.

| Name | Required | Default | Description |
|------|----------|---------|-------------|
| `endpoint` | Optional | https://geoip.maps.opensearch.org/v1/geolite2-city/manifest.json | The endpoint for downloading the GeoIP data. |
| `update_interval_in_days` | Optional | 3 | The frequency in days for updating the GeoIP data. The minimum value is 1. |
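
For instance, to use the GeoLite2 Country database and refresh it daily (the minimum interval), a request might look like the following sketch. The data source name `my-country-datasource` is hypothetical:

```json
PUT /_plugins/geospatial/ip2geo/datasource/my-country-datasource
{
  "endpoint": "https://geoip.maps.opensearch.org/v1/geolite2-country/manifest.json",
  "update_interval_in_days": 1
}
```
{% include copy-curl.html %}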

The following example request creates an IP2Geo data source:

```json
PUT /_plugins/geospatial/ip2geo/datasource/my-datasource
{
"endpoint" : "https://geoip.maps.opensearch.org/v1/geolite2-city/manifest.json",
"update_interval_in_days" : 3
}
```
{% include copy-curl.html %}

A `true` response means that the request was successful. A `false` response means that you should check that the request is valid and the URL is correct, or try again.
{: .tip}
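
A successful create request is expected to return an acknowledgment similar to the following. This sketch assumes the standard OpenSearch acknowledged-response format; the exact body may vary by version:

```json
{
  "acknowledged": true
}
```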

### Getting IP2Geo data source information

To get information about one or more IP2Geo data sources, send a GET request:

```json
GET /_plugins/geospatial/ip2geo/datasource/my-datasource
```
{% include copy-curl.html %}

The response is similar to the following:

```json
{
"datasources": [
{
"name": "my-datasource",
"state": "AVAILABLE",
"endpoint": "https://geoip.maps.opensearch.org/v1/geolite2-city/manifest.json",
"update_interval_in_days": 3,
"next_update_at_in_epoch_millis": 1685125612373,
"database": {
"provider": "maxmind",
"sha256_hash": "0SmTZgtTRjWa5lXR+XFCqrZcT495jL5XUcJlpMj0uEA=",
"updated_at_in_epoch_millis": 1684429230000,
"valid_for_in_days": 30,
"fields": [
"country_iso_code",
"country_name",
"continent_name",
"region_iso_code",
"region_name",
"city_name",
"time_zone",
"location"
]
},
"update_stats": {
"last_succeeded_at_in_epoch_millis": 1684866730192,
"last_processing_time_in_millis": 317640,
"last_failed_at_in_epoch_millis": 1684866730492,
"last_skipped_at_in_epoch_millis": 1684866730292
}
}
]
}
```
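
Because the GET endpoint is described as returning one or more data sources, the following sketch assumes that it accepts a comma-separated list of names and that omitting the name returns every data source. The name `my-country-datasource` is hypothetical:

```json
GET /_plugins/geospatial/ip2geo/datasource/my-datasource,my-country-datasource
```
{% include copy-curl.html %}

```json
GET /_plugins/geospatial/ip2geo/datasource
```
{% include copy-curl.html %}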

### Updating an IP2Geo data source

See [Creating the IP2Geo data source](#creating-the-ip2geo-data-source) for a list of endpoints and descriptions of the request fields.


The following example request updates the data source:


```json
PUT /_plugins/geospatial/ip2geo/datasource/my-datasource/_settings
{
  "endpoint": "https://geoip.maps.opensearch.org/v1/geolite2-city/manifest.json",
  "update_interval_in_days": 10
}
```
{% include copy-curl.html %}
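
Assuming that both request fields are optional, you could change only the update interval and keep the current endpoint, as in the following sketch. Per the data source options, the interval must be at least 1 day:

```json
PUT /_plugins/geospatial/ip2geo/datasource/my-datasource/_settings
{
  "update_interval_in_days": 1
}
```
{% include copy-curl.html %}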

### Deleting the IP2Geo data source

To delete the IP2Geo data source, you must first delete all processors associated with the data source. Otherwise, the request fails.
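
For example, if `my-datasource` is referenced only by the `my-pipeline` pipeline created in [Using the processor](#using-the-processor), deleting that pipeline first removes the associated processor. This sketch assumes that no other pipelines reference `my-datasource`:

```json
DELETE /_ingest/pipeline/my-pipeline
```
{% include copy-curl.html %}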

The following example request deletes the data source:


```json
DELETE /_plugins/geospatial/ip2geo/datasource/my-datasource
```
{% include copy-curl.html %}

## Creating the pipeline

Once the data source is created, you can create the pipeline. The syntax for the `ip2geo` processor is:

```json
{
"ip2geo": {
"field":"ip",
"datasource":"my-datasource"
}
}
```
{% include copy.html %}

### Configuration parameters

The following table lists the required and optional parameters for the `ip2geo` processor.

| Name | Required | Default | Description |
|------|----------|---------|-------------|
| `field` | Required | - | The field that contains the IP address for geographical lookup. |
| `datasource` | Required | - | The data source name to use to look up geographical information. |
| `properties` | Optional | All fields in `datasource` | The list of properties from `datasource` to add to `target_field`. |
| `target_field` | Optional | `ip2geo` | The field that holds the geographical information retrieved from the data source. |
| `ignore_missing` | Optional | `false` | If `true`, the processor does not modify the document if `field` does not exist or is `null`. |
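
The following sketch shows the processor with every optional parameter set. The target field name `geo_info` and the selected properties are illustrative assumptions; `properties` is assumed to take a list of field names from the data source, such as those in the `fields` array returned by the GET data source request:

```json
{
  "ip2geo": {
    "field": "ip",
    "datasource": "my-datasource",
    "target_field": "geo_info",
    "properties": ["city_name", "country_iso_code", "location"],
    "ignore_missing": true
  }
}
```
{% include copy.html %}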


## Using the processor

Follow these steps to use the processor in a pipeline.

**Step 1: Create a pipeline.**

The following request creates a pipeline named `my-pipeline` that converts the IP address to geographical information:

```json
PUT /_ingest/pipeline/my-pipeline
{
"description":"convert ip to geo",
"processors":[
{
"ip2geo":{
"field":"ip",
"datasource":"my-datasource"
}
}
]
}
```
{% include copy-curl.html %}
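
As an optional check, you can retrieve the stored pipeline to confirm that it contains the expected `ip2geo` processor configuration:

```json
GET /_ingest/pipeline/my-pipeline
```
{% include copy-curl.html %}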

**Step 2: Ingest a document into an index.**

The following request ingests a document into an index named `my-index`:

```json
PUT /my-index/_doc/my-id?pipeline=my-pipeline
{
"ip": "172.0.0.1"
}
```
{% include copy-curl.html %}
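
Instead of passing the `pipeline` query parameter on every request, you could make `my-pipeline` the default for the index. The following sketch uses the `index.default_pipeline` index setting; with it in place, documents indexed into `my-index` without an explicit `pipeline` parameter should pass through `my-pipeline`:

```json
PUT /my-index/_settings
{
  "index.default_pipeline": "my-pipeline"
}
```
{% include copy-curl.html %}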

**Step 3: View the ingested document.**

To view the ingested document, run the following query:

```json
GET /my-index/_doc/my-id
```
{% include copy-curl.html %}

**Step 4: Test the pipeline.**

To test the pipeline, run the following query:

```json
POST _ingest/pipeline/my-pipeline/_simulate
{
"docs": [
{
"_index":"my-index",
"_id":"my-id",
"_source":{
"ip":"172.0.0.1",
"ip2geo":{
"continent_name":"North America",
"region_iso_code":"AL",
"city_name":"Calera",
"country_iso_code":"US",
"country_name":"United States",
"region_name":"Alabama",
"location":"33.1063,-86.7583",
"time_zone":"America/Chicago"
}
}
}
]
}
```
{% include copy-curl.html %}

You'll get the following response, which confirms the pipeline is working correctly and producing the expected output:

<insert response following code freeze>
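
To test the processor configuration before storing any pipeline, you can also pass the pipeline definition inline to the simulate API. The following sketch reuses `my-datasource` and the sample IP address from the preceding steps:

```json
POST /_ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "ip2geo": {
          "field": "ip",
          "datasource": "my-datasource"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "ip": "172.0.0.1"
      }
    }
  ]
}
```
{% include copy-curl.html %}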