Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support for AWS Sigv4 for UrlLib3. #547

Merged
merged 13 commits into from
Oct 23, 2023
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ Inspired from [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
- Added `pool_maxsize` for `Urllib3HttpConnection` ([#535](https://github.com/opensearch-project/opensearch-py/pull/535))
- Added benchmarks ([#537](https://github.com/opensearch-project/opensearch-py/pull/537))
- Added guide on making raw JSON REST requests ([#542](https://github.com/opensearch-project/opensearch-py/pull/542))
- Added support for AWS SigV4 for urllib3 ([#547](https://github.com/opensearch-project/opensearch-py/pull/547))
### Changed
- Generate `tasks` client from API specs ([#508](https://github.com/opensearch-project/opensearch-py/pull/508))
- Generate `ingest` client from API specs ([#513](https://github.com/opensearch-project/opensearch-py/pull/513))
Expand Down
16 changes: 14 additions & 2 deletions DEVELOPER_GUIDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,19 @@ docker run -d -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" opensear

Tests require a live instance of OpenSearch running in docker.

This will start a new instance and run tests against the latest version of OpenSearch.
If you have one running.

```
python setup.py test
```

To run tests in a specific test file.

```
python setup.py test -s test_opensearchpy/test_connection.py
```

If you want to auto-start one, the following will start a new instance and run tests against the latest version of OpenSearch.

```
./.ci/run-tests
Expand Down Expand Up @@ -76,7 +88,7 @@ You can also run individual tests matching a pattern (`pytest -k [pattern]`).
```
./.ci/run-tests true 1.3.0 test_no_http_compression

test_opensearchpy/test_connection.py::TestUrllib3Connection::test_no_http_compression PASSED [ 33%]
test_opensearchpy/test_connection.py::TestUrllib3HttpConnection::test_no_http_compression PASSED [ 33%]
test_opensearchpy/test_connection.py::TestRequestsConnection::test_no_http_compression PASSED [ 66%]
test_opensearchpy/test_async/test_connection.py::TestAIOHttpConnection::test_no_http_compression PASSED [100%]
```
Expand Down
13 changes: 9 additions & 4 deletions guides/auth.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
- [Authentication](#authentication)
- [IAM Authentication](#iam-authentication)
- [IAM Authentication with a Synchronous Client](#iam-authentication-with-a-synchronous-client)
- [IAM Authentication with an Async Client](#iam-authentication-with-an-async-client)
- [Kerberos](#kerberos)

Expand All @@ -9,24 +10,28 @@ OpenSearch allows you to use different methods for the authentication via `conne

## IAM Authentication

Opensearch-py supports IAM-based authentication via `AWSV4SignerAuth`, which uses `RequestHttpConnection` as the transport class for communicating with OpenSearch clusters running in Amazon Managed OpenSearch and OpenSearch Serverless, and works in conjunction with [botocore](https://pypi.org/project/botocore/).
This library supports IAM-based authentication when communicating with OpenSearch clusters running in Amazon Managed OpenSearch and OpenSearch Serverless.

## IAM Authentication with a Synchronous Client

For `Urllib3HttpConnection` use `Urllib3AWSV4SignerAuth`, and for `RequestHttpConnection` use `RequestsAWSV4SignerAuth`.

```python
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth
from opensearchpy import OpenSearch, Urllib3HttpConnection, Urllib3AWSV4SignerAuth
import boto3

host = '' # cluster endpoint, for example: my-test-domain.us-east-1.es.amazonaws.com
region = 'us-west-2'
service = 'es' # 'aoss' for OpenSearch Serverless
credentials = boto3.Session().get_credentials()
auth = AWSV4SignerAuth(credentials, region, service)
auth = Urllib3AWSV4SignerAuth(credentials, region, service)

client = OpenSearch(
hosts = [{'host': host, 'port': 443}],
http_auth = auth,
use_ssl = True,
verify_certs = True,
connection_class = RequestsHttpConnection,
connection_class = Urllib3HttpConnection,
pool_maxsize = 20
)

Expand Down
9 changes: 8 additions & 1 deletion opensearchpy/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -71,7 +71,12 @@
UnknownDslObject,
ValidationException,
)
from .helpers import AWSV4SignerAsyncAuth, AWSV4SignerAuth
from .helpers import (
AWSV4SignerAsyncAuth,
AWSV4SignerAuth,
RequestsAWSV4SignerAuth,
Urllib3AWSV4SignerAuth,
)
from .helpers.aggs import A
from .helpers.analysis import analyzer, char_filter, normalizer, token_filter, tokenizer
from .helpers.document import Document, InnerDoc, MetaField
Expand Down Expand Up @@ -166,6 +171,8 @@
"OpenSearchWarning",
"OpenSearchDeprecationWarning",
"AWSV4SignerAuth",
"Urllib3AWSV4SignerAuth",
"RequestsAWSV4SignerAuth",
"AWSV4SignerAsyncAuth",
"A",
"AttrDict",
Expand Down
21 changes: 17 additions & 4 deletions opensearchpy/connection/http_urllib3.py
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@
import ssl
import time
import warnings
from typing import Callable

import urllib3 # type: ignore
from urllib3.exceptions import ReadTimeoutError
Expand Down Expand Up @@ -128,10 +129,17 @@ def __init__(
opaque_id=opaque_id,
**kwargs
)
if http_auth is not None:
if isinstance(http_auth, (tuple, list)):
http_auth = ":".join(http_auth)
self.headers.update(urllib3.make_headers(basic_auth=http_auth))

self.http_auth = http_auth
if self.http_auth is not None:
if isinstance(self.http_auth, Callable):
pass
elif isinstance(self.http_auth, (tuple, list)):
self.headers.update(
urllib3.make_headers(basic_auth=":".join(http_auth))
)
else:
self.headers.update(urllib3.make_headers(basic_auth=http_auth))

pool_class = urllib3.HTTPConnectionPool
kw = {}
Expand Down Expand Up @@ -218,6 +226,7 @@ def perform_request(
url = "%s?%s" % (url, urlencode(params))

full_url = self.host + url

start = time.time()
orig_body = body
try:
Expand All @@ -240,6 +249,10 @@ def perform_request(
body = self._gzip_compress(body)
request_headers["content-encoding"] = "gzip"

if self.http_auth is not None:
if isinstance(self.http_auth, Callable):
request_headers.update(self.http_auth(method, full_url, body))

response = self.pool.urlopen(
method, url, body, retries=Retry(False), headers=request_headers, **kw
)
Expand Down
4 changes: 3 additions & 1 deletion opensearchpy/helpers/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,7 @@
)
from .asyncsigner import AWSV4SignerAsyncAuth
from .errors import BulkIndexError, ScanError
from .signer import AWSV4SignerAuth
from .signer import AWSV4SignerAuth, RequestsAWSV4SignerAuth, Urllib3AWSV4SignerAuth

__all__ = [
"BulkIndexError",
Expand All @@ -54,6 +54,8 @@
"_process_bulk_chunk",
"AWSV4SignerAuth",
"AWSV4SignerAsyncAuth",
"RequestsAWSV4SignerAuth",
"Urllib3AWSV4SignerAuth",
]


Expand Down
1 change: 1 addition & 0 deletions opensearchpy/helpers/__init__.pyi
Original file line number Diff line number Diff line change
Expand Up @@ -48,5 +48,6 @@ try:
from .._async.helpers.actions import async_streaming_bulk as async_streaming_bulk
from .asyncsigner import AWSV4SignerAsyncAuth as AWSV4SignerAsyncAuth
from .signer import AWSV4SignerAuth as AWSV4SignerAuth
from .signer import RequestsAWSV4SignerAuth, Urllib3AWSV4SignerAuth
except (ImportError, SyntaxError):
pass
125 changes: 79 additions & 46 deletions opensearchpy/helpers/signer.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@
# GitHub history for details.

import sys
from typing import Any, Callable, Dict

import requests

Expand All @@ -17,38 +18,12 @@
from urllib.parse import parse_qs, urlencode, urlparse


def fetch_url(prepared_request): # type: ignore
class AWSV4Signer:
"""
This is a util method that helps in reconstructing the request url.
:param prepared_request: unsigned request
:return: reconstructed url
Generic AWS V4 Request Signer.
"""
url = urlparse(prepared_request.url)
path = url.path or "/"

# fetch the query string if present in the request
querystring = ""
if url.query:
querystring = "?" + urlencode(
parse_qs(url.query, keep_blank_values=True), doseq=True
)

# fetch the host information from headers
headers = dict(
(key.lower(), value) for key, value in prepared_request.headers.items()
)
location = headers.get("host") or url.netloc

# construct the url and return
return url.scheme + "://" + location + path + querystring


class AWSV4SignerAuth(requests.auth.AuthBase):
"""
AWS V4 Request Signer for Requests.
"""

def __init__(self, credentials, region, service="es"): # type: ignore
def __init__(self, credentials, region: str, service: str = "es") -> Any: # type: ignore
if not credentials:
raise ValueError("Credentials cannot be empty")
self.credentials = credentials
Expand All @@ -61,27 +36,20 @@ def __init__(self, credentials, region, service="es"): # type: ignore
raise ValueError("Service name cannot be empty")
self.service = service

def __call__(self, request): # type: ignore
return self._sign_request(request) # type: ignore

def _sign_request(self, prepared_request): # type: ignore
def sign(self, method: str, url: str, body: Any) -> Dict[str, str]:
"""
This method helps in signing the request by injecting the required headers.
:param prepared_request: unsigned request
:return: signed request
This method signs the request and returns headers.
:param method: HTTP method
:param url: url
:param body: body
:return: headers
"""

from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest

url = fetch_url(prepared_request) # type: ignore

# create an AWS request object and sign it using SigV4Auth
aws_request = AWSRequest(
method=prepared_request.method.upper(),
url=url,
data=prepared_request.body,
)
aws_request = AWSRequest(method=method.upper(), url=url, data=body)

# credentials objects expose access_key, secret_key and token attributes
# via @property annotations that call _refresh() on every access,
Expand All @@ -101,9 +69,74 @@ def _sign_request(self, prepared_request): # type: ignore
sig_v4_auth.add_auth(aws_request)

# copy the headers from AWS request object into the prepared_request
prepared_request.headers.update(dict(aws_request.headers.items()))
prepared_request.headers["X-Amz-Content-SHA256"] = sig_v4_auth.payload(
aws_request
headers = dict(aws_request.headers.items())
headers["X-Amz-Content-SHA256"] = sig_v4_auth.payload(aws_request)

return headers


class RequestsAWSV4SignerAuth(requests.auth.AuthBase):
"""
AWS V4 Request Signer for Requests.
"""

def __init__(self, credentials, region, service="es"): # type: ignore
self.signer = AWSV4Signer(credentials, region, service)

def __call__(self, request): # type: ignore
return self._sign_request(request) # type: ignore

def _sign_request(self, prepared_request): # type: ignore
"""
This method helps in signing the request by injecting the required headers.
:param prepared_request: unsigned request
:return: signed request
"""

prepared_request.headers.update(
self.signer.sign(
prepared_request.method,
self._fetch_url(prepared_request), # type: ignore
prepared_request.body,
)
)

return prepared_request

def _fetch_url(self, prepared_request): # type: ignore
"""
This is a util method that helps in reconstructing the request url.
:param prepared_request: unsigned request
:return: reconstructed url
"""
url = urlparse(prepared_request.url)
path = url.path or "/"

# fetch the query string if present in the request
querystring = ""
if url.query:
querystring = "?" + urlencode(
parse_qs(url.query, keep_blank_values=True), doseq=True
)

# fetch the host information from headers
headers = dict(
(key.lower(), value) for key, value in prepared_request.headers.items()
)
location = headers.get("host") or url.netloc

# construct the url and return
return url.scheme + "://" + location + path + querystring


# Deprecated: use RequestsAWSV4SignerAuth
class AWSV4SignerAuth(RequestsAWSV4SignerAuth):
pass


class Urllib3AWSV4SignerAuth(Callable): # type: ignore
def __init__(self, credentials, region, service="es"): # type: ignore
self.signer = AWSV4Signer(credentials, region, service)

def __call__(self, method: str, url: str, body: Any) -> Dict[str, str]:
return self.signer.sign(method, url, body)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RequestsAWSV4SignerAuth returns prepared request while Urllib3AWSV4SignerAuth returns the headers, should we keep them consistent?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The requests library expect a callable that derives from requests.auth.AuthBase and receives a request object. It's called inside the requests library.

The urllib3 library doesn't support such an interface. It actually splits the client constructor and exposes a perform_request method. That's where we have the method, URL, and body, and call auth. There's no "request" object here.

We could make a similar interface to AuthBase for urllib3, but it cannot be the same interface because that one exists in the requests library. Since these signer classes aren't interchangeable at all, they don't need to derive from the same class, and I already moved the common parts into AWSV4Signer.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, thank you!

22 changes: 22 additions & 0 deletions samples/aws/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
## AWS SigV4 Samples

Create an OpenSearch domain in (AWS) which support IAM based AuthN/AuthZ.

```
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export AWS_SESSION_TOKEN=
export AWS_REGION=us-west-2

export SERVICE=es # use "aoss" for OpenSearch Serverless.
export ENDPOINT=https://....us-west-2.es.amazonaws.com

poetry run aws/search-urllib.py
```

This will output the version of OpenSearch and a search result.

```
opensearch: 2.3.0
{'director': 'Bennett Miller', 'title': 'Moneyball', 'year': 2011}
```
Loading
Loading