Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix AuthorizationException with AWSV4SignerAsyncAuth when the doc ID has special characters. #848

Merged
merged 5 commits into from
Nov 27, 2024

Conversation

nathaliellenaa
Copy link
Contributor

@nathaliellenaa nathaliellenaa commented Nov 19, 2024

Description

  1. Fix for [BUG] Bug with AWS if id has special characters #833.
  2. Added tests for _make_path.
  3. Added integration tests from Lifecycle integration tests. #856.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@nathaliellenaa nathaliellenaa changed the title Added failing unit test Fix URL bug in Python client Nov 19, 2024
Copy link

codecov bot commented Nov 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.41%. Comparing base (ba715b9) to head (2de6b44).
Report is 71 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #848      +/-   ##
==========================================
- Coverage   71.95%   70.41%   -1.54%     
==========================================
  Files          91      125      +34     
  Lines        8001     9300    +1299     
==========================================
+ Hits         5757     6549     +792     
- Misses       2244     2751     +507     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@dblock dblock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

The responsibility of the signer is to sign data blindly, ie. it's not supposed to know anything about encoding/decoding. This implementation will fail if I actually encode the URL parameter on my own, then pass it into the client.

Look for a test that ensures that the URL being sent is the same URL that is being signed, and a fix where the URL is encoded before both sending and signing.

@nathaliellenaa nathaliellenaa changed the title Fix URL bug in Python client Fix bug where the URL being sent and being signed is different Nov 20, 2024
Copy link
Member

@dblock dblock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the fix, see if you can simplify this test further. It's likely that you don't need any fields in the mock session because you're now stubbing signing altogether and thus you can get rid of super().sign_request ... in the mock signer. This also means you don't need the connection because you won't be making actual requests. Finally, don't rely on general exceptions in the test. Your mock signer could raise an expected error, but it would be best if you replace the connection with a mock that just returns without doing anything.

Copy link
Member

@dblock dblock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test looks ok, but this is not the right fix as I mentioned in #848 (review).

  1. If the DOC ID is foo%3Abar this code will decode it to foo:bar and fail the same way. We encode the URL somewhere, find where, remove that, then pass the original url into perform_request, and only encode it later.
  2. The same problem (likely) exists in the synchronous path, add tests and a fix in both.

test_opensearchpy/test_async/test_signer.py Outdated Show resolved Hide resolved
test_opensearchpy/test_async/test_signer.py Outdated Show resolved Hide resolved
test_opensearchpy/test_async/test_signer.py Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
@dblock
Copy link
Member

dblock commented Nov 22, 2024

Much better! Get CI to pass.

I tested this using https://github.com/dblock/opensearch-python-client-demo.

osx ~/source/dblock/opensearch-python-client-demo/async (main)$ git diff example.py
diff --git a/async/example.py b/async/example.py
index 2360ebf..1754609 100644
--- a/async/example.py
+++ b/async/example.py
@@ -55,19 +55,23 @@ async def main():
 
   try:
     # index data
+    id = 'id@local'
     document = {'director': 'Bennett Miller', 'title': 'Moneyball', 'year': 2011}
-    await client.index(index=index, body=document, id='1')
+    await client.index(index=index, body=document, id=id)
 
     # wait for the document to index
     sleep(1)
 
+    doc = await client.get(index=index, id=id)
+    print(doc)
+
     # search for the document
     results = await client.search(body={'query': {'match': {'director': 'miller'}}})
     for hit in results['hits']['hits']:
       print(hit['_source'])
 
     # delete the document
-    await client.delete(index=index, id='1')
+    await client.delete(index=index, id=id)
   finally:
     # delete the index
     await client.indices.delete(index=index)
osx ~/source/dblock/opensearch-python-client-demo/async (main)$ git diff Pipfile
diff --git a/async/Pipfile b/async/Pipfile
index f2923c4..35cb795 100644
--- a/async/Pipfile
+++ b/async/Pipfile
@@ -4,10 +4,10 @@ verify_ssl = true
 name = "pypi"
 
 [packages]
-opensearch-py = "2.4.1"
+opensearch-py = {git = "git+https://github.com/nathaliellenaa/opensearch-py.git", ref = "python-client-bug"}
 boto3 = "*"
 
 [dev-packages]
 
 [requires]
-python_version = "3"
\ No newline at end of file
+python_version = "3"

Failed with 2.4.1 and worked great with your changes.

$ pipenv run python example.py
INFO:Found credentials in environment variables.
Using opensearch-py 2.7.2
INFO:GET https://search-.....us-west-2.es.amazonaws.com:443/ [status:200 request:0.304s]
opensearch: 2.7.0
INFO:PUT https://search-.....us-west-2.es.amazonaws.com:443/movies3 [status:200 request:0.478s]
INFO:PUT https://search-.....us-west-2.es.amazonaws.com:443/movies3/_doc/id@local [status:201 request:0.108s]
INFO:GET https://search-.....us-west-2.es.amazonaws.com:443/movies3/_doc/id@local [status:200 request:0.081s]
{'_index': 'movies3', '_id': 'id@local', '_version': 1, '_seq_no': 0, '_primary_term': 1, 'found': True, '_source': {'director': 'Bennett Miller', 'title': 'Moneyball', 'year': 2011}}
INFO:POST https://search-.....us-west-2.es.amazonaws.com:443/_search [status:200 request:0.082s]
{'director': 'Bennett Miller', 'title': 'Moneyball', 'year': 2011}
INFO:DELETE https://search-.....us-west-2.es.amazonaws.com:443/movies3/_doc/id@local [status:200 request:0.081s]
INFO:DELETE https://search-.....us-west-2.es.amazonaws.com:443/movies3 [status:200 request:0.099s]

@nathaliellenaa
Copy link
Contributor Author

nathaliellenaa commented Nov 22, 2024

Failed with 2.4.1 and worked great with your changes.

Do you know why it fails on some specific versions?

@dblock
Copy link
Member

dblock commented Nov 24, 2024

Failed with 2.4.1 and worked great with your changes.

Do you know why it fails on some specific versions?

Looks like legit test failures that have to do with the changes to when URLs get encoded.

=========================== short test summary info ============================
FAILED test_opensearchpy/test_client/test_utils.py::TestMakePath::test_handles_unicode - AssertionError: '/some-index/type/%E4%B8%AD%E6%96%87' != '/some-index/type/中文'
- /some-index/type/%E4%B8%AD%E6%96%87
+ /some-index/type/中文

@nathaliellenaa
Copy link
Contributor Author

Looks like legit test failures that have to do with the changes to when URLs get encoded.

=========================== short test summary info ============================
FAILED test_opensearchpy/test_client/test_utils.py::TestMakePath::test_handles_unicode - AssertionError: '/some-index/type/%E4%B8%AD%E6%96%87' != '/some-index/type/中文'
- /some-index/type/%E4%B8%AD%E6%96%87
+ /some-index/type/中文

Yes, do I need to modify this test since the behavior of the _make_path function has been modified?

@dblock
Copy link
Member

dblock commented Nov 25, 2024

Yes, do I need to modify this test since the behavior of the _make_path function has been modified?

Correct. Iterate till all tests pass.

@dblock
Copy link
Member

dblock commented Nov 25, 2024

FYI, I opened #851 where all tests pass, so something in this code is causing at least some of the failures. Make sure to rebase and diff carefully.

Copy link
Member

@dblock dblock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ran some tests and this change broke a few things.

Indices with / are not allowed, you get an error. In previous versions we'd encode the parameter and get the expected one.

opensearchpy.exceptions.RequestError: RequestError(400, 'invalid_index_name_exception', 'Invalid index name [movies/shmovies], must not contain the following characters [ , ", *, \\, <, |, ,, >, /, ?]')

With the updated version we don't encode it and get this error, which is incorrect.

opensearchpy.exceptions.RequestError: RequestError(400, 'no handler found for uri [/movies/shmovies] and method [PUT]', 'no handler found for uri [/movies/shmovies] and method [PUT]')

An index name in Russian, хорошо, now fails with an invalid signature even in the sync client. It did work in 2.4.1..

@nathaliellenaa nathaliellenaa changed the title Fix bug where the URL being sent and being signed is different Fix AuthorizationException with AWSV4SignerAsyncAuth when the doc ID has special characters. Nov 27, 2024
@dblock
Copy link
Member

dblock commented Nov 27, 2024

Tested with samples from #857.

osx ~/source/opensearch-project/opensearch-py/dblock-opensearch-py/samples (nathaliellenaa-python-client-bug)$ poetry run python aws/search_urllib3_async.py 
INFO:Found credentials in environment variables.
INFO:GET https://search-dblock-test-opensearch-27-xwd6xrwtykennk3q7uu7t7dtpi.us-west-2.es.amazonaws.com:443/ [status:200 request:0.314s]
opensearch: 2.7.0
INFO:PUT https://search-dblock-test-opensearch-27-xwd6xrwtykennk3q7uu7t7dtpi.us-west-2.es.amazonaws.com:443/movies [status:200 request:0.576s]
INFO:PUT https://search-dblock-test-opensearch-27-xwd6xrwtykennk3q7uu7t7dtpi.us-west-2.es.amazonaws.com:443/movies/_doc/foo%40bar [status:201 request:0.188s]
INFO:POST https://search-dblock-test-opensearch-27-xwd6xrwtykennk3q7uu7t7dtpi.us-west-2.es.amazonaws.com:443/_search [status:200 request:0.102s]
{'director': 'Bennett Miller', 'title': 'Moneyball', 'year': 2011}
INFO:DELETE https://search-dblock-test-opensearch-27-xwd6xrwtykennk3q7uu7t7dtpi.us-west-2.es.amazonaws.com:443/movies/_doc/foo%40bar [status:200 request:0.116s]
INFO:DELETE https://search-dblock-test-opensearch-27-xwd6xrwtykennk3q7uu7t7dtpi.us-west-2.es.amazonaws.com:443/movies [status:200 request:0.153s]
osx ~/source/opensearch-project/opensearch-py/dblock-opensearch-py/samples (nathaliellenaa-python-client-bug)$ 
poetry run python aws/search_urllib3_sync.py 
INFO:Found credentials in environment variables.
INFO:GET https://search-dblock-test-opensearch-27-xwd6xrwtykennk3q7uu7t7dtpi.us-west-2.es.amazonaws.com:443/ [status:200 request:0.321s]
opensearch: 2.7.0
INFO:PUT https://search-dblock-test-opensearch-27-xwd6xrwtykennk3q7uu7t7dtpi.us-west-2.es.amazonaws.com:443/movies [status:200 request:0.445s]
INFO:PUT https://search-dblock-test-opensearch-27-xwd6xrwtykennk3q7uu7t7dtpi.us-west-2.es.amazonaws.com:443/movies/_doc/foo%40bar [status:201 request:0.179s]
INFO:POST https://search-dblock-test-opensearch-27-xwd6xrwtykennk3q7uu7t7dtpi.us-west-2.es.amazonaws.com:443/_search [status:200 request:0.100s]
{'director': 'Bennett Miller', 'title': 'Moneyball', 'year': 2011}
INFO:DELETE https://search-dblock-test-opensearch-27-xwd6xrwtykennk3q7uu7t7dtpi.us-west-2.es.amazonaws.com:443/movies/_doc/foo%40bar [status:200 request:0.099s]
INFO:DELETE https://search-dblock-test-opensearch-27-xwd6xrwtykennk3q7uu7t7dtpi.us-west-2.es.amazonaws.com:443/movies [status:200 request:0.149s]

@dblock dblock merged commit b9e48dc into opensearch-project:main Nov 27, 2024
32 of 33 checks passed
@nathaliellenaa nathaliellenaa deleted the python-client-bug branch November 27, 2024 23:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants