Supporting OpenSearch & ElasticSearch #311
Hiya, thanks a ton for making this PR! As you can see from the age of the ticket, it's really something we've been dragging our heels on. The problem I have is that both we and a few of our customers are using Elastic Cloud, the hosted offering by Elastic. Which, you guessed it, uses some sort of extra auth mechanism (cf. opensearch-project/opensearch-py#104). So I'm a little worried that switching to this lib will make it impossible to connect to our cluster. I think there are a couple of options: |
Hm, there's one more option, which I guess is to "crack" the ES client lib to remove the OpenSearch check. |
I kinda think option B would be ideal here because I can see these two diverging more and more, both on the library side and the engine side. What I am trying to say is that we might be able to get away with using opensearch-py or tricking Elastic's client, but it would be a temporary solution. If you are happy with option B, I would get into implementing it*. Also, I only checked stuff up to * any preferences about how? a base class with two subclasses and an env var to pick between them? |
Hey! After some reflection I also agree that B's the way to go. Both ES and OS are very vocal that these are starting to be two different applications, and so the layering will have to grow deeper and deeper over time. Regarding how: I guess class-based makes a lot of sense. One thing we should try and do is to wrap the exceptions that are being thrown and trade them in for our own If you do indexing and |
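For illustration, the class-based layering discussed above could look roughly like the sketch below: one base class, one subclass per engine, an environment variable to pick between them, and backend errors wrapped in a project-level exception. Everything here is an assumption for the sake of the example: the class and function names are hypothetical, the subclasses return placeholder data instead of calling a real client library, and `YENTE_INDEX_TYPE` defaulting to `elasticsearch` is a guess (only the `opensearch` value appears later in this thread).

```python
import os
from abc import ABC, abstractmethod
from typing import Optional


class SearchProviderError(Exception):
    """Hypothetical project-level error that hides backend-specific exceptions."""


class SearchProvider(ABC):
    """Hypothetical common interface shared by both engines."""

    @abstractmethod
    def backend_search(self, index: str, query: dict) -> dict:
        """Engine-specific call; a real subclass would use its client library here."""

    def search(self, index: str, query: dict) -> dict:
        # Trade whatever the backend raises for our own exception type,
        # so callers never need to import either client library.
        try:
            return self.backend_search(index, query)
        except SearchProviderError:
            raise
        except Exception as exc:
            raise SearchProviderError(f"{type(self).__name__}: {exc}") from exc


class ElasticSearchProvider(SearchProvider):
    def backend_search(self, index: str, query: dict) -> dict:
        # Placeholder result; no real ElasticSearch client is called.
        return {"engine": "elasticsearch", "index": index, "query": query}


class OpenSearchProvider(SearchProvider):
    def backend_search(self, index: str, query: dict) -> dict:
        # Placeholder result; no real OpenSearch client is called.
        return {"engine": "opensearch", "index": index, "query": query}


PROVIDERS = {
    "elasticsearch": ElasticSearchProvider,
    "opensearch": OpenSearchProvider,
}


def create_provider(index_type: Optional[str] = None) -> SearchProvider:
    """Pick the provider class from an argument or the environment."""
    name = (index_type or os.environ.get("YENTE_INDEX_TYPE", "elasticsearch")).lower()
    if name not in PROVIDERS:
        raise SearchProviderError(f"Unknown index type: {name!r}")
    return PROVIDERS[name]()
```

With this shape, the rest of the application would only ever call `create_provider()` and catch `SearchProviderError`, which keeps the two client libraries free to diverge behind the interface.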
Great. This is gonna be my todo list:
|
Just an initial implementation to see if it is possible to support both OpenSearch and ElasticSearch in the upstream project itself.
OK, after a bit of focusing on other stuff, I can work on this again. In the meantime, we have been running yente on ECS with an OpenSearch cluster. No issues, and everything works as expected. However, I would definitely like to wrap up what we discussed and upstream the OpenSearch support. Not a fan of running forks. |
We're in a similar situation. |
Hello all! We've now implemented OpenSearch support in |
Thank you @pudo. We are definitely going to take a stab at it and will get back with patches if we notice any issues. :-) |
We'll try to set it up in our test environment as well, and report back!
Thank you!
Best regards,
Kizman József
|
Here's the announcement: https://www.opensanctions.org/articles/2024-07-24-yente4/ |
We're running yente in AWS EKS, would be great if it could use IAM Roles
for service accounts instead of configuring static access keys.
https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html
According to this it does not seem to support it yet:
https://www.opensanctions.org/faq/83/opensearch/
Best regards,
Kizman József
|
That may be an issue with the documentation rather than the service: if you set YENTE_OPENSEARCH_SERVICE and YENTE_OPENSEARCH_REGION then it should be using whatever boto-compatible credentials it can find in the environment - I assume this also includes workload identity inside EKS. Worth a test? |
Cheers!
Will definitely test!
Thanks!
|
Ran a test today with 4.0.0 on EKS and service account, resulting in this error when running a reindex:
Traceback (most recent call last):
  File "/app/yente/provider/__init__.py", line 49, in with_provider
    provider = await _create_provider()
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/yente/provider/__init__.py", line 25, in _create_provider
    return await OpenSearchProvider.create()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/yente/provider/opensearch.py", line 54, in create
    await es.cluster.health(wait_for_status="yellow", timeout=5)
  File "/venv/lib/python3.12/site-packages/opensearchpy/_async/client/cluster.py", line 131, in health
    return await self.transport.perform_request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/opensearchpy/_async/transport.py", line 375, in perform_request
    await self._async_call()
  File "/venv/lib/python3.12/site-packages/opensearchpy/_async/transport.py", line 198, in _async_call
    await self._async_init()
  File "/venv/lib/python3.12/site-packages/opensearchpy/_async/transport.py", line 163, in _async_init
    self.set_connections(self.hosts)
  File "/venv/lib/python3.12/site-packages/opensearchpy/transport.py", line 255, in set_connections
    connections = list(zip(map(_create_connection, hosts), hosts))
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/opensearchpy/transport.py", line 253, in _create_connection
    return self.connection_class(metrics=self.metrics, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/opensearchpy/_async/http_aiohttp.py", line 149, in __init__
    self.headers.update(urllib3.make_headers(basic_auth=http_auth))
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/urllib3/util/request.py", line 121, in make_headers
    ] = f"Basic {b64encode(basic_auth.encode('latin-1')).decode()}"
                           ^^^^^^^^^^^^^^^^^
AttributeError: 'AWSV4SignerAuth' object has no attribute 'encode'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/venv/bin/yente", line 33, in <module>
    sys.exit(load_entry_point('yente', 'console_scripts', 'yente')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/yente/cli.py", line 44, in reindex
    asyncio.run(update_index(force=force))
  File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/app/yente/search/indexer.py", line 204, in update_index
    async with with_provider() as provider:
  File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/yente/provider/__init__.py", line 52, in with_provider
    await provider.close()
          ^^^^^^^^
UnboundLocalError: cannot access local variable 'provider' where it is not associated with a value
ENV Settings look like this:
export set YENTE_INDEX_TYPE="opensearch"
export set YENTE_INDEX_URL="https://redacted"
export set YENTE_OPENSEARCH_REGION="eu-central-1"
export set YENTE_OPENSEARCH_SERVICE="es"
Seems to be related to this fix:
opensearch-project/opensearch-py#547
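As a side note on the second exception in the traceback: the `UnboundLocalError` suggests that the cleanup step in `with_provider` runs `provider.close()` even when creating the provider failed, which hides the original `AttributeError` behind a second error. The sketch below shows one common way to guard against that pattern. It is not yente's actual code: `DummyProvider`, the `fail` flag, and the body of `_create_provider` are hypothetical stand-ins, and only the names `with_provider` and `_create_provider` are taken from the traceback.

```python
import asyncio
from contextlib import asynccontextmanager


class DummyProvider:
    """Hypothetical stand-in for a search provider with an async close()."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


async def _create_provider(fail: bool) -> DummyProvider:
    # The `fail` flag is only here to simulate a connection error.
    if fail:
        raise RuntimeError("cannot connect")
    return DummyProvider()


@asynccontextmanager
async def with_provider(fail: bool = False):
    provider = None  # bind the name before the try block
    try:
        provider = await _create_provider(fail)
        yield provider
    finally:
        # Only close what was actually created, so a failed creation
        # surfaces its own exception instead of an UnboundLocalError.
        if provider is not None:
            await provider.close()


async def demo() -> str:
    try:
        async with with_provider(fail=True):
            return "connected"
    except RuntimeError as exc:
        return f"original error preserved: {exc}"


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With the guard in place, the error reported to the user would be the underlying connection or authentication failure rather than the cleanup failure.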
|
Tried 4.1 release, and getting a different error now: Traceback (most recent call last): |
o/. I just wanted to run this by you before we push any further with it.
Background: we run AWS and we prefer to use managed services as much as possible. As you know, ElasticSearch is not available as an AWS managed service anymore and they offer OpenSearch. So we were wondering if we could get the basic functionality working with OpenSearch and the result is the commit that comes with this PR.
However, obviously, I understand that you would not want to move away from ElasticSearch in your own infrastructure. Hence, I would like to discuss the possibility of supporting both engines in the code. That way, we do not have to fork the code and maintain it ourselves.
After sorting out OpenSearch, we want to help with or implement incremental scans.
rel: #113