[BUILD BREAK] SeedHostsResolver failed to resolve host | OpenSSLTest test failures #2612

Closed
peternied opened this issue Mar 30, 2023 · 5 comments
Labels
bug Something isn't working triaged Issues labeled as 'Triaged' have been reviewed and are deemed actionable.

Comments

@peternied
Member

What is the bug?
Both of these tests have started to fail on the main and 2.x branches:

  • org.opensearch.security.ssl.OpenSSLTest.testNodeClientSSL
  • org.opensearch.security.ssl.OpenSSLTest.testNodeClientSSLwithOpenSslTLSv13

I believe this is the new error that is occurring:

[2023-03-30T15:13:53,991][WARN ][org.opensearch.discovery.SeedHostsResolver] failed to resolve host [null:0]
  1> java.net.UnknownHostException: null
  1> 	at java.net.InetAddress$CachedAddresses.get(InetAddress.java:797) ~[?:?]
  1> 	at java.net.InetAddress.getAllByName0(InetAddress.java:1524) ~[?:?]
  1> 	at java.net.InetAddress.getAllByName(InetAddress.java:1382) ~[?:?]
  1> 	at java.net.InetAddress.getAllByName(InetAddress.java:1306) ~[?:?]
  1> 	at org.opensearch.transport.TcpTransport.parse(TcpTransport.java:615) ~[opensearch-2.7.0-SNAPSHOT.jar:2.7.0-SNAPSHOT]
  1> 	at org.opensearch.transport.TcpTransport.addressesFromString(TcpTransport.java:557) ~[opensearch-2.7.0-SNAPSHOT.jar:2.7.0-SNAPSHOT]
  1> 	at org.opensearch.transport.TransportService.addressesFromString(TransportService.java:1051) ~[opensearch-2.7.0-SNAPSHOT.jar:2.7.0-SNAPSHOT]
  1> 	at org.opensearch.discovery.SeedHostsResolver.lambda$resolveHostsLists$0(SeedHostsResolver.java:182) ~[opensearch-2.7.0-SNAPSHOT.jar:2.7.0-SNAPSHOT]
  1> 	at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
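
The `[null:0]` in the warning is exactly what Java string concatenation produces for a null host and a port of 0. That would be consistent with a seed host that was never populated, though the log does not show where the address is built, so treat this as an assumption. A minimal sketch of the effect, using a hypothetical `SeedHostAddress` helper that is not part of OpenSearch or the security plugin:

```java
/** Hypothetical helper for illustration; not an OpenSearch or security-plugin class. */
final class SeedHostAddress {

    private SeedHostAddress() {}

    /** Naive concatenation: a null host and an unset port render as "null:0". */
    static String naive(String host, int port) {
        return host + ":" + port;
    }

    /** Guarded variant: fails fast instead of passing "null:0" on to discovery. */
    static String checked(String host, int port) {
        if (host == null || host.isEmpty()) {
            throw new IllegalArgumentException("seed host is not set");
        }
        if (port <= 0 || port > 65535) {
            throw new IllegalArgumentException("seed port is not set or out of range: " + port);
        }
        return host + ":" + port;
    }

    public static void main(String[] args) {
        System.out.println(naive(null, 0));              // prints "null:0"
        System.out.println(checked("127.0.0.1", 9300));  // example values only
    }
}
```

A guard like `checked` would turn the silent `UnknownHostException: null` into an immediate failure at the place where the address is assembled.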

How can one reproduce the bug?
It reproduces 100% of the time when run in GitHub Actions but does not reproduce locally for me; still investigating.

Do you have any additional context?

See failure log

@peternied peternied added bug Something isn't working untriaged Require the attention of the repository maintainers and may need to be prioritized labels Mar 30, 2023
@peternied peternied changed the title [BUG] [BUG] SeedHostsResolver failed to resolve host | OpenSSLTest test failures Mar 30, 2023
@cwperks
Member

cwperks commented Mar 30, 2023

The failing tests are:

  • OpenSSLTest.testNodeClientSSLwithOpenSslTLSv13
  • OpenSSLTest.testNodeClientSSL

In both cases, the test creates a three-node SSL-only cluster with 1 cluster manager node and 2 data nodes, then tries to add a 4th node to the cluster, but the 4th node cannot discover the cluster manager.

These tests only run on JDK < 12.

@DarshitChanpura
Member

Another recurring error log:

[2023-03-30T16:20:51,711][ERROR][org.opensearch.security.ssl.DefaultSecurityKeyStore] Your keystore or PEM does not contain a key. If you specified a key password, try removing it. If you did not specify a key password, perhaps you need to if the key is in fact password-protected. Maybe you just confused keys and certificates.

Seems like an issue parsing node-4.key during test execution.
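
One quick way to narrow this down is to check whether the PEM file holds a private key at all, and whether that key is encrypted. The sketch below only inspects PEM header lines and is not how `DefaultSecurityKeyStore` parses keys; the class name `PemKeyCheck` is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Pattern;

/** Hypothetical diagnostic; it looks at PEM headers only and parses no key material. */
final class PemKeyCheck {

    // PKCS#8 labels ("PRIVATE KEY", "ENCRYPTED PRIVATE KEY") plus legacy OpenSSL forms.
    private static final Pattern PRIVATE_KEY_HEADER =
        Pattern.compile("-----BEGIN (?:(?:RSA|EC|DSA|ENCRYPTED) )?PRIVATE KEY-----");

    private PemKeyCheck() {}

    /** True if the text contains any private-key PEM block header. */
    static boolean containsPrivateKey(String pem) {
        return PRIVATE_KEY_HEADER.matcher(pem).find();
    }

    /** True for encrypted PKCS#8 keys and for legacy keys carrying a Proc-Type header. */
    static boolean isEncrypted(String pem) {
        return pem.contains("-----BEGIN ENCRYPTED PRIVATE KEY-----")
            || pem.contains("Proc-Type: 4,ENCRYPTED");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PemKeyCheck.java <path-to-pem>");
            return;
        }
        String pem = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.US_ASCII);
        System.out.println("private key present: " + containsPrivateKey(pem));
        System.out.println("encrypted: " + isEncrypted(pem));
    }
}
```

If a file contains only `BEGIN CERTIFICATE` blocks, the first check returns false, which matches the "confused keys and certificates" hint in the error message.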

@RyanL1997
Collaborator

Based on the log, it seems we couldn't set up the OpenSSL connection correctly. For this error:

[ERROR][org.opensearch.security.ssl.http.netty.SecuritySSLNettyHttpServerTransport] Exception during establishing a SSL connection: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 474554202f5f6f70656e64697374726f2f5f73656375726974792f73736c696e666f3f70726574747920485454502f312e310d0a486f73743a203132372e302e302e313a31303237340d0a436f6e6e656374696f6e3a204b6565702d416c6976650d0a557365722d4167656e743a204170616368652d48747470436c69656e742f342e352e313320284a6176612f31312e302e3138290d0a4163636570742d456e636f64696e673a20677a69702c6465666c6174650d0a0d0a

It looks like there is a communication mismatch. I decoded the record, and it is a plain HTTP request:

GET /_opendistro/_security/sslinfo?pretty HTTP/1.1
Host: 127.0.0.1:10274
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.5.13 (Java/11.0.18)
Accept-Encoding: gzip,deflate

I think that with OpenSSL enabled, we need an HTTPS request instead.
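
For anyone who wants to repeat the decoding step: the payload in a `NotSslRecordException` message is the raw bytes rendered as hex, so turning each pair of hex digits back into an ASCII character is enough. A minimal sketch, with a hypothetical class name and only JDK calls:

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for reading the hex payload of a NotSslRecordException. */
final class NotSslRecordDecoder {

    private NotSslRecordDecoder() {}

    /** Converts a string of hex digit pairs into the ASCII text it encodes. */
    static String decodeHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    /** Rough heuristic: does the decoded payload start like a cleartext HTTP request line? */
    static boolean looksLikePlainHttp(String decoded) {
        return decoded.startsWith("GET ") || decoded.startsWith("POST ")
            || decoded.startsWith("PUT ") || decoded.startsWith("DELETE ")
            || decoded.startsWith("HEAD ");
    }

    public static void main(String[] args) {
        // First bytes of the record quoted in the error log above.
        String prefix = "474554202f5f6f70656e64697374726f2f5f73656375726974792f73736c696e666f";
        String decoded = decodeHex(prefix);
        System.out.println(decoded);                      // prints "GET /_opendistro/_security/sslinfo"
        System.out.println(looksLikePlainHttp(decoded));  // prints "true"
    }
}
```

A payload that decodes to a readable request line means the client spoke cleartext HTTP to a port that expected a TLS handshake.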

@peternied
Member Author

Looks like root cause was:

I've added comments on how I think we can get unblocked: #2598 (comment)

Impacted PRs:

@peternied peternied added triaged Issues labeled as 'Triaged' have been reviewed and are deemed actionable. and removed untriaged Require the attention of the repository maintainers and may need to be prioritized labels Mar 30, 2023
@peternied peternied changed the title [BUG] SeedHostsResolver failed to resolve host | OpenSSLTest test failures [BUILD BREAK] SeedHostsResolver failed to resolve host | OpenSSLTest test failures Mar 30, 2023
@RyanL1997
Collaborator

RyanL1997 commented Mar 31, 2023

I went through OpenSSLTest.testNodeClientSSLwithOpenSslTLSv13 and OpenSSLTest.testNodeClientSSL and noticed that the tests were failing at the step that adds a 4th node to the cluster, which has a default configuration of 1 cluster manager node + 2 data nodes. That step throws a ClusterManagerNotDiscoveredException, which means the new node could not find the cluster manager node and therefore could not join the cluster.

Following opensearch-project/OpenSearch#6331, we have fixed the OpenSSLTest failures by cherry-picking @peternied's PR (#2062). Instead of using legacy settings, the tests above now use settings with the new terminology when adding the 4th node (see: https://github.com/opensearch-project/security/pull/2598/files#diff-1cc5ae19a3dd9f2679af9bfaece275a7795a8f1ad6838f88e3db88036fa205d0L531-L535). However, I do think that core should continue to support the legacy settings until the last release of the 2.x line. I will follow up with the core team to see whether we or they need to do something to support the legacy terminology in the 2.x branch.

We can close this issue once #2598 is merged, and I will create a separate issue to track the follow-up if needed. Thanks for the support, team!
