Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle SSL_WANT_READ/WRITE errors #619

Merged
merged 1 commit into from
Jun 27, 2020
Merged

Conversation

jeblair
Copy link
Contributor

@jeblair jeblair commented Jun 22, 2020

This adds a simple recovery path in case an SSL connection receives
an SSL_WANT_READ or WRITE error. Either error can occur while
reading or writing. The error indicates that the underlying
operation should be retried after the socket is once again readable
or writable (per the error code).

Fixes #618

@ceache
Copy link
Contributor

ceache commented Jun 22, 2020

Hi, thanks a lot for the issue report and the PR.

Do you mind amending your commit message to follow our recommendations?
Specifically https://github.com/python-zk/kazoo/blob/master/CONTRIBUTING.md#git-commit-guidelines

Secondly, as I understand it, this bug only tiggers on "large" operations (that do not fit in the SSL socket buffer, requiring a second read/write from the underlying socket).
Would you mind engineering a tests case for this? You mention creating 8k nodes would trigger the issue reliably.
This issue underscore the subtleties of SSL sockets that I would like permanently and regularly exercised in our test cases.

Thanks!

@jeblair
Copy link
Contributor Author

jeblair commented Jun 22, 2020

No problem on the commit message, sorry I didn't check the file first.

I'll look into adding a test.

@jeblair
Copy link
Contributor Author

jeblair commented Jun 22, 2020

Quick clarification: are there any existing SSL/TLS tests?

kazoo/tests/test_client.py Outdated Show resolved Hide resolved
kazoo/testing/harness.py Outdated Show resolved Hide resolved
@jeblair
Copy link
Contributor Author

jeblair commented Jun 22, 2020

I didn't see any existing tests, so I added some commits to add an initial framework for SSL tests. The fist upgrades the version of Zookeeper used by the tests to 3.6.1 since a newer version is required for SSL support at all. The second updates the server to listen on an SSL port and adds a single simple client test over SSL.

Note that I have not yet written the requested test (8k children); I wanted to get this up before the end of my day.

@jeblair
Copy link
Contributor Author

jeblair commented Jun 22, 2020

I wrote the following test, and ran it with the ssl_want_read fix reverted, and it succeeded regardless (I also tried variations with more children):

    def test_many_children(self):
        client = self.client
        path = client.create("/test")
        for x in range(8200):
            path = client.create("/test/000000000000000000%s" % (x,))
        self.client.restart()
        client.get_children("/test")

While the error is consistently reproducible in our production environment, it may be more difficult to reproduce is other environments, such as tests. Note that in particular, I ran my production test on a host that was remote from the Zookeeper server. That introduces a considerable amount of delay (compared to a localhost connection) which may cause it more likely to hit the error.

I'd be happy to push up a commit with that test, but it may not be worth doing since it doesn't confirm the behavior. I'll omit it for now, but let me know if you would rather me add it anyway.

@jeblair jeblair force-pushed the ssl branch 2 times, most recently from 24df233 to 9bae5d4 Compare June 23, 2020 21:35
@ceache
Copy link
Contributor

ceache commented Jun 26, 2020

Thanks you @jeblair , it seems I sent you down a much bigger path than I expected.
I think your explanation of why you cannot reproduce the error makes perfect sense and, I don't see how you can reproduce this with out current test harness. Thank you very much for your contribution.

Code wise, things look good to me (other than the minor missing pytest module.
My only uncertainty with your PR lies in the sslfixture module: I don't know how the other @python-zk/maintainers feel about having certificates stored in the repo.

I'll let them comment.

@jeblair
Copy link
Contributor Author

jeblair commented Jun 26, 2020

No problem -- I think adding more SSL tests is great and I'm happy to help. :)

Given that we're unlikely to be able to exercise this change in unit tests, and adding the SSL tests may need a bit more work (especially with respect to Zookeeper versions, certificates in tests, etc), I went a head and split that into a new PR. So this one is now back to just the want_read/write fix. The added SSL testing is in #620.

This adds a simple recovery path in case an SSL connection receives
an SSL_WANT_READ or WRITE error.  Either error can occur while
reading or writing.  The error indicates that the underlying
operation should be retried after the socket is once again readable
or writable (per the error code).

Closes python-zk#618
Copy link
Member

@StephenSorriaux StephenSorriaux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your work in this fix!

@StephenSorriaux StephenSorriaux merged commit cbdc474 into python-zk:master Jun 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ConnectionLoss when reading large data over SSL
4 participants