-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
THREESCALE-8404: Add ACL and TLS support for Redis #350
Conversation
docs/configuration.md
Outdated
- Applies to: listener, worker, cron. | ||
- Format: path to file as string. | ||
|
||
### CONFIG_QUEUES_KEY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if all of these make more sense to have standard file locations. And users would be able to mount secrets at these locations with the file contents. Because these files will still need some way to be mounted into the containers anyway. So it would be double the work. Not sure, just a thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's good to have the variables, since they can be very useful for development: the standard file locations you mention would probably be out of the project folder. What we could do is adding a default in the case they are not provided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what I meant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in a81a4e9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will we have these variables ever empty? Just wondering how it will work. As if empty, I assume empty value may confuse us. Or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed some changes to ensure empty variables doesn't reach Redis: c429e71.
Some redis methods that used to return boolean in the old version now return whatever the redis server returns. To keep using the boolifyed versions we need to call the same methods but with a `?` suffix. Also, now the redis client only accepts strings as parameters
bc44398
to
b99e307
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
review still in progress, first comments
For some reason the call to It needs to be fixed |
The block seemed to be caused by different threads sharing the redis client. async-redis calls `Async` to run the commands, and Async uses Fibers internally, so sharing the redis client between threads led to sharing fibers across threads, which is unsupported.
@@ -46,124 +48,13 @@ def initialize(opts) | |||
port ||= DEFAULT_PORT | |||
|
|||
endpoint = Async::IO::Endpoint.tcp(host, port) | |||
@redis_async = Async::Redis::Client.new( | |||
@redis_async = Concurrent::ThreadLocalVar.new{ Async::Redis::Client.new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with such an approach if using proper async is not trivial. But it appears to me that async-redis
was built to handle concurrency. So mayeb gove a little more explanation why do we need a separate client per thread?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
async-redis
relies on Fibers to implement concurrency, but concurrency doesn't mean parallelism. What async-redis
does, AFAIK, is to switch between fibers on IO operations, e.g. when one fiber is waiting for answer from Redis. But all Fibers run in the same Thread. In Our case, we are calling Storage.instance
from different Threads, @redis_async
was being shared between threads and that caused an error Fiber called across threads
which blocked the test.
@eguzki Does this make sense to you?
@eguzki This fixed it: c94bd27 What do you think? |
You are saying that multiple threads (were) are using the same async redis client. Which threads are those? |
For the failing test, there are three threads accessing the storage in parallel:
|
What changed in your PR to be broken? It makes all the sense that an async client is not shared between threads. Is it shared in (current) If the multithreading is added in the tests, then we should change the tests, not the base code of apisonator. |
I don't know.
Yes
I don't know, maybe I'm wrong. Ideas?
The tests are the same as before, if they passed before, they should pass now to ensure the behavior is the same. |
@akostadinov I created this script to prove # frozen_string_literal: true
require "bundler/inline"
gemfile(true) do
source "https://rubygems.org"
git_source(:github) { |repo| "https://github.com/#{repo}.git" }
gem 'concurrent-ruby', '~> 1.1.6'
gem 'hiredis', '~> 0.6.1'
gem 'async-redis', '~> 0.7.0'
gem 'redis', git: 'https://github.com/3scale/redis-rb', branch: 'apisonator'
end
require 'concurrent'
require 'async/io'
require 'async/redis/client'
require 'hiredis'
require 'redis'
def async_thread_isolate
client = Concurrent::ThreadLocalVar.new do
endpoint = Async::IO::Endpoint.tcp('localhost', 22121)
Async::Redis::Client.new(endpoint, limit: 5)
end
Async { client.value.set("random_key", "a_key") }.wait
result = []
5.times.map { Thread.new {
Async { result.push client.value.get("random_key") }
}}.each(&:join)
puts result
end
def async_thread_share
endpoint = Async::IO::Endpoint.tcp('localhost', 22121)
client = Async::Redis::Client.new(endpoint, limit: 5)
Async { client.set("random_key", "a_key") }.wait
result = []
5.times.map { Thread.new {
Async { result.push client.get("random_key") }
}}.each(&:join)
puts result
end
def sync_thread_share
client = Redis.new({url: 'redis:localhost:22121'})
client.set("random_key", "a_key")
result = []
5.times.map { Thread.new {
result.push client.get("random_key")
}}.each(&:join)
puts result
end
method = ARGV.first || "async_thread_isolate"
send(method) I run it from inside the apisonator container, like this (I saved the script as jlledom@fedora apisonator:master=]$ make dev
[ruby@apisonator-dev apisonator]$ bundle exec script/services start
ruby@apisonator-dev apisonator]$ ruby tmp/async_tests.rb sync_thread_share
Fetching https://github.com/3scale/redis-rb
Fetching gem metadata from https://rubygems.org/.....
Resolving dependencies...
Using nio4r 2.6.1 (was 2.5.9)
Using protocol-redis 0.6.1
Using bundler 2.3.5
Using hiredis 0.6.3
Using redis 4.1.3 from https://github.com/3scale/redis-rb (at apisonator@7210a9d)
Using fiber-annotation 0.2.0
Using fiber-local 1.0.0
Using timers 4.3.5
Using concurrent-ruby 1.1.10 (was 1.1.6)
Using console 1.23.2 (was 1.16.2)
Using async 1.31.0
Using async-pool 0.4.0 (was 0.3.12)
Using async-io 1.38.0 (was 1.34.3)
Using async-redis 0.7.0
a_key
a_key
a_key
a_key
a_key
[ruby@apisonator-dev apisonator]$ ruby tmp/async_tests.rb async_thread_isolate
Fetching https://github.com/3scale/redis-rb
Fetching gem metadata from https://rubygems.org/.....
Resolving dependencies...
Using fiber-annotation 0.2.0
Using fiber-local 1.0.0
Using timers 4.3.5
Using bundler 2.3.5
Using redis 4.1.3 from https://github.com/3scale/redis-rb (at apisonator@7210a9d)
Using nio4r 2.6.1 (was 2.5.9)
Using protocol-redis 0.6.1
Using concurrent-ruby 1.1.10 (was 1.1.6)
Using hiredis 0.6.3
Using console 1.23.2 (was 1.16.2)
Using async 1.31.0
Using async-io 1.38.0 (was 1.34.3)
Using async-pool 0.4.0 (was 0.3.12)
Using async-redis 0.7.0
a_key
a_key
a_key
a_key
a_key
[ruby@apisonator-dev apisonator]$ ruby tmp/async_tests.rb async_thread_share
Fetching https://github.com/3scale/redis-rb
Fetching gem metadata from https://rubygems.org/.....
Resolving dependencies...
Using bundler 2.3.5
Using fiber-local 1.0.0
Using nio4r 2.6.1 (was 2.5.9)
Using timers 4.3.5
Using protocol-redis 0.6.1
Using hiredis 0.6.3
Using redis 4.1.3 from https://github.com/3scale/redis-rb (at apisonator@7210a9d)
Using concurrent-ruby 1.1.10 (was 1.1.6)
Using fiber-annotation 0.2.0
Using console 1.23.2 (was 1.16.2)
Using async 1.31.0
Using async-pool 0.4.0 (was 0.3.12)
Using async-io 1.38.0 (was 1.34.3)
Using async-redis 0.7.0
0.7s warn: Async::Pool::Controller [oid=0xab4] [ec=0xac8] [pid=263] [2023-11-28 16:19:03 +0000]
| Closing resource while still in use!
| {"resource":"#<Async::Redis::Protocol::RESP2::Connection:0x00007fd8d0072ac8>","usage":1}
0.7s warn: Async::Task [oid=0xadc] [ec=0xaf0] [pid=263] [2023-11-28 16:19:03 +0000]
| Task may have ended with unhandled exception.
| NoMethodError: undefined method `zero?' for nil:NilClass
| → /home/ruby/.rbenv/versions/3.0.2/lib/ruby/gems/3.0.0/gems/async-pool-0.4.0/lib/async/pool/controller.rb:236 in `reuse'
| /home/ruby/.rbenv/versions/3.0.2/lib/ruby/gems/3.0.0/gems/async-pool-0.4.0/lib/async/pool/controller.rb:112 in `release'
| /home/ruby/.rbenv/versions/3.0.2/lib/ruby/gems/3.0.0/gems/async-pool-0.4.0/lib/async/pool/controller.rb:102 in `ensure in acquire'
| /home/ruby/.rbenv/versions/3.0.2/lib/ruby/gems/3.0.0/gems/async-pool-0.4.0/lib/async/pool/controller.rb:102 in `acquire'
| /home/ruby/.rbenv/versions/3.0.2/lib/ruby/gems/3.0.0/gems/async-redis-0.7.0/lib/async/redis/client.rb:119 in `call'
| /home/ruby/.rbenv/versions/3.0.2/lib/ruby/gems/3.0.0/gems/protocol-redis-0.6.1/lib/protocol/redis/methods/strings.rb:62 in `get'
| tmp/async_tests.rb:46 in `block (3 levels) in async_thread_share'
| /home/ruby/.rbenv/versions/3.0.2/lib/ruby/gems/3.0.0/gems/async-1.31.0/lib/async/task.rb:261 in `block in make_fiber'
a_key
a_key
a_key
a_key
I haven't found any mentions to thread-safety in the It seems logical to me: an I still don't have an answer to why does it work in production, where |
I'm inclined to think that the reason not to fail in production is either that we don't really use threaded workers there, or load is not high enough or pure luck. I agree that At this part there is a small issue with the test bed. 5.times.map { Thread.new {
Async { result.push client.get("random_key") }
}}.each(&:join)
puts result We wait for each thread but don't wait for each Async task. So Threads can die and the program may exit before all async tasks have completed. And we may (or may not) see the error message you posted above. If we add a But a concurrency of 5 is not very stressing. So I changed it a little to create 55 async tasks in 55 threads, then wait for all threads and then for all async tasks before printing the results. Like this: 55.times.map { Thread.new {
Async { result.push client.get("random_key") }
}}.map(&:value).map(&:wait)
puts result And the result was quiet telling. First of all I saw in console several messages:
And additionally the process has hanged indefinitely until I interrupted it with Needless to say, the isolated client per thread method worked without issue, see https://gist.github.com/akostadinov/d7a0b00fabb48ca74f08fb54090993e7/revisions#diff-240e2d0b41b99e3ebf81ae17215a432402bfe29406f7e71df271abee4b1b544f Unfortunately I'm looking at the
This reads to me as this is a known design limitation. So overall the taken approach by you - create an async client per thread and avoid sneaky modifications of instance variables, are the right (if not the only) way to go forward. Thank you for doing a great detective job and for the nice test bed for thread safety. An interesting question still is, why test in the P.S. I'm running the script locally, just using Ruby 3.0.4, it is reproducible like that just fine at least for me. |
@akostadinov Thanks for your research. In fact, calling About why it doesn't fail in master, I have a couple of theories. In About the listener process, I'm not sure how multithreaded is it really, I guess Puma creates a thread for every request, but as proven in the test script we wrote, it can cope fine with a certain amount of parallelism without failing, maybe we never reach that workload in production. On the other hand, there's that problem about all threads sending their commands to the pipeline whenever any thread opens any pipeline anywhere, and those commands are eventually sent to redis one after another from a single thread when the pipeline closes. That would contribute to reduce the number of situations where there are multiple threads actually accessing |
For sentinels with ACL and TLS
A test fails in CircleCI due to this
They are expected.
end | ||
|
||
# Authenticated RESP2 if credentials are provided, RESP2 otherwise | ||
def make_redis_protocol(opts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this code just new custom development, or copy from upstream?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's custom. The async-redis
client is so simple we need to implement the most basic things like sending the credentials to the redis server when it requires authentication. We also have to handle the SSL connections ourselves and. All private
methods in the Client
class are there to deal with these things.
The code it's base in the examples from the repo:
https://github.com/socketry/async-redis/blob/main/examples/auth/protocol.rb
https://github.com/socketry/async-redis/tree/main/guides/getting-started
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, that's sad that it doesn't even do the basic stuff...
Thanks for the info!
private_constant :METHODS_TO_BOOLIFY | ||
|
||
METHODS_TO_BOOLIFY.each do |method| | ||
command = method.to_s.delete('?') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the story behind all the methods that got an extra ?
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, probably because of
apisonator/lib/3scale/backend/storage_async/client.rb
Lines 46 to 49 in 8e26728
# 2) Methods that need to be "boolified". These are methods for which | |
# redis-rb returns a boolean, but redis just returns an integer. | |
# For example, Redis returns 0 or 1 for the EXISTS command, but | |
# redis-rb transforms that into a boolean. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aisonator uses the API from the redis-rb
gem, which we use in sync mode. In order to work in both sync and async modes, our custom async mode replicates the redis-rb API. The Methods
class implement this.
The API changed in redis-rb 5, now they addded the methods :exists?
:sadd?
and :srem?
https://github.com/redis/redis-rb/blob/master/CHANGELOG.md#500. We have to adapt our API to reflect that.
For the commands coming with an ending ?
we need to remove it to get the actual command to send to Redis.
# | ||
# These are the different cases: | ||
# 1) Methods that can be called directly. For example SET: | ||
# @redis_async.call('SET', some_key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# @redis_async.call('SET', some_key) | |
# call('SET', some_key) |
Not sure if that's correct, but assuming that there is no mention of @redis_async
in this file, and the calls are "delegated" to @redis_async
anyway...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I copied that comment from the old file and didn't check it.
lib/3scale/backend/service.rb
Outdated
@@ -277,7 +277,7 @@ def persist_attribute(attribute, value, ignore_nils = false) | |||
def persist_sets | |||
storage.sadd storage_key_by_provider(:ids), id | |||
storage.sadd encode_key("services_set"), id | |||
storage.sadd encode_key("provider_keys_set"), provider_key | |||
storage.sadd encode_key("provider_keys_set"), provider_key unless provider_key.nil? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can provider key be nil? Should we conditionally add or should we raise an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replied at #384 (comment)
lib/3scale/backend/job_fetcher.rb
Outdated
@@ -85,6 +85,8 @@ def try_pop_from_queue(max=nil) | |||
def wait_pop_from_queue | |||
queue, encoded = @redis.blpop(*@queues, timeout: @fetch_timeout) | |||
[queue, [encoded]] if encoded | |||
rescue RedisClient::ReadTimeoutError => _e | |||
# Do nothing, we expect frequent timeouts from the sync client when the queue is empty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, when @fetch_timeout
is reached, the method will return nil
, not raise. A read timeout on the other hand is a low-level TCP timeout that shouldn't happen regularly.
On the other hand, we don't need to be very much bothered by a connection timeout, if we can just reconnect. It's good to be resilient in case of network glitches.
I would suggest though to log the error, can be without a trace, just dome text like:
Que redis (ip/hostname of server if possible) connection timeout at #{FILE}:#{LINE}
In this way we can have an indication of network stability and see whether particular servers or locations are more problematic. This will help debugging and improving the cluster setup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replied at #384 (comment)
# Conflicts: # lib/3scale/backend/storage_async/client.rb
Closing this PR. I created #391 to make this easier to review: |
This is a work in progress to allow Apisonator to connect to Redis using ACL and TLS credentials.
Jira Issue:
THREESCALE-8404
Summary of changes:
redis
4.3.1 fork and use upstreamredis
5.0.7:sadd
returned a boolean inredis
4.3.1, but now they return what they get from the Redis server.redis
5.0.7 includes boolified versions of those methods, like:sadd?
, which behave like the old:sadd
redis
4.6.0 introduced some changes in thepipelined
method, (Check Changelog).redis
, andasync-redis
gems provide different interfaces. We have an async client that reproduces theredis
interface and translates it toasync-redis
calls.async-redis
client (c94bd27)redis
gem, the async mode uses theasync-redis
gem.async-redis
hasn't been updated, but I modified our client to add support for TLS and ACL.Connecting to Redis using TLS and ACL is done. Running Apisonator in sync mode works fine and all tests pass in sync and async mode.
I'm blocked due to a deadlock in the async mode which couldn't fix yet. To reproduce it, just run the tests with
make test
. The test failing is spec/integration/worker_async_spec.rbPending tasks:
test/unit/storage_sync_test.rb
to add tests for TLS connections..3scale_backend.conf
file.ca_path
ssl paramredis
andresque
forks and started using the upstream gems.ERR AUTH
errorAbout the forks mentioned above, the problem is we were using a forked version of the
redis
gem before: https://github.com/3scale/redis-rbThat fork included 3 PRs which are not upstream:
Since the fork is stuck in
redis-rb 4.1.3
and we need>= 5
. Those three PRs are lost in this PR, so we need to investigate what were they for and whether are they still required.The same happens with
resque
. We were using a fork before: https://github.com/3scale/resqueThe fork included one PR and a few commits not in upstream:
Configure server and client
This requires three steps:
3.1. TLS Client
3.2. Mutual TLS Client
Generate keys and certificates:
The first thing we need is a certification authority. For that, we need to create its key and certificate.
openssl genrsa -out ca-root-key.pem 4096
openssl req -new -x509 -days 365 -key ca-root-key.pem -out ca-root-cert.pem
Then we'll have to create a key and certificate for the server, and sign it with our CA. Ensure setting
localhost
asCN
, if that's the domain you're going to use to connect to Redis:openssl genrsa -out redis.key 4096
openssl req -new -key redis.key -out redis.csr
openssl x509 -req -days 365 -in redis.csr -CA ca-root-cert.pem -CAkey ca-root-key.pem -CAcreateserial -out redis.crt
Configure Redis:
First, we need a Redis configuration file which enables ACL and TLS, defines the users and sets the certificate the server will use and the CA the server will trust. This is the minimal
redis.conf
for that purpose:Next step is to launch a container that has access to the certificates, keys and Redis config files created above. We create a volume for each file and modify the container start command to reference the configuration file we created. This is how I configured the redis pod in my
docker-compose.yml
. But the container can be launched by other ways:The
:z
at the end of the volume definitions is required when using Selinux, for instance in Fedora.Configure Apisonator:
Apisonator can be configured as a regular TLS client, when only the server needs a certificate, or a Mutual TLS Client, where both client and server must provide a certificate.
As TLS Client:
What I did is create a
~/.3scale-backend.conf
, based on openshift/3scale_backend.conf and set these properties:Note how the
ssl_params
sections are not always required. We are using them here because the server is using a certificate signed by an unknown authority we've just created, so we need to explicitly tell Porta to trust that authority. If the server were using a certificate signed by any of the well-known CAs, thessl_params
sections could be omitted.Another possible situation is when the server is using a self-signed certificate. When this happens, there's no trusted CA to add to
ssl_params
, so the only way to go is to skip certificate validation. Unfortunately we don't support this yet because theredis-client
gem still haven't added support for verify modes (see: redis-rb/redis-client#133) so we can't setSSL_VERIFY_NONE
to the client. Anyway this is only useful for development purposes, we'll have to use our own CA for development for the moment.As Mutual TLS Client:
Generate a key and a signed certificate for Apisonator:
openssl genrsa -out apisonator.key 4096
openssl req -new -key apisonator.key -out apisonator.csr
openssl x509 -req -days 365 -in apisonator.csr -CA ca-root-cert.pem -CAkey ca-root-key.pem -CAcreateserial -out apisonator.crt
Then update the config file to set the next properties:
As mentioned above, the
ca_file
fields are only required when the server isn't using a certificate signed by one of the well-known CAs.It could be also configured using the new ENV variables (see docs/configuration.md)