Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add admin-port to let administrator connect to the server even maxclient number is reached #1120

Open
wants to merge 5 commits into
base: unstable
Choose a base branch
from

Conversation

hwware
Copy link
Member

@hwware hwware commented Oct 3, 2024

Background:

The parameter "maxclient" default value is 10000. Client administrators could set higher value to accept more client connections.
However, there is one situation: If the connection number of client reaches the maxclient, then even administrators have no chance to connect to the server to process some urgent issues or update some configurations.

To solve above case, by connecting to the admin port, an administrator or cloud provider has the chance to connect to the server even the max number of connected clients reaches

Of course, there is one risk by introducing the admin port: hacker has one more chance to invade the Valkey server, but
I think this kind of risky is less than what we thought. Because in standlone mode, primary node could expose ports to multiply replicas and sentinel nodes, and in cluster mode. primary node has port to connect with other nodes. Thus one more port to connect with administrator is not a big issue.

Copy link

codecov bot commented Oct 3, 2024

Codecov Report

Attention: Patch coverage is 65.21739% with 8 lines in your changes missing coverage. Please review.

Project coverage is 70.68%. Comparing base (6b3e122) to head (8ac9019).
Report is 1 commits behind head on unstable.

Files with missing lines Patch % Lines
src/networking.c 36.36% 7 Missing ⚠️
src/server.c 91.66% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1120      +/-   ##
============================================
- Coverage     70.83%   70.68%   -0.16%     
============================================
  Files           118      118              
  Lines         63561    63583      +22     
============================================
- Hits          45025    44943      -82     
- Misses        18536    18640     +104     
Files with missing lines Coverage Δ
src/config.c 77.63% <ø> (ø)
src/server.h 100.00% <ø> (ø)
src/server.c 87.40% <91.66%> (-0.02%) ⬇️
src/networking.c 88.29% <36.36%> (-0.24%) ⬇️

... and 13 files with indirect coverage changes

/* Limit the number of connections we take at the same time.
*
* Admission control will happen before a client is created and connAccept()
* called, because we don't want to even start transport-level negotiation
* if rejected. */
if (listLength(server.clients) + getClusterConnectionsCount() >= server.maxclients) {
if (port != server.admin_port && (listLength(server.clients) + getClusterConnectionsCount() >= server.maxclients)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

connections from admin-port can consume clients out of maxclients, but the eventloop's size is fixed, do we need to make eventloop to be auto resizing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @soloestoy,

Yes, you’re correct—the event loop size is fixed. However, since the port is known only to the administrator and accepts only local connections, we don't anticipate a large number of clients on the port. Additionally, we've reserved 96 extra FDs in the event loop, which I believe should be sufficient to handle the expected client load on the port. ref:

server.el = aeCreateEventLoop(server.maxclients + CONFIG_FDSET_INCR);

That said, I think I overlooked this. Do you think we should track and add a condition for the number of clients connected to the port?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add a fixed number and limit the number of admin connections. We don't need to count the admin connections separately. We can just accept connections on the admin port as long as the total number of connections is less than for example server.maxclients + 20.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a check for the number of connections accepted on the port. Currently, it's set to 20, but I believe 10 would be sufficient. Additionally, the event loop size also been updated based on the maximum number of clients accepted on the port.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we will also need to track admin clients separately from server.clients or we are eating into the actual end-user client quote, something like below, assuming server.clients is the total

if (port != server.admin_port && 
    (listLength(server.clients) - listLength(server.admin_clients) + getClusterConnectionsCount() >= server.maxclients)) 

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PingXie Why is that a problem? Do you think clients are surprised they can make fewer connections because an admin is connected?

Or do you think the admins need to be completely invisible to other clients, like not shown in CLIENT LIST, etc.?

I think the admins can be visible and I don't see why it's a problem that a admin eating the users' quota is a problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @PingXie, We have fixed the number of admin clients to 20(it can be adjusted) and could not see how it will impact the client quota. As Viktor pointed, do you want to hide admins from end user clients?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At amazon we have a fixed number of clients ontop of what is requested as well (although I believe we chose 256). It's worth mentioning that clusterbus connections also eat into the overhead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this feature could solve part of this problem in cluster mode.

Copy link
Member

@PingXie PingXie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we close the discussion at valkey-io/valkey-rfc#3 (comment)?

/* Limit the number of connections we take at the same time.
*
* Admission control will happen before a client is created and connAccept()
* called, because we don't want to even start transport-level negotiation
* if rejected. */
if (listLength(server.clients) + getClusterConnectionsCount() >= server.maxclients) {
if (port != server.admin_port && (listLength(server.clients) + getClusterConnectionsCount() >= server.maxclients)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we will also need to track admin clients separately from server.clients or we are eating into the actual end-user client quote, something like below, assuming server.clients is the total

if (port != server.admin_port && 
    (listLength(server.clients) - listLength(server.admin_clients) + getClusterConnectionsCount() >= server.maxclients)) 

@madolson madolson added the major-decision-pending Major decision pending by TSC team label Oct 14, 2024
@hwware hwware force-pushed the admin-port-imple branch 2 times, most recently from 632e9e0 to e12dc1e Compare November 3, 2024 07:00
@hwware hwware force-pushed the admin-port-imple branch 3 times, most recently from 23e662e to 66129c6 Compare November 13, 2024 15:51
@hwware hwware force-pushed the admin-port-imple branch 2 times, most recently from 08aebb5 to 8ac9019 Compare December 5, 2024 20:06
@hwware hwware changed the title Add admin-port feature Add admin-port to let administrator connect to the server even maxclient number is reached Dec 5, 2024
@hwware
Copy link
Member Author

hwware commented Dec 12, 2024

Should we close the discussion at valkey-io/valkey-rfc#3 (comment)?

It is closed right now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
major-decision-pending Major decision pending by TSC team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants