-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HPCC-32184 socket nonblocking improvements #18841
HPCC-32184 socket nonblocking improvements #18841
Conversation
Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-32184 Jirabot Action Result: |
@mckellyln - please take a look. Thanks. |
I am seeing this often in the dafilesrv log from the hpcc4j client test (non-secure) - altho things seem to work ok - |
And 10000s of these in roxie log - |
6e8a360
to
dd72b2e
Compare
system/jlib/jsocket.cpp
Outdated
|
||
//// | ||
void ScopedNonBlockingMode::init(ISocket *_socket) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add here -
if (_socket->connectionless())
return;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's incorrect at the moment.
It could make sense for UDP readtms/writetms to call and return blocked before it could read anything,
and not that we use it anywhere, if it was UDP+SSL, I suspect it would be necessary too.
Saying that, in our current use patterns, I don't think it's useful (we didn't need it before, so we don't need it now?), so I agree, to minimize the impact, we'll switch off the non-blocking mode for UDP for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized - if we turn off non-block for UDP then we must put the wait_read() back into the beginning of readtms() for UDP - before the recvfrom.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we could just change recv() and send() to use MSG_DONTWAIT flag ?
So we dont have issue with one thread in reading and one thread writing and getting non-blocking flag mixed up ?
(but still need to decide about UDP - if it gets wait_read() added back in or we also use MSG_DONTWAIT))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jakesmith I think what remains for this PR is:
1). UDP - if we decide to leave as blocking - put wait_read() into jsocket readtms() / wait_write() into writetms().
2). set all TCP sockets as non-blocking after connect()/accept() or leave in readtms()/writetms(), but don't restore to blocking, just leave socket in non-blocking mode always.
system/jlib/jsocket.cpp
Outdated
@@ -2094,8 +2138,10 @@ size32_t CSocket::writetms(void const* buf, size32_t minSize, size32_t size, uns | |||
errclose(); | |||
err = JSOCKERR_broken_pipe; | |||
} | |||
if ((err == JSE_WOULDBLOCK) && nonblocking) | |||
if (err == JSE_WOULDBLOCK) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we also check if err == EAGAIN here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I think should. I'll change it.
system/jlib/jsocket.cpp
Outdated
@@ -1969,11 +1982,13 @@ void CSocket::readtms(void* buf, size32_t min_size, size32_t max_size, size32_t | |||
} | |||
else | |||
{ | |||
if (nonblocking && (err == JSE_WOULDBLOCK || err == EAGAIN)) // if EGAIN or EWOULDBLOCK - no more data to read | |||
if (err == JSE_WOULDBLOCK || err == EAGAIN) // if EGAIN or EWOULDBLOCK - no more data to read |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spelling in comment - EGAIN
Should we add a JSE_EAGAIN ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I'll add JSE_EGAIN for completeness.
fe4d4f0
to
8b8566c
Compare
If a legacy client tries to connect to Dali (ie sending only 12 bytes) or perhaps some rogue client - how long would Dali/MP keep the connection open waiting for the remaining data ? |
It should be clearing up stalled connected sockets after ~90 seconds (maintenance thread runs every 10s).. |
system/mp/mpcomm.cpp
Outdated
} | ||
bool closeIfTimedout(cycle_t now) | ||
{ | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a legacy client tries to connect to Dali (ie sending only 12 bytes) or perhaps some rogue client - how long would Dali/MP keep the connection open waiting for the remaining data ? In my tests the connection seems to remain open for a very long time.
@mckellyln - that would explain why it isn't closing them! :) [ I must have left in whilst testing at some point ]
Will remove!
|
||
if (remaining) | ||
{ | ||
sock->readtms(activeptr, 0, remaining, szRead, WAIT_FOREVER); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WAIT_FOREVER here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was made that, in "sympathy" to the way it was previously coded:
sock->read(&hdr,sizeof(hdr));
but we could rationalize at the same time as this PR too..
To be conservative, could make it a reasonably long time JIC e.g. 15 minutes - of course it should never take that long, but... don't want to change this too much at same time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed the main point here... in above reply.
It is readtms with min_size = 0, the timeout is basically irrelevant - it can't hit the timeout, because if blocked it will bail out.
There's a comment in CSocket::readtms:
* NB: timeout is meaningless if min_size is 0
763ddb6
to
636e712
Compare
CMPServer *parent; | ||
int mpSoMaxConn; | ||
unsigned acceptThreadPoolSize = 0; | ||
unsigned maxListenHandlerSockets = 60000; // what is a sensible default limit? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a comment, if you expect a large number of fds then when creating the selectHandler you can specify an argument to it for how many fds it handles before creating a new thread for more:
createSocketSelectHandler(traceName, fdsPerThread)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should leave that for a separate discussion/change. It (MP) has always used the default (UINT_MAX) and hence 1 CSocketEpollThread. Actually looks like everywhere uses the default. May be worth revisiting to see if it helps when a lot, but separate issue.
This limit (maxListenHandlerSockets), is limiting the number of clients that are actively connecting at once - once they're actually connected they are processed by handleAcceptedSocket and added to the standard MP select handler.
Not sure what a sensible # is here, perhaps much lower, but it's here to catch a crazy burst of connections coming in (like a DoS).
fs/dafsserver/dafsserver.cpp
Outdated
msgWritePtr = (byte *)msg.reserveTruncate(sizeof(size32_t)); | ||
size32_t szRead; | ||
if (readtmsCheckGC(sock, msgWritePtr, 0, sizeof(size32_t)-left, szRead, WAIT_FOREVER)) | ||
THROWJSOCKEXCEPTION(JSOCKERR_graceful_close); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this different than a sock->read(msgWritePtr, 0, sizeof(size32_t)-left, szRead, WAIT_FOREVER, false); ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, it's the same. I'll change it.
i.e. if hit GC before then hit max then we haven't receive the packet size and we want to throw GC exception.
@jakesmith this is currently targeting master, but can/should it target an existing minor? |
unfortunately it's not a regression. It's a fundamental flaw in the way jsocket/securesocket and their clients (dafilesrv/MP) handle traffic, but are only really noticeable when SSL traffic is involved. The intent was to target 9.8, this PR was created before 9.8 was branched I think, I will change the base. However, this issue fundamentally affects SSL handling, non-SSL handling although potentially problematic (blocking traffic from all clients in select handler) was not susceptible to the same SSL stall. @jpmcmu - has the non-ssl (BM) issues been tested in 9.6 since then? |
@jakesmith It looks like our automated testing on bare-metal isn't running into the timeout issue after 9.6.26. I will run some further testing but it looks like HPCC-32174 likely fixed the issue on bare-metal |
I've run:
FWIW I've run this against the current candidate-9.6.x on a BM system (with no SSL) and it passed. |
8393010
to
f96e1d9
Compare
f96e1d9
to
6989e42
Compare
I think this is close to being ready to merge. @ghalliday - please also review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jakesmith I only spotted a few minor typos/optimisations. The new code seems more logical.
system/jlib/jsocket.cpp
Outdated
// that could do with clearing up. Some of it is very similar to wait_write | ||
|
||
// As they current stand: | ||
// connect_timeout if successful, will always leave the socket in a non-bocking state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: blocking
system/jlib/jsocket.cpp
Outdated
|
||
// loop around to read more, or detect blocked. | ||
} | ||
else // NB rc < 0, (if rc == 0 handeld already above) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: handled
system/mp/mpcomm.cpp
Outdated
portProbeLastLog = get_cycles_now(); | ||
else | ||
{ | ||
cycle_t nowCycles = get_cycles_now(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: Could move this outside the critical section, and use in line 461 as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will change.
system/mp/mpcomm.cpp
Outdated
{ | ||
cycle_t nowCycles = get_cycles_now(); | ||
cycle_t cyclesSinceLastLog = nowCycles - portProbeLastLog; | ||
if ((cyclesSinceLastLog >= (queryOneSecCycles()*60))) // log max every minute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cache (queryOneSecCycles()*60) into a variable if portProbeLastLog is 0 (since this is in a crtisec)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't have thought significant enough to worry about, but will change.
{ | ||
portProbeLastLog = nowCycles; | ||
unsigned __int64 totalProbes = portProbeCloseCounts[CloseType_graceful] + portProbeCloseCounts[CloseType_error] + portProbeCloseCounts[CloseType_timeout]; | ||
DBGLOG("Port probes: %" I64F "u [graceful=%" I64F "u (%" I64F "u ms),error=%" I64F "u (%" I64F "u ms),timedout=%" I64F "u (%" I64F "u ms). Last: %s, type=%s, time: %" I64F "u", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this logging outside the critical section.
system/mp/mpcomm.cpp
Outdated
} | ||
const char *queryPeerHostText() const | ||
{ | ||
if (0 == peerHostText.length()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any potential problems with thread safety? Also function below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think very unlikely, but theoretically, and in practice, every [normal] connection calls queryPeerHostText(), so I am going to do in ctor instead.
system/mp/mpcomm.cpp
Outdated
if (it == handlers.end()) | ||
break; | ||
CSocketHandler *socketHandler = *it; | ||
if (socketHandler->closeIfTimedout(get_cycles_now())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call get_cycles_now() once before the critical section?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will move.
handlers.emplace_back(socketHandler); | ||
numHandlers = handlers.size(); | ||
} | ||
if (0 == (numHandlers % 100)) // for info. log at each 100 boundary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this cause excessive logging if the number of handlers oscillates around a multiple of 100?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect only if of major concern if under attack.. in which case the tracing may be useful sign it is(?)
system/mp/mpcomm.cpp
Outdated
switch (exceptionCode) | ||
{ | ||
case JSOCKERR_timeout_expired: | ||
trackPortProbe(get_cycles_now() - socketHandler.queryCreateTime(), socketHandler.queryPeerEndpointText(), CloseType_timeout); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trivial: If you passed in socketHandler.queryCreateTime() instead you could avoid a call to get_cycles_now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will change
sock->getPeerEndpoint(ep1); | ||
PROGLOG("MP: Connect Thread: socket accepted from %s",ep1.getEndpointHostText(s).str()); | ||
#endif | ||
sock->set_keep_alive(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a new call. Worth a comment why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not new really. It's moved and earlier.
It's now upfront/early, here after the accept, before adding to the new connection select handler.
It used to be in handleAcceptedSocket(), which was also logically after the accept() (though delayed until threadpool thread started)
@ghalliday - please see review changes in new commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good to me
running - |
A - |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to look into my my two issues posted above.
I've just tested this. Daserver does close the client socket. I stepped over the point (after 90 seconds) where it does this (in CConnectSelectHandler::close - mpcomm.cpp, line 747).
looks like nc remains running despite the fact the socket was closed? |
@mckellyln - can you attach echo6.xml ? |
I guess this is a custom configuration, enabling port 19876, or one I'm not familiar with at least. |
I'll send the steps and info to you and also debug myself. |
soapplus -ssl crashes in 9.6.x and 9.8.x (in CSecureSocket::wait_read). I'll open a separate JIRA - but not directly related to this PR. |
NB: soapplus (http.cpp) is the only place in the system that calls createSecureSocket with a handle vs an ISocket |
have checked 9.4, it doesn't hit the same issue - but more by luck than design (because happens to avoid calling wait_read) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it looks good, approved.
4f17ad5
to
b406298
Compare
I have added 1 more more commit, that fixes securesocket when created with a socket handle only (vs ISocket). |
Tighten up semantics of jsocket and securesocket and their use cases. Specifically, this fixes issues with SSL traffic being handled by select handlers, where previously it would tend to deadlock reading because the underlying buffering nature of SSL meant that either not all the data was available, or more than the amount read was available, and no future notify event triggered more reading. jsocket - ensure reads all available, return if >= min - clearup nonblocking toggle, only change if necessary - clarify write() semantics (always writes all or throw an exception) - add option to throw graceful close after reading min_size jutil - add incidental hold() utility method mp - remove slowclient handling, add trackPortProbe - track and trace port probes - use a dedicated selecthandler for connections, and refactor connect logic, to avoid blocking - maintain list of connecting clients, delete any that have stalled if timedout via a maintenance thread. - refactor main MP packet reader handling to be non-blocking securesocket - ensure readtms semantics match jsocket - correct handling of ssl error states (SSL_ERROR_WANT_WRITE, SSL_ERROR_WANT_READ, SSL_ERROR_SYSCALL, SSL_ERROR_ZERO_RETURN), triggering wait_read or wait_write - Fix securesocket with fd only issues. dafilesrv - rework dafsserver notifySelected loop slightly, to correctly expect/deal with 0 data. - Default all accepted dafilesrv sockets to non-blocking general - readtms - ensure consistenly waits for min_size. - tidy up looping code in various places (outside of jsocket) that relied on readtms 0 reading >=1 and looped instead of supplying min=max=len. - change calls that relied on read/readtms 0 reading >=1 - add readtmsAllowClose for common case where code treats graceful close as 0 bytes. Signed-off-by: Jake Smith <[email protected]>
b406298
to
6ca6595
Compare
6cae8ca
into
hpcc-systems:candidate-9.8.x
Tighten up semantics of jsocket and securesocket and their use
cases.
Specifically, this fixes issues with SSL traffic being handled
by select handlers, where previously it would tend to deadlock
reading because the underlying buffering nature of SSL meant that
either not all the data was available, or more than the amount read
was available, and no future notify event triggered more reading.
jsocket
jutil
mp
securesocket
dafilesrv
general
Type of change:
Checklist:
Smoketest:
Testing: