Handle sendto() function failure conditions #333

pranit-sirsat-imgtec · 2017-01-20T07:22:29Z

The sendto() function can fail because of multiple conditions.
Currently we are handling few conditions. If there is an error due
to another condition, the process gets stuck in repeatedly trying
to call sendto() function and failing with the same error condition.
For example, when awa session is established and after that
network interface goes down(ex. ethernet cable is removed),
the sendto() fuction fails with error 'ENETUNREACH'. This error
is not handled and the process gets stuck in repeatedly calling
sendto() fuction. There are also other possible error conditions.
In this change handling all error conditions, so that process
does not get stuck in endless loop.

Ref: AWA-311
Signed-off-by: Pranit Tanaji Sirsat [email protected]

The sendto() function can fail because of multiple conditions. Currently we are handling few conditions. If there is an error due to another condition, the process gets stuck in repeatedly trying to call sendto() function and failing with the same error condition. For example, when awa session is established and after that network interface goes down(ex. ethernet cable is removed), the sendto() fuction fails with error 'ENETUNREACH'. This error is not handled and the process gets stuck in repeatedly calling sendto() fuction. There are also other possible error conditions. In this change handling all error conditions, so that process does not get stuck in endless loop. Ref: AWA-311 Signed-off-by: Pranit Tanaji Sirsat <[email protected]>

boyvinall · 2017-01-20T10:09:26Z

Ref: AWA-311

@pranit-sirsat-imgtec do you mean #311 ?

pranit-sirsat-imgtec · 2017-01-20T10:13:48Z

@pranit-sirsat-imgtec do you mean #311 ?

yes, i added that in commit message.
i am using the github for first time so don't know if something has gone wrong

boyvinall · 2017-01-20T10:46:05Z

@pranit-sirsat-imgtec No problem. To refer to github issues or pull requests, take a look at https://help.github.com/articles/autolinked-references-and-urls/. Always worth checking the "preview" tab when you're writing comments too, at least until you get used to it.

datachi7d · 2017-01-25T05:28:08Z

@boyvinall - this looks like it removes error checking. Can you review this again please?

datachi7d · 2017-01-25T05:29:39Z

@pranit-sirsat-imgtec is it possible to write some tests that show the issue without your fix? This will help us understand how your changes fix the issue you saw.

pranit-sirsat-imgtec · 2017-01-25T06:17:01Z

Hi @datachi7d, thank you for your comment.

can you please explain how it removes the error checking ?

the if else condition was previously checking for few possible errors on which sendto() could not have recovered and returning NetworkSocketError_SendError. for other error conditions, considering those conditions will recover, it was continuing in while loop and calling sendto(). Now if the error condition is unrecoverable then the process is stuck in while loop and useless after that. one example as wrote in commit message was of ENETUNREACH.

now in the new code, as per my understanding EWOULDBLOCK and EINTR are immediately recoverable so remaining in while loop, otherwise for other conditions it will return NetworkSocketError_SendError. this way it will never get stuck in while loop. in case the error was recoverable, the NetworkSocket_Send() function will get called again after some time with same coap data (at least i had checked about coap_transactions)

list of possible errors http://pubs.opengroup.org/onlinepubs/009695399/functions/sendto.html

pranit-sirsat-imgtec · 2017-01-25T06:44:49Z

as we use the awa_clientd daemon
to reproduce the issue,

1] get awa_clientd daemon connected with device server
2] call awa-client-get on command line, should work
3] down the interface, ifconfig down ......
4] wait for few second i think the poll duration in main loop (some 3-4 sec.)
5] call awa-client-get on command line, should fail, with timeout error

for test case, if i get time i will take look at other tests, but for this we will need to up/down interface. do you have fixed name for interfaces on test device ? will that affect the CI system ?

boyvinall · 2017-01-25T07:37:41Z

@datachi7d I don't think this does remove error checks? Previously, the code was testing a bunch of lastError values but handling them all the same way. There were potentially some values of lastError that were not tested for. The patch has reduced identical handling into a single else which also catches any other values of lastError .. so actually it's adding error handling?

boyvinall · 2017-01-25T07:38:42Z

@datachi7d However, the point about a test case to catch this in future is a good one. @pranit-sirsat-imgtec we would welcome that if you get the chance, thanks.

datachi7d · 2017-01-25T20:30:41Z

@boyvinall my mistake - I was confused by the title combined with the removal of code (I was expecting something to be added). So is there a possibility that some of the returned values caught by the else are not errors? For example EAGAIN.

datachi7d · 2017-01-25T20:57:16Z

@pranit-sirsat-imgtec yes this looks tricky to test. Having a mechanism to test this scenario would help with creating transparency of what's happening with the network stack (#319 or #313) as it would allow testing of conditions that lead to network related errors. Using a tun/tap device to create a pseudo interface that can be managed from the CI tests (ifconfig tun0 down) could be a place to start.

@DavidAntliff do we run our CI jobs in docker containers? If the tests aren't in a container getting permission to create the tun/tap device means sudo-ing the tests (very bad).

datachi7d · 2017-01-25T21:04:12Z

@DavidAntliff looks like the Jenkins CI jobs are in a container but what about the Travis job?

DavidAntliff · 2017-01-25T21:05:03Z

@datachi7d yes, the tests run in docker containers. I don't recall there being any issue with using tun/tap inside them, but I can't be sure. Travis uses Docker too, but implicitly.

boyvinall added this to the 0.2.5 milestone Jan 20, 2017

boyvinall added the bug label Jan 20, 2017

cheekyhalf merged commit e9e2c84 into ConnectivityFoundry:master Jan 20, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Handle sendto() function failure conditions #333

Handle sendto() function failure conditions #333

pranit-sirsat-imgtec commented Jan 20, 2017

boyvinall commented Jan 20, 2017

pranit-sirsat-imgtec commented Jan 20, 2017

boyvinall commented Jan 20, 2017

datachi7d commented Jan 25, 2017

datachi7d commented Jan 25, 2017

pranit-sirsat-imgtec commented Jan 25, 2017

pranit-sirsat-imgtec commented Jan 25, 2017

boyvinall commented Jan 25, 2017

boyvinall commented Jan 25, 2017

datachi7d commented Jan 25, 2017

datachi7d commented Jan 25, 2017

datachi7d commented Jan 25, 2017

DavidAntliff commented Jan 25, 2017

Handle sendto() function failure conditions #333

Handle sendto() function failure conditions #333

Conversation

pranit-sirsat-imgtec commented Jan 20, 2017

boyvinall commented Jan 20, 2017

pranit-sirsat-imgtec commented Jan 20, 2017

boyvinall commented Jan 20, 2017

datachi7d commented Jan 25, 2017

datachi7d commented Jan 25, 2017

pranit-sirsat-imgtec commented Jan 25, 2017

pranit-sirsat-imgtec commented Jan 25, 2017

boyvinall commented Jan 25, 2017

boyvinall commented Jan 25, 2017

datachi7d commented Jan 25, 2017

datachi7d commented Jan 25, 2017

datachi7d commented Jan 25, 2017

DavidAntliff commented Jan 25, 2017