Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Backport v3.7.99-ncs1-branch] net: coap: Pull various fixes to CoAP client #2220

Merged
merged 19 commits into from
Nov 4, 2024

Conversation

NordicBuilder
Copy link
Contributor

Backport 53b26fb~19..53b26fb from #2205.

hubertmis and others added 19 commits November 4, 2024 13:57
Constrained application can require to avoid waiting for response
from a server. CoAP No-response option defined in RFC 7967 allows
to do that by suppressing selected response types.

Signed-off-by: Hubert Miś <[email protected]>
(cherry picked from commit 70c9bf1)
(cherry picked from commit 67983c5)
This commit makes sure we continue to wait for extra confirmations even
after the request is done so we can handle duplicate confirmations if any.

Detailed description:

rfc7252#section-4.5 specifies that:

"The recipient SHOULD acknowledge each duplicate copy of a
 Confirmable message".

So if, for example, the client sends to a multicast destination address,
the server will get multiple requests and will confirm all of them.

Without this commit, the client will set the request to done after
receiving the first answer.
From here the request object will be marked as free and the duplicate
acknowledgements will stay buffered in the network stack.
Once the client tries to send a new request, it will unbuffer those
duplicate acknowledgements but now the request object is unallocated
so the client won't be able to handle those acknowledgements as duplicates.
It will instead treat it as an unexpected ACK.

To work around this issue, rfc7252#section-4.8.2 states that:

"EXCHANGE_LIFETIME is the time from starting to send a Confirmable
 message to the time when an acknowledgement is no longer expected,
 i.e., message-layer information about the message exchange can be
 purged."

Keeping the request object allocated for EXCHANGE_LIFETIME ensures that
duplicate acknowledgements can be handled accordingly.

This commit adds a basic implementation of what is stated in the RFC.

EXCHANGE_LIFETIME has been arbitrarily set to 3 * ACK_TIMEOUT which
seems more reasonable than the 247 seconds stated in the RFC.

Signed-off-by: Francois Gervais <[email protected]>
(cherry picked from commit bfed8d0)
(cherry picked from commit b56bec6)
…llocated

As confirmable requests will stay allocated for (3 * ACK_TIMEOUT), we
need to adjust the timings so all requests are unallocated by the end
of the test.

Signed-off-by: Francois Gervais <[email protected]>
(cherry picked from commit 13cad59)
(cherry picked from commit 04d4c68)
Passing MACRO arguments that need expansion require _CONCAT.

Signed-off-by: Pieter De Gendt <[email protected]>
(cherry picked from commit 476c12f)
(cherry picked from commit ce8edeb)
The goal of this commit is to reduce the overall test time by optimizing
the time spent sleeping.

However while doing so, a few glitches became apparent and those have
been fixed as well.

1. The way the ZSOCK_POLLIN event is managed has been modified

In most tests, the event would stick for the whole test and this would
sometimes results in the client receiving more data than intended which
in turn would results in various unexpected warnings and errors.

The management of the ZSOCK_POLLIN event has now been mostly moved to
the send/receive overrides which better represent how things would
happen outside of the test scenario.

For example, when sending data, if this send would normally result in
receiving some response, the send function sets the ZSOCK_POLLIN event.
Then when receiving, we clear the event as the data has been received.

There are a few exceptions to this for cases where we need to simulate
some specific abnormal behavior.

2. The `messages_needing_response` queue now includes a valid bit

The test manages a queue of messages id to be able to respond to the
correct id in the receive hooks.

It was built in a way that the id 0 would be treated as invalid
although it is a valid id.
In practice, it would not be an issue in most cases however, it would
result in an unexpected behavior if the `test_multiple_requests` test
would be run first.

To fix this issue in the simplest way, the queue has been changed to
uint32_t which gives us 16 extra bits for the management of the queue,
1 of which is used to tell if the entry is valid or not.

Signed-off-by: Francois Gervais <[email protected]>
(cherry picked from commit 736344b)
(cherry picked from commit 6983b3c)
…rable

Extend the `coap_transmission_parameters` struct with the field
`ack_random_percent`. This was the last remaining CoAP transmission
parameter that was not configurable at runtime.

Signed-off-by: Adrian Friedli <[email protected]>
(cherry picked from commit 98289e5)
(cherry picked from commit adb4d33)
Before this patch, any unexpected socket error during poll (caused by
LTE disconnects for instance) would lead to a infinite loop because the
error state was never cleared, handled or even signaled to the user.

Signed-off-by: Benjamin Lindqvist <[email protected]>
(cherry picked from commit f8a7035)
(cherry picked from commit c5074d7)
Protect global list of clients with mutex.

Signed-off-by: Seppo Takalo <[email protected]>
(cherry picked from commit 1ea569d)
(cherry picked from commit a75540e)
…ailed to send

When transmission of first request fails, reset the internal request
buffer as there is no ongoing CoAP transaction.

Application can deal with the failure.

Signed-off-by: Seppo Takalo <[email protected]>
(cherry picked from commit 46b7c84)
(cherry picked from commit 42a691b)
…etrying CoAP msg

Refactor the CoAP retry handling into the handle_poll() function,
so that we only try to send retries if the socket reports POLLOUT.

Also move the receiving into same loop, so when poll() reports POLLIN
we recv() the message and handle it before proceeding to other sockets.

Also fix tests to handle POLLOUT flag and add support for testing
multiple clients.

Signed-off-by: Seppo Takalo <[email protected]>
(cherry picked from commit 4c6dd4c)
(cherry picked from commit a10fdc4)
…ng loop

Forward recv() errors to handle_poll(), so there is only one place to
handle error codes.

Signed-off-by: Seppo Takalo <[email protected]>
(cherry picked from commit 6481b0e)
(cherry picked from commit dec4578)
… send() failure

If send() fails, we have not technically send the CoAP retry yet, so
restore the same pending structure, so our timeouts and retry counters
stay the same.

This will trigger a retry next time the poll() return POLLOUT, so we
know that we can send.

Signed-off-by: Seppo Takalo <[email protected]>
(cherry picked from commit 623a1ff)
(cherry picked from commit 179c07f)
…stead of flagging

It is error prone to flag separate booleans, so try to use
reset_internal_request() every time we release the internal request
structure.

Also refactor the reset_internal_request() so that we reset the
timeout value so it does not trigger again.

Signed-off-by: Seppo Takalo <[email protected]>
(cherry picked from commit a14f083)
(cherry picked from commit a3c908a)
Fix handling of received CoAP reset.

Signed-off-by: Seppo Takalo <[email protected]>
(cherry picked from commit 1890dbd)
(cherry picked from commit 95c1c85)
Add helper API to construct CoAP Reset message.

Signed-off-by: Seppo Takalo <[email protected]>
(cherry picked from commit e96e95b)
(cherry picked from commit eb52d8c)
Response tokens are already compared in get_request_with_token().

Signed-off-by: Seppo Takalo <[email protected]>
(cherry picked from commit 1dc2487)
(cherry picked from commit 72d0c39)
When receiving unknown response, respond with CoAP Reset.

Signed-off-by: Seppo Takalo <[email protected]>
(cherry picked from commit 350d20e)
(cherry picked from commit d199300)
If we receive poll() error during a request, it must
be forwarded to application.

If instead receive poll() error later, it should not be
forwarded as it might be result of application closing the
socket.

Signed-off-by: Seppo Takalo <[email protected]>
(cherry picked from commit 9c9dc9f)
(cherry picked from commit 2e57969)
In receiving thread, continuing the loops is based on
has_ongoing_exchanges() so it does not need atomic
coap_client_recv_active variable.

When idling, it wakes from semaphore. But there was potential
deadlock when coap_client_schedule_poll() would not signal the
semaphore, if atomic variable was already showing that it runs.
Removing the atomic variable removes this deadlock.

Signed-off-by: Seppo Takalo <[email protected]>
(cherry picked from commit 1e5a537)
(cherry picked from commit 53b26fb)
@rlubos rlubos merged commit 95a105e into v3.7.99-ncs1-branch Nov 4, 2024
18 of 19 checks passed
@rlubos rlubos deleted the backport-2205-to-v3.7.99-ncs1-branch branch November 6, 2024 08:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants