[Backport v3.7.99-ncs1-branch] net: coap: Pull various fixes to CoAP client #2220

NordicBuilder · 2024-11-04T13:57:55Z

Backport 53b26fb~19..53b26fb from #2205.

Constrained application can require to avoid waiting for response from a server. CoAP No-response option defined in RFC 7967 allows to do that by suppressing selected response types. Signed-off-by: Hubert Miś <[email protected]> (cherry picked from commit 70c9bf1) (cherry picked from commit 67983c5)

This commit makes sure we continue to wait for extra confirmations even after the request is done so we can handle duplicate confirmations if any. Detailed description: rfc7252#section-4.5 specifies that: "The recipient SHOULD acknowledge each duplicate copy of a Confirmable message". So if, for example, the client sends to a multicast destination address, the server will get multiple requests and will confirm all of them. Without this commit, the client will set the request to done after receiving the first answer. From here the request object will be marked as free and the duplicate acknowledgements will stay buffered in the network stack. Once the client tries to send a new request, it will unbuffer those duplicate acknowledgements but now the request object is unallocated so the client won't be able to handle those acknowledgements as duplicates. It will instead treat it as an unexpected ACK. To work around this issue, rfc7252#section-4.8.2 states that: "EXCHANGE_LIFETIME is the time from starting to send a Confirmable message to the time when an acknowledgement is no longer expected, i.e., message-layer information about the message exchange can be purged." Keeping the request object allocated for EXCHANGE_LIFETIME ensures that duplicate acknowledgements can be handled accordingly. This commit adds a basic implementation of what is stated in the RFC. EXCHANGE_LIFETIME has been arbitrarily set to 3 * ACK_TIMEOUT which seems more reasonable than the 247 seconds stated in the RFC. Signed-off-by: Francois Gervais <[email protected]> (cherry picked from commit bfed8d0) (cherry picked from commit b56bec6)

…llocated As confirmable requests will stay allocated for (3 * ACK_TIMEOUT), we need to adjust the timings so all requests are unallocated by the end of the test. Signed-off-by: Francois Gervais <[email protected]> (cherry picked from commit 13cad59) (cherry picked from commit 04d4c68)

Passing MACRO arguments that need expansion require _CONCAT. Signed-off-by: Pieter De Gendt <[email protected]> (cherry picked from commit 476c12f) (cherry picked from commit ce8edeb)

The goal of this commit is to reduce the overall test time by optimizing the time spent sleeping. However while doing so, a few glitches became apparent and those have been fixed as well. 1. The way the ZSOCK_POLLIN event is managed has been modified In most tests, the event would stick for the whole test and this would sometimes results in the client receiving more data than intended which in turn would results in various unexpected warnings and errors. The management of the ZSOCK_POLLIN event has now been mostly moved to the send/receive overrides which better represent how things would happen outside of the test scenario. For example, when sending data, if this send would normally result in receiving some response, the send function sets the ZSOCK_POLLIN event. Then when receiving, we clear the event as the data has been received. There are a few exceptions to this for cases where we need to simulate some specific abnormal behavior. 2. The `messages_needing_response` queue now includes a valid bit The test manages a queue of messages id to be able to respond to the correct id in the receive hooks. It was built in a way that the id 0 would be treated as invalid although it is a valid id. In practice, it would not be an issue in most cases however, it would result in an unexpected behavior if the `test_multiple_requests` test would be run first. To fix this issue in the simplest way, the queue has been changed to uint32_t which gives us 16 extra bits for the management of the queue, 1 of which is used to tell if the entry is valid or not. Signed-off-by: Francois Gervais <[email protected]> (cherry picked from commit 736344b) (cherry picked from commit 6983b3c)

…rable Extend the `coap_transmission_parameters` struct with the field `ack_random_percent`. This was the last remaining CoAP transmission parameter that was not configurable at runtime. Signed-off-by: Adrian Friedli <[email protected]> (cherry picked from commit 98289e5) (cherry picked from commit adb4d33)

Before this patch, any unexpected socket error during poll (caused by LTE disconnects for instance) would lead to a infinite loop because the error state was never cleared, handled or even signaled to the user. Signed-off-by: Benjamin Lindqvist <[email protected]> (cherry picked from commit f8a7035) (cherry picked from commit c5074d7)

Protect global list of clients with mutex. Signed-off-by: Seppo Takalo <[email protected]> (cherry picked from commit 1ea569d) (cherry picked from commit a75540e)

…ailed to send When transmission of first request fails, reset the internal request buffer as there is no ongoing CoAP transaction. Application can deal with the failure. Signed-off-by: Seppo Takalo <[email protected]> (cherry picked from commit 46b7c84) (cherry picked from commit 42a691b)

…etrying CoAP msg Refactor the CoAP retry handling into the handle_poll() function, so that we only try to send retries if the socket reports POLLOUT. Also move the receiving into same loop, so when poll() reports POLLIN we recv() the message and handle it before proceeding to other sockets. Also fix tests to handle POLLOUT flag and add support for testing multiple clients. Signed-off-by: Seppo Takalo <[email protected]> (cherry picked from commit 4c6dd4c) (cherry picked from commit a10fdc4)

…ng loop Forward recv() errors to handle_poll(), so there is only one place to handle error codes. Signed-off-by: Seppo Takalo <[email protected]> (cherry picked from commit 6481b0e) (cherry picked from commit dec4578)

… send() failure If send() fails, we have not technically send the CoAP retry yet, so restore the same pending structure, so our timeouts and retry counters stay the same. This will trigger a retry next time the poll() return POLLOUT, so we know that we can send. Signed-off-by: Seppo Takalo <[email protected]> (cherry picked from commit 623a1ff) (cherry picked from commit 179c07f)

…stead of flagging It is error prone to flag separate booleans, so try to use reset_internal_request() every time we release the internal request structure. Also refactor the reset_internal_request() so that we reset the timeout value so it does not trigger again. Signed-off-by: Seppo Takalo <[email protected]> (cherry picked from commit a14f083) (cherry picked from commit a3c908a)

Fix handling of received CoAP reset. Signed-off-by: Seppo Takalo <[email protected]> (cherry picked from commit 1890dbd) (cherry picked from commit 95c1c85)

Add helper API to construct CoAP Reset message. Signed-off-by: Seppo Takalo <[email protected]> (cherry picked from commit e96e95b) (cherry picked from commit eb52d8c)

Response tokens are already compared in get_request_with_token(). Signed-off-by: Seppo Takalo <[email protected]> (cherry picked from commit 1dc2487) (cherry picked from commit 72d0c39)

When receiving unknown response, respond with CoAP Reset. Signed-off-by: Seppo Takalo <[email protected]> (cherry picked from commit 350d20e) (cherry picked from commit d199300)

If we receive poll() error during a request, it must be forwarded to application. If instead receive poll() error later, it should not be forwarded as it might be result of application closing the socket. Signed-off-by: Seppo Takalo <[email protected]> (cherry picked from commit 9c9dc9f) (cherry picked from commit 2e57969)

In receiving thread, continuing the loops is based on has_ongoing_exchanges() so it does not need atomic coap_client_recv_active variable. When idling, it wakes from semaphore. But there was potential deadlock when coap_client_schedule_poll() would not signal the semaphore, if atomic variable was already showing that it runs. Removing the atomic variable removes this deadlock. Signed-off-by: Seppo Takalo <[email protected]> (cherry picked from commit 1e5a537) (cherry picked from commit 53b26fb)

hubertmis and others added 19 commits November 4, 2024 13:57

[nrf fromtree] include: zephyr: net: coap_service: Use _CONCAT expansion

82eb864

Passing MACRO arguments that need expansion require _CONCAT. Signed-off-by: Pieter De Gendt <[email protected]> (cherry picked from commit 476c12f) (cherry picked from commit ce8edeb)

[nrf fromtree] net: lib: coap_client: Protect initialization with mutex

6d0dc73

Protect global list of clients with mutex. Signed-off-by: Seppo Takalo <[email protected]> (cherry picked from commit 1ea569d) (cherry picked from commit a75540e)

[nrf fromtree] net: lib: coap_client: Fix reset handling

d50bb85

Fix handling of received CoAP reset. Signed-off-by: Seppo Takalo <[email protected]> (cherry picked from commit 1890dbd) (cherry picked from commit 95c1c85)

[nrf fromtree] net: coap: Add API to send reset message

c893f6a

Add helper API to construct CoAP Reset message. Signed-off-by: Seppo Takalo <[email protected]> (cherry picked from commit e96e95b) (cherry picked from commit eb52d8c)

[nrf fromtree] net: lib: coap_client: Remove duplicate token comparison

d4afbcf

Response tokens are already compared in get_request_with_token(). Signed-off-by: Seppo Takalo <[email protected]> (cherry picked from commit 1dc2487) (cherry picked from commit 72d0c39)

[nrf fromtree] net: lib: coap_client: Send RST for unknown queries

8b75121

When receiving unknown response, respond with CoAP Reset. Signed-off-by: Seppo Takalo <[email protected]> (cherry picked from commit 350d20e) (cherry picked from commit d199300)

NordicBuilder added the Backport label Nov 4, 2024

NordicBuilder mentioned this pull request Nov 4, 2024

manifest: sdk-zephyr: [Backport v3.7.99-ncs1-branch] net: coap: Pull various fixes to CoAP client nrfconnect/sdk-nrf#18524

Merged

rlubos approved these changes Nov 4, 2024

View reviewed changes

rlubos merged commit 95a105e into v3.7.99-ncs1-branch Nov 4, 2024
18 of 19 checks passed

rlubos deleted the backport-2205-to-v3.7.99-ncs1-branch branch November 6, 2024 08:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Backport v3.7.99-ncs1-branch] net: coap: Pull various fixes to CoAP client #2220

[Backport v3.7.99-ncs1-branch] net: coap: Pull various fixes to CoAP client #2220

NordicBuilder commented Nov 4, 2024

[Backport v3.7.99-ncs1-branch] net: coap: Pull various fixes to CoAP client #2220

[Backport v3.7.99-ncs1-branch] net: coap: Pull various fixes to CoAP client #2220

Conversation

NordicBuilder commented Nov 4, 2024