Add client retry support to .map #2571

rohansingh · 2024-11-25T22:23:09Z

Implements similar logic for .map as #2403 did for .remote.

To do this, we create a TimedPriorityQueue, which holds all pending inputs and pending retries, along with the timestamp at which they should be sent to the server.

When inputs are first read from the generator, they are added to the queue with a timestamp of "now". If the server returns a failed output, the input is added back to the queue with a timestamp of "now" plus the configured retry delay.

When two items had the same timestamp, the queue would fall back to sorting by the item value itself, which breaks for types that don't support comparison. Instead, attach a nonce to each item on insertion, so that ties are broken without ever comparing the item values.
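As a rough illustration of the nonce idea (not the code from this PR), a timed priority queue could be sketched on top of `heapq`. The class body, `put`, and `pop_ready` below are hypothetical names chosen for the example; only the `TimedPriorityQueue` name comes from the description above.

```python
import heapq
import itertools


class TimedPriorityQueue:
    """Illustrative min-heap of (ready_at, nonce, item) entries.

    The nonce is a monotonically increasing counter, so two entries never
    tie on (ready_at, nonce) and the item itself is never compared.
    """

    def __init__(self):
        self._heap = []
        self._nonce = itertools.count()

    def put(self, ready_at, item):
        # Hypothetical method: schedule `item` to be sent at `ready_at`.
        heapq.heappush(self._heap, (ready_at, next(self._nonce), item))

    def pop_ready(self, now):
        # Hypothetical method: return the next item whose timestamp has
        # passed, or None if nothing is due yet.
        if self._heap and self._heap[0][0] <= now:
            return heapq.heappop(self._heap)[2]
        return None

    def __len__(self):
        return len(self._heap)


queue = TimedPriorityQueue()
now = 100.0  # fixed "clock" so the example is deterministic
queue.put(now, {"idx": 0})  # dicts do not support "<"
queue.put(now, {"idx": 1})  # same timestamp; the nonce breaks the tie
queue.put(now + 5.0, {"idx": 0, "retry": 1})  # retry delayed by 5 seconds
```

Without the nonce, the second `put` would raise `TypeError`, because `heapq` would compare the two dicts once their timestamps tied.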

Though very unlikely outside of unit tests, it's possible to have an output returned before the corresponding retry context has been put into the `pending_outputs` dict.

Once the input queue filled up, there was no room left for pending retries. Since retries could not be enqueued, we stopped fetching new outputs, and since we stopped fetching new outputs, the server stopped accepting new inputs. As a result, the input queue would never drain. Instead, use a semaphore to ensure we never have more than 1000 items outstanding.
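A minimal sketch of bounding outstanding items with a semaphore, assuming an asyncio-based client. `run_bounded`, `send`, and the demo helpers are hypothetical names for illustration and are not taken from this PR; only the limit of 1000 comes from the description.

```python
import asyncio

MAX_OUTSTANDING = 1000  # limit mentioned in the PR description


async def run_bounded(inputs, send, limit=MAX_OUTSTANDING):
    """Send each input, never allowing more than `limit` outstanding."""
    sem = asyncio.Semaphore(limit)
    results = {}

    async def process(idx, item):
        try:
            results[idx] = await send(item)
        finally:
            # Free a slot only once this item has fully completed.
            sem.release()

    tasks = []
    for idx, item in enumerate(inputs):
        # Stop reading new inputs while `limit` items are outstanding.
        await sem.acquire()
        tasks.append(asyncio.create_task(process(idx, item)))

    await asyncio.gather(*tasks)
    return [results[i] for i in range(len(tasks))]


async def _bounded_demo():
    active = 0
    peak = 0

    async def fake_send(x):
        # Stand-in for a server call; records peak concurrency.
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0)
        active -= 1
        return x * 2

    out = await run_bounded(range(10), fake_send, limit=3)
    return out, peak


bounded_out, bounded_peak = asyncio.run(_bounded_demo())
```

Because the producer waits on the semaphore rather than on a full queue, completed items always free capacity, which avoids the circular wait described above.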

Instead of using a priority queue, just use the event loop to schedule retries in the future. This significantly simplifies the implementation and makes it much more like the original. Note that we still have a semaphore that ensures that no more than 1000 inputs are in flight (i.e., sent to the server but not completed).
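The event-loop approach might look roughly like the sketch below, which is an assumption-laden illustration and not this PR's implementation. `map_with_retries`, `call`, `max_retries`, `retry_delay`, and `max_in_flight` are hypothetical names; the sketch also assumes an input keeps its semaphore slot while it waits for a retry, since it has not completed yet.

```python
import asyncio


async def map_with_retries(inputs, call, *, max_retries=3,
                           retry_delay=0.01, max_in_flight=1000):
    """Run `call` on each input, retrying failures after a delay."""
    sem = asyncio.Semaphore(max_in_flight)

    async def run_one(item):
        try:
            for attempt in range(max_retries + 1):
                try:
                    return await call(item)
                except Exception:
                    if attempt == max_retries:
                        raise
                    # The event loop wakes this task after the delay, so
                    # no separate retry queue is needed.
                    await asyncio.sleep(retry_delay)
        finally:
            # Assumption: the slot is held until the input completes,
            # including any time spent waiting to retry.
            sem.release()

    tasks = []
    for item in inputs:
        await sem.acquire()
        tasks.append(asyncio.create_task(run_one(item)))
    return await asyncio.gather(*tasks)


async def _retry_demo():
    attempts = {}

    async def flaky(x):
        # Stand-in for a server call that fails once per input.
        attempts[x] = attempts.get(x, 0) + 1
        if attempts[x] == 1:
            raise RuntimeError("simulated failure")
        return x + 1

    out = await map_with_retries(range(5), flaky,
                                 retry_delay=0.001, max_in_flight=2)
    return out, attempts


retry_out, retry_attempts = asyncio.run(_retry_demo())
```

Each input is its own task, so a delayed retry is just a sleeping coroutine, and ordering of results follows the order of the inputs.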

rohansingh · 2025-01-08T19:55:52Z

Continued in #2734.

rohansingh changed the title ~~Rohan/retry map~~ Add client retry support to .map Nov 25, 2024

rohansingh force-pushed the rohan/retry-map branch 6 times, most recently from 00957f4 to 55349c4 December 2, 2024 15:48

rohansingh marked this pull request as ready for review December 2, 2024 15:48

rohansingh force-pushed the rohan/retry-map branch 8 times, most recently from 85f41bb to bd1ee57 December 2, 2024 21:08

rohansingh added 2 commits December 2, 2024 21:21

Add client retry support to .map

283190c

Track in-flight inputs by idx instead of input_id

0869507

rohansingh force-pushed the rohan/retry-map branch from bd1ee57 to e7ba872 December 2, 2024 21:22

rohansingh requested review from freider and gongy December 2, 2024 21:27

rohansingh force-pushed the rohan/retry-map branch from e7ba872 to cf8b561 December 2, 2024 21:44

Add unit tests for map retries

bb647de

rohansingh requested a review from rculbertson December 2, 2024 22:43

rohansingh added 4 commits December 3, 2024 17:01

Respect MODAL_CLIENT_RETRIES setting for map

c585288

Fix race condition between inputs/outputs

775ad63

Though very unlikely outside of unit tests, it's possible to have an output returned before the corresponding retry context has been put into the `pending_outputs` dict.

rohansingh force-pushed the rohan/retry-map branch from 6ed9306 to eb82dd4 December 6, 2024 17:18

rohansingh closed this Jan 8, 2025
