
[Feature] Retries to handle MongoDB AutoReconnect exception #90

Merged — 14 commits into master on May 24, 2022

Conversation

viniarck (Member) commented on Apr 15, 2022

Fixes #83

Currently, the pymongo driver will retry reads and writes once; however, it's the application's responsibility to handle AutoReconnect, so this PR handles it accordingly. It also provides a class decorator that applies the retries to all methods, avoiding too much boilerplate. In the future, we can move this decorator to kytos core so it can easily be reused once other NApps start integrating with MongoDB.

This PR is on top of #84
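To illustrate the idea of a class decorator that retries every method on a transient error, here is a minimal, self-contained sketch. The names (`retry_methods`, `FlakyError`, `Controller`) and defaults are all hypothetical; the PR itself builds on tenacity's decorators rather than this hand-rolled loop, and retries on pymongo's `AutoReconnect` instead of the stand-in exception used here:

```python
import functools
import random
import time


def retry_methods(exc_type, max_attempts=3, wait_min=0.1, wait_max=1.0):
    """Class decorator: wrap every public method with a simple retry loop.

    Illustrative only -- the actual PR delegates to tenacity.
    """
    def wrap(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exc_type:
                    if attempt == max_attempts:
                        raise
                    # Random wait between attempts, akin to tenacity's wait_random
                    time.sleep(random.uniform(wait_min, wait_max))
        return wrapper

    def decorate(cls):
        for name, attr in list(vars(cls).items()):
            if callable(attr) and not name.startswith("_"):
                setattr(cls, name, wrap(attr))
        return cls

    return decorate


class FlakyError(Exception):
    """Stand-in for pymongo.errors.AutoReconnect."""


@retry_methods(FlakyError, max_attempts=3, wait_min=0, wait_max=0)
class Controller:
    def __init__(self):
        self.calls = 0

    def deactivate_interface(self, intf_id):
        # Fails twice, then succeeds on the third attempt.
        self.calls += 1
        if self.calls < 3:
            raise FlakyError("transient failure")
        return intf_id
```

Decorating the class rather than each method keeps the controller code free of repeated `@retry(...)` annotations, which is the boilerplate reduction the PR description refers to.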

Description of the change

  • Retries to handle database AutoReconnect exception

Practical example

  • Currently, pymongo's serverSelectionTimeoutMS is 30s (which is also configurable), so operations are subject to it.

Prior to this PR

  • Once AutoReconnect is raised, the operation fails after 30s; in this case, when handling a KytosEvent, the event would end up in the dead letter:
kytos $> 2022-04-15 14:15:45,637 - INFO [kytos.napps.kytos/of_core] (ThreadPoolExecutor-0_7) Modified Interface('s1-eth1', 1, Switch('00:00:00:00:00:00:00:01')) 00:00:00:00:00:00:00:01:1
2022-04-15 14:15:45,644 - INFO [kytos.napps.kytos/of_core] (ThreadPoolExecutor-0_7) Modified Interface('s1-eth1', 1, Switch('00:00:00:00:00:00:00:01')) 00:00:00:00:00:00:00:01:1
kytos $>

kytos $> 2022-04-15 14:16:15,917 - ERROR [kytos.core.helpers] (ThreadPoolExecutor-0_3) listen_to handler: <function Main.on_interface_link_down at 0x7f1a00628ee0>, args: (<Main(topology, stopped 139748639241792)>, KytosEvent('kytos/of_core.switch.interface.link_down', {'interface': Interface('s1-eth1', 1, Switch('00:00:00:00:00:00:00:01'))})), exception: <class 'pymongo.errors.ServerSelectionTimeoutError'>: mongo1:27017: [Errno 111] Connection refused,mongo2:27018: [Errno 111] Connection refused,mongo3:27019: [Errno 111] Connection refused, Timeout: 30.0s, Topology Description: <TopologyDescription id: 625866fc2238e8ec180b7597, topology_type: ReplicaSetNoPrimary, servers: [<ServerDescription ('mongo1', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('mongo1:27017: [Errno 111] Connection refused')>, <ServerDescription ('mongo2', 27018) server_type: Unknown, rtt: None, error=AutoReconnect('mongo2:27018: [Errno 111] Connection refused')>, <ServerDescription ('mongo3', 27019) server_type: Unknown, rtt: None, error=AutoReconnect('mongo3:27019: [Errno 111] Connection refused')>]>
kytos $>

With this PR

  • Notice that the operation was retried 3 times; the third attempt fails again, and the event ends up in the dead letter:
kytos $> 2022-04-15 14:24:06,003 - INFO [kytos.napps.kytos/of_core] (ThreadPoolExecutor-0_2) Modified Interface('s1-eth1', 1, Switch('00:00:00:00:00:00:00:01')) 00:00:00:00:00:00:00:01:1
2022-04-15 14:24:06,007 - INFO [kytos.napps.kytos/of_core] (ThreadPoolExecutor-0_2) Modified Interface('s1-eth1', 1, Switch('00:00:00:00:00:00:00:01')) 00:00:00:00:00:00:00:01:1
kytos $>

kytos $> 2022-04-15 14:24:36,238 - WARNING [kytos.napps.kytos/topology] (ThreadPoolExecutor-0_17) Retry #1 for deactivate_interface, args: (<napps.kytos.topology.controllers.TopoController object at 0x7f11f46b1d00>, '00:00:00:00:00:00:00:01:1'), kwargs: {}, seconds since start: 30.23
2022-04-15 14:25:07,348 - WARNING [kytos.napps.kytos/topology] (ThreadPoolExecutor-0_17) Retry #2 for deactivate_interface, args: (<napps.kytos.topology.controllers.TopoController object at 0x7f11f46b1d00>, '00:00:00:00:00:00:00:01:1'), kwargs: {}, seconds since start: 61.34
2022-04-15 14:25:38,449 - ERROR [kytos.core.helpers] (ThreadPoolExecutor-0_17) listen_to handler: <function Main.on_interface_link_down at 0x7f11f46a4b80>, args: (<Main(topology, stopped 139715151918656)>, KytosEvent('kytos/of_core.switch.interface.link_down', {'interface': Interface('s1-eth1', 1, Switch('00:00:00:00:00:00:00:01'))})), exception: <class 'tenacity.RetryError'>: RetryError[<Future at 0x7f11f43dcac0 state=finished raised ServerSelectionTimeoutError>]
kytos $>
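The timestamps in the log above can be sanity-checked with a back-of-the-envelope calculation: each attempt blocks for up to serverSelectionTimeoutMS (30s by default) before the error surfaces, plus a short wait between attempts. The `wait_between` value below is an assumption for illustration, not a setting read from the NApp:

```python
SERVER_SELECTION_TIMEOUT = 30.0  # seconds, pymongo's default


def worst_case_elapsed(n_attempts, timeout=SERVER_SELECTION_TIMEOUT, wait_between=1.0):
    """Approximate seconds until the n-th attempt gives up, ignoring jitter.

    Each attempt spends up to `timeout` seconds in server selection, and a
    short (assumed) wait separates consecutive attempts.
    """
    return n_attempts * timeout + (n_attempts - 1) * wait_between


# Roughly matches the log: retry #1 near 30s, retry #2 near 61s,
# and the final RetryError around 92s after the first attempt started.
print(worst_case_elapsed(1), worst_case_elapsed(2), worst_case_elapsed(3))
```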
❯ http localhost:8181/api/kytos/core/dead_letter/
HTTP/1.0 200 OK
Access-Control-Allow-Origin: *
Content-Length: 319
Content-Type: application/json
Date: Fri, 15 Apr 2022 17:26:31 GMT
Server: Werkzeug/1.0.1 Python/3.9.12

{
    "kytos/of_core.switch.interface.link_down": {
        "53d95dae-35bf-41a5-b1ea-22b8b095b243": {
            "content": "{'interface': Interface('s1-eth1', 1, Switch('00:00:00:00:00:00:00:01'))}",
            "id": "53d95dae-35bf-41a5-b1ea-22b8b095b243",
            "name": "kytos/of_core.switch.interface.link_down",
            "reinjections": 0,
            "timestamp": "2022-04-15T17:24:06"
        }
    }
}

ajoaoff commented:
This looks very good, @viniarck. Just a minor comment.

controllers/__init__.py — review comment (outdated, resolved)
viniarck added 2 commits on May 10, 2022:
  • MONGO_AUTO_RETRY_STOP_AFTER_ATTEMPT
  • MONGO_AUTO_RETRY_WAIT_RANDOM_MIN
  • MONGO_AUTO_RETRY_WAIT_RANDOM_MAX
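The three settings named above map naturally onto tenacity-style retry parameters (`stop_after_attempt`, `wait_random(min, max)`). A minimal sketch of reading them from the environment, with defaults that are assumptions for illustration rather than the NApp's actual values:

```python
import os


def retry_settings(environ=os.environ):
    """Read the MONGO_AUTO_RETRY_* settings into retry parameters.

    The helper name and the fallback defaults (3 attempts, 0.1-1.0s random
    wait) are hypothetical; only the variable names come from the PR.
    """
    return {
        "stop_after_attempt": int(
            environ.get("MONGO_AUTO_RETRY_STOP_AFTER_ATTEMPT", 3)
        ),
        "wait_random_min": float(
            environ.get("MONGO_AUTO_RETRY_WAIT_RANDOM_MIN", 0.1)
        ),
        "wait_random_max": float(
            environ.get("MONGO_AUTO_RETRY_WAIT_RANDOM_MAX", 1.0)
        ),
    }
```

Exposing these as environment variables lets operators tune how aggressively the NApp retries without touching code, which matters when the MongoDB replica set's failover time varies between deployments.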
viniarck (Member, Author) commented:

@ajoaoff since kytos core PR 216 has landed, I pushed commit 66d07b5 to re-use the decorators from core.

Base automatically changed from feature/mongo to master May 24, 2022 17:19
@viniarck viniarck force-pushed the feature/mongo_retries branch from 584f30b to 099ea44 Compare May 24, 2022 17:39
@viniarck viniarck merged commit bbea492 into master May 24, 2022
@viniarck viniarck deleted the feature/mongo_retries branch May 24, 2022 17:44
Merging this pull request may close the following issue:
Auto retry potential DB transient failures with tenacity