-
Notifications
You must be signed in to change notification settings - Fork 310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Feature - rework comms utils / add fluxd zmq #1361
Draft
MorningLightMountain713
wants to merge
16
commits into
development
Choose a base branch
from
feature/rework_comms_utils
base: development
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
MorningLightMountain713
force-pushed
the
feature/rework_comms_utils
branch
from
July 4, 2024 12:10
a3c6259
to
fb42807
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Work in progress, code is dirty, tests will be broken, untested on old versions
I am running this on a node now.
What this pull does
fluxCommunicationsUtils
module - renamed tonetworkStateService
fullnode
inline package to be more flexible (and parse / write the config)fluxCommunication
stuffThis pull kind of got away on me a little. It's too big. I will split out the zmq / fluxd stuff so we can enable it in stages, as I need to make sure I get the fluxd config file stuff correct - for obvious reasons.
Background
I'll do some diagrams and stuff to make it easier to show what is going on, both before and after.
Historically,
fluxOS
pollsfluxd
for various data, mainly thedeterministicfluxnodelist
which I am now calling thenetworkState
(I think that is a more appropriate identifier). The other heavy user of the fluxd api is theblockProcessor
which polls every 5 seconds. The blockProcessor is out of scope here - but could benefit greatly from using the pub / sub zmq feature in the future.The deterministicfluxnodelist right now is 8.22Mb. So every call to get this list is a lot of i/o.
fluxOS
has multiple levels of caching. One at the rpc level (daemonFluxNodeRpcs) where there is a 20 second LRU cache, on both the entire networkState, or if a specific pubkey is searched, then that is cached.There was a second level of caching in the fluxCommsUtils - this would cache the same thing, but for 4 minutes instead. However, when pulling the full list, it wasn't indexed, so any pubkey searches had to go out to the api again, and were subsequently cached.
fluxOS would then go out every 2 minutes and update the full list. Due to the nature of the way blocks can come in, an LRU cache isn't really a good fit for what we're trying to achieve here; meaning that if a bunch of blocks come in over a short period, the list can get stale (and invalid) quite quick.
In this update, we now subscribe to the fluxd zmq endpoint for updates, now, whenever a block is generated, we get notified immediately, and fetch the full list. There is no more polling for the list; Except if there is an issue with the zmq socket, we fall back to polling.
So now we get up to date data. We don't use any LRU caching at all. (it's enabled by default in the daemonFluxNodeRpcs but I've added an override feature) As soon as a block is generated, the old data is invalidated. I.e. old nodes may have been removed etc.
We don't need caching as we are building our own pubkey and endpoint indexes. This means you can look up a pubkey or ip:port locally without having to go out to the rpc. During testing, with ~13k nodes, it takes approx 8ms to build the indexes. Normally this would block the event queue, which is bad. However, I've partitioned the index building so it is at most O(1000), instead of O(n). We are building the index in chunks of 1000 nodes, then yield to the event queue, and move on to the next chunk. This way, no matter how big the node count gets, it won't block the event queue. I was initially using worker threads for this, but overkill for our application.
Old way:
New way:
Will add more detail here soon.