
[DO NOT MERGE] Dev -> stable merge #1779

Draft
wants to merge 1,144 commits into stable

Conversation

jagerman
Member

Merge dev -> stable, with conflict/merge resolutions.

This should stay unmerged/draft for now, but is helpful to have around for some other testing I am doing w.r.t. stable nodes running new/dev code.

Doy-lee and others added 30 commits September 4, 2024 14:56
As Jason stated (and correctly surmised):

(I believe the original intention was that this would match up to the contract's ServiceNodeLiquidated event, but we don't use that because liquidation emits both that and a ServiceNodeRemoval, and we need the latter to get the imposed liquidation penalty -- and can tell them apart because a penalty of 0 is a non-liquidation removal).

Co-authored-by: Jason Rhinelander <[email protected]>
This avoids the cruft of a std::move or needing to pull it out into a temp variable.

Co-authored-by: Jason Rhinelander <[email protected]>
When `height` wasn't passed it used the default current height for the
returned RPC value, but *not* for the actual call to retrieve the
current staking requirement, so it was passing a height of 0 and thus
returning the old Oxen testnet requirement of 100 instead of the new
SENT 120 requirement.
This adds support for oxend propagating a staking requirement change
from the staking contract into oxend, via the existing L2 event
confirmation mechanism.

Staking requirement is now retrieved via the service_node_list state,
rather than a free `get_staking_requirement` function as it now depends
on state changes and confirmations.  The old free function is renamed to
`get_default_staking_requirement` to make it clear that it isn't
(necessarily) the correct staking requirement value anymore.

This also drops the `height` input field from the
`get_staking_requirement` RPC endpoint; it was only really there for
when the staking requirement was height dependent, which it hasn't been
for a long time, and won't work with the change here for historical
values because we don't have a state_t for every possible height.

Co-authored-by: Doyle <[email protected]>
Fixes bugs from PR oxen-io#1714 - stake changes event txes weren't getting
properly extracted or processed.  This fixes them, and simplifies the
eth L2 state change processing code a bit.
Doy-lee and others added 30 commits December 20, 2024 12:10
… early

Delayed payments with a delay of 1 block were being applied to the SQL
DB before the DB's height was updated. The delayed payment was committed
at the current height of the SQL DB, which was behind by one block
because the DB had not yet processed the new block.

The block would then be added to the DB, updating the SQL DB's height,
causing the triggers to execute and remove the delayed payments
scheduled for that height (i.e. the one we just added) before they could
be paid out to the user in question.
- Remove 'get_version' in data_for_serialisation. In the past we might
have had a dynamic version that was assigned and might have changed
during a hardfork. This is no longer the case: we always return a fixed
value, which means we don't use this feature. Inline and remove.

- Remove the long/short term data cache in the transient struct on the
SNL. In the past this was probably modified outside of the serialisation
function 'service_node_list::store()', but now it is only modified in
that single function, and every time we enter it we clear out the cache
and re-fill it with runtime data.

Potentially it was cached to save a heap allocation on store; in that
case I would switch to thread-local storage OR a static to reuse the
vector storage. I've inlined it onto the stack for now to reduce
complexity in the header.

- Pull data_for_serialization into the implementation file. This
structure is not used outside of the SNL.

- Rename 'state_added_to_archive' to 'long_term_dirty_flag' to more
accurately convey the intent of this flag (e.g. when it's dirty, the next
store will have to write the [long term] data to disk ...)

- Remove the cached data blob, for similar reasons as above. It is
cleared every time we enter the function; I suspect it was done to save
a heap allocation.

The solution I'd advise here is thread-local storage or a static, which
simplifies the complicated SNL header.

- Remove the for-loop that initialises the data for serialisation, which
just clears the structures and reassigns the version. The version is
static now and default-initialised, so there is no need to set it. The
data structures were potentially there as a heap-allocation
optimisation; again, the recommended solution is thread-local storage or
a static.

--

I've chosen to use a static because:

- There will only ever be one SNL instance in the entire program, so
  there aren't multiple SNLs that we have to synchronise against.
- Storing requires taking a mutex, so the lock will never be contended.
- Store will be called from multiple threads (multiple P2P threads,
  AFAICT), so we'd end up with multiple per-thread 'caches' if we used
  thread-local storage.
We are running out of file descriptors currently on stagenet when doing
a BLS signature fanout connection to all nodes on the network.
Decommed nodes (i.e. funded but inactive) are still potentially
available for signing off on BLS requests, but we were only considering
active SNs for making connections for BLS sigs.

This also splits up/genericizes the different modes of the
`is_service_node()` call to remove the optional bool changing the
meaning of the function, and adds a new `is_funded_service_node()` for
what we need here.
This removes the `MAX_CONNECTIONS` constant and attempted handling as it
really didn't do what it was supposed to: it only controlled how many
connections got initiated at once in a single request, but:

- didn't account for semi-persistent connections, which virtually all of
  our connections are (using X pubkey lookup for connection and its
  default 30s idle timeout).
- didn't account for multiple BLS signature requests happening at once,
  which can easily happen.
- since we didn't avoid hitting the FD limit anyway, and since that is
  now fixed by the previous commit, the limit seems pointless and we can
  just blast out all the connections at once for a slight speed-up in
  how long it takes to get everything.  (Essentially the speed now will
  be the slowest response, or a single timeout, rather than a potentially
  longer chain of cascading slow responses and/or timeouts.)

This also allows somewhat simplifying the logic of the
`nodes_request_data`, which now only worries about when to invoke the
final callback, but no longer has various state or code to deal with
initiating new requests after receiving responses that drop below the
limit.
The archive window was being calculated as the interval; additionally,
the mod subtraction did not calculate what we intended. This has now
been fixed so that the window is the number of blocks generated in
2 years and the SQL trigger correctly uses the 2-year window.