Add EIP: Hardware and Bandwidth Recommendations for Validators and Full Nodes #9270
- Statista states that as of January 2024:
  - The global average download for broadband is 92 Mbps and the global average upload is 43 Mpbs.
  - The global average download for mobile is 50 Mbps and 11 Mbps
- GSMA report showing the state of mobile internet connectivity in 2024 shows that:
  - The mobile upload speeds in Higher Income Countries (HIC) is about 18 Mbps
  - The global average mobile download is 48 Mbps.
Page 39 for the upload speeds of HIC and page 5 for the 48 Mbps figure.
Statista link is: https://www.statista.com/statistics/896779/average-mobile-fixed-broadband-download-upload-speeds/
Let me know how you would want these to be referred to in the document
## Backwards Compatibility

This EIP is informational and requires no protocol changes. We recommend that future EIPs include an assessment of their impact on these hardware requirements.
Would the recommendation here require another EIP if this gets finalized?
In any case, I don't think it belongs here since this section is explicitly about whether the EIP breaks backwards compatibility
RAM/memory is dominated by state cache. As of January 2025, it is possible to run a full node with 16GB of RAM, however this has been known to not work with all combinations of EL and CL clients in the past.

On 32GB vs 64GB; 32GB works right now, however we recommend 64GB as [preliminary benchmarks](https://hackmd.io/@han/bench-hash-in-snark) have shown that zk-STARKS can consume a significant amount of memory and the difference in cost relative to the entire hardware setup for a validator is insignificant.
I saw that EIPs 2926, 2938, 3298, 3416, and 3607 also use HackMD links. However, their contents can easily be changed, so I wonder whether there is a rule about using them?
Given EIPs are immutable once they become final, does it make sense to place these requirements in an EIP? I expect requirements will need to be updated in a couple of years. In that case, we will need to create a different EIP amending this one. Then, operators will need to check 2 different EIPs to know what the current requirements are. Seems impractical to me.
Good idea!
In the call, someone mentioned that the status of this could be changed to "live" so that we modify it in place instead of creating a new EIP each time. I would defer to the EIP maintainers regarding that and what's possible there. I agree that creating a new EIP whenever we change specs is undesirable.
I just got a discussion link created for me :) I will comment here and can paste it over there: I have not thought much about this, so I would gather opinions from you and others who have before forming a strong opinion. If I had to choose one right now, I would perhaps tend towards "x% of a validator price can get you" or "y% of a validator's annual reward can get you" -- and we could figure out what x or y is by answering the question "how long should it take a validator to recoup their hardware capex?"
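To make the "how long should it take a validator to recoup their hardware capex" framing concrete, here is a minimal sketch. The function names and every number in the usage note are hypothetical placeholders, not figures proposed in this thread.

```python
def payback_years(hardware_cost_usd, stake_eth, annual_yield, eth_price_usd):
    """Years of staking rewards needed to cover the hardware cost.

    All inputs are caller-supplied assumptions; nothing here is a
    protocol constant or a recommendation from the EIP.
    """
    annual_reward_usd = stake_eth * annual_yield * eth_price_usd
    if annual_reward_usd <= 0:
        raise ValueError("annual reward must be positive")
    return hardware_cost_usd / annual_reward_usd


def hardware_budget_usd(stake_eth, annual_yield, eth_price_usd, target_years):
    """Hardware budget implied by a chosen payback period (the 'y%' framing)."""
    if target_years <= 0:
        raise ValueError("target_years must be positive")
    return stake_eth * annual_yield * eth_price_usd * target_years
```

For example, with a hypothetical 32 ETH stake, 3% yield, and 2500 USD per ETH, `payback_years(1200, 32, 0.03, 2500)` comes out to roughly half a year. Picking the target payback period first and deriving the budget from it is one way to pin down x or y.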
Co-authored-by: Tim Beiko <[email protected]>
Co-authored-by: Mercy Boma Naps Nkari <[email protected]>
- Samsung 990 Pro
- Seagate Firecuda 530
- Teamgroup MP44
- WD Black SN850X
Do you have to list specific commercial models in an EIP? Can you focus on I/O metrics that are brand-agnostic?
Indeed, I/O metrics would be more general and would open up broader choices for people.
- The CPU rating is a measure of how powerful a particular CPU is for single and multithreaded respectively. All numbers referenced are normalized according to PassMark.

### Recommended Prebuilds (Attester and Local block builder)
Even if you suggest different options, could you stick to a single one for benchmarks in the future? Two at most, if we include the RPi, which is otherwise not represented at all.
This would then be not just the hardware recommendation but also the common denominator for all benchmarks given from now on.
That way, it is significantly easier to compare cryptographic protocols, rather than giving benchmarks on an M4 Max and then having no clue whether the result is suitable for attesters, block builders, etc.
If not fixed to a machine, at least fixing the number of cores and threads would definitely help.
It is neither a good idea nor practical to run benchmarks on a single piece of hardware. You can instead tune your CPU to the provided rating.
I mostly say so because, depending on the CPU, it may or may not have AVX-{} support. The same goes for custom instruction sets used in asm. An RPi won't support Intel's fast modular field custom instruction set, for example.
These are all issues that adjusting to a rating won't address (IIUC). A fixed machine could also help to dismiss or incentivize the implementation of such specific backends to squeeze out as much perf as possible.
Finally, and maybe not as big an issue (I don't know much about that part), different pipelining, cache lines, and similar properties can also influence some of the stuff we bench, specifically when we sometimes work within the realm of µs/ns.
This was the main reason behind the suggestion. But if there are well-known alternatives that solve that, or if this can also be "adjusted", please let me know so I can resolve the comment! cc: @chfast
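One way to make the instruction-set concern visible in benchmark reports is to record which CPU features the benchmark machine exposes. The following is a minimal sketch that parses text in the format of Linux's `/proc/cpuinfo` (a `flags` line on x86, a `Features` line on ARM). The helper names and the list of required flags are illustrative assumptions, not part of the PR.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' or 'Features' line.

    Expects text in the format of Linux's /proc/cpuinfo; returns an empty
    set if no such line is present.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("flags", "features"):
            return set(value.split())
    return set()


def missing_features(flags, required):
    """Sorted list of required feature names that the CPU does not report."""
    return sorted(set(required) - set(flags))


# Hypothetical excerpts; real files contain many more lines and flags.
X86_SAMPLE = "processor\t: 0\nflags\t\t: fpu sse2 avx avx2 adx\n"
ARM_SAMPLE = "processor\t: 0\nFeatures\t: fp asimd aes\n"
```

On a Linux machine, the text could come from `open("/proc/cpuinfo").read()`. Publishing the output of `missing_features` next to a benchmark result would show whether a CPU-rating adjustment is hiding an instruction-set difference.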
Storage: That this requires a certain amount of speed may be good to point out. Home stakers have had great success with NVMe that uses TLC NVRAM and has DRAM cache. DRAMless or QLC drives may or may not work, depending on model and client mix.
SATA SSDs likewise may or may not work; it depends on the model and exact client mix.
Given the strong recommendations for RAM and CPU, I suggest also recommending NVMe with DRAM cache and TLC flash for storage.
- The CPU rating is a measure of how powerful a particular CPU is for single and multithreaded respectively. All numbers referenced are normalized according to PassMark.

### Recommended Prebuilds (Attester and Local block builder)
Is the EIP meant to explicitly avoid providing a basis for these commercial product/hardware recommendations? Still not sure I understand the point/value here without providing transparency into the reasoning and thought process driving the decisions.
It was in the original hardware requirements document, but it was not copied over due to the extensive use of links. I haven't seen any feedback re these builds being unreasonable.
The NUC was specifically added because a lot of the existing staker community uses NUCs. We also provided an alternative model that Pari from panda ops has experience with and suggested, because Yorick from Eth stakers noted that some places may not sell NUCs. If a user does not want to use these models, they can look at the CPU rating and choose accordingly, or make a custom build.
Ok cool, got it @kevaundray and mostly makes sense. As discussed before, my concern really only surrounds the empirical nature and openness of the suggestions and reasoning/data supporting them - especially considering coming from an "official" source like an EIP.
MOVE
Feedback from EIP maintainers:
@SamWilsn just want to confirm, I did not miss anything?
Some questions! 😄 👍
- Single Thread: 3903
- Multithread: 30367

#### ASUS NUC 14 Pro
An idea: maybe move those recommended prebuilds to an external site (as part of removing the brand names). HackMD will do fine. If there is some kind of external site that actually documents and reviews setups, that would also be great (but I'm not aware of any).
This makes sense -- I think people will want some document that has more precise suggestions. It could be a hackmd that says something like "concrete suggestions for EIP XXXX"
## Specification

### Recommended hardware and bandwidth specifications
Maybe include a date here, so if users stumble upon this EIP they will know that this is recent, and not recommended specifications of 3 years ago.
This EIP is informational and requires no protocol changes. We recommend that future EIPs include an assessment of their impact on these hardware recommendations.

## Security Considerations
I think I'd like to see a section about "Future". What could we expect in the future? Would we need more storage? If I build a node using the recommended settings, what is the target duration that this setup is usable and needs no hardware upgrades?
Off the top of my head, this is what I can think of. Kind of hard to be comprehensive and I'm sure there are nuances that are missed or mistakes:
Full node
- Download and upload speed go down once we have PeerDAS
- Storage will go down once we have EIP-4444
- Storage and RAM will go down once we have stateless (unclear how stateless clients will do state cache for re-org protection)
- CPU could go down further once we snarkify the EL block
- CPU and storage might go up depending on how far we increase the gas limit and whether we have EIP-4444 and stateless
Attesters
- They inherit all of the changes from full nodes (though attesters sample more, so higher bandwidth)
- Download speed goes up due to increase in blobs
Block builder
- CPU usage goes up once we have stateless due to the need to create the stateless proofs
- Download and upload speed initially go down due to PeerDAS
- Upload speed and CPU requirements go down due to ePBS, since there is more time to create the execution payload
- Download and upload speed goes up due to increase in blobs
- CPU requirements increase due to the addition of heavier proofs like zkvm (probably will need a GPU)
(One note here is that until we have zkvm proofs, you can still block build but at a lower throughput by setting the maximum number of blobs you can do in your EL based on your bandwidth requirements)
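As a rough illustration of capping blobs by bandwidth, the sketch below estimates how many blobs fit in an upload budget. The blob size follows EIP-4844 (4096 field elements of 32 bytes each). The time budget, gossip fanout, and protocol maximum are parameters the caller must supply and are assumptions here, and the model ignores the block body and networking overhead.

```python
BLOB_BYTES = 4096 * 32  # 131072 bytes per blob, per EIP-4844


def max_blobs_for_upload(upload_mbps, seconds_budget, fanout, protocol_max):
    """Largest blob count whose upload fits within the time budget.

    upload_mbps:    sustained upload speed in megabits per second
    seconds_budget: time the builder allows for pushing blobs (assumption)
    fanout:         number of peers each blob is sent to (assumption)
    protocol_max:   per-block blob limit of the active fork (assumption)
    """
    if upload_mbps <= 0 or seconds_budget <= 0 or fanout <= 0:
        raise ValueError("upload_mbps, seconds_budget and fanout must be positive")
    bytes_budget = upload_mbps * 1_000_000 / 8 * seconds_budget
    bytes_per_blob = BLOB_BYTES * fanout
    return min(protocol_max, int(bytes_budget // bytes_per_blob))
```

With a hypothetical one-second budget, a fanout of 8, and a limit of 6 blobs, a 50 Mbps uplink yields 5 blobs and a 25 Mbps uplink yields 2. The exact numbers depend entirely on the assumed budget and fanout.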
I think the main entity that will change significantly is the block builder.
Other changes to consider:
- State network: This refers to the portal state network. In theory, it would allow us to increase the gas limit, which would increase builder requirements and the download speed of attesters and full nodes.
  The argument here is that if we increase the gas limit, we increase the rate of state growth, and eventually no one but centralized parties will be able to hold the state, and they could deny access or censor the rest of the network. But with the state network, we would always be able to retrieve the state (at a slower throughput).
- Delayed state root: Depending on your client and how long it was taking to compute the state root, the builder CPU requirements can go down.
- FOCIL: This effectively puts a lower bound on the block size and, I think, mostly affects block builders. In practice, I think the FOCIL limit will be negligible and won't affect the specs.
- Rainbow staking derivatives: This will add more roles, separating the centralizing components like block building from the other components at the very least.
  It has been argued that this might give folks reason to pump up the requirements for the centralizing components, like block building, even further. I am unsure about how this will play out.
Co-authored-by: Jochem Brouwer <[email protected]>
### Recommended Prebuild (Full node)

#### Raspberry Pi 5

- 4 core
- 4 threads
- CPU Rating
  - Single thread: 1487
  - Multithred: 3428
I'm a little concerned that a RP5 would be unable to function as a full node. Can anyone confirm this works? For one, storage speeds are quite important and it only operates at PCIe 2.0 speeds (like ~500MB/s). This might be do-able, but it's not something I would recommend today. Instead, I would suggest a Rock 5 Model B which is all-around better.
Apparently a RP5 works. Very cool. Consider my concern withdrawn.
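For reference, the ~500 MB/s figure mentioned above follows from the PCIe 2.0 line rate of 5 GT/s per lane with 8b/10b encoding, assuming a single lane. A small sketch of the arithmetic (the function name is illustrative, and the result is the theoretical ceiling before protocol overhead):

```python
def pcie_lane_mb_per_s(gigatransfers_per_s, payload_bits, encoded_bits):
    """Theoretical per-lane throughput in MB/s, before protocol overhead.

    payload_bits / encoded_bits is the line-code efficiency,
    e.g. 8/10 for PCIe 2.0 and 128/130 for PCIe 3.0.
    """
    bits_per_s = gigatransfers_per_s * 1e9 * payload_bits / encoded_bits
    return bits_per_s / 8 / 1e6


pcie2_x1 = pcie_lane_mb_per_s(5.0, 8, 10)     # PCIe 2.0, one lane
pcie3_x1 = pcie_lane_mb_per_s(8.0, 128, 130)  # PCIe 3.0, one lane
```

Real-world NVMe throughput over such a link will be somewhat lower than this ceiling, which is the context for the storage-speed concern raised in the comment.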
- 4 threads
- CPU Rating
  - Single thread: 1487
  - Multithred: 3428
Suggested change:
Before: `- Multithred: 3428`
After: `- Multithread: 3428`
- Teamgroup MP44
- WD Black SN850X

In particular, we recommend purchasing NVMe M.2 instead of SATA as NVMe has higher throughput.
Additionally, we should mention that we recommend NVMe drives with DRAM and TLC flash. QLC flash is slower and has a lower write endurance, meaning they may fail sooner.
For validators that want to build blocks locally, we recommend global bandwidth figures inline with the global average for fixed broadband: 100 Mbps download and 50 Mbps upload.

*If a block builder does not have the recommended bandwidth, we recommend that they build a block that is partially full and or one that includes less blobs.*
Suggested change:
Before: *If a block builder does not have the recommended bandwidth, we recommend that they build a block that is partially full and or one that includes less blobs.*
After: *If a block builder does not have the recommended bandwidth, we recommend that they build a block that is partially full and or one that includes fewer blobs.*
### RAM

RAM/memory is dominated by state cache. As of January 2025, it is possible to run a full node with 16GB of RAM, however this has been known to not work with all combinations of EL and CL clients in the past.
Just a nit. I would prefer if the section header were "Memory" and replace "RAM" with "memory" in the paragraph. At the beginning, we could do "Memory (RAM) is dominated by..."
| Node type | Storage | RAM | CPU Cores | CPU Single Thread/Multithread rating | Download/Upload speed |
| -------- | -------- | -------- | -------- | -------- | -------- |
| Full node | 4TB | 32GB | 4 cores/8 threads | 1000 / 3000 | 50 Mbps / 15 Mbps |
| Attester | 4TB | 64GB | 8 cores/16 threads | 3500 / 25000 | 50 Mbps / 25 Mbps |
| Local block builder | 4TB | 64GB | 8 cores/16 threads | 3500 / 25000 | 100 Mbps / 50 Mbps |
Can we format this table so everything is aligned? So it's easier to read when viewing markdown source.
Also, for consistency with the other fields, we should add spacing around "/" in the CPU cores column. And "cores" should be lowercase in the column title.
Yep, for future reference, I used: https://codebeautify.org/markdown-formatter#
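For anyone who wants to check a machine against the table quoted in this thread, here is a minimal sketch that encodes the three rows as data. The numbers come from the table as it appears in this revision of the PR; the `Spec` class, the `shortfalls` helper, and the example machine's storage, RAM, and bandwidth are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Spec:
    storage_tb: int
    ram_gb: int
    cores: int
    threads: int
    single_thread: int  # PassMark single-thread rating
    multi_thread: int   # PassMark multithread rating
    download_mbps: int
    upload_mbps: int


# Rows copied from the recommendation table in this revision of the PR.
RECOMMENDED = {
    "full node": Spec(4, 32, 4, 8, 1000, 3000, 50, 15),
    "attester": Spec(4, 64, 8, 16, 3500, 25000, 50, 25),
    "local block builder": Spec(4, 64, 8, 16, 3500, 25000, 100, 50),
}


def shortfalls(machine, role):
    """Names of the fields where the machine is below the recommendation."""
    target = RECOMMENDED[role]
    return [
        f.name
        for f in fields(Spec)
        if getattr(machine, f.name) < getattr(target, f.name)
    ]
```

As a usage example, `shortfalls(Spec(4, 16, 4, 4, 1487, 3428, 50, 15), "full node")` uses the Raspberry Pi 5 core, thread, and rating figures from the PR with an assumed 4TB drive, 16GB of RAM, and a 50/15 Mbps link. It reports `ram_gb` and `threads` as below the table values, even though the CPU ratings clear the bar.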
- Single Thread: 3520
- Multithread: 25158

### Recommended Prebuild (Full node)
To match the order of the recommended hardware table, the full node section should be first.
- Statista states that as of January 2024:
  - The global average download for broadband is 92 Mbps and the global average upload is 43 Mbps.
  - The global average download for mobile is 50 Mbps and 11 Mbps
Missing punctuation.
Suggested change:
Before: `- The global average download for mobile is 50 Mbps and 11 Mbps`
After: `- The global average download for mobile is 50 Mbps and 11 Mbps.`
  - The global average download for broadband is 92 Mbps and the global average upload is 43 Mbps.
  - The global average download for mobile is 50 Mbps and 11 Mbps
- GSMA report showing the state of mobile internet connectivity in 2024 shows that:
  - The mobile upload speeds in Higher Income Countries (HIC) is about 18 Mbps
Missing punctuation.
Suggested change:
Before: `- The mobile upload speeds in Higher Income Countries (HIC) is about 18 Mbps`
After: `- The mobile upload speeds in Higher Income Countries (HIC) is about 18 Mbps.`
#### Attesters

For attesters, we recommend: 50 Mbps download and 25 Mbps upload.
The colon isn't really necessary here.
Suggested change:
Before: For attesters, we recommend: 50 Mbps download and 25 Mbps upload.
After: For attesters, we recommend 50 Mbps download and 25 Mbps upload.
The commit 93c4c0b (as a parent of 51efb62) contains errors.
Original documents are here (for validators):