[SPEC] Automated testnet deployment #664
Labels
optimization ⚙️
Tasks that are refactor, optimize, or are considered chores.
p3 🔵
Issues should be resolved eventually
task ✔️
Overview
As we know, passing pipelines are never enough to determine whether we are stable enough to merge into master. They are a good indicator that, in well-behaved environments, the code will run for at least a limited number of sessions, but we must use the live testnet to prove that we can run hundreds to thousands of sessions without failure. If we want to improve long-term stability, the practice of running the testnet before merging code into master should be codified into our development process to ensure we don't undo our hard work.
Currently, we have no enforced policies in GitHub Actions that require a successful testnet run before merging into master. Testnet deployment is manual: @1xstj deploys the latest commit from the PR branch onto the testnet by running commands in a remote terminal over SSH.
Some PRs need a testnet run and others do not. Because this varies from PR to PR, we can use environments and automatically selected deployment targets to adjust where a PR is deployed. One target will be the testnet deployment, and the other will be the null deployment. For the testnet deployment, the job passes when, e.g., 500 sessions pass. We can use websockets or polling to determine when to terminate. For the null deployment, the job passes instantly. So long as one of these passes (along with the normal PR checks), we can merge into master with confidence in the stability of the PR.
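The two deployment targets and their pass criteria can be sketched as below. This is only an illustration under assumptions: the path prefixes in `CORE_PREFIXES`, the `Target` enum, and the function names are hypothetical and not part of this repository, and the real selection would live in the GitHub Actions workflow.

```python
from enum import Enum
from typing import Iterable

# Hypothetical path prefixes treated as "core logic"; the real list would be
# decided by the team and is an assumption here.
CORE_PREFIXES = ("runtime/", "node/")

SESSION_TARGET = 500  # example target taken from the spec text


class Target(Enum):
    TESTNET = "testnet"
    NULL = "null"


def choose_target(changed_files: Iterable[str]) -> Target:
    """Pick the testnet target if any changed file sits under a core prefix."""
    if any(path.startswith(CORE_PREFIXES) for path in changed_files):
        return Target.TESTNET
    return Target.NULL


def job_passes(target: Target, sessions_completed: int = 0) -> bool:
    """Null deployments pass instantly; testnet deployments need the target."""
    if target is Target.NULL:
        return True
    return sessions_completed >= SESSION_TARGET
```

For example, `choose_target(["docs/README.md"])` selects the null target, while a change under one of the assumed core prefixes selects the testnet target.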
Task List
The ideal workflow:
Case A: User makes a PR that does not affect core logic
The testnet-deployer runs and detects that this PR does not need a testnet and instantly returns true
Case B: User makes a PR that affects core logic
The testnet-deployer runs, detects that this PR needs a testnet, and submits a testnet deployment request. A manual approval must then be given in the GitHub interface. Next, the testnet-deployer executes the relevant SSH commands to start a testnet. Then, the testnet-deployer uses websockets (or polling) to watch the testnet until it either detects stalling or the 500-session target is reached. Finally, the testnet-deployer returns success or failure depending on that result.
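The polling step in Case B might look roughly like the following. It is a minimal sketch, assuming a hypothetical `get_session` callable that returns the testnet's current session number (for instance via an RPC query); the stall timeout and poll interval are placeholder values, not agreed settings.

```python
import time
from typing import Callable


def watch_testnet(
    get_session: Callable[[], int],
    target: int = 500,
    stall_timeout: float = 600.0,
    poll_interval: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once `target` sessions are reached, False on a stall."""
    last_session = get_session()
    last_progress = clock()
    while last_session < target:
        sleep(poll_interval)
        current = get_session()
        now = clock()
        if current > last_session:
            # The session number advanced, so reset the stall timer.
            last_session, last_progress = current, now
        elif now - last_progress >= stall_timeout:
            # No new session for too long: treat the testnet as stalled.
            return False
    return True
```

The `clock` and `sleep` parameters are injectable so the loop can be exercised without real waiting; the job's exit code would then be derived from the returned boolean.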
Further discussion
We may not want auto-detection when choosing a deployment target. If we prefer manual selection, the testnet-deployer can instead send simultaneous requests to both the testnet deployment target and the null deployment target and wait for approval. If we decide we need a testnet, we deny the null deployment and accept the testnet deployment; if we decide we do not need a testnet, we deny the testnet request and accept the null deployment. In either case, the testnet-deployer succeeds once either request returns with a success. The accepted testnet deployment does not succeed until the 500 sessions are reached, whereas the null deployment succeeds immediately.
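The "succeed once either request succeeds" rule can be sketched as follows. This assumes each request is modeled as a hypothetical blocking callable that returns `True` when its deployment is approved and finishes successfully, and `False` when it is denied or fails; how approvals are actually surfaced by GitHub is not modeled here.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def run_both_targets(
    testnet_request: Callable[[], bool],
    null_request: Callable[[], bool],
) -> bool:
    """Succeed as soon as either request succeeds; fail if both fail."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(testnet_request), pool.submit(null_request)]
        for future in as_completed(futures):
            if future.result():
                # Leaving the `with` block still waits for the other request
                # to return; it is expected to have been denied by then.
                return True
    return False
```

If both requests are denied, the function returns `False`, which corresponds to the testnet-deployer job failing and blocking the merge.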