
feat: Provider insolvency #278

Open
wants to merge 59 commits into main

Conversation

snowmead (Contributor) commented Nov 29, 2024

Mark providers as insolvent, slash them, and disallow operations for insolvent providers

Process providers in the on_idle hook to mark them as insolvent and slash any remaining deposit they may have.

Many extrinsics in pallet_file_system, pallet_storage_providers and pallet_payment_streams now fail if the provider is insolvent.
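
As a rough, self-contained model of that flow (the types and function names here are illustrative sketches, not the pallet's actual ones):

struct Provider {
    deposit: u128,
    insolvent: bool,
}

// Simplified stand-in for the `on_idle` processing described above: each queued
// provider is marked insolvent and whatever deposit it still holds is slashed.
fn process_insolvent_providers(queued: &mut [Provider], slashed_pot: &mut u128) {
    for provider in queued.iter_mut() {
        provider.insolvent = true;
        *slashed_pot += provider.deposit;
        provider.deposit = 0;
    }
}

fn main() {
    let mut queued = vec![Provider { deposit: 500, insolvent: false }];
    let mut slashed_pot: u128 = 0;
    process_insolvent_providers(&mut queued, &mut slashed_pot);
    assert!(queued[0].insolvent);
    assert_eq!(queued[0].deposit, 0);
    assert_eq!(slashed_pot, 500);
}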


Charging users only for the period when the provider is solvent

Charging users via pallet_payment_streams now takes into account the block at which the provider was marked insolvent. For both dynamic-rate and fixed-rate payment streams, the user is only charged for the time during which the provider was solvent.
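
As a small worked example of the pro-rating (the function and its parameters are hypothetical; the real logic lives in pallet_payment_streams):

// Hypothetical illustration: the chargeable window of a fixed-rate stream is capped
// at the block where the provider was marked insolvent.
fn chargeable_amount(
    rate_per_block: u128,
    last_charged_block: u64,
    current_block: u64,
    insolvency_block: Option<u64>,
) -> u128 {
    // Charge only up to the insolvency block, if there is one.
    let end_block = insolvency_block.map_or(current_block, |b| b.min(current_block));
    let blocks = end_block.saturating_sub(last_charged_block);
    rate_per_block * blocks as u128
}

fn main() {
    // The provider went insolvent at block 80: the user pays for 80 - 50 = 30 blocks,
    // not for the full 100 - 50 = 50 blocks.
    assert_eq!(chargeable_amount(2, 50, 100, Some(80)), 60);
    assert_eq!(chargeable_amount(2, 50, 100, None), 100);
}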


Deleting insolvent provider is done manually

A new extrinsic delete_provider allows anyone to delete a provider if certain conditions are met:

  • The provider must be insolvent
  • The provider must not have any active payment streams

A new runtime API, can_delete_provider, exposes a central piece of logic that checks these conditions and gives offchain processes the opportunity to automate provider deletion.
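
A minimal sketch of those two conditions (the struct here is hypothetical; the real check reads pallet storage):

struct ProviderState {
    insolvent: bool,
    active_payment_streams: u32,
}

/// A provider can be deleted only if it is insolvent and has no active payment streams.
fn can_delete_provider(state: &ProviderState) -> bool {
    state.insolvent && state.active_payment_streams == 0
}

fn main() {
    assert!(can_delete_provider(&ProviderState { insolvent: true, active_payment_streams: 0 }));
    assert!(!can_delete_provider(&ProviderState { insolvent: true, active_payment_streams: 3 }));
    assert!(!can_delete_provider(&ProviderState { insolvent: false, active_payment_streams: 0 }));
}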


Storage request with custom replication target

The file system pallet's issue_storage_request extrinsic now sets replication_target to a new Config parameter called DefaultReplicationTarget. This decouples the maximum replication target we accept for a storage request from the default we use when the caller does not supply one.
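
A rough sketch of the intent, with hypothetical names standing in for the two Config values:

// Illustrative only: a missing replication target falls back to the default, while
// still being validated against a separate maximum.
fn resolve_replication_target(
    requested: Option<u32>,
    default_replication_target: u32,
    max_replication_target: u32,
) -> Result<u32, &'static str> {
    let target = requested.unwrap_or(default_replication_target);
    if target == 0 || target > max_replication_target {
        return Err("replication target out of bounds");
    }
    Ok(target)
}

fn main() {
    // Caller omitted the target: the default applies.
    assert_eq!(resolve_replication_target(None, 3, 10), Ok(3));
    // Caller asked for more than the maximum: rejected.
    assert!(resolve_replication_target(Some(12), 3, 10).is_err());
}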


Client and runtime side submit_proof bug fixes

There was a bug in the submit_proof extrinsic where the checkpoint challenge period was set to the greatest challenge period. As a result, the checkpoint challenge a BSP would submit a proof for could be changed midway by the on_poll hook, causing the BSP to miss the checkpoint challenge. This also inadvertently introduced a potential attack vector: a BSP could intentionally wait a certain number of blocks, while still remaining within its challenge tick tolerance period, for the checkpoint challenges to become outdated, i.e. for the checkpoint challenge period to elapse so that a response is no longer awaited. This would allow BSPs to skip unwanted checkpoint challenges.

To solve this, the checkpoint challenge period is now set to greatest_challenge_period + challenge_tick_tolerance + 1, which leaves no possibility for a checkpoint challenge to be skipped.
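
Worked with concrete (made-up) numbers:

// The checkpoint challenge period now extends past every provider's challenge period
// plus the tick tolerance, so a BSP can never outwait a pending checkpoint challenge.
fn checkpoint_challenge_period(greatest_challenge_period: u64, challenge_tick_tolerance: u64) -> u64 {
    greatest_challenge_period + challenge_tick_tolerance + 1
}

fn main() {
    // With a greatest challenge period of 100 ticks and a tolerance of 10 ticks,
    // checkpoint challenges recur every 111 ticks, strictly more than 100 + 10.
    assert_eq!(checkpoint_challenge_period(100, 10), 111);
}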

The runtime- and client-side conditions for including checkpoint challenges have been cross-checked and fixed to be identical. There was a discrepancy where the client-side check would include checkpoint challenges when the runtime was not expecting any.


BSP volunteering task: Error handling and optimizations

The BSP task that handles volunteering now skips the file if the storage request is no longer open to volunteering, and if the volunteer extrinsic fails, the file is deleted from file storage and unregistered. This fixes a bug where, if another storage request were issued after the first volunteer attempt failed, the second volunteer would fail with FileAlreadyExists because the file was already in file storage.


TODOs

Many of the TODOs below were added because of the storage-delete integration test, which surfaced bugs and unimplemented logic around non-happy-path extrinsic executions. For example: a storage request is revoked, a BSP then requests to stop storing the file, the runtime automatically applies the mutation in a checkpoint challenge when the BSP submits a proof, and the bspConfirmStopStoring extrinsic subsequently fails due to a mismatch between the local and runtime roots.

Modify wait_for_tick so we can pass an arbitrary closure like the one below, which it would execute to determine whether it should keep waiting for the tick to be reached.

This would allow closing the task earlier when a storage request is fulfilled or revoked midway through the waiting process.

// TODO: Have this dynamically called at every tick in `wait_for_tick` to exit early without waiting until `earliest_volunteer_tick`
// TODO: in the event the storage request is closed mid-way through the process.
let can_volunteer = self
    .storage_hub_handler
    .blockchain
    .is_storage_request_open_to_volunteers(file_key.into())
    .await
    .map_err(|e| anyhow!("Failed to query file can volunteer: {:?}", e))?;
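
A minimal, synchronous sketch of what that wait_for_tick signature could look like (the real blockchain service API is async and differs in detail; names here are hypothetical):

enum WaitOutcome {
    TickReached,
    AbortedEarly,
}

// Alongside the target tick, take a predicate that is evaluated at every tick and can
// abort the wait early (e.g. when the storage request is fulfilled or revoked).
fn wait_for_tick<F>(
    target_tick: u64,
    ticks: impl Iterator<Item = u64>,
    mut should_continue: F,
) -> WaitOutcome
where
    F: FnMut(u64) -> bool,
{
    for tick in ticks {
        if !should_continue(tick) {
            // The caller's predicate says waiting is pointless (e.g. request revoked).
            return WaitOutcome::AbortedEarly;
        }
        if tick >= target_tick {
            return WaitOutcome::TickReached;
        }
    }
    WaitOutcome::AbortedEarly
}

fn main() {
    // The predicate reports the storage request closed at the third tick, so we abort
    // long before the target tick of 100 is reached.
    let still_open = [true, true, false, false, false];
    let outcome = wait_for_tick(100, 0..5, |tick| still_open[tick as usize]);
    assert!(matches!(outcome, WaitOutcome::AbortedEarly));
}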

Checking whether the storage request is still open to volunteering does not work when the BSP is catching up to the latest block. For example:

  1. BSP node is down at block 1
  2. chain advances to block 3
  3. storage request is opened
  4. chain advances to block 5
  5. storage request is revoked
  6. BSP node is up
  7. BSP node catches up to block 3
  8. BSP node volunteers for storage request and fails the extrinsic

This is because the BSP is unaware of the storage request being revoked at block 5, since it hasn't processed that block yet. When we call the is_storage_request_open_to_volunteers blockchain service command, we use the best block hash, which at the time of volunteering is block 3 in this case, whereas we should query the state of the latest block in the chain.

// Skip volunteering if the storage request is no longer open to volunteers.
// TODO: Handle the case where we're catching up to the latest block. We probably either want to skip volunteering or wait until
// TODO: we catch up to the latest block and, if the storage request is still open to volunteers, volunteer then.
if !can_volunteer {
    let err_msg = "Storage request is no longer open to volunteers. Skipping volunteering.";
    warn!(
        target: LOG_TARGET, "{}", err_msg
    );
    return Err(anyhow::anyhow!(err_msg));
}

For some reason we are revoking a storage request before the first BSP tries to confirm stop storing, and this is why the test fails at the end.

// Revoke the storage request otherwise the new storage request event is not being triggered
await userApi.sealBlock(userApi.tx.fileSystem.revokeStorageRequest(fileKey), shUser);

Also, right now the BSP that requested to stop storing a file automatically re-volunteers for it through the tasks. I propose we add the RPC methods listed below, which would allow BSPs to skip storage requests based on file key/bucket/user blacklists (see the sketch after the snippet).

// TODO: create an RPC to automatically execute everything below
// TODO: everything below should be removed and replaced with other testing logic
const inclusionForestProof = await bspApi.rpc.storagehubclient.generateForestProof(null, [
  fileKey
]);
await userApi.sealBlock(
  bspApi.tx.fileSystem.bspRequestStopStoring(
    fileKey,
    bucketId,
    location,
    userApi.shConsts.NODE_INFOS.user.AddressId,
    fingerprint,
    fileSize,
    false,
    inclusionForestProof.toString()
  ),
  bspKey
);
await userApi.assert.eventPresent("fileSystem", "BspRequestedToStopStoring");
// When a BSP requests to stop storing a file, we should also receive a NewStorageRequest
// event to replace the BSP that is leaving.
// TODO: add blacklisting file keys to avoid volunteering for files that the BSP requested to stop storing
// TODO: add rpc to remove file key from blacklisted file keys
// TODO: add rpc to add user to blacklisted users to skip any storage request from them
// TODO: add rpc to add bucket to blacklisted buckets to skip any storage request from them
await userApi.assert.eventPresent("fileSystem", "NewStorageRequest");
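
A rough sketch of what the blacklists backing those RPCs could look like (none of these names exist in the codebase yet; this is only an illustration of the proposal):

use std::collections::HashSet;

// The proposed RPC methods would mutate sets like these, and the volunteering task
// would consult `should_skip` before volunteering for a storage request.
#[derive(Default)]
struct VolunteerBlacklist {
    file_keys: HashSet<String>,
    users: HashSet<String>,
    buckets: HashSet<String>,
}

impl VolunteerBlacklist {
    fn should_skip(&self, file_key: &str, user: &str, bucket: &str) -> bool {
        self.file_keys.contains(file_key) || self.users.contains(user) || self.buckets.contains(bucket)
    }
}

fn main() {
    let mut blacklist = VolunteerBlacklist::default();
    // e.g. after bspRequestStopStoring, the task would blacklist the file key.
    blacklist.file_keys.insert("0xabc".to_string());
    assert!(blacklist.should_skip("0xabc", "alice", "bucket-1"));
    assert!(!blacklist.should_skip("0xdef", "alice", "bucket-1"));
}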

This is broken for the reasons mentioned above, so the section below is commented out until we resolve it.

// TODO: commented out since this extrinsic will inevitably fail. The reason is that the revoke storage request executed above
// TODO: created a checkpoint challenge remove mutation, which the BSP responded to, removing the file key from the forest before executing this extrinsic.
// TODO: we should remove this entirely and implement a task or rpc to handle automatic stop storing
// await userApi.sealBlock(
//   userApi.tx.fileSystem.bspConfirmStopStoring(
//     fileKey,
//     inclusionForestProof
//   ),
//   bspKey
// );
// await userApi.assert.eventPresent(
//   "fileSystem",
//   "BspConfirmStoppedStoring"
// );

snowmead and others added 30 commits November 12, 2024 13:32
@snowmead snowmead marked this pull request as ready for review December 2, 2024 19:30


Copilot reviewed 44 out of 59 changed files in this pull request and generated no suggestions.

Files not reviewed (15)
  • api-augment/src/interfaces/lookup.ts: File too large
  • api-augment/src/interfaces/types-lookup.ts: File too large
  • pallets/providers/runtime-api/src/lib.rs: Evaluated as low risk
  • api-augment/src/interfaces/registry.ts: Evaluated as low risk
  • pallets/payment-streams/src/lib.rs: Evaluated as low risk
  • api-augment/src/interfaces/augment-api-consts.ts: Evaluated as low risk
  • pallets/proofs-dealer/src/mock.rs: Evaluated as low risk
  • pallets/payment-streams/src/mock.rs: Evaluated as low risk
  • pallets/bucket-nfts/src/mock.rs: Evaluated as low risk
  • pallets/file-system/src/mock.rs: Evaluated as low risk
  • client/indexer-service/src/handler.rs: Evaluated as low risk
  • api-augment/src/interfaces/augment-api-tx.ts: Evaluated as low risk
  • pallets/providers/Cargo.toml: Evaluated as low risk
  • api-augment/src/interfaces/augment-api-errors.ts: Evaluated as low risk
  • api-augment/src/interfaces/augment-api-events.ts: Evaluated as low risk
Comments skipped due to low confidence (2)

pallets/payment-streams/src/utils.rs:75

  • [nitpick] The error message Error::<T>::ProviderInsolvent should be reviewed to ensure it is clear and helpful.
if <T::ProvidersPallet as ReadProvidersInterface>::is_provider_insolvent(*provider_id) {

pallets/file-system/src/lib.rs:1356

  • The assertion message should use T::DefaultReplicationTarget::zero() instead of T::ReplicationTargetType::zero() for consistency.
assert!(default_replication_target > T::ReplicationTargetType::zero(),
snowmead (Contributor, Author) commented Dec 3, 2024

Should we add a check to the ReadChallengeableProvidersInterface::is_provider implementation that returns false if the provider is insolvent so that the proofs dealer pallet does not accept any proofs from them?

I think we should.
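
A simplified standalone model of that suggestion (the real implementation would live behind ReadChallengeableProvidersInterface; the struct here is illustrative):

use std::collections::HashSet;

struct Providers {
    registered: HashSet<u64>,
    insolvent: HashSet<u64>,
}

impl Providers {
    // Treat an insolvent provider as not being a provider at all, so the proofs
    // dealer pallet would reject its proofs.
    fn is_provider(&self, id: u64) -> bool {
        self.registered.contains(&id) && !self.insolvent.contains(&id)
    }
}

fn main() {
    let mut providers = Providers {
        registered: [1, 2].into_iter().collect(),
        insolvent: HashSet::new(),
    };
    assert!(providers.is_provider(1));
    providers.insolvent.insert(1);
    // Once insolvent, the provider is no longer challengeable.
    assert!(!providers.is_provider(1));
}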
