Config file format changes (what to do about deals.csv, metadata.yml) #145

Open
linuskendall opened this issue Aug 8, 2024 · 3 comments

@linuskendall

linuskendall commented Aug 8, 2024

problem statement

storing and utilising deals.csv and metadata.yaml is a bit of a hack, and we would like to remove this.

current state

currently we use deals.csv to look up the SP (storage provider) for a piece CID, i.e. given a specific piece, which SP has it stored.

currently we use metadata.yaml to map a byte offset in the "full car file" to:
a) a specific piece CID, i.e. given offset X in the full car file, which piece does this offset fall within
b) an offset within this piece, i.e. given offset X in the full car file, what's the offset within that piece that this offset corresponds to

pieces have some header data and padding, which is why b) is needed.
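
To illustrate what a) and b) amount to, here is a minimal Python sketch. The `PieceRange` record, its field names and the numbers in the usage note are assumptions made for illustration; they are not the actual metadata.yaml schema. It also assumes each piece covers one contiguous range of the full car file and that the piece-local offset is that range's offset shifted by the piece's header size.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class PieceRange:
    """Hypothetical record: where one piece's payload sits in the full car file."""
    piece_cid: str
    start: int        # first full-car offset covered by this piece
    length: int       # number of full-car bytes covered by this piece
    header_size: int  # piece-local header bytes that precede the payload


def locate(ranges: list[PieceRange], offset: int) -> tuple[str, int]:
    """Map a full-car offset to (piece CID, offset within that piece).

    `ranges` must be sorted by `start` and must not overlap.
    """
    starts = [r.start for r in ranges]
    i = bisect_right(starts, offset) - 1  # last range starting at or before offset
    if i < 0:
        raise ValueError(f"offset {offset} is before the first piece")
    r = ranges[i]
    if offset >= r.start + r.length:
        raise ValueError(f"offset {offset} is not covered by any piece")
    # a) is r.piece_cid, b) is the shifted offset.
    return r.piece_cid, r.header_size + (offset - r.start)
```

With two made-up pieces covering offsets 0..999 and 1000..1499, each with a 59-byte header, `locate(ranges, 1200)` lands 200 bytes into the second piece's payload, i.e. at piece-local offset 259.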

suggested improvement

if we implement #122 then we can make this a lot simpler. instead of deals.csv and metadata.yaml we could just use the following config:

pieces:
   - subset: <subset_cid>
     pcid: <piece_cid>
     sps:
        - <sp_id1>
        - <sp_id2>
        - ...

so when trying to fetch a CID, we would just look up the CID in the cid-to-subset index. then, using the piece config above, we would find the correct pcid and the SPs that have it stored. then we could simply

we can have a tool for now that reads deals.csv and metadata.yaml and produces this config file.

this config file is the permanent version of it.
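
As a rough sketch of the lookup this config would enable: the `cid_to_subset` dict below stands in for the index from #122, whose real interface is not defined in this issue, and the config is shown as an already-parsed Python structure with placeholder values. It assumes each subset maps to exactly one piece, as in the config above.

```python
from __future__ import annotations

# Parsed form of the proposed `pieces:` config; all values are placeholders.
CONFIG = {
    "pieces": [
        {"subset": "subset-cid-1", "pcid": "piece-cid-1", "sps": ["sp-1", "sp-2"]},
        {"subset": "subset-cid-2", "pcid": "piece-cid-2", "sps": ["sp-3"]},
    ]
}

# Stand-in for the cid-to-subset index from #122 (hypothetical contents).
CID_TO_SUBSET = {
    "block-cid-a": "subset-cid-1",
    "block-cid-b": "subset-cid-2",
}


def resolve(cid: str, cid_to_subset: dict, config: dict) -> tuple[str, list[str]]:
    """Return (piece CID, SPs storing that piece) for the piece holding `cid`."""
    if cid not in cid_to_subset:
        raise LookupError(f"CID {cid!r} is not in the cid-to-subset index")
    subset = cid_to_subset[cid]
    # Index the config by subset CID so the second step is a plain dict lookup.
    by_subset = {p["subset"]: p for p in config["pieces"]}
    piece = by_subset.get(subset)
    if piece is None:
        raise LookupError(f"no piece configured for subset {subset!r}")
    return piece["pcid"], list(piece["sps"])
```

For example, `resolve("block-cid-a", CID_TO_SUBSET, CONFIG)` gives the piece `piece-cid-1` and the two SPs listed for it; which SP to try first is left open here.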

future options

we could also infer the subset from a specific piece. if we just configured faithful with a list of pieces and SPs, faithful could read the root CID from the piece and then know which subset this is.

impacts

  1. we should create a new config file format version
  2. this config file format version would support the index in Add indexing on (new) split car files #122
  3. this config file format version would not support deals.csv and metadata.yaml

benefits

in theory we can restore deals.csv and metadata.yaml using this approach by crawling our address and then looking at the deals made.

for each deal we can look up the piece CID from chain (as far as I understand), and from piece CID + SP ID we could then read the root block in the piece to get the subset. from this we would be fully able to use existing indexes and offsets etc.

since the offsets are unique to the piece, this makes it quite easy to reuse indexes.

@linuskendall

So current suggestion is:

data:
  car:
    from_pieces:
      - ipfs://<subset cid>
      - sp123213:<piece cid>
      - file://tank/faithful/baga123123122.car
      - http://abc.com/faithful/baga123213.car

We should be able to turn sp123123: into https:///ipfs/ and then use it like a normal URL.
For ipfs we should be able to use ipfs retrievals to just fetch the subset block.
For file we can just use the file system operations already in PR #166.
For http we can use the method outlined in #169.
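
A minimal Python sketch of how the four kinds of `from_pieces` entries could be told apart before dispatching to the matching retrieval method. The `PieceSource` type and `parse_source` function are hypothetical names, the `sp<digits>:` prefix rule is inferred from the example above, and the text after `file://` is kept verbatim rather than interpreted as a host or path.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PieceSource:
    """Hypothetical parsed form of one `from_pieces` entry."""
    kind: str                # "ipfs", "sp", "file" or "http"
    provider: Optional[str]  # SP id such as "sp123213"; None for other kinds
    target: str              # CID, file location or full URL


def parse_source(entry: str) -> PieceSource:
    """Classify a `from_pieces` entry by its prefix."""
    if entry.startswith("ipfs://"):
        return PieceSource("ipfs", None, entry[len("ipfs://"):])
    if entry.startswith("file://"):
        return PieceSource("file", None, entry[len("file://"):])
    if entry.startswith(("http://", "https://")):
        return PieceSource("http", None, entry)
    # Assumed form for SP entries: "sp<digits>:<piece cid>".
    prefix, sep, rest = entry.partition(":")
    if sep and prefix.startswith("sp") and prefix[2:].isdigit() and rest:
        return PieceSource("sp", prefix, rest)
    raise ValueError(f"unrecognised from_pieces entry: {entry!r}")
```

Turning an `sp` entry into a retrieval URL would be a separate step, since it needs the SP's endpoint, which this sketch does not model.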

@linuskendall

Related to #120

@linuskendall

linuskendall commented Oct 17, 2024

What's left on this issue:

  • Get the new iterator that can read from full car file or from split car file
  • Update the config file format for epochs
  • Make sure that the file and http support identifying the sizes of the files and their ordering (in progress)
  • Make sure that the index lookups can translate the offsets from an index like cid2offset -> offset within split file
