Config file format changes (what to do about deals.csv, metadata.yml) #145

linuskendall · 2024-08-08T10:36:04Z

problem statement

storing and utilising deals.csv and metadata.yaml is a bit of a hack, and we would like to remove this.

current state

currently we use deals.csv to do a lookup from Piece CID to SP. I.e. given a specific piece, which SP has it stored.

currently we use metadata.yaml to map a byte offset in the "full car file" to:
a) a specific piece CID, i.e. given offset X in the full car file, which piece does this offset fall within
b) an offset within this piece, i.e. given offset X in the full car file, what's the offset within that piece that this offset corresponds to

Pieces have some header data and padding, and that's why (b) is needed.
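
The offset translation described above can be sketched as a binary search over pieces sorted by where they start in the full car file. This is only an illustration: the `PieceEntry` fields (`start`, `length`, `header_size`) are hypothetical and are not the actual metadata.yaml layout, and it assumes each piece's payload is preceded by a known number of header bytes.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class PieceEntry:
    # All field names are assumptions made for this sketch.
    piece_cid: str
    start: int        # offset in the full car file where this piece's payload begins
    length: int       # number of full-car-file bytes stored in this piece
    header_size: int  # bytes of header data that precede the payload inside the piece


def locate(pieces, offset):
    """Map an offset in the full car file to (piece CID, offset within that piece).

    `pieces` must be sorted by `start`.
    """
    starts = [p.start for p in pieces]
    i = bisect_right(starts, offset) - 1  # last piece starting at or before offset
    if i < 0:
        raise ValueError(f"offset {offset} is before the first piece")
    piece = pieces[i]
    if offset >= piece.start + piece.length:
        raise ValueError(f"offset {offset} is not covered by any piece")
    # (a) which piece, (b) the offset inside it, shifted past the piece header
    return piece.piece_cid, piece.header_size + (offset - piece.start)
```

The header shift in the last line is the reason (b) cannot simply be `offset - piece.start`.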

suggested improvement

If we implement #122 then we can make this a lot simpler. Instead of deals.csv and metadata.yaml we could just use the following config:

pieces:
   - subset: <subset_cid>
     pcid: <piece_cid>
     sps:
        - <sp_id1>
        - <sp_id2>
        - ...

so when trying to fetch a CID, we would just look up the CID in the cid-to-subset index. then, using the piece config above, we would find the correct pcid and the SPs that have it stored. then we could simply retrieve the data from one of those SPs.
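
A minimal sketch of that lookup flow, assuming the cid-to-subset index from #122 can be treated as a simple mapping. The `PieceConfig` class and `find_piece` function are hypothetical names used only for illustration; the fields mirror the `subset`, `pcid` and `sps` keys of the config above.

```python
from dataclasses import dataclass, field


@dataclass
class PieceConfig:
    subset: str                              # <subset_cid>
    pcid: str                                # <piece_cid>
    sps: list = field(default_factory=list)  # SP ids that store this piece


def find_piece(cid, cid_to_subset, pieces):
    """Return (pcid, sps) for the piece that holds `cid`.

    `cid_to_subset` stands in for the cid-to-subset index; a dict is an
    assumption made to keep the example self-contained.
    """
    subset = cid_to_subset.get(cid)
    if subset is None:
        raise KeyError(f"{cid} is not in the cid-to-subset index")
    for piece in pieces:
        if piece.subset == subset:
            return piece.pcid, piece.sps
    raise KeyError(f"no piece configured for subset {subset}")
```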

we can have a tool for now that reads deals.csv and metadata.yaml and produces this config file.
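
A rough sketch of what the core of such a tool could look like. The deals.csv column names (`piece_cid`, `provider`) are assumptions, not the real header, and `piece_to_subset` is a hypothetical mapping that would have to be derived separately (for example from metadata.yaml); serialising the result to YAML is left out.

```python
import csv
import io


def build_pieces_config(deals_csv_text, piece_to_subset):
    """Group deals by piece CID and emit entries shaped like the `pieces` config.

    Column names and the piece_to_subset mapping are assumptions of this sketch.
    """
    sps_by_piece = {}
    for row in csv.DictReader(io.StringIO(deals_csv_text)):
        sps = sps_by_piece.setdefault(row["piece_cid"], [])
        if row["provider"] not in sps:  # skip repeated deals with the same SP
            sps.append(row["provider"])
    return [
        {"subset": piece_to_subset.get(pcid), "pcid": pcid, "sps": sps}
        for pcid, sps in sps_by_piece.items()
    ]
```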

this config file is the permanent version of it.

future options

we could also infer the subset from a specific piece. if we just configured faithful with a list of pieces and SPs, faithful could read the root CID from the piece and then know which subset this is.

impacts

we should create a new config file format version
this config file format version would support the index in Add indexing on (new) split car files #122
this config file format version would not support deals.csv and metadata.yaml

benefits

in theory we can restore deals.csv and metadata.yaml using this approach by crawling our address and then looking at the deals made.

for each deal we can look up the piece CID from chain (as far as I understand), and from piece CID + SP ID we could then read the root block in the piece to get the subset. From this we would be fully able to use existing indexes, offsets, etc.

since the offsets are unique to the piece, this makes it quite easy to reuse indexes.


linuskendall · 2024-10-09T11:38:36Z

So current suggestion is:

data:
  car:
    from_pieces:
      - ipfs://<subset cid>
      - sp123213:<piece cid>
      - file://tank/faithful/baga123123122.car
      - http://abc.com/faithful/baga123213.car

We should be able to turn sp123123: into https:///ipfs/ and then use it like a normal url
For ipfs we should be able to use ipfs retrievals to just fetch the subset block.
For file we can just use the file system operations already in PR #166.
For http we can use the method outlined in #169.
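
As an illustration of how the four kinds of `from_pieces` entries could be told apart before dispatching to the right retrieval method, here is a small sketch. `PieceSource` and `parse_piece_source` are hypothetical names; the `sp<digits>:<piece cid>` pattern is inferred from the example entry above, and the rewrite of SP entries into a URL is deliberately not attempted here.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class PieceSource:
    kind: str        # "ipfs", "sp", "file" or "http"
    location: str    # subset CID, piece CID, file path or full URL
    sp_id: str = ""  # only set for "sp" entries

# Assumed shape of SP entries, e.g. "sp123213:<piece cid>".
_SP_ENTRY = re.compile(r"^(sp\d+):(.+)$")


def parse_piece_source(entry):
    """Classify one from_pieces entry by its scheme."""
    match = _SP_ENTRY.match(entry)
    if match:
        return PieceSource("sp", match.group(2), match.group(1))
    parsed = urlparse(entry)
    if parsed.scheme == "ipfs":
        return PieceSource("ipfs", parsed.netloc + parsed.path)
    if parsed.scheme == "file":
        # Treat everything after "file://" as the path, as in the example entry.
        return PieceSource("file", parsed.netloc + parsed.path)
    if parsed.scheme in ("http", "https"):
        return PieceSource("http", entry)
    raise ValueError(f"unsupported piece source: {entry}")
```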

linuskendall · 2024-10-09T11:41:42Z

Related to #120

linuskendall · 2024-10-17T10:41:11Z

What's left on this issue:

Get the new iterator that can read from full car file or from split car file
Update the config file format for epochs
Make sure that the file and http support identifying the sizes of the files and their ordering (in progress)
Make sure that the index lookups can translate the offsets from an index like cid2offset -> offset within split file

linuskendall assigned anjor and gagliardetto Oct 9, 2024

linuskendall mentioned this issue Oct 9, 2024

Add indexing on (new) split car files #122

Open
