This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

Graphsync v1.1.0, encoded as CBOR #354

Open
wants to merge 3 commits into
base: master
94 changes: 93 additions & 1 deletion block-layer/graphsync/graphsync.md
@@ -59,6 +59,99 @@ type GraphSyncNet interface {

## Network Messages

### Protocol Version 1.1.0

Graphsync network messages are encoded in DAG-CBOR. They have the following schema:

```ipldsch
type GraphSyncExtensions {string:Any}
Member

string should be String in this position. I think Any should be fine for now if it can't be enumerated or properly pinned down, Any just hasn't fully filtered through our specs or stack because it's quite tricky. The original pb has bytes so might that be a safe choice for now? It could be turned into a union later or enumerated in some other way if needed. What types of things go into this?

Member

sorry, just re-read your comment about why you didn't use bytes! ignore that bit then, but I'm still interested in what kind of data this might contain and if it could be enumerated? could it be limited to scalar values to rule out maps and lists perhaps? we have AnyScalar for that case if it works.

Contributor

Confirm Any should be fine.

(We still have some things to wade through in #318, it seems, but I think it's something we'll have to reach a conclusion on sooner or later, and I don't think there's any way we won't have an Any by the end.)

Collaborator Author

It definitely isn't AnyScalar. We already use arrays and maps.

There are some other challenges for actually writing this in code-- the string represents the extension name, and the type is based on that, but we support user defined extensions, so it's not possible to know everything ahead of time.

There is a whole concept in the cbor-gen world of Deferred, where you don't deserialize until later, leaving the value in its byte representation until you know what you want to deserialize to. I'm not sure this is the right approach for IPLD. But this will probably come up as well with https://github.com/ipld/go-ipld-adl-hamt -- I see there you've defined an Any type, but many HAMTs have a specific type at the bottom -- people may want to parameterize their HAMTs and use code gen'd nodes for the data in the leaves. We should probably figure out how this might work.

Anyway though, just putting Any here at least allows me to define the schema.

type GraphSyncRequestID int
type GraphSyncPriority int

type GraphSyncMetadatum struct {
Link &Any
BlockPresent bool
Contributor

s/bool/Bool/

The rule is based on whether you're referring to a kind (which is a keyword) or a type (which is by convention TitleCase). (We're not quite the same as golang here, and don't support them interchangeably.)

type Foo int is a declaration using a kind keyword, so it's lowercase.

type Foo struct { Field Bool } is referring to a type, so it's using a name.

(There's an implicit prelude in a schema which defines type Bool bool, type String string, etc, so those are all available as a type name to make this less arduous.)

}
type GraphSyncMetadata [GraphSyncMetadatum]

type GraphSyncResponseCode enum {
# RequestAcknowledged means the request was received and is being worked on.
| RequestAcknowledged ("10")
# AdditionalPeers means additional peers were found that may be able
# to satisfy the request and are contained in the extra block of the response.
| AdditionalPeers ("11")
# NotEnoughGas means fulfilling this request requires payment.
| NotEnoughGas ("12")
# OtherProtocol means a different type of response than GraphSync is
# contained in extra.
| OtherProtocol ("13")
# PartialResponse may include blocks and metadata about the in progress response
# in extra.
| PartialResponse ("14")
# RequestPaused indicates a request is paused and will not send any more data
# until unpaused
| RequestPaused ("15")

# Success Response Codes (request terminated)

# RequestCompletedFull means the entire fulfillment of the GraphSync request
# was sent back.
| RequestCompletedFull ("20")
# RequestCompletedPartial means the response is completed, and part of the
# GraphSync request was sent back, but not the complete request.
| RequestCompletedPartial ("21")

# Error Response Codes (request terminated)

# RequestRejected means the node did not accept the incoming request.
| RequestRejected ("30")
# RequestFailedBusy means the node is too busy, try again later. Backoff may
# be contained in extra.
| RequestFailedBusy ("31")
# RequestFailedUnknown means the request failed for an unspecified reason. May
# contain data about why in extra.
| RequestFailedUnknown ("32")
# RequestFailedLegal means the request failed for legal reasons.
| RequestFailedLegal ("33")
# RequestFailedContentNotFound means the respondent does not have the content.
| RequestFailedContentNotFound ("34")
# RequestCancelled means the responder was processing the request but decided to stop, for whatever reason.
| RequestCancelled ("35")
} representation int

type GraphSyncRequest struct {
Id GraphSyncRequestID # unique id set on the requester side
Root &Any # a CID for the root node in the query
Contributor

This is kinda an interesting one. It's a link, alright, so this line as stated is certainly valid. But I'd like to think out loud about it for a moment.

While this is a link, it's not a link we'd ever really mean to traverse when processing the graphsync messages. It's something we expect the agent processing this message to pull out -- as a link, not immediately traversing it -- and then do some logic before it dives in to operations that load that link.

It was just "bytes" in the old protobuf, and that worked fine because of the above. And it seems we could readily still have it be Bytes in this revamp, too. In the old implementation with protobuf, the application logic did a step to reify those bytes into a CID. We could have that step be explicit in a new implementation too, if we wanted to.

This doesn't seem to make a huge difference either way. I guess the main discernible difference is that if someone ran a selector with a bunch of wildcards in it over the GraphSyncRequest itself, if this field is &Any, they might get a mouthful back, and if it's Bytes, the results wouldn't recurse as far. But does this matter? Not really, as far as I can imagine. (I don't know why someone would do that; and even if they did, selectors are full of mechanisms that can be used to control this.)

Alright, that's it for thinking out loud. This seems fine.

Member

hah, that is an interesting point! probably doesn't matter here but worth keeping in mind that viewing this schema from a certain lens might lead to a misunderstanding about intent. &Any does provide some constraints on what the bytes can be, mainly the leading varints must make for a valid CID, but beyond that if you're not traversing then it really is bytes.

Selector Selector # see https://github.com/ipld/specs/blob/master/selectors/selectors.md
Extensions GraphSyncExtensions # side channel information
Priority GraphSyncPriority # the priority (normalized). default to 1
Cancel bool # whether this cancels a request
Update bool # whether this is an update to an in progress request
Contributor @warpfork, Jan 22, 2021

}

type GraphSyncResponse struct {
ID GraphSyncRequestID # the request id we are responding to
Status GraphSyncResponseCode # a status code.
Metadata GraphSyncMetadata # metadata about response
Extensions GraphSyncExtensions # side channel information
}

type GraphSyncBlock struct {
Prefix bytes # CID prefix (cid version, multicodec and multihash prefix (type + length))
Data bytes
Contributor @warpfork, Jan 22, 2021

}

type GraphSyncMessage struct {
Requests [GraphSyncRequest]
Responses [GraphSyncResponse]
Blocks [GraphSyncBlock]
}
```
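
The comments in the `GraphSyncResponseCode` enum split the codes into in-progress, success, and error groups, and mark the last two groups as terminating the request. As a rough illustration only, and not part of the proposed spec, the Python sketch below shows how a receiver might act on that split. The class, constant, and function names are hypothetical; only the integer values and the group membership are taken from the schema above.

```python
from enum import IntEnum


class ResponseCode(IntEnum):
    """Integer values copied from the GraphSyncResponseCode enum."""

    # In-progress codes
    REQUEST_ACKNOWLEDGED = 10
    ADDITIONAL_PEERS = 11
    NOT_ENOUGH_GAS = 12
    OTHER_PROTOCOL = 13
    PARTIAL_RESPONSE = 14
    REQUEST_PAUSED = 15
    # Success codes (request terminated)
    REQUEST_COMPLETED_FULL = 20
    REQUEST_COMPLETED_PARTIAL = 21
    # Error codes (request terminated)
    REQUEST_REJECTED = 30
    REQUEST_FAILED_BUSY = 31
    REQUEST_FAILED_UNKNOWN = 32
    REQUEST_FAILED_LEGAL = 33
    REQUEST_FAILED_CONTENT_NOT_FOUND = 34
    REQUEST_CANCELLED = 35


# Hypothetical helper sets, listed explicitly rather than derived from
# numeric ranges so that they mirror the schema comments exactly.
SUCCESS_CODES = frozenset({
    ResponseCode.REQUEST_COMPLETED_FULL,
    ResponseCode.REQUEST_COMPLETED_PARTIAL,
})
ERROR_CODES = frozenset({
    ResponseCode.REQUEST_REJECTED,
    ResponseCode.REQUEST_FAILED_BUSY,
    ResponseCode.REQUEST_FAILED_UNKNOWN,
    ResponseCode.REQUEST_FAILED_LEGAL,
    ResponseCode.REQUEST_FAILED_CONTENT_NOT_FOUND,
    ResponseCode.REQUEST_CANCELLED,
})


def is_success(code: ResponseCode) -> bool:
    """True when the request finished and data was sent back."""
    return code in SUCCESS_CODES


def is_error(code: ResponseCode) -> bool:
    """True when the request ended without being fulfilled."""
    return code in ERROR_CODES


def is_terminal(code: ResponseCode) -> bool:
    """True when no further responses are expected for the request."""
    return is_success(code) or is_error(code)
```

A caller that decodes the integer from the wire could write `ResponseCode(status)` and then stop waiting for further messages once `is_terminal` returns true.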

### Legacy Protocol Version 1.0.0

An earlier version of graphsync encoded messages using protobufs:

```protobuf
message GraphsyncMessage {

@@ -91,7 +184,6 @@ message GraphsyncMessage {
}
```


### Extensions

The Graphsync protocol is extensible. A graphsync request and a graphsync response contain an `extensions` field, which is a map type. Each key of the extensions field specifies the name of the extension, while the value is data (serialized as bytes) relevant to that extension.
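
To make the map shape concrete, here is a minimal Python sketch of how a receiver might handle extension values when not every extension name is known ahead of time: values for registered names are decoded, and everything else is left as raw bytes for later. Everything in it is an assumption made for illustration. The extension names, the `decode_extensions` helper, and the use of JSON as a stand-in payload encoding are hypothetical and not part of Graphsync.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical type: a decoder turns one extension's raw payload into a value.
Decoder = Callable[[bytes], Any]


def decode_extensions(
    raw: Dict[str, bytes], decoders: Dict[str, Decoder]
) -> Dict[str, Any]:
    """Decode extensions with a registered decoder; keep the rest as bytes."""
    decoded: Dict[str, Any] = {}
    for name, payload in raw.items():
        decoder = decoders.get(name)
        # Unknown (for example user-defined) extensions stay undecoded.
        decoded[name] = decoder(payload) if decoder is not None else payload
    return decoded


# JSON is only a stand-in for whatever encoding a real extension uses.
decoders: Dict[str, Decoder] = {
    "example/known-list": lambda b: json.loads(b.decode("utf-8")),
}

raw_extensions = {
    "example/known-list": b'["a", "b"]',
    "example/user-defined": b"\x01\x02",
}

result = decode_extensions(raw_extensions, decoders)
```

Keeping unknown payloads as bytes lets the application layer decide later which type to decode them into, which is the trade-off discussed in the review thread on `GraphSyncExtensions`.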