-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API v2 proposal: remove PinResults.count #86
Comments
Can we not just add support for a v2 (and maybe a versions) endpoint and avoid breaking changes here? IIRC when it was previously discussed that we wouldn't use libp2p and leverage multistream for protocol negotiation, but instead use HTTP the claim was that having version negotiation would be similarly easy. |
Yes, if we every do this, then we would introduce this breaking change as v2 + versioning scheme. So far this proposal got no feedback from any pinning service – let's park this until there is interest and/or we gather more items under |
Besides being expensive, looking at the count while paginating has problematic edge cases, e.g. pins being added/deleted while paginating. If we're going to make a breaking change, we should consider using an opaque pagination token instead of the |
@lidel just now seeing this and definitely supporting this change. Count is super expensive for larger pinsets and can cause a lot of issues at scale. I'm in favor of moving towards a created timestamp paradigm here. We had to switch our normal pinList endpoint to using this paradigm as the previous method which required counting was becoming extremely expensive for larger users. |
@lidel I would also add here that getting rid of count would allow us to raise our rate limits, which is something I know the PL team / ipfs-desktop users have been asking for on this specific endpoint |
I think we should remove count, and use an opaque token like "nextToken". Gus said it best:
|
We could pretty easily implement things right away if we did time based pagination, as we already switched the rest of our application over to doing this, but I'm guessing opaque token pagination might be a lot bigger of a lift. @SgtPooki @guseggert could you both elaborate on what you have in mind here? Any example articles you could point to? |
@obo20 absolutely. First here's a quick summary in my own words: I think of pagination tokens like Those records can be sorted/filtered however you wish, and you can encode any sorting/filtering/pages/users/etc.. within that token. This flexibility is great, but does come with drawbacks that I've personally experienced and complained about plenty during my time at Amazon, but for service owners, it gives you a lot more control over how your resources are consumed no matter how many consumers or objects to consume you have. I believe @guseggert implemented more than a few APIs at Amazon where we were required to use this pagination strategy when dynamoDB was our backend (it was most of the time), so he can correct me or speak further to the details if you need more info from someone who's got more experience. Below are some great articles I've found that should help the discussion here for anyone interested. The writeup by slack is a fantastic read that goes into the pros and cons of many approaches. Apollo GraphQLNot as good as the others, but informative in a basic sense. Very specific to apollographql
hackernoon articleGreat writeup covering unidirectional and bidirectional pagination from the view of an engineer.
SlackAn extremely thorough writeup of slack's considerations when upgrading their pagination API
|
At a high level, you can encode whatever state you need to fetch the next page in the pagination token (you might want to encrypt it too, so clients can't reverse it). E.g. for a relational database, you can encode a JSON object like Concretely,
and then the subsequent request would be like The downside from a client's perspective is that you can't parallelize queries. That's an upside from a server's perspective, and generally a desirable thing for a multitenant API. But I'm not sure how this would impact existing impls. Essentially though, I think we should decouple the query params / filters from the pagination mechanism, so that we don't need to make assumptions about the query capabilities of the DB in order to paginate, and the most general way that I know of to do that is with an opaque token. |
iirc we were discussing backward-compatible ways of removing the cost of returning exact Removing or changing type of Next steps here (I have no bandwidth for this, but happy to review PRs):
|
PinResults.count
is used for two things:(1) Reading pin count per status (used in
ipfs pin remote service ls --stat
):(2) pagination (where it acts as an equivalent of
more: true
:Problem
PinResults.count
is expensive, especially when running expensive filters likecid
,name
orstatus
GET /pins?cid=Qm1.Qm2...Qm10
whichProposed change
PinResults.count
PinResults.more
(true/false bool)/service/pins/healthcheck
that allows inexpensive health check/service/pins/stats
that returns total pin counts for each statusThis would be a breaking change that requires v2.x.x of this spec and coordinated effort across the stack:
go-ipfs, js-ipfs, ipfs-webui, ipfs-desktop and Pinata.
Is it really breaking? Needs analysis.
I've been thinking a bit about a way to do this in backward compatible way and we could have
PinResults.count
as an optional field, and always return1
so the pagination and health checks in old clients still works.👉 Before we consider doing this, the main question is how old client in go-ipfs will react to an unexpected
PinResults.more
– if it is ignored and nothing crashes, then we have a better path forward.If not, we would break our users, and need to coordinate go-ipfs/ipfs-webui/ipfs-desktop and Pinata to make the change around the same time to limit the damage. TBD is this is acceptable, considering the performanc benfits in the long run.
cc @obo20 @aschmahmann
The text was updated successfully, but these errors were encountered: