Integrate API adapters into witnet-rust node #2050
-
This would indeed be a breaking change, because data source URLs with special patterns would need to be dealt with differently than regular ones. Namely, should a node lack credentials for some API, it should refrain from performing the retrieval and commitment.
-
An interesting and much needed thread. The question is how we handle private API keys when requests are public and logged on chain, given that burner API keys are not readily available. Integrating API adapters into the node itself seems like a good solution. Agreed with @aesedepece that data sources with unique/special patterns would be a breaking change, so a node that lacks credentials would be disqualified from participating in retrievals/commitments for that request. Maybe we can keep the data request homogeneous, perhaps via the protobuf schema... [edit] Suggesting new APIs for node operators to adopt could be accomplished by time-locking requests, so operators see the opportunity to add that API to the registry before the request is processed. Will do some more thinking about this...
-
Thanks @aesedepece for letting me know about the existence of this thread. Not knowing about it before, and thinking about how private APIs could be supported in a best-effort approach by the Witnet network, I couldn't help but come up with a very similar approach to the alternative design proposed by @drcpu-github about one year ago. However, just like @aesedepece, I do agree the implementation could involve a breaking change in the protocol, or perhaps not:
-
Let me elaborate on my former comment:
I don't care if this is a breaking change, consensus-wise. The above is truly my main concern.

A bit of history on crowd-attestation

Ever since we started designing Witnet, we had this clear idea in mind that for the crowd-attestation mechanism to work, we needed to make sure that the census from which we were randomly sampling witnesses was as big as possible. The underlying principle was always this: it is highly unlikely for an attacker to control a small committee sampled out of a big population of nodes. From that principle, there was a very immediate takeaway: all witnesses must have the same capabilities. That decision shaped Witnet as a totally generic oracle over HTTP. If you can get some piece of data over HTTP, you can have it retrieved, attested and delivered to your smart contract using Witnet. No need for the data providers to make any change to their APIs, and no need for the node operators to add "adapters" for specific APIs. Already at that point (late 2017), we had started thinking about other "universal" capabilities that we could implement to make the most of the crowd-attestation mechanism. From that brainstorming, two capabilities stood out. The first one, an RNG, we landed in October 2021. The other one, generalistic off-chain computation of WASM code, is yet to be explored.

On custom capabilities

At that point, the "all witnesses must have the same capabilities" principle was not yet written in stone. However, what really made us rule out the possibility of having "custom capabilities" was the added complexity, much related to what @parodyBit mentioned above. When different nodes have different capabilities, the protocol needs to provide specific mechanisms for:

For the sake of being able to deliver the protocol in a reasonable time frame, we preferred to keep it simple and removed the concept of custom capabilities from our design around spring / summer 2018, when we realized it would really clash with the kind of eligibility and reputation mechanism that we were implementing, because of points 3 to 5 above.

General comment on private APIs

OK, first I'd like to clarify a few things about private APIs. First of all, a private API is one that requires "credentials". Credentials:
There are at least two main types of these private APIs. They look the same from a request and response standpoint, but they're very different in nature:
I think we will all agree that Type 2 APIs are completely incompatible with any kind of third-party oracle, especially those based on crowd-attestation, because of the risk of the credentials being abused by an attacker. There is some research on solving this through cryptographic enclaves and the like, but this approach has been extensively proven to be vulnerable to side-channel attacks carried out by the operator of the system that hosts the enclave, or by any neighbor process hosted on the same processor (especially scary for nodes run in the cloud). So let me assume that this discussion is all about Type 1 APIs: those that require credentials solely for rate limiting or billing, and that don't handle any PII or privileged access to resources.

API adapters

That's where the concept of API adapters fits in. Generally speaking, API adapters are a kind of custom capability that nodes can opt in to. In their strictest form, API adapters are pieces of code that nodes need to install and configure in order to be able to read from specific APIs and expose those data points to the entire network. That's how Chainlink adapters work. In a more loosely coupled model, API adapters could take the form of a configuration file local to each node in which FQDNs are mapped to access tokens configured by the node operator. I think this is what @drcpu-github, @parodyBit and @guidiaz are suggesting.

... to be continued ...
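For illustration, here is a minimal sketch of that loosely coupled model in Rust. The types and field names are hypothetical (this is not the actual witnet-rust API; it only assumes the commonly used `url` crate): the node keeps a local FQDN → access-token map and simply refrains from committing to retrievals whose host it has no credentials for.

```rust
// Sketch only: a local, never-gossiped credential store (e.g. loaded from witnet.toml)
// mapping API host names (FQDNs) to access tokens, plus the eligibility check that
// lets a node skip retrieval/commitment when it lacks credentials for a data source.
use std::collections::HashMap;

use url::Url; // `url` crate, assumed available

pub struct ApiCredentials {
    tokens: HashMap<String, String>, // FQDN -> access token
}

impl ApiCredentials {
    /// Returns the access token for the data source's host, or None if this node
    /// has no credentials and should refrain from the retrieval and commitment.
    pub fn token_for(&self, source_url: &str) -> Option<&str> {
        let host = Url::parse(source_url).ok()?.host_str()?.to_owned();
        self.tokens.get(&host).map(String::as_str)
    }
}

fn main() {
    let creds = ApiCredentials {
        tokens: HashMap::from([(
            "api.service.com".to_string(),
            "s3cr3t-token".to_string(),
        )]),
    };

    match creds.token_for("https://api.service.com/price?symbol=BTC") {
        Some(token) => println!("eligible, retrieving with token {token}"),
        None => println!("no credentials for this API, refraining from commitment"),
    }
}
```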
-
I'm wondering if there is any way a node can prove that it has access to an API key. One obvious proof is that the node has already solved a data request against this API with a non-error result, but I'm not sure how to bootstrap that. If we cannot have a proof and have to rely on trusting what nodes claim, then a malicious user can run a node that claims to support any kind of API. Maybe that's not a problem because of the reputation system, but if the number of different APIs grows, I expect the average number of nodes per API to be quite small, so it may be easy to attack some niche APIs.
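One way to picture the "already solved a request with a non-error result" idea is a simple track record per (node, API host) pair. A rough, purely illustrative sketch in Rust (the names and threshold are made up, and this is a heuristic, not a cryptographic proof):

```rust
// Heuristic sketch: count non-error resolutions per (node, API host) and treat nodes
// above some threshold as likely holders of working credentials for that API.
use std::collections::HashMap;

type NodeId = String;
type ApiHost = String;

#[derive(Default)]
pub struct ApiTrackRecord {
    successes: HashMap<(NodeId, ApiHost), u64>,
}

impl ApiTrackRecord {
    /// Record a data request that this node solved against this API without errors.
    pub fn record_success(&mut self, node: NodeId, host: ApiHost) {
        *self.successes.entry((node, host)).or_insert(0) += 1;
    }

    /// Heuristic only: a node that has repeatedly returned non-error results for
    /// this API probably holds a valid key. It says nothing about nodes with no history.
    pub fn likely_has_key(&self, node: &str, host: &str, min_successes: u64) -> bool {
        self.successes
            .get(&(node.to_string(), host.to_string()))
            .copied()
            .unwrap_or(0)
            >= min_successes
    }
}

fn main() {
    let mut record = ApiTrackRecord::default();
    record.record_success("node-abc".into(), "api.service.com".into());
    assert!(!record.likely_has_key("node-abc", "api.service.com", 2));
    record.record_success("node-abc".into(), "api.service.com".into());
    assert!(record.likely_has_key("node-abc", "api.service.com", 2));
}
```

This obviously doesn't solve the bootstrap problem for an API nobody has served yet, which is exactly the open question raised above.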
-
Problem
A lot of APIs require that you specify an API key when requesting certain data. This is obviously not ideal in the context of a public blockchain, since you don't want your private API keys lingering on the blockchain for eternity.
Current solutions
Integrate API adapters into witnet-rust node
An alternative design that is more in line with the decentralized ethos of Witnet could be to allow creation of data requests with an API-key variable. A node operator could then fill in said variable with their own private API key before launching a query to the API service.
This could work as follows:
1. Node operators add their private API key(s) to their local witnet.toml configuration file.
2. Data requests reference the key through a variable in the data source URL, e.g. https://api.service.com/{request_something}?api-key={my_api_key}.
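A minimal sketch of that substitution in Rust (everything except the example URL and the witnet.toml idea is hypothetical, and the config loading is stubbed out as a hard-coded map):

```rust
// Sketch only: fill the {my_api_key} variable of a data source URL template with the
// key the operator configured locally (e.g. in witnet.toml), never publishing it on chain.
use std::collections::HashMap;

/// Stand-in for keys loaded from the node's local configuration file.
fn local_api_keys() -> HashMap<&'static str, &'static str> {
    HashMap::from([("api.service.com", "my-private-key")])
}

/// Replaces the {my_api_key} variable with the operator's own key, if one is configured.
fn fill_api_key(template: &str, host: &str) -> Option<String> {
    let key = local_api_keys().get(host)?.to_string();
    Some(template.replace("{my_api_key}", &key))
}

fn main() {
    let template = "https://api.service.com/{request_something}?api-key={my_api_key}";
    match fill_api_key(template, "api.service.com") {
        Some(url) => println!("retrieving {url}"),
        None => println!("no key configured for this API, skipping the data request"),
    }
}
```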
If a data requester wants to request data from a 'new' API (i.e. an API for which node operators don't have an API key), they would have to properly incentivize node operators to create one. I imagine it would be possible for a data requester to discuss the necessity of a new API beforehand with node operators in some off-chain channel such as Discord, or they could just create a data request with proper incentives and launch it.
Node operators who are monitoring the blockchain would see that it is economically interesting to add this API to the capabilities of their nodes. After a couple of data requests have failed (over the course of a day or so), enough node operators should have upgraded the API-querying capabilities of their nodes for the data request to succeed, provided it really is economically interesting to serve it.
As far as I can tell, this design would not necessarily need to be activated using TAPI as it is not a consensus-breaking update. However, given the implications, it's probably a good idea to use TAPI for it anyway.
Problems with the alternative solution
The main problem with this design is that it can potentially be gamed by node operators. It would be possible for someone to craft a data request querying an exotic API in an attempt to influence the reputation of a certain set of (their own) nodes.
There are, however, two counteracting effects in a healthy ecosystem of node operators. Healthy means we assume there is a significant number of unique node operators who have an economic interest in keeping the network honest.
What are your thoughts on this idea?