Fix limit of records allowed to be fetched in a single query / request #88
For some inspiring experiments see this repo.
Note that the above Issue must be implemented in the following data model storage-types:
This must include adapters in the context of distributed data models:
New branch for the graphql-server: https://github.com/ScienceDb/graphql-server/tree/issue88-record-limit
Branch in the GraphQL Server Code Generator: https://github.com/ScienceDb/graphql-server-model-codegen/tree/issue88-record-limit
- Add count check to the root resolvers: 604a1ee
- Test for the record limit (set at 25): 6cfae76
- Test adapted for the record limit 10000: 536f16e
Functionality is now in the master branch (for both the GraphQL Server Code Generator, ecd6213, and the Skeleton Server, Zendro-dev/graphql-server@6d84670) and, in the case of the StarterPack, in the branch https://github.com/ScienceDb/ScienceDbStarterPack/tree/i25-starter-pack-react with commit Zendro-dev/ZendroStarterPack@8e6f879.
A major efficiency problem and a case of a functionally wrong implementation were handled in #122.
The problem
Imagine we have our classical Person has Dogs has Fleas models, the global constant maximum limit of records is 10,000 (10 k), and we fetch 10 k persons, each of whom has 10 k dogs, where in turn each dog has 10 k fleas. Assuming that a record without associations is roughly 20 bytes in size, we would need approximately 20 terabytes of memory to process this:

(1e4^3 * 20 bytes) / (1024^3) ≈ 18,626 GiB ≈ 18.2 TiB

We need to close this possible open door for malicious clients.
First extend our GraphQL `context` with a `recordsLimit` field. Then, in each resolver, implement the following behavior (a sketch follows after the list):

1. Count the records matching the query.
   - 1.a If the count exceeds `context.recordsLimit`, throw an error.
   - 1.b Else fetch the matching records into, say, `resultRecords`.
2. Compare the number of fetched records (`nr`) with the current `context.recordsLimit`.
   - 2.a If `nr` exceeds `context.recordsLimit`, throw an error.
   - 2.b Else subtract `nr` from `context.recordsLimit` and return the `resultRecords`.
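As a rough illustration of the behavior described above, a resolver-level helper could look like the following sketch. All names (`LIMIT_RECORDS`, `buildContext`, `fetchWithRecordsLimit`, `model.countRecords`, `model.readAll`) are assumptions chosen for illustration, not the actual Zendro API.

```js
// Minimal sketch, not the actual implementation: a per-request record limit
// stored in the GraphQL context and consumed by every resolver.

const LIMIT_RECORDS = 10000; // assumed global maximum per request

// Give every incoming request its own counter, e.g. via the server's
// context factory (Apollo Server: `context: () => buildContext()`).
function buildContext() {
  return { recordsLimit: LIMIT_RECORDS };
}

// Hypothetical helper a resolver could call for any model; the method names
// `countRecords` and `readAll` are assumptions.
async function fetchWithRecordsLimit(model, args, context) {
  // 1. Count the records matching the query.
  const count = await model.countRecords(args.search);

  // 1.a If the count exceeds the remaining limit, throw an error.
  if (count > context.recordsLimit) {
    throw new Error(
      `Query would match ${count} records, but only ` +
      `${context.recordsLimit} may still be fetched in this request.`
    );
  }

  // 1.b Else fetch the matching records.
  const resultRecords = await model.readAll(args.search, args.pagination);

  // 2. Compare the number of fetched records (nr) with the remaining limit.
  const nr = resultRecords.length;

  // 2.a If nr exceeds the remaining limit, throw an error.
  if (nr > context.recordsLimit) {
    throw new Error(`Fetched ${nr} records, exceeding the remaining limit.`);
  }

  // 2.b Else subtract nr from the limit and return the records.
  context.recordsLimit -= nr;
  return resultRecords;
}
```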
Verify whether the above behavior can be implemented in the resolver layer rather than the model layer. If not, pass `context.recordsLimit` as an argument to the model layer and implement the above behavior there.

The parallel resolver problem
Imagine we have the following schema: A Person has Dogs and also has Parrots. When using the following GraphQL Query:
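(The query is presumably of roughly this shape; the selected scalar fields are illustrative, only the nesting of `dogs` and `parrots` inside `persons` matters here.)

```graphql
{
  persons {
    name
    dogs {
      name
    }
    parrots {
      name
    }
  }
}
```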
GraphQL invokes three resolvers separately. First the root resolver `persons` is invoked. Next, the functions `dogs` and `parrots` inside each `person` object are invoked asynchronously, i.e. "to some degree in parallel". We cannot know in which order the two async processes are carried out, when they actually access `context.recordsLimit`, and especially when they diminish its value by the number of fetched records. On the other hand, we do not want to enforce sequential execution (and hack GraphQL).

Thus, in theory, the following scenario is possible: both `dogs` and `parrots` check the current value of `context.recordsLimit` before diminishing it, thus find that they can return their respective fetched dogs and parrots, and only then diminish `context.recordsLimit`. In this rare case the `recordsLimit` will not actually hold. However, because any subsequent fetch will blow up the limit, this scenario most likely does not pose a threat to efficiency. Hence the implementation described above seems the right way to go.
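To make the interleaving concrete, here is a self-contained, simplified sketch (all names are hypothetical) that keeps only the count check and the later decrement; the `await` between them is exactly where the sibling resolver can slip in its own check against the still-undiminished limit.

```js
// Simplified sketch of the race described above (all names are illustrative).
async function fetchChecked(fakeModel, context) {
  // Check the remaining limit first ...
  const count = await fakeModel.countRecords();
  if (count > context.recordsLimit) {
    throw new Error('Record limit exceeded.');
  }
  // ... then fetch; while this await is pending, the sibling resolver may run
  // its own check against the still-undiminished limit ...
  const records = await fakeModel.readAll();
  // ... and only now diminish the limit.
  context.recordsLimit -= records.length;
  return records;
}

// Stand-ins for the dog and parrot models of one person.
const dogModel    = { countRecords: async () => 8, readAll: async () => new Array(8).fill({}) };
const parrotModel = { countRecords: async () => 8, readAll: async () => new Array(8).fill({}) };

// GraphQL evaluates sibling fields concurrently, roughly like this:
async function demo() {
  const context = { recordsLimit: 10 };
  const [dogs, parrots] = await Promise.all([
    fetchChecked(dogModel, context),
    fetchChecked(parrotModel, context),
  ]);
  // Both checks saw recordsLimit = 10, so 16 records were returned and the
  // counter ended up negative; any later fetch in this request will now fail.
  console.log(dogs.length + parrots.length, context.recordsLimit); // 16 -6
}

demo();
```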