Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix limit of records allowed to be fetched in a single query / request #88

Open
asishallab opened this issue Mar 17, 2020 · 6 comments
Open
Assignees
Labels
enhancement New feature or request

Comments

@asishallab
Copy link
Member

The problem

Imagine we have our classical Person has Dogs has Fleas models, the global constant maximum limit of records is 10,000 (10 k), and we fetch 10 k persons where each of them has 10 k dogs where in turn each dog has 10 k fleas. Assuming that a records without associations is roughly 20 Bytes in size we would need approximately 20 Terrabyte of memory to process this.

(1e4 ^ 3 * 20) / (1024^3) = 18626.45

We need to fix this possible open door for malicious clients.

First extend our GraphQL context as follows

app.use('/graphql', cors(), graphqlHTTP((req) => ({
  schema: Schema,
  rootValue: resolvers,
  context: {
    recordsLimit: globals.RECORD_LIMIT
 // ...

Then in each resolver implement the following behavior

  1. Count the records matching the current search arguments.
    1.a If the count exceeds context.recordsLimit throw an error
    1.b Else fetch the matching records into say resultRecords
  2. In order to take into account the parallel resolver problem (see below) compare again the number of fetched records (nr) and the current context.recordsLimit
    2.a If nr exceeds context.recordsLimit throw an error
    2.b Else subtract from context.recordsLimit nr and return the resultRecords

Verify, if the above behavior is possible to implement in the resolver and not the model layer. If not, pass context.recordsLimit as an argument to the model layer, and implement the above behavior there.

The parallel resolver problem

Imagine we have the following schema: A Person has Dogs and also has Parrots. When using the following GraphQL Query:

{ persons {
  name
  dogs {
    name
    age
  }
  parrots {
    name
    species
  }
}

GraphQL invokes three resolvers separately. First the root resolver persons is invoked. Next, the functions dogs and parrots inside each person object are invoked asynchronously, i.e. "to some degree in parallel". We cannot know in what order both async processes are carried out, and when they actually access context.recordsLimit and especially when they diminish its value by the number of fetched records. But on the other hand we do not want to enforce sequential execution (and hack GraphQL). Thus in theory the following scenario is possible. Both dogs and parrots check the current value of context.recordsLimit before diminishing it, thus find that they can return their respective fetched dogs and parrots, and then diminish context.recordsLimit. In this rare case the recordsLimit will actually not hold. However, because any subsequent fetch will blow up the limit, this scenario most likely does not pose a thread to efficiency. Hence the implementation seems the right way to go.

@asishallab asishallab added the enhancement New feature or request label Mar 17, 2020
@asishallab
Copy link
Member Author

For some inspiring experiments see this repo

@asishallab
Copy link
Member Author

Note that the above Issue must be implemented in the following data model storage-types:

  • sql
  • cenz_server

This must include adapters in the context of distributed data models:

  • sql-adapter
  • DDM-adapter

@tmvoe
Copy link

tmvoe commented Apr 8, 2020

New branch for the graphql-server: https://github.com/ScienceDb/graphql-server/tree/issue88-record-limit

@tmvoe
Copy link

tmvoe commented Apr 14, 2020

Branch in the GraphQL Server Code Generator: https://github.com/ScienceDb/graphql-server-model-codegen/tree/issue88-record-limit

Add count check to the root resolvers: 604a1ee

Test for the record limit (set at 25): 6cfae76

Test adapted for the record limit 10000: 536f16e

@tmvoe
Copy link

tmvoe commented Apr 15, 2020

Functionality is now in the master branch (for both the GraphQL Server Code Generator - ecd6213 and the Skeleton Server - Zendro-dev/graphql-server@6d84670) and in case of the StarterPack in the branch https://github.com/ScienceDb/ScienceDbStarterPack/tree/i25-starter-pack-react with commit Zendro-dev/ZendroStarterPack@8e6f879

@tmvoe
Copy link

tmvoe commented May 14, 2020

A large error regarding efficiency and a case of a functionally wrong implementation was handled in #122

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

6 participants