Fix limit of records allowed to be fetched in a single query / request #88

asishallab · 2020-03-17T16:38:18Z

The problem

Imagine we have our classical Person has Dogs has Fleas models, the global constant maximum limit of records is 10,000 (10 k), and we fetch 10 k persons where each of them has 10 k dogs where in turn each dog has 10 k fleas. Assuming that a records without associations is roughly 20 Bytes in size we would need approximately 20 Terrabyte of memory to process this.

(1e4 ^ 3 * 20) / (1024^3) = 18626.45

We need to fix this possible open door for malicious clients.

First extend our GraphQL `context` as follows

app.use('/graphql', cors(), graphqlHTTP((req) => ({
  schema: Schema,
  rootValue: resolvers,
  context: {
    recordsLimit: globals.RECORD_LIMIT
 // ...

Then in each resolver implement the following behavior

Count the records matching the current search arguments.
1.a If the count exceeds context.recordsLimit throw an error
1.b Else fetch the matching records into say resultRecords
In order to take into account the parallel resolver problem (see below) compare again the number of fetched records (nr) and the current context.recordsLimit
2.a If nr exceeds context.recordsLimit throw an error
2.b Else subtract from context.recordsLimit nr and return the resultRecords

Verify, if the above behavior is possible to implement in the resolver and not the model layer. If not, pass context.recordsLimit as an argument to the model layer, and implement the above behavior there.

The parallel resolver problem

Imagine we have the following schema: A Person has Dogs and also has Parrots. When using the following GraphQL Query:

{ persons {
  name
  dogs {
    name
    age
  }
  parrots {
    name
    species
  }
}

GraphQL invokes three resolvers separately. First the root resolver persons is invoked. Next, the functions dogs and parrots inside each person object are invoked asynchronously, i.e. "to some degree in parallel". We cannot know in what order both async processes are carried out, and when they actually access context.recordsLimit and especially when they diminish its value by the number of fetched records. But on the other hand we do not want to enforce sequential execution (and hack GraphQL). Thus in theory the following scenario is possible. Both dogs and parrots check the current value of context.recordsLimit before diminishing it, thus find that they can return their respective fetched dogs and parrots, and then diminish context.recordsLimit. In this rare case the recordsLimit will actually not hold. However, because any subsequent fetch will blow up the limit, this scenario most likely does not pose a thread to efficiency. Hence the implementation seems the right way to go.

The text was updated successfully, but these errors were encountered:

asishallab · 2020-03-17T16:46:49Z

For some inspiring experiments see this repo

asishallab · 2020-04-06T09:35:45Z

Note that the above Issue must be implemented in the following data model storage-types:

sql
cenz_server

This must include adapters in the context of distributed data models:

sql-adapter
DDM-adapter

tmvoe · 2020-04-08T12:30:01Z

New branch for the graphql-server: https://github.com/ScienceDb/graphql-server/tree/issue88-record-limit

tmvoe · 2020-04-14T16:37:58Z

Branch in the GraphQL Server Code Generator: https://github.com/ScienceDb/graphql-server-model-codegen/tree/issue88-record-limit

Add count check to the root resolvers: 604a1ee

Test for the record limit (set at 25): 6cfae76

Test adapted for the record limit 10000: 536f16e

tmvoe · 2020-04-15T12:11:55Z

Functionality is now in the master branch (for both the GraphQL Server Code Generator - ecd6213 and the Skeleton Server - Zendro-dev/graphql-server@6d84670) and in case of the StarterPack in the branch https://github.com/ScienceDb/ScienceDbStarterPack/tree/i25-starter-pack-react with commit Zendro-dev/ZendroStarterPack@8e6f879

tmvoe · 2020-05-14T15:28:52Z

A large error regarding efficiency and a case of a functionally wrong implementation was handled in #122

asishallab added the enhancement New feature or request label Mar 17, 2020

asishallab assigned vsuaste, coeit and framirez07 Mar 17, 2020

asishallab assigned katjadohm and tmvoe Apr 6, 2020

asishallab mentioned this issue Apr 9, 2020

Remove mapping of to_one and to_many associations onto internal types #107

Closed

tmvoe unassigned katjadohm Apr 15, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix limit of records allowed to be fetched in a single query / request #88

Fix limit of records allowed to be fetched in a single query / request #88

asishallab commented Mar 17, 2020

asishallab commented Mar 17, 2020

asishallab commented Apr 6, 2020

tmvoe commented Apr 8, 2020

tmvoe commented Apr 14, 2020

tmvoe commented Apr 15, 2020

tmvoe commented May 14, 2020

Fix limit of records allowed to be fetched in a single query / request #88

Fix limit of records allowed to be fetched in a single query / request #88

Comments

asishallab commented Mar 17, 2020

The problem

First extend our GraphQL context as follows

Then in each resolver implement the following behavior

The parallel resolver problem

asishallab commented Mar 17, 2020

asishallab commented Apr 6, 2020

tmvoe commented Apr 8, 2020

tmvoe commented Apr 14, 2020

tmvoe commented Apr 15, 2020

tmvoe commented May 14, 2020

First extend our GraphQL `context` as follows