useful but not critical: ability to do "scrolling" GET queries with v3 Monarch API #789

Closed
colleenXu opened this issue Feb 29, 2024 · 10 comments
Assignees
Labels
enhancement New feature or request On Test -> Prod

Comments

@colleenXu
Collaborator

In #774, we migrated to the new v3 Monarch API (using the /association endpoint). This is one of the follow-up tasks that would be useful to do.

The endpoint seems to only allow us to retrieve 500 items at a time: I got error 500 (Internal Server Error) when I set the limit parameter to > 500 or to -1. (With the old API, -1 returned all hits.)

It would be nice to retrieve all the items for a request via "scrolling" GET queries. The API responses have a total field that tells us how many items there are in total. Ex: this has > 2000 items.
There is also an offset parameter for requests that we could use to retrieve the next 500 items, then the next 500, until we've retrieved all of them.
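A minimal TypeScript sketch of the offset-based "scrolling" loop described above. It assumes a response shape with an `items` array and a numeric `total` (the fields this issue mentions); `Page`, `FetchPage`, and `fetchAll` are hypothetical names, and the HTTP call is injected as a function rather than tied to any particular client.

```typescript
// Assumed response shape: only the two fields discussed in this issue.
interface Page<T> {
  items: T[];
  total: number;
}

// Stand-in for the real HTTP call, e.g. a GET to /v3/api/association
// with `limit` and `offset` query parameters.
type FetchPage<T> = (offset: number, limit: number) => Promise<Page<T>>;

// Request page after page until `total` hits have been collected.
async function fetchAll<T>(fetchPage: FetchPage<T>, pageSize = 500): Promise<T[]> {
  const all: T[] = [];
  let offset = 0;
  while (true) {
    const page = await fetchPage(offset, pageSize);
    all.push(...page.items);
    // Advance by what was actually returned, not by pageSize.
    offset += page.items.length;
    // Stop once everything is collected, or if the server returns an empty page.
    if (page.items.length === 0 || offset >= page.total) break;
  }
  return all;
}
```

Advancing by the number of items actually returned (rather than by the page size) avoids skipping hits if a page ever comes back short, and the empty-page check prevents an infinite loop if `total` is overstated.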


Jackson says there should already be code to handle scrolling GET queries; it just needs to be applied (and customized?) for the v3 Monarch API.

They also said that it may be helpful to have an operation-level flag (like supportBatch) that flags operations where we want to do these scrolling GET queries.

One thing I'm unsure of is whether all cases can be handled exactly the same way (ex: same parameter name, same field in the response to look for).

@colleenXu colleenXu added the enhancement New feature or request label Feb 29, 2024
@colleenXu
Collaborator Author

@rjawesome Jackson said this may be an issue you could handle. It's not high priority though, if you have other tasks.

@rjawesome
Contributor

rjawesome commented May 23, 2024

I added a special case to the pagination logic to support Monarch.

I also added a feature where pagination can be specified through SmartAPI yamls. For example, for the Monarch API, the lines to add to the yaml for a given operation would be:

parameters:
  ...
  limit: 500
  offset: "{{ start }}" # {{ start }} is a special templating field that denotes which numbered entry the pagination is at (similar to {{ queryInputs }})
...
pagination:
  countField: items # number of hits from this response, can point to an array (length is taken) or a numerical field
  totalField: total # field in response containing total number of hits from this request
  pageSize: 500
...

(these don't need to be added for Monarch, since I implemented it as a special case similar to BioThings)
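To make the `{{ start }}` templating concrete, here is a minimal sketch of filling the placeholder into an operation's parameters before each request. The function name `fillStart` and the rule "a value that is exactly the placeholder becomes a number" are assumptions for illustration, not BTE's actual templating code.

```typescript
type ParamValue = string | number | boolean;

// Substitute the current pagination offset for `{{ start }}`.
// Assumption: a value that is exactly the placeholder becomes a number;
// a placeholder embedded in a longer string is replaced textually.
function fillStart(
  params: Record<string, ParamValue>,
  start: number,
): Record<string, ParamValue> {
  const WHOLE = /^\{\{\s*start\s*\}\}$/; // non-global: safe to reuse with .test()
  const ANYWHERE = /\{\{\s*start\s*\}\}/g; // global: only used with .replace()
  const filled: Record<string, ParamValue> = {};
  for (const [key, value] of Object.entries(params)) {
    if (typeof value !== "string") {
      filled[key] = value; // e.g. limit: 500 passes through untouched
    } else if (WHOLE.test(value)) {
      filled[key] = start;
    } else {
      filled[key] = value.replace(ANYWHERE, String(start));
    }
  }
  return filled;
}
```

Because the placeholder can sit under any parameter name, the same mechanism covers APIs whose skip parameter is called something other than `offset`.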

@tokebe
Member

tokebe commented Jun 25, 2024

@colleenXu I believe the next step for this issue would be for you to review/approve @rjawesome's interface in the SmartAPI yamls. If that interface makes sense, I can start testing and then add to the dev branch/instance.

@colleenXu
Collaborator Author

colleenXu commented Jun 27, 2024

That x-bte annotation looks okay from my POV and looks simple/generic enough to work for other external API cases. Unfortunately, we don't have any such cases right now, so I can't back up my words :P.

This is my understanding, but maybe @rjawesome can clarify:

  • offset could be replaced with whatever the actual parameter is; for example, BioThings APIs use "from". The parameter should be the "number of matching hits to skip before collecting" (the OpenAPI spec has examples w/ that definition).
  • the countField is a little tricky, but I understand why it's that way. Monarch API doesn't have a numeric "here's how many items are in this response". So we're assuming these kinds of responses have a list with all the items/hits that we can get the length for. I guess that's okay...
  • the rest looks fine (totalField, pageSize)
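A sketch of how the `countField` / `totalField` semantics above could be read from a response: an array field contributes its length, a numeric field its value. `PaginationConfig`, `readCount`, and `nextStart` are hypothetical names mirroring the yaml keys; this is an illustration, not the merged implementation.

```typescript
// Hypothetical mirror of the `pagination` block in the yaml annotation.
interface PaginationConfig {
  countField: string; // hits in this page: an array (length used) or a number
  totalField: string; // total hits for the whole query
  pageSize: number; // sent as the page-size parameter (e.g. `limit`)
}

// Number of hits in this page: array -> length, number -> value.
function readCount(response: Record<string, unknown>, field: string): number {
  const value = response[field];
  if (Array.isArray(value)) return value.length;
  if (typeof value === "number") return value;
  throw new Error(`countField "${field}" is neither an array nor a number`);
}

// Offset for the next request, or null when all hits have been retrieved.
function nextStart(
  response: Record<string, unknown>,
  config: PaginationConfig,
  start: number,
): number | null {
  const count = readCount(response, config.countField);
  const total = response[config.totalField];
  if (typeof total !== "number") {
    throw new Error(`totalField "${config.totalField}" is not a number`);
  }
  const next = start + count;
  return count > 0 && next < total ? next : null;
}
```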

@tokebe Do you want to switch Monarch over to the x-bte annotation method, or are you just testing this new feature out? It sounds like @rjawesome implemented Monarch as a special case without using the new feature with SmartAPI-yaml/x-bte annotation editing...

@tokebe tokebe added On Dev Related changes are deployed to Dev server On CI Related changes are deployed to CI server and removed On Dev Related changes are deployed to Dev server labels Aug 19, 2024
@colleenXu
Collaborator Author

colleenXu commented Aug 21, 2024

I got a 500 error while testing this: TypeError: TypeError: Cannot read properties of undefined (reading '1').

I ran locally in CI mode (INSTANCE_ENV=ci pnpm start redis) while using the self-edge-removal code (and main branches for everything else).

In the console logs below, I removed the Sentry logs.

TRAPI query used for testing, should get 2257 records from Monarch

Based on this query to Monarch API

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories":["biolink:AnatomicalEntity"],
                    "ids":["UBERON:0000178"]
                },
                "n1": {
                    "categories":["biolink:Gene"]
               }
            },
            "edges": {
                "eA": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:expresses"]
                }
            }
        }
    }
}

The pagination logs look good. It does several rounds to get all 2257 records:

  bte:call-apis:query {
  bte:call-apis:query   url: 'https://api-v3.monarchinitiative.org/v3/api/association',
  bte:call-apis:query   params: {
  bte:call-apis:query     category: 'biolink:GeneToExpressionSiteAssociation',
  bte:call-apis:query     direct: true,
  bte:call-apis:query     format: 'json',
  bte:call-apis:query     limit: 500,
  bte:call-apis:query     object: 'UBERON:0000178',
  bte:call-apis:query     object_namespace: 'UBERON',
  bte:call-apis:query     predicate: 'biolink:expressed_in',
  bte:call-apis:query     subject_namespace: 'HGNC',
  bte:call-apis:query     subject_taxon: 'NCBITaxon:9606'
  bte:call-apis:query   },
  bte:call-apis:query   data: undefined,
  bte:call-apis:query   method: 'get',
  bte:call-apis:query   timeout: 50000
  bte:call-apis:query } +12ms


  bte:call-apis:query query success, transforming hits->records... +761ms
  bte:call-apis:query Query requires pagination, will re-query to window 500-1000: https://api-v3.monarchinitiative.org (1 ID) +0ms
  bte:api-response-transform:index api name Monarch API +761ms
  bte:api-response-transform:index api tags: translator +0ms
  bte:call-apis:query Successful GET https://api-v3.monarchinitiative.org (1 ID): GrossAnatomicalStructure > expresses > Gene (obtained 500 records, took 1s) +434ms
  bte:call-apis:query {
  bte:call-apis:query   url: 'https://api-v3.monarchinitiative.org/v3/api/association',
  bte:call-apis:query   params: {
  bte:call-apis:query     category: 'biolink:GeneToExpressionSiteAssociation',
  bte:call-apis:query     direct: true,
  bte:call-apis:query     format: 'json',
  bte:call-apis:query     limit: 500,
  bte:call-apis:query     object: 'UBERON:0000178',
  bte:call-apis:query     object_namespace: 'UBERON',
  bte:call-apis:query     predicate: 'biolink:expressed_in',
  bte:call-apis:query     subject_namespace: 'HGNC',
  bte:call-apis:query     subject_taxon: 'NCBITaxon:9606',
  bte:call-apis:query     offset: 500
  bte:call-apis:query   },
  bte:call-apis:query   data: undefined,
  bte:call-apis:query   method: 'get',
  bte:call-apis:query   timeout: 50000
  bte:call-apis:query } +1ms
  bte:call-apis:query query success, transforming hits->records... +1s
  bte:call-apis:query Query requires pagination, will re-query to window 1000-1500: https://api-v3.monarchinitiative.org (1 ID) +0ms
  bte:api-response-transform:index api name Monarch API +2s
  bte:api-response-transform:index api tags: translator +0ms
  bte:call-apis:query Successful GET https://api-v3.monarchinitiative.org (1 ID): GrossAnatomicalStructure > expresses > Gene (obtained 500 records, took 1s) +450ms
  bte:call-apis:query {
  bte:call-apis:query   url: 'https://api-v3.monarchinitiative.org/v3/api/association',
  bte:call-apis:query   params: {
  bte:call-apis:query     category: 'biolink:GeneToExpressionSiteAssociation',
  bte:call-apis:query     direct: true,
  bte:call-apis:query     format: 'json',
  bte:call-apis:query     limit: 500,
  bte:call-apis:query     object: 'UBERON:0000178',
  bte:call-apis:query     object_namespace: 'UBERON',
  bte:call-apis:query     predicate: 'biolink:expressed_in',
  bte:call-apis:query     subject_namespace: 'HGNC',
  bte:call-apis:query     subject_taxon: 'NCBITaxon:9606',
  bte:call-apis:query     offset: 1000
  bte:call-apis:query   },
  bte:call-apis:query   data: undefined,
  bte:call-apis:query   method: 'get',
  bte:call-apis:query   timeout: 50000
  bte:call-apis:query } +3ms
  bte:call-apis:query query success, transforming hits->records... +1s
  bte:call-apis:query Query requires pagination, will re-query to window 1500-2000: https://api-v3.monarchinitiative.org (1 ID) +0ms
  bte:api-response-transform:index api name Monarch API +2s
  bte:api-response-transform:index api tags: translator +0ms
  bte:call-apis:query Successful GET https://api-v3.monarchinitiative.org (1 ID): GrossAnatomicalStructure > expresses > Gene (obtained 500 records, took 1s) +423ms
  bte:call-apis:query {
  bte:call-apis:query   url: 'https://api-v3.monarchinitiative.org/v3/api/association',
  bte:call-apis:query   params: {
  bte:call-apis:query     category: 'biolink:GeneToExpressionSiteAssociation',
  bte:call-apis:query     direct: true,
  bte:call-apis:query     format: 'json',
  bte:call-apis:query     limit: 500,
  bte:call-apis:query     object: 'UBERON:0000178',
  bte:call-apis:query     object_namespace: 'UBERON',
  bte:call-apis:query     predicate: 'biolink:expressed_in',
  bte:call-apis:query     subject_namespace: 'HGNC',
  bte:call-apis:query     subject_taxon: 'NCBITaxon:9606',
  bte:call-apis:query     offset: 1500
  bte:call-apis:query   },
  bte:call-apis:query   data: undefined,
  bte:call-apis:query   method: 'get',
  bte:call-apis:query   timeout: 50000
  bte:call-apis:query } +1ms
  bte:call-apis:query query success, transforming hits->records... +1s
  bte:call-apis:query Query requires pagination, will re-query to window 2000-2500: https://api-v3.monarchinitiative.org (1 ID) +0ms
  bte:api-response-transform:index api name Monarch API +1s
  bte:api-response-transform:index api tags: translator +0ms
  bte:call-apis:query Successful GET https://api-v3.monarchinitiative.org (1 ID): GrossAnatomicalStructure > expresses > Gene (obtained 500 records, took 1s) +420ms
  bte:call-apis:query {
  bte:call-apis:query   url: 'https://api-v3.monarchinitiative.org/v3/api/association',
  bte:call-apis:query   params: {
  bte:call-apis:query     category: 'biolink:GeneToExpressionSiteAssociation',
  bte:call-apis:query     direct: true,
  bte:call-apis:query     format: 'json',
  bte:call-apis:query     limit: 500,
  bte:call-apis:query     object: 'UBERON:0000178',
  bte:call-apis:query     object_namespace: 'UBERON',
  bte:call-apis:query     predicate: 'biolink:expressed_in',
  bte:call-apis:query     subject_namespace: 'HGNC',
  bte:call-apis:query     subject_taxon: 'NCBITaxon:9606',
  bte:call-apis:query     offset: 2000
  bte:call-apis:query   },
  bte:call-apis:query   data: undefined,
  bte:call-apis:query   method: 'get',
  bte:call-apis:query   timeout: 50000
  bte:call-apis:query } +2ms
  bte:call-apis:query query success, transforming hits->records... +895ms
  bte:api-response-transform:index api name Monarch API +1s
  bte:api-response-transform:index api tags: translator +0ms
  bte:call-apis:query Successful GET https://api-v3.monarchinitiative.org (1 ID): GrossAnatomicalStructure > expresses > Gene (obtained 257 records, took 895ms) +217ms
  bte:call-apis:query query completes. +8s
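The re-query windows in these logs follow directly from total = 2257 and a page size of 500. A small sketch (hypothetical helper names, assuming each logged "window" is start to start + pageSize) reproduces the numbers for both test queries in this comment: five requests with a final page of 257 for the first, and two requests with a final page of 6 for the second.

```typescript
// Windows re-queried after the first page, formatted like the debug log
// ("will re-query to window 500-1000").
function requeryWindows(total: number, pageSize: number): string[] {
  const windows: string[] = [];
  for (let start = pageSize; start < total; start += pageSize) {
    windows.push(`${start}-${start + pageSize}`);
  }
  return windows;
}

// Total GET requests needed: ceil(total / pageSize), at least one.
function requestCount(total: number, pageSize: number): number {
  return Math.max(1, Math.ceil(total / pageSize));
}

// Hits expected on the final page.
function lastPageSize(total: number, pageSize: number): number {
  if (total === 0) return 0;
  const remainder = total % pageSize;
  return remainder === 0 ? pageSize : remainder;
}

// requeryWindows(2257, 500) -> ["500-1000", "1000-1500", "1500-2000", "2000-2500"]
// requestCount(2257, 500) -> 5, lastPageSize(2257, 500) -> 257
// requeryWindows(506, 500) -> ["500-1000"], requestCount -> 2, lastPageSize -> 6
```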

But then the 500 error comes up after scoring:

  bte:biothings-explorer-trapi:Graph Updating BTE Graph now. +0ms
  bte:biothings-explorer-trapi:edge-manager (13) Edge Manager reporting organized records... +252ms
  bte:biothings-explorer-trapi:QueryResult Updating query results now! +0ms
  bte:biothings-explorer-trapi:score Querying 2743 combos. +0ms
  bte:biothings-explorer-trapi:score 11 / 0 / 0 queries successful / errored / timed out, representing 2743 / 0 / 0 pairs +3s
  bte:biothings-explorer-trapi:asyncquery_queue Async job uzRNMILvCy failed with error Cannot read properties of undefined (reading '1') +3m
  bte:biothings-explorer-trapi:threading Worker thread 2 terminated successfully. +3m
  bte:biothings-explorer-trapi:error_handler TypeError: Cannot read properties of undefined (reading '1')
  bte:biothings-explorer-trapi:error_handler     at /Users/colleenxu/Desktop/biothings_explorer/packages/query_graph_handler/built/index.js:212:19
  bte:biothings-explorer-trapi:error_handler     at Array.forEach (<anonymous>)
  bte:biothings-explorer-trapi:error_handler     at /Users/colleenxu/Desktop/biothings_explorer/packages/query_graph_handler/built/index.js:196:28
  bte:biothings-explorer-trapi:error_handler     at Array.forEach (<anonymous>)
  bte:biothings-explorer-trapi:error_handler     at TRAPIQueryHandler.createSubclassSupportGraphs (/Users/colleenxu/Desktop/biothings_explorer/packages/query_graph_handler/built/index.js:191:42)
  bte:biothings-explorer-trapi:error_handler     at TRAPIQueryHandler.query (/Users/colleenxu/Desktop/biothings_explorer/packages/query_graph_handler/built/index.js:641:14)
  bte:biothings-explorer-trapi:error_handler     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
  bte:biothings-explorer-trapi:error_handler     at async V1Query.task (/Users/colleenxu/Desktop/biothings_explorer/packages/bte-server/built/routes/v1/query_v1.js:59:13)
  bte:biothings-explorer-trapi:error_handler     at async runTask (/Users/colleenxu/Desktop/biothings_explorer/packages/bte-server/built/controllers/threading/taskHandler.js:112:27)
  bte:biothings-explorer-trapi:error_handler     at async /Users/colleenxu/Desktop/biothings_explorer/node_modules/.pnpm/[email protected]/node_modules/piscina/dist/src/worker.js:141:26 +3m


But this smaller query (only 1 round of pagination) ran successfully without errors:

TRAPI query, should retrieve 506 records from Monarch

from this comment

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories":["biolink:AnatomicalEntity"],
                    "ids":["UBERON:0002240"]
                },
                "n1": {
                    "categories":["biolink:Gene"]
               }
            },
            "edges": {
                "eA": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:expresses"]
                }
            }
        }
    }
}

Console logs retrieving all 506 records

  bte:call-apis:query {
  bte:call-apis:query   url: 'https://api-v3.monarchinitiative.org/v3/api/association',
  bte:call-apis:query   params: {
  bte:call-apis:query     category: 'biolink:GeneToExpressionSiteAssociation',
  bte:call-apis:query     direct: true,
  bte:call-apis:query     format: 'json',
  bte:call-apis:query     limit: 500,
  bte:call-apis:query     object: 'UBERON:0002240',
  bte:call-apis:query     object_namespace: 'UBERON',
  bte:call-apis:query     predicate: 'biolink:expressed_in',
  bte:call-apis:query     subject_namespace: 'HGNC',
  bte:call-apis:query     subject_taxon: 'NCBITaxon:9606'
  bte:call-apis:query   },
  bte:call-apis:query   data: undefined,
  bte:call-apis:query   method: 'get',
  bte:call-apis:query   timeout: 50000
  bte:call-apis:query } +15ms
  bte:call-apis:query query success, transforming hits->records... +256ms
  bte:call-apis:query Query requires pagination, will re-query to window 500-1000: https://api-v3.monarchinitiative.org (1 ID) +0ms
  bte:api-response-transform:index api name Monarch API +259ms
  bte:api-response-transform:index api tags: translator +0ms
  bte:call-apis:query Successful GET https://api-v3.monarchinitiative.org (1 ID): GrossAnatomicalStructure > expresses > Gene (obtained 500 records, took 1s) +530ms
  bte:call-apis:query {
  bte:call-apis:query   url: 'https://api-v3.monarchinitiative.org/v3/api/association',
  bte:call-apis:query   params: {
  bte:call-apis:query     category: 'biolink:GeneToExpressionSiteAssociation',
  bte:call-apis:query     direct: true,
  bte:call-apis:query     format: 'json',
  bte:call-apis:query     limit: 500,
  bte:call-apis:query     object: 'UBERON:0002240',
  bte:call-apis:query     object_namespace: 'UBERON',
  bte:call-apis:query     predicate: 'biolink:expressed_in',
  bte:call-apis:query     subject_namespace: 'HGNC',
  bte:call-apis:query     subject_taxon: 'NCBITaxon:9606',
  bte:call-apis:query     offset: 500
  bte:call-apis:query   },
  bte:call-apis:query   data: undefined,
  bte:call-apis:query   method: 'get',
  bte:call-apis:query   timeout: 50000
  bte:call-apis:query } +4ms
  bte:call-apis:query query success, transforming hits->records... +286ms
  bte:api-response-transform:index api name Monarch API +820ms
  bte:api-response-transform:index api tags: translator +0ms
  bte:call-apis:query Successful GET https://api-v3.monarchinitiative.org (1 ID): GrossAnatomicalStructure > expresses > Gene (obtained 6 records, took 285ms) +19ms
  bte:call-apis:query query completes. +2s

@tokebe
Member

tokebe commented Aug 21, 2024

This appears to happen regardless of the new subclassing code. I'm looking into it.

@tokebe
Member

tokebe commented Aug 21, 2024

It turns out our HP and MONDO ontologies contain ID prefixes other than HP and MONDO, respectively, which breaks our ontology source detection. I'm going to have to change node-expansion to return the specific source of each expanded ID, because the prefixes overlap across sources.
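To illustrate why prefix-based source detection breaks, here is a hypothetical sketch (not node-expansion's actual code or data layout) contrasting guessing an expanded ID's source from its CURIE prefix with recording which ontology file supplied it. The `ChildMap` shape and function names are assumptions for illustration.

```typescript
// Assumed layout: each ontology source maps a parent ID to its child IDs.
type ChildMap = Record<string, string[]>;

interface ExpandedId {
  id: string;
  source: string; // the ontology file the parent-child link came from
}

// Naive approach: infer the source from the CURIE prefix ("HP:..." -> "HP").
// Wrong whenever, e.g., the MONDO file also contains UBERON IDs.
function guessSource(id: string): string {
  return id.split(":")[0];
}

// Alternative: carry the supplying source along with each expanded ID.
function expandWithSource(id: string, sources: Record<string, ChildMap>): ExpandedId[] {
  const out: ExpandedId[] = [];
  for (const [source, childMap] of Object.entries(sources)) {
    for (const child of childMap[id] ?? []) {
      out.push({ id: child, source });
    }
  }
  return out;
}
```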

@colleenXu
Collaborator Author

colleenXu commented Aug 21, 2024

Context: the error revealed a bug in node-expansion caused by the MONDO ontology containing non-MONDO ID info, including parent-child pairs with mismatched ID namespaces. HP has this issue too.

The first test query used a UBERON ID, which had some child info in those files with a different ID namespace.

The solution is to remove those mismatched pairs.
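A minimal sketch of that filter, assuming parent-child links are available as `{ parent, child }` CURIE pairs (a hypothetical representation, not necessarily how node-expansion stores them): any pair whose two IDs have different namespace prefixes is dropped.

```typescript
interface ParentChild {
  parent: string;
  child: string;
}

// Namespace prefix of a CURIE, e.g. "MONDO:0005015" -> "MONDO".
function prefixOf(curie: string): string {
  const i = curie.indexOf(":");
  return i === -1 ? curie : curie.slice(0, i);
}

// Keep only pairs whose parent and child share the same ID namespace.
function dropMismatchedPairs(pairs: ParentChild[]): ParentChild[] {
  return pairs.filter(({ parent, child }) => prefixOf(parent) === prefixOf(child));
}
```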

Jackson has put a fix into the subclassing-fix branches (deployment tracked in #850)

@colleenXu
Copy link
Collaborator Author

colleenXu commented Aug 21, 2024

@tokebe

Things look good!

I retested using the subclassing-fix branches.

  1. Now the first test query UBERON:0000178 (blood) works without errors: monarch-testing-blood-expresses.json.zip
    • There's some subclassing due to 20 CHP edges using the subclass UBERON:0013756 (venous blood). That part looks okay.
  2. Other test queries work without issues too.

@tokebe tokebe added On CI -> Test and removed On CI Related changes are deployed to CI server labels Aug 22, 2024
@tokebe tokebe added On Test Related changes are deployed to Test server and removed On CI -> Test labels Aug 23, 2024
@colleenXu colleenXu added On Test -> Prod and removed On Test Related changes are deployed to Test server labels Aug 28, 2024
@tokebe
Member

tokebe commented Sep 3, 2024

Relevant changes deployed to Prod.

@tokebe tokebe closed this as completed Sep 3, 2024