useful but not critical: ability to do "scrolling" GET queries with v3 Monarch API #789

colleenXu · 2024-02-29T01:37:29Z

In #774, we migrated to the new v3 Monarch API (using the /association endpoint). This is one of the follow-up tasks that would be useful to do.

The endpoint seems to only allow us to retrieve 500 items at a time. I got error 500 (Internal Server Error) when setting the limit parameter to > 500 or to -1 (with the old API, -1 returned all hits).

It would be nice to be able to retrieve all the items for a query through "scrolling" GET queries. The API responses have a total field that tells us how many items there are in total. Ex: this has > 2000 items.
Requests also accept an offset parameter, which we could use to retrieve the next 500 items, then the next 500, until we've retrieved all the items.
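A minimal TypeScript sketch of the scrolling loop described above. It assumes each page looks like `{ items: [...], total: n }` and that 500 is the largest accepted `limit`. `Page`, `fetchAll`, and the `fetchPage` callback are hypothetical names, and the callback is synchronous only to keep the example self-contained; a real client would `await` a GET with `limit` and `offset` query parameters.

```typescript
// Hypothetical shape of one page of results (mirrors the `items` and `total` fields).
interface Page<T> {
  items: T[];
  total: number;
}

// Assumed server-side cap: requests with limit > 500 (or -1) failed in testing.
const MAX_LIMIT = 500;

// Requests windows [offset, offset + limit) until `total` items have been collected.
function fetchAll<T>(
  fetchPage: (offset: number, limit: number) => Page<T>,
  pageSize = MAX_LIMIT,
): T[] {
  const limit = Math.min(pageSize, MAX_LIMIT);
  const all: T[] = [];
  let offset = 0;
  let total = Infinity; // unknown until the first response arrives
  while (offset < total) {
    const page = fetchPage(offset, limit);
    total = page.total;
    all.push(...page.items);
    // Stop on an empty page so a wrong `total` cannot cause an endless loop.
    if (page.items.length === 0) break;
    offset += page.items.length;
  }
  return all;
}

// Usage with an in-memory stand-in for the API (2257 fake hits):
const data = Array.from({ length: 2257 }, (_, i) => `hit-${i}`);
let calls = 0;
const result = fetchAll((offset, limit) => {
  calls += 1;
  return { items: data.slice(offset, offset + limit), total: data.length };
});
console.log(calls, result.length); // 5 2257
```

Advancing by the number of items actually returned, rather than by `limit`, keeps the loop correct even if a page comes back short.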

Jackson says there should already be code to handle scrolling GET queries; it just needs to be applied (and customized?) for the v3 Monarch API.

They also said that it may be helpful to have an operation-level flag (like supportBatch) that flags operations where we want to do these scrolling GET queries.

One thing I'm unsure of is whether all cases can be handled exactly the same way (ex: same parameter name, same field in the response to look for).


colleenXu · 2024-02-29T01:38:02Z

@rjawesome Jackson said this may be an issue you could handle. It's not high priority though, if you have other tasks.

rjawesome · 2024-05-23T23:50:56Z

I added a special case to the pagination logic to support Monarch.

I also added a feature where pagination can be specified through SmartAPI yamls. For example, for the Monarch API, the lines to add to the yaml for a given operation would be:

parameters:
  ...
  limit: 500
  offset: "{{ start }}" # {{ start }} is a special templating field denoting which numbered entry the pagination is at (similar to {{ queryInputs }})
...
pagination:
  countField: items # number of hits from this response, can point to an array (length is taken) or a numerical field
  totalField: total # field in response containing total number of hits from this request
  pageSize: 500
...

(These don't need to be added for Monarch, since I implemented it as a special case, similar to BioThings.)
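A rough sketch of how the `{{ start }}` placeholder could be filled in for each page, assuming the operation's parameters are held as a flat key/value map. `Params` and `fillStart` are hypothetical names; this is not the smartapi-kg or call-apis code. Because the placeholder sits under whatever key the API expects, the same substitution would cover Monarch's `offset` and a BioThings-style `from`.

```typescript
// Hypothetical in-memory form of an operation's query parameters from the yaml.
type Params = Record<string, string | number | boolean>;

const WHOLE_PLACEHOLDER = /^\{\{\s*start\s*\}\}$/; // value is exactly "{{ start }}"
const ANY_PLACEHOLDER = /\{\{\s*start\s*\}\}/g; // placeholder embedded in a longer string

// Returns a copy of `params` with every "{{ start }}" replaced by the current position.
function fillStart(params: Params, start: number): Params {
  const filled: Params = {};
  for (const [key, value] of Object.entries(params)) {
    if (typeof value !== "string") {
      filled[key] = value;
    } else if (WHOLE_PLACEHOLDER.test(value.trim())) {
      filled[key] = start; // keep it numeric when the whole value is the placeholder
    } else {
      filled[key] = value.replace(ANY_PLACEHOLDER, String(start));
    }
  }
  return filled;
}

// Monarch-style operation: offset carries the placeholder, limit matches pageSize.
const monarchParams: Params = { limit: 500, offset: "{{ start }}" };
const pageSize = 500;
const pages = [0, 1, 2].map((n) => fillStart(monarchParams, n * pageSize));
console.log(pages.map((p) => p.offset)); // [ 0, 500, 1000 ]
```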

tokebe · 2024-06-25T19:25:12Z

@colleenXu I believe the next step for this issue would be for you to review/approve @rjawesome's interface in the SmartAPI yamls. If that interface makes sense, I can start testing and then add to the dev branch/instance.

colleenXu · 2024-06-27T21:07:22Z

That x-bte annotation looks okay from my POV and looks simple/generic enough to work for other external API cases. Unfortunately, we don't have any such cases right now, so I can't back up my words :P.

This is my understanding, but maybe @rjawesome can clarify:

- offset could be replaced with whatever the actual parameter is; for example, BioThings APIs use from. The parameter should be the "number of matching hits to skip before collecting" (the OpenAPI spec has examples with that definition).
- The countField is a little tricky, but I understand why it's that way. The Monarch API doesn't have a numeric "here's how many items are in this response" field, so we're assuming these kinds of responses have a list with all the items/hits that we can get the length of. I guess that's okay...
- The rest looks fine (totalField, pageSize).
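To make the `countField` / `totalField` semantics concrete, here is a hypothetical sketch (`PaginationConfig`, `readCount`, and `nextStart` are illustrative names, not BTE's). It treats an array field as "count = its length" and a numeric field as the count itself, and it only handles top-level field names, not nested paths.

```typescript
// Hypothetical mirror of the `pagination` block in the x-bte annotation.
interface PaginationConfig {
  countField: string; // array (its length is used) or a numeric field
  totalField: string; // numeric field holding the total number of hits
  pageSize: number; // would also be sent as the `limit` parameter
}

// Number of hits in this response: array length if the field is an array, else the number.
function readCount(response: Record<string, unknown>, field: string): number {
  const value = response[field];
  if (Array.isArray(value)) return value.length;
  if (typeof value === "number") return value;
  throw new Error(`countField "${field}" is neither an array nor a number`);
}

// Returns the next start position, or null when every hit has been collected.
function nextStart(
  response: Record<string, unknown>,
  start: number,
  cfg: PaginationConfig,
): number | null {
  const count = readCount(response, cfg.countField);
  const total = response[cfg.totalField];
  if (typeof total !== "number") {
    throw new Error(`totalField "${cfg.totalField}" is not a number`);
  }
  const next = start + count;
  return count > 0 && next < total ? next : null;
}

const monarch: PaginationConfig = { countField: "items", totalField: "total", pageSize: 500 };
const firstPage = { items: new Array(500).fill({}), total: 506 };
console.log(nextStart(firstPage, 0, monarch)); // 500
const lastPage = { items: new Array(6).fill({}), total: 506 };
console.log(nextStart(lastPage, 500, monarch)); // null
```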

@tokebe Do you want to switch Monarch over to the x-bte annotation method, or are you just testing this new feature out? It sounds like @rjawesome implemented Monarch as a special case without using the new feature with SmartAPI-yaml/x-bte annotation editing...

colleenXu · 2024-08-21T04:29:41Z

I got a 500 error while testing this: TypeError: TypeError: Cannot read properties of undefined (reading '1').

I ran locally in CI mode (INSTANCE_ENV=ci pnpm start redis) while using the self-edge-removal code (and main branches for everything else).

In the console logs below, I removed the Sentry logs.

TRAPI query used for testing, should get 2257 records from Monarch

Based on this query to Monarch API

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories":["biolink:AnatomicalEntity"],
                    "ids":["UBERON:0000178"]
                },
                "n1": {
                    "categories":["biolink:Gene"]
                }
            },
            "edges": {
                "eA": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:expresses"]
                }
            }
        }
    }
}

The pagination logs look good. It does several rounds to get all 2257 records:

  bte:call-apis:query {
  bte:call-apis:query   url: 'https://api-v3.monarchinitiative.org/v3/api/association',
  bte:call-apis:query   params: {
  bte:call-apis:query     category: 'biolink:GeneToExpressionSiteAssociation',
  bte:call-apis:query     direct: true,
  bte:call-apis:query     format: 'json',
  bte:call-apis:query     limit: 500,
  bte:call-apis:query     object: 'UBERON:0000178',
  bte:call-apis:query     object_namespace: 'UBERON',
  bte:call-apis:query     predicate: 'biolink:expressed_in',
  bte:call-apis:query     subject_namespace: 'HGNC',
  bte:call-apis:query     subject_taxon: 'NCBITaxon:9606'
  bte:call-apis:query   },
  bte:call-apis:query   data: undefined,
  bte:call-apis:query   method: 'get',
  bte:call-apis:query   timeout: 50000
  bte:call-apis:query } +12ms


  bte:call-apis:query query success, transforming hits->records... +761ms
  bte:call-apis:query Query requires pagination, will re-query to window 500-1000: https://api-v3.monarchinitiative.org (1 ID) +0ms
  bte:api-response-transform:index api name Monarch API +761ms
  bte:api-response-transform:index api tags: translator +0ms
  bte:call-apis:query Successful GET https://api-v3.monarchinitiative.org (1 ID): GrossAnatomicalStructure > expresses > Gene (obtained 500 records, took 1s) +434ms

  bte:call-apis:query {
  bte:call-apis:query   url: 'https://api-v3.monarchinitiative.org/v3/api/association',
  bte:call-apis:query   params: {
  bte:call-apis:query     category: 'biolink:GeneToExpressionSiteAssociation',
  bte:call-apis:query     direct: true,
  bte:call-apis:query     format: 'json',
  bte:call-apis:query     limit: 500,
  bte:call-apis:query     object: 'UBERON:0000178',
  bte:call-apis:query     object_namespace: 'UBERON',
  bte:call-apis:query     predicate: 'biolink:expressed_in',
  bte:call-apis:query     subject_namespace: 'HGNC',
  bte:call-apis:query     subject_taxon: 'NCBITaxon:9606',
  bte:call-apis:query     offset: 500
  bte:call-apis:query   },
  bte:call-apis:query   data: undefined,
  bte:call-apis:query   method: 'get',
  bte:call-apis:query   timeout: 50000
  bte:call-apis:query } +1ms
  bte:call-apis:query query success, transforming hits->records... +1s
  bte:call-apis:query Query requires pagination, will re-query to window 1000-1500: https://api-v3.monarchinitiative.org (1 ID) +0ms
  bte:api-response-transform:index api name Monarch API +2s
  bte:api-response-transform:index api tags: translator +0ms
  bte:call-apis:query Successful GET https://api-v3.monarchinitiative.org (1 ID): GrossAnatomicalStructure > expresses > Gene (obtained 500 records, took 1s) +450ms

  bte:call-apis:query {
  bte:call-apis:query   url: 'https://api-v3.monarchinitiative.org/v3/api/association',
  bte:call-apis:query   params: {
  bte:call-apis:query     category: 'biolink:GeneToExpressionSiteAssociation',
  bte:call-apis:query     direct: true,
  bte:call-apis:query     format: 'json',
  bte:call-apis:query     limit: 500,
  bte:call-apis:query     object: 'UBERON:0000178',
  bte:call-apis:query     object_namespace: 'UBERON',
  bte:call-apis:query     predicate: 'biolink:expressed_in',
  bte:call-apis:query     subject_namespace: 'HGNC',
  bte:call-apis:query     subject_taxon: 'NCBITaxon:9606',
  bte:call-apis:query     offset: 1000
  bte:call-apis:query   },
  bte:call-apis:query   data: undefined,
  bte:call-apis:query   method: 'get',
  bte:call-apis:query   timeout: 50000
  bte:call-apis:query } +3ms
  bte:call-apis:query query success, transforming hits->records... +1s
  bte:call-apis:query Query requires pagination, will re-query to window 1500-2000: https://api-v3.monarchinitiative.org (1 ID) +0ms
  bte:api-response-transform:index api name Monarch API +2s
  bte:api-response-transform:index api tags: translator +0ms
  bte:call-apis:query Successful GET https://api-v3.monarchinitiative.org (1 ID): GrossAnatomicalStructure > expresses > Gene (obtained 500 records, took 1s) +423ms

  bte:call-apis:query {
  bte:call-apis:query   url: 'https://api-v3.monarchinitiative.org/v3/api/association',
  bte:call-apis:query   params: {
  bte:call-apis:query     category: 'biolink:GeneToExpressionSiteAssociation',
  bte:call-apis:query     direct: true,
  bte:call-apis:query     format: 'json',
  bte:call-apis:query     limit: 500,
  bte:call-apis:query     object: 'UBERON:0000178',
  bte:call-apis:query     object_namespace: 'UBERON',
  bte:call-apis:query     predicate: 'biolink:expressed_in',
  bte:call-apis:query     subject_namespace: 'HGNC',
  bte:call-apis:query     subject_taxon: 'NCBITaxon:9606',
  bte:call-apis:query     offset: 1500
  bte:call-apis:query   },
  bte:call-apis:query   data: undefined,
  bte:call-apis:query   method: 'get',
  bte:call-apis:query   timeout: 50000
  bte:call-apis:query } +1ms
  bte:call-apis:query query success, transforming hits->records... +1s
  bte:call-apis:query Query requires pagination, will re-query to window 2000-2500: https://api-v3.monarchinitiative.org (1 ID) +0ms
  bte:api-response-transform:index api name Monarch API +1s
  bte:api-response-transform:index api tags: translator +0ms
  bte:call-apis:query Successful GET https://api-v3.monarchinitiative.org (1 ID): GrossAnatomicalStructure > expresses > Gene (obtained 500 records, took 1s) +420ms

  bte:call-apis:query {
  bte:call-apis:query   url: 'https://api-v3.monarchinitiative.org/v3/api/association',
  bte:call-apis:query   params: {
  bte:call-apis:query     category: 'biolink:GeneToExpressionSiteAssociation',
  bte:call-apis:query     direct: true,
  bte:call-apis:query     format: 'json',
  bte:call-apis:query     limit: 500,
  bte:call-apis:query     object: 'UBERON:0000178',
  bte:call-apis:query     object_namespace: 'UBERON',
  bte:call-apis:query     predicate: 'biolink:expressed_in',
  bte:call-apis:query     subject_namespace: 'HGNC',
  bte:call-apis:query     subject_taxon: 'NCBITaxon:9606',
  bte:call-apis:query     offset: 2000
  bte:call-apis:query   },
  bte:call-apis:query   data: undefined,
  bte:call-apis:query   method: 'get',
  bte:call-apis:query   timeout: 50000
  bte:call-apis:query } +2ms
  bte:call-apis:query query success, transforming hits->records... +895ms
  bte:api-response-transform:index api name Monarch API +1s
  bte:api-response-transform:index api tags: translator +0ms
  bte:call-apis:query Successful GET https://api-v3.monarchinitiative.org (1 ID): GrossAnatomicalStructure > expresses > Gene (obtained 257 records, took 895ms) +217ms
  bte:call-apis:query query completes. +8s

But then the 500 error comes up after scoring:

  bte:biothings-explorer-trapi:Graph Updating BTE Graph now. +0ms
  bte:biothings-explorer-trapi:edge-manager (13) Edge Manager reporting organized records... +252ms
  bte:biothings-explorer-trapi:QueryResult Updating query results now! +0ms
  bte:biothings-explorer-trapi:score Querying 2743 combos. +0ms
  bte:biothings-explorer-trapi:score 11 / 0 / 0 queries successful / errored / timed out, representing 2743 / 0 / 0 pairs +3s
  bte:biothings-explorer-trapi:asyncquery_queue Async job uzRNMILvCy failed with error Cannot read properties of undefined (reading '1') +3m
  bte:biothings-explorer-trapi:threading Worker thread 2 terminated successfully. +3m
  bte:biothings-explorer-trapi:error_handler TypeError: Cannot read properties of undefined (reading '1')
  bte:biothings-explorer-trapi:error_handler     at /Users/colleenxu/Desktop/biothings_explorer/packages/query_graph_handler/built/index.js:212:19
  bte:biothings-explorer-trapi:error_handler     at Array.forEach (<anonymous>)
  bte:biothings-explorer-trapi:error_handler     at /Users/colleenxu/Desktop/biothings_explorer/packages/query_graph_handler/built/index.js:196:28
  bte:biothings-explorer-trapi:error_handler     at Array.forEach (<anonymous>)
  bte:biothings-explorer-trapi:error_handler     at TRAPIQueryHandler.createSubclassSupportGraphs (/Users/colleenxu/Desktop/biothings_explorer/packages/query_graph_handler/built/index.js:191:42)
  bte:biothings-explorer-trapi:error_handler     at TRAPIQueryHandler.query (/Users/colleenxu/Desktop/biothings_explorer/packages/query_graph_handler/built/index.js:641:14)
  bte:biothings-explorer-trapi:error_handler     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
  bte:biothings-explorer-trapi:error_handler     at async V1Query.task (/Users/colleenxu/Desktop/biothings_explorer/packages/bte-server/built/routes/v1/query_v1.js:59:13)
  bte:biothings-explorer-trapi:error_handler     at async runTask (/Users/colleenxu/Desktop/biothings_explorer/packages/bte-server/built/controllers/threading/taskHandler.js:112:27)
  bte:biothings-explorer-trapi:error_handler     at async /Users/colleenxu/Desktop/biothings_explorer/node_modules/.pnpm/[email protected]/node_modules/piscina/dist/src/worker.js:141:26 +3m

But this smaller query (only 1 round of pagination) ran successfully without errors:

TRAPI query, should retrieve 506 records from Monarch

from this comment

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories":["biolink:AnatomicalEntity"],
                    "ids":["UBERON:0002240"]
                },
                "n1": {
                    "categories":["biolink:Gene"]
               }
            },
            "edges": {
                "eA": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:expresses"]
                }
            }
        }
    }
}

Console logs retrieving all 506 records

  bte:call-apis:query {
  bte:call-apis:query   url: 'https://api-v3.monarchinitiative.org/v3/api/association',
  bte:call-apis:query   params: {
  bte:call-apis:query     category: 'biolink:GeneToExpressionSiteAssociation',
  bte:call-apis:query     direct: true,
  bte:call-apis:query     format: 'json',
  bte:call-apis:query     limit: 500,
  bte:call-apis:query     object: 'UBERON:0002240',
  bte:call-apis:query     object_namespace: 'UBERON',
  bte:call-apis:query     predicate: 'biolink:expressed_in',
  bte:call-apis:query     subject_namespace: 'HGNC',
  bte:call-apis:query     subject_taxon: 'NCBITaxon:9606'
  bte:call-apis:query   },
  bte:call-apis:query   data: undefined,
  bte:call-apis:query   method: 'get',
  bte:call-apis:query   timeout: 50000
  bte:call-apis:query } +15ms
  bte:call-apis:query query success, transforming hits->records... +256ms
  bte:call-apis:query Query requires pagination, will re-query to window 500-1000: https://api-v3.monarchinitiative.org (1 ID) +0ms
  bte:api-response-transform:index api name Monarch API +259ms
  bte:api-response-transform:index api tags: translator +0ms
  bte:call-apis:query Successful GET https://api-v3.monarchinitiative.org (1 ID): GrossAnatomicalStructure > expresses > Gene (obtained 500 records, took 1s) +530ms

  bte:call-apis:query {
  bte:call-apis:query   url: 'https://api-v3.monarchinitiative.org/v3/api/association',
  bte:call-apis:query   params: {
  bte:call-apis:query     category: 'biolink:GeneToExpressionSiteAssociation',
  bte:call-apis:query     direct: true,
  bte:call-apis:query     format: 'json',
  bte:call-apis:query     limit: 500,
  bte:call-apis:query     object: 'UBERON:0002240',
  bte:call-apis:query     object_namespace: 'UBERON',
  bte:call-apis:query     predicate: 'biolink:expressed_in',
  bte:call-apis:query     subject_namespace: 'HGNC',
  bte:call-apis:query     subject_taxon: 'NCBITaxon:9606',
  bte:call-apis:query     offset: 500
  bte:call-apis:query   },
  bte:call-apis:query   data: undefined,
  bte:call-apis:query   method: 'get',
  bte:call-apis:query   timeout: 50000
  bte:call-apis:query } +4ms
  bte:call-apis:query query success, transforming hits->records... +286ms
  bte:api-response-transform:index api name Monarch API +820ms
  bte:api-response-transform:index api tags: translator +0ms
  bte:call-apis:query Successful GET https://api-v3.monarchinitiative.org (1 ID): GrossAnatomicalStructure > expresses > Gene (obtained 6 records, took 285ms) +19ms
  bte:call-apis:query query completes. +2s
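The logs above line up with simple window arithmetic: with a page size of 500, 2257 hits need ceil(2257 / 500) = 5 requests (the last returning 257), and 506 hits need 2 (the last returning 6). A small sketch reproducing those numbers; `pageWindows` is a hypothetical helper, and unlike the log messages (which print the nominal window, e.g. 2000-2500) it clamps the last window to the total.

```typescript
// Splits `total` hits into [start, end) windows of at most `pageSize` hits each.
function pageWindows(total: number, pageSize: number): Array<[number, number]> {
  const out: Array<[number, number]> = [];
  for (let start = 0; start < total; start += pageSize) {
    out.push([start, Math.min(start + pageSize, total)]);
  }
  return out;
}

for (const total of [2257, 506]) {
  const w = pageWindows(total, 500);
  const sizes = w.map(([s, e]) => e - s);
  console.log(total, w.length, sizes.join(" + "));
}
// 2257 5 500 + 500 + 500 + 500 + 257
// 506 2 500 + 6
```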

tokebe · 2024-08-21T16:39:50Z

This appears to happen regardless of the new subclassing code, I'm looking into it.

tokebe · 2024-08-21T17:06:13Z

It turns out our HP and MONDO ontologies contain ID prefixes that aren't HP or MONDO, respectively, which breaks our ontology source detection. I'm going to have to make some changes to node-expansion to return the specific source of each expanded ID, because there's overlap.
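One way to picture "return the specific source of each expanded ID": a hypothetical sketch (not the node-expansion code) in which descendants are looked up in each ontology file separately and every result is tagged with the file it came from, so nothing downstream has to infer the source from the ID prefix. `Ontologies`, `ExpandedId`, and `expandWithSource` are illustrative names, and the IDs below are made-up toy data.

```typescript
// Hypothetical layout: ontology name -> (parent ID -> descendant IDs).
type Ontologies = Record<string, Record<string, string[] | undefined>>;

interface ExpandedId {
  id: string;
  source: string; // the ontology file that supplied this descendant
}

// Expands `curie` in every ontology that knows it, keeping track of where each hit came from.
function expandWithSource(curie: string, ontologies: Ontologies): ExpandedId[] {
  const out: ExpandedId[] = [];
  for (const [source, table] of Object.entries(ontologies)) {
    for (const id of table[curie] ?? []) {
      out.push({ id, source });
    }
  }
  return out;
}

// Toy data: the same non-MONDO/non-HP ID appears in both files (the "overlap" case).
const toy: Ontologies = {
  MONDO: { "UBERON:X": ["UBERON:Y"] },
  HP: { "UBERON:X": ["UBERON:Z"] },
};
console.log(expandWithSource("UBERON:X", toy));
```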

colleenXu · 2024-08-21T19:24:43Z

Context: the error revealed a bug in node-expansion, caused by the MONDO ontology containing non-MONDO ID info, including parent-child pairs with mismatched ID namespaces. HP has this issue too.

The first test query used a UBERON ID, which had some child info in those files with a different ID namespace.

The solution is to remove those mismatched pairs.

Jackson has put a fix into the subclassing-fix branches (deployment tracked in #850)
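A minimal sketch of removing the mismatched parent-child pairs described above, assuming the ontology data is a map from a parent ID to its child IDs. `prefixOf` and `dropMismatchedPairs` are hypothetical names, the IDs are made up, and the actual fix in the subclassing-fix branches may work differently.

```typescript
// Namespace of a CURIE, e.g. "MONDO" for "MONDO:A".
function prefixOf(curie: string): string {
  const i = curie.indexOf(":");
  return i === -1 ? curie : curie.slice(0, i);
}

// Keeps only parent -> child pairs whose IDs share the same namespace.
function dropMismatchedPairs(
  children: Record<string, string[]>,
): Record<string, string[]> {
  const cleaned: Record<string, string[]> = {};
  for (const [parent, kids] of Object.entries(children)) {
    const kept = kids.filter((kid) => prefixOf(kid) === prefixOf(parent));
    if (kept.length > 0) cleaned[parent] = kept; // parents left with no children are dropped
  }
  return cleaned;
}

// Toy data with one matching pair and two mismatched ones.
const raw = {
  "MONDO:A": ["MONDO:B", "UBERON:C"],
  "HP:D": ["UBERON:E"],
};
console.log(dropMismatchedPairs(raw)); // { 'MONDO:A': [ 'MONDO:B' ] }
```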

colleenXu · 2024-08-21T19:43:57Z

@tokebe

Things look good!

I retested using the subclassing-fix branches.

Now the first test query UBERON:0000178 (blood) works without errors: monarch-testing-blood-expresses.json.zip
- There's some subclassing due to 20 CHP edges using the subclass UBERON:0013756 (venous blood). That part looks okay.
Other test queries work without issues too.
- monarch-eye-gland.json: has 2 Monarch edges both connected to subclass UBERON:0001817 (lacrimal gland)
- monarch-pheno-disease-test.json.zip: starts with HP:0001317 (abnormal cerebellum morphology), retrieved edges use subclasses like HP:0006855 (Cerebellar vermis atrophy)

tokebe · 2024-09-03T14:12:57Z

Relevant changes deployed to Prod.

colleenXu added the enhancement label Feb 29, 2024

rjawesome self-assigned this May 20, 2024

This was referenced May 23, 2024:
- Monarch pagination biothings/call-apis.js#77 (Merged)
- allow pagination to be passed in through smartapi yaml biothings/smartapi-kg.js#90 (Merged)

tokebe added the On Dev and On CI labels and removed the On Dev label Aug 19, 2024

tokebe mentioned this issue Aug 21, 2024: Pass source with descendants biothings/node-expansion#7 (Merged)

tokebe added the On CI -> Test label and removed the On CI label Aug 22, 2024

colleenXu mentioned this issue Aug 23, 2024: BTE creating self-edges via subclassing #850 (Closed)

tokebe added the On Test label and removed the On CI -> Test label Aug 23, 2024

colleenXu added the On Test -> Prod label and removed the On Test label Aug 28, 2024

tokebe closed this as completed Sep 3, 2024
