Changing Creative mode threshold from results to time #808

tokebe · 2024-04-11T17:56:19Z

Currently, BTE runs templates until it reaches 500 results. If BTE takes 4 minutes to retrieve 499 results from template 1, it will still run template 2 and go over time. Conversely, if BTE takes 1 minute to retrieve 500 results from template 1, it won't use any of the remaining time to check whether template 2 has better results.

We should instead check whether more than x time is remaining (perhaps 2.5 minutes?), and run the next template if so. We could also include a dry-run step that counts how many meta-edges a given template will hit, use that as a heuristic for how long the template might take, and compare it against the time remaining. This should allow BTE to get the best results it can in the time available, even if it sometimes returns <500 results due to time.

This will require further investigation and discussion before any sort of implementation work can be done.

tokebe · 2024-04-11T18:03:56Z

Note that such an implementation may significantly help results/performance from #794

tokebe · 2024-04-17T17:31:33Z

After further discussion, we've settled on the following logic to start testing with:

- Before the templates, start a timer.
- After any given template, if the time remaining (5 minutes - timer) is greater than the expected next template time + 30 seconds, continue on to the next template. Otherwise, wrap up and finish query execution.

Implementation requires a few things:

- A given template group should have an expectedTime property which can be checked against. For initial testing, let's assume 2 minutes (expressed in seconds). This will be faster to check against than a dry-run metaEdges heuristic, and we can get actual run-time averages for templates rather than best-guess.
- New timer logic during the template loop
- Isolated testing to determine expected template timings (this can be the last step or performed simultaneously by another dev)
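The timer logic described above could be sketched roughly as follows. This is a minimal illustration, not code from the BTE repositories: the names TOTAL_BUDGET_MS, BUFFER_MS, DEFAULT_EXPECTED_SEC, shouldRunNext and runTemplates are all hypothetical, and it assumes expectedTime is stored in seconds on each template group.

```javascript
// Sketch only: every name here is hypothetical, not taken from the BTE codebase.
const TOTAL_BUDGET_MS = 5 * 60 * 1000; // 5-minute overall limit
const BUFFER_MS = 30 * 1000;           // 30-second safety margin
const DEFAULT_EXPECTED_SEC = 120;      // initial 2-minute assumption, in seconds

// True if the next template is expected to fit in the time remaining.
function shouldRunNext(elapsedMs, expectedTimeSec = DEFAULT_EXPECTED_SEC) {
  const remainingMs = TOTAL_BUDGET_MS - elapsedMs;
  return remainingMs > expectedTimeSec * 1000 + BUFFER_MS;
}

// Runs template groups in order. The first one always runs; after each template,
// the next one only starts if it is expected to fit. `runTemplate` is a
// caller-supplied async function and `now` is injectable for testing.
async function runTemplates(templates, runTemplate, now = Date.now) {
  const start = now(); // start the timer before the templates
  const completed = [];
  for (let i = 0; i < templates.length; i++) {
    const template = templates[i];
    if (i > 0 && !shouldRunNext(now() - start, template.expectedTime)) break;
    await runTemplate(template);
    completed.push(template.name);
  }
  return completed;
}
```

With these numbers, a template expected to take 2 minutes only starts if more than 2.5 minutes remain, which matches the "perhaps 2.5 minutes?" figure from the original proposal.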

tokebe · 2024-04-17T17:32:32Z

@rjawesome I'm assigning this issue to you given your familiarity with the inferred mode handler at this point. As always let us know if you have any questions.

colleenXu · 2024-04-19T04:26:07Z

I discussed the "testing" aspect with Jackson earlier today.

We both imagined an automated testing framework to run templates with a list of input IDs, and record run-time info. Other info could also be helpful like: how many MetaEdges, how many subqueries, how long scoring/the NGD step is taking...

For lists of input IDs, we could use:

- what Translator is using in the automated test runs (I list input IDs in my analysis sheet but I'm behind in my analysis at the moment >.<)
- from "treats" creative-mode development:
  - I've been using a shorter list for testing
  - the original example disease list
- from "chem-affects-gene" creative-mode development:
  - I've been using my modified testing list (added "better" chem IDs, removed gene GAPDH since the 4th template was running > 15 min in the past)
  - original chemical and gene lists

tokebe · 2024-04-22T17:35:51Z

Note: Much of https://github.com/biothings/bte-auto-demos could be re-purposed for this kind of testing (removing some of the unneeded automated server framework, re-running, caching, etc.)

rjawesome · 2024-04-27T00:17:28Z

Basic implementation with durationMin property on query template JSONs has been completed in creative-timer branch in query_handler. I'm also working on the automated testing thing in biothings_explorer/performance-test/template_test.js (creative-timer branch)

rjawesome · 2024-05-01T01:49:53Z

A working version of the automated testing framework has been completed in biothings_explorer/performance-test/template_test.js (requires server to be running on localhost or prod) & biothings_explorer/performance-test/template_test_threaded.js (this program starts its own threads to run queries & should be faster, server should not be running).

The creative mode queries that are used have to be placed in the biothings_explorer/performance-test/template_data folder. The script will automatically detect the appropriate creative mode templates for each query and time them.

After the script finishes, it prints output like the following (covering all templates that were run from the creative mode queries supplied):

{
  'Chem-treats-DoP.json': { count: 1, totalMs: 18491, avgMin: 0.31 },
  'Chem-treats-PhenoOfDisease.json': { count: 1, totalMs: 600000, avgMin: 10 },
  'Chem-regulates,affects-Gene-biomarker,associated_condition-DoP.json': { count: 1, totalMs: 600000, avgMin: 10 }
}

(I hard coded it so a timeout [>5 minutes] is recorded as 10 minutes)
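For illustration, a summary of that shape could be built from raw timings along these lines. This is a sketch under assumptions, not the code in template_test.js: the summarize function, the { template, ms } input shape, and the rounding of avgMin to two decimals are all hypothetical; only the field names and the "timeout recorded as 10 minutes" rule come from the comment above.

```javascript
const TIMEOUT_MS = 5 * 60 * 1000;           // runs longer than this count as timeouts
const TIMEOUT_RECORDED_MS = 10 * 60 * 1000; // a timeout is recorded as 10 minutes

// `runs` is a list of { template, ms } measurements (hypothetical input shape).
function summarize(runs) {
  const summary = {};
  for (const { template, ms } of runs) {
    const recordedMs = ms > TIMEOUT_MS ? TIMEOUT_RECORDED_MS : ms;
    if (!summary[template]) {
      summary[template] = { count: 0, totalMs: 0, avgMin: 0 };
    }
    const entry = summary[template];
    entry.count += 1;
    entry.totalMs += recordedMs;
    // Average minutes per run, rounded to two decimals (assumed rounding).
    entry.avgMin = Math.round((entry.totalMs / entry.count / 60000) * 100) / 100;
  }
  return summary;
}
```

One consequence of recording timeouts as a fixed 10 minutes is that avgMin for a template that sometimes times out is a penalized estimate rather than a true average.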

tokebe · 2024-05-02T19:50:30Z

@rjawesome If you make a draft PR, it'll be a little easier to comment on code review. On line 532, you have:

const queryTime = durationMin * 60 * 1000 ?? DEFAULT_QUERY_TIME;

If durationMin is not set for a template, you'll get undefined * 60 * 1000, which evaluates to NaN. Because ?? only falls back on null or undefined, NaN ?? value evaluates to NaN rather than value, which will probably behave in an unintended manner.
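A small sketch of the problem and one possible fix. The value given to DEFAULT_QUERY_TIME here is a placeholder, and the function names are hypothetical; the actual fix in the branch may look different.

```javascript
const DEFAULT_QUERY_TIME = 2 * 60 * 1000; // placeholder default, in milliseconds

// Original pattern: the multiplication is evaluated first, so an unset
// durationMin becomes NaN, and ?? never falls back on NaN.
function queryTimeBuggy(durationMin) {
  return durationMin * 60 * 1000 ?? DEFAULT_QUERY_TIME;
}

// One possible fix: check for a missing value before converting to milliseconds.
function queryTimeFixed(durationMin) {
  return durationMin != null ? durationMin * 60 * 1000 : DEFAULT_QUERY_TIME;
}

console.log(queryTimeBuggy(undefined)); // NaN
console.log(queryTimeFixed(undefined)); // 120000
console.log(queryTimeFixed(3));         // 180000
```

The explicit null check keeps a durationMin of 0 as 0 rather than replacing it with the default, which a || fallback would not.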

Additionally, you've currently left in the creative results threshold, which should be removed (I imagine you're getting to that).

Otherwise, I like the implementation -- skipping a template when there isn't enough time for it rather than just stopping all template execution there is a good call, and means we could hypothetically run shorter-running but lower-priority templates under some circumstances.
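To make the skip-versus-stop difference concrete, here is a small simulation. It is only a sketch: planTemplates is a hypothetical helper, it assumes each template takes exactly its expected time (in seconds), and it assumes the same 30-second buffer discussed earlier in the thread.

```javascript
// Returns the names of templates that would run when templates that do not fit
// are skipped rather than ending the loop. Times are in seconds.
function planTemplates(templates, budgetSec = 300, bufferSec = 30) {
  let elapsedSec = 0;
  const planned = [];
  for (const template of templates) {
    const remainingSec = budgetSec - elapsedSec;
    // Skip (do not stop) when this template is not expected to fit.
    if (remainingSec <= template.expectedTime + bufferSec) continue;
    planned.push(template.name);
    elapsedSec += template.expectedTime; // simulation: actual time equals expected time
  }
  return planned;
}

const plan = planTemplates([
  { name: "A", expectedTime: 120 },
  { name: "B", expectedTime: 200 },
  { name: "C", expectedTime: 60 },
]);
console.log(plan); // [ 'A', 'C' ]
```

Here B is skipped because 180 seconds remain and it needs 230 including the buffer, but the shorter, lower-priority C still runs. A stop-on-first-miss loop would have returned only A.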

rjawesome · 2024-05-07T00:27:41Z

Seeing that a lot of the queries that will be used for the "testing" aspect have similar query structures w/ changing IDs, I have added an "id template" feature to the "testing" program (performance-test/template_test.js and performance-test/template_test_threaded.js).

For example, first.json has the following contents. It will run the query two times in testing, the first time replacing {ID} with MONDO:0002909 and the second time replacing {ID} with MONDO:0019499

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids": "{ID}"
                },
                "n1": {
                    "categories": ["biolink:Drug"]
                }
            },
            "edges": {
                "e0": {
                    "subject": "n1",
                    "object": "n0",
                    "predicates": ["biolink:treats"],
                    "knowledge_type": "inferred"
                }
            }
        }
    },
    "ids": ["MONDO:0002909", "MONDO:0019499"]
}
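As a rough illustration of how such a file might be expanded into individual queries (a sketch only: expandIdTemplate is a hypothetical name, and the actual scripts may perform the substitution differently), the {ID} placeholder can be replaced in the serialized query once per entry in ids:

```javascript
// Expands an "id template" file into one query object per listed ID.
// Assumes IDs contain no characters that need JSON escaping (true for CURIEs
// such as MONDO:0002909).
function expandIdTemplate(file) {
  const { ids, ...query } = file; // the top-level "ids" list is not part of the query
  const serialized = JSON.stringify(query);
  return ids.map((id) => JSON.parse(serialized.replaceAll("{ID}", id)));
}

const queries = expandIdTemplate({
  message: {
    query_graph: {
      nodes: { n0: { ids: "{ID}" }, n1: { categories: ["biolink:Drug"] } },
      edges: {
        e0: {
          subject: "n1",
          object: "n0",
          predicates: ["biolink:treats"],
          knowledge_type: "inferred",
        },
      },
    },
  },
  ids: ["MONDO:0002909", "MONDO:0019499"],
});
console.log(queries.length); // 2
console.log(queries[0].message.query_graph.nodes.n0.ids); // MONDO:0002909
```

Substituting on the serialized string means every occurrence of {ID} is replaced, wherever it appears in the query.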

tokebe · 2024-05-07T19:09:05Z

I think I understand the approach here -- you're eventually going to have to make 2 more templates alongside first.json to handle the other two templateGroups, which each require a specific qualifier on the inferred edge.

tokebe · 2024-07-10T18:30:05Z

Superseded by #824

colleenXu added the needs discussion and enhancement (New feature or request) labels Apr 16, 2024

tokebe assigned rjawesome Apr 17, 2024

tokebe removed the needs discussion label Apr 17, 2024

This was referenced May 2, 2024

Creative timer #819

Closed

Creative-mode timer biothings/bte_trapi_query_graph_handler#193

Closed

tokebe mentioned this issue Jun 24, 2024

Run all creative mode templates simultaneously / fix and adjust record-counting for queries and template execution #824

Closed

tokebe closed this as completed Jul 10, 2024
