TCDC CQS #17

Open
karafecho opened this issue Nov 17, 2022 · 5 comments

@karafecho
Contributor

karafecho commented Nov 17, 2022

TCDC CARA Curated Query Service Overview

This issue is intended to initiate implementation work on the Translator Clinical Data Committee (TCDC) Curated ARA (CARA) Curated Query Service. The goal is to create a skeletal ARA that will initially support the TCDC's MVP1 workflow on rare pulmonary disease and will eventually support any workflow developed by the committee. CARA CQS will also provide a general model and approach for other teams, committees, working groups, and external users who wish to contribute an ARA to the Translator ecosystem. The development and implementation work is being supported by the SRI, with Jason Reilly serving as lead developer. Plans for long-term maintenance are TBD.

TCDC CARA CQS Implementation Plan

A detailed implementation plan was developed by Jason F., Abrar M., Chris B., Casey T., and Kara F. on 11/15/2022 and finalized by the same group on 11/17/2022. That plan is described below.

  • TCDC will register, within CARA CQS, mappings between a template query-graph and one or more TRAPI queries that include workflows but no score operations (i.e., TRAPI messages with a query_graph and a workflow element)
    • For the ‘treats’ MVP1 question, there will be one query, Path D, for initial deployment, with the more complex paths implemented after testing the initial deployment (this replaces the original plan of two such queries, one for Path A and one for Path B) [revised 03/22/2023]
  • At runtime, when the registered template query-graph (without a workflow but with a URL for returning the response) comes in from the ARS, CARA CQS will submit the associated TRAPI queries (with workflows but without score operations) to the Workflow Runner (WFR) and get back the results
  • After all results are returned, CARA CQS will use FastAPI Reasoner Pydantic to merge the N sets of results by the result node
  • CARA CQS will then score results using a composite metric (TBD) derived from one or more of the following edge attributes: log_odds_ratio, total_sample_size, and log_odds_ratio_95_ci [per discussion on 04/12/2023]
  • TRAPI 1.4: Each ARA will score results, and the separate scores generated by each ARA will be presented individually as analyses of the result when returned to the ARS. This replaces the original plan, in which the WFR would generate scores for the merged result from multiple ARAs and, rather than generating multiple results (one score per ARA response), put all of the scores into a property on the single result and then compute a rough average of the scores from the different ARAs [revised 03/29/2023, per Abrar]
    • The WFR sends the scored result back to CARA, which returns it to the ARS using the URL for the return response
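The merge and scoring steps in the plan above can be sketched in Python as follows. This is a minimal illustration only, assuming plain TRAPI-style dicts rather than the reasoner-pydantic models the CQS actually uses. The helpers `merge_by_result_node`, `attribute`, `composite_score`, and `stats` are hypothetical; edge statistics are assumed to be identified by `original_attribute_name`, with the 95% CI stored as a `[low, high]` pair; and the knowledge-graph edges from all N responses are assumed to have already been combined into one dict. The metric itself (a sample-size-weighted mean log odds ratio over supporting edges whose 95% CI excludes zero) is purely illustrative, since the plan leaves the composite metric TBD.

```python
def merge_by_result_node(result_sets, result_qnode):
    """Merge N lists of TRAPI-style results on the CURIE bound to result_qnode.

    Hypothetical plain-dict stand-in for the reasoner-pydantic merge: node
    bindings are unioned and every analysis is kept on the merged result.
    """
    merged = {}
    for results in result_sets:
        for result in results:
            curie = result["node_bindings"][result_qnode][0]["id"]
            target = merged.setdefault(curie, {"node_bindings": {}, "analyses": []})
            for qnode_id, bindings in result["node_bindings"].items():
                seen = target["node_bindings"].setdefault(qnode_id, [])
                for binding in bindings:
                    if binding not in seen:  # skip bindings already recorded
                        seen.append(binding)
            target["analyses"].extend(result["analyses"])
    return list(merged.values())


def attribute(edge, name):
    """Value of the edge attribute whose original_attribute_name is `name` (assumed naming)."""
    for attr in edge.get("attributes", []):
        if attr.get("original_attribute_name") == name:
            return attr["value"]
    return None


def composite_score(result, kg_edges):
    """Hypothetical metric, NOT the TCDC metric (which is TBD): sample-size-weighted
    mean log odds ratio over supporting edges whose 95% CI excludes 0."""
    edge_ids = {
        binding["id"]
        for analysis in result["analyses"]
        for bindings in analysis["edge_bindings"].values()
        for binding in bindings
    }  # a set, so an edge bound in several analyses counts once
    weighted_sum, total_n = 0.0, 0
    for edge_id in edge_ids:
        edge = kg_edges[edge_id]
        lor = attribute(edge, "log_odds_ratio")
        n = attribute(edge, "total_sample_size")
        ci = attribute(edge, "log_odds_ratio_95_ci")
        if lor is None or not n or ci is None:
            continue  # edge lacks the statistics this metric needs
        low, high = ci
        if low <= 0 <= high:
            continue  # CI includes 0: no clear association, so skip the edge
        weighted_sum += n * lor
        total_n += n
    return weighted_sum / total_n if total_n else 0.0


def stats(lor, n, ci):
    """Build an illustrative KG edge carrying the three statistics."""
    names = ("log_odds_ratio", "total_sample_size", "log_odds_ratio_95_ci")
    return {"attributes": [
        {"original_attribute_name": k, "value": v} for k, v in zip(names, (lor, n, ci))
    ]}


def one_result(edge_id):
    """Illustrative result binding the same drug (n0) via a given KG edge."""
    return {"node_bindings": {"n0": [{"id": "CHEBI:1"}], "n1": [{"id": "MONDO:1"}]},
            "analyses": [{"resource_id": "infores:example",
                          "edge_bindings": {"e0": [{"id": edge_id}]}}]}


# Two illustrative result sets (e.g., two paths) that return the same drug.
kg_edges = {"kg1": stats(1.0, 100, [0.5, 1.5]), "kg2": stats(2.0, 300, [1.0, 3.0])}
merged = merge_by_result_node([[one_result("kg1")], [one_result("kg2")]], "n0")
score = composite_score(merged[0], kg_edges)  # 1.75 = (100*1.0 + 300*2.0) / 400
```

Keeping every analysis on the merged result, rather than collapsing them, mirrors the TRAPI 1.4 direction described above, where each source's contribution remains visible as its own analysis.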
@andrewsu

Just a note that I think this approach is similar to what we're doing for BTE's creative mode implementation. For example, any incoming creative mode query gets compared to the template definitions in this "templateGroups file", which currently has only one entry, for [Drug] - treats - [Disease]. If the input query matches the given subject/object/predicate constraints, BTE plugs the input IDs into a series of hand-curated query templates (which, for [Drug] - treats - [Disease], would be in this directory). We'd be happy to explore synergies here in syntax, implementation, or both...
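The matching-and-filling step described in this comment can be sketched roughly in Python as follows, assuming a one-hop creative-mode query whose object node carries the input IDs. This is not BTE's implementation: `TEMPLATE_GROUPS` and its field names, the `disease` qnode key used as the fill slot, and the Drug → Gene → Disease template are all hypothetical placeholders standing in for the real templateGroups file and hand-curated templates.

```python
import copy

# Hypothetical template group; field names are illustrative, not BTE's schema.
TEMPLATE_GROUPS = [
    {
        "name": "Drug treats Disease",
        "subject": {"biolink:Drug", "biolink:ChemicalEntity", "biolink:SmallMolecule"},
        "predicate": {"biolink:treats"},
        "object": {"biolink:Disease", "biolink:DiseaseOrPhenotypicFeature"},
        "templates": [
            {  # hypothetical hand-curated template: Drug -> Gene -> Disease
                "nodes": {
                    "drug": {"categories": ["biolink:Drug"]},
                    "gene": {"categories": ["biolink:Gene"]},
                    "disease": {"categories": ["biolink:Disease"], "ids": []},
                },
                "edges": {
                    "e0": {"subject": "drug", "object": "gene",
                           "predicates": ["biolink:affects"]},
                    "e1": {"subject": "gene", "object": "disease",
                           "predicates": ["biolink:gene_associated_with_condition"]},
                },
            },
        ],
    },
]


def one_hop(query_graph):
    """Unpack a one-hop query graph into (subject qnode, qedge, object qnode)."""
    (edge,) = query_graph["edges"].values()
    nodes = query_graph["nodes"]
    return nodes[edge["subject"]], edge, nodes[edge["object"]]


def matches(group, query_graph):
    """True if the query's subject, predicate, and object each overlap the group's constraints."""
    subject, edge, obj = one_hop(query_graph)
    return bool(
        set(subject.get("categories", [])) & group["subject"]
        and set(edge.get("predicates", [])) & group["predicate"]
        and set(obj.get("categories", [])) & group["object"]
    )


def fill_templates(query_graph):
    """Plug the input object IDs into every template of the first matching group."""
    _, _, obj = one_hop(query_graph)
    for group in TEMPLATE_GROUPS:
        if matches(group, query_graph):
            filled = []
            for template in group["templates"]:
                qg = copy.deepcopy(template)  # leave the curated template untouched
                qg["nodes"]["disease"]["ids"] = list(obj.get("ids", []))
                filled.append(qg)
            return filled
    return []  # no template group matched


creative_query = {
    "nodes": {
        "n0": {"categories": ["biolink:Drug"]},
        "n1": {"categories": ["biolink:Disease"], "ids": ["MONDO:0005002"]},  # illustrative CURIE
    },
    "edges": {"e0": {"subject": "n0", "object": "n1",
                     "predicates": ["biolink:treats"], "knowledge_type": "inferred"}},
}
filled = fill_templates(creative_query)
```

Deep-copying each template before filling it keeps the registered templates reusable across incoming queries, which matters if the same registry serves many requests.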

@karafecho
Contributor Author

Thanks for alerting me to BTE's creative mode implementation, @andrewsu. This does seem similar to what we're planning for CARA. The main difference may be that the TCDC iteratively refines the TRAPI queries that we develop by reviewing answers and invoking SME input when appropriate.

Yes, let's find a time to discuss the two creative mode implementations. The next TCDC meeting is scheduled for January 4 at 2 pm ET. Any chance you and/or members of your team are free to join that call or the following one on January 18? Alternatively, we can arrange a separate meeting. Just let me know. Thanks!

@karafecho
Contributor Author

Actually, the agenda for the January 4 meeting is somewhat full, so the January 18 meeting might be better, or a separate meeting.

@andrewsu

> Thanks for alerting me to BTE's creative mode implementation, @andrewsu. This does seem similar to what we're planning for CARA. The main difference may be that the TCDC iteratively refines the TRAPI queries that we develop by reviewing answers and invoking SME input when appropriate.

Yes, I agree that could be the potential synergy: the review/refinement process planned by TCDC combined with some technical foundation that we've already built through BTE. I will plan on being at the Jan 18 meeting to discuss more!

karafecho changed the title from "TCDC CARA" to "TCDC CQS" on Oct 17, 2023
@karafecho
Contributor Author

Update: The initial dev deployment of the CQS was in place and tested before the Fall 2023 Relay meeting. The goal is to have a new deployment in CI that supports the Path A, B, and E queries before the Winter 2024 code freeze.
