Overarching x-bte issue: defining what a unit of annotation is #656

Open
colleenXu opened this issue Jun 14, 2023 · 10 comments

@colleenXu
Collaborator

Opening this issue as a placeholder for now. See the comments for discussion of what is involved here.

@colleenXu
Collaborator Author

colleenXu commented Jun 14, 2023

Idea from @tokebe:

  • move query-handler config on edge attributes for hashing (unique edges), so somehow KPs can tell BTE how to process their edges / use some edge-attributes to avoid merging Edges
    • like a field allowing you to provide a list of attribute_type_ids?
    • only for Multiomics/Text-Mining (ones that provide edge-attributes) or for all?
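A minimal sketch of the attribute-list idea above, assuming edges are plain dicts shaped like TRAPI edges (`subject`, `predicate`, `object`, and `attributes` with `attribute_type_id` / `value`). The function name `edge_hash`, the parameter `hash_attribute_type_ids`, and the `example:cohort` type ID are hypothetical and not part of BTE:

```python
import hashlib
import json


def edge_hash(edge, hash_attribute_type_ids=()):
    """Hash an edge. Attributes whose type ID is listed also feed the hash,
    so edges that differ only in those attributes are not merged."""
    key = {
        "subject": edge["subject"],
        "predicate": edge["predicate"],
        "object": edge["object"],
        # Only the attribute types the KP asked for contribute to the hash.
        "attributes": sorted(
            (attr["attribute_type_id"], json.dumps(attr["value"], sort_keys=True))
            for attr in edge.get("attributes", [])
            if attr["attribute_type_id"] in hash_attribute_type_ids
        ),
    }
    return hashlib.sha256(json.dumps(key, sort_keys=True).encode()).hexdigest()


# Hypothetical edges that differ only in one attribute value.
edge_a = {
    "subject": "X:1",
    "predicate": "biolink:related_to",
    "object": "Y:2",
    "attributes": [{"attribute_type_id": "example:cohort", "value": "A"}],
}
edge_b = {**edge_a, "attributes": [{"attribute_type_id": "example:cohort", "value": "B"}]}

merged = edge_hash(edge_a) == edge_hash(edge_b)  # same hash: edges would merge
kept_apart = edge_hash(edge_a, ["example:cohort"]) != edge_hash(edge_b, ["example:cohort"])
```

With an empty list the two edges hash identically; once the KP lists the attribute type, they stay separate.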

Ideas from me:

  • remove unneeded fields (useTemplating)
  • redefine what counts as 1 written "unit of x-bte annotation" (operation? vocab to-be-defined):
    • uncouple what's written in the YAML vs what representation exists within BTE (MetaXEdge?)
      • "automatic expansion" annotation and flag. so 1 operation -> many in BTE potentially
      • design? what's written in the YAML should be human-writable, simple -> not look like code! which may be tricky with the nunjucks / jq work...
      • shouldn't be too dependent on a specific TRAPI/biolink-model version...
    • what counts as 1 unit?
      • 1 unique combo of subject-semantic-type / predicate / object-semantic-type
      • what about different ID-namespaces for input and output, and qualifiers?
        • how do we handle tweaks in the way you query (parameters / requestBody / requestBodyObject) and parse responses (response-mapping)
    • how to handle edge-attributes / qualifiers (especially mapping to biolink-model / TRAPI standards)

@colleenXu
Collaborator Author

colleenXu commented Jul 21, 2023

Another idea, although may be more "modifying / adding operations": using list_filter for BioThings APIs more, although that doesn't fully fix reverse-operation issues #316

@tokebe added the "enhancement" label Aug 1, 2023
@colleenXu
Collaborator Author

From a convo Jackson and me had today:

Some main things to address?

  • multiple-prefixes for inputs and outputs (aka the inputs + requestBody + parameters restructure, and outputs + response-mapping restructure)
  • qualifier-sets (multiple sets per unit?? -> tied to requestBody / parameters?? list_filter/jmespath??)
  • underlying sources stuff (multiple possible values, more than 1 value per record, list_filter / jmespath???)

Note:

  • JQ-stuff only for basic stuff (mental model / more complex stuff should be BTE stuff) -> we don't want x-bte annotation stuff to look basically like code

@colleenXu
Collaborator Author

colleenXu commented Sep 13, 2023

@newgene This is the issue on x-bte refactoring. I heard during today's group meeting that there are some issues with MetaKG stuff and id-prefixes / operation-level source.

However, @tokebe and I have discussed that x-bte annotations may be written one way to be more writer-friendly, and then parsed into a different representation for BTE's MetaKG, because the two have different requirements and design ideas.

I wonder if this is the case for your Translator/SmartAPI-Registry's MetaKG work....I suspect this work has a different set of requirements from BTE's internal representation AND the x-bte annotation writing...

@colleenXu
Collaborator Author

One case of "multiple prefixes in output" is #585

@colleenXu
Collaborator Author

colleenXu commented Oct 25, 2023

Jackson @tokebe and I think this is the overarching topic for x-bte refactoring (after reviewing the notes above):

The issues

There seem to be 3 different requirement sets at play that we want to tell apart and be aware of:

  • "writer-friendly x-bte annotation":
    • easy to write/teach/maintain, can write manually (without using code or UI)
    • shouldn't be completely like code
    • has clear expectations for format / allowed values / what everything is used for
    • flexible, expressive
    • not dependent on specific TRAPI / biolink-model stuff that's still in-flux
  • "internal BTE use": what BTE needs to keep track of all the info, construct sub-queries, edge management, etc. (vocab: BTE MetaEdge, MetaXEdge, bteEdge...)
    • x-bte annotation may be too "collapsed" from this POV, and BTE will need to expand 1 operation -> multiple internal representations
  • "SmartAPI Registry MetaKG use": https://smart-api.info/portal/translator/metakg. What's needed for this tool / UI
    • x-bte annotation may be too verbose/specific from this POV, and this'll need to collapse multiple operations -> 1 MetaEdge for its purpose
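The "collapse" direction described for the SmartAPI Registry MetaKG can be sketched roughly as below. This is an illustration only, assuming each operation is a dict with hypothetical keys `name`, `input_category`, `predicate`, and `output_category`; it is not the Registry's actual code:

```python
from collections import defaultdict


def collapse_to_meta_edges(operations):
    """Group operations by (input category, predicate, output category).
    Each group stands for one MetaEdge in this simplified view."""
    grouped = defaultdict(list)
    for op in operations:
        triple = (op["input_category"], op["predicate"], op["output_category"])
        grouped[triple].append(op["name"])
    return dict(grouped)


# Two hypothetical operations that differ only by input ID namespace.
operations = [
    {"name": "gene2disease-ncbigene", "input_category": "Gene",
     "predicate": "related_to", "output_category": "Disease"},
    {"name": "gene2disease-ensembl", "input_category": "Gene",
     "predicate": "related_to", "output_category": "Disease"},
]
meta_edges = collapse_to_meta_edges(operations)
```

Here both operations end up under a single `("Gene", "related_to", "Disease")` key, which is the sense in which several verbose annotations collapse into one MetaEdge.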

Which leads to specific questions for group discussion, like:

  • How does "1 x-bte operation / unit of annotation" relate to similar concepts (MetaEdges?) in BTE and SmartAPI Registry MetaKG?
    • and how does x-bte refactoring relate to and potentially change this?
  • are BTE and SmartAPI Registry MetaKG using the same code? Does that make sense or should they use different code to process x-bte annotations?

And some ideas on how to "expand" an x-bte operation/ unit of annotation

Currently, 1 x-bte operation represents...

  • 1 API endpoint being used
  • 1 unique combo of:
    • input semantic-type
    • input ID namespace
    • sub-query information
    • predicate
    • qualifier-set
    • source field value
    • output semantic-type
    • output ID namespace

Jackson @tokebe and I have discussed how to make it easier to write x-bte annotation - and one of our ideas is to have 1 x-bte operation (one unit of annotation?) expand to include more info:

  • first-step proposal is 1st iteration of x-bte-refactoring: handling multiple input/output namespaces, sources #748
    • since there can be "combinatorial explosions" of current operations where the main difference comes from the input/output ID namespaces
  • Other sources of "combinatorial explosions" are:
    • unique qualifier-sets
    • unique source field values
  • note that all of these aren't as easy as "list out the possible values". There can be sub-query info, response-mapping info, post-processing info differences based on unique value/set...
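To make the namespace "combinatorial explosion" concrete, here is a rough sketch of expanding one collapsed operation into the flat, current-style operations it would replace. The dict keys (`input_namespaces`, `output_namespaces`, and so on) are hypothetical, and the sketch deliberately ignores the per-value differences in sub-query and response-mapping info noted above:

```python
from itertools import product


def expand_operation(collapsed):
    """Expand one collapsed operation into one flat operation per
    (input namespace, output namespace) pair."""
    shared = {
        key: value
        for key, value in collapsed.items()
        if key not in ("input_namespaces", "output_namespaces")
    }
    return [
        {**shared, "input_namespace": in_ns, "output_namespace": out_ns}
        for in_ns, out_ns in product(
            collapsed["input_namespaces"], collapsed["output_namespaces"]
        )
    ]


# Hypothetical collapsed operation: 2 input x 3 output namespaces.
collapsed = {
    "input_category": "Gene",
    "predicate": "related_to",
    "output_category": "Disease",
    "input_namespaces": ["NCBIGene", "ENSEMBL"],
    "output_namespaces": ["MONDO", "DOID", "UMLS"],
}
flat_operations = expand_operation(collapsed)  # 2 * 3 = 6 flat operations
```

Adding qualifier-sets or source values as further dimensions would multiply the count again, which is why writing each combination by hand gets unwieldy.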

My qualifier-set thinking

There are theoretically many operations that would mainly differ by qualifier-set (and how that affects sub-query info like post_filter/filter, jmespath, JQ).

The guidance for anatomical, species, and population context qualifiers is currently unclear to me (are they edge-attributes or part of the qualifier-set?). If they turn out to be part of the qualifier-set and we want to support them, this has combinatorial explosion problems because the context qualifiers in our KPs have a lot of possible values.

  • anatomical context:
    • multiomics apis (drug response): Guangrong has previously told me that some operations are affected, and include 10-20s of possible tissue/anatomical-context values
    • also in pending apis: ebi gene2pheno
  • species context: affects lots of apis
    • core biothings: MyChem chembl.drug_mechanism and drugcentral.bioactivity info, MyGene panther, a little MyDisease disgenet
    • pending biothings: bindingdb, mgi gene2pheno
    • external: ctd, biolink/monarch
  • population context:
    • multiomics apis based on clinical data: ehr risk, wellness (clinical trials too?)

My source field thinking

There are theoretically some operations that would mainly differ by source (and how that affects sub-query info like post_filter/filter, jmespath, JQ...).

It would be nice if we could set the source info to field values that are post-processed by BTE...

I'm not sure of the scope of this issue though:

  • core biothings apis: mygene, mydisease disgenet
  • external apis: biolink/monarch

Also maybe complicated because some api hits will have multiple source values / fields?
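One way to picture the "multiple source values" complication is a small helper that normalizes whatever a hit holds into a list. This is only a sketch: the `source` field name and the hit shape are hypothetical, not taken from any specific API:

```python
def extract_sources(hit, field="source"):
    """Return the source values found in one API hit as a list,
    whether the field is missing, a single value, or a list."""
    value = hit.get(field)
    if value is None:
        return []
    if isinstance(value, list):
        # Drop duplicates while keeping the original order.
        return list(dict.fromkeys(value))
    return [value]


single = extract_sources({"source": "SourceA"})
multiple = extract_sources({"source": ["SourceA", "SourceB", "SourceA"]})
missing = extract_sources({})
```

Whether each value should then become its own edge, or one edge should carry several sources, is exactly the open question above.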

@colleenXu
Collaborator Author

colleenXu commented Jan 13, 2025

Update 1/2 based on December 2024 discussions

These updates are to summarize the x-bte annotation refactor discussions Jackson and I had last month.

On requirements...
Who/what uses x-bte annotation?

  • People writing it (me)
  • BTE / Retriever ▶️BioPack ▶️Translator. How x-bte annotation is used:
    • Search for resource APIs with matching "MetaEdges"
    • Construct sub-queries
    • Process sub-query responses
  • BioThings team (tagged @newgene so he can correct/add to this)
    • BioThings streaming (based on subset of annotation contents). Goes through NodeNorm so node categories are normalized in output
    • SmartAPI registry metaKG work
    • BioThings development/ideas?

@colleenXu
Collaborator Author

colleenXu commented Jan 13, 2025

Update 2/2 based on December 2024 discussions

Jackson and I decided to:

  • keep calling "1 unit of x-bte annotation" a "x-bte (annotation) operation"
  • change so that "1 unit of x-bte annotation" can expand to potentially many "units of data in Retriever" (not sure what to call this, since it's more than a "MetaEdge"). Right now they're basically 1-to-1.

We discussed what currently defines/differs between individual x-bte annotation units, and what we want to change in the first iteration of refactoring:

Continuing as-is (for now):

  • Input category
  • Predicate^
  • Qualifier-set^
  • Output category
  • Endpoint^^
  • Request method (GET vs POST)^^

^ discussed changing in a later iteration of refactor (see internal notes 12/5)
^^ discussed changing later maybe (see internal notes 1/28)

Changing in 1st refactor proposal:

  • Input namespace
  • Output namespace
  • Source and corresponding KL/AT ▶️ provenance
  • Different request info based on namespaces, source
  • Different response-mapping/processing based on namespaces, source
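As an illustration of "different request info based on namespaces", one unit could hold shared request info plus per-namespace overrides. This is a sketch under assumptions: the keys `request` and `request_overrides`, the function `resolve_request`, and the example values are all hypothetical and not the proposed x-bte format:

```python
def resolve_request(unit, input_namespace):
    """Merge the unit's shared request info with any overrides
    declared for the given input namespace."""
    overrides = unit.get("request_overrides", {}).get(input_namespace, {})
    return {**unit["request"], **overrides}


# Hypothetical unit: one annotation covering two input namespaces.
unit = {
    "input_namespaces": ["NCBIGene", "ENSEMBL"],
    "request": {"method": "POST", "path": "/query", "scopes": "entrezgene"},
    "request_overrides": {"ENSEMBL": {"scopes": "ensembl.gene"}},
}
ncbigene_request = resolve_request(unit, "NCBIGene")  # uses the shared request info
ensembl_request = resolve_request(unit, "ENSEMBL")    # only "scopes" is overridden
```

The same pattern could in principle apply to response-mapping and to source-specific differences, so the writer states the common case once and only lists what changes.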

@colleenXu changed the title from "x-bte annotation refactoring discussion" to "Overarching x-bte issue: defining what a unit of annotation is" Jan 13, 2025
@colleenXu
Collaborator Author

colleenXu commented Jan 13, 2025

Updated title/scope of this issue. The other things in this issue's other posts are covered by other issues.

@colleenXu removed the "needs discussion" and "enhancement" labels Jan 13, 2025
@colleenXu
Collaborator Author

colleenXu commented Jan 29, 2025

Clarification

Right now, endpoint path/method are static in an annotation. However...

  • Path/method is not part of "what defines 1 annotation unit" - it's just another kind of request info.
    • It was in the past, but this has changed now (annotation no longer hanging off endpoints)
  • What defines 1 annotation/unit = MetaTriple + retrieving 1 subset of data from an API. In contrast, "how to query" and "how to process response" are subsumed / lower-level.

(based on discussion between Jackson and me 1/28)
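The clarification above can be sketched as a data structure whose identity comes only from the MetaTriple and the data subset, with path/method carried along as request info. The class and field names are hypothetical, not BTE or Retriever types:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AnnotationUnit:
    # Identity: the MetaTriple plus which subset of the API's data is retrieved.
    input_category: str
    predicate: str
    output_category: str
    data_subset: str
    # Request info: excluded from equality and hashing (compare=False).
    path: str = field(compare=False)
    method: str = field(default="GET", compare=False)


unit_a = AnnotationUnit("Gene", "related_to", "Disease", "subset-1", path="/query", method="POST")
unit_b = AnnotationUnit("Gene", "related_to", "Disease", "subset-1", path="/gene/{id}")
same_unit = unit_a == unit_b  # True: only path/method differ
```

In this view, changing the endpoint or request method changes how the unit is queried, not which unit it is.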
