semmed sentence edge attributes #68

Merged
merged 6 commits into from
Sep 27, 2024

Conversation

rjawesome
Contributor

biothings/biothings_explorer#833

This needs the SemMedDB response mappings to ingest the edge attributes, i.e., add the following to all response mappings:

edge-attributes: edge-attributes

This can also report the subject/object location in the text, provided the subject/object text is retrieved from the SemMedDB API as a field.

@colleenXu
Contributor

@rjawesome

Is this ready for testing? The linked issue isn't in the "Awaiting Review" column...

Also, on "subject/object location can be supported"....

  • looks like lines 20-27 need to be uncommented in order to test this, right?
  • it looks like the values produced would be in the text-mining format, right? (looks like: pipe-delimited string, 2 int values for the start and end character index)

@rjawesome
Contributor Author

rjawesome commented Aug 26, 2024

Is this ready for testing? The linked issue isn't in the "Awaiting Review" column...

I wanted to do a basic test (which would require me to modify the smartapi yaml to ingest the edge attributes). Hopefully I can complete that today and then mark it for review.

looks like lines 20-27 need to be uncommented in order to test this, right?

Yes, but this won't work until the smartapi yaml is updated to request the subject_text and object_text fields from the SemmedDB BioThings API (otherwise it throws an error since those fields are missing).

it looks like the values produced would be in the text-mining format, right? (looks like: pipe-delimited string, 2 int values for the start and end character index)

Correct.

@rjawesome rjawesome marked this pull request as ready for review August 27, 2024 00:23
@rjawesome
Contributor Author

rjawesome commented Aug 27, 2024

@colleenXu
More Notes (I am now marking this as ready to review):
When adding edge-attributes to the response mapping in the smartapi yaml, the value needs to be wrapped in backticks and quotes like so (JSONata would otherwise read the hyphen in the field name as a minus operator):
edge-attributes: "`edge-attributes`"

In addition to requesting predication.subject_text and predication.object_text to get the start/end character positions, we also need to request predication.predication_id for the attribute value. (These go in the parameters.fields of every operation in the smartapi yaml.)

@colleenXu
Contributor

colleenXu commented Aug 28, 2024

@rjawesome

(1) I originally proposed using semmeddb_publication_info: predication for the response-mapping (now I wonder if ref_text_mining is better, to match previous work).

Is there a reason you went with edge-attributes: "`edge-attributes`" instead (both different keyword and value)?

  • Specifically, we've been using edge-attributes for Multiomics/Text-Mining to ingest the json containing their TRAPI edge-attributes without mutating them (previous issue)...so from my POV, it seems confusing to use the same keyword here.
  • I was imagining a setup closer to the previous work where the original field (predication) could still be used as a value.

(2) In my original proposal and this branch, I changed parameter.fields to retrieve the entire predication json. Does this work? I see your comments that fields are missing (subject_text, object_text, predication_id), which I'm confused by...

@rjawesome
Contributor Author

@colleenXu

(1) We can keep semmeddb_publication_info: predication as well, in case any predication info is to be used directly. The JQ transformer that I wrote transforms the publication info directly into the TRAPI attribute format (it keeps the predication field as is while creating the edge-attributes field to store the TRAPI-formatted attributes). Hence, it would be ingested as edge-attributes so that BTE knows it is already in TRAPI format.

i.e., edge-attributes would contain something like this:

{
  "attribute_type_id": "biolink:has_supporting_study_result",
  "attributes": [
    {
      "attribute_type_id": "biolink:supporting_text",
      "value": "CONCLUSIONS: Induction of the ARE-Nrf2 pathway by zinc provides powerful and prolonged antioxidation and detoxification that may explain the beneficial effects of zinc observed in the treatment of age-related macular degeneration (AMD)."
    },
    {
      "attribute_type_id": "biolink:publications",
      "value": "PMID:16723490"
    },
    {
      "attribute_type_id": "biolink:subject_location_in_text",
      "value": "50|54"
    },
    {
      "attribute_type_id": "biolink:object_location_in_text",
      "value": "34|38"
    }
  ],
  "value": "166335624"
},
{
  "attribute_source": "infores:text-mining-provider-targeted",
  "attribute_type_id": "biolink:has_supporting_study_result",
  "attributes": [
    {
      "attribute_type_id": "biolink:supporting_text",
      "value": "There was gender difference for the protective effect of zinc against diabetes-induced pathogenic changes and the up-regulated levels of Nrf2 and MT in the aorta."
    },
    {
      "attribute_type_id": "biolink:publications",
      "value": "PMID:23536959"
    },
    {
      "attribute_type_id": "biolink:subject_location_in_text",
      "value": "57|61"
    },
    {
      "attribute_type_id": "biolink:object_location_in_text",
      "value": "137|141"
    }
  ],
  "value": "168073659"
},

(2) I missed the branch on the translator registry repo. That covers all the predication fields; however, the changes to get the edge-attributes are still needed.
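
For anyone checking the location values by hand, here is a minimal Python sketch of how the "start|end" strings in the example attributes could be read. It assumes the offsets are 0-based with an exclusive end, which matches the two example sentences in this comment but is not otherwise confirmed in this thread. The helper names parse_location and text_at are hypothetical and are not part of BTE or the JQ transformer.

```python
from typing import Tuple


def parse_location(value: str) -> Tuple[int, int]:
    """Split a pipe-delimited 'start|end' string into two integers."""
    start_text, end_text = value.split("|")
    start, end = int(start_text), int(end_text)
    if not 0 <= start <= end:
        raise ValueError(f"invalid location: {value!r}")
    return start, end


def text_at(sentence: str, location: str) -> str:
    """Return the substring of the sentence covered by the location.

    Assumption: offsets are 0-based and the end index is exclusive.
    """
    start, end = parse_location(location)
    return sentence[start:end]


# Supporting text and locations copied from the second example attribute.
sentence = (
    "There was gender difference for the protective effect of zinc "
    "against diabetes-induced pathogenic changes and the up-regulated "
    "levels of Nrf2 and MT in the aorta."
)

print(text_at(sentence, "57|61"))    # subject_location_in_text -> zinc
print(text_at(sentence, "137|141"))  # object_location_in_text -> Nrf2
```

Slicing this way recovers "zinc" and "Nrf2" for the second sentence, which is a quick sanity check that the subject and object locations line up with the supporting text.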

@colleenXu
Contributor

colleenXu commented Sep 25, 2024

@tokebe @rjawesome

Sorry for the delay in testing.

It looks good except for 1 thing: it isn't respecting the per-edge maximum of 50 unique sentences/publications, i.e., it should only create 50 of these special edge-attributes. Could there be a problem when multiple records get merged into 1 edge?

You can use the translator-api-registry branch I've made. The response-mapping should match Rohan's specs now.

Example

add to override: "1d288b3a3caf75d541ffaae3aab386c8": "https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/refs/heads/semmeddb_publication_refactor/semmeddb/smartapi.yaml",

Query:

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids": ["NCBIGene:6752", "NCBIGene:2641"],
                    "categories": ["biolink:Gene"]
                },
                "n1": {
                    "categories": ["biolink:Polypeptide"]
                }
            },
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:affected_by"]
                }
            }
        }
    }
}

Look at edge b6d8cea8e750400d3224490240a63871. It has 90 edge-attributes: 3 for evidence_count/KL/AT and 87 for the 87 publications. It should have been cut off at 50.

Other examples are:

  • 03bb8b167fae7431a88b50a593b029b0 (92 publications, 95 edge-attributes)
  • 3dbd5535b00e8123ecef8f89ca29a685 (57 publications)
  • 023c3ff7d5d3d9fa74d46a8a4ea675bd (73 publications)
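
A rough Python sketch for reproducing these counts from a saved response is below. It assumes the usual TRAPI layout in which message.knowledge_graph.edges is a dict keyed by edge ID and each edge carries an "attributes" list. The function names and the tiny synthetic response (edge_a, edge_b) are hypothetical, for illustration only.

```python
from typing import Any, Dict, List

STUDY_RESULT = "biolink:has_supporting_study_result"


def count_edge_attributes(response: Dict[str, Any]) -> Dict[str, Dict[str, int]]:
    """Tally total and study-result attributes for every edge in a response."""
    edges = (
        response.get("message", {})
        .get("knowledge_graph", {})
        .get("edges", {})
    )
    summary: Dict[str, Dict[str, int]] = {}
    for edge_id, edge in edges.items():
        attributes = edge.get("attributes") or []
        study_results = sum(
            1 for attr in attributes
            if attr.get("attribute_type_id") == STUDY_RESULT
        )
        summary[edge_id] = {
            "total": len(attributes),
            "study_results": study_results,
        }
    return summary


def edges_over_limit(response: Dict[str, Any], limit: int = 50) -> List[str]:
    """Return the IDs of edges with more study-result attributes than limit."""
    counts = count_edge_attributes(response)
    return sorted(
        edge_id for edge_id, c in counts.items() if c["study_results"] > limit
    )


# Synthetic response: edge_a mimics the 3 + 87 attribute shape described above.
other = [{"attribute_type_id": f"example:other_{i}"} for i in range(3)]
study = [{"attribute_type_id": STUDY_RESULT, "value": str(i)} for i in range(87)]
response = {
    "message": {
        "knowledge_graph": {
            "edges": {
                "edge_a": {"attributes": other + study},
                "edge_b": {"attributes": other + study[:10]},
            }
        }
    }
}

print(count_edge_attributes(response)["edge_a"])
print(edges_over_limit(response))
```

Running edges_over_limit on the real response should list any edge that still exceeds the 50-publication limit.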

@rjawesome
Contributor Author

I'll try to take a look at it this week.

@rjawesome
Contributor Author

rjawesome commented Sep 27, 2024

@colleenXu The number of publications in the SemMedDB edge attributes should now be capped at 50 (the maximum number of edge attributes in the example drops to 53).
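
To illustrate the intended behavior (this is not the code changed in this PR), here is a small Python sketch of capping the study-result attributes on a merged edge while leaving the other attributes alone. The function name cap_study_results and the placeholder attribute type IDs for the three non-publication attributes are assumptions made for the example.

```python
from typing import Any, Dict, List

STUDY_RESULT = "biolink:has_supporting_study_result"


def cap_study_results(
    attributes: List[Dict[str, Any]], limit: int = 50
) -> List[Dict[str, Any]]:
    """Keep every non-study-result attribute and the first `limit` study results.

    Applying the cap after records are merged into one edge means the limit
    holds per edge, not per record.
    """
    kept: List[Dict[str, Any]] = []
    seen = 0
    for attribute in attributes:
        if attribute.get("attribute_type_id") == STUDY_RESULT:
            if seen >= limit:
                continue  # over the cap: drop this study result
            seen += 1
        kept.append(attribute)
    return kept


# Placeholder stand-ins for the evidence_count/KL/AT attributes.
other = [
    {"attribute_type_id": "example:evidence_count"},
    {"attribute_type_id": "example:knowledge_level"},
    {"attribute_type_id": "example:agent_type"},
]
# 87 study results, as if several records had been merged into one edge.
study = [{"attribute_type_id": STUDY_RESULT, "value": str(i)} for i in range(87)]

capped = cap_study_results(other + study)
print(len(capped))  # 3 other attributes + 50 study results
```

With 3 other attributes and 87 study results, the capped list has 53 entries, which is consistent with the maximum reported above. This sketch does not de-duplicate sentences or publications; that would be a separate step.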

@colleenXu
Contributor

@tokebe

I've tested using the latest main branches + this PR + biothings/bte-server#45.
It looks good! The capping seems to be working now.

I think this is ready to put onto CI for testing this weekend...
