Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BTE creating self-edges via subclassing #850

Closed
tokebe opened this issue Aug 15, 2024 · 7 comments
Closed

BTE creating self-edges via subclassing #850

tokebe opened this issue Aug 15, 2024 · 7 comments
Assignees
Labels
bug Something isn't working On Test -> Prod

Comments

@tokebe
Copy link
Member

tokebe commented Aug 15, 2024

Example: NCATSTranslator/Feedback#907

Something is wrong in node-expansion and subclass handling, likely a missing check somewhere that should be causing edges to be dropped.

@tokebe
Copy link
Member Author

tokebe commented Aug 15, 2024

Looks like this is being caused by the following process:

  • Node is expanded from originalID -> descendantID
  • BTE gets record(s) back that shows descendantID -> has_phenotype -> originalID
    • This misses edge-management self-edge avoidance because originalID != descendantID
    • This also misses self-subclass avoidance because it's not technically a self-subclass
  • BTE rebinds descendantID -> has_phenotype -> originalID to originalID -> has_phenotype -> originalID with underlying support graph(s) originalID -> has_subclass -> descendantID -> has_phenotype -> originalID

This should be possible to catch at support graph creation time -- I'm double-checking for potential knock on effects of catching it there.

@rjawesome
Copy link
Contributor

rjawesome commented Aug 15, 2024

Note: there also needs to be tracking of which QNodes subclassing should be applied to, otherwise subclassing created for one QNode (ie. n0) could be applied to a more specific node bindings for a more specific QNode (ie. n1)
Example Query

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids":["MONDO:0006497"]
                },
                "n1": {
                    "categories": ["biolink:DiseaseOrPhenotypicFeature"],
                    "ids": ["MONDO:0016215"]
               }
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:has_phenotype"]
                }
            }
        }
    }
}

Result (subclassing changes n1 from MONDO:0016215 to MONDO:0006497 even though n1 is explicitly set to MONDO:0016215)

"results": [
      {
        "node_bindings": {
          "n0": [
            {
              "id": "MONDO:0016215",
              "attributes": []
            }
          ],
          "n1": [
            {
              "id": "MONDO:0006497",
              "attributes": []
            }
          ]
        },
        "analyses": [
          {
            "resource_id": "infores:biothings-explorer",
            "edge_bindings": {
              "e0": [
                {
                  "id": "MONDO:0016215-has_phenotype-MONDO:0006497-via_subclass",
                  "attributes": []
                }
              ]
            },
            "score": 0.7308028298005093
          }
        ]
      }
    ]
  },

@rjawesome rjawesome self-assigned this Aug 15, 2024
@colleenXu
Copy link
Collaborator

colleenXu commented Aug 16, 2024

HOLD ON

@rjawesome @tokebe: See my recent post where I wonder if the "duplicate second path" in the Feedback issue is actually coming from Aragorn NCATSTranslator/Feedback#907 (comment)

If that is the case, then the self-edge doesn't seem to be showing in the UI. Which would mean this issue of self-edges is less urgent/breaking.


There is a "true duplicate paths in the UI" situation going on, but that's happening because of how the ARS/UI handles our edges with subclassing support-graphs vs ones without NCATSTranslator/Feedback#776 (comment). This does make me wonder if Translator will ask us to undo our subclassing support-graph wrapping...


FYI Another example with a self-edge (CI, ran earlier today): https://arax.ci.transltr.io/?r=3c00776f-a6aa-473e-907f-3dec1367a566

Screenshot from ARAX-CI view

Screen Shot 2024-08-15 at 10 46 29 PM

But I didn't notice any duplicate paths issues in the UI view. There were similar paths (highlighted in the screenshot below), but I think they were from Aragorn, based on my analysis of the Test-instance version here.

This is why I started to suspect that the self-edge wasn't the source of the duplicate paths in 907 either...

Screenshot of UI-CI view

Screenshot 2024-08-15 at 23-39-25 Results - NCATS Biomedical Data Translator

@tokebe
Copy link
Member Author

tokebe commented Aug 16, 2024

I'm pretty sure regardless of ARS/UI-level merging errors, BTE is still creating self-edges via subclassing that we don't want? @colleenXu unless there's a reason to keep these self-edges, @rjawesome don't pause any active work on this issue.

It's fine if there's simultaneous effort outside of this issue toward problems that are related to NCATSTranslator/Feedback#907 but aren't related to this issue -- we can separately track a BTE-level problem here and assignees to the Feedback issue can decide whether our efforts are related/causal to 907.

@tokebe tokebe added the On Dev Related changes are deployed to Dev server label Aug 19, 2024
@colleenXu
Copy link
Collaborator

I tested the two linked queries and things looked good: the paths containing self-edges have been removed. I did see a different in scores/rank, often a slight decrease for the results that previously had self-edges.

But in the maturity-onset diabetes case below, the rank drastically dropped, but the score only decreased a little.

cerebral palsy (MONDO:0006497)

Before (ARAX-CI viewing, used BTE-test)

Screen Shot 2024-08-20 at 8 51 55 PM

After: cerebral-palsy-issue-850-testing.json.zip

Screen Shot 2024-08-20 at 8 52 49 PM

maturity-onset diabetes of the young (MONDO:0018911)

Before (ARAX-CI viewing)

Screen Shot 2024-08-20 at 8 57 11 PM

After: MoDY-issue-850.json.zip

Screen Shot 2024-08-20 at 8 57 48 PM


I also did a quick test of a MVP2 query and didn't notice anything off about its response (increases ADRB2 - NCBIGene:154).

@tokebe tokebe added On CI Related changes are deployed to CI server On CI -> Test and removed On Dev Related changes are deployed to Dev server On CI Related changes are deployed to CI server labels Aug 22, 2024
@colleenXu
Copy link
Collaborator

This issue is also tracking a fix to "edge-case query failures due to an error in subclassing".

Info from #789 (comment)

Context: the error revealed a bug in node-expansion, coming from MONDO ontology having non-MONDO ID info - including mismatched ID namespaces for parent-child pairs. HP has this issue too.

The first test query used an UBERON ID, which had some child info in those files with a different ID namespace.

The solution is to remove those mismatched pairs.

Jackson has put a fix into the subclassing-fix branches (deployment tracked in #850)

@tokebe tokebe added On Test Related changes are deployed to Test server and removed On CI -> Test labels Aug 23, 2024
@colleenXu colleenXu added bug Something isn't working On Test -> Prod and removed On Test Related changes are deployed to Test server labels Aug 28, 2024
@tokebe
Copy link
Member Author

tokebe commented Sep 3, 2024

Relevant changes deployed to Prod.

@tokebe tokebe closed this as completed Sep 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working On Test -> Prod
Projects
None yet
Development

No branches or pull requests

3 participants