To move past the implicit Biolink + KGX format assumptions of the current Koza writers, and to support writing multiple output files from a single ingest, I think we should define a new writer configuration based on LinkML models. Supplying a schema and a list of classes to a writer, along with an explicit output filename, would let the output columns be derived dynamically in a model-agnostic way (a longer-term Koza goal). It would also be less brittle than the current listing of node and edge properties, where a property set in the Python but left out of the node/edge properties is never actually written to the file.
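To make the column-derivation idea concrete, here is a minimal sketch of how a writer might compute its output columns from a schema and a list of classes. The `SCHEMA` dict is a hypothetical, heavily simplified stand-in for a LinkML model, and `class_slots` and `output_columns` are illustrative names, not existing Koza functions. A real implementation would more likely lean on LinkML's own schema tooling than on hand-rolled inheritance.

```python
from typing import Dict, List

# Hypothetical, simplified stand-in for a LinkML schema: each class lists
# its own slots and optionally names a parent class via "is_a".
SCHEMA: Dict[str, Dict[str, dict]] = {
    "classes": {
        "NamedThing": {"slots": ["id", "name", "category"]},
        "Gene": {"is_a": "NamedThing", "slots": ["symbol"]},
        "Association": {"slots": ["id", "subject", "predicate", "object"]},
    }
}


def class_slots(schema: dict, class_name: str) -> List[str]:
    """Return the slots of a class, inherited (is_a) slots first."""
    classes = schema["classes"]
    if class_name not in classes:
        raise KeyError(f"class {class_name!r} is not defined in the schema")
    cls = classes[class_name]
    parent = cls.get("is_a")
    slots = class_slots(schema, parent) if parent else []
    for slot in cls.get("slots", []):
        if slot not in slots:
            slots.append(slot)
    return slots


def output_columns(schema: dict, class_names: List[str]) -> List[str]:
    """Union of slots across the requested classes, in first-seen order."""
    columns: List[str] = []
    for name in class_names:
        for slot in class_slots(schema, name):
            if slot not in columns:
                columns.append(slot)
    return columns


if __name__ == "__main__":
    print(output_columns(SCHEMA, ["Gene"]))
```

Because the columns come from the model rather than from a hand-maintained list, a property that exists on the class cannot be silently dropped by forgetting to list it.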
A challenge is the method of specifying the schema. The two initial use cases I'm imagining are writing to Biolink node or association classes, or to SSSOM associations. In both cases it probably makes the most sense to load the model YAML from an installed package via importlib, so my initial specification is
{package}:{model.yaml}
which for our standard Koza STRINGDB example looks like:
With the expectation that the Python part of the Koza transform would change from:
to the slightly more verbose, but more specific:
Note: I may back away from basing this entirely on LinkML, even though that's the big win. There have been times when we wanted to export an additional file, for example for debugging or QC purposes, and in those cases it would be nice to have the option of simply specifying a list of columns.
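As a rough illustration of the `{package}:{model.yaml}` specification described above, the sketch below splits the string and reads the YAML from an installed package with `importlib.resources` (Python 3.9+). The function names `parse_schema_spec` and `load_schema_text` are hypothetical, and the sketch assumes the YAML file sits at the top level of the named package. A real model package may keep it in a subdirectory, which the specification would then need to accommodate.

```python
from importlib import resources
from typing import Tuple


def parse_schema_spec(spec: str) -> Tuple[str, str]:
    """Split a '{package}:{model.yaml}' string into (package, filename)."""
    package, sep, filename = spec.partition(":")
    if not sep or not package or not filename:
        raise ValueError(
            f"schema spec {spec!r} must look like 'package:model.yaml'"
        )
    return package, filename


def load_schema_text(spec: str) -> str:
    """Read the schema YAML text from the installed package named in spec."""
    package, filename = parse_schema_spec(spec)
    # resources.files() locates the package on the import path; the file
    # is assumed to live directly inside that package.
    return resources.files(package).joinpath(filename).read_text(encoding="utf-8")
```

With a hypothetical package `my_model_package` that ships `my_model.yaml`, `load_schema_text("my_model_package:my_model.yaml")` would return the YAML text, ready to hand to a schema loader. A writer that also accepts a plain list of columns could skip this step entirely when no schema is given.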