01 Home
#Semantic Metabolomics
Welcome to the wiki!
This wiki is a hub containing supplemental information about the mass spectrometry ontology mbco, which currently contains a proof-of-concept RDFication of MassBank records.
Contents:
1. [Core model](https://github.com/sneumann/SemanticMetabolomics/wiki/01-Home#core-model)
   1. [MassBank](https://github.com/sneumann/SemanticMetabolomics/wiki/01-Home#massbank)
   1. [Metabolights](https://github.com/sneumann/SemanticMetabolomics/wiki/01-Home#metabolights)
   1. [Chebi](https://github.com/sneumann/SemanticMetabolomics/wiki/01-Home#chebi)
   1. [SPARQL-Endpoint](https://github.com/sneumann/SemanticMetabolomics/wiki/01-Home#sparql-endpoint)
2. [Generalized Workflow of creating an rdf-Resource](https://github.com/sneumann/SemanticMetabolomics/wiki/01-Home#generalized-workflow-of-creating-an-rdf-resource)
3. [Supplemental material](https://github.com/sneumann/SemanticMetabolomics/wiki/01-Home#supplemental-material)
   1. [Links and Tools](https://github.com/sneumann/SemanticMetabolomics/wiki/01-Home#links-and-tools)

##Scope of this wiki
This wiki captures the basic setup for a prototype RDF resource that mirrors essential MassBank data, following the Semantic Web and Linked Open Data (LOD) paradigm. It focuses first on mass spectrometry use cases and can later serve as a test case for wider efforts, e.g. within COSMOS, towards a semantic web of metabolomics data resources. The production version will therefore need to be reimplemented in a more decentralized way and will be expanded, e.g. to fulfill the requirements of the NMR world as well, for instance by linking to HMDB data.
####Overall set-up
We describe here an initial, experimental RDF dump of MassBank core data. It was generated in a centralized, local approach via makefiles that automatically convert selected parts of MassBank into one large RDF triple store. Semantic Web best practices were followed, drawing on the sources provided below. We were guided in particular by the Bio2RDF community and the RDF resources available at the EBI.
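
As a rough illustration of the conversion step described above, the following Python sketch turns a few fields of a MassBank-style record into N-Triples. It is not the makefile pipeline used for the dump: the base URI `http://example.org/massbank/`, the predicate names and the record content (including the accession `XX000001`) are placeholder assumptions made only for this example.

```python
# Minimal sketch: MassBank-style record text -> N-Triples lines.
# BASE and the predicate names are hypothetical placeholders,
# not the actual mbco vocabulary.
BASE = "http://example.org/massbank/"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Made-up record using the "TAG: value" layout of MassBank records.
RECORD = """ACCESSION: XX000001
CH$NAME: Example compound
CH$EXACT_MASS: 180.06339
"""


def parse_record(text):
    """Return a dict of tag -> value for every 'TAG: value' line."""
    fields = {}
    for line in text.splitlines():
        tag, sep, value = line.partition(": ")
        if sep:
            fields[tag] = value.strip()
    return fields


def escape(value):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def to_ntriples(fields):
    """Emit one triple per selected field, keyed on the record accession."""
    subject = f"<{BASE}record/{fields['ACCESSION']}>"
    mass = float(fields["CH$EXACT_MASS"])
    return [
        f'{subject} <{BASE}vocab/name> "{escape(fields["CH$NAME"])}" .',
        f'{subject} <{BASE}vocab/exactMass> "{mass}"^^<{XSD}double> .',
    ]


triples = to_ntriples(parse_record(RECORD))
print("\n".join(triples))
```

The accession is used as the key for the subject URI, so every triple derived from one record shares the same subject.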
##Core model
In concept, this project consists mainly of four parts:
- The RDFication of MassBank
- The RDFication of Metabolights
- The interlinking of the aforementioned resources using Bio2RDF Chebi
- Setting up a SPARQL-Endpoint using Virtuoso
####MassBank
04 RDF MassBank Resource Module
For a current state of MassBank interlinking, visit the RDF MassBank wikipage.
####Metabolights
06 RDF Metabolights Resource Module
The RDFication of Metabolights is in an advanced state and will be added to the internal SPARQL-Endpoint in the course of the next week (as a proof-of-concept data sample). When this is done, we will add more information.
####Chebi
05 RDF Chebi and Chembl Resource Modules
As Chebi is used as an interlinking tool, the first requirement is to have data samples from the databases that are to be interlinked. This section, like Metabolights, will therefore be added shortly.
####SPARQL-Endpoint
As the SPARQL-Endpoint depends on data, this section will be expanded considerably once data is available. The current idea is to create a simplified query interface with example queries, like the Chembl-Endpoint.
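
To give an idea of what an example query against such an endpoint could look like from a client, here is a small Python sketch that builds a SPARQL Protocol GET URL. The endpoint address is an assumption (a local Virtuoso instance commonly serves SPARQL under `/sparql`), and the query is a generic placeholder, not one of the project's use-case queries.

```python
from urllib.parse import urlencode

# Hypothetical endpoint address; adjust to the actual server.
ENDPOINT = "http://localhost:8890/sparql"

# Generic placeholder query that lists ten arbitrary triples.
EXAMPLE_QUERY = """SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10"""


def build_query_url(endpoint, query):
    """Build a GET URL; the SPARQL Protocol passes the text as 'query'."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, EXAMPLE_QUERY)
print(url)
```

The resulting URL could then be fetched with `urllib.request`, for instance with an `Accept: application/sparql-results+json` header, once an endpoint is actually running.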
##Generalized workflow of creating an rdf Resource
Abstracted (general) workflow for creating Linked Data resources (inspired by http://www.w3.org/2001/sw/hcls/notes/hcls-rdf-guide/):
(add subheaders, make it a conditional graph?)
- Review external existing RDF resources to re-use and integrate with
- Define Use case: Determines scope
- Define Competency Questions: Defines domain dependent content links
- Select the data sources, or portions thereof, to be converted to RDF in cases where re-use is not an option
- Identify the items of interest in your domain, the things whose properties and relationships we want to describe
- Agree on which items should be URIs and which should stay literals, e.g. float or string values
- Identify persistent HTTP URIs for information and non-information resources (use hash URIs here)
  - Choose a robust namespace
  - Use http://identifiers.org/
  - Agree on the MIME types you want to provide as additional, content-negotiated representations when the URI is dereferenced, e.g. HTML in addition to RDF/XML
- Generate RDF model
  - First sketch hand-drawn graph models for all envisioned modules/namespaces, with the links/edges between resources needed to answer the competency questions (CQs)
  - Add core predicates/edges/relations, including inverses/backlinks, and the namespaces (NS) to take them from
  - Add a link spanning three nodes across multiple namespaces, e.g. to show nested queries and how URIs are used as primary keys to pass information along
  - List which essential literals in the resource modules can become robust URIs
    - They must be translatable with high confidence (e.g. SpeciesLabel="brassicaceae" → NCITax:ID723345)
  - Provide URI examples in accordance with identifiers.org (dereferenceable via content negotiation) for all key nodes/predicates in all NS modules, both existing ones and own ones (get a server name here for the HTTP URL that serves the own NS)
  - To align chemical IDs, use https://www.ebi.ac.uk/unichem/
- Generate ontology defining the formal semantics of the RDF model.
- Generate example RDF triples in Turtle or RDF/XML syntax within the scope of the CQs, i.e.
  - from an own NS URI (MassBank) to a literal (e.g. mass value as float?)
  - from an own NS URI (MassBank) to an own NS URI (MassBank)
  - from an own NS URI (MassBank) to an established external resource URI (e.g. Chebi from Bio2RDF or BioPortal)
- Publish the RDF data as Linked Data or through a SPARQL endpoint, e.g. set up a Virtuoso server with an endpoint and configure it
- Agree on information from different sources that merges in naturally and allows synergistic insights (context enrichment)
- Set RDF links between data from different external sources
- Create Semantic Web applications using the published data.
- Add example SPARQL queries along the use cases in a) human-readable form and b) annotated SPARQL, e.g. turtle syntax
  - Queries should make use of the example triple store, and the accompanying result data sets should be documented
- Build and add query library
- Write Documentation
- Make your LOD known to Semantic Web crawlers, tools and websites
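
The example-triple and literal-to-URI steps of the workflow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `http://example.org/massbank/` namespace, the predicate names, the accession `XX000001`, the instrument URI and the lookup table are all hypothetical, and the ChEBI accession is used purely as a sample identifier. Only the `http://identifiers.org/<collection>/<accession>` URI shape follows identifiers.org.

```python
# Minimal sketch of the three example triple kinds plus a literal-to-URI lookup.
MB = "http://example.org/massbank/"  # hypothetical own namespace
XSD = "http://www.w3.org/2001/XMLSchema#"


def identifiers_uri(collection, accession):
    """Compose an identifiers.org-style URI from collection and accession."""
    return f"http://identifiers.org/{collection}/{accession}"


# Hypothetical lookup table: only labels that can be translated with
# high confidence are listed and therefore become URIs.
LABEL_TO_URI = {"brassicaceae": "http://example.org/taxon/brassicaceae"}


def as_object(label):
    """Return a URI term if the label is in the table, else a plain literal."""
    uri = LABEL_TO_URI.get(label.strip().lower())
    return f"<{uri}>" if uri else '"' + label.replace('"', '\\"') + '"'


record = f"<{MB}record/XX000001>"
triples = [
    # own NS URI -> literal
    f'{record} <{MB}vocab/exactMass> "180.06339"^^<{XSD}double> .',
    # own NS URI -> own NS URI
    f"{record} <{MB}vocab/hasInstrument> <{MB}instrument/1> .",
    # own NS URI -> established external resource URI
    f"{record} <{MB}vocab/hasCompound> <{identifiers_uri('chebi', 'CHEBI:17234')}> .",
    # label that is promoted to a URI because it is in the lookup table
    f"{record} <{MB}vocab/species> {as_object('Brassicaceae')} .",
    # label that stays a literal because no confident mapping exists
    f"{record} <{MB}vocab/species> {as_object('unknown plant')} .",
]
print("\n".join(triples))
```

Keeping unmapped labels as literals, rather than guessing a URI, reflects the requirement that a literal is only turned into a URI when the translation is reliable.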