<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>OmicsDI Blog</title>
    <link>https://omicsdi.github.io/feed/index.xml</link>
    <description>Recent content on OmicsDI Blog</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Mon, 27 Feb 2017 22:56:34 +0000</lastBuildDate>
    <atom:link href="https://omicsdi.github.io/feed/index.xml" rel="self" type="application/rss+xml" />
    
    <item>
      <title>Introduction to OmicsDI API</title>
      <link>https://omicsdi.github.io/post/introduction-api/</link>
      <pubDate>Mon, 27 Feb 2017 22:56:34 +0000</pubDate>
      
      <guid>https://omicsdi.github.io/post/introduction-api/</guid>
      <description>

&lt;p&gt;Most data in the Datasets Discovery Index can be accessed
programmatically using a &lt;a href=&#34;http://www.omicsdi.org/ws&#34;&gt;RESTful API&lt;/a&gt;.
The API implementation is based on the Spring REST framework.&lt;/p&gt;

&lt;h2 id=&#34;web-browsable-api&#34;&gt;Web-browsable API&lt;/h2&gt;

&lt;p&gt;The OmicsDI API is web browsable, which means that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The query results returned by the API are available in both JSON and XML formats, so they can be read by humans and processed programmatically.&lt;/li&gt;
&lt;li&gt;The main &lt;a href=&#34;http://www.omicsdi.org/ws&#34;&gt;RESTful API&lt;/a&gt; page provides a simple web-based user
interface, which allows developers to familiarize themselves with the API and get a
better sense of the OmicsDI data before writing a single line of code.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many resources are hyperlinked, so it&amp;rsquo;s possible to navigate the API in the browser.&lt;/p&gt;

&lt;p&gt;As a result, developers can familiarize themselves with the API and get a better sense of the OmicsDI data.&lt;/p&gt;

&lt;h2 id=&#34;api-documentation&#34;&gt;API documentation&lt;/h2&gt;

&lt;p&gt;Responses containing multiple entries have the following fields:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;count&lt;/code&gt; is the number of entries in the matching set.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;datasets&lt;/code&gt; is an array of datasets.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;facets&lt;/code&gt; is an array of facets.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example&lt;/p&gt;


    http://www.omicsdi.org/ws/dataset/search?query=human
    {
    &#34;count&#34;: 733,
    &#34;datasets&#34;: [
    {
    &#34;id&#34;: &#34;PXD000456&#34;,
    &#34;source&#34;: &#34;pride&#34;,
    &#34;title&#34;: &#34;Human glomerular extracellular matrix analysed by LC-MSMS&#34;,
    &#34;description&#34;: &#34;Extracellular matrix proteins were isolated from human glomeruli and analysed by LC-MSMS&#34;,
    &#34;keywords&#34;: [
    &#34;Human&#34;,
    &#34;kidney&#34;,
    &#34;glomerulus&#34;,
    &#34;extracellular matrix&#34;
    ],
    &#34;organisms&#34;: [
    {
    &#34;acc&#34;: &#34;9606&#34;,
    &#34;name&#34;: &#34;Homo sapiens&#34;
    }
    ],
    &#34;publicationDate&#34;: &#34;20140122&#34;
    },
    // 19 more datasets
    ],
    &#34;facets&#34;: [
    {
    &#34;id&#34;: &#34;modification&#34;,
    &#34;label&#34;: &#34;Modification&#34;,
    &#34;total&#34;: 181,
    &#34;facetValues&#34;: [
    {
    &#34;label&#34;: &#34;Unknown modification&#34;,
    &#34;value&#34;: &#34;unknown modification&#34;,
    &#34;count&#34;: &#34;5&#34;
    },
    //other facet values
    ]
    },
    //other facets
    ]
    }
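
&lt;p&gt;As a rough illustration of how a client might read these fields, the Python sketch below parses a trimmed copy of the response above. The &lt;code&gt;summarize&lt;/code&gt; helper and the &lt;code&gt;SAMPLE&lt;/code&gt; string are hypothetical names introduced here, not part of the OmicsDI API, and the sample omits the illustrative comments so that it is valid JSON.&lt;/p&gt;

```python
import json

# Trimmed sample shaped like the search response shown above.
# Values are copied from that example; SAMPLE itself is only an illustration.
SAMPLE = '''
{
  "count": 733,
  "datasets": [
    {"id": "PXD000456", "source": "pride",
     "title": "Human glomerular extracellular matrix analysed by LC-MSMS",
     "organisms": [{"acc": "9606", "name": "Homo sapiens"}],
     "publicationDate": "20140122"}
  ],
  "facets": [
    {"id": "modification", "label": "Modification", "total": 181,
     "facetValues": [{"label": "Unknown modification",
                      "value": "unknown modification", "count": "5"}]}
  ]
}
'''


def summarize(response_text):
    """Return (count, dataset ids, {facet id: {facet value: count}})."""
    data = json.loads(response_text)
    ids = [dataset["id"] for dataset in data.get("datasets", [])]
    facets = {
        facet["id"]: {
            value["value"]: int(value["count"])  # counts arrive as strings
            for value in facet.get("facetValues", [])
        }
        for facet in data.get("facets", [])
    }
    return data["count"], ids, facets


count, ids, facets = summarize(SAMPLE)
print(count, ids, facets)
```

&lt;p&gt;Note that the facet value counts are strings in the example response, so the sketch converts them to integers.&lt;/p&gt;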


&lt;p&gt;Responses containing a single dataset have some extra navigation fields and do not include the facets:&lt;/p&gt;


    http://www.omicsdi.org/ws/dataset/get?acc=PXD001848&amp;database=PRIDE
    {
    &#34;id&#34;: &#34;PXD001848&#34;,
    &#34;name&#34;: &#34;Global Analysis of Protein Folding Thermodynamics for Disease State Characterization, MCF7 vs MDAMB231&#34;,
    &#34;description&#34;: &#34;Protein biomarkers can be used to characterize and diagnose disease states such as cancer. They can also serve as therapeutic targets. Current methods for protein biomarker discovery, which generally rely on the large-scale analysis of gene and/or protein expression levels, fail to detect protein biomarkers with disease-related functions and unaltered expression levels. Here we describe the large-scale use of thermodynamic measurements of protein folding and stability for disease state characterization and the discovery of protein biomarkers. Using the Stable Isotope Labeling with Amino Acids in Cell Culture and Stability of Proteins from Rates of Oxidation (SILAC-SPROX) technique, we assayed ~800 proteins for protein folding and stability changes in three different cell culture models of breast cancer including the MCF-10A, MCF-7, and MDA-MB-231 cell lines. The thermodynamic stability profiles generated here created distinct molecular markers for the three cell lines, and a significant fraction (~45%) of the differentially stabilized proteins did not have altered expression levels. Thus, the protein biomarkers reported here created novel molecular signatures of breast cancer and provided additional insight into the molecular basis of the disease. Our results establish the utility of protein folding and stabilitymeasurements for the study of disease processes.&#34;,
    &#34;keywords&#34;: null,
    &#34;publicationDate&#34;: &#34;20150410&#34;,
    &#34;publications&#34;: [
    {
    &#34;id&#34;: &#34;25825992&#34;,
    &#34;publicationDate&#34;: &#34;2015-04-09&#34;,
    &#34;title&#34;: &#34;Global analysis of protein folding thermodynamics for disease state characterization.&#34;,
    &#34;pubabstract&#34;: &#34;Current methods for the large-scale characterization of disease states generally rely on the analysis of gene and/or protein expression levels. These existing methods fail to detect proteins with disease-related functions and unaltered expression levels. Here we describe the large-scale use of thermodynamic measurements of protein folding and stability for the characterization of disease states. Using the Stable Isotope Labeling with Amino Acids in Cell Culture and Stability of Proteins from Rates of Oxidation (SILAC-SPROX) technique, we assayed ~800 proteins for protein folding and stability changes in three different cell culture models of breast cancer including the MCF-10A, MCF-7, and MDA-MB-231 cell lines. The thermodynamic stability profiles generated here created distinct molecular markers to differentiate the three cell lines, and a significant fraction (~45%) of the differentially stabilized proteins did not have altered expression levels. Thus, the differential thermodynamic profiling strategy reported here created novel molecular signatures of breast cancer and provided additional insight into the molecular basis of the disease. Our results establish the utility of protein folding and stability measurements for the study of disease processes, and they suggest that such measurements may be useful for biomarker discovery in disease.&#34;,
    &#34;cycle&#34;: &#34;testcyclehere&#34;
    }
    ],
    &#34;related_datasets&#34;: null,
    &#34;data_protocol&#34;: &#34;Peak lists were extracted from the raw LC-MS/MS data files and the data were searched against the 20265 human proteins in the 2014-04 release of the UniProt Knowledgebase (downloaded at ftp://ftp.uniprot.org/pub/databases/uniprot/current_releases/release-2014_04/knowledgebase/) using Maxquant 1.3.0.5.41 The following modifications were used: methyl methanethiosulfonate at cysteine as a fixed modification, SILAC labeling of lysine (13C614N2) and arginine (13C6), and variable (0-1) oxidation of methionine and deamidation of Asparagine and Glutamine (N and Q), and acetylation of the protein N-terminus. The enzyme was set as Trypsin, and up to 2 missed cleavages were permitted. The false discovery rate for peptide and protein identifications was set to1%, and rest of the parameters were set at the default settings. As part of the default settings, the mass tolerance for precursor ions was set to 20 ppm for the first search where initial mass recalibration was completed and a 6 ppm precursor mass tolerance was used for the main search. The mass tolerance for fragment ions was 0.5 Da. We also included match between runs and re-quantification of the searched peptides. The search results were exported toExcel for further data analysis as described below. Only the protein and peptide identifications with no-zeropositive ratios (H/L &gt;0) were used in subsequent data analysis steps. The methionine-containing peptides wereselected, and those methionine-containing peptides consistently identified in the protein samples derived from sixor more denaturant-containing buffers were assayed. For the methionine-containing peptides, a single averaged H/Lratio was calculated for each peptide sequence and each charge state at each denaturant concentration. 
Similarly, for each analysis, a median H/L ratio was determined for each protein using the H/L ratios measured for all thenon-methionine-containing peptides identified in all the denaturant concentrations for a given protein. These medianH/L ratios were used to select hits with H/L&gt;2 fold in the protein expression level analyses. For hit peptide andprotein selection in the thermodynamic analyses, all the H/L ratios generated for the non-methionine containingpeptides from a given protein were divided by the median H/L ratio for that protein in order to generate normalizedH/L ratios for each non-methionine containing peptide. These normalized H/L ratios were log2 transformed. Thenormalized and log2 transformed H/L ratios generated for the non-methionine-containing peptides in a given analysiswere used to determine the 5th and 95th percentiles values used in subsequent analysis of methionine-containingpeptides. The averaged H/L ratios calculated for each methionine-containing peptides were also normalized and log2transformed. The methionine-containing peptides and proteins with log2 transformed H/L ratios less than the 5thpercentile or greater than the 95th percentile values determined above were selected and then visually inspected todetermine which peptides had altered H/L ratios at 2 or more consecutive denaturant concentrations to generate aninitial list of protein hits.&#34;,
    &#34;sample_protocol&#34;: &#34;SILAC labeled MCF-7 and MDA-MB-231 cell lysates were prepared according to established SILAC protocols. Aliquots of each lysate were distributed into a series of denaturant-containing buffers, reacted with hydrogen peroxide under conditions that selectively oxidize exposed methionine residues, and quenched with the addition of excess methionine. The light and heavy samples generated at matching denaturant concentration were combined. Each combined protein sample was submitted to a bottom-up, solution-phase, shotgun proteomics analysis using LC-MS/MS. Ultimately, L/H ratios were obtained for the peptides detected at each denaturant concentration, and the denaturant dependence of the L/H ratio's was examined.&#34;
    }


&lt;h3 id=&#34;pagination&#34;&gt;Pagination&lt;/h3&gt;

&lt;p&gt;Responses containing multiple datasets are paginated to prevent accidental downloads
of large amounts of data and to speed up the API. The page size is controlled by the &lt;code&gt;size&lt;/code&gt; parameter. Its default value is 20 datasets per page, and the maximum number of datasets per page is 100.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;start&lt;/code&gt; parameter gives the zero-based index (starting from 0, not 1) of the first dataset on the page. Its default value is 0.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;http://www.omicsdi.org/ws/dataset/search?query=human&amp;amp;start=0&amp;amp;size=50&#34;&gt;http://www.omicsdi.org/ws/dataset/search?query=human&amp;amp;start=0&amp;amp;size=50&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;a href=&#34;http://www.omicsdi.org/ws/dataset/search?query=human&amp;amp;start=0&amp;amp;size=20&#34;&gt;http://www.omicsdi.org/ws/dataset/search?query=human&amp;amp;start=0&amp;amp;size=20&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
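
&lt;p&gt;To walk through a whole result set, a client can step the start value by the page size. The Python sketch below only builds the page URLs and makes no network calls. The &lt;code&gt;page_urls&lt;/code&gt; helper is a hypothetical name, and the total of 733 is simply the count from the earlier example response.&lt;/p&gt;

```python
from urllib.parse import urlencode

BASE_URL = "http://www.omicsdi.org/ws/dataset/search"
MAX_PAGE_SIZE = 100  # maximum page size stated in the text above


def page_urls(query, total, size=20):
    """Build one search URL per page needed to cover `total` datasets."""
    # Assumption: out-of-range sizes are clamped client-side to 1..100.
    size = max(1, min(size, MAX_PAGE_SIZE))
    return [
        BASE_URL + "?" + urlencode({"query": query, "start": start, "size": size})
        for start in range(0, total, size)  # start is zero-based
    ]


for url in page_urls("human", 733, size=100):
    print(url)
```

&lt;p&gt;In practice the total would be read from the count field of the first response rather than hard-coded.&lt;/p&gt;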

&lt;h3 id=&#34;sort&#34;&gt;Sort&lt;/h3&gt;

&lt;p&gt;The resulting datasets can be sorted by title, description, publication date, accession id, or relevance to the query term, using the &lt;code&gt;sort_field&lt;/code&gt; parameter.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;http://www.omicsdi.org/ws/dataset/search?query=human&amp;amp;sort_field=id&#34;&gt;http://www.omicsdi.org/ws/dataset/search?query=human&amp;amp;sort_field=id&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;http://www.omicsdi.org/ws/dataset/search?query=human&amp;amp;sort_field=publication_date&#34;&gt;http://www.omicsdi.org/ws/dataset/search?query=human&amp;amp;sort_field=publication_date&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&#34;filtering&#34;&gt;Filtering&lt;/h3&gt;

&lt;p&gt;The API supports several filtering operations that complement the main &lt;code&gt;OmicsDI&lt;/code&gt; search functionality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Filtering by search term&lt;/strong&gt;: there is one URL parameter, &lt;code&gt;query&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Examples&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;http://www.omicsdi.org/ws/dataset/search?query=human&#34;&gt;http://www.omicsdi.org/ws/dataset/search?query=human&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;a href=&#34;http://www.omicsdi.org/ws/dataset/search?query=cancer&#34;&gt;http://www.omicsdi.org/ws/dataset/search?query=cancer&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Filtering by omics type&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;The omics type can be specified by adding terms in the query URL parameter with key: omics_type (possible values: Proteomics, Metabolomics, Genomics, Transcriptomics).&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;http://www.omicsdi.org/ws/dataset/search?query=human%20AND%20omics_type:%22Proteomics%22&#34;&gt;http://www.omicsdi.org/ws/dataset/search?query=human AND omics_type:&amp;ldquo;Proteomics&amp;rdquo;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Filtering by database&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The database can be specified by adding terms in the query URL parameter with key: repository (possible values: MassIVE, Metabolights, PeptideAtlas, PRIDE, GPMDB, EGA, Metabolomics Workbench, MetabolomeExpress, GNPS, ArrayExpress, ExpressionAtlas).&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;http://www.omicsdi.org/ws/dataset/search?query=human%20AND%20repository:%22Metabolights%22&#34;&gt;http://www.omicsdi.org/ws/dataset/search?query=human AND repository:&amp;ldquo;Metabolights&amp;rdquo;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Filtering by Organism&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The organism can be specified by adding terms in the query URL parameter with key: TAXONOMY (possible values must be the TAXONOMY id: 9606, 10090&amp;hellip;).&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;http://www.omicsdi.org/ws/dataset/search?query=human%20AND%20TAXONOMY:%229606%22&#34;&gt;http://www.omicsdi.org/ws/dataset/search?query=human AND TAXONOMY:&amp;ldquo;9606&amp;rdquo;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Filtering by Tissue&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The tissue can be specified by adding terms in the query URL parameter with key: tissue (possible values: Liver, Cell culture, Brain, Lung&amp;hellip;).&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;http://www.omicsdi.org/ws/dataset/search?query=human%20AND%20tissue:%22Brain%22&#34;&gt;http://www.omicsdi.org/ws/dataset/search?query=human AND tissue:&amp;ldquo;Brain&amp;rdquo;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Filtering by Disease&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The disease can be specified by adding terms in the query URL parameter with key: disease (possible values: Breast cancer, Lymphoma, Carcinoma, prostate adenocarcinoma&amp;hellip;).&lt;/p&gt;

&lt;p&gt;Examples&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;http://www.omicsdi.org/ws/dataset/search?query=human%20AND%20disease:%22Breast%20cancer%22&#34;&gt;http://www.omicsdi.org/ws/dataset/search?query=human AND disease:&amp;ldquo;Breast cancer&amp;rdquo;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Filtering by Modification (in proteomics)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Modifications (in proteomics) can be specified by adding terms in the query URL parameter with key: modification (possible values: Deamidated residue, Deamidated, Monohydroxylated residue, Iodoacetamide derivatized residue&amp;hellip;).&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;http://www.omicsdi.org/ws/dataset/search?query=human%20AND%20modification:%22iodoacetamide%20derivatized%20residue%22&#34;&gt;http://www.omicsdi.org/ws/dataset/search?query=human AND modification:&amp;ldquo;iodoacetamide derivatized residue&amp;rdquo;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Filtering by Instruments &amp;amp; Platforms&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Instruments &amp;amp; Platforms can be specified by adding terms in the query URL parameter with key: instrument_platform (possible values: QSTAR, LTQ Orbitrap, Q Exactive, LTQ&amp;hellip;).&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;http://www.omicsdi.org/ws/dataset/search?query=human%20AND%20instrument_platform:%22Q%20Exactive%22&#34;&gt;http://www.omicsdi.org/ws/dataset/search?query=human AND instrument_platform:&amp;ldquo;Q Exactive&amp;rdquo;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Filtering by Publication Date&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Publication Date can be specified by adding terms in the query URL parameter with key: &amp;ldquo;publication_date&amp;rdquo; (possible values: 2015, 2014, 2013&amp;hellip;).&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;http://www.omicsdi.org/ws/dataset/search?query=human%20AND%20publication_date:%222015%22&#34;&gt;http://www.omicsdi.org/ws/dataset/search?query=human AND publication_date:&amp;ldquo;2015&amp;rdquo;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Filtering by Technology Type&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Technology Type can be specified by adding terms in the query URL parameter with key: &amp;ldquo;technology_type&amp;rdquo; (possible values: Mass Spectrometry, Bottom-up proteomics, Gel-based experiment, Shotgun proteomics&amp;hellip;).&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;http://www.omicsdi.org/ws/dataset/search?query=human%20AND%20technology_type:%22Mass%20Spectrometry%22&#34;&gt;http://www.omicsdi.org/ws/dataset/search?query=human AND technology_type:&amp;ldquo;Mass Spectrometry&amp;rdquo;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Combined filters&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Any filters can be combined to narrow down the query using the AND operator. More logical operators will be supported in the future.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;http://www.omicsdi.org/ws/dataset/search?query=human%20AND%20technology_type:%22Shotgun%20proteomics%22%20AND%20modification:%22monohydroxylated%20residue%22&#34;&gt;http://www.omicsdi.org/ws/dataset/search?query=human AND technology_type:&amp;ldquo;Shotgun proteomics&amp;rdquo; AND modification:&amp;ldquo;monohydroxylated residue&amp;rdquo;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
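
&lt;p&gt;The filter pattern used throughout this section (a search term followed by key:value pairs joined with AND) can be assembled programmatically. The Python sketch below is one possible way to do it. The &lt;code&gt;build_query&lt;/code&gt; and &lt;code&gt;search_url&lt;/code&gt; helpers are hypothetical names, and the sketch assumes that filter values contain no double quotes.&lt;/p&gt;

```python
from urllib.parse import quote

BASE_URL = "http://www.omicsdi.org/ws/dataset/search"


def build_query(term, **filters):
    """Join a search term and key:"value" filters with the AND operator."""
    parts = [term]
    for key, value in filters.items():
        # Assumption: values contain no double quotes of their own.
        parts.append('{}:"{}"'.format(key, value))
    return " AND ".join(parts)


def search_url(term, **filters):
    """Percent-encode the query, keeping ':' literal as in the links above."""
    return BASE_URL + "?query=" + quote(build_query(term, **filters), safe=":")


print(search_url("human",
                 technology_type="Shotgun proteomics",
                 modification="monohydroxylated residue"))
```

&lt;p&gt;Keyword arguments keep their order, so the filters appear in the URL in the order they were passed.&lt;/p&gt;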
</description>
    </item>
    
    <item>
      <title>Claiming datasets in OmicsDI</title>
      <link>https://omicsdi.github.io/post/claiming-datasests/</link>
      <pubDate>Mon, 20 Feb 2017 14:04:54 +0000</pubDate>
      
      <guid>https://omicsdi.github.io/post/claiming-datasests/</guid>
      <description>

&lt;p&gt;One of the first requests the OmicsDI team received (January 2017) after the official release of the resource was the ability to log in to the system and associate a user with their public datasets. The original request was made informally by Professor Rob Beynon of Liverpool University (@astacus) and answered by Laurent Gatto (@lgatt0).&lt;/p&gt;

&lt;p&gt;&lt;blockquote class=&#34;twitter-tweet&#34; data-lang=&#34;en&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;try &lt;a href=&#34;https://twitter.com/OmicsDI&#34;&gt;@OmicsDI&lt;/a&gt;. That should do it, I believe&lt;/p&gt;&amp;mdash; Laurent Gⓐtt⓪ (@lgatt0) &lt;a href=&#34;https://twitter.com/lgatt0/status/816199103495421952&#34;&gt;January 3, 2017&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src=&#34;//platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;&lt;/p&gt;

&lt;p&gt;For more than 3 months, the OmicsDI team (www.omicsdi.org) has been working on this feature, and we are proud to announce its formal release today. The user profile in OmicsDI has a simple aim:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Provide a central resource where scientists can aggregate all their public omics datasets previously deposited in omics archives and repositories.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The OmicsDI Profile is built on two different components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The User Profile: General information about the User (Name, email, affiliation, Account connections).&lt;/li&gt;
&lt;li&gt;My Datasets: List of public datasets. &lt;a href=&#34;https://scholar.google.co.uk/intl/en/scholar/about.html&#34;&gt;Similar to Google Scholar for publications&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;omicsdi-user-profile&#34;&gt;OmicsDI User Profile&lt;/h2&gt;

&lt;p&gt;The user can create an OmicsDI account using one of five existing accounts: &lt;a href=&#34;https://orcid.org/&#34;&gt;ORCID&lt;/a&gt;, &lt;a href=&#34;https://www.elixir-europe.org/&#34;&gt;ELIXIR&lt;/a&gt;, &lt;a href=&#34;http://www.twitter.com&#34;&gt;Twitter&lt;/a&gt;, &lt;a href=&#34;http://www.github.com&#34;&gt;GitHub&lt;/a&gt; and &lt;a href=&#34;http://www.facebook.com&#34;&gt;Facebook&lt;/a&gt;.&lt;/p&gt;


&lt;figure &gt;
    
        &lt;img src=&#34;https://omicsdi.github.io/media/login-option.png&#34; /&gt;
    
    
    &lt;figcaption&gt;
        &lt;h4&gt;Figure 1: Login Button Home Page of OmicsDI&lt;/h4&gt;
        
    &lt;/figcaption&gt;
    
&lt;/figure&gt;


&lt;p&gt;When the user is created, an empty profile is generated (Figure 2). The Profile contains two main sections: &lt;strong&gt;Edit Profile&lt;/strong&gt; and &lt;strong&gt;Edit Datasets&lt;/strong&gt;.&lt;/p&gt;


&lt;figure &gt;
    
        &lt;img src=&#34;https://omicsdi.github.io/media/profile-empty-page.png&#34; /&gt;
    
    
    &lt;figcaption&gt;
        &lt;h4&gt;Figure 2: Profile Page in OmicsDI&lt;/h4&gt;
        
    &lt;/figcaption&gt;
    
&lt;/figure&gt;


&lt;p&gt;When a user logs in to OmicsDI, some general information is pulled from the source account (e.g. ORCID). However, this data is incomplete, so the first step is to &lt;strong&gt;Edit Profile&lt;/strong&gt;, where the user can update the &lt;code&gt;photo&lt;/code&gt;, &lt;code&gt;affiliation&lt;/code&gt;, &lt;code&gt;email&lt;/code&gt; and &lt;code&gt;short biography&lt;/code&gt;, and set the &lt;code&gt;Make my profile public&lt;/code&gt; option. This last option (Figure 3) allows non-registered users to see your profile (datasets).&lt;/p&gt;


&lt;figure &gt;
    
        &lt;img src=&#34;https://omicsdi.github.io/media/make-profile.png&#34; /&gt;
    
    
    &lt;figcaption&gt;
        &lt;h4&gt;Figure 3: Make my profile public&lt;/h4&gt;
        
    &lt;/figcaption&gt;
    
&lt;/figure&gt;


&lt;p&gt;After the Profile is updated, the user can start adding (&lt;code&gt;claiming&lt;/code&gt;) their datasets.&lt;/p&gt;

&lt;h2 id=&#34;my-datasets&#34;&gt;My Datasets&lt;/h2&gt;

&lt;p&gt;You can use the OmicsDI &lt;code&gt;Search Box&lt;/code&gt; (Figure 4) to search for your datasets by your first name, last name, the title of your dataset or the title of the publication related to your dataset.&lt;/p&gt;


&lt;figure &gt;
    
        &lt;img src=&#34;https://omicsdi.github.io/media/search-datasets.png&#34; /&gt;
    
    
    &lt;figcaption&gt;
        &lt;h4&gt;Figure 4: Search your datasets&lt;/h4&gt;
        
    &lt;/figcaption&gt;
    
&lt;/figure&gt;


&lt;p&gt;After you click the dataset of interest, a link on the dataset page allows you to claim the dataset and add it to your profile.&lt;/p&gt;


&lt;figure &gt;
    
        &lt;img src=&#34;https://omicsdi.github.io/media/claim-butoon.png&#34; /&gt;
    
    
    &lt;figcaption&gt;
        &lt;h4&gt;Figure 5: Claim Dataset Button&lt;/h4&gt;
        
    &lt;/figcaption&gt;
    
&lt;/figure&gt;


&lt;p&gt;A new button is then shown (Figure 6) that lets you view the dataset added to your profile.&lt;/p&gt;


&lt;figure &gt;
    
        &lt;img src=&#34;https://omicsdi.github.io/media/profile-butoon.png&#34; /&gt;
    
    
    &lt;figcaption&gt;
        &lt;h4&gt;Figure 6: VIEW IN PROFILE&lt;/h4&gt;
        
    &lt;/figcaption&gt;
    
&lt;/figure&gt;


&lt;p&gt;Finally, all datasets are listed in your profile. The datasets can be removed from your Profile by using the &lt;code&gt;Edit Datasets&lt;/code&gt; in your Profile.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Filtering search results</title>
      <link>https://omicsdi.github.io/post/filter-results/</link>
      <pubDate>Mon, 20 Feb 2017 14:04:54 +0000</pubDate>
      
      <guid>https://omicsdi.github.io/post/filter-results/</guid>
      <description>

&lt;p&gt;The search results can be filtered or refined using different &lt;code&gt;filters&lt;/code&gt; or &lt;code&gt;terms&lt;/code&gt; (Figure 1).
The OmicsDI web application currently supports nine different refinements: &lt;code&gt;Omics Type&lt;/code&gt;, &lt;code&gt;repository/database&lt;/code&gt;,
&lt;code&gt;Organisms&lt;/code&gt;, &lt;code&gt;Tissue&lt;/code&gt;, &lt;code&gt;Diseases&lt;/code&gt;, &lt;code&gt;Modifications (proteomics)&lt;/code&gt;, &lt;code&gt;Instruments and platforms&lt;/code&gt;, &lt;code&gt;Publication date&lt;/code&gt;,
&lt;code&gt;Technology type&lt;/code&gt;.&lt;/p&gt;


&lt;figure &gt;
    
        &lt;img src=&#34;https://omicsdi.github.io/media/filtering-results.png&#34; /&gt;
    
    
    &lt;figcaption&gt;
        &lt;h4&gt;Figure 1: Filtering results of Search in the Browse Page&lt;/h4&gt;
        
    &lt;/figcaption&gt;
    
&lt;/figure&gt;


&lt;p&gt;&lt;/br&gt;&lt;/br&gt;&lt;/p&gt;

&lt;h2 id=&#34;filter-box&#34;&gt;Filter Box&lt;/h2&gt;


&lt;figure class=&#34;left&#34;&gt;
    
        &lt;img src=&#34;https://omicsdi.github.io/media/filter-box-typing.png&#34; width=&#34;300&#34; /&gt;
    
    
    &lt;figcaption&gt;
        &lt;h4&gt;Figure 2: Tissue Filter Box&lt;/h4&gt;
        
    &lt;/figcaption&gt;
    
&lt;/figure&gt;


&lt;p&gt;Each &lt;code&gt;Filter Box&lt;/code&gt; shows the number of datasets within each category (e.g. tissue type). The &lt;code&gt;user&lt;/code&gt; can search in the &lt;strong&gt;text field&lt;/strong&gt;
for a certain category, and the system will filter the categories by the keywords the user specifies. For example (&lt;strong&gt;Figure 2&lt;/strong&gt;), if the user is interested
in &lt;code&gt;brain&lt;/code&gt; tissue, they can see all the tissues containing the keyword &lt;code&gt;brain&lt;/code&gt;.&lt;/p&gt;


Note that most of the filters are free-text based, meaning that their values rely on the
annotations provided by the specific databases.

The OmicsDI team is continually improving the automatic annotation system to move more attributes/properties of the dataset
from free text to ontology-based values.


&lt;p&gt;&lt;/br&gt;&lt;/br&gt;&lt;/p&gt;

&lt;h2 id=&#34;ranking-results&#34;&gt;Ranking Results&lt;/h2&gt;


&lt;figure &gt;
    
        &lt;img src=&#34;https://omicsdi.github.io/media/ranking.png&#34; /&gt;
    
    
    &lt;figcaption&gt;
        &lt;h4&gt;Figure 3: Ranking the Search Results&lt;/h4&gt;
        
    &lt;/figcaption&gt;
    
&lt;/figure&gt;


&lt;p&gt;The final results of the search can be sorted by three different categories: &lt;code&gt;Accession&lt;/code&gt;, &lt;code&gt;Relevance&lt;/code&gt; and &lt;code&gt;Publication Date&lt;/code&gt;.
&lt;code&gt;Accession&lt;/code&gt; sorts by the accession of the datasets in the system; &lt;code&gt;Relevance&lt;/code&gt; sorts by how closely each dataset matches the query;
&lt;code&gt;Publication Date&lt;/code&gt; sorts the datasets by publication date.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Relevance&lt;/strong&gt;&lt;/p&gt;


The actual search is done via a call to Apache Lucene, which takes two arguments: the query and an upper bound on the number of hits (datasets) to return.
Lucene scoring uses a combination of the Vector Space Model (VSM) of Information Retrieval and the Boolean model to determine how relevant a given Document is to a User&#39;s query. In general, the idea behind the VSM is the more times a query term appears in a document relative to the number of times the term appears in all the documents in the collection, the more relevant that document is to the query. It uses the Boolean model to first narrow down the documents that need to be scored based on the use of boolean logic in the Query specification. Lucene also adds some capabilities and refinements onto this model to support boolean and fuzzy searching, but it essentially remains a VSM based system at the heart.
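
&lt;p&gt;To make the VSM intuition concrete, the Python sketch below scores a few toy documents with a plain TF-IDF weighting. It is only an illustration of the idea described above: it is not Lucene&#39;s actual scoring formula, and the &lt;code&gt;tfidf_scores&lt;/code&gt; helper, the whitespace tokenizer and the sample documents are all assumptions made for this example.&lt;/p&gt;

```python
import math
from collections import Counter


def tfidf_scores(query, documents):
    """Score each document against the query with a simple TF-IDF sum."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    terms = query.lower().split()
    # Document frequency: in how many documents does each query term occur?
    df = {t: sum(1 for doc in tokenized if t in doc) for t in terms}
    scores = []
    for doc in tokenized:
        if not doc:
            scores.append(0.0)
            continue
        counts = Counter(doc)
        score = 0.0
        for t in terms:
            if df[t] == 0:
                continue  # term absent from the whole collection
            tf = counts[t] / len(doc)           # term frequency in this document
            idf = math.log(1 + n_docs / df[t])  # rarer terms weigh more
            score += tf * idf
        scores.append(score)
    return scores


docs = [
    "breast cancer proteomics",
    "mouse liver proteomics",
    "breast cancer breast tissue",
]
print(tfidf_scores("breast", docs))
```

&lt;p&gt;Here the third document scores highest for the query term because the term makes up a larger share of its words, and the second scores zero because it does not contain the term.&lt;/p&gt;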


</description>
    </item>
    
    <item>
      <title>Searching in OmicsDI</title>
      <link>https://omicsdi.github.io/post/searching-in-omicsdi/</link>
      <pubDate>Mon, 20 Feb 2017 10:02:07 +0000</pubDate>
      
      <guid>https://omicsdi.github.io/post/searching-in-omicsdi/</guid>
      <description>

&lt;p&gt;The &lt;strong&gt;main goal&lt;/strong&gt; of the &lt;a href=&#34;http://www.omicsdi.org&#34;&gt;Omics Discovery Index&lt;/a&gt; is to provide a platform for &lt;code&gt;searching&lt;/code&gt; and &lt;code&gt;linking&lt;/code&gt; public omics data.
OmicsDI has implemented a &lt;strong&gt;unique&lt;/strong&gt; and &lt;strong&gt;novel&lt;/strong&gt; &lt;code&gt;Search Engine&lt;/code&gt; for omics datasets including public and protected data.&lt;/p&gt;

&lt;h2 id=&#34;the-omicsdi-search-box&#34;&gt;The OmicsDI Search Box&lt;/h2&gt;


&lt;figure &gt;
    
        &lt;img src=&#34;https://omicsdi.github.io/media/search-box.png&#34; /&gt;
    
    
    &lt;figcaption&gt;
        &lt;h4&gt;Figure 1: OmicsDI Search Box&lt;/h4&gt;
        
    &lt;/figcaption&gt;
    
&lt;/figure&gt;


&lt;p&gt;The &lt;code&gt;OmicsDI Search Box&lt;/code&gt; is the main component for searching in OmicsDI. The &lt;code&gt;user&lt;/code&gt; can type a set of &lt;strong&gt;keywords&lt;/strong&gt; that will enable the system
to find the datasets containing those keywords.&lt;/p&gt;


If the user encloses a phrase in double quotes, e.g. &#34;breast cancer&#34;, the system will try to find that exact phrase in the datasets.


&lt;p&gt;The &lt;code&gt;OmicsDI Search Box&lt;/code&gt; provides a unique &lt;strong&gt;auto-complete&lt;/strong&gt; feature that enables the &lt;code&gt;user&lt;/code&gt; to select a phrase after typing a subset of keywords. For example,
Figure 2 shows all sentences/phrases in OmicsDI containing the words &lt;em&gt;breast cancer&lt;/em&gt;.&lt;/p&gt;


&lt;figure &gt;
    
        &lt;img src=&#34;https://omicsdi.github.io/media/search-box-autocomplete.png&#34; /&gt;
    
    
    &lt;figcaption&gt;
        &lt;h4&gt;Figure 2: OmicsDI Search Box with Auto-complete&lt;/h4&gt;
        
    &lt;/figcaption&gt;
    
&lt;/figure&gt;


&lt;h2 id=&#34;query-syntax&#34;&gt;Query Syntax&lt;/h2&gt;

&lt;p&gt;When the user types any text in &lt;code&gt;OmicsDI Search Box&lt;/code&gt;, the input is translated into an &lt;a href=&#34;http://lucene.apache.org/&#34;&gt;Apache Lucene query&lt;/a&gt; that is then executed
to get the search results. The actual query executed is generated following the typical Apache Lucene query syntax in order to
provide a generic approach avoiding complex query rearrangements.&lt;/p&gt;

&lt;p&gt;Multiple search terms separated by white space are combined with &lt;code&gt;AND&lt;/code&gt; logic by default. Therefore, an input text containing, for example,
&lt;code&gt;glutathione transferase&lt;/code&gt; is treated as &lt;code&gt;glutathione AND transferase&lt;/code&gt; and only entries having both terms will be found. The default order
of results is based on their relevance, i.e. the proximity of the terms in the entries.&lt;/p&gt;

&lt;p&gt;Table 1: Overview of some useful query syntax elements.&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Element&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Meaning&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Usage&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Notes&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;

&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;AND&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;In addition to&lt;/td&gt;
&lt;td&gt;term1 AND term2&lt;/td&gt;
&lt;td&gt;glutathione AND transferase&lt;/td&gt;
&lt;td&gt;Matches entries where both glutathione and transferase occur.&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;&lt;code&gt;OR&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Equivalence&lt;/td&gt;
&lt;td&gt;term1 OR term2&lt;/td&gt;
&lt;td&gt;glutathione OR transferase&lt;/td&gt;
&lt;td&gt;Matches entries where either glutathione or transferase occurs.&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;&lt;code&gt;NOT&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Exclusion&lt;/td&gt;
&lt;td&gt;term1 NOT term2&lt;/td&gt;
&lt;td&gt;coding NOT fragment&lt;/td&gt;
&lt;td&gt;Matches entries containing coding but not fragment.&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;&lt;code&gt;*&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Wildcard&lt;/td&gt;
&lt;td&gt;partialTerm*&lt;/td&gt;
&lt;td&gt;gluta*&lt;/td&gt;
&lt;td&gt;Matches for instance glutathione, glutamate, glutamic.&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;&lt;code&gt;&#34; &#34;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Exact match&lt;/td&gt;
&lt;td&gt;&#34;quoted text&#34;&lt;/td&gt;
&lt;td&gt;&#34;x-ray diffraction&#34;&lt;/td&gt;
&lt;td&gt;Matches entries containing the exact phrase x-ray diffraction.&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;&lt;code&gt;( )&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Grouping&lt;/td&gt;
&lt;td&gt;(text)&lt;/td&gt;
&lt;td&gt;(reductase OR transferase) AND glutathione&lt;/td&gt;
&lt;td&gt;Matches entries containing glutathione and either reductase or transferase.&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Field:&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Field-specific search&lt;/td&gt;
&lt;td&gt;fieldId:term&lt;/td&gt;
&lt;td&gt;description:dopamine&lt;/td&gt;
&lt;td&gt;Matches for a field description containing dopamine.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
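
&lt;p&gt;The elements in Table 1 can also be combined programmatically. The following Python sketch builds query strings from them. The helper names (&lt;code&gt;phrase&lt;/code&gt;, &lt;code&gt;field&lt;/code&gt;, &lt;code&gt;all_of&lt;/code&gt;, &lt;code&gt;any_of&lt;/code&gt;) are hypothetical and are not part of any OmicsDI library; they only illustrate how the syntax composes.&lt;/p&gt;

```python
# Hypothetical helpers that assemble query strings from the Table 1 elements.


def phrase(text):
    """Wrap text in double quotes for an exact-match search."""
    return '"' + text + '"'


def field(field_id, term):
    """Restrict a term to a single field, e.g. description:dopamine."""
    return field_id + ":" + term


def all_of(*terms):
    """Combine terms with AND and group them in parentheses."""
    return "(" + " AND ".join(terms) + ")"


def any_of(*terms):
    """Combine terms with OR and group them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


# Same logic as the grouping example in Table 1.
query = all_of(any_of("reductase", "transferase"), "glutathione")
print(query)
```

&lt;p&gt;Grouping every combination in parentheses keeps the precedence explicit when &lt;code&gt;AND&lt;/code&gt; and &lt;code&gt;OR&lt;/code&gt; are mixed in the same query.&lt;/p&gt;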

&lt;h4 id=&#34;escaping-special-characters&#34;&gt;Escaping special characters&lt;/h4&gt;

&lt;p&gt;The following characters must be escaped within queries (by placing a &lt;code&gt;\&lt;/code&gt; before the character to escape) in order to be interpreted correctly:&lt;/p&gt;


&lt;p&gt;&lt;code&gt;+ - &amp; | ! ( ) { } [ ] ^ &#34; ~ * ? : \ /&lt;/code&gt;&lt;/p&gt;


&lt;p&gt;Since Apache Lucene supports regular expression searches (matching a pattern between forward slashes), the forward slash &lt;code&gt;/&lt;/code&gt; has become a special character that must be escaped. For example, to search for
&lt;code&gt;cancer/testis&lt;/code&gt; use the query &lt;code&gt;cancer\/testis&lt;/code&gt;. If special characters are not escaped, the query actually performed may differ from what was expected.&lt;/p&gt;
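
&lt;p&gt;When queries are assembled by a script, the escaping rule above can be applied automatically. The following Python function is a minimal sketch: the name &lt;code&gt;escape_query_term&lt;/code&gt; is hypothetical, and the character set is simply the list given above rather than something read from the search engine itself.&lt;/p&gt;

```python
# Characters listed above as special in the query syntax.
# "\x26" is the ampersand, written here as an escape sequence.
SPECIAL_CHARS = set('+-|!(){}[]^"~*?:\\/' + "\x26")


def escape_query_term(term):
    """Prefix every special character in term with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in term)


# The forward slash is escaped, as in the cancer/testis example.
print(escape_query_term("cancer/testis"))
```

&lt;p&gt;Escape only the literal terms typed by the user, not the whole query: operators such as parentheses, quotes or the field separator must stay unescaped where they are meant as syntax.&lt;/p&gt;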

&lt;h4 id=&#34;query-examples&#34;&gt;Query examples&lt;/h4&gt;

&lt;p&gt;Using the query syntax described above, users can easily search and filter results according to data content and characteristics.
A few examples of queries that can be performed in the OmicsDI search are listed below.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Search for &lt;a href=&#34;http://www.omicsdi.org/search?q=insulin%20receptor&#34;&gt;insulin receptor&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Search for datasets that identified the UniProt accession &lt;a href=&#34;http://www.omicsdi.org/search?q=(UNIPROT:%20(%22P07900%22))&#34;&gt;P07900&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
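
&lt;p&gt;Links like the ones above can be generated from any query string. The Python sketch below percent-encodes a query into the &lt;code&gt;q&lt;/code&gt; parameter of the search page; the base URL is taken from the example links, while the function name &lt;code&gt;search_url&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
from urllib.parse import quote

# Base URL taken from the example links above.
SEARCH_URL = "http://www.omicsdi.org/search"


def search_url(query):
    """Return a search URL with the query percent-encoded in the q parameter."""
    # safe="" makes quote() encode every reserved character, including "/" and ":".
    return SEARCH_URL + "?q=" + quote(query, safe="")


print(search_url("insulin receptor"))
```

&lt;p&gt;For &lt;code&gt;insulin receptor&lt;/code&gt; this produces the same link as the first example in the list.&lt;/p&gt;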

&lt;h2 id=&#34;searching-using-biological-evidence&#34;&gt;Searching using Biological Evidence&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;OmicsDI Search Box&lt;/code&gt; allows end-users to search data using biological evidence, such as the list of proteins identified in a proteomics experiment or the metabolites
reported in a metabolomics experiment. For example (Figure 3), if the user searches for &lt;code&gt;3-methyl-2-oxobutanoic&lt;/code&gt; in the resource, it will find one dataset in MetaboLights and five in the Metabolomics Workbench
that identified this molecule.&lt;/p&gt;


&lt;figure &gt;
    
        &lt;img src=&#34;https://omicsdi.github.io/media/search-biological-evidences.png&#34; /&gt;
    
    
    &lt;figcaption&gt;
        &lt;h4&gt;Figure 3: Search for the biological evidence 3-methyl-2-oxobutanoic&lt;/h4&gt;
        
    &lt;/figcaption&gt;
    
&lt;/figure&gt;


&lt;p&gt;The final search results are shown in the &lt;a href=&#34;http://www.omicsdi.org/search?q=*:*&#34;&gt;browser page&lt;/a&gt; including &lt;code&gt;Refine Filters&lt;/code&gt;. &lt;a href=&#39;https://omicsdi.github.io/post/filter-results/&#39;&gt;Read More Here&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Navigating Omics from the home page</title>
      <link>https://omicsdi.github.io/post/navigation-from-home/</link>
      <pubDate>Mon, 20 Feb 2017 01:17:55 +0000</pubDate>
      
      <guid>https://omicsdi.github.io/post/navigation-from-home/</guid>
      <description>&lt;p&gt;The &lt;a href=&#34;http://www.omicsdi.org&#34;&gt;OmicsDI Home Page&lt;/a&gt; provides different blocks for navigating through the datasets, including
the &lt;code&gt;2D WordCloud&lt;/code&gt;, the species/organism/diseases &lt;code&gt;Bubble Chart&lt;/code&gt;, the repository/omics &lt;code&gt;Bar Chart&lt;/code&gt;, the &lt;code&gt;Latest Datasets List&lt;/code&gt;,
the &lt;code&gt;Most Accessed Datasets&lt;/code&gt; list and the &lt;code&gt;Datasets per year&lt;/code&gt; list. All the charts allow the user to search the data using
the specific attribute. These blocks also act as a statistics component of the resource: for example, the pie chart shows how many datasets
the resource contains for each repository and omics type.&lt;/p&gt;


&lt;figure &gt;
    
        &lt;img src=&#34;https://omicsdi.github.io/media/home-page.png&#34; /&gt;
    
    
&lt;/figure&gt;


&lt;p&gt;A &lt;code&gt;TagCloud&lt;/code&gt; or &lt;code&gt;WordCloud&lt;/code&gt; is a visual representation of metadata, typically used to depict keyword metadata (tags)
on datasets, or to visualize free-form text. The &lt;code&gt;WordCloud&lt;/code&gt; is built using the most frequent words for every
database/repository. The OmicsDI &lt;code&gt;WordCloud&lt;/code&gt; can be considered a two-dimensional term representation where the user can
select the database and the field they want to look for: description vs database. The user can click a highlighted word in
the WordCloud to search for this term in the resource.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;Bubble Chart&lt;/code&gt; block allows users to navigate the data using three main categories: Tissues, Organisms,
and Diseases. The user can click on a bubble and will be redirected to a search for the clicked term.&lt;/p&gt;

&lt;p&gt;The Repo/Omics &lt;code&gt;Bar Chart&lt;/code&gt; and the Omics vs Year bar chart allow users to navigate the data using the omics categories
(&lt;strong&gt;metabolomics&lt;/strong&gt;, &lt;strong&gt;transcriptomics&lt;/strong&gt;, &lt;strong&gt;proteomics&lt;/strong&gt; and &lt;strong&gt;genomics&lt;/strong&gt;). The &lt;code&gt;user&lt;/code&gt; can click a bar or the pie and will
be redirected to a search for the clicked term.&lt;/p&gt;

&lt;p&gt;The Latest Datasets and Most accessed datasets blocks provide a list of the datasets by the two categories.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Welcome to OmicsDI Edu page</title>
      <link>https://omicsdi.github.io/post/welcome-to-omicsdi/</link>
      <pubDate>Fri, 17 Feb 2017 23:48:18 +0000</pubDate>
      
      <guid>https://omicsdi.github.io/post/welcome-to-omicsdi/</guid>
      <description>

&lt;p&gt;&lt;a href=&#34;http://www.omicsdi.org&#34;&gt;Omics Discovery Index&lt;/a&gt; is an integrated and open source platform
facilitating the access and dissemination of omics datasets. It provides a unique infrastructure to integrate
datasets coming from multiple omics studies, including at present &lt;strong&gt;proteomics&lt;/strong&gt;, &lt;strong&gt;genomics&lt;/strong&gt;, &lt;strong&gt;transcriptomics&lt;/strong&gt; and
&lt;strong&gt;metabolomics&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;OmicsDI stores metadata coming from the public datasets from every resource using an efficient
indexing system, which is able to integrate different biological entities including
genes, proteins and metabolites with the relevant life science literature. OmicsDI is updated daily, as new datasets become
publicly available in the contributing repositories.&lt;/p&gt;

&lt;h2 id=&#34;omics-data-submission&#34;&gt;Omics Data Submission&lt;/h2&gt;

&lt;p&gt;The increasing role of huge datasets in scientific research has important implications for the way the research
is conducted, for the way it should be organized and funded, and for the training of new researchers.
However, the advances in biomedical research depend on scientists’ ability to consult and use all available data,
independently from where they were originally produced: data sharing on a global scale is the best way to
‘advance science for the public good’.&lt;/p&gt;

&lt;p&gt;The assumption underlying this policy is that the more scientists are allowed to access the same sets of data,
the more those data will be used to produce new knowledge about biological phenomena.&lt;/p&gt;


&lt;figure &gt;
    
        &lt;img src=&#34;https://omicsdi.github.io/media/databases-workflow.png&#34; /&gt;
    
    
    &lt;figcaption&gt;
        &lt;h4&gt;Figure 1: Current Schema of BioMedical Data Distribution/Dissemination &lt;/h4&gt;
        
    &lt;/figcaption&gt;
    
&lt;/figure&gt;


&lt;p&gt;Figure 1 shows how data is produced, stored and distributed in biomedical research. The result of an omics experiment (e.g. proteomics or metabolomics)
is submitted to a public archive (e.g. PRIDE or MetaboLights). These &lt;code&gt;Data Archives&lt;/code&gt; provide a common interface for &lt;strong&gt;submission&lt;/strong&gt;,
&lt;strong&gt;validation&lt;/strong&gt; and &lt;strong&gt;downloading&lt;/strong&gt; of the original results/data. Importantly, each individual repository/archive defines three major
components to guide the submission process:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The metadata guidelines including Standards and Ontologies to define/describe a dataset and corresponding components (e.g. samples, instruments).&lt;/li&gt;
&lt;li&gt;File formats to store and handle the underlying data in the Dataset.&lt;/li&gt;
&lt;li&gt;The submission guidelines define how to submit and retrieve the data from the repository.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In recent years the number of databases and archives has grown in all omics fields [1]. For example,
in proteomics the results of a mass spectrometry experiment can be submitted to four different databases that are members of
&lt;a href=&#34;http://www.proteomexchange.org&#34;&gt;ProteomeXchange&lt;/a&gt;: PeptideAtlas/PASSEL, PRIDE, MassIVE and jPOST. In addition, each omics field
has developed and grown independently of the other fields, including its metadata specifications, file formats, and submission
guidelines. For this reason, most of the &lt;code&gt;Data Archives&lt;/code&gt; are field specific (e.g. Metabolomics - MetaboLights, Metabolomics Workbench).&lt;/p&gt;

&lt;h2 id=&#34;omics-data-dissemination&#34;&gt;Omics Data Dissemination&lt;/h2&gt;

&lt;p&gt;After the data is submitted to a formal &lt;a href=&#39;#omics-data-submission&#39;&gt;Archive&lt;/a&gt;, Knowledge Base Databases
(&lt;code&gt;DBs&lt;/code&gt;) &lt;strong&gt;reuse&lt;/strong&gt; part of the public data to respond to specific questions (e.g. Gene Expression Profiles - ExpressionAtlas). The
number of these &lt;code&gt;DBs&lt;/code&gt; has grown in recent years. For example, Table 1 shows the list of Protein Expression Databases [1] that include
peptide sequences, post-translational modifications and expression profiles.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Table 1&lt;/strong&gt;: Proteomics Knowledge Base Databases (&lt;code&gt;DBs&lt;/code&gt;) providing protein expression information (e.g.
peptide sequences, post-translational modifications).&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Resource&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;URL&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Publication&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;

&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cancer Mutant Proteome Database&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;http://cgbc.cgu.edu.tw/cmpd/&#34;&gt;http://cgbc.cgu.edu.tw/cmpd/&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Nucleic Acids Res. 2015.&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;MOPED&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://www.proteinspire.org/MOPED/&#34;&gt;https://www.proteinspire.org/MOPED/&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Nucleic Acids Res. 2015.&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;ProteomicsDB&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://www.proteomicsdb.org/&#34;&gt;https://www.proteomicsdb.org/&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Nature. 2014&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;MaxQB&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;http://maxqb.biochem.mpg.de/mxdb/&#34;&gt;http://maxqb.biochem.mpg.de/mxdb/&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Mol Cell Proteomics. 2012&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;GPMDB&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;http://gpmdb.thegpm.org/&#34;&gt;http://gpmdb.thegpm.org/&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;COPaKB&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;http://www.heartproteome.org/copa/Default.aspx&#34;&gt;http://www.heartproteome.org/copa/Default.aspx&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Circ Res. 2013&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;paxDB&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;http://pax-db.org/#!home&#34;&gt;http://pax-db.org/#!home&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Mol Cell Proteomics. 2012&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;Human Proteinpedia&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;http://www.humanproteinpedia.org/&#34;&gt;http://www.humanproteinpedia.org/&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Curr Protoc Bioinformatics. 2013&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;Human Proteome Map&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;http://www.humanproteomemap.org/&#34;&gt;http://www.humanproteomemap.org/&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Nature. 2014&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;h2 id=&#34;omicsdi-vision&#34;&gt;OmicsDI Vision&lt;/h2&gt;


&lt;figure class=&#34;left&#34;&gt;
    
        &lt;img src=&#34;https://omicsdi.github.io/media/vision.png&#34; width=&#34;400&#34; /&gt;
    
    
&lt;/figure&gt;


&lt;p&gt;All these databases/repositories have created a complex and distributed scenario where the data can
be submitted into different &lt;code&gt;Archives&lt;/code&gt; and reused in multiple and different &lt;code&gt;DBs&lt;/code&gt;. The development of tools
facilitating &lt;strong&gt;data sharing&lt;/strong&gt; and able to handle this complexity is a great challenge in itself.&lt;/p&gt;

&lt;p&gt;In this context, we introduce here the &lt;a href=&#34;http://www.omicsdi.org&#34;&gt;Omics Discovery Index&lt;/a&gt;, an open-source platform facilitating
the access, discovery, and dissemination of omics datasets. OmicsDI provides a unique infrastructure to integrate datasets
coming from multiple omics fields, including at present proteomics, genomics, metabolomics, and transcriptomics.&lt;/p&gt;

&lt;p&gt;To date, &lt;a href=&#34;http://www.omicsdi.org/databases/&#34;&gt;eleven resources&lt;/a&gt; have agreed on a common metadata structure framework and exchange format,
and have contributed to OmicsDI, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Proteomics&lt;/code&gt;: The PRoteomics IDEntifications (PRIDE) database, PeptideAtlas, the Mass spectrometry Interactive Virtual Environment (MassIVE)
and the Global Proteome Machine Database (GPMDB).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Metabolomics&lt;/code&gt;: MetaboLights, the Global Natural Products Social Molecular Networking project (GNPS),
MetabolomeExpress, and the Metabolomics Workbench.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Genomics&lt;/code&gt;: The European Genome-Phenome Archive (EGA).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Transcriptomics&lt;/code&gt;:  ArrayExpress and Expression Atlas.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;OmicsDI stores biological and technical metadata coming from the public datasets available in every resource,
using an efficient indexing system, which is able to integrate different
biological entities including &lt;code&gt;genes&lt;/code&gt;, &lt;code&gt;transcripts&lt;/code&gt;, &lt;code&gt;proteins&lt;/code&gt; and &lt;code&gt;metabolites&lt;/code&gt; with the relevant scientific literature.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;References&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[1] Perez‐Riverol, Yasset, et al. &amp;ldquo;Making proteomics data accessible and reusable: current state of proteomics databases and repositories.&amp;rdquo;
Proteomics 15.5-6 (2015): 930-950.&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>OmicsDI major partners</title>
      <link>https://omicsdi.github.io/post/partners/</link>
      <pubDate>Fri, 17 Feb 2017 23:09:18 +0000</pubDate>
      
      <guid>https://omicsdi.github.io/post/partners/</guid>
      <description>

&lt;p&gt;OmicsDI has been built with the collaboration of multiple consortia and individual databases. This collaboration has enabled
the standardization of the metadata across multiple resources and omics types. Each consortium groups a set of databases around
the same topic (e.g. proteomics) and has previously agreed on common metadata including Ontology Terms, Study Design, etc.
At the same time, OmicsDI has collaborated with other individual archives and databases such as ArrayExpress or EGA.&lt;/p&gt;

&lt;h2 id=&#34;proteomexchange&#34;&gt;ProteomeXchange&lt;/h2&gt;


&lt;figure &gt;
    
        &lt;img src=&#34;https://omicsdi.github.io/media/PX_logo.png&#34; /&gt;
    
    
&lt;/figure&gt;


&lt;p&gt;The &lt;a href=&#34;http://www.proteomexchange.org&#34;&gt;ProteomeXchange Consortium&lt;/a&gt; is a collaboration of currently four major mass spectrometry
proteomics data repositories: &lt;a href=&#34;http://www.ebi.ac.uk/pride/archive&#34;&gt;PRIDE&lt;/a&gt; at EMBL-EBI in Cambridge (UK), &lt;a href=&#34;http://www.peptideatlas.org&#34;&gt;PeptideAtlas&lt;/a&gt;
at ISB in Seattle (US), &lt;a href=&#34;http://massive.ucsd.edu&#34;&gt;MassIVE&lt;/a&gt; at UCSD (US) and &lt;a href=&#34;http://jpostdb.org/&#34;&gt;jPOST&lt;/a&gt;.
Together they offer a unified data deposition and discovery strategy across all four repositories. ProteomeXchange is a
distributed database infrastructure; the potentially very large raw data component of the data is only held at
the original submission database, while the searchable metadata is centrally collected and indexed.
All ProteomeXchange data is fully open after the release of the associated publication.&lt;/p&gt;

&lt;h2 id=&#34;metabolomexchange&#34;&gt;MetabolomeXchange&lt;/h2&gt;


&lt;figure &gt;
    
        &lt;img src=&#34;https://omicsdi.github.io/media/metabolomexchange.png&#34; /&gt;
    
    
&lt;/figure&gt;


&lt;p&gt;&lt;a href=&#34;http://www.metabolomexchange.org/site/&#34;&gt;MetabolomeXchange&lt;/a&gt; is a collaboration of 4 major metabolomics repositories,
with a total of 10 partners contributing. MetabolomeXchange was inspired by ProteomeXchange and is implementing similar coordination
strategies. The founding partners are &lt;a href=&#34;http://www.ebi.ac.uk/metabolights/&#34;&gt;MetaboLights&lt;/a&gt; at EMBL-EBI (UK),
Metabolomics Repository Bordeaux (FR), Golm Metabolome Database and the Metabolomics Workbench (US).
The &lt;a href=&#34;http://metabolomicsworkbench.org/&#34;&gt;Metabolomics Workbench&lt;/a&gt; is an NIH-funded collaboration of 6 Regional
Comprehensive Metabolomics Resource Cores.&lt;/p&gt;

&lt;h2 id=&#34;the-european-genome-phenome-archive&#34;&gt;The European Genome-Phenome Archive&lt;/h2&gt;


&lt;figure &gt;
    
        &lt;img src=&#34;https://omicsdi.github.io/media/ega_logo.png&#34; /&gt;
    
    
&lt;/figure&gt;


&lt;p&gt;The &lt;a href=&#34;https://www.ebi.ac.uk/ega/home&#34;&gt;European Genome-Phenome Archive&lt;/a&gt; (EGA) provides a service for the permanent archiving and distribution of
personally identifiable genetic and phenotypic data resulting from biomedical research projects. Strict protocols govern how information is managed, stored and
distributed by the EGA project. The EGA comprises a public metadata section, which allows users to search for and identify
relevant studies, and a controlled-access data section. Access to the data section for a particular study is only
granted after validation of a research proposal through the relevant ethics approval.&lt;/p&gt;
</description>
    </item>
    
  </channel>
</rss>