<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>OmicsDI Blog</title>
<link>https://omicsdi.github.io/feed/index.xml</link>
<description>Recent content on OmicsDI Blog</description>
<generator>Hugo -- gohugo.io</generator>
<language>en-us</language>
<lastBuildDate>Mon, 27 Feb 2017 22:56:34 +0000</lastBuildDate>
<atom:link href="https://omicsdi.github.io/feed/index.xml" rel="self" type="application/rss+xml" />
<item>
<title>Introduction to OmicsDI API</title>
<link>https://omicsdi.github.io/post/introduction-api/</link>
<pubDate>Mon, 27 Feb 2017 22:56:34 +0000</pubDate>
<guid>https://omicsdi.github.io/post/introduction-api/</guid>
<description>
<p>Most data in the Datasets Discovery Index can be accessed
programmatically using a <a href="http://www.omicsdi.org/ws">RESTful API</a>.
The API implementation is based on the Spring Rest Framework.</p>
<h2 id="web-browsable-api">Web-browsable API</h2>
<p>The OmicsDI API is web browsable, which means that:</p>
<ul>
<li>The query results returned by the API are available in both JSON and XML format, so they can be read by humans and processed programmatically.</li>
<li>The main <a href="http://www.omicsdi.org/ws">RESTful API</a> page provides a simple web-based user
interface, which allows developers to familiarize themselves with the API and get a
better sense of the OmicsDI data before writing a single line of code.</li>
</ul>
<p>Many resources are hyperlinked, so it&rsquo;s possible to navigate the API in the browser.</p>
<h2 id="api-documentation">API documentation</h2>
<p>Responses containing multiple entries have the following fields:</p>
<ul>
<li><code>count</code> is the number of entries in the matching set.</li>
<li><code>datasets</code> is an array of datasets.</li>
<li><code>facets</code> is an array of facets.</li>
</ul>
<p>Example</p>
<pre><code>http://www.omicsdi.org/ws/dataset/search?query=human
{
  "count": 733,
  "datasets": [
    {
      "id": "PXD000456",
      "source": "pride",
      "title": "Human glomerular extracellular matrix analysed by LC-MSMS",
      "description": "Extracellular matrix proteins were isolated from human glomeruli and analysed by LC-MSMS",
      "keywords": [
        "Human",
        "kidney",
        "glomerulus",
        "extracellular matrix"
      ],
      "organisms": [
        {
          "acc": "9606",
          "name": "Homo sapiens"
        }
      ],
      "publicationDate": "20140122"
    },
    // 19 more datasets
  ],
  "facets": [
    {
      "id": "modification",
      "label": "Modification",
      "total": 181,
      "facetValues": [
        {
          "label": "Unknown modification",
          "value": "unknown modification",
          "count": "5"
        },
        // other facet values
      ]
    },
    // other facets
  ]
}
</code></pre>
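<p>As a rough illustration of how a client might read the three fields described above, the following Python sketch parses a trimmed sample response. The sample string is a cut-down copy of the example, and the helper name <code>summarize</code> is hypothetical, not part of the OmicsDI API.</p>

```python
import json

# Trimmed sample shaped like the search response in the example
# (strict JSON, so the "// ..." placeholder comments are left out).
SAMPLE = """
{
  "count": 733,
  "datasets": [
    {"id": "PXD000456", "source": "pride",
     "title": "Human glomerular extracellular matrix analysed by LC-MSMS",
     "organisms": [{"acc": "9606", "name": "Homo sapiens"}],
     "publicationDate": "20140122"}
  ],
  "facets": [
    {"id": "modification", "label": "Modification", "total": 181,
     "facetValues": [{"label": "Unknown modification",
                      "value": "unknown modification", "count": "5"}]}
  ]
}
"""


def summarize(response_text):
    """Return (total hits, dataset ids, {facet id: {value: count}})."""
    data = json.loads(response_text)
    ids = [d["id"] for d in data.get("datasets", [])]
    # Facet counts arrive as strings in the example, so convert them to int.
    facets = {
        f["id"]: {v["value"]: int(v["count"]) for v in f.get("facetValues", [])}
        for f in data.get("facets", [])
    }
    return data["count"], ids, facets


total, ids, facets = summarize(SAMPLE)
print(total, ids, facets)
```

<p>Note that <code>count</code> is the size of the whole matching set, while <code>datasets</code> only holds the current page.</p>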
<p>Responses containing a single dataset have some extra navigation fields and omit the facets.</p>
http://www.omicsdi.org/ws/dataset/get?acc=PXD001848&database=PRIDE
{
"id": "PXD001848",
"name": "Global Analysis of Protein Folding Thermodynamics for Disease State Characterization, MCF7 vs MDAMB231",
"description": "Protein biomarkers can be used to characterize and diagnose disease states such as cancer. They can also serve as therapeutic targets. Current methods for protein biomarker discovery, which generally rely on the large-scale analysis of gene and/or protein expression levels, fail to detect protein biomarkers with disease-related functions and unaltered expression levels. Here we describe the large-scale use of thermodynamic measurements of protein folding and stability for disease state characterization and the discovery of protein biomarkers. Using the Stable Isotope Labeling with Amino Acids in Cell Culture and Stability of Proteins from Rates of Oxidation (SILAC-SPROX) technique, we assayed ~800 proteins for protein folding and stability changes in three different cell culture models of breast cancer including the MCF-10A, MCF-7, and MDA-MB-231 cell lines. The thermodynamic stability profiles generated here created distinct molecular markers for the three cell lines, and a significant fraction (~45%) of the differentially stabilized proteins did not have altered expression levels. Thus, the protein biomarkers reported here created novel molecular signatures of breast cancer and provided additional insight into the molecular basis of the disease. Our results establish the utility of protein folding and stabilitymeasurements for the study of disease processes.",
"keywords": null,
"publicationDate": "20150410",
"publications": [
{
"id": "25825992",
"publicationDate": "2015-04-09",
"title": "Global analysis of protein folding thermodynamics for disease state characterization.",
"pubabstract": "Current methods for the large-scale characterization of disease states generally rely on the analysis of gene and/or protein expression levels. These existing methods fail to detect proteins with disease-related functions and unaltered expression levels. Here we describe the large-scale use of thermodynamic measurements of protein folding and stability for the characterization of disease states. Using the Stable Isotope Labeling with Amino Acids in Cell Culture and Stability of Proteins from Rates of Oxidation (SILAC-SPROX) technique, we assayed ∼800 proteins for protein folding and stability changes in three different cell culture models of breast cancer including the MCF-10A, MCF-7, and MDA-MB-231 cell lines. The thermodynamic stability profiles generated here created distinct molecular markers to differentiate the three cell lines, and a significant fraction (∼45%) of the differentially stabilized proteins did not have altered expression levels. Thus, the differential thermodynamic profiling strategy reported here created novel molecular signatures of breast cancer and provided additional insight into the molecular basis of the disease. Our results establish the utility of protein folding and stability measurements for the study of disease processes, and they suggest that such measurements may be useful for biomarker discovery in disease.",
"cycle": "testcyclehere"
}
],
"related_datasets": null,
"data_protocol": "Peak lists were extracted from the raw LC-MS/MS data files and the data were searched against the 20265 human proteins in the 2014-04 release of the UniProt Knowledgebase (downloaded at ftp://ftp.uniprot.org/pub/databases/uniprot/current_releases/release-2014_04/knowledgebase/) using Maxquant 1.3.0.5.41 The following modifications were used: methyl methanethiosulfonate at cysteine as a fixed modification, SILAC labeling of lysine (13C614N2) and arginine (13C6), and variable (0-1) oxidation of methionine and deamidation of Asparagine and Glutamine (N and Q), and acetylation of the protein N-terminus. The enzyme was set as Trypsin, and up to 2 missed cleavages were permitted. The false discovery rate for peptide and protein identifications was set to1%, and rest of the parameters were set at the default settings. As part of the default settings, the mass tolerance for precursor ions was set to 20 ppm for the first search where initial mass recalibration was completed and a 6 ppm precursor mass tolerance was used for the main search. The mass tolerance for fragment ions was 0.5 Da. We also included match between runs and re-quantification of the searched peptides. The search results were exported toExcel for further data analysis as described below. Only the protein and peptide identifications with no-zeropositive ratios (H/L >0) were used in subsequent data analysis steps. The methionine-containing peptides wereselected, and those methionine-containing peptides consistently identified in the protein samples derived from sixor more denaturant-containing buffers were assayed. For the methionine-containing peptides, a single averaged H/Lratio was calculated for each peptide sequence and each charge state at each denaturant concentration. 
Similarly, for each analysis, a median H/L ratio was determined for each protein using the H/L ratios measured for all thenon-methionine-containing peptides identified in all the denaturant concentrations for a given protein. These medianH/L ratios were used to select hits with H/L>2 fold in the protein expression level analyses. For hit peptide andprotein selection in the thermodynamic analyses, all the H/L ratios generated for the non-methionine containingpeptides from a given protein were divided by the median H/L ratio for that protein in order to generate normalizedH/L ratios for each non-methionine containing peptide. These normalized H/L ratios were log2 transformed. Thenormalized and log2 transformed H/L ratios generated for the non-methionine-containing peptides in a given analysiswere used to determine the 5th and 95th percentiles values used in subsequent analysis of methionine-containingpeptides. The averaged H/L ratios calculated for each methionine-containing peptides were also normalized and log2transformed. The methionine-containing peptides and proteins with log2 transformed H/L ratios less than the 5thpercentile or greater than the 95th percentile values determined above were selected and then visually inspected todetermine which peptides had altered H/L ratios at 2 or more consecutive denaturant concentrations to generate aninitial list of protein hits.",
"sample_protocol": "SILAC labeled MCF-7 and MDA-MB-231 cell lysates were prepared according to established SILAC protocols. Aliquots of each lysate were distributed into a series of denaturant-containing buffers, reacted with hydrogen peroxide under conditions that selectively oxidize exposed methionine residues, and quenched with the addition of excess methionine. The light and heavy samples generated at matching denaturant concentration were combined. Each combined protein sample was submitted to a bottom-up, solution-phase, shotgun proteomics analysis using LC-MS/MS. Ultimately, L/H ratios were obtained for the peptides detected at each denaturant concentration, and the denaturant dependence of the L/H ratio’s was examined."
}
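<p>A minimal sketch of building the single-dataset URL and reading a record defensively is shown below. The helper names are hypothetical; the only assumptions taken from the example are the <code>acc</code> and <code>database</code> parameters and the fact that some fields (such as <code>keywords</code>) may be <code>null</code>.</p>

```python
import json
from urllib.parse import urlencode

GET_URL = "http://www.omicsdi.org/ws/dataset/get"


def dataset_url(acc, database):
    """Build the URL for one dataset from its accession and database."""
    return GET_URL + "?" + urlencode({"acc": acc, "database": database})


def publication_ids(record):
    """Return publication ids, treating a null or missing list as empty."""
    return [p["id"] for p in (record.get("publications") or [])]


# Cut-down copy of the single-dataset example response.
record = json.loads(
    '{"id": "PXD001848", "keywords": null,'
    ' "publications": [{"id": "25825992"}], "related_datasets": null}'
)
print(dataset_url("PXD001848", "PRIDE"))
print(publication_ids(record))
```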
<h3 id="pagination">Pagination</h3>
<p>Responses containing multiple datasets are paginated to prevent accidental downloads
of large amounts of data and to speed up the API. The page size is controlled by the <code>size</code> parameter. Its default value is 20 datasets per page, and the maximum is 100 datasets per page.</p>
<p>The <code>start</code> parameter gives the zero-based index (counting from 0, not 1) of the first dataset on the page. Its default value is 0.</p>
<p>Examples:</p>
<ul>
<li><p><a href="http://www.omicsdi.org/ws/dataset/search?query=human&amp;start=0&amp;size=50">http://www.omicsdi.org/ws/dataset/search?query=human&amp;start=0&amp;size=50</a></p></li>
<li><p><a href="http://www.omicsdi.org/ws/dataset/search?query=human&amp;start=0&amp;size=20">http://www.omicsdi.org/ws/dataset/search?query=human&amp;start=0&amp;size=20</a></p></li>
</ul>
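<p>To walk through a full result set, a client can step <code>start</code> forward by <code>size</code> until it passes the total count. The sketch below only builds the URLs (it makes no network calls); <code>page_urls</code> is a hypothetical helper, and the total of 733 is taken from the earlier example response.</p>

```python
from urllib.parse import urlencode

BASE_URL = "http://www.omicsdi.org/ws/dataset/search"
MAX_PAGE_SIZE = 100  # maximum page size stated in the text


def page_urls(query, total, size=20):
    """Yield one search URL per page needed to cover `total` datasets."""
    if not 1 <= size <= MAX_PAGE_SIZE:
        raise ValueError("size must be between 1 and 100")
    # start is zero-based, so pages begin at 0, size, 2*size, ...
    for start in range(0, total, size):
        params = {"query": query, "start": start, "size": size}
        yield BASE_URL + "?" + urlencode(params)


urls = list(page_urls("human", total=733, size=100))
print(len(urls))
print(urls[0])
```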
<h3 id="sort">Sort</h3>
<p>The resulting datasets can be sorted by title, description, publication date, accession id, or relevance to the query term, using the <code>sort_field</code> parameter.</p>
<p>Examples:</p>
<ul>
<li><a href="http://www.omicsdi.org/ws/dataset/search?query=human&amp;sort_field=id">http://www.omicsdi.org/ws/dataset/search?query=human&amp;sort_field=id</a></li>
<li><a href="http://www.omicsdi.org/ws/dataset/search?query=human&amp;sort_field=publication_date">http://www.omicsdi.org/ws/dataset/search?query=human&amp;sort_field=publication_date</a></li>
</ul>
<h3 id="filtering">Filtering</h3>
<p>The API supports several filtering operations that complement the main <code>OmicsDI</code> search functionality.</p>
<p>To filter by search term, use the single URL parameter <code>query</code>.</p>
<p>Examples:</p>
<ul>
<li><p><a href="http://www.omicsdi.org/ws/dataset/search?query=human">http://www.omicsdi.org/ws/dataset/search?query=human</a></p></li>
<li><p><a href="http://www.omicsdi.org/ws/dataset/search?query=cancer">http://www.omicsdi.org/ws/dataset/search?query=cancer</a></p></li>
</ul>
<p><strong>Filtering by omics type</strong>:</p>
<p>The omics type can be specified by adding terms in the query URL parameter with key: omics_type (possible values: Proteomics, Metabolomics, Genomics, Transcriptomics).</p>
<p>Examples:</p>
<ul>
<li><a href="http://www.omicsdi.org/ws/dataset/search?query=human%20AND%20omics_type:%22Proteomics%22">http://www.omicsdi.org/ws/dataset/search?query=human AND omics_type:&ldquo;Proteomics&rdquo;</a></li>
</ul>
<p><strong>Filtering by database</strong></p>
<p>The database can be specified by adding terms in the query URL parameter with key: repository (possible values: MassIVE, Metabolights, PeptideAtlas, PRIDE, GPMDB, EGA, Metabolomics Workbench, MetabolomeExpress, GNPS, ArrayExpress, ExpressionAtlas).</p>
<p>Examples:</p>
<ul>
<li><a href="http://www.omicsdi.org/ws/dataset/search?query=human%20AND%20repository:%22Metabolights%22">http://www.omicsdi.org/ws/dataset/search?query=human AND repository:&ldquo;Metabolights&rdquo;</a></li>
</ul>
<p><strong>Filtering by Organism</strong></p>
<p>The organism can be specified by adding terms in the query URL parameter with key: TAXONOMY (possible values must be the TAXONOMY id: 9606, 10090&hellip;).</p>
<p>Examples:</p>
<ul>
<li><a href="http://www.omicsdi.org/ws/dataset/search?query=human%20AND%20TAXONOMY:%229606%22">http://www.omicsdi.org/ws/dataset/search?query=human AND TAXONOMY:&ldquo;9606&rdquo;</a></li>
</ul>
<p><strong>Filtering by Tissue</strong></p>
<p>The tissue can be specified by adding terms in the query URL parameter with key: tissue (possible values: Liver, Cell culture, Brain, Lung&hellip;).</p>
<p>Examples:</p>
<ul>
<li><a href="http://www.omicsdi.org/ws/dataset/search?query=human%20AND%20tissue:%22Brain%22">http://www.omicsdi.org/ws/dataset/search?query=human AND tissue:&ldquo;Brain&rdquo;</a></li>
</ul>
<p><strong>Filtering by Disease</strong></p>
<p>The disease can be specified by adding terms in the query URL parameter with key: disease (possible values: Breast cancer, Lymphoma, Carcinoma, prostate adenocarcinoma&hellip;).</p>
<p>Examples</p>
<ul>
<li><a href="http://www.omicsdi.org/ws/dataset/search?query=human%20AND%20disease:%22Breast%20cancer%22">http://www.omicsdi.org/ws/dataset/search?query=human AND disease:&ldquo;Breast cancer&rdquo;</a></li>
</ul>
<p><strong>Filtering by Modification (in proteomics)</strong></p>
<p>The Modifications (in proteomics) can be specified by adding terms in the query URL parameter with key: modification (possible values: Deamidated residue, Deamidated, Monohydroxylated residue, Iodoacetamide derivatized residue&hellip;).</p>
<p>Examples:</p>
<ul>
<li><a href="http://www.omicsdi.org/ws/dataset/search?query=human%20AND%20modification:%22iodoacetamide%20derivatized%20residue%22">http://www.omicsdi.org/ws/dataset/search?query=human AND modification:&ldquo;iodoacetamide derivatized residue&rdquo;</a></li>
</ul>
<p><strong>Filtering by Instruments &amp; Platforms</strong></p>
<p>The Instruments &amp; Platforms can be specified by adding terms in the query URL parameter with key: instrument_platform (possible values: QSTAR, LTQ Orbitrap, Q Exactive, LTQ&hellip;).</p>
<p>Examples:</p>
<ul>
<li><a href="http://www.omicsdi.org/ws/dataset/search?query=human%20AND%20instrument_platform:%22Q%20Exactive%22">http://www.omicsdi.org/ws/dataset/search?query=human AND instrument_platform:&ldquo;Q Exactive&rdquo;</a></li>
</ul>
<p><strong>Filtering by Publication Date</strong></p>
<p>The Publication Date can be specified by adding terms in the query URL parameter with key: &ldquo;publication_date&rdquo; (possible values: 2015, 2014, 2013&hellip;).</p>
<p>Examples:</p>
<ul>
<li><a href="http://www.omicsdi.org/ws/dataset/search?query=human%20AND%20publication_date:%222015%22">http://www.omicsdi.org/ws/dataset/search?query=human AND publication_date:&ldquo;2015&rdquo;</a></li>
</ul>
<p><strong>Filtering by Technology Type</strong></p>
<p>The Technology Type can be specified by adding terms in the query URL parameter with key: &ldquo;technology_type&rdquo; (possible values: Mass Spectrometry, Bottom-up proteomics, Gel-based experiment, Shotgun proteomics&hellip;).</p>
<p>Examples:</p>
<ul>
<li><a href="http://www.omicsdi.org/ws/dataset/search?query=human%20AND%20technology_type:%22Mass%20Spectrometry%22">http://www.omicsdi.org/ws/dataset/search?query=human AND technology_type:&ldquo;Mass Spectrometry&rdquo;</a></li>
</ul>
<p><strong>Combined filters</strong></p>
<p>Any filters can be combined to narrow down the query using the AND operator. More logical operators will be supported in the future.</p>
<p>Examples:</p>
<ul>
<li><a href="http://www.omicsdi.org/ws/dataset/search?query=human%20AND%20technology_type:%22Shotgun%20proteomics%22%20AND%20modification:%22monohydroxylated%20residue%22">http://www.omicsdi.org/ws/dataset/search?query=human AND technology_type:&ldquo;Shotgun proteomics&rdquo; AND modification:&ldquo;monohydroxylated residue&rdquo;</a></li>
</ul>
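<p>Since every filter follows the same <code>key:"value"</code> pattern joined with <code>AND</code>, a combined query can be assembled programmatically. The sketch below is one possible way to do it; <code>build_search_url</code> is a hypothetical helper, and it only constructs the URL without contacting the service.</p>

```python
from urllib.parse import quote

BASE_URL = "http://www.omicsdi.org/ws/dataset/search"


def build_search_url(term, **filters):
    """Join a search term and key:"value" filters with AND, then encode."""
    clauses = [term] + [f'{key}:"{value}"' for key, value in filters.items()]
    query = " AND ".join(clauses)
    # Keep ':' readable; spaces become %20 and double quotes become %22.
    return BASE_URL + "?query=" + quote(query, safe=":")


url = build_search_url(
    "human",
    technology_type="Shotgun proteomics",
    modification="monohydroxylated residue",
)
print(url)
```

<p>Keyword arguments keep their order, so the filters appear in the URL in the order they are passed.</p>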
</description>
</item>
<item>
<title>Claiming datasets in OmicsDI</title>
<link>https://omicsdi.github.io/post/claiming-datasests/</link>
<pubDate>Mon, 20 Feb 2017 14:04:54 +0000</pubDate>
<guid>https://omicsdi.github.io/post/claiming-datasests/</guid>
<description>
<p>One of the first requests the OmicsDI team received (January 2017) after the official release of the resource was the ability to log in to the system and associate related public datasets with the user. The original request was made informally by Professor Rob Beynon of Liverpool University (@astacus) and answered by Laurent Gatto (@lgatt0).</p>
<p><blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">try <a href="https://twitter.com/OmicsDI">@OmicsDI</a>. That should do it, I believe</p>&mdash; Laurent Gⓐtt⓪ (@lgatt0) <a href="https://twitter.com/lgatt0/status/816199103495421952">January 3, 2017</a></blockquote>
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p>For more than 3 months, the OmicsDI team (www.omicsdi.org) has been working on this feature, and we are proud to announce its formal release today. The user profile in OmicsDI has a simple aim:</p>
<blockquote>
<p>Provide a central resource where scientists can aggregate all their public omics datasets previously deposited in omics archives and repositories.</p>
</blockquote>
<p>The OmicsDI Profile is built on two different components:</p>
<ul>
<li>The User Profile: General information about the User (Name, email, affiliation, Account connections).</li>
<li>My Datasets: List of public datasets. <a href="https://scholar.google.co.uk/intl/en/scholar/about.html">Similar to Google Scholar for publications</a></li>
</ul>
<h2 id="omicsdi-user-profile">OmicsDI User Profile</h2>
<p>The user can create an OmicsDI account using five different accounts: <a href="https://orcid.org/">ORCID</a>, <a href="https://www.elixir-europe.org/">ELIXIR</a>, <a href="http://www.twitter.com">Twitter</a>, <a href="http://www.github.com">GitHub</a> and <a href="http://www.facebook.com">Facebook</a>.</p>
<figure >
<img src="https://omicsdi.github.io/media/login-option.png" />
<figcaption>
<h4>Figure 1: Login button on the OmicsDI home page</h4>
</figcaption>
</figure>
<p>When the user is created, an empty profile is generated (Figure 2). The Profile contains two main sections: <strong>Edit Profile</strong> and <strong>Edit Datasets</strong>.</p>
<figure >
<img src="https://omicsdi.github.io/media/profile-empty-page.png" />
<figcaption>
<h4>Figure 2: Profile Page in OmicsDI</h4>
</figcaption>
</figure>
<p>When a user logs into OmicsDI, some general information is pulled from the source account (e.g. ORCID). However, this data is incomplete, so the first step is to <strong>Edit Profile</strong>, where the user can update the following: <code>photo</code>, <code>affiliation</code>, <code>email</code>, <code>short biography</code>, <code>Make my profile public</code>. The last option (Figure 3) allows non-registered users to see your profile (datasets).</p>
<figure >
<img src="https://omicsdi.github.io/media/make-profile.png" />
<figcaption>
<h4>Figure 3: Make my profile public</h4>
</figcaption>
</figure>
<p>After the Profile is updated, the user can start adding (<code>claiming</code>) their datasets.</p>
<h2 id="my-datasets">My Datasets</h2>
<p>Users can use the OmicsDI <code>Search Box</code> (Figure 4) to search for their datasets by first name, last name, the title of a dataset, or the title of a publication related to a dataset.</p>
<figure >
<img src="https://omicsdi.github.io/media/search-datasets.png" />
<figcaption>
<h4>Figure 4: Search your datasets</h4>
</figcaption>
</figure>
<p>After you click the dataset of interest, a link on the dataset page allows you to claim the dataset and add it to your profile.</p>
<figure >
<img src="https://omicsdi.github.io/media/claim-butoon.png" />
<figcaption>
<h4>Figure 5: Claim Dataset Button</h4>
</figcaption>
</figure>
<p>A new button is then shown (Figure 6), which lets the user view the dataset added to their profile.</p>
<figure >
<img src="https://omicsdi.github.io/media/profile-butoon.png" />
<figcaption>
<h4>Figure 6: VIEW IN PROFILE</h4>
</figcaption>
</figure>
<p>Finally, all datasets are listed in your profile. The datasets can be removed from your Profile by using the <code>Edit Datasets</code> in your Profile.</p>
</description>
</item>
<item>
<title>Filtering search results</title>
<link>https://omicsdi.github.io/post/filter-results/</link>
<pubDate>Mon, 20 Feb 2017 14:04:54 +0000</pubDate>
<guid>https://omicsdi.github.io/post/filter-results/</guid>
<description>
<p>The search results can be filtered or refined using different <code>filters</code> or <code>terms</code> (Figure 1).
The OmicsDI web application currently supports nine different refinements: <code>Omics Type</code>, <code>Repository/database</code>,
<code>Organisms</code>, <code>Tissue</code>, <code>Diseases</code>, <code>Modifications (proteomics)</code>, <code>Instruments and platforms</code>, <code>Publication date</code>,
<code>Technology type</code>.</p>
<figure >
<img src="https://omicsdi.github.io/media/filtering-results.png" />
<figcaption>
<h4>Figure 1: Filtering results of Search in the Browse Page</h4>
</figcaption>
</figure>
<p></br></br></p>
<h2 id="filter-box">Filter Box</h2>
<figure class="left">
<img src="https://omicsdi.github.io/media/filter-box-typing.png" width="300" />
<figcaption>
<h4>Figure 2: Tissue Filter Box</h4>
</figcaption>
</figure>
<p>Each <code>Filter Box</code> shows the number of datasets within each category (e.g. tissue type). The <code>user</code> can type in the <strong>text field</strong>
to search for a certain category, and the system will filter the categories by the keywords the user specifies. For example (<strong>Figure 2</strong>), a user interested
in <code>brain</code> tissue can see all the tissues containing the keyword <code>brain</code>.</p>
Note that most of the filters are free-text based, meaning that their values rely on the
annotations provided by the specific databases.
The OmicsDI team is continually improving the automatic annotation system to move more attributes/properties of the datasets
from free-text to ontology-based values.
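<p>The keyword filtering done by a Filter Box can be pictured as a case-insensitive substring match over the category labels. The sketch below is an illustration only, not the web application's code; the category names and counts are made up for the example.</p>

```python
def filter_categories(categories, keyword):
    """Keep the categories whose label contains `keyword`, ignoring case."""
    key = keyword.strip().lower()
    return {label: n for label, n in categories.items() if key in label.lower()}


# Hypothetical tissue categories with invented dataset counts.
tissues = {"Brain": 120, "Brain stem": 4, "Liver": 300, "Lung": 75}
print(filter_categories(tissues, "brain"))
```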
<p></br></br></p>
<h2 id="ranking-results">Ranking Results</h2>
<figure >
<img src="https://omicsdi.github.io/media/ranking.png" />
<figcaption>
<h4>Figure 3: Ranking the Search Results</h4>
</figcaption>
</figure>
<p>The final results of the search can be sorted by three different categories: <code>Accession</code>, <code>Relevance</code>, and <code>Publication Date</code>.
<code>Accession</code> is the accession of the dataset in the system; <code>Relevance</code> is how closely the dataset matches the specific query;
<code>Publication Date</code> sorts the datasets by publication date.</p>
<p><strong>Relevance</strong></p>
The actual search is done via a call to Apache Lucene, which takes two arguments: the query and an upper bound on the number of hits (datasets) to return.
Lucene scoring uses a combination of the Vector Space Model (VSM) of Information Retrieval and the Boolean model to determine how relevant a given document is to a user's query. In general, the idea behind the VSM is that the more times a query term appears in a document relative to the number of times the term appears in all the documents in the collection, the more relevant that document is to the query. Lucene uses the Boolean model to first narrow down the documents that need to be scored, based on the boolean logic in the query specification. Lucene also adds some capabilities and refinements to this model to support boolean and fuzzy searching, but it essentially remains a VSM-based system at heart.
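<p>The intuition behind the VSM can be sketched with a toy TF-IDF score: a term counts for more when it is frequent in a document and rare across the collection. This is a simplified illustration, not Lucene's actual scoring formula, which includes further factors; the documents below are invented.</p>

```python
import math
from collections import Counter


def tf_idf_scores(query, documents):
    """Score each document by the summed TF-IDF of the query terms."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            # Document frequency: how many documents contain the term.
            df = sum(1 for other in tokenized if term in other)
            if df == 0 or not tokens:
                continue
            tf = counts[term] / len(tokens)      # frequency in this document
            idf = math.log(n_docs / df) + 1.0    # rarity across the collection
            score += tf * idf
        scores.append(score)
    return scores


docs = [
    "human kidney glomerulus proteome",
    "mouse liver proteome",
    "human human brain tissue",
]
scores = tf_idf_scores("human", docs)
# Indices of the documents, most relevant first.
ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranked)
```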
</description>
</item>
<item>
<title>Searching in OmicsDI</title>
<link>https://omicsdi.github.io/post/searching-in-omicsdi/</link>
<pubDate>Mon, 20 Feb 2017 10:02:07 +0000</pubDate>
<guid>https://omicsdi.github.io/post/searching-in-omicsdi/</guid>
<description>
<p>The <strong>main goal</strong> of the <a href="http://www.omicsdi.org">Omics Discovery Index</a> is to provide a platform for <code>searching</code> and <code>linking</code> public omics data.
OmicsDI has implemented a <strong>unique</strong> and <strong>novel</strong> <code>Search Engine</code> for omics datasets including public and protected data.</p>
<h2 id="the-omicsdi-search-box">The OmicsDI Search Box</h2>
<figure >
<img src="https://omicsdi.github.io/media/search-box.png" />
<figcaption>
<h4>Figure 1: OmicsDI Search Box</h4>
</figcaption>
</figure>
<p>The <code>OmicsDI Search Box</code> is the main component for searching in OmicsDI. The <code>user</code> can type a set of <strong>keywords</strong> that will enable the system
to find the datasets containing those keywords.</p>
If the user puts double quotes around a phrase, such as "breast cancer", the system will try to find the exact phrase in the datasets.
<p>The <code>OmicsDI Search Box</code> provides a unique <strong>auto-complete</strong> feature that enables the <code>user</code> to select a phrase after typing a subset of keywords. For example,
Figure 2 shows all sentences/phrases in OmicsDI containing the words <em>breast cancer</em>.</p>
<figure >
<img src="https://omicsdi.github.io/media/search-box-autocomplete.png" />
<figcaption>
<h4>Figure 2: OmicsDI Search Box with Auto-complete</h4>
</figcaption>
</figure>
<h2 id="query-syntax">Query Syntax</h2>
<p>When the user types any text in <code>OmicsDI Search Box</code>, the input is translated into an <a href="http://lucene.apache.org/">Apache Lucene query</a> that is then executed
to get the search results. The actual query executed is generated following the typical Apache Lucene query syntax in order to
provide a generic approach avoiding complex query rearrangements.</p>
<p>Multiple search terms separated by white space are combined by default with <code>AND</code> logic. Therefore an input text containing, for example,
<code>glutathione transferase</code> is treated as <code>glutathione AND transferase</code> and only entries having both terms will be found. The default order
of results is based on their relevance, i.e. the proximity of the terms in the entries.</p>
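<p>The default <code>AND</code> behaviour can be illustrated by rewriting an input so that the implicit operator becomes explicit. This is a toy sketch, not how OmicsDI translates queries internally; <code>make_and_explicit</code> is a hypothetical helper that keeps quoted phrases (optionally prefixed by a field name) together.</p>

```python
import re

OPERATORS = {"AND", "OR", "NOT"}
# A quoted phrase, optionally prefixed by "field:", or a run of non-spaces.
TOKEN = re.compile(r'(?:\w+:)?"[^"]*"|\S+')


def make_and_explicit(text):
    """Insert AND between adjacent terms that have no operator between them."""
    out = []
    for token in TOKEN.findall(text):
        if out and token not in OPERATORS and out[-1] not in OPERATORS:
            out.append("AND")
        out.append(token)
    return " ".join(out)


print(make_and_explicit("glutathione transferase"))
```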
<p>Table 1: Overview of some useful query syntax elements.</p>
<table>
<thead>
<tr>
<th><strong>Element</strong></th>
<th><strong>Meaning</strong></th>
<th><strong>Usage</strong></th>
<th><strong>Example</strong></th>
<th><strong>Notes</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td><code>AND</code></td>
<td>In addition to</td>
<td>term1 AND term2</td>
<td>glutathione AND transferase</td>
<td>Matches entries where both glutathione and transferase occur.</td>
</tr>
<tr>
<td><code>OR</code></td>
<td>Equivalence</td>
<td>term1 OR term2</td>
<td>glutathione OR transferase</td>
<td>Matches entries where either glutathione or transferase occurs.</td>
</tr>
<tr>
<td><code>NOT</code></td>
<td>Exclusion</td>
<td>term1 NOT term2</td>
<td>coding NOT fragment</td>
<td>Matches entries containing coding but not fragment.</td>
</tr>
<tr>
<td><code>*</code></td>
<td>Wildcard</td>
<td>partialTerm*</td>
<td>gluta*</td>
<td>Matches for instance glutathione, glutamate, glutamic.</td>
</tr>
<tr>
<td><code>&quot; &quot;</code></td>
<td>Exact match</td>
<td>&quot;quoted text&quot;</td>
<td>&quot;x-ray diffraction&quot;</td>
<td>Matches entries containing the exact phrase x-ray diffraction.</td>
</tr>
<tr>
<td><code>( )</code></td>
<td>Grouping</td>
<td>(text)</td>
<td>(reductase OR transferase) AND glutathione</td>
<td>Matches entries containing glutathione together with either reductase or transferase.</td>
</tr>
<tr>
<td><code>Field:</code></td>
<td>Field-specific search</td>
<td>fieldId:term</td>
<td>description:dopamine</td>
<td>Matches entries whose description field contains dopamine.</td>
</tr>
</tbody>
</table>
<h4 id="escaping-special-characters">Escaping special characters</h4>
<p>The following characters must be escaped within queries (by placing a backslash <code>\</code> before the character) in order to be interpreted correctly:</p>
<p><code>+ - & | ! ( ) { } [ ] ^ " ~ * ? : \ /</code></p>
<p>Since Apache Lucene supports regular expression searches (matching a pattern between forward slashes), the forward slash <code>/</code> is also a special character that must be escaped. For example, to search for
<code>cancer/testis</code> use the query <code>cancer\/testis</code>. If special characters are not escaped, the query actually performed may differ from the one intended.</p>
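<p>As an illustration, the escaping rule can be sketched in a few lines of Python. The helper name <code>escape_query_term</code> is hypothetical and is not part of OmicsDI or Apache Lucene; it simply prefixes each character from the list above with a backslash.</p>

```python
# Characters that the list above says must be escaped in a query.
# Assumption: each character is escaped individually, as that list suggests.
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\/')


def escape_query_term(term: str) -> str:
    """Return `term` with a backslash placed before every special character."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in term)


print(escape_query_term("cancer/testis"))
```

<p>A helper like this should only be applied to literal search terms. Applying it to a whole query would also escape intended operators such as the <code>*</code> wildcard or grouping parentheses.</p>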
<h4 id="query-examples">Query examples</h4>
<p>Following the aforementioned query syntax, users can easily search and filter results according to data content and characteristics.
A few examples of queries that can be performed using EBI Search are listed below.</p>
<ul>
<li>Search for <a href="http://www.omicsdi.org/search?q=insulin%20receptor">insulin receptor</a>.</li>
<li>Search for datasets that identified <a href="http://www.omicsdi.org/search?q=(UNIPROT:%20(%22P07900%22))">P07900</a>.</li>
</ul>
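<p>The example links above are ordinary search URLs with the query passed in the <code>q</code> parameter. A minimal Python sketch of how such a link could be assembled follows; <code>build_search_url</code> is a hypothetical helper, and the base URL is copied from the links above rather than from any documented API.</p>

```python
from urllib.parse import quote

# Base URL copied from the example links above (an assumption, not an official API).
SEARCH_URL = "http://www.omicsdi.org/search?q="


def build_search_url(query: str) -> str:
    """Percent-encode `query` and append it to the search URL."""
    # Parentheses and colons are kept readable; spaces and quotes are encoded.
    return SEARCH_URL + quote(query, safe="():")


print(build_search_url("insulin receptor"))
print(build_search_url('(UNIPROT: ("P07900"))'))
```

<p>With these inputs the sketch reproduces the two links listed above.</p>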
<h2 id="searching-using-biological-evidence">Searching using Biological Evidence</h2>
<p>The <code>OmicsDI Search Box</code> allows end users to search data using biological evidence, such as the list of proteins identified in a proteomics experiment or the metabolites
reported in a metabolomics experiment. For example (Figure 3), if the user searches for <code>3-methyl-2-oxobutanoic</code>, the resource finds one dataset in MetaboLights and five in Metabolomics Workbench
that identified this molecule.</p>
<figure >
<img src="https://omicsdi.github.io/media/search-biological-evidences.png" />
<figcaption>
<h4>Figure 3: Search by biological evidence: 3-methyl-2-oxobutanoic</h4>
</figcaption>
</figure>
<p>The final search results are shown in the <a href="http://www.omicsdi.org/search?q=*:*">browser page</a> including <code>Refine Filters</code>. <a href='https://omicsdi.github.io/post/filter-results/'>Read More Here</a>.</p>
</description>
</item>
<item>
<title>Navigating Omics from the home page</title>
<link>https://omicsdi.github.io/post/navigation-from-home/</link>
<pubDate>Mon, 20 Feb 2017 01:17:55 +0000</pubDate>
<guid>https://omicsdi.github.io/post/navigation-from-home/</guid>
<description><p>The <a href="http://www.omicsdi.org">OmicsDI Home Page</a> provides several blocks for navigating through the datasets, including
the <code>2D WordCloud</code>, the species/organism/diseases <code>Bubble Chart</code>, the repository/omics <code>Bar Chart</code>, the <code>Latest Datasets List</code>,
the <code>Most Accessed Datasets</code> list and the <code>Datasets per year</code> list. All the charts allow the user to search the data by
the corresponding attribute. These blocks also serve as statistics for the resource: for example, the pie chart shows how many datasets
the resource contains for each repository and omics type.</p>
<figure >
<img src="https://omicsdi.github.io/media/home-page.png" />
</figure>
<p>A <code>TagCloud</code> or <code>WordCloud</code> is a visual representation of metadata, typically used to depict keyword metadata (tags)
on datasets, or to visualize free-form text. The <code>WordCloud</code> is built from the most frequent words for every
database/repository. The OmicsDI <code>WordCloud</code> can be considered a two-dimensional term representation in which the user can
select the database and the field to look at: description vs database. The user can click a highlighted word in
the WordCloud to search for that term in the resource.</p>
<p>The <code>Bubble Chart</code> block allows users to navigate the data using three main categories: Tissues, Organisms,
and Diseases. Clicking a bubble redirects the user to a search for the clicked term.</p>
<p>The Repo/Omics <code>Bar Chart</code> and the Omics vs Year bar chart allow users to navigate the data using the omics categories
(<strong>metabolomics</strong>, <strong>transcriptomics</strong>, <strong>proteomics</strong> and <strong>genomics</strong>). The user can click a bar or a pie slice and will
be redirected to a search for the clicked term.</p>
<p>The Latest Datasets and Most Accessed Datasets blocks list the most recently added and the most frequently accessed datasets, respectively.</p>
</description>
</item>
<item>
<title>Welcome to OmicsDI Edu page</title>
<link>https://omicsdi.github.io/post/welcome-to-omicsdi/</link>
<pubDate>Fri, 17 Feb 2017 23:48:18 +0000</pubDate>
<guid>https://omicsdi.github.io/post/welcome-to-omicsdi/</guid>
<description>
<p><a href="http://www.omicsdi.org">Omics Discovery Index</a> is an integrated and open source platform
facilitating the access and dissemination of omics datasets. It provides a unique infrastructure to integrate
datasets coming from multiple omics studies, including at present <strong>proteomics</strong>, <strong>genomics</strong>, <strong>transcriptomics</strong> and
<strong>metabolomics</strong>.</p>
<p>OmicsDI stores metadata coming from the public datasets from every resource using an efficient
indexing system, which is able to integrate different biological entities including
genes, proteins and metabolites with the relevant life science literature. OmicsDI is updated daily, as new datasets become
publicly available in the contributing repositories.</p>
<h2 id="omics-data-submission">Omics Data Submission</h2>
<p>The increasing role of huge datasets in scientific research has important implications for the way the research
is conducted, for the way it should be organized and funded, and for the training of new researchers.
However, the advances in biomedical research depend on scientists’ ability to consult and use all available data,
independently from where they were originally produced: data sharing on a global scale is the best way to
‘advance science for the public good’.</p>
<p>The assumption underlying this policy is that the more scientists are allowed to access the same sets of data,
the more those data will be used to produce new knowledge about biological phenomena.</p>
<figure >
<img src="https://omicsdi.github.io/media/databases-workflow.png" />
<figcaption>
<h4>Figure 1: Current Schema of BioMedical Data Distribution/Dissemination </h4>
</figcaption>
</figure>
<p>Figure 1 shows how data is produced, stored and distributed in biomedical research. The result of an omics experiment (e.g. proteomics or metabolomics)
is submitted to a public archive (e.g. PRIDE or MetaboLights). These <code>Data Archives</code> provide a common interface for <strong>submission</strong>,
<strong>validation</strong> and <strong>downloading</strong> of the original results/data. Importantly, each individual repository/archive defines three major
components to guide the submission process:</p>
<ul>
<li>The metadata guidelines including Standards and Ontologies to define/describe a dataset and corresponding components (e.g. samples, instruments).</li>
<li>File formats to store and handle the underlying data in the Dataset.</li>
<li>The submission guidelines that define how to submit data to and retrieve data from the repository.</li>
</ul>
<p>In recent years the number of databases and archives has grown in all omics fields [1]. For example,
in proteomics the results of a mass spectrometry experiment can be submitted to four different member databases of
<a href="http://www.proteomexchange.org">ProteomeXchange</a>: PeptideAtlas/PASSEL, PRIDE, MassIVE and jPOST. In addition, each omics field
has developed and grown independently of the other fields, including its metadata specifications, file formats, and submission
guidelines. For this reason, most of the <code>Data Archives</code> are field specific (e.g. Metabolomics - MetaboLights, Metabolomics Workbench).</p>
<h2 id="omics-data-dissemination">Omics Data Dissemination</h2>
<p>After the data is submitted to a formal <a href='#omics-data-submission'>Archive</a>, Knowledge Base Databases
(<code>DBs</code>) <strong>reuse</strong> part of the public data to answer specific questions (e.g. Gene Expression Profiles - ExpressionAtlas). The
number of these <code>DBs</code> has grown in recent years. For example, Table 1 lists Protein Expression Databases [1] that include
peptide sequences, post-translational modifications and expression profiles.</p>
<p><strong>Table 1</strong>: Proteomics Knowledge Base Databases (<code>DBs</code>) providing protein expression information (e.g.
peptide sequences, post-translational modifications).</p>
<table>
<thead>
<tr>
<th><strong>Resource</strong></th>
<th><strong>URL</strong></th>
<th><strong>Publication</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>Cancer Mutant Proteome Database</td>
<td><a href="http://cgbc.cgu.edu.tw/cmpd/">http://cgbc.cgu.edu.tw/cmpd/</a></td>
<td>Nucleic Acids Res. 2015.</td>
</tr>
<tr>
<td>MOPED</td>
<td><a href="https://www.proteinspire.org/MOPED/">https://www.proteinspire.org/MOPED/</a></td>
<td>Nucleic Acids Res. 2015.</td>
</tr>
<tr>
<td>ProteomicsDB</td>
<td><a href="https://www.proteomicsdb.org/">https://www.proteomicsdb.org/</a></td>
<td>Nature. 2014</td>
</tr>
<tr>
<td>MaxQB</td>
<td><a href="http://maxqb.biochem.mpg.de/mxdb/">http://maxqb.biochem.mpg.de/mxdb/</a></td>
<td>Mol Cell Proteomics. 2012</td>
</tr>
<tr>
<td>GPMDB</td>
<td><a href="http://gpmdb.thegpm.org/">http://gpmdb.thegpm.org/</a></td>
<td></td>
</tr>
<tr>
<td>COPaKB</td>
<td><a href="http://www.heartproteome.org/copa/Default.aspx">http://www.heartproteome.org/copa/Default.aspx</a></td>
<td>Circ Res. 2013</td>
</tr>
<tr>
<td>paxDB</td>
<td><a href="http://pax-db.org/#!home">http://pax-db.org/#!home</a></td>
<td>Mol Cell Proteomics. 2012</td>
</tr>
<tr>
<td>Human Proteinpedia</td>
<td><a href="http://www.humanproteinpedia.org/">http://www.humanproteinpedia.org/</a></td>
<td>Curr Protoc Bioinformatics. 2013</td>
</tr>
<tr>
<td>Human Proteome Map</td>
<td><a href="http://www.humanproteomemap.org/">http://www.humanproteomemap.org/</a></td>
<td>Nature. 2014</td>
</tr>
</tbody>
</table>
<h2 id="omicsdi-vision">OmicsDI Vision</h2>
<figure class="left">
<img src="https://omicsdi.github.io/media/vision.png" width="400" />
</figure>
<p>All these databases/repositories have created a complex and distributed scenario where the data can
be submitted into different <code>Archives</code> and reused in multiple and different <code>DBs</code>. The development of tools
facilitating <strong>data sharing</strong> and able to handle this complexity is a great challenge in itself.</p>
<p>In this context, we introduce here the <a href="http://www.omicsdi.org">Omics Discovery Index</a>, an open-source platform facilitating
the access, discovery, and dissemination of omics datasets. OmicsDI provides a unique infrastructure to integrate datasets
coming from multiple omics fields, including at present proteomics, genomics, metabolomics, and transcriptomics.</p>
<p>To date, <a href="http://www.omicsdi.org/databases/">eleven resources</a> have agreed on a common metadata structure framework and exchange format,
and have contributed to OmicsDI, including:</p>
<ul>
<li><code>Proteomics</code>: The PRoteomics IDEntifications (PRIDE) database, PeptideAtlas, the Mass spectrometry Interactive Virtual Environment (MassIVE)
and the Global Proteome Machine Database (GPMDB).</li>
<li><code>Metabolomics</code>: MetaboLights, the Global Natural Products Social Molecular Networking project (GNPS),
MetabolomeExpress, and the Metabolomics Workbench.</li>
<li><code>Genomics</code>: The European Genome-Phenome Archive (EGA).</li>
<li><code>Transcriptomics</code>: ArrayExpress and Expression Atlas.</li>
</ul>
<p>OmicsDI stores biological and technical metadata coming from the public datasets available in every resource,
using an efficient indexing system, which is able to integrate different
biological entities including <code>genes</code>, <code>transcripts</code>, <code>proteins</code> and <code>metabolites</code> with the relevant scientific literature.</p>
<p><strong>References</strong></p>
<ul>
<li>[1] Perez‐Riverol, Yasset, et al. &ldquo;Making proteomics data accessible and reusable: current state of proteomics databases and repositories.&rdquo;
Proteomics 15.5-6 (2015): 930-950.</li>
</ul>
</description>
</item>
<item>
<title>OmicsDI major partners</title>
<link>https://omicsdi.github.io/post/partners/</link>
<pubDate>Fri, 17 Feb 2017 23:09:18 +0000</pubDate>
<guid>https://omicsdi.github.io/post/partners/</guid>
<description>
<p>OmicsDI has been built in collaboration with multiple consortia and individual databases. This collaboration has enabled
the standardization of metadata across multiple resources and omics types. Each consortium groups a set of databases around
the same topic (e.g. proteomics) and has previously agreed on common metadata, including Ontology Terms, Study Design, etc.
At the same time, OmicsDI has collaborated with other individual archives and databases such as ArrayExpress or EGA.</p>
<h2 id="proteomexchange">ProteomeXchange</h2>
<figure >
<img src="https://omicsdi.github.io/media/PX_logo.png" />
</figure>
<p>The <a href="http://www.proteomexchange.org">ProteomeXchange Consortium</a> is a collaboration of currently four major mass spectrometry
proteomics data repositories: <a href="http://www.ebi.ac.uk/pride/archive">PRIDE</a> at EMBL-EBI in Cambridge (UK), <a href="http://www.peptideatlas.org">PeptideAtlas</a>
at ISB in Seattle (US), <a href="http://massive.ucsd.edu">MassIVE</a> at UCSD (US) and <a href="http://jpostdb.org/">jPOST</a>.
The consortium offers a unified data deposition and discovery strategy across all four repositories. ProteomeXchange is a
distributed database infrastructure; the potentially very large raw data component of the data is only held at
the original submission database, while the searchable metadata is centrally collected and indexed.
All ProteomeXchange data is fully open after the release of the associated publication.</p>
<h2 id="metabolomexchange">MetabolomeXchange</h2>
<figure >
<img src="https://omicsdi.github.io/media/metabolomexchange.png" />
</figure>
<p><a href="http://www.metabolomexchange.org/site/">MetabolomeXchange</a> is a collaboration of 4 major metabolomics repositories,
with a total of 10 partners contributing. MetabolomeXchange was inspired by and is implementing similar coordination
strategies to ProteomeXchange. The founding partners are <a href="http://www.ebi.ac.uk/metabolights/">MetaboLights</a> at EMBL-EBI (UK),
Metabolomics Repository Bordeaux (FR), Golm Metabolome Database and the Metabolomics Workbench (US).
The <a href="http://metabolomicsworkbench.org/">Metabolomics Workbench</a> is an NIH-funded collaboration of 6 Regional
Comprehensive Metabolomics Resource Cores.</p>
<h2 id="the-european-genome-phenome-archive">The European Genome-Phenome Archive</h2>
<figure >
<img src="https://omicsdi.github.io/media/ega_logo.png" />
</figure>
<p>The <a href="https://www.ebi.ac.uk/ega/home">European Genome-Phenome Archive</a> (EGA) provides a service for the permanent archiving and distribution of
personally identifiable genetic and phenotypic data resulting from biomedical research projects. Strict protocols govern how information is managed, stored and
distributed by the EGA project. The EGA comprises a public metadata section, allowing searching and identifying
relevant studies, and the controlled access data section. Access to the data section for a particular study is only
granted after validation of a research proposal through the relevant ethics approval.</p>
</description>
</item>
</channel>
</rss>