-
Notifications
You must be signed in to change notification settings - Fork 1
/
NEWS
917 lines (608 loc) · 30.6 KB
/
NEWS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
CHANGES IN VERSION 1.9.2
--------------------------
BUG FIXES:
o Remove function overlap_GO because the VennDiagram package has issues
with condition tests on vectors with length greater than 1
CHANGES IN VERSION 1.9.1
--------------------------
BUG FIXES:
o Remove duplicated plot.title argument in ggplot call.
GENERAL UPDATES:
o Minor code cleaning.
CHANGES IN VERSION 1.7.1
--------------------------
GENERAL UPDATES:
o Updated email address.
CHANGES IN VERSION 1.5.5
--------------------------
BUG FIX:
o Function subEset also removes empty levels in factors not mentionned in
the subset argument. Necessary for randomForest as the factor of interest
cannot have empty levels.
CHANGES IN VERSION 1.5.4
--------------------------
NEW FEATURES:
o plot_design supports subset argument to restrict the plot to certain
subsets of samples.
CHANGES IN VERSION 1.5.3
--------------------------
GENERAL UPDATES:
o Clarify that GOexpress in not a conventional gene set enrichment
analysis tool. It is based on ranking, not enrichment.
CHANGES IN VERSION 1.5.2
--------------------------
BUG FIX:
o Rename "namespace" column to "namespace_1003"", not "namespace_1006".
CHANGES IN VERSION 1.5.1
--------------------------
GENERAL UPDATES:
o Updated maintainer email address.
CHANGES IN VERSION 1.3.2
--------------------------
BUG FIX:
o expression_plot() functions return error if an empty gene name is given
CHANGES IN VERSION 1.3.1
--------------------------
BUG FIX:
o heatmap_GO() function does not automatically resize margins when using
either gene names or gene identifiers. This caused the user-defined
margins to be systematically ignored.
CHANGES IN VERSION 1.1.12
--------------------------
NEW FEATURES:
o Summarisation function of scores and rank from feature-level to
ontology-level also affects the calculation of P-values by pValue_GO. The
default behaviour is to use the same function as used in the call to
GO_analyse, stored in the result object.
CHANGES IN VERSION 1.1.11
--------------------------
NEW FEATURES:
o Summarisation function of scores and rank from feature-level to
ontology-level can be overriden from the default "mean" to any function
specified by the user. The function need to support a list of numeric
values as an input, and return a unique numeric value as an output.
CHANGES IN VERSION 1.1.10
--------------------------
BUG FIX:
o margins argument in heatmap_GO() was not used in the call to heatmap.2()
CHANGES IN VERSION 1.1.9
--------------------------
TYPOS:
o Missing space character in error message.
CHANGES IN VERSION 1.1.8
--------------------------
TYPOS:
o Simone Coughlan, not Simone Coughland. Apologies!
CHANGES IN VERSION 1.1.7
--------------------------
BUG FIXES:
o Changed the rank of annotated features absent from the given
ExpressionSet to (number of features in ExpressionSet + 1) instead of
max(rank) + 1 to match the documentation, and desired behaviour. This
follows the logic of giving the same rank R to N tied-scored features,
while the next feature receives a rank of R + N.
CHANGES IN VERSION 1.1.6
--------------------------
BUG FIXES:
o overlap_GO() was crashing for 3-group Venn diagrams, except if the
VennDiagram was loaded manually loaded in the workspace using
libray(VennDiagram). The function will now run seemlessly without that
manual step, as loading GOexpress will immediately load VennDiagram
in the workspace (stated as a dependency in the DESCRIPTION file).
NEW FEATURES:
o New function pValue_GO() allows calculation of P-value for each
ontology using permutation of genes labels. This allows users to estimate
the chance of seeing a GO term reach a particular rank (or score).
Features a fancy progres bar shamelessly adapted from StackOverflow.
o heatmap_GO now semi-autmoatically resizes the bottom and right margins
to accomodate large gene and sample labels, respectively. The user may
control those margins using the "margins" argument of the function.
o heatmap_GO default call now shows the gene feature identifier for those
missing an annotated gene name, when gene names are requested (also
the default).
o A rank.by slot is now created by the GO_analyse() function in the result
object to state the metric used to order the result tables.
o a filters.GO slot stating the filters and cutoffs applied to the result
object is now created or updated by successive uses of the subset_scores()
function. Warnings and notes are displayed if conflicting filters and
cutoffs are applied on a previously filtered result object.
o rerank() function now supports re-ordering by P-value. Note that this
is only applicable to the output of the pValue_GO() function mentioned
above.
o rerank() function now updates the rank.by slot of the result object
to state the current ordering metric.
o subset_scores() function now allows filtering by P-value. Note that this
is only applicable to the output of the pValue_GO() function mentioned
above.
o Backward compatibility with Ensembl annotation releases 75 and earlier,
which used 'external_gene_id', which was renamed to 'external_gene_name'
in releases 76 and later.
o table_genes() function defaults to sorting genes by decreasing score
(equivalent to increasing rank). Gene feature name or identifier are
supported alternative filters for sorting.
UPDATED FEATURES:
o Allow user to override row_labels in heatmap_GO. This way, the
color-coding of the sample can be kept, while better description of the
samples can be used to label them, instead of the phenodata values.
o In heatmap_GO(), if the labRow argument is of length 1, it is assumed to
be the name of a column in the phenoData slot. Useful to re-label
subsetted ExpressionSet objects.
GENERAL UPDATES:
o Updated the AlvMac training dataset to include 'RPL36A' an example
of multiple Ensembl gene identifier annotated to the same gene name.
o Updated the AlvMac example custom annotations to match the updated
dataset.
o Updated the example AlvMac_results to match the updated dataset.
o Set the random seed prior to running the GO_analyse() example in
the vignette. Hopefully, this should allow reproducible testing by the
users.
o In User Guide, load package before loading the attached data.
o In User Guide, new sections and examples dealing with the re-labelling
of heatmap samples, the use of P-values, the re-ranking and subsetting
of results using P-values. New sub-sections for clarity. Emphasis on
the use and generation of local annotation, rather than use of current
online Ensembl annotation release.
o No more code connecting to the Ensembl server in any the help files
and User Guide.
o Help pages examples with more consistent indentation of code.
CHANGES IN VERSION 1.1.5
--------------------------
BUG FIXES:
o Forgot to commit image file of shiny screenshot in release 1.1.5
CHANGES IN VERSION 1.1.4
--------------------------
NEW FEATURES:
o Custom annotations can be provided to the GO_analyse function using
three new arguments: "GO_genes", "all_GO", and "all_genes". See below for
individual description.
o The GO_genes argument allows the user to provide associations between
feature identifiers in the ExpressionSet and gene ontology identifiers.
This will skip all calls to the Ensembl BioMart server. Consequently, we
recommend the use of the "all_GO" and "all_genes" arguments whenever
"GO_genes" is used. Otherwise, some downstream functions may not work.
For instance, the expression_plot, expression_profiles, and heatmap_GO
function can be used to generate plots,although lacking gene and gene
ontology names. See below for more details.
o The all_GO argument allows the user to provide the name and namespace
("biological_process", "molecular_function", or "cellular_component")
corresponding to gene ontology identifiers. This enables subsequent
filtering of result tables by namespace, and annotation of heatmaps with
the name of the gene ontology.
o The all_genes argument allows the user to provide the name and
description corresponding to feature identifiers in the ExpressionSet.
This enables annotation of expression plots with the gene name associated
with the feature.
UPDATED FEATURES:
o In the GO_analyse function, the eSet argument is now formally checked
to be of class ExpressionSet prior to any calculation. If not, the
function returns an appropriate error message.
o The error messages caused when the user gives a name that does not exist
in the phenoData slot of the ExpressionSet were updated to use the word
"column", rather than "factor".
GENERAL UPDATES:
o Updated the help page for function GO_analyse to describe the new
features described above and provide a code example.
o Used the BiocStyle package to format the vignette.
o Added a new section in the vignette to document the new features
described above.
o Added a new section in the vignette to mention the creation of shiny
applications using the output of GOexpress. Included a screenshot of
a shiny application developed from the original AlvMac dataset.
o Added a new section in the vignette to highlight the availability of
a "subset" argument to avoid the need for additional ExpressionSet objects
to analyse or visualise subsets of samples.
CHANGES IN VERSION 1.1.3
--------------------------
NEW FEATURES:
o Included private function used to generate the prefix2dataset table.
o Included private function used to generate the microarray2dataset table.
o GO_analyse states the number of gene features in the given dataset that
were found in the BioMart dataset. This allows the user to interrupt the
script if a suspiciously low number of mapped features suggests an
incorrect BioMart dataset was used.
UPDATED FEATURES:
o Updated the prefix2dataset table. Three more species
("Chlorocebus sabaeus", "Papio anubis" , and "Poecilia formosa"), and two
more columns ("species" and "sample").
o Updated the microarray2dataset table. Probeset patterns are now hard
coded in the package, but dynamically defined as unique to a platform
(or not) by the function building the microarray2dataset table. Species
without microarray platforms were also included in the table with NA
values. Code was added in the GO_analyse method to ignore those before
trying to resolve the origin of the expression data, if not specified
by the user.
o Renamed column "prefix" to "pattern" in microarray2dataset table.
o Disabled manual check of C. elegans and D. melanogaster, as the
automated detection is doing the exact same thing. Only S. cerevisiae
requires a manual check of the "Y" prefix instead of the automatically
extracted "Y[:LETTERS:]{2}" pattern.
o Methods expression_plot_symbol and expression_profiles_symbol now
return the list of gene feature identifiers instead of NULL when multiple
plots are produced. That may be confusing as they return the list of close
matches when an invalid gene symbol was given, and they return the ggplot
when obly one plot is produced.
GENERAL UPDATES:
o Code layout. Lines of code over 80 characters were split around
brackets and commas to use the built-in indentation defaulted to 4 space
characters. Plus, this made the code more readable in many cases.
CHANGES IN VERSION 1.1.2
--------------------------
GENERAL UPDATES:
o DESCRIPTION file minor correction: Inappropriate "Metagenomics"
biocView removed.
CHANGES IN VERSION 1.1.1
--------------------------
BUG FIXES:
o Ensembl BioMart has changed the column 'external_gene_id' to
'external_gene_name'. Renamed my biomaRt queries accordingly to prevent
GO_analyse() from crashing during the analysis step.
o Updated the AlvMac_results variable containing sample results annotated
with the deprecated 'external_gene_id'. It now includes the new
'external_gene_name'.
GENERAL UPDATES:
o DESCRIPTION file minor correction.
CHANGES IN VERSION 0.99.17
--------------------------
BUG FIXES
o NAs were introduced in the average score and rank of GO terms following
analysis of microarray data (ANOVA and randomForest) for GO terms without
associated features in the ExpressionSet. The problem was not found in any
case for analysis of RNA-Seq data. This was causing issues during the
subset_scores function and subsequent plots, such as main titles made of
multiple lines and NAs. GO terms without associated probesets are now
given average score of 0 and average rank equal to the maximum in the
dataset plus 1.
CHANGES IN VERSION 0.99.16
--------------------------
GENERAL UPDATES
o Included co-authors who participated in the generation and analysis of
data used to test the package.
CHANGES IN VERSION 0.99.15
--------------------------
GENERAL UPDATES
o Updated man pages GO_analyse to describe the recently added subset
slot in the result variable.
CHANGES IN VERSION 0.99.14
--------------------------
BUG FIXES
o The anova method of the GO_analyse method was broken since the
introduction of ExpressionSet, release 0.99.4.
UPDATED FEATURES
o The subset argument was also added to the GO_analyse method. This allows
the identification of genes clustering a subset of groups, at a subset of
time-points, etc. while plotting the expression profile of the entire
dataset, if desired.
GENERAL UPDATES
o Updated man pages GO_analyse to allow only one example to be run. This
saves time during CMD check, while making the man page more easily
readable.
CHANGES IN VERSION 0.99.13
--------------------------
UPDATED FEATURES
o All expression plots were updated to a default ylim range corresponding
to the minimum and maximum expression values found in the entire
ExpressionSet. This is meant to avoid mis-interpretation of the amplitude
of variation between sample groups, as suggested in the paper:
Rougier, N.P., Droettboom, M., and Bourne, P.E. (2014). Ten simple rules
for better figures. PLoS computational biology 10, e1003833.
CHANGES IN VERSION 0.99.12
--------------------------
UPDATED FEATURES
o All expression plots were given new arguments for more control: xlab
allows users to change the default title for the X-axis, ylim allows users
to override the lower and upper boundaries of the Y axis.
GENERAL UPDATES
o Updated man pages AlvMac, AlvMac_results, microarray2dataset, and
prefix2dataset. Replaced the LaTeX describe statement by an itemise
statement to make the man page more readable in a terminal window.
o Moved functions between the post_analysis script and the toolkit
script. From now on, only functions accessible to the users should be
present in the post_analysis script, while toolkit should contain
method called internally.
o The User's Guide was updated to redirect the users to the new support
site for Bioconductor rather than the bioc-devel mailing list.
CHANGES IN VERSION 0.99.11
--------------------------
NEW FEATURES
o The new subEset() method subsets an ExpressionSet given a list where
item names are column names of the phenoData slots and item values are
vectors of values corresponding to sample to retain (e.g.
list(Time=c("2H", "6H")) will retain samples with value "2H" or "6H" in
the "Time" column of the phenoData slot).
UPDATED FEATURES
o All relevant visualisation methods have been added an argument to
subset samples to plot to those with a given set of values for a given
column in the phenoData (uses the new function subEset described above).
o The example analysis results was renamed from "raw_results" to
"AlvMac_results", so that it can now loaded running data(AlvMac_results).
The new name is meant to be more specific to the package.
CHANGES IN VERSION 0.99.10
--------------------------
GENERAL UPDATES
o Replaced \format by \value sections to the following man pages:
AlvMac.Rd, microarray2dataset.Rd, prefix2dataset.Rd, raw_results.Rd
to avoid having an empty value section, as recommended by the Bioconductor
package tracker. Hopefully, it does not require a non-empty \format
section as well...
o Restricted lines to less than 80 characters and indentation by multiple
of four space characters.
CHANGES IN VERSION 0.99.9
-------------------------
UPDATED FEATURES
o all post analysis functions were given more sanity checks to verify
that the "result" argument contains the required slots of a GO_analyse()
output and the the arguments pointing at a phenotypic data column are
valid column names.
BUG FIXES
o expression_plot_symbol and expression_profiles symbol used the default
value of col.palette and colourF instead of forwarding the user-defined
one to the expression_plot and expression_profiles functions.
GENERAL UPDATES
o Added cross-reference in UsersGuide to point at examples of usages of
factor and numeric values for the expression plots
CHANGES IN VERSION 0.99.8
-------------------------
GENERAL UPDATES
o Resaved R data files to reduced package disk size. No more WARNING in
R CMD check.
GENERAL UPDATES
o Removed reference to GitHub in the README file. The weblink is given
in the DESCRIPTION file anyway if users are interested.
CHANGES IN VERSION 0.99.7
--------------------------
UPDATED FEATURES
o fixed a typo in the code of heatmap_GO which made it crash for any other
dataset than the example dataset.
CHANGES IN VERSION 0.99.6
-------------------------
UPDATED FEATURES
o expression_profiles_symbol() method was missing the "index" argument to
select the feature identifier to plot alone.
CHANGES IN VERSION 0.99.5
-------------------------
NEW FEATURES
o expression_profiles() method plots the expression profiles of individual
samples series, as opposed to grouped samples series handled by
expression_plot functions.
o expression_profiles_symbol() method plots the expression profiles of
individual samples series using a gene name instead of an Ensembl gene
identifier.
UPDATED FEATURES
o overlap_GO can print to screen, if filename argument is set to NULL
(Default).
o heatmap_GO, cluter_GO and plot_design can resize title font and wrap the
text on multiple lines.
o expression_plot and expression_plot_symbol can orient X axis labels at
a given angle.
o replaced return(NULL) statement by stop() when no close match is found
to a gene name in the family of expression_plot functions.
GENERAL UPDATES
o User's Guide updated.
o List of contributors updated in User's Guide and DESCRIPTION.
CHANGES IN VERSION 0.99.4
-------------------------
UPDATED FEATURES
o Use of ExpressionSet instead of numeric named matrix and
AnnotatedDataFrame. Better consistency with other Bioconductor packages.
GENERAL UPDATES
o Implemented corrections requested following the Bioconductor review.
Includes typos, consistent terminology through the package code and
metafiles, additional information in help files, no reference to GitHub
as an alternate installation option, use of arrow signs instead of equal
signs for value assignment.
o Restricted lines to 80 characters, and used 4-space tabulations.
o Corrected out-of-date documentation.
CHANGES IN VERSION 0.99.3
-------------------------
UPDATED FEATURES
o Control the size of the legend text in the two expression plot figures.
Updated help files accordingly.
o Updated vignette with new section "Statistics".
o Complete cleaning of code files for lines shorter than 80 columns.
o Cleanup of help files for lines shorter than 80 columns.
o Enabled filtering of raw results on the average score of a GO term.
CHANGES IN VERSION 0.99.2
-------------------------
UPDATED FEATURES
o Metadata lines in the preamble of the Sweave file
CHANGES IN VERSION 0.99.1:
UPDATED FEATURES
o Sweave vignette implemented.
o Replaced all message() statements by cat() to make Sweave
output the full message in the vignette.
o Updated a missed F into FALSE
o Updated an invalid biocViews (typo)
o Date field added for a proper citation() method.
CHANGES IN VERSION 0.99.0
-------------------------
UPDATED FEATURES
o Replaced all cat() statements by message() to match the
Biocondcutor guidelines.
CHANGES IN VERSION 0.6.2
------------------------
UPDATED FEATURES
o Fixed typo in the example of the vignette of GO_anova().
o Advice to use the subset_scores() function in the GO_anova()
vignette.
o Helpful warning message in the vignette of GO_anova() to make
sure that the experimental factor to analyse is actually formatted
as a factor in the true R language meaning.
o Updated license to GPL (>= 3) instead of GPL-2.
CHANGES IN VERSION 0.6.1
------------------------
NEW FEATURES
o GOexpress now implements the randomForest framework to
estimate the importance of each gene on the clustering of the samples
according to the different levels of the desired factor (See section
UPDATED FEATURES below for more details).
o rerank() function return a re-ordered version of the result
variable given, ordered using either the rank or the score metrics.
o Sample data raw_result was added to provide an example output
of the analysis, and also allow to test the visualisation functions
on a readily available variable instead of having to run the analysis
function everytime (would have killed the R CMD check duration).
UPDATED FEATURES
o GO_analyse() default has been changed to a random forest
statistical framework. This framework has various advantages over the
previous ANOVA approach; namely 1) the bootstrapping and re-sampling
of gene subsets to measure the importance of each gene in predicting
the desired factor over many iterations (Default: 1,000 trees), 2)
a shorter analysis duration due to the Fortran code underlying the
randomForest package, 3) randomForest does not make assumptions on the
distribution of the expression level within and between genes, 4)
randomForest deals implicitely with interactions in multifactorial
experiments.
o Genes are now initially ranked according to their rank and GO terms
according to the average rank of their associated genes. Genes
associated with a GO term but absent from the dataset are assigned a
value of max(rank)+1 and a score of 0. Ties between genes are resolved
by giving all genes the minimal rank, the next gene being given
rank+length(tied_genes). The average_rank metric is expected to be
more robust than the average "score". Depending on the statistical
framework used, "score" may mean "importance" (randomForest) or
"F.value" (ANOVA).
o More sanity checks at the start of some methods to ensure smooth
execution of the downstream code and more helpful error messages.
o More synonyms allowed for subsetting and filtering of the result
variable.
o list_genes() and table_genes() methods now allow to chose whether
to return all feature identifiers associated with a GO term, or only
those also present in the expression dataset.
CHANGES IN VERSION 0.5.5
------------------------
UPDATED FEATURES
o heatmap_GO() default colorscale has been changed to
blue-white-red better suited to the representation of expression
level. The previous colorscale green-black-red is now suggested if
differential expression data is used (e.g. log2FC).
CHANGES IN VERSION 0.5.4
------------------------
NEW FEATURES
o Sample data accessible with the data() method.
CHANGES IN VERSION 0.5.3
------------------------
NEW FEATURES
o Method overlap_GO() produces a Venn diagram showing the overlap
of gene sets associated with two to five GO terms. The Venn diagram
is saved in a TIFF image file.
CHANGES IN VERSION 0.5.2
------------------------
UPDATED FEATURES
o Method heatmap_GO() updated to color-code the samples by levels
of a factor, and a better colormap is used to represent the level of
expression of genes.
CHANGES IN VERSION 0.5.1
------------------------
UPDATED FEATURES
o Package renamed GOexpress after verifying that no other package
on Bioconductor uses this name.
o Method GO_anova() renamed to GO_analyse() as other metrics than
ANOVA may be implemented in the future.
CHANGES IN VERSION 0.4.1
------------------------
NEW FEATURES
o Supports microarray probeset identifiers as gene identifiers
in the expression dataset.
o Argument "result=" is no more optional in all the post-analysis
functions.
UPDATED FEATURES
o expression_ploto() functions now dynamically adapt the colormap
to the number of groups of samples instead of three hard-coded colors
used for the colormap.
CHANGES IN VERSION 0.3.1
------------------------
NEW FEATURES
o expression_plot() plots the expression profile of the gene
corresponding to an Ensembl identifier, given valid variable name
for the X-axis and a grouping factor for the Y-axis.
o expression_plot_symbol() plots the expression profile of the
gene(s) with the Ensembl identifier(s) corresponding to a gene
symbol, given valid variable name for the X-axis and a grouping
factor for the Y-axis.
o plot_design() plots the univariate effect of each level of each
factor available in the AnnotatedDataFrame on the expression levels
of genes associated with a GO term.
UPDATED FEATURES
o The scoring function now uses the total count of genes associated
with a GOterm instead of the count of genes in the dataset and
associated with the GO term. Genes absent from the expression dataset
are assigned a F.value of 0, similarly to the genes with a non-
significant ANOVA result. This was not found to significantly impact
the ranking of GO terms, but is a more objective scoring of the
GO terms if the expression data was generated through objective
filtering (e.g. filtering out lowly expressed genes).
o subset_scores() now also filters the $mapping and $anova slots
to retain only mappings and gene information related to the GO terms
left in the $scores slot after filtering. This reduces significantly
the memory space of the filtered variable, while the object containing
the raw results of GO_anova() will still contain the full information
fetched from the Ensembl BioMart server if the user wants it.
o ggplot2 and grid are now required for the proper installation of
anovaGO, even though these packages are only required for a subset
of non-essential features in anovaGO. This way, all features in
anovaGO will be available as soon as the package is installed.
o Messages printed to the user during the executing of the funcions
were made as clear as possible, particularly when fetching information
from the Ensembl BioMart server.
CHANGES IN VERSION 0.2
----------------------
UPDATED FEATURES
o subset_scores() can now filter and keep only GO terms of a given
type, i.e. "Biological process", "Molecular function", or "Cellular
Component".
CHANGES IN VERSION 0.1
----------------------
OVERVIEW
o This package was designed for the analysis of bioinformatics-related
data based on gene expression measurements. It requires 3 input
values: (1) a gene-by-sample table providing the expression level
of genes (rows) in each sample (columns), (2) an AnnotatedDataFrame
from the Biobase package providing phenotypic information about the
samples grouping them by levels of the factor, (3) the name of the
grouping factor to investigate, which must be a valid column name
in the AnnotatedDataFrame.
o The analysis will identify all Gene Ontology (GO) terms represented
by at least one gene in the expression dataset. A one-way ANOVA
will be performed on the grouping factor for each gene present in
the expression dataset. Following multiple-testing correction, genes
below the threshold for significance will be assigned an F.value of
0. GO terms will be scored and ranked on the average F.value of
associated genes.
o Functions are provided to investigate and visualise the results of
the above analysis. The score table can be filtered for rows over
given thresholds. The distribution of scores can be visualised. The
quantiles of scores can be obtained. The genes associated with a
given GO term can be listed, with or without descriptive information.
Hierarchical clustering of the samples can be performed based on the
expression levels of genes associated with a given GO term. Heatmaps
accompanied by hierarchical clustering of samples and genes can be
drawn and customised.
FEATURES
o GO_anova() scores all Gene Ontology (GO) terms represented in
the dataset based on the ability of their associated genes to cluster
samples according to a predefined grouping factor. It also returns
the table used to map genes to GO terms, the table summarising the
one-way ANOVA results for each gene, and finally the predefined
grouping factor used for ANOVA. Genes annotated to a GO term but
absent from the expression dataset are ignored.
o get_mart_dataset() returns a connection to the appropriate BioMart
dataset of the Ensembl server based on the gene name of the first gene in
the expression dataset. The choice of the dataset can be overriden by the
user if a valid BioMart Ensembl dataset is specified.
o subset_scores() filters the table of scored GO terms in the
output of GO_anova() and returns a list formatted identically to the
output of GO_anova() with the resulting filtered table of scores.
o hist_scores() plots the distribution of average F scores in the
output of GO_anova() or subset_scores().
o quantiles_scores() returns the quantile values corresponding
to defined percentiles.
o list_genes() returns the list of Ensembl gene identifiers
associated with a given GO term.
o table_genes() returns a table of information about the Ensembl
gene identifiers associated with a given GO term.
o cluster_GO() plots a hierarchical clustering of the samples
based on the expression levels of genes associated with a given
GO term.
o heatmap_GO() plots a heatmap with hierarchical clustering of
the samples and genes based on the expression levels of genes
associated with a given GO term.