Merge branch 'master' into issue-514-remove-bdbag-refs
suzialeksander authored Jul 3, 2024
2 parents 3373fb2 + 366a8c6 commit b4cdfed
Showing 23 changed files with 496 additions and 71 deletions.
58 changes: 58 additions & 0 deletions .github/workflows/deploy.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
name: Deploy Jekyll site to Pages

on:
  # Allow the workflow to be triggered manually. In particular, this allows it to be triggered
  # from a workflow in the go-site repository.
  workflow_dispatch:
  push:
    branches: ["master"]

# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages
permissions:
  contents: read
  pages: write
  id-token: write

# Allow only one concurrent deployment, skipping runs queued between the run in-progress and latest queued.
# However, do NOT cancel in-progress runs as we want to allow these production deployments to complete.
concurrency:
  group: "pages"
  cancel-in-progress: false

jobs:
  # Build job
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Setup Ruby
        uses: ruby/setup-ruby@v1
        with:
          ruby-version: '2.7'
          bundler-cache: true # runs 'bundle install' and caches installed gems automatically
      - name: Setup Pages
        id: pages
        uses: actions/configure-pages@v4
      - name: Fetch GO_REFs
        run: make _data/gorefs.yaml
      - name: Build with Jekyll
        # Outputs to the './_site' directory by default
        run: bundle exec jekyll build --baseurl "${{ steps.pages.outputs.base_path }}"
        env:
          JEKYLL_ENV: production
      - name: Upload artifact
        # Automatically uploads an artifact from the './_site' directory by default
        uses: actions/upload-pages-artifact@v3

  # Deployment job
  deploy:
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    needs: build
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4
7 changes: 7 additions & 0 deletions .gitignore
@@ -30,6 +30,10 @@ tramp
.org-id-locations
*_archive

## Other IDEs
.vscode
.idea

###
### From upstream jekyll theme.
###
@@ -42,3 +46,6 @@ _site
vendor/bundle

_algolia_api_key

## Transient data
_data/gorefs.yaml
3 changes: 3 additions & 0 deletions Gemfile
@@ -14,7 +14,10 @@ group :jekyll_plugins do
gem "jekyll-redirect-from"
gem "jekyll-seo-tag"
gem 'jekyll-algolia', '~> 1.0'
gem 'jekyll-datapage-generator'
end

# Windows does not include zoneinfo files, so bundle the tzinfo-data gem
gem 'tzinfo-data', platforms: [:mingw, :mswin, :x64_mingw, :jruby]

gem "rinku", "~> 2.0"
4 changes: 4 additions & 0 deletions Gemfile.lock
@@ -34,6 +34,7 @@ GEM
nokogiri (~> 1.6)
progressbar (~> 1.9)
verbal_expressions (~> 0.1.5)
jekyll-datapage-generator (1.4.0)
jekyll-feed (0.11.0)
jekyll (~> 3.3)
jekyll-redirect-from (0.16.0)
@@ -65,6 +66,7 @@ GEM
rb-fsevent (0.11.0)
rb-inotify (0.10.1)
ffi (~> 1.0)
rinku (2.0.6)
rouge (1.11.1)
safe_yaml (1.0.5)
sass (3.7.4)
@@ -81,10 +83,12 @@ PLATFORMS
DEPENDENCIES
jekyll (= 3.4.3)
jekyll-algolia (~> 1.0)
jekyll-datapage-generator
jekyll-feed
jekyll-redirect-from
jekyll-seo-tag
jekyll-sitemap
rinku (~> 2.0)
tzinfo-data
webrick

4 changes: 4 additions & 0 deletions Makefile
@@ -0,0 +1,4 @@
.PHONY: _data/gorefs.yaml

_data/gorefs.yaml:
	wget -O $@ https://raw.githubusercontent.com/geneontology/go-site/master/metadata/gorefs.yaml
11 changes: 8 additions & 3 deletions _config.yml
@@ -23,13 +23,15 @@ gems:
- jekyll-redirect-from
- jekyll-seo-tag
- jekyll-sitemap
- jekyll-datapage-generator

exclude:
- Gemfile
- Gemfile.lock
- .idea/
- .gitignore
- README.md
- vendor # In GitHub workflows, the ruby/setup-ruby action will install gems here
timezone: America/Los_Angeles
defaults:

@@ -60,9 +62,6 @@ collections:
permalink: /blog/:year/:month/:day/:title/
output: true

plugins_dir:
- jekyll-redirect-from

sass:
sass_dir: _sass

@@ -81,3 +80,9 @@ algolia:
- covid-19.html
# nodes_to_index: 'article' # elements to be indexed
nodes_to_index: 'p,blockquote,li,div,paragraph,td,span,h1,h2,h3'

page_gen:
  - data: gorefs
    template: goref
    dir: GO_REF
    name_expr: "record['id'].sub('GO_REF:', '')"
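
As a rough illustration of what the `name_expr` above evaluates to for each record, the sketch below mirrors the Ruby expression in Python. The sample records are hypothetical (the real ones come from `_data/gorefs.yaml`), and the function name `page_name` is an assumption made for this example. How the plugin turns the resulting name into an output path is not shown.

```python
def page_name(record: dict) -> str:
    """Python analogue of the Ruby name_expr: record['id'].sub('GO_REF:', '')."""
    # Ruby's String#sub replaces only the first match, hence count=1 here.
    return record["id"].replace("GO_REF:", "", 1)


# Hypothetical records, shaped like data entries that carry an 'id' key.
sample_records = [{"id": "GO_REF:0000001"}, {"id": "GO_REF:0000002"}]
names = [page_name(record) for record in sample_records]
print(names)
```
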
68 changes: 57 additions & 11 deletions _docs/download-go-annotations.md
@@ -9,23 +9,69 @@ redirect_from:

# Download annotations

## Current GO annotation downloads
The [GAF download page](http://current.geneontology.org/products/pages/downloads.html) has GAF files for selected species.
### Getting annotations for a selected organism

GAF & GPAD+GPI files are also available from the [/annotations/](http://current.geneontology.org/annotations/index.html){:target="blank"} directory of the current release: [http://current.geneontology.org](http://current.geneontology.org){:target="blank"}
This page has instructions for getting GO annotations for almost any organism. If your organism is not available in the [official GO products](http://current.geneontology.org/products/pages/downloads.html), [UniProt GAFs by proteome](https://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/){:target="blank"}, or [NCBI RefSeq](https://ftp.ncbi.nlm.nih.gov/genomes/refseq/){:target="blank"}, we recommend using the latest version of [InterProScan](https://interproscan-docs.readthedocs.io/en/latest/){:target="blank"} for unannotated organisms.

### Other species
If your organism is not available in the above links, you can use [AmiGO's annotation search](https://amigo.geneontology.org/amigo/search/annotation) feature to view or download annotations. [See our FAQ](https://geneontology.org/docs/faq/#where-can-i-view-or-download-the-complete-sets-of-go-annotations) for further information as well as how to retrieve annotations for species that are not available in AmiGO.
Jump to a section:
- [Commonly studied organisms](/docs/download-go-annotations/#1-commonly-studied-organisms)
- [All other organisms](/docs/download-go-annotations/#2-all-other-organisms)

## About GO annotation formats
+ Released monthly
+ Files are taxon-specific, with a few exceptions including the Reactome and *Candida* Genome Database files
#### Required Files
Most tools that use GO annotations take two input files:
1. a file with the **annotations** (in Gene Annotation Format, or GAF)
2. a file with the GO **ontology** structure (in Open Biomedical Ontology Format, or OBO)

Because the ontology and annotations are constantly being improved over time, we recommend downloading the latest version of the annotations for your organism and the corresponding ontology file for that GO version. The version should be specified in the header of the annotation file.
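
As a minimal sketch of how the version information could be read from an annotation file, the Python below collects `key: value` pairs from the `!`-prefixed header lines of a GAF. The function name `read_gaf_header` and the sample header are assumptions made for illustration; the keys present in a real file may differ.

```python
def read_gaf_header(lines):
    """Collect 'key: value' pairs from the '!'-prefixed header lines of a GAF file."""
    header = {}
    for line in lines:
        if not line.startswith("!"):
            break  # the header precedes the first annotation row
        key, sep, value = line[1:].strip().partition(":")
        if sep and key.strip():
            header[key.strip()] = value.strip()
    return header


# Hypothetical header lines followed by one (truncated, made-up) annotation row.
sample_gaf_lines = [
    "!gaf-version: 2.2",
    "!date-generated: 2024-06-17T12:00",
    "!go-version: http://purl.obolibrary.org/obo/go/releases/2024-06-17/extensions/go-gaf.owl",
    "ExampleDB\tX0001\tgeneX",
]
gaf_header = read_gaf_header(sample_gaf_lines)
print(gaf_header)
```

The values found this way (date and ontology version) are the ones to record alongside the download URL.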

#### Citing GO
To ensure reproducibility for any publication where GO was used at any point in the research, please include:
* [appropriate GO publication(s)- refer to the full GO citation policy](/docs/go-citation-policy/)
* the URL where the files were obtained
* the date on the header of the GAF file
* the ontology version number

### [1. Commonly studied organisms](http://current.geneontology.org/products/pages/downloads.html)
[This GAF download page has annotations for selected commonly-studied species](http://current.geneontology.org/products/pages/downloads.html).

For organisms with many expert-curated GO annotations (those with MODs, dedicated databases, etc.), we recommend downloading annotations from the links in the above-linked table. These organisms often have a large number of manual annotations supported by direct experimental evidence as well as annotations based on other evidence types.
<!-- * Most of these have two downloads available, one with the full set of GO annotations, and one with only the “core” function annotations (PAN-GO) for each organism. /-->
* These annotations should be used with the [latest version of the GO ontology](http://current.geneontology.org/ontology/index.html).
* Annotations for these organisms are also available as GPAD/GPI companion files; see the [/annotations/](http://current.geneontology.org/annotations/index.html){:target="blank"} directory of the current release [http://current.geneontology.org](http://current.geneontology.org){:target="blank"}. For more information on these infrequently used filetypes see the format pages for [GPAD](/docs/gene-product-association-data-gpad-format/)+[GPI](/docs/gene-product-information-gpi-format/).

### 2. All other organisms
For all other organisms, we recommend downloading annotations from one of the following sources: UniProt or NCBI RefSeq. Both provide annotations generated by highly accurate computational methods. The header of the annotation file specifies the version of the ontology you should use to accompany the annotation file. Older versions of the [GO ontology can be downloaded from the GO download archives](http://release.geneontology.org/).

* [UniProt GAFs by proteome](https://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/){:target="blank"}: Annotation files are available for about 20,000 complete proteomes (one protein sequence per protein-coding gene). Use these files if you want to use **UniProtKB identifiers**.
* Go to [https://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/](https://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/){:target="blank"}
* Navigate to your organism & download the `.goa` file, e.g. [`22426.A_gambiae.goa`](https://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/22426.A_gambiae.goa){:target="blank"}
*Tip: use your browser's in-page search to find the species name.*

* [NCBI RefSeq](https://ftp.ncbi.nlm.nih.gov/genomes/refseq/){:target="blank"}: If your organism has a reference genome assembly in NCBI, GO annotations are available in GAF format through NCBI Gene identifiers. Annotation files are available for all eukaryotic genomes available at NCBI RefSeq. Note that GO annotations are not currently available for archaea, bacteria or viruses.
* Go to [NCBI](https://www.ncbi.nlm.nih.gov/){:target="blank"}
* Navigate to your organism, e.g. [Anopheles gambiae](https://www.ncbi.nlm.nih.gov/search/all/?term=Anopheles%20gambiae){:target="blank"}
* Follow the ["Genomes" link](https://www.ncbi.nlm.nih.gov/datasets/genome/?taxon=7165){:target="blank"}
* Select the [reference assembly](https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_943734735.2/) at the top of the list; this entry is indicated with a green "reference genome" icon and a GCF identifier listed in the RefSeq column
* Click on the [FTP link](https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/943/734/735/GCF_943734735.2_idAnoGambNW_F1_1/){:target="blank"}
* Download the file with the suffix `gene_ontology.gaf.gz`, e.g. `GCF_943734735.2-RS_2023_12_gene_ontology.gaf.gz`
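
Once a `.gaf.gz` file has been downloaded, it can be inspected with a few lines of Python. The sketch below counts annotations per aspect (GAF column 9: `P`, `F` or `C`). The function names and the two sample rows are hypothetical, and the identifiers in the sample are made up for illustration.

```python
import gzip
from collections import Counter


def iter_gaf_rows(lines):
    """Yield tab-split annotation rows, skipping '!' header lines and blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if line and not line.startswith("!"):
            yield line.split("\t")


def count_by_aspect(lines):
    """Count annotations per aspect, read from GAF column 9 (index 8)."""
    return Counter(row[8] for row in iter_gaf_rows(lines) if len(row) > 8)


def count_file(path):
    """Same count, reading a gzip-compressed GAF file from disk."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return count_by_aspect(handle)


# Hypothetical rows with the 17 tab-separated GAF columns.
sample_rows = [
    "!gaf-version: 2.2\n",
    "\t".join(["ExampleDB", "1", "geneA", "enables", "GO:0016301", "GO_REF:0000033",
               "IBA", "", "F", "", "", "protein", "taxon:7165", "20240101",
               "ExampleSource", "", ""]) + "\n",
    "\t".join(["ExampleDB", "2", "geneB", "involved_in", "GO:0008150", "GO_REF:0000033",
               "IBA", "", "P", "", "", "protein", "taxon:7165", "20240101",
               "ExampleSource", "", ""]) + "\n",
]
aspect_counts = count_by_aspect(sample_rows)
print(aspect_counts)
```

For a real download, `count_file("GCF_943734735.2-RS_2023_12_gene_ontology.gaf.gz")` would apply the same count to the file named in the last step.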

### 3. If you cannot find annotations for your organism for download as described above
[Get help from the GO helpdesk](https://help.geneontology.org/).

### 4. If your organism’s genome sequence is not yet publicly available
For example, if you have a set of new (protein) sequences that you want to annotate with GO terms, we recommend that you generate annotations using the latest version of InterProScan.
For most genomic analyses, your input file should have one protein sequence per protein-coding gene, though any set of protein sequences can be used.
Download InterProScan at [https://www.ebi.ac.uk/interpro/about/interproscan](https://www.ebi.ac.uk/interpro/about/interproscan/){:target="blank"}.

## More information on GO annotation formats
+ GO has monthly releases
+ Annotation files are taxon-specific, with a few exceptions including the Reactome and *Candida* Genome Database files
+ Current format guides:
+ [GAF format](/docs/go-annotation-file-gaf-format-2.2/)
+ [GAF format 2.2](/docs/go-annotation-file-gaf-format-2.2/)
+ [GPAD](/docs/gene-product-association-data-gpad-format/) + [GPI](/docs/gene-product-information-gpi-format/) companion files

## Programmatic access to GO annotations
As for any resource in GO, GO annotations are accessible through the DOI-versioned release stored in [Zenodo](https://doi.org/10.5281/zenodo.1205159){:target="blank"}.

## Error or omission ?
Any errors or omissions in annotations should be reported by writing to the [GO helpdesk](http://help.geneontology.org/){:target="blank"}
## Error or omission?
Any errors or omissions in annotations should be reported by writing to the [GO helpdesk](http://help.geneontology.org/){:target="blank"}.
2 changes: 1 addition & 1 deletion _docs/download-go-cams.md
@@ -26,4 +26,4 @@ permalink: /docs/download-go-cams/
As for any resource in GO, GO-CAMs are accessible through the DOI-versioned release stored in [Zenodo](https://doi.org/10.5281/zenodo.1205159){:target="blank"}.

## Error or omission ?
Any errors or omissions in annotations should be reported by writing to the [GO helpdesk](https://help.geneontology.org/){:target="blank"}
Any errors or omissions in annotations should be reported by writing to the [GO helpdesk](https://help.geneontology.org/){:target="blank"}.
4 changes: 1 addition & 3 deletions _docs/download-ontology.md
@@ -30,7 +30,7 @@ Files are available in the following formats:

|**Subset name**|**Maintainer**|**File name**|**OBO format**|**OWL format**|**json format**|
|------------------|-------------|-------------|-------------|------------|-------------|
|**GO slim AGR subset**|Developed by GO Consortium for the [Alliance of Genomes Resources](https://www.alliancegenome.org/){:target="blank"} |goslim_agr |[obo](https://current.geneontology.org/ontology/subsets/goslim_agr.obo) |[owl](https://current.geneontology.org/ontology/subsets/goslim_agr.owl){:target="blank"} |[json](https://current.geneontology.org/ontology/subsets/goslim_agr.json){:target="blank"} |
|***A*lliance of *G*enome *R*esources subset**|Developed by GO Consortium for the [Alliance of Genome Resources](https://www.alliancegenome.org/){:target="blank"} |goslim_agr |[obo](https://current.geneontology.org/ontology/subsets/goslim_agr.obo) |[owl](https://current.geneontology.org/ontology/subsets/goslim_agr.owl){:target="blank"} |[json](https://current.geneontology.org/ontology/subsets/goslim_agr.json){:target="blank"} |
|**Generic GO subset**|[GO Consortium](https://help.geneontology.org/){:target="blank"} |goslim_generic|[obo](https://current.geneontology.org/ontology/subsets/goslim_generic.obo)| [owl](https://current.geneontology.org/ontology/subsets/goslim_generic.owl){:target="blank"} |[json](https://current.geneontology.org/ontology/subsets/goslim_generic.json){:target="blank"} |
|*__Aspergillus__* **subset**|[_Aspergillus_ Genome Data](http://www.aspgd.org/){:target="blank"} |goslim_aspergillus|[obo](https://current.geneontology.org/ontology/subsets/goslim_aspergillus.obo) |[owl](https://current.geneontology.org/ontology/subsets/goslim_aspergillus.owl){:target="blank"} |[json](https://current.geneontology.org/ontology/subsets/goslim_aspergillus.json){:target="blank"} |
|*__Candida albicans__* **subset**|[_Candida_ Genome Database](http://www.candidagenome.org/){:target="blank"} |goslim_candida|[obo](https://current.geneontology.org/ontology/subsets/goslim_candida.obo)|[owl](https://current.geneontology.org/ontology/subsets/goslim_candida.owl){:target="blank"} |[json](https://current.geneontology.org/ontology/subsets/goslim_candida.json){:target="blank"} |
@@ -52,8 +52,6 @@ For internal checking purposes, GO maintains two "anti-slims", terms to which an
|**Subset name**|**Usage** |**File name** |**OBO format** |**OWL format** |**json format** |
|------------------|----------|----------|----------|----------|----------|
|**Do not annotate**|The set of high level terms that are useful for grouping, but should have no direct annotations| gocheck_do_not_annotate |[obo](https://current.geneontology.org/ontology/subsets/gocheck_do_not_annotate.obo)| [owl](https://current.geneontology.org/ontology/subsets/gocheck_do_not_annotate.owl){:target="blank"} |[json](https://current.geneontology.org/ontology/subsets/gocheck_do_not_annotate.json){:target="blank"} |
|**Do not manually annotate**|The set of high level terms that are useful for grouping, but should have no direct annotations except from automated tools| gocheck_do_not_manually_annotate|[obo](https://current.geneontology.org/ontology/subsets/gocheck_do_not_manually_annotate.obo)|[owl](https://current.geneontology.org/ontology/subsets/gocheck_do_not_manually_annotate.owl){:target="blank"} |[json](https://current.geneontology.org/ontology/subsets/gocheck_do_not_manually_annotate.json){:target="blank"} |


## Cross-references of GO to other classification systems

17 changes: 10 additions & 7 deletions _docs/faq.md
@@ -822,19 +822,22 @@ No - the term will always have the same children wherever, and however many time
{::comment}

<span class="rdf-meta element-hidden" property="dc:title" content="How do I get the term names for my list of GO IDs? How do I get GO IDs for my GO terms? What about definitions?"></span>
FAQ tags: 
FAQ tags:

[mappings](/faq-tags/mappings)

[ontology](/faq-tags/ontology)
{:/comment}

You can use the YeastMine Analyze tool available at SGD! This tool will return a table of GO ID, GO term name, GO term namespace (cellular component, molecular function, or biological process) and GO term description for each valid GO ID you supply. This will work for any organism, as the GO is the same!
You can use AllianceMine's Upload List tool, available at the Alliance website! This tool will return a table of GO ID, GO term name, and GO term description for each valid GO ID you supply. This will work for any organism, as the GO is the same!

1. Go to the [Upload List tool on AllianceMine](https://www.alliancegenome.org/bluegenes/alliancemine/upload/input){:target="blank"}
2. In the List Type pull down, select `GO Term`
3. Enter your GO IDs or upload a file, making sure GO IDs have the correct format (GO:0016020, GO:0016301...)
4. Click on `Continue`, and then on the next page use the `Save List` button.
5. You can use the `Save list` button on the next page to use this list in AllianceMine, or use the `Export` button to see download options.

1. Go to the [Analyze tool on YeastMine](http://yeastmine.yeastgenome.org/yeastmine/bag.do){:target="blank"}
2. In the Select Type pull down, select `GO Term`
3. Enter your GO ids or upload a list in the full format (GO:0016020, GO:0016301...)
4. Click on `Create List`. The tool offers several options to download the list when you use the `Save a list of...` button.
If you need the aspect (cellular component, molecular function, or biological process) for each term, you can add this to the results before saving. Click `Add Columns`, click `Namespace` to highlight that option, then click the `Add 1 columns` button in the lower right. You can also use the AllianceMine features to filter your list, for example to select only molecular_function terms in your list.
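
Before uploading, it can help to check that every identifier follows the `GO:` plus seven digits format shown in the steps above. The following is a minimal Python sketch; the function name `split_go_ids` and the sample input are assumptions made for illustration, and the check is not part of AllianceMine.

```python
import re

# A GO ID is the prefix 'GO:' followed by exactly seven digits.
GO_ID_PATTERN = re.compile(r"GO:\d{7}")


def split_go_ids(raw: str):
    """Split comma- or whitespace-separated text into (valid, invalid) GO IDs."""
    tokens = [token for token in re.split(r"[,\s]+", raw.strip()) if token]
    valid = [token for token in tokens if GO_ID_PATTERN.fullmatch(token)]
    invalid = [token for token in tokens if not GO_ID_PATTERN.fullmatch(token)]
    return valid, invalid


valid_ids, invalid_ids = split_go_ids("GO:0016020, GO:0016301 go:0005575 GO:123")
print(valid_ids)
print(invalid_ids)
```
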

If you have a list of GO terms and wish to retrieve GO IDs and/or definitions, you can use the steps above. Make sure multi-word GO terms are in double quotes (sporulation,"lactase activity","codeine metabolic process") as the tool will otherwise recognise spaces as delimiters.
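
The quoting rule above can be applied automatically when preparing a list of term names. This is a small Python sketch under the assumption that terms are joined with commas as in the example; the function name `format_term_list` is hypothetical.

```python
def format_term_list(terms):
    """Join GO term names with commas, wrapping multi-word terms in double quotes."""
    return ",".join(
        f'"{term}"' if any(ch.isspace() for ch in term) else term
        for term in terms
    )


term_query = format_term_list(
    ["sporulation", "lactase activity", "codeine metabolic process"]
)
print(term_query)
```
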

@@ -843,7 +846,7 @@ If you have a list of GO terms and wish to retrieve GO IDs and/or definitions, y
{::comment}

<span class="rdf-meta element-hidden" property="dc:title" content="Can I download the ontologies as an Excel spreadsheet?"></span>
FAQ tags: 
FAQ tags:

[ontology](/faq-tags/ontology)
{:/comment}