Skip to content

Commit

Permalink
Merge pull request #88 from viralemergence/readme-update
Browse files Browse the repository at this point in the history
Update README.md
  • Loading branch information
cjcarlson authored Oct 17, 2024
2 parents 74332bb + 7589ff7 commit 0793750
Showing 1 changed file with 141 additions and 9 deletions.
150 changes: 141 additions & 9 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,14 +1,146 @@
# The VIRION database

## Changelog
![GitHub last commit (branch)](https://img.shields.io/github/last-commit/viralemergence/virion/gh-pages)
![](https://img.shields.io/badge/Code%20license-MIT-green)
![](https://img.shields.io/badge/Data%20license-CC--0-brightgreen)

_October 17, 2024._
VIRION is an atlas of the vertebrate-virus network, maintained by [Verena](https://www.viralemergence.org/), an NSF Biology Integration Institute. The development of the VIRION database through 2022 is documented [here](https://journals.asm.org/doi/10.1128/mbio.02985-21); as we work to transition to a new platform, an interim changelog can also be found below. We strongly encourage researchers to read both the _mBio_ publication and this README file before using these data.

The GLOBI dataset has been de-indexed from VIRION, due to a growing number of concerns about the reliability of text-mined data. This represents an important departure from the pipeline described in the _mBio_ publication. The current pipeline is described in an updated diagram below:
> [!CAUTION]
> Several known taxonomic errors exist in the data. These are documented in the issues on this repository, and we encourage users to add new issues as appropriate. Major changes to viral taxonomy, such as the switch to binomial nomenclature, are also not currently reflected. Finally, users should be cautious about interpretation of the data, and consult the manuscript for guidance about how to handle some of VIRION's unique features, such as a mix of evidentiary standards.
![image](https://github.com/user-attachments/assets/799962a7-f05a-4fc0-8aad-7b4e15f8db85)
## How to get VIRION

Over the coming months, we hope to roll out additional changes. Goals include:
- a revision of the license VIRION is distributed under
- incorporation of new data sources
- improved data standardization and taxonomic reconciliation
- periodic updates to the static components of CLOVER to address known errors
For now, VIRION lives on Github in a fully open and reproducible format. Downloading the data directly from this website, or cloning the repository, is the easiest way to access the data. The full database is available in a single comprehensive file (`Virion/Virion.csv.gz`), but specific components of the data are also shared as individual files. Currently, all files are compressed in .csv.gz format to minimize storage requirements. The data can be unzipped using tools such as the `vroom` R package or the `CSV.jl` Julia library.

### Download links

- [**Full database**](https://github.com/viralemergence/virion/blob/main/Virion/Virion.csv.gz)

- [Simplified edgelist](https://github.com/viralemergence/virion/blob/main/Virion/Edgelist.csv.gz)

- [Provenance metadata](https://github.com/viralemergence/virion/blob/main/Virion/Provenance.csv.gz)

- [Detection metadata](https://github.com/viralemergence/virion/blob/main/Virion/Detection.csv.gz)

- [Temporal metadata](https://github.com/viralemergence/virion/blob/main/Virion/Temporal.csv.gz)

- [Host higher taxonomy](https://github.com/viralemergence/virion/blob/main/Virion/TaxonomyHost.csv.gz)

- [Virus higher taxonomy](https://github.com/viralemergence/virion/blob/main/Virion/TaxonomyVirus.csv.gz)

## How to use VIRION

### Citing VIRION

If you use VIRION for your research, please both:
1. Cite the publication: Carlson CJ, Gibb RJ, Albery GF, Brierley L, Connor R, Dallas T, Eskew EA, Fagre AC, Farrell MJ, Frank HK, Muylaert RL, Poisot T, Rasmussen AL, Ryan SJ, Seifert SN. The Global Virome in One Network (VIRION): an Atlas of Vertebrate-Virus Associations. mBio. 2022 Mar 1. DOI: 10.1128/mbio.02985-21.
2. Include the following statement in your acknowledgements: "This project was supported by the Verena data ecosystem, funded by the U.S. National Science Foundation (NSF DBI 2213854)."

If necessary (e.g., for specific journal requirements), you can also cite VIRION itself using this DOI: [![DOI](https://zenodo.org/badge/319686363.svg)](https://zenodo.org/badge/latestdoi/319686363).

### Reproducing VIRION

VIRION is currently distributed under a CC-0 license. As such, you can do just about anything with it that you'd like. We would _prefer_ it not be reproduced into other formats that lose intentional aspects of VIRION's design (e.g., in other databases that drop metadata like evidence standards; as static supplemental files on studies that will never be updated; etc.), but it's your party!

## How VIRION works

The VIRION database is periodically re-compiled from two static sources (CLOVER and PREDICT) and one dynamic source (GenBank). If you want to reproduce the vignettes we present in the publication, you can also download the entire release of [version 0.2.1](https://github.com/viralemergence/virion/releases/tag/v0.2.1-beta).

```mermaid
flowchart TB
%% Nodes
GMPD2("GMPD2")
HP3("HP3")
Shaw("Shaw")
EID2("EID2 (2015)")
RECON1("
**Reconciliation 1**
• names reconciled to NCBI taxonomy
• temporal and sampling metadata standardized
• manual curation of unmatched names
*mammals only!*")
style RECON1 text-align:left, fill:#efefef, stroke:#a0a0a0;
style RECON2 text-align:left, fill:#efefef, stroke:#a0a0a0;
CLOVER("CLOVER")
GenBank("fa:fa-arrows-spin GenBank")
PREDICT("fa:fa-box-archive PREDICT")
RECON2("**Reconciliation 2**
• update of dynamic datasets (GenBank)
• names reconciled to NCBI and ICTV
• additional quality checks
• community sourcing for validation
")
subgraph virion[**Virion flat files**]
Virion["fa:fa-file-zipper Virion.csv.gz"]
Edgelist["fa:fa-table Edgelist.csv"]
TaxonomyHost["fa:fa-table TaxonomyHost.csv"]
TaxonomyVirus["fa:fa-table TaxonomyVirus.csv"]
Provenance["fa:fa-file-zipper Provenance.csv.gz"]
Detection["fa:fa-file-zipper Detection.csv.gz"]
Temporal["fa:fa-file-zipper Temporal.csv.gz"]
direction TB
Edgelist --> Virion
TaxonomyHost --> Virion
TaxonomyVirus --> Virion
Provenance --> Virion
Detection --> Virion
Temporal --> Virion
end
style virion stroke:#a0a0a0, fill:#efefef;
%% Edge connections between nodes
GMPD2 --> RECON1;
HP3 --> RECON1;
Shaw --> RECON1;
EID2 --> RECON1;
RECON1 --> CLOVER;
CLOVER --> RECON2;
GenBank --> RECON2;
PREDICT --> RECON2;
RECON2 --> virion;
%% Individual node styling. Try the visual editor toolbar for easier styling!
style GMPD2 color:#FFFFFF, fill:#AA00FF, stroke:#AA00FF
style HP3 color:#FFFFFF, fill:#AA00FF, stroke:#AA00FF
style Shaw color:#FFFFFF, fill:#AA00FF, stroke:#AA00FF
style EID2 color:#FFFFFF, fill:#AA00FF, stroke:#AA00FF
style CLOVER color:#FFFFFF, stroke:#00C853, fill:#00C853
style GenBank color:#FFFFFF, stroke:#2962FF, fill:#2962FF
style PREDICT color:#FFFFFF, stroke:#2962FF, fill:#2962FF
%% You can add notes with two "%" signs in a row!
```

# Additional information

## Contact

- For general questions about VIRION, please reach out to [Colin Carlson](mailto:[email protected])
- For specific questions about the CLOVER dataset, please contact [Rory Gibb](mailto:[email protected])
- For specific questions about the GitHub actions pipeline, please contact [Timothée Poisot](mailto:[email protected])

### Changelog

**October 17, 2024**: The GLOBI dataset has been de-indexed from VIRION, due to a growing number of concerns about the reliability of text-mined data. This represents an important departure from the pipeline described in the _mBio_ publication. The current pipeline is described at the top of this README.

> [!NOTE]
> Over the coming months, we hope to roll out additional changes. Goals include:
> - a revision of the license VIRION is distributed under
> - incorporation of new data sources
> - improved data standardization and taxonomic reconciliation
> - periodic updates to the static components of CLOVER to address known errors

0 comments on commit 0793750

Please sign in to comment.