Virus-Database-Notes

Notes about a potential virus database

Database Requirements / Features

Architecture

The backend server should support standalone installation, so that users can set up a server of their own for local use, either directly on their own machine or via a Docker image. It should also be able to scale to user needs.

Current planned backend setup: Cloudflare Workers (or just npm run dev for local environments), connected to Supabase (which provides a PostgreSQL database), with file storage on Cloudflare R2 (likely a local file system or some kind of mock S3 / R2 system for local environments). Look into libraries that help manage code complexity and abstract away platform-specific syntax.
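One way to keep platform-specific storage code out of the rest of the backend is to code against a small interface and swap implementations per environment. The sketch below is only an illustration of that idea in Python; the names BlobStore, LocalBlobStore, and InMemoryBlobStore are hypothetical, and a production implementation backed by R2 is deliberately left out.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict


class BlobStore(ABC):
    """Hypothetical storage interface the backend would depend on."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None:
        """Store data under the given key, overwriting any existing value."""

    @abstractmethod
    def get(self, key: str) -> bytes:
        """Return the data stored under the given key."""


class LocalBlobStore(BlobStore):
    """Local file system stand-in for object storage (assumption: keys are trusted)."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)

    def put(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


class InMemoryBlobStore(BlobStore):
    """Mock store for tests; nothing is written to disk."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]
```

Code that only receives a BlobStore does not need to know whether it is running locally or against hosted storage, which is the property the local / hosted split above relies on.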

Data Storage

CRAM format: a highly compressed, reference-based sequence data format that can store data losslessly. Compression and decompression may carry higher processing overhead than less compressed formats.
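To make "reference-based" concrete, here is a toy sketch of the underlying idea: store only the positions where a sequence differs from a shared reference. This is not the CRAM encoding itself; it assumes equal-length sequences with substitutions only, and the function names are made up for illustration.

```python
from typing import List, Tuple


def encode_against_reference(reference: str, sequence: str) -> List[Tuple[int, str]]:
    """Return (0-based position, base) pairs where sequence differs from reference."""
    if len(reference) != len(sequence):
        raise ValueError("this toy sketch only handles equal-length sequences")
    return [
        (i, base)
        for i, (ref_base, base) in enumerate(zip(reference, sequence))
        if ref_base != base
    ]


def decode_against_reference(reference: str, diffs: List[Tuple[int, str]]) -> str:
    """Rebuild the original sequence from the reference and the stored differences."""
    bases = list(reference)
    for position, base in diffs:
        bases[position] = base
    return "".join(bases)


reference = "ACGTACGT"
sequence = "ACGAACGT"
diffs = encode_against_reference(reference, sequence)  # [(3, 'A')]
restored = decode_against_reference(reference, diffs)  # round-trips to sequence
```

The closer a sequence is to the reference, the fewer differences need storing, which is where the space savings come from; the cost is that decoding always needs the reference at hand.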

For an even more compressed form (possibly for importing / exporting data), look into AGC (Assembled Genomes Compressor): https://github.com/refresh-bio/agc.

Data Operations

1. Importing / Exporting Databases

2. Querying Data

Querying By Sequence
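A minimal sketch of one possible approach, assuming an exact k-mer index is acceptable as a first-pass filter: index every k-mer of every stored sequence, then rank stored sequences by how many k-mers they share with the query. The function names and the choice of k are assumptions, not a decided design.

```python
from collections import defaultdict
from typing import Dict, Set


def kmers(sequence: str, k: int) -> Set[str]:
    """Return the set of distinct substrings of length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def build_index(records: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map each k-mer to the IDs of the sequences that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for sequence_id, sequence in records.items():
        for kmer in kmers(sequence, k):
            index[kmer].add(sequence_id)
    return index


def query(index: Dict[str, Set[str]], sequence: str, k: int) -> Dict[str, int]:
    """Count, per stored sequence, how many distinct query k-mers it shares."""
    hits: Dict[str, int] = defaultdict(int)
    for kmer in kmers(sequence, k):
        for sequence_id in index.get(kmer, ()):
            hits[sequence_id] += 1
    return dict(hits)


# Made-up example records, not real data.
records = {"a": "ACGTACGT", "b": "TTTTGGGG"}
index = build_index(records, k=4)
result = query(index, "GTAC", k=4)  # {'a': 1}
```

Candidates with many shared k-mers could then be passed to a proper aligner; the index only narrows the search.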

Querying By Metadata
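A sketch of what a metadata query could look like with a relational table. It uses Python's built-in sqlite3 with an in-memory database purely so that it runs without a server; the planned database is PostgreSQL, and the table name, column names, and rows below are all made-up placeholders.

```python
import sqlite3
from typing import List


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical sequences table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE sequences (
            accession TEXT PRIMARY KEY,
            virus     TEXT NOT NULL,
            country   TEXT,
            collected TEXT  -- ISO 8601 date string
        )
        """
    )
    # Placeholder rows for illustration only, not real records.
    conn.executemany(
        "INSERT INTO sequences VALUES (?, ?, ?, ?)",
        [
            ("EX0001", "virus-a", "US", "2021-01-05"),
            ("EX0002", "virus-a", "CA", "2021-02-11"),
            ("EX0003", "virus-b", "US", "2021-03-20"),
        ],
    )
    return conn


def find_by_metadata(conn: sqlite3.Connection, virus: str, country: str) -> List[str]:
    """Return accessions matching both filters, using bound parameters."""
    rows = conn.execute(
        "SELECT accession FROM sequences "
        "WHERE virus = ? AND country = ? ORDER BY accession",
        (virus, country),
    )
    return [row[0] for row in rows]
```

Bound parameters (the ? placeholders here) keep user-supplied filter values out of the SQL text, which matters once these queries are exposed through the frontend.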

Live-Sequence Searching (?)

Searching For Mutations

3. Adding Data

Read Mapping

4. Analyzing / Updating Data

Multi-Sequence Alignment

Consensus Sequence Generation
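A minimal sketch of consensus generation by per-column majority vote, assuming the input sequences are already aligned to the same length (for example, the output of the multi-sequence alignment step). Marking tied columns with N is an assumption made here, not a decided policy.

```python
from collections import Counter
from typing import List


def consensus(aligned: List[str], ambiguous: str = "N") -> str:
    """Return the most common base per column; tied columns get `ambiguous`."""
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(sequence) != length for sequence in aligned):
        raise ValueError("sequences must be aligned to the same length")

    result = []
    for column in zip(*aligned):
        top = Counter(column).most_common(2)
        # A tie exists when the two most frequent bases have equal counts.
        if len(top) > 1 and top[0][1] == top[1][1]:
            result.append(ambiguous)
        else:
            result.append(top[0][0])
    return "".join(result)


majority = consensus(["ACGT", "ACGA", "ACGT"])  # 'ACGT'
tied = consensus(["AC", "AG"])                  # 'AN'
```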

Variant Calling

5. Archiving / Deleting Data

Potential Database Paradigms

Will likely initially use PostgreSQL because it is an industry standard, has a comprehensive ecosystem, and supports full text search (among other features). I (Daniel) am also most familiar with this database, so it will be the easiest way for me to get some kind of MVP out. It could later be augmented with Redis, a graph database, or a search engine database.

1. Relational Database (PostgreSQL)

2. Key-Value Database (Redis)

3. Graph Database (Neo4j)

https://medium.com/codex/turn-neo4j-into-a-genome-browser-e94c7311dfab
https://medium.com/geekculture/analyzing-genomes-in-a-graph-database-27a45faa0ae8
https://neo4j.com/blog/geneweaver-building-a-graph-to-map-variants-to-genes-using-neo4j-4-x-and-bulk-import/

4. Search Engine (Lucene, ElasticSearch)

Possible speedup when finding sequences in a large database.

5. Vector Database

Possible speedup when finding similar sequences (represented as vectors that point in a similar direction). Could also be useful for speeding up variant calling.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870668 (Look into DNA sequence embedding models to turn sequences into vectors).
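To illustrate what "vectors that point in a similar direction" means, the sketch below turns a sequence into a k-mer count vector and compares vectors by cosine similarity. K-mer counts are a simple stand-in used here for illustration, not one of the embedding models mentioned above, and a vector database would replace the pairwise comparison with an indexed nearest-neighbor search.

```python
import math
from itertools import product
from typing import List


def kmer_vector(sequence: str, k: int = 2) -> List[float]:
    """Count each of the 4**k possible A/C/G/T k-mers; other k-mers are skipped."""
    positions = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    vector = [0.0] * len(positions)
    for i in range(len(sequence) - k + 1):
        position = positions.get(sequence[i:i + k])
        if position is not None:
            vector[position] += 1.0
    return vector


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Return the cosine of the angle between a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


query_vector = kmer_vector("ACGTACGT")
near = cosine_similarity(query_vector, kmer_vector("ACGTACGA"))  # shares most 2-mers
far = cosine_similarity(query_vector, kmer_vector("TTTTTTTT"))   # shares none: 0.0
```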

Databases in Literature / in Use

GeNemo: https://pubmed.ncbi.nlm.nih.gov/27098038/

Frontend Interface

Frontend Requirements / Features

See Data Operations.

Frontend Tech Stack

Svelte (likely).
