Download SNV frequency table and lineage table filtered by location/date range #437

Open
rosekantor opened this issue Nov 12, 2021 · 2 comments
Labels
data UI User Interface

Comments

@rosekantor

Hello,

This site is my go-to resource for identifying lineages based on mutations in wastewater, and I've been sharing it with colleagues who really appreciate it, too. I am now looking specifically for mutations common in California within a date range (for example, to check my primers against the prevalent sequences or to know which SNVs I might expect to find in specific wastewater samples).

There are two tables I'm interested in being able to download:

  1. A table of the data shown in the second bar from the left on the locations page (says "AA SNVs" or "NT SNVs" on the top, depending on the applied filters) - see screenshot below.
  2. A filtered version of the full lineage table obtained by clicking "download" > "consensus mutation". Here, the applied filters do not appear to have any effect; perhaps this should be a separate issue.

Thanks in advance,

Rose

[Screenshot: Screen Shot 2021-11-11 at 3 56 51 PM]

@atc3
Member

atc3 commented Nov 12, 2021

Hi Rose,

> A table of the data shown in the second bar from the left on the locations page (says "AA SNVs" or "NT SNVs" on the top, depending on the applied filters) - see screenshot below.

To get the data for the legend, you can select "Download Aggregate Data" (see picture below)

This results in a CSV file in which each unique combination of mutations is aggregated into one row. The mutations are in the form pos|ref|alt and are delimited by semicolons. To get the frequencies of single mutations, you'll have to split that mutation string apart and count each mutation as you go through the rows.

I understand this is a bit of work, so I'll make a download item that splits this up and collapses by date like you requested.
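In the meantime, the splitting and counting described above could be sketched roughly as follows. This is only a minimal sketch, not the site's own code: the column names `mutations` and `counts` are assumptions (check the header row of the actual download), as is the idea that each row carries a count of the sequences sharing that combination. The sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical column names: replace with the ones in the real CSV header.
MUTATION_COL = "mutations"
COUNT_COL = "counts"


def count_single_mutations(csv_file, mutation_col=MUTATION_COL, count_col=COUNT_COL):
    """Tally each pos|ref|alt mutation across all aggregated rows."""
    totals = Counter()
    for row in csv.DictReader(csv_file):
        # Each row is one unique combination of mutations;
        # weight every mutation in it by the row's sequence count.
        n = int(row[count_col])
        for mutation in row[mutation_col].split(";"):
            mutation = mutation.strip()
            if mutation:  # skip empty strings (rows with no mutations)
                totals[mutation] += n
    return totals


# Made-up sample data standing in for the downloaded file.
sample = io.StringIO(
    "mutations,counts\n"
    "23403|A|G;241|C|T,10\n"
    "23403|A|G,5\n"
    ",2\n"
)
totals = count_single_mutations(sample)
for mutation, n in totals.most_common():
    print(mutation, n)
```

With a real download, the `io.StringIO` sample would be swapped for `open(path, newline="")`. Dividing each total by the summed row counts would turn the tallies into frequencies.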

> A filtered version of the full lineage table obtained by clicking "download" > "consensus mutation". Here, the applied filters do not appear to have any effect; perhaps this should be a separate issue.

You're correct - the consensus mutations are calculated across the entire dataset and are not computed based on the user's selections.

I can add a checkbox into the "download" -> "consensus mutation" dialog that specifies whether to use the whole dataset or just the sequences from the user selection.

We're currently working on a refactor of some of the core components of the site, so these changes can't be implemented immediately. It may take a week or two; I'll let you know when it's live.

Albert

@atc3 atc3 self-assigned this Nov 12, 2021
@atc3 atc3 added data UI User Interface labels Nov 12, 2021
atc3 added a commit that referenced this issue Nov 30, 2021
atc3 added a commit that referenced this issue Nov 30, 2021
* Add grouping columns (and indices) to sequence-mutation tables

* bump version

* Add omicron

* Fix capitalization in group label

* Add group aggregate download (#437)
@atc3
Member

atc3 commented Jan 20, 2022

Hi Rose,

Apologies for the late reply to this.

  1. I've added a download for this data. It's named "Group Counts" and is available under the download button.

  2. I added another download endpoint for consensus mutations, which also filters on date ranges, locations, etc. Right now it's not linked up to any part of the site (still figuring that out), but it's available as an API endpoint. I've described it here: https://github.com/vector-engineering/covidcg/blob/master/API.md#dynamic-group-mutation-frequencies
