This is a draft PR to outline a JSON file format that would contain all the information about the Variants & Mutations tracked on CoVariants, including their defining mutations. The aim is a 'lookup' that other apps and programs could automatically link to by using the list of information.
This is a Draft PR. I would love feedback.
I recognise that the file contains comments, which JSON does not allow - those are there to clarify the file structure. It also currently includes just 1 example of a Variant & Mutation, to settle on a good format. Once the format is settled, I will write a script which generates this file from the existing files.
I'm not very familiar with the JSON format and have found it restrictive - some parts I didn't convert at all, as I'm not sure whether they're useful and I wasn't sure how to convert them in a way that's concise.
@nodrogluap and @chaoran-chen I'd really appreciate your thoughts on this for what you have in mind to do!
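To make the intended 'lookup' use concrete, here is a rough Python sketch of how a downstream tool might consume such a file. Everything about the layout is an assumption for illustration only: the top-level `variants` key, the `gene`/`position`/`alt` keys inside each mutation, the `match_variants` helper, and the example variant and its mutations are all made up, since the actual format is still open in this PR. Only the field names `build_name`, `display_name` and `alignment_defining` come from the draft.

```python
import json

# Hypothetical file contents. The real schema is still under discussion,
# and the variant and mutations below are invented placeholders.
EXAMPLE_JSON = """
{
  "variants": [
    {
      "build_name": "example.variant",
      "display_name": "Example Variant",
      "alignment_defining": [
        {"gene": "S", "position": 10, "alt": "K"},
        {"gene": "S", "position": 20, "alt": "T"}
      ]
    }
  ]
}
"""


def match_variants(calls, variants):
    """Compare one sequence's amino-acid calls against each variant.

    `calls` maps (gene, position) -> observed amino acid; a missing key
    means the sequence has no coverage at that position.
    Returns {build_name: "match" | "possible" | "no_match"}.
    """
    results = {}
    for variant in variants:
        missing = 0
        mismatched = 0
        for mut in variant["alignment_defining"]:
            observed = calls.get((mut["gene"], mut["position"]))
            if observed is None:
                missing += 1      # no coverage at this position
            elif observed != mut["alt"]:
                mismatched += 1   # reversion or miscall
        if mismatched:
            status = "no_match"
        elif missing:
            status = "possible"   # consistent so far, but incomplete
        else:
            status = "match"
        results[variant["build_name"]] = status
    return results


variants = json.loads(EXAMPLE_JSON)["variants"]

full = {("S", 10): "K", ("S", 20): "T"}
partial = {("S", 10): "K"}
reverted = {("S", 10): "K", ("S", 20): "A"}

for name, calls in [("full", full), ("partial", partial), ("reverted", reverted)]:
    print(name, match_variants(calls, variants))
```

The three-way status is one possible design choice, not part of the proposal: it separates sequences that merely lack coverage at a defining position from those that carry a different amino acid there, which are the two ways an alignment-only lookup can miss a sequence.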
Information:

- `alignment_defining` mutations will be most useful, as these can be used to try to identify sequences from alignment only. However, this will miss sequences that have reversions, miscalls, or are missing coverage at this position.
- `phylogenetic_defining` mutations are what are used to put the 'labels' on Nextstrain trees - they mark the branch where all these mutations are present, and all sequences below this (whether or not any particular sequence has these mutations - so it takes care of reversions and non-coverage).
- `build_name` is what's used in file names & URLs as it's 'safe'. Sometimes it corresponds better to the `display_name`, sometimes not. If the discrepancy is big enough that it's problematic, then I could try to reconcile this within CoVariants.

Questions:

- `phylogenetic_defining` useful? or just leave it?
- `color` useful?
- `pango_name` useful? This may not match 1:1 with running Pango. It's just taken from the name table on CoVariants.

To do: