forked from trvrb/antigen
New GeometricSeqPhenotype class #22
Merged
Haddox
reviewed
Aug 15, 2022
Haddox
reviewed
Aug 16, 2022
zorian15
reviewed
Jan 20, 2023
It was nice to refresh on this code -- nice job @thienktran!
General thoughts:
- We should think about adding support for reading in FASTA files or the like so users don't have to copy and paste the genetic sequence into a `params.yml` file.
- In the eventual documentation for `antigen-prime`, or even just now in the `README`, we should add some details about the kinds of formats we expect for certain file parameters (e.g., the DMS file, the epitope sites if we support making that a file, etc.).
… dictionary of vectors
…rp/antigen into random-epitope-mutations
Allow users to specify if predefined vectors should be used
Description
`GeometricSeqPhenotype` is a subclass of `GeometricPhenotype`. Along with the parameters that determine the object's position in Euclidean space, it stores the nucleotide sequence and the numbers of epitope and non-epitope mutations in fields. Only the nucleotide sequence is stored, which saves space and time; translating a nucleotide sequence to a protein sequence can be done after the simulation. This does not mean amino acids are unimportant in our model. For each site in the sequence, a matrix of vectors is precomputed before the simulation runs. The vectors are drawn from a gamma distribution whose parameters can be changed in `parameters.yml`. The number of epitope sites, which ranges from 0 to (the starting sequence length / 3), is set in that file as well. Epitope sites remain the same until the program exits. The matrices for epitope sites hold vectors drawn from a gamma distribution with different parameters than those for non-epitope sites.

`mutate()`

This method randomly selects an index in the nucleotide sequence to mutate. Based on the transition-transversion ratio, a cumulative distribution array is used to determine which nucleotide the site mutates to. If the mutant codon is a stop codon, the process repeats, starting again from the random selection of an index. Once a valid mutation occurs, the object's movement in space is determined by the vector at entry (m, n) of the site's matrix, where m and n are the indices of the wild-type and mutant amino acids in `Parameters.AMINO_ACIDS`, respectively.

`this` does not update. Instead, a new `GeometricSeqPhenotype` is created with the updated nucleotide sequence, the new position parameters, and the updated numbers of epitope and non-epitope mutations. Every call to this method produces a mutation, so the nucleotide sequence must differ and exactly one of the two counts (epitope mutations or non-epitope mutations) must increase by 1. The position parameters, however, might not change, since a nucleotide mutation does not necessarily change the protein sequence.

Closes #17
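The nucleotide-selection part of `mutate()` described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class `MutationSketch`, its method names, and the way the ratio is turned into probabilities (the transition is assumed to be `ratio` times as likely as the two transversions combined) are all assumptions. It also assumes an uppercase `ACGT` sequence whose length is a multiple of 3, and it leaves out the position update and the mutation counters.

```java
import java.util.Random;

/** Hypothetical sketch; names and probabilities are assumptions, not the PR's code. */
final class MutationSketch {
    static final char[] NUCLEOTIDES = {'A', 'C', 'G', 'T'};

    /** True for A<->G and C<->T substitutions. */
    static boolean isTransition(char from, char to) {
        return (from == 'A' && to == 'G') || (from == 'G' && to == 'A')
            || (from == 'C' && to == 'T') || (from == 'T' && to == 'C');
    }

    /**
     * Cumulative distribution over the three possible targets of `from`,
     * in NUCLEOTIDES order with `from` skipped. Assumption: the transition is
     * `ratio` times as likely as both transversions combined.
     */
    static double[] cumulative(char from, double ratio) {
        double[] cdf = new double[3];
        double sum = 0.0;
        int i = 0;
        for (char to : NUCLEOTIDES) {
            if (to == from) continue;
            sum += isTransition(from, to) ? ratio / (ratio + 1.0) : 0.5 / (ratio + 1.0);
            cdf[i++] = sum;
        }
        cdf[2] = 1.0; // guard against floating-point round-off
        return cdf;
    }

    /** Picks the mutant nucleotide by comparing one uniform draw with the cumulative array. */
    static char pickMutant(char from, double ratio, Random rng) {
        double[] cdf = cumulative(from, ratio);
        double u = rng.nextDouble();
        int i = 0;
        for (char to : NUCLEOTIDES) {
            if (to == from) continue;
            if (u < cdf[i++]) return to;
        }
        throw new IllegalStateException("unreachable: the cumulative array ends at 1.0");
    }

    static boolean isStopCodon(String codon) {
        return codon.equals("TAA") || codon.equals("TAG") || codon.equals("TGA");
    }

    /** Returns a new sequence with one point mutation that does not create a stop codon. */
    static String mutate(String sequence, double ratio, Random rng) {
        while (true) {
            int index = rng.nextInt(sequence.length());
            char mutant = pickMutant(sequence.charAt(index), ratio, rng);
            StringBuilder sb = new StringBuilder(sequence);
            sb.setCharAt(index, mutant);
            int codonStart = index - index % 3;
            if (!isStopCodon(sb.substring(codonStart, codonStart + 3))) {
                return sb.toString(); // the input sequence itself is left untouched
            }
            // Stop codon: start over with a fresh random index.
        }
    }
}
```

For example, `MutationSketch.mutate("ATGGCTTACTGGAAA", 5.0, new Random(42))` returns a sequence of the same length that differs from the input at exactly one position.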
Tests
`GeometricSeqPhenotype`’s representation invariant, a condition that must hold throughout an object’s existence, is verified throughout an entire simulation using the debug flag in `GeometricSeqPhenotype.java`. Whenever a method of the object is called, the representation invariant is checked to confirm that the nucleotide sequence doesn’t change in length (point mutations only, no frameshift mutations) and doesn’t contain any stop codons.

The JUnit tests in `TestGeometricSeqPhenotype.java` make sure the constructors and methods of `GeometricSeqPhenotype.java` work as expected:

- `GeometricSeqPhenotype()`
- `getTraits()`
- `getSequence()`
- `distance()`
- `mutate()`
- `riskOfInfection()`
- `toString()`

The values returned by the constructors and methods are compared against results calculated by hand. These JUnit tests only check for logic errors. For example, `testMutate()` makes sure that the nucleotide at the given site in the nucleotide sequence mutates to the given nucleotide. It does not test the ratio of the number of transitions to the number of transversions.

To sanity-check that Antigen is actually taking biology and statistics into account, we instead have to visualize data from the simulation separately.
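As an illustration of the representation-invariant check described above, here is a minimal sketch. The class `RepInvariantSketch`, the `DEBUG` constant, and the `checkRep` signature are hypothetical stand-ins and not the code in `GeometricSeqPhenotype.java`; the sketch assumes an uppercase `ACGT` sequence read in frame from index 0.

```java
/** Hypothetical sketch of a representation-invariant check; not the PR's code. */
final class RepInvariantSketch {
    // Stand-in for the debug flag mentioned above (assumed name).
    static final boolean DEBUG = true;

    /** True if any in-frame codon of the sequence is TAA, TAG, or TGA. */
    static boolean hasStopCodon(String sequence) {
        for (int i = 0; i + 3 <= sequence.length(); i += 3) {
            String codon = sequence.substring(i, i + 3);
            if (codon.equals("TAA") || codon.equals("TAG") || codon.equals("TGA")) {
                return true;
            }
        }
        return false;
    }

    /** Throws if the sequence changed length or contains a stop codon. */
    static void checkRep(String sequence, int startingLength) {
        if (!DEBUG) {
            return; // skip the check entirely outside debug runs
        }
        if (sequence.length() != startingLength) {
            throw new IllegalStateException("sequence length changed: " + sequence.length());
        }
        if (hasStopCodon(sequence)) {
            throw new IllegalStateException("sequence contains a stop codon");
        }
    }
}
```

Calling such a check at the start and end of every method is what lets a full simulation run double as a test of the invariant.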
Transition-Transversion Ratio
The transition-transversion ratio can be specified in `parameters.yml`. The default value is 5.0. Each nucleotide mutation in a simulation is recorded and saved in a CSV file, `mutations.csv`. `mutations.csv` has one column with the format XY, where X is the wild-type nucleotide and Y is the mutant nucleotide.

The graph below shows the frequency of each possible nucleotide mutation. Notice that transitions occur more frequently than transversions. The calculated ratio is 5.087. It’s not exactly 5.0 for various reasons: the starting sequence doesn’t contain the same number of each nucleotide, and some mutations create stop codons, so the site must be selected and mutated again.
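A ratio like the one reported above can be recomputed from the XY records with a few lines of code. The sketch below is only an illustration under assumptions: the class `TsTvRatioSketch` is hypothetical, and any line that is not exactly two distinct characters from `ACGT` (for instance a header, if the file has one) is skipped.

```java
import java.util.List;

/** Hypothetical sketch for summarizing XY mutation records; not the PR's code. */
final class TsTvRatioSketch {
    /** True for the four transition records: AG, GA, CT, TC. */
    static boolean isTransition(String xy) {
        return xy.equals("AG") || xy.equals("GA") || xy.equals("CT") || xy.equals("TC");
    }

    /** True if the record is two distinct nucleotides, e.g. "AG". */
    static boolean isValidRecord(String xy) {
        return xy.length() == 2
            && "ACGT".indexOf(xy.charAt(0)) >= 0
            && "ACGT".indexOf(xy.charAt(1)) >= 0
            && xy.charAt(0) != xy.charAt(1);
    }

    /**
     * Number of transitions divided by number of transversions.
     * With no transversions the result is Infinity (or NaN if there are no records at all).
     */
    static double ratio(List<String> records) {
        long transitions = 0;
        long transversions = 0;
        for (String raw : records) {
            String xy = raw.trim();
            if (!isValidRecord(xy)) {
                continue; // skip blank lines and anything that is not an XY record
            }
            if (isTransition(xy)) {
                transitions++;
            } else {
                transversions++;
            }
        }
        return (double) transitions / transversions;
    }
}
```

The records could be loaded with `java.nio.file.Files.readAllLines(Paths.get("mutations.csv"))` and passed straight to `ratio`.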
Epitope and Non-epitope sites ~Gamma
The effects of an epitope or non-epitope mutation can be specified in `parameters.yml`. For each amino acid site’s corresponding matrix of vectors, the mutation notation, vector size, and theta of each entry are recorded and saved in a CSV file, `test/valuesGammaDistribution/0_siteX.csv`, where X is the amino acid site number.

The graph for non-epitope sites (`meanStep: 0.0001` and `sdStep: 0.0001`) shows that mutations in non-epitope sites don’t move the phenotype very far in antigenic space.

The graphs for epitope sites (`meanStep: 2.0` and `sdStep: 1.0`) are slightly different for each amino acid site, which is what we want. All epitope site distributions are consistent. (The orange line is the distribution used in the original Antigen and is included for reference.)

Checklist:

- … (#278) has been searched for in the code to find relevant notes

(Sorry about all the nontrivial updates. Here’s a reminder on how to hide white space changes.)
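As a footnote to the section on epitope and non-epitope sites, the following sketch shows one way a `meanStep`/`sdStep` pair could be turned into gamma-distributed step vectors. Everything in it is an assumption made for illustration: the class `GammaStepSketch`, the moment-matching conversion (shape = mean² / sd², scale = sd² / mean), the Marsaglia-Tsang sampler, the uniformly random direction theta, and the two-dimensional space. It is not the sampling code in this PR.

```java
import java.util.Random;

/** Hypothetical sketch of gamma-distributed step vectors; not the PR's code. */
final class GammaStepSketch {
    /** Gamma shape that matches the given mean and standard deviation. */
    static double shape(double mean, double sd) {
        return (mean * mean) / (sd * sd);
    }

    /** Gamma scale that matches the given mean and standard deviation. */
    static double scale(double mean, double sd) {
        return (sd * sd) / mean;
    }

    /** Marsaglia-Tsang sampler for Gamma(shape, scale). */
    static double sampleGamma(double shape, double scale, Random rng) {
        if (shape < 1.0) {
            // Boost for small shapes: Gamma(a) = Gamma(a + 1) * U^(1/a)
            double u = rng.nextDouble();
            return sampleGamma(shape + 1.0, scale, rng) * Math.pow(u, 1.0 / shape);
        }
        double d = shape - 1.0 / 3.0;
        double c = 1.0 / Math.sqrt(9.0 * d);
        while (true) {
            double x = rng.nextGaussian();
            double v = 1.0 + c * x;
            if (v <= 0.0) {
                continue; // reject and redraw
            }
            v = v * v * v;
            double u = rng.nextDouble();
            if (Math.log(u) < 0.5 * x * x + d - d * v + d * Math.log(v)) {
                return d * v * scale;
            }
        }
    }

    /** One 2D step: gamma-distributed size, direction theta uniform in [0, 2*pi). */
    static double[] sampleStep(double mean, double sd, Random rng) {
        double size = sampleGamma(shape(mean, sd), scale(mean, sd), rng);
        double theta = 2.0 * Math.PI * rng.nextDouble();
        return new double[] {size * Math.cos(theta), size * Math.sin(theta)};
    }
}
```

Under this conversion, `meanStep: 2.0` with `sdStep: 1.0` corresponds to shape 4.0 and scale 0.5, while `meanStep: 0.0001` with `sdStep: 0.0001` corresponds to shape 1.0 and scale 0.0001.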