Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New GeometricSeqPhenotype class #22

Merged
merged 65 commits into from
Jun 20, 2023
Merged
Show file tree
Hide file tree
Changes from 21 commits
Commits
Show all changes
65 commits
Select commit Hold shift + click to select a range
9bd296f
Created a GeometricSeqPhenotype class that implements Phenotype and u…
thienktran Jun 13, 2022
e8e1972
Maintain translation from nucleotides to amino acids
thienktran Jun 14, 2022
139d01e
Precompute vectors from gamma distribution
thienktran Jun 16, 2022
a725bcd
Made sure nucleotide is being mutated to a different letter.
thienktran Jun 17, 2022
942f99a
Test matrix of vectors drawn from the gamma distribution
thienktran Jun 23, 2022
3f40eee
Remove gamma distribution csv output from repo
thienktran Jun 23, 2022
38d12b1
Keep mutating until it's not a stop codon
thienktran Jun 24, 2022
6f28dd7
if mutation creates a stop codon, switch index to mutate
thienktran Jun 24, 2022
909b6c8
represent sequence using a char[]
thienktran Jun 27, 2022
5a46d50
use inheritance instead of composition
thienktran Jun 28, 2022
59cfd80
Clean up code, add documentation
thienktran Jul 2, 2022
e4e53a7
finish unit tests for geometricseq
thienktran Jul 7, 2022
475fc44
clean up repo
thienktran Aug 14, 2022
7b9c27d
each sentence gets own line.
zorian15 Aug 21, 2022
df858e1
Address Hugh and Zorian's comments. First round.
thienktran Aug 22, 2022
be5082b
Merge branch 'main' of https://github.com/matsengrp/antigen into 17-g…
thienktran Aug 22, 2022
9808dcc
Remove SequencePhenotype.java
thienktran Aug 22, 2022
9add838
Merge remote-tracking branch 'origin/17-geometric-sequence' into 17-g…
thienktran Aug 22, 2022
3c098e9
Fix issue #23
thienktran Aug 30, 2022
3dfc3cf
Remove DMS file requirement
thienktran Aug 30, 2022
df37af1
Clean up code and refactor sanity check
thienktran Aug 30, 2022
157f73c
Created a GeometricSeqPhenotype class that implements Phenotype and u…
thienktran Jun 13, 2022
27e0dbf
Maintain translation from nucleotides to amino acids
thienktran Jun 14, 2022
023de81
Precompute vectors from gamma distribution
thienktran Jun 16, 2022
c00b8ac
Made sure nucleotide is being mutated to a different letter.
thienktran Jun 17, 2022
17d1512
Test matrix of vectors drawn from the gamma distribution
thienktran Jun 23, 2022
65e1a9a
Remove gamma distribution csv output from repo
thienktran Jun 23, 2022
87253ad
Keep mutating until it's not a stop codon
thienktran Jun 24, 2022
01b2baf
if mutation creates a stop codon, switch index to mutate
thienktran Jun 24, 2022
0f59c97
represent sequence using a char[]
thienktran Jun 27, 2022
7133f81
use inheritance instead of composition
thienktran Jun 28, 2022
7563523
Clean up code, add documentation
thienktran Jul 2, 2022
8bc8381
finish unit tests for geometricseq
thienktran Jul 7, 2022
7373b88
clean up repo
thienktran Aug 14, 2022
3c02bf0
Address Hugh and Zorian's comments. First round.
thienktran Aug 22, 2022
c729e52
Fix issue #23
thienktran Aug 30, 2022
aa63a7d
Remove DMS file requirement
thienktran Aug 30, 2022
f3fc0f3
Clean up code and refactor sanity check
thienktran Aug 30, 2022
ec69825
Merge branch '17-geometric-sequence' of https://github.com/matsengrp/…
zorian15 Jan 10, 2023
ca1fdb1
format updates.
zorian15 Jan 10, 2023
91ef2a7
remove illegal characters from filename
zorian15 Jan 10, 2023
e7b93f3
update readme
zorian15 Jan 10, 2023
167f45a
Nonfunctional changes to code from Erick's review of PR #22
thienktran Jan 10, 2023
e4c5c50
Test CodonMap values are correct
thienktran Jan 10, 2023
f97108f
Merge branch '17-geometric-sequence' of https://github.com/matsengrp/…
thienktran Jan 10, 2023
75bfc27
Update out.tips for GeometricSeq fields
thienktran Jan 14, 2023
9a551ae
change names from traitA/B to ag1/2 to consistency
thienktran Jan 15, 2023
3545f63
Clean up commented and unused code
thienktran Jan 31, 2023
11c4fa3
print fasta for branches
thienktran Feb 14, 2023
e25260b
Add startingSequence and epitopeSites files and output fasta file
thienktran Feb 21, 2023
b1d0e30
Initial commit for testing geometric seq phenotype without predefined…
thienktran Mar 7, 2023
6e2a279
Clean up and refactor things
thienktran Mar 28, 2023
bb7309e
Fix bug with reset and organize/optimize a little more in GeometricSe…
thienktran Mar 29, 2023
fe5ca12
Make sure that the nucleotide sequence is updated even for synonymous…
thienktran Mar 30, 2023
198ea35
fasta file rename for pipeline
zorian15 Apr 19, 2023
e44a14c
Update README.md
zorian15 Apr 20, 2023
748b8c9
updates to params for customization of output path
zorian15 Apr 20, 2023
e9b8edc
Merge branch 'random-epitope-mutations' of https://github.com/matseng…
zorian15 Apr 20, 2023
c1fcfb4
remove .out from outputs
zorian15 Apr 20, 2023
2bf0b15
Merge pull request #24 from matsengrp/random-epitope-mutations
thienktran Apr 20, 2023
4312ea2
Add headers to fasta file sequences
thienktran Apr 20, 2023
14f65c1
Update to deme name instead of ID
thienktran Apr 20, 2023
f45756d
allow last codon to be a stop
zorian15 May 3, 2023
1bfd70a
log file creation
zorian15 May 3, 2023
28e945e
file formatting
zorian15 May 3, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion Antigen.java
Original file line number Diff line number Diff line change
@@ -1,7 +1,9 @@
/* Implements an individual-based model in which the infection's genealogical history is tracked through time */

import java.io.FileNotFoundException;

class Antigen {
public static void main(String[] args) {
public static void main(String[] args) throws FileNotFoundException {

// initialize random number generator
cern.jet.random.AbstractDistribution.makeDefaultGenerator();
Expand Down
397 changes: 397 additions & 0 deletions Biology.java

Large diffs are not rendered by default.

316 changes: 316 additions & 0 deletions GeometricSeqPhenotype.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,316 @@
import java.util.Arrays;

/**
* <b>GeometricSeqPhenotype</b> stores a Virus's genetic sequence and antigenic phenotype,
* and tracks the cumulative number of mutations in epitope and non-epitope sites.
* Antigenic phenotype is defined as a position in 2D space, where the antigenic distance
* between two viruses can be computed as the Euclidean distance between them.
* Multiple Viruses can reference a single GeometricSeqPhenotype which is identified by its data contents.
*
* @author Thien Tran
*/
public class GeometricSeqPhenotype extends GeometricPhenotype {
/**
* The nucleotide sequence of this GeometricSeqPhenotype
*/
private char[] nucleotideSequence;

/**
* The first parameter that determines the position of this GeometricPhenotype in Euclidean space
*/
private double traitA;

/**
* The second parameter that determines the position of this GeometricPhenotype in Euclidean space
*/
private double traitB;

/**
* The number of epitope mutations this GeometricPhenotype went through (counting from the startingSequence)
*/
private int eMutation;
thienktran marked this conversation as resolved.
Show resolved Hide resolved

/**
* The number of non-epitope mutations this GeometricPhenotype went through (counting from the startingSequence)
*/
private int neMutation;

/**
*
*/
public static final boolean SANITY_TEST = true;

/**
* Run expensive tests iff DEBUG == true.
*/
public static final boolean DEBUG = false;

// Abstraction Function:
// A GeometricSeqPhenotype, gsp, is null if gsp.sequence = null,
// otherwise gsp.sequence = sequence for sequence.length() > 0 and sequence.length() % 3 == 0.
//
// The location of gsp in Euclidean space is defined by (traitA, traitB) each gsp is associated with a sequence of nucleotides.
// gsp1.sequence and gsp2.sequence are allowed to be equal for two given GeometricSeqPhenotype

// Representation invariant for every GeometricSeqPhenotype:
// gsp.sequence != null && gsp.sequence.length() > 0 && gsp.sequence.length() % 3 == 0 && gsp.sequence.length() == startingSequence.length()
// for all indexSite such that gsp.sequence.charAt(i):
// indexSite is a character in ALPHABET
// for all codon such that gsp.sequence.charAt(i) + gsp.sequence.charAt(i + 1) + gsp.sequence.charAt(i + 2):
// codon is not "STOP"

/**
* Constructor that creates a new GeometricSeqPhenotype.
*
* @effects Constructs a new GeometricSeqPhenotype that has a random starting sequence of length Parameters.sequence.length()
* and an empty GeometricPhenotype
*/
public GeometricSeqPhenotype() {
this.traitA = 0.0;
this.traitB = 0.0;
this.nucleotideSequence = Parameters.startingSequence.toCharArray();
this.eMutation = 0;
this.neMutation = 0;
checkRep();
}

/**
* Constructor that creates a new GeometricSeqPhenotype.
*
* @param tA the x-coordinate of the new GeometricSeqPhenotype.
* @param tB the y-coordinate of the new GeometricSeqPhenotype.
* @effects Constructs a new GeometricSeqPhenotype that has a random starting sequence of length Parameters.sequence.length()
* and GeometricPhenotype with the data content of the given parameters.
*/
public GeometricSeqPhenotype(double tA, double tB) {
this();
this.traitA = tA;
this.traitB = tB;
checkRep();
}

/**
* Constructor that creates a new GeometricSeqPhenotype.
*
* @param tA the x-coordinate of the new GeometricSeqPhenotype.
* @param tB the y-coordinate of the new GeometricSeqPhenotype.
* @param startingSequence the starting nucleotide sequence of the GeometricSeqPhenotype to be constructed.
* @requires nucleotideSequence != null && nucleotideSequence.length() > 0 && nucleotideSequence.length() % 3 == 0
* @effects Constructs a new GeometricSeqPhenotype that has a starting sequence
* and GeometricPhenotype with the data content of the given parameters.
*/
public GeometricSeqPhenotype(double tA, double tB, char[] startingSequence) {
this();
this.traitA = tA;
this.traitB = tB;
this.nucleotideSequence = startingSequence;
checkRep();
}

/**
* Constructor that creates a new GeometricSeqPhenotype.
*
* @param tA the x-coordinate of the new GeometricSeqPhenotype.
* @param tB the y-coordinate of the new GeometricSeqPhenotype.
* @param startingSequence the starting nucleotide sequence of the GeometricSeqPhenotype to be constructed.
* @param e the number of epitope mutations this GeometricSeqPhenotype is from startingSequence
* @param nE the number of non-epitope mutations from startingSequence is from startingSequence
* @requires nucleotideSequence != null && nucleotideSequence.length() > 0 && nucleotideSequence.length() % 3 == 0
* @effects Constructs a new GeometricSeqPhenotype that has a starting sequence
* and GeometricPhenotype with the data content of the given parameters.
*/
public GeometricSeqPhenotype(double tA, double tB, char[] startingSequence, int e, int nE) {
this.traitA = tA;
this.traitB = tB;
this.nucleotideSequence = startingSequence;
this.eMutation = e;
this.neMutation = nE;
checkRep();
}

/**
* Return x-coordinate of where this GeometricSeqPhenotype is in Euclidean space
*
* @return position of this GeometricSeqPhenotype along the x-axis
*/
public double getTraitA() {
return traitA;
}

/**
* Return y-coordinate of where this GeometricSeqPhenotype is in Euclidean space
*
* @return position of this GeometricSeqPhenotype along the y-axis
*/
public double getTraitB() {
return traitB;
}

/**
* Returns the nucleotide sequence of this GeometricSeqPhenotype.
* Valid example outputs include "ACG" and "ACGTGTACGTGT"
*
* @return the nucleotide sequence of the GeometricSeqPhenotype represented by this.
*/
public String getSequence() {
thienktran marked this conversation as resolved.
Show resolved Hide resolved
return String.valueOf(this.nucleotideSequence);
}

/**
* Return the (Euclidean) distance between this GeometricSeqPhenotype and p in Euclidean space
*
* @param p GeometricSeqPhenotype to calculate the distance between
* @return distance between this GeometricSeqPhenotype and p
*/
public double distance(Phenotype p) {
GeometricSeqPhenotype p2d = (GeometricSeqPhenotype) p;
double distA = (getTraitA() - p2d.getTraitA());
double distB = (getTraitB() - p2d.getTraitB());
double dist = (distA * distA) + (distB * distB);
dist = Math.sqrt(dist);
checkRep();
thienktran marked this conversation as resolved.
Show resolved Hide resolved
return dist;
}

/**
* Returns a mutated copy of this GeometricSeqPhenotype (point substitution), original GeometricSeqPhenotype is unharmed
*
* @return a mutated copy of this GeometricSeqPhenotype
*/
public Phenotype mutate() {
// Implementation:
// Mutates a nucleotide in the sequence at a random index
// If mutation creates a stop codon, then mutate at another index

// this.nucleotideSequence is represented using a char[] instead of
// - String since Strings are immutable. New Strings have to be created to mutate a sequence,
// which is messy and uses up extra heap space. Plus, Strings include an extra byte '\n'.
// - StringBuffer because it's just a Wrapper around a char[], and has functionality that we don't need such as resizing.

String wildTypeAminoAcid = "", mutantAminoAcid = "";
int nucleotideMutationIndex = -1;
char wildTypeNucleotide = ' ', mutantNucleotide = ' ';
int cycle = 0;

// Make a single nucleotide mutation to the sequence
// If the mutation results in a stop codon, then throw that mutation away and try another one
while (mutantAminoAcid.equals("") || mutantAminoAcid.equals("STOP")) {
cycle++;
// choose random index to mutate in this.nucleotideSequence
nucleotideMutationIndex = Random.nextInt(0, this.nucleotideSequence.length - 1);

wildTypeNucleotide = this.nucleotideSequence[nucleotideMutationIndex];

// get mutant nucleotide (transition/transversion ratio)
mutantNucleotide = Biology.MutationType.MUTATION.getNucleotide(wildTypeNucleotide);

String[] wildTypeMutantAminoAcids = mutateHelper(nucleotideMutationIndex, mutantNucleotide);

wildTypeAminoAcid = wildTypeMutantAminoAcids[0];
mutantAminoAcid = wildTypeMutantAminoAcids[1];

if (SANITY_TEST) {
// (proteinMutationIndex + 1) to show one-based numbering
TestGeometricSeqPhenotype.mutations.print("" + wildTypeNucleotide + mutantNucleotide + "," + wildTypeAminoAcid + "," +
mutantAminoAcid + "," + cycle + "," + this.hashCode() + "\n");
}
}

int proteinMutationIndex = nucleotideMutationIndex / 3; // site # where mutation is occurring {0, . . ., total number of sites - 1}
// Update the x and y coordinates of the virus in antigenic space
double[] vectors = updateAntigenicPhenotype(proteinMutationIndex, wildTypeAminoAcid, mutantAminoAcid);
double mutA = vectors[0];
double mutB = vectors[1];

// Update the virus's nucleotide sequence by introducing the mutation from above
char[] copyNucleotideSequence = Arrays.copyOf(this.nucleotideSequence, Parameters.startingSequence.length());
copyNucleotideSequence[nucleotideMutationIndex] = mutantNucleotide;

// Determine whether the mutation occurred in an epitope or non-epitope site, and update
// the counts of epitope and non-epitope mutations accordingly
int eMutationNew = this.eMutation;
int neMutationNew = this.neMutation;
if (Biology.SiteMutationVectors.VECTORS.getEpitopeSites().contains(proteinMutationIndex)) {
eMutationNew += 1;
} else {
neMutationNew += 1;
}

checkRep();
Phenotype mutP = new GeometricSeqPhenotype(mutA, mutB, copyNucleotideSequence, eMutationNew, neMutationNew);
return mutP;
}

// Updates a virus's location in antigenic space upon a mutation by taking the
// vector giving the virus's current location and then summing it with a
// precomputed vector that gives the antigenic effect of the mutation
private double[] updateAntigenicPhenotype(int mutationIndexSite, String wildTypeAminoAcid, String mutantAminoAcid) {
double mutA = 0.0; // virus's location in the x dimension
double mutB = 0.0; // virus's location in the y dimension
if (!wildTypeAminoAcid.equals(mutantAminoAcid)) {
// get indices for the matrix based on the wild type and mutant amino acids
// matrix i,j correspond with the String "ACDEFGHIKLMNPQRSTWYV"
int mSiteMutationVectors = Biology.AlphabetType.AMINO_ACIDS.getValidCharacters().indexOf(wildTypeAminoAcid);
int nSiteMutationVectors = Biology.AlphabetType.AMINO_ACIDS.getValidCharacters().indexOf(mutantAminoAcid);

double[] mutations = Biology.SiteMutationVectors.VECTORS.getVector(mutationIndexSite, mSiteMutationVectors, nSiteMutationVectors); // use matrix at site # where mutation is occurring
mutA = getTraitA() + mutations[0]; // r * cos(theta)
thienktran marked this conversation as resolved.
Show resolved Hide resolved
mutB = getTraitB() + mutations[1]; // r * sin(theta)
}

return new double[]{mutA, mutB};
}

// Mutates nucleotide sequence at given nucleotideMutationIndex with given char mutantNucleotide.
// Returns the wild type and mutant amino acid as a String[]
//
// package private helper method so that the updating of the nucleotide sequence can be tested in TestGeometricSeqPhenotype.java
String[] mutateHelper(int nucleotideMutationIndex, char mutantNucleotide) {
int nucleotideMutationCodonIndex = nucleotideMutationIndex % 3; // index of nucleotide being mutated in codon {0, 1, 2}
int proteinMutationIndex = nucleotideMutationIndex / 3; // protein site # where mutation is occurring {0, . . ., total number of sites - 1}
int nucleotideMutationFirstCodonIndex = 3 * proteinMutationIndex; // start index of nucleotide being mutated in codon {0, 3, 6, . . .}

String wildTypeCodon = "" + this.nucleotideSequence[nucleotideMutationFirstCodonIndex] +
this.nucleotideSequence[nucleotideMutationFirstCodonIndex + 1] +
this.nucleotideSequence[nucleotideMutationFirstCodonIndex + 2];
String wildTypeAminoAcid = Biology.CodonMap.CODONS.getAminoAcid(wildTypeCodon);

// get new codon after mutation occurs
StringBuilder mutantCodon = new StringBuilder(wildTypeCodon);
mutantCodon.setCharAt(nucleotideMutationCodonIndex, mutantNucleotide);
String mutantAminoAcid = Biology.CodonMap.CODONS.getAminoAcid(mutantCodon.toString());

if (SANITY_TEST) {
// (proteinMutationIndex + 1) to show one-based numbering
TestGeometricSeqPhenotype.mutations.print((nucleotideMutationIndex + 1) + "," + wildTypeCodon + "," + mutantCodon + ",");
}

return new String[]{wildTypeAminoAcid, mutantAminoAcid};
}

/**
* Returns the virus's sequence, position in antigenic space,
* and cumulative number of epitope and non-epitope mutations in its evolutionary history
* Valid example outputs include "ACG, 0.0, 0.0, 0, 0" and "ACGTGTACGTGT, 2.3, 8.9, 40, 10".
*
* @return the String representation of the GeometricSeqPhenotype represented by this.
*/
public String toString() {
String fullString = String.format("%s, %.4f, %.4f, %d, %d", String.valueOf(this.nucleotideSequence), this.getTraitA(),
this.getTraitB(), this.eMutation, this.neMutation);
thienktran marked this conversation as resolved.
Show resolved Hide resolved
return fullString;
}

// Throws an exception if the representation invariant is violated.
private void checkRep() {
if (DEBUG) {
assert(this.nucleotideSequence.length == Parameters.startingSequence.length()) : "Nucleotide sequences must remain the same length throughout a Simulation";
for (int i = 0; i < this.nucleotideSequence.length; i += 3) {
String triplet = "" + this.nucleotideSequence[i] + this.nucleotideSequence[i + 1] + this.nucleotideSequence[i + 2];
String translatedAminoAcid = Biology.CodonMap.CODONS.getAminoAcid(triplet);

assert(!translatedAminoAcid.equals("STOP")) : "There should not be a stop codon at site " + (i/3);
}
}
}
}
2 changes: 1 addition & 1 deletion HostPopulation.java
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@ public class HostPopulation {
private double tmrca;
private double netau;
private double serialInterval;
private double antigenicDiversity;
private double antigenicDiversity;

private int newContacts;
private int newRecoveries;
Expand Down
Loading