MCO AC
The procedure is rather dated but still functional; hence, the documentation is still included.
Another option is to use a classification algorithm based on correlation to different class averages (multiple correlation optimization, MCO). The goal is to find a distribution of particles that optimises the correlation to a set of different references.
MCO comes in two different flavors: the simple version is an optimization by a local, gradient optimization (expectation maximization) akin to k-means classification (MCO-EXMX). Alternatively, stochastic elements (simulated annealing) can be incorporated into the classification procedure (MCO-AC) (Hrabe et al. 2012).
In short, the idea is to group the subtomograms such that they match best to a class average. In an iterative procedure the assignment of particles is changed such that the (constrained) correlation of the class average to its respective class members gets optimized (use of the constrained correlation ensures that the missing wedge is considered in the similarity measure).
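The role of the constrained correlation can be illustrated with a much-simplified sketch (illustrative code, not PyTom's implementation; in the real method the restriction happens in Fourier space, over the overlap of the two missing-wedge regions):

```python
import math

def constrained_correlation(a, b, mask):
    """Pearson correlation of a and b restricted to entries where mask == 1.

    A simplified 1-D stand-in for the constrained correlation: only the
    region sampled by both particles contributes to the similarity score.
    """
    xs = [x for x, m in zip(a, mask) if m]
    ys = [y for y, m in zip(b, mask) if m]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    vy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (vx * vy)

a = [1.0, 2.0, 3.0, 9.0]
b = [1.0, 2.0, 3.0, -9.0]
mask = [1, 1, 1, 0]  # last entry is unsampled, so it does not distort the score
assert abs(constrained_correlation(a, b, mask) - 1.0) < 1e-9
```

Without the mask, the disagreeing last entry would dominate the score; with it, the two signals are identical over the jointly sampled region.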
The simplest optimization algorithm is expectation maximization (EXMX), which
is also the basis for subtomogram alignment. This algorithm converges rapidly,
but it can end up in local minima. Therefore, we designed an 'outer loop'
for performing EXMX multiple times in the AC algorithm. At the start of each iteration the class
assignments are perturbed and then optimized using EXMX. The new assignments
are accepted or rejected according to the so-called Metropolis criterion,
which depends on a pseudo-temperature. Thus, MCO-AC is a (computationally much more demanding) extension of MCO-EXMX.
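The acceptance rule in the annealing loop can be sketched as follows (a generic Metropolis criterion; function and variable names are illustrative, not PyTom's API):

```python
import math
import random

def metropolis_accept(old_score, new_score, temperature, rng=random.random):
    """Decide whether to accept a perturbed class assignment.

    Higher scores are better. Improvements are always accepted;
    deteriorations are accepted with probability exp(delta / T), so a
    high pseudo-temperature tolerates bad moves (escaping local minima)
    and T -> 0 recovers greedy, EXMX-like behaviour.
    """
    delta = new_score - old_score
    if delta >= 0:
        return True
    if temperature <= 0:
        return False
    return rng() < math.exp(delta / temperature)

# An improvement is always accepted:
assert metropolis_accept(0.5, 0.6, temperature=1.0)
# At T = 0 the rule is greedy: a worse assignment is rejected.
assert not metropolis_accept(0.6, 0.5, temperature=0.0)
```

Lowering the temperature over the annealing rounds gradually turns the stochastic search into a purely greedy one.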
The methods are called using the scripts `mcoEXMX.py` or `mcoAC.py` like this:

```
mpirun --hostfile "pathToYourHostfile" -c "numberOfCPUs" pytom PathToPytom/bin/mcoAC.py -j class.xml
```

The particles in the particle list may carry a class label (e.g. `<Class Name="0"/>`) or no class property at all. The program will take more than one class as an a priori distribution and continue from there.
Required parameters for classification jobs are:

- Particle List: a particle list in XML format that points to all subtomograms.
- Mask: the mask may be spherical or non-symmetrical. To focus alignment on a particular region, you may use a mask of arbitrary shape. Spherical masks can be generated using PyTom following these instructions.
- Number of local refinement rounds (MCO-EXMX).
- Number of classes: the initial number of classes used for classification. The final number of classes can be different because classes can become depopulated.
- Convergence criterion: if the fraction of class changes drops below this value (0 ≤ value ≤ 1), classification will terminate.
- Temperature type (MCO-AC only).
- Classification criterion (MCO-AC only): can be either the Metropolis criterion or threshold acceptance.
- Initial temperature (MCO-AC only).
- Annealing step (MCO-AC only): temperature decrease during iterations.
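The convergence criterion can be illustrated with a short sketch (illustrative code, not part of PyTom): classification stops once the fraction of particles that changed class between two iterations drops below the threshold.

```python
def fraction_changed(previous_labels, current_labels):
    """Fraction of particles whose class assignment changed between iterations."""
    assert len(previous_labels) == len(current_labels)
    changed = sum(1 for p, c in zip(previous_labels, current_labels) if p != c)
    return float(changed) / len(previous_labels)

def converged(previous_labels, current_labels, end_threshold):
    """True once fewer than end_threshold of all particles switched class."""
    return fraction_changed(previous_labels, current_labels) < end_threshold

# 1 of 10 particles switched class -> fraction 0.1. With a threshold of
# 0.05 classification would continue; with 0.2 it would terminate.
prev = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
curr = [0, 0, 1, 1, 2, 2, 3, 3, 4, 1]
assert fraction_changed(prev, curr) == 0.1
assert not converged(prev, curr, 0.05)
assert converged(prev, curr, 0.2)
```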
Output of MCO-EXMX and MCO-AC

We start with the simpler output of MCO-EXMX. Inside the destination directory, you will find:
- `RandomisedParticleList.xml`: the initial random class distribution determined prior to starting the classification.
- directories `0, 1, 2, ..., N`: the directories for each EXMX iteration. Inside each of these directories you will find:
  - directories `class0`, `class1`, ..., `classC`: these C (= number of classes) directories contain class averages determined according to the current class distribution.
  - `AlignmentList-1.xml`, `AlignmentList-2.xml`, ..., `AlignmentList-N.xml`: AlignmentLists after each iteration. These AlignmentLists contain the class assignments and the scores determined for each subtomogram with the respective class average.
  - `classifiedParticles.xml`: particle list with class labels determined in the current EXMX iteration. Class averages will be created at the start of the next iteration.
Output of MCO-AC in the destination directory:

- directories `0, 1, 2, ..., N_anneal`: the directories for each annealing round. Inside each directory is the content of an MCO-EXMX run (see above).
- `swapList.xml`: list of class swaps induced by annealing.
- `currentBestParticleList.xml`: the best-scoring particle assignment found so far is stored here; during the annealing iterations the current assignment may get worse.
Two examples are included: classification directly after localization (`postLocalizationClassification`) with MCO-EXMX, and classification after alignment with MCO-AC. Please note that the order was chosen based on our standard workflow but can be swapped for individual adjustment.
- MCO-EXMX

  The `postLocalizationClassification` folder contains all scripts and files required to run a coarse classification on the tutorial particles. As specified in `job.xml`, it will classify all particles based on their orientations determined during localization. Furthermore, it will split the whole data set into 5 (`NumberClasses`) classes and run through 10 iterations (`NumberIterations`). It will stop before the 10th iteration if fewer than 0.05 (`EndThreshold`), i.e. 5%, of all particles have changed their class assignment.

  Please note that the initial cluster assignments are distributed randomly, so results are not necessarily reproducible. If you want to reproduce results, you must start with the `RandomisedParticleList.xml` in the `results` directory. Classification progress is deterministic from then on.

  The `mergeClasses.sh` script demonstrates how to merge multiple classes back into one. Running all commands in this script will ultimately display a similarity tree from which you can determine which classes to merge and use for further resolution refinement. Please also follow the scripts at the bottom of this page.
- MCO-AC

  The `mcoAClassification` folder contains classification scripts and results after the previously MCO-EXMX-classified particles were aligned. As mentioned above and in Hrabe et al. 2012, MCO-AC is essentially an extended version of MCO-EXMX. Classification is set to 3 (`NumberIterations`) annealing iterations, meaning that each of three rounds of MCO-EXMX will be followed by an annealing step where particles randomly change class labels. This stochastic process is adjusted (at the bottom of `job.xml`) through `AnnealingTemperature`, where the particle temperature and step are specified. The most important feature of `AnnealingCriterion` is `AllowClassCollapse`. Random assignment may cause extinction, where no particles are assigned to one class. If you set `AllowClassCollapse` to `True`, you allow this to happen. If it is set to `False`, the algorithm will repeat the random assignment up to 10 times if a class goes missing; if the class is still missing after 10 tries, the algorithm will continue with one class fewer. While MCO-AC is running, you will see result folders come and go. These folders store MCO-EXMX results and are deleted after each iteration. The real MCO-AC result is `currentBestParticleList.xml`, with the best class assignment found so far.
It is a common problem of classification algorithms that they tend to ignore small classes. A solution to this problem is to oversample the class number in the classification: e.g., if ~4 classes are expected to be present in the dataset, one may initially group the dataset into ~20 classes, which increases the chances that a sparse class will be distinguished rather than fused into a more populated class. Many of the ~20 classes will then be essentially identical and can subsequently be merged again.
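The oversample-then-merge strategy can be sketched like this (illustrative code, not the PyTom API; the merge map is hypothetical): classify into many fine classes first, then fold near-identical fine classes into coarser ones.

```python
def merge_classes(labels, merge_map):
    """Map fine class labels to merged (coarser) labels.

    merge_map is a dict from fine label to merged label; labels not
    listed in the map keep their original value.
    """
    return [merge_map.get(label, label) for label in labels]

# Seven particles spread over an oversampled set of fine classes; the
# similarity tree suggested that classes {0, 3, 7} and {12, 19} are
# essentially identical, so they are folded into two merged classes.
fine = [0, 3, 7, 12, 3, 19, 7]
merge_map = {3: 0, 7: 0, 12: 1, 19: 1}
assert merge_classes(fine, merge_map) == [0, 0, 0, 1, 0, 1, 0]
```

The commands below do the same thing on real particle lists, using the class averages and a hierarchical clustering of their correlation matrix to decide which classes belong together.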
In order to merge classes by hierarchical clustering, you have to run the following commands.
- Average: Generate class averages with this script:

  ```
  $ ipytom
  from pytom.basic.structures import ParticleList
  pl = ParticleList()
  pl.fromXMLFile('bestAfter31Its.xml')
  pl.sortByClassLabel()
  pls = pl.splitByClass()
  for i in xrange(len(pls)):
      c = pls[i]
      print i, ' == classlabel : ', c[0].getClassName(), ' Class size: ', len(c)
      c.average('classAverages/class' + str(i) + '.em')

  classAveragesList = ParticleList('classAverages')
  classAveragesList.loadDirectory()
  classAveragesList.toXMLFile('classAverageList.xml')
  ```
- CCC: Determine a correlation matrix between all class averages, which can then be used for hierarchical clustering. Copy the code below into a script and run it either in parallel or sequentially.

  ```
  $ ipytom
  from pytom.cluster.correlationMatrixStructures import CorrelationMatrixJob
  from pytom.basic.structures import ParticleList, Mask, Particle
  from pytom.cluster.correlationMatrix import distributedCorrelationMatrix
  import os

  pl = ParticleList()
  pl.fromXMLFile('classAverageList.xml')
  mask = Mask('./mask.em')
  job = CorrelationMatrixJob(particleList=pl, mask=mask,
                             resultMatrixName='correlationMatrix.em',
                             applyWedge=False, binningFactor=2,
                             lowestFrequency=0, highestFrequency=17)
  distributedCorrelationMatrix(job, False)
  ```
- CCC to ASCII: Convert the matrix from EM format to a text file that allows import into R:

  ```
  $ ipytom
  from pytom_volume import read
  from pytom.tools.files import writeMatrix2RMatrix
  m = read('correlationMatrix.em')
  writeMatrix2RMatrix(m, 'correlationMatrix-R.txt')
  ```
- Clustering: Perform hierarchical classification using the following commands in scipy:

  ```
  $ ipytom
  from pytom.basic.files import read
  from pytom_volume import abs
  from pytom_numpy import vol2npy
  from scipy.cluster.hierarchy import linkage, dendrogram
  from matplotlib.pyplot import show

  matrix = abs(read('correlationMatrix.em') - 1)
  classification = linkage(vol2npy(matrix))
  print classification
  dendrogram(classification)
  show()
  ```
- Merge classes: Using the script below, you first split the particle list according to the previously defined (too fine) classes and subsequently merge selected classes (in this example classes 0, 2, and 3 into one class, classes 1 and 4 into another, and classes 5 and 6 into a third new class).

  ```
  $ ipytom
  from pytom.basic.structures import ParticleList
  pl = ParticleList()
  pl.fromXMLFile('classAverageList.xml')
  pl.sortByClassLabel()
  pls = pl.splitByClass()

  proteinState1 = pls[0] + pls[2] + pls[3]
  proteinState1.average('proteinInState1')
  proteinState2 = pls[1] + pls[4]
  proteinState2.average('proteinInState2')
  outliers = pls[5] + pls[6]
  outliers.average('outliers')
  ```
Please note that we provide the `mcoEXMXJob.py` and `mcoACJob.py` scripts to set up classification jobs through the terminal. Check the terminal parameters with `mcoEXMXJob.py --help` and `mcoACJob.py --help` for details.
You can perform classification directly after localization in order to get rid of obvious false positives such as gold beads. To do so, we recommend setting up an MCO-EXMX classification job (without annealing).
Hence, you can directly classify your reconstructed subtomograms using the coarse alignment determined during localization. Please read below how to set up an MCO-EXMX job, which is essentially identical to setting up a coarse MCO-AC job. You will have to use the particle list you obtained after localization.
After MCO-EXMX, you want to select the good subtomograms. You can do so by running the following script:
```
$ ipytom
from pytom.basic.structures import ParticleList
pl = ParticleList()
# 6 is the iteration number
# classifiedParticles.xml is the result
pl.fromXMLFile('./6/classifiedParticles.xml')
pl.sortByClassLabel()
pls = pl.splitByClass()
for cl in pls:
    className = cl[0].getClassName()
    cl.average('class' + str(className) + '.em')
    print className, ' contains ', len(cl), ' particles'

# choose the result classes you want to use for further processing, e.g., classes 1, 2, 3, 4
# base the decision on which particles to take on the averages written out above
ribos = pls[1] + pls[2] + pls[3] + pls[4]
ribos.toXMLFile('filtered_pl.xml')
ribos.average('./filteredIniRef.em')
```