Ml_tomo_v3
Align and classify 3D images with missing data regions in Fourier space, e.g. subtomograms or RCT reconstructions, by a 3D multi-reference refinement based on a maximum-likelihood (ML) target function. For several cases, this method has been shown to both align and classify in a completely __reference-free__ manner, by starting from random assignments of the orientations and classes. The mathematical details behind this approach are explained in
Scheres et al. (2009) Structure, 17, 1563-1572
Please cite this paper if this program is of use to you! There also exists a standardized python script xmipp_protocol__mltomo.py for this program. Rather than executing the command line options explained below, the user can submit jobs through a convenient GUI (see GettingStartedWithProtocols), although we still recommend reading this page carefully in order to fully understand the options given in the protocol. (Note that this protocol is available from the main xmipp_protocols setup window by pressing the Additional protocols button.)
Parameters
--missing <metadata=> : Metadata file with missing data region definitions
Angular sampling:
--psi_sampling <float=-1.> : Angular sampling rate for the in-plane rotations (in degrees)
Regularization:
--reg_steps <int=5> : Number of iterations in which the regularization is changed from reg0 to regF
Others:
--thr <int=1> : Number of shared-memory threads to use in parallel
Additional options:
--noimp_threshold <float=1.> : Threshold to avoid division by zero for weighted averaging
The input metadata should contain the image and missingRegionNumber columns, indicating the subtomogram filename and the missing region number, respectively. It can also contain columns with angles and shift information. The output will be a metadata file with the same format. The following is an example:
# XMIPP_STAR_1 *
data_
loop_
_image
_missingRegionNumber
_angleRot
_angleTilt
_anglePsi
_shiftX
_shiftY
_shiftZ
_ref
_logLikelihood
32_000001.scl 1.00 0.000000 0.000 0.000 0.000000 0.000000 0 0.000000 1
32_000002.scl 1.00 0.000000 0.000 0.000 0.000000 0.000000 0 0.000000 1
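Such a file can also be generated from a list of subtomogram filenames. The following is a minimal sketch, assuming that all subtomograms share the same missing region and that only the two required columns are written. The helper name `make_images_md` is hypothetical and not part of Xmipp.

```sh
# Hypothetical helper (not part of Xmipp): print a minimal input metadata
# file with only the two required columns.
# Usage: make_images_md <missingRegionNumber> <file>...
make_images_md() {
    region=$1
    shift
    # Header, following the layout of the example in this page
    printf '%s\n' '# XMIPP_STAR_1 *' 'data_' 'loop_' '_image' '_missingRegionNumber'
    # One row per subtomogram, all assigned to the same missing region
    for f in "$@"; do
        printf '%s %s\n' "$f" "$region"
    done
}

make_images_md 1 32_000001.scl 32_000002.scl
```

Redirect the output to a file of your choice; the optional angle and shift columns are left out here on purpose.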
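The missing data regions are described in a separate metadata file (the one passed with --missing), for example:
# XMIPP_STAR_1 *
# Wedgeinfo
data_
loop_
_missingRegionNumber
_missingRegionType
_missingRegionThetaY0
_missingRegionThetaYF
1 wedge_y -64 64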
The first column, missingRegionNumber (starting at 1), is required for each type of missing region; this number should appear in the input images metadata. Here, missingRegionType can be one of the following:
- wedge_y for a missing wedge where the tilt axis is along Y; columns missingRegionThetaY0 and missingRegionThetaYF are used
- wedge_x for a missing wedge where the tilt axis is along X; columns missingRegionThetaX0 and missingRegionThetaXF are used
- pyramid for a missing pyramid where the tilt axes are along Y and X; the same columns as for wedge_y and wedge_x are used
- cone for a missing cone (pointing along Z); column missingRegionThetaY0 is used
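As an illustration, a missing-region file for a data set with two different wedge_y regions could be written as follows. This is only a sketch: the helper name `write_wedges`, the output filename and the angles of the second region are assumptions made for the example.

```sh
# Hypothetical helper: write a missing-region metadata file describing
# two wedge_y regions. The angles of region 2 are illustrative values.
write_wedges() {
    cat > "$1" <<'EOF'
# XMIPP_STAR_1 *
# Wedgeinfo
data_
loop_
_missingRegionNumber
_missingRegionType
_missingRegionThetaY0
_missingRegionThetaYF
1 wedge_y -64 64
2 wedge_y -60 60
EOF
}

write_wedges wedges.doc
```

Each subtomogram in the input images metadata then refers to region 1 or 2 through its missingRegionNumber value.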
In total 25 iterations will be performed. The run is started from a weighted average structure obtained from random orientations of all particles (i.e. probably some sort of blob). First, 15 iterations are performed with an angular sampling of 15 degrees and exhaustive searches, then 5 iterations with an angular sampling of 10 degrees and search ranges of 50 degrees, and finally 5 iterations with a sampling of 5 degrees and search ranges of 25 degrees. In the first run, small images (of size 32x32x32) are used to speed up the computationally expensive exhaustive searches. In the next runs the full-sized images are used, but the maximum resolution taken into account is limited to 0.35 pixel^-1. In all runs, the angular samplings will be perturbed by a different random rotation in each iteration.
mkdir run1_align
ml_tomo -i images.sel --oroot run1_align/nref1_15deg --nref 1 --doc images.doc --missing wedges.doc --iter 15 --ang 15 --dim 32 --perturb
ml_tomo -i images.sel --oroot run1_align/nref1_10deg --nref 1 --doc run1_align/nref1_15deg_it000015.doc --keep_angles --missing wedges.doc --iter 5 --ang 10 --ang_search 50 --maxres 0.35 --perturb
ml_tomo -i images.sel --oroot run1_align/nref1_5deg --nref 1 --doc run1_align/nref1_10deg_it000005.doc --keep_angles --missing wedges.doc --iter 5 --ang 5 --ang_search 25 --maxres 0.35 --perturb
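Each run above starts from the document file written after the last iteration of the previous run. In these examples that file is named after the output root plus the zero-padded iteration number. The sketch below builds such a name for use in scripts that chain runs; the helper name `last_doc` is hypothetical, and the naming pattern is inferred from the example commands only.

```sh
# Hypothetical helper: name of the doc file written after the last
# iteration of a run, following the pattern seen in the examples:
# <oroot>_it<iteration zero-padded to 6 digits>.doc
# Usage: last_doc <oroot> <number of iterations>
last_doc() {
    printf '%s_it%06d.doc\n' "$1" "$2"
}

last_doc run1_align/nref1_15deg 15   # prints run1_align/nref1_15deg_it000015.doc
```

In a chained script, the result could be passed to the --doc option of the next run, e.g. `--doc "$(last_doc run1_align/nref1_15deg 15)"`.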
In this example, the aligned data set from the previous example is divided into three classes. The angles from the previous iteration are kept in the initial reference generation, so that the three initial references will be aligned. Then, local angular searches around these angles are performed, so that the particles may re-adjust their orientation as the references improve due to the classification into distinct classes. This is done in two stages, one with an initial coarser sampling and larger search range, and a second one with finer sampling and a more limited search range. To prevent getting stuck in local minima in the early stages of the classification, a regularization is applied in the first run that imposes similarity on the three references during the first five iterations.
ml_tomo -i images.sel --oroot run2_3classes/nref3_10deg --nref 3 --doc run1_align/nref1_5deg_it000005.doc --keep_angles --missing wedges.doc --iter 20 --ang 10 --ang_search 50 --maxres 0.35 --perturb --reg0 5 --regF 0 --reg_steps 5
ml_tomo -i images.sel --oroot run2_3classes/nref3_5deg --nref 3 --doc run2_3classes/nref3_10deg_it000020.doc --keep_angles --missing wedges.doc --iter 5 --ang 5 --ang_search 25 --maxres 0.35 --perturb
Note that a MUCH faster classification may be obtained by keeping the angles completely fixed and only performing a separation into classes. This will only work if the orientations are not (much) affected by the alignment against the single consensus average. In this case, one also has the option to provide a mask (1=to be classified, 0=to be ignored) to focus the classification on an interesting area in the images (not tested extensively yet). The syntax would be:
ml_tomo -i images.sel --oroot run2_3classes/nref3_noalign --nref 3 --doc run1_align/nref1_5deg_it000005.doc --keep_angles --missing wedges.doc --iter 20 --dont_align --maxres 0.35 --reg0 5 --regF 0 --reg_steps 5 --mask interesting.msk
In the two examples above, the images were first aligned in a reference-free manner using a single class and then classified into three classes. This process may be combined in a single run by using a three-reference refinement where both the initial class assignments and the initial orientation assignments are random. Again, to speed up the process, small images and relatively coarse angular samplings are used. As explained above, subsequent runs may be performed with finer angular samplings, bigger images etc.
ml_tomo -i images.sel --oroot run3_3classes/nref3_15deg --nref 3 --doc images.doc --missing wedges.doc --iter 25 --ang 15 --dim 32 --perturb --reg0 5 --regF 0 --reg_steps 5
Because the Gaussian distributions inside the ML calculations use squared residuals as a distance metric, they are highly sensitive to the absolute intensities in the reference. As long as your reference comes from the same data set you want to align (e.g. from the reference-free protocols mentioned above) this is not a problem. However, often we have an external reference structure that is not on the correct intensity (or grey) scale. In that case it is better to use a constrained cross-correlation coefficient (which is normalized and therefore invariant to the intensity scale). In the example below the same reference is used in three subsequent runs. This reference is assumed to be a nice one (e.g. from the EMDB or PDB) and only the angular sampling interval is gradually decreased.
ml_tomo -i images.sel --oroot run4_extref/15deg --ref myreference.vol --doc images.doc --missing wedges.doc --iter 1 --ang 15
ml_tomo -i images.sel --oroot run4_extref/7deg --ref myreference.vol --doc run4_extref/15deg_it000001.doc --missing wedges.doc --iter 1 --ang 7 --ang_search 20
ml_tomo -i images.sel --oroot run4_extref/3deg --ref myreference.vol --doc run4_extref/7deg_it000001.doc --missing wedges.doc --iter 1 --ang 3 --ang_search 10
1. How should I prepare my data?
- It is not necessary to downscale your images (to obtain faster results), as this can be done internally with the --dim option of the program.
- Any density that is not related to the molecule you want to average will interfere with your alignment, and probably even more with your classification. Therefore, try to avoid images with strong densities from gold particles, neighbouring molecules or other artifacts. Note that masking these densities out may lead to under-estimation of the standard deviation of the noise, which is usually a bad thing in ML estimation. Therefore, windowing your particles more tightly may be a better option, although extensive testing on this issue has not yet been performed.
- The probability calculations (i.e. the similarity measures) are based on squared differences between the reference and the experimental images. This makes them highly sensitive to differences in grey-scale intensity or background mean. Therefore, normalization of your input data is important. One can use the Normalize program with the -vol option to set the average to zero and the standard deviation to one for each image. Alternatively, one can do this within a binary mask (1=protein, 0=solvent) using the following script:
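#!/bin/csh -f
# Normalize a volume to zero average and unit standard deviation,
# with both statistics measured inside a binary mask.
# Usage: normalize_within_mask Vin mask Vout
set volin = $1
set mask = $2
set volout = $3
if ( $# == 3 ) then
    # Take the average and standard deviation from the last line of the statistics output
    set avesig=`xmipp_statistics -i $volin --mask $mask | tail -1|awk '{print $6,$8}'`
    echo $avesig
    # Subtract the average, then divide by the standard deviation
    xmipp_operate -i $volin -minus $avesig[1] -o $volout
    xmipp_operate -i $volout -divide $avesig[2] -o $volout
else
    echo "Usage: normalize_within_mask Vin mask Vout"
endif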
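The arithmetic performed by this script can be illustrated on a plain list of numbers. The sketch below is not an Xmipp tool: the helper name `normalize_within_mask_values` and the text input format (one "value mask" pair per line) are assumptions for illustration, and it uses the population standard deviation, which may differ from what xmipp_statistics reports.

```sh
# Illustration only: normalize a list of voxel values using the average and
# (population) standard deviation of the voxels inside the mask.
# Input on stdin: one "value mask" pair per line (mask: 1=protein, 0=solvent).
normalize_within_mask_values() {
    awk '
        {
            v[NR] = $1
            # Accumulate statistics only for voxels inside the mask
            if ($2 == 1) { n++; s += $1; ss += $1 * $1 }
        }
        END {
            mean = s / n
            sd = sqrt(ss / n - mean * mean)
            # Apply the same shift and scale to every voxel, masked or not
            for (i = 1; i <= NR; i++) printf "%.2f\n", (v[i] - mean) / sd
        }'
}

printf '%s\n' "1 1" "3 1" "5 0" | normalize_within_mask_values
```

Here the two voxels inside the mask have average 2 and standard deviation 1, so the three values become -1.00, 1.00 and 3.00. Voxels outside the mask are rescaled but do not contribute to the statistics.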
2. What is the convention of the Euler angles and translations?
All input and output angles and translations are to transform the experimental images onto the reference structure(s). First the translations in x, y and z are applied, and then the rotation is applied (both according to Conventions).
Note that one can convert these transformations to other conventions. For example, Julio Ortiz (Martinsried) figured out the following conversion to the TOM Toolbox convention:
phi_tom = 90 - rot_xmipp
psi_tom = 270 - psi_xmipp
theta_tom = tilt_xmipp
xoff_tom = xoff_xmipp
yoff_tom = yoff_xmipp
zoff_tom = zoff_xmipp
But then, the transformation in TOM is used to bring the reference onto the experimental images, so that the order of the angles must be changed and the origin offsets should be inverted:
phi_tom_b = -psi_tom
theta_tom_b = -theta_tom
psi_tom_b = -phi_tom
xoff_tom_b = -xoff_tom
yoff_tom_b = -yoff_tom
zoff_tom_b = -zoff_tom
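The two steps above can be combined into one small filter. This is a sketch that applies the formulas exactly as listed; the helper name `xmipp_to_tom` and the input layout (one line of "rot tilt psi xoff yoff zoff") are assumptions for illustration.

```sh
# Hypothetical helper: convert Xmipp angles and offsets (image onto reference)
# into the TOM values that bring the reference onto the image.
# Input on stdin:  rot tilt psi xoff yoff zoff
# Output:          phi theta psi xoff yoff zoff (TOM, reference onto image)
xmipp_to_tom() {
    awk '{
        # Step 1: the same transformation expressed with TOM angles
        phi_tom   = 90 - $1
        theta_tom = $2
        psi_tom   = 270 - $3
        # Step 2: invert it (swap and negate the angles, negate the offsets)
        printf "%g %g %g %g %g %g\n", -psi_tom, -theta_tom, -phi_tom, -$4, -$5, -$6
    }'
}

echo "30 45 60 1 -2 3" | xmipp_to_tom   # prints -210 -45 -60 -1 2 -3
```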