sxisac - Perform Iterative Stable Alignment and Clustering (ISAC) on a 2-D image stack
sxisac.py stack_file --ir=ir --ou=ou --rs=rs --xr=xr --yr=yr --ts=ts --maxit=maxit --dst=dst --FL=FL --FH=FH --FF=FF --init_iter=init_iter --main_maxit=main_iter --iter_reali=iter_reali --match_first=match_first --max_round=max_round --match_second=match_second --stab_ali=stab_ali --thld_err=thld_err --indep_run=indep_run --thld_grp=thld_grp --img_per_grp=img_per_grp --generation=generation
sxisac exists only in MPI version.
Before running the ISAC program, it is strongly recommended to perform the following steps to the ensure the best performance of the ISAC program.
1. Due to long running time (see note below) it is advisable to reduce image size to 64x64. This can be done in one of the two ways: (i) if images are reasonably centered and they fit into the original window tightly, one can reduce the size of original data:
- sxprocess.py bdb:orgstack bdb:stack --changesize --ratio=0.25
where the ratio is computed as the original image size in pixels divided by 64. (ii) alternatively, if images are not well centered and relatively small in comparison with the original window size, it is preferable to first pre-align them (do steps 2-4 including sxtransform2d.py), and only then reduce the size AND window so the particles fit 64 windows tightly.
2. For each particle in the stack, set the attribute 'xform.align2d' to 0
- sxheader.py bdb:stack --params=xform.align2d --zero
For Random Conical Tilt (RCT) work it is advisable to set unique identifier in the headers so images in the ultimate class averages can be connected to their tilted counterparts. We assume that the number of untitled the tilted images are the same, they result in two equal-lengths stacks and that both stacks have the same numbering, i.e., k'th untitled particle corresponds to kith tilted particle. Note the parameter name below is arbitrary, one can use any name (not conflicting with reserved names) and there can be more than one, if necessary.
- sxheader.py bdb:stack --params=myID --consecutive
3. Phase-flip all particles in the stack. Currently the ISAC program works only with phase-flipped data, so for cryo data this is the place to use the CTFs.
- sxprocess.py bdb:stack_ali bdb:test --phase_flip --shift
4. Pre-align the particles in the stack and apply the resulting parameters to create a pre-aligned stack. Here, ou is the radius of the alignment area, you need to set it based on your situation. Generally speaking, it should be slightly larger than half of your particle size. For example, if your image is 128x128, but the particle is only of size 80, you should set ou to be slightly larger than 40, say 45. Other parameters generally work and there is no need to change them.
- mpirun -np 16 sxali2d.py bdb:stack jojo --ou=28 --xr="2 1" --ts="1 0.5" --maxit=20 --dst=90 --MPI
- sxtransform2d.py bdb:stack bdb:test
5. After these preparation steps, one may begin to run the ISAC program. In most cases, the default values of the program work good and there is no need to change it. The only parameters you may want to change is generation, img_per_grp, stab_ali and thld_err. The most important one is thld_err, which is the threshold for pixel error. If you generate none or very few class averages in the first generation, you may want to loosen this threshold two to three times. For detailed explanation of all parameters, please refer to the section below.
The ISAC program is designed to run in multiple generations in the following way: we first run the first generation on the data set, which will generate some class averages. We call the particles that are assigned to these class averages accounted for and set them aside. We will make a stack of the unaccounted for particles and run the second generation on this stack. Then we set aside the particles that are accounted for in the second generation and make a new stack for the particles that are still unaccounted for. We will run the third generation on the new stack, and so on. The process repeated until no or very few class averages can be generated in one generation. The parameter generation is used to tell the program which generation you are running right now.
- mpirun -np 32 sxisac.py bdb:test --img_per_grp=150 --generation=1
- e2bdb.py bdb:test --makevstack=bdb:test2 --list=this_generation_1_unaccounted.txt
- mpirun -np 32 sxisac.py bdb:test2 --img_per_grp=150 --generation=2
- e2bdb.py bdb:test --makevstack=bdb:test3 --list=this_generation_2_unaccounted.txt
- mpirun -np 32 sxisac.py bdb:test3 --img_per_grp=150 --generation=3
- e2bdb.py bdb:test --makevstack=bdb:test4 --list=this_generation_3_unaccounted.txt
- mpirun -np 32 sxisac.py bdb:test4 --img_per_grp=150 --generation=4
The ISAC program needs MPI environment to work properly. Number of used MPI processes MUST BE multiplicity of given indep_run parameter (see parameters list below).
Depending on the cluster you are running, the way of using MPI will be significantly different. On some clusters,
- mpirun -np 32 sxisac.py ...
will be sufficient. On some clusters, one need to specify the host name:
- mpirun -np 32 --host node1,node2,node3,node4 sxisac.py ...
On some clusters, one need to submit a script to run MPI, please ask your system manager about how to run MPI program on your machine.
Also, different systems have different ways of storing the printout. On some clusters, printout is automatically saved. If it is not, it is recommended to use the linux command nohup to run the program, so that the printout is automatically saved to a textfile nohup.out. For example:
- nohup mpirun -np 32 sxisac.py bdb:test --img_per_grp=150 --generation=1
If there is no nohup on your system, it is recommended to redirect the printout to a textfile.
mpirun -np 32 sxisac.py bdb:test --img_per_grp=150 --generation=1 > output.txt
Time and Memory
Unfortunately, ISAC program is very time- and memory-consuming. For example, on my cluster, it takes 15 hours to process 50,000 64x64 particles on 256 processors. Therefore, before embarking on the big dataset, it is recommended to run a test dataset (about 2,000~5,000 particles) first to get a rough idea of timing. If the timing is beyond acceptable, the first parameter you may want to change is max_round, you may change it to 10 or even 5 with mild effects on the results.
set of 2-D images in a stack file (format must be bdb), images have to be square (nx=ny)
The parameters preceded with -- are optional and default values are given in parenthesis.
- inner ring of the resampling to polar coordinates (default = 1, units - pixels)
- outer ring of the resampling to polar coordinates (default = nx/2-1, units - pixels)
- ring step of the resampling to polar coordinates (default = 1, units - pixels)
- x range of translational search (default = 1.0, units - pixels)
- y range of translational search (default = 1.0, units - pixels)
- search step of translational search (default = 1.0, units - pixels)
- number of iterations for reference-free alignment (default = 30)
- discrete angle used in within group alignment (default = 90.0)
- lowest stopband frequency used in the tangent filter (default = 0.2)
- highest stopband frequency used in the tangent filter (default = 0.3)
- fall-off of the tangent filter (default = 0.2)
- number of runs of ab-initio within-cluster alignment for stability evaluation in SAC initialization (default = 3)
- number of runs of ab-initio within-cluster alignment for stability evaluation in SAC (default = 3)
- every iter_reali iterations of SAC stability checking is performed (default = 1)
- number of iterations to run 2-way matching in the first phase (default = 1)
- maximum rounds of generating candidate class averages in the first phase (default = 20)
- number of iterations to run 2-way (or 3-way) matching in the second phase (default = 5)
- number of alignments when checking stability (default = 5)
- the threshold of pixel error when checking stability, equals root mean square of distances between corresponding pixels from set of found transformations and theirs average transformation, depends linearly on square of radius (parameter ou) (default = 0.7, units - pixels)
- specifies the level of m-way matching for reproducibility tests. The default = 4 will perform full ISAC to 4-way matching. Value indep_run=2 will restrict ISAC to 2-way matching and 3 to 3-way matching. Note the number of used MPI processes requested in mpirun must be a multiplicity of indep_run.
- the threshold of the size of reproducible class (essentially minimum size of class) (default = 10)
- number of images per class in the ideal case (essentially maximum size of class) (default = 100)
- the n-th approach on the dataset (default = 1)
For each generation of running the program, there are two phases. The first phase is an exploratory phase. In this phase, we set the criteria to be very loose and try to find as much candidate class averages as possible. This phase typically should have 10 to 20 rounds (set by max_round, default = 20). The candidate class averages are stored in class_averages_candidate_generation_n.hdf.
The second phase is where the actual class averages are generated, it typically have 3~9 iterations (set by match_second, default = 5) of matching. The first half of iterations are 2-way matching, the second half of iterations are 3-way matching, and the last iteration is 4-way matching. In the second phase, three files will be generated:
class_averages_generation_n.hdf : class averages generated in this generation, there are two attributes associated with each class average which are important. One is members, which stores the particle IDs that are assigned to this class average; the other is n_objects, which stores the number of particles that are assigned to this class average.
generation_n_accounted.txt : IDs of accounted particles in this generation
generation_n_unaccounted.txt : IDs of unaccounted particles in this generation
RCT information retrieval
Let us assume we would want to generate a RCT reconstruction using as a basis group number 17 from ISAC generation number 3. We have to do the following steps:
- Retrieve original image numbers in the selected ISAC group. The output is list3_12.txt, which will contain image numbers in the main stack (bdb:test) and thus of the tilted counterparts in the tilted stack.
- sxprocess.py bdb:test3 class_averages_generation_3.hdf list3_12.txt --isacgroup=12 --params=myid
- Extract the identified images from the main stack (into subdirectory RCT, has to be created):
- e2bdb.py bdb:test --makevstack=bdb:RCT/group3_12 --list=list3_12.txt
- Extract the class average from the stack (NOTE the awkward numbering of the output file!).
- e2proc2d.py --split=12 --first=12 --last=12 class_averages_generation3.hdf group3_12.hdf
- Align particles using the corresponding class average from ISAC as a template (please adjust the parameters):
- sxali2d.py bdb:RCT/group3_12 None --ou=28 --xr=3 --ts=1 --maxit=1 --template=group3_12.12.hdf
- Extract the needed alignment parameters. The order is phi,sx,sy,mirror. sx and mirror are used to transfer to tilted images.
- sxheader.py group3_12.12.hdf --params=xform.align2d --export=params_group3_12.txt
Yang, Z., Fang, J., Chittuluru, F., Asturias, F. and Penczek, P. A.: Iterative Stable Alignment and Clustering of 2D Transmission Electron Microscope Images, Structure 20, 237-247, February 8, 2012.
Author / Maintainer
Zhengfan Yang, Jia Fang, Francisco Asturias, and Pawel A. Penczek
- category 1
- works for author, works for others as far as the author knows.
None right now.