sxrviper - Initial 3D Model - RVIPER: Reproducible ab initio 3D structure determination. The program is designed to determine a validated initial intermediate-resolution structure from a small set (roughly fewer than 100) of class averages produced by ISAC.
Usage in command line
sxrviper.py stack output_directory --ir=inner_radius --radius=outer_radius --rs=ring_step --xr=x_range --yr=y_range --ts=translational_search_step --delta=angular_step --center=center_type --maxit1=max_iter1 --maxit2=max_iter2 --L2threshold=0.1 --doga=doga --ref_a=S --sym=c1 --n_shc_runs=n_shc_runs --n_rv_runs=n_rv_runs --n_v_runs=n_v_runs --outlier_percentile=outlier_percentile --iteration_start=iteration_start --npad=npad --fl=fl --aa=aa --pwreference=pwreference --mask3D=mask3D --moon_elimination --criterion_name --outlier_index_threshold_method --angle_threshold=angle_threshold
sxrviper exists only in an MPI version.
mpirun --npernode 16 -np 48 --host node1,node2,node3 sxrviper.py stack output_directory --radius=outer_radius --outlier_percentile=95 --fl=0.25 --xr=2 --moon_elimination=750,4.84
The RVIPER program needs an MPI environment to work properly. The number of MPI processes used MUST BE a multiple of --n_shc_runs (default = 3).
Since RVIPER makes use of groups of processors working together, it is important for time efficiency that the processors within a group are allocated on the same node. This way, data exchange within the group generates no network traffic. The "--npernode" option of mpirun is useful in accomplishing this goal. As in the mpirun example above, when "--npernode" is used, MPI allocates the ranks of the processors sequentially, not moving to the next node until the current one is filled. If "--npernode" is not used, processors are allocated in a round-robin fashion (i.e. jumping to the next node with each allocation). Since in VIPER groups contain consecutively ranked processors, it is important to provide "--npernode XX", where XX is the number of processors per node.
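The effect of the two allocation orders described above can be illustrated with a small Python model. This is only a sketch: it does not call MPI, the function names are hypothetical, and it assumes that the number of processor groups equals --n_shc_runs, which the text does not state explicitly. The numbers (48 processes, 3 nodes, 16 per node) are taken from the example command.

```python
def ranks_to_nodes(n_procs, n_nodes, per_node=None):
    """Return the node index assigned to each rank.

    per_node given: nodes are filled one after another (the behaviour
    described for "--npernode"). per_node omitted: round-robin allocation.
    """
    if per_node is not None:
        if n_procs > per_node * n_nodes:
            raise ValueError("not enough slots for all processes")
        return [rank // per_node for rank in range(n_procs)]
    return [rank % n_nodes for rank in range(n_procs)]


def consecutive_groups(n_procs, n_groups):
    """Split ranks 0..n_procs-1 into groups of consecutive ranks."""
    if n_procs % n_groups != 0:
        raise ValueError("number of MPI processes must be a multiple of the group count")
    size = n_procs // n_groups
    return [list(range(g * size, (g + 1) * size)) for g in range(n_groups)]


def nodes_spanned(groups, rank_to_node):
    """Count how many distinct nodes each group is spread over."""
    return [len({rank_to_node[r] for r in group}) for group in groups]


# Assumption: 3 groups, matching the default --n_shc_runs=3.
groups = consecutive_groups(48, 3)
filled = nodes_spanned(groups, ranks_to_nodes(48, 3, per_node=16))
round_robin = nodes_spanned(groups, ranks_to_nodes(48, 3))
print(filled)       # [1, 1, 1]
print(round_robin)  # [3, 3, 3]
```

In this model, sequential filling keeps every group on a single node, while round-robin allocation spreads each group over all three nodes.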
Time and Memory
On our cluster, it takes about 6 hours to process 400 88x88 particles on 64 processors. Memory needs are about 0.5GB per processor.
- Input images stack: The images must be square. (default required string)
- Particle radius [Pixels]: has to be less than half the box size. (default required int)
- Point-group symmetry: (default c1)
- GA population size: This defines the number of quasi-independent volumes generated (same as the '--nruns' parameter of sxviper.py). (default 4)
- Output directory: The directory will be automatically created and the results will be written here. If the directory already exists, results will be written there, possibly overwriting previous runs. (default required string)
The directory structure generated by sxrviper is shown in the figure below. Each "runXXX" directory contains the output of one run of the VIPER algorithm (please see sxviper). After stage 1, the reconstructed volume is written to refvolf2.hdf and the parameters to refparams2.txt. After stage 2, the final volume and parameters are written to volf.hdf and params.txt. Other output files are log.txt and previousmax.txt. Each "mainXXX" directory contains the output of "n_v_runs" VIPER runs (default 4). The number of "mainXXX" directories is given by "n_rv_runs".
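As an illustration of this layout, the following Python sketch gathers the final volumes of every VIPER run, grouped by "mainXXX" directory. It assumes that the "runXXX" directories sit directly inside the "mainXXX" directories, as the description suggests. The function name collect_final_volumes is hypothetical and not part of sxrviper.

```python
from pathlib import Path


def collect_final_volumes(output_directory):
    """Map each main* directory name to the volf.hdf files of its run* directories.

    Assumes the layout <output_directory>/mainXXX/runXXX/volf.hdf.
    """
    root = Path(output_directory)
    found = {}
    for main_dir in sorted(root.glob("main*")):
        if not main_dir.is_dir():
            continue
        # One final volume is expected per run directory once stage 2 is done.
        volumes = sorted(p for p in main_dir.glob("run*/volf.hdf") if p.is_file())
        found[main_dir.name] = volumes
    return found
```

Calling collect_final_volumes("output_directory") on a finished job would list, for each RVIPER iteration, the runs that reached the end of stage 2. Runs without volf.hdf are simply absent from the result.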
- This program uses multiple VIPER runs to find unstable projections. Based on a user-chosen criterion, it eliminates the unstable projections and reruns until all projections are stable. Since the VIPER program is used as a building block, all requirements of VIPER must be satisfied. The attributes xform.projection have to be set in the header of each file. If their values are not known, all should be set to zero.
- Whether the "n_v_runs" reconstructed volumes in the current RVIPER iteration have a core set of stable projections is determined using one of the criteria shown in the figures below. The y axis represents the error angle. For example, if a projection has the assigned angles 30, 45, and 55 in three different reconstructed volumes, then the error associated with this image is (abs(30-45) + abs(30-55) + abs(45-55))/3 = 16.7 (rounded). The x axis represents the image index in the sorted array of error angles.
- The first criterion, called "80th percentile" (left image), is satisfied when the 80th percentile is less than or equal to 20% of the maximum.
- The second criterion, called "fastest increase in the last quartile" is satisfied when the last quartile has a length greater than 20% of the maximum.
- If the finishing criterion is not met after executing 10 VIPER runs (that is, the criterion fails for all combinations of the 10 runs taken "n_v_runs" (default=3) at a time, 120 in total), then the program stops.
- Once a criterion is met, a decision is made regarding which images to keep. Currently there are three options implemented:
- percentile (all images (sorted by their angle error), below "outlier_percentile" given in the command line are kept for the next iteration)
- angle_measure (all images that have angle error below "angle_threshold" given in the command line are kept for the next iteration)
- discontinuity_in_derivative (as shown in the figure below, two lines (green and red) are fitted together against the error curve (blue) while their common point moves along the x axis between the 80th percentile and "outlier_percentile" (provided in the command line). The point on the x axis where the projections of the best-fit lines meet is chosen as the outlier index threshold. All images before it are kept for the next iteration.)
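The arithmetic in the list above can be sketched in Python. This is an illustration under stated assumptions, not the code used by sxrviper. All function names are hypothetical. The error is averaged over the number of volume pairs, which equals 3 for three volumes as in the example. Angle wrap-around at 360 degrees is ignored. The rounding of the percentile cutoff is a guess. Only the "percentile" and "angle_measure" options are shown.

```python
from itertools import combinations


def angle_error(angles):
    """Mean absolute difference over all pairs of assigned angles."""
    pairs = list(combinations(angles, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def run_subsets(n_runs, n_v_runs):
    """All subsets of n_v_runs runs that can be tested against a criterion."""
    return list(combinations(range(n_runs), n_v_runs))


def keep_by_percentile(errors, outlier_percentile):
    """Indices of the images whose error lies below the given percentile."""
    order = sorted(range(len(errors)), key=lambda i: errors[i])
    n_keep = int(len(errors) * outlier_percentile / 100.0)  # assumed rounding
    return sorted(order[:n_keep])


def keep_by_angle(errors, angle_threshold):
    """Indices of the images whose error is below angle_threshold."""
    return [i for i, e in enumerate(errors) if e < angle_threshold]


print(round(angle_error([30, 45, 55]), 2))  # 16.67
print(len(run_subsets(10, 3)))              # 120
```

With ten images and outlier_percentile=80, keep_by_percentile retains the eight images with the smallest errors; keep_by_angle instead applies a fixed cutoff in degrees regardless of how many images pass.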
Example of RVIPER output
In the example below, RVIPER found in the third iteration (main003) a set of 3 reconstructed volumes whose projections show stable angle assignment. Based on these three reconstructed volumes, the program generates "variance_volume.hdf" and "average_volume.hdf", which can be used as an initial reference.
Penczek 1994, "The ribosome at improved resolution: new techniques for merging and orientation refinement in 3D cryo-electron microscopy of biological particles", Ultramicroscopy 53, 251-270.
Author / Maintainer
Horatiu Voicu, Pawel A. Penczek
- category 1
- category 3
- works for author, often works for others.
Did not discover any yet.