Name

sxk_means_stable - Collect stable class averages with several independent runs of k-means

Usage

Usage in command line:

sxk_means_stable.py stack outdir <maskfile> --K=2 --nb_part=5 --th_nobj=10 --rand_seed=10 --maxit=1000 --normalize --CTF --MPI

Usage in python programming:

Normal version:

k_means_stab_stream(stack, outdir, maskfile, K, nb_part, th_nobj, rand_seed, CTF)

MPI version:

k_means_stab_MPI_stream(stack, outdir, maskfile, K, nb_part, th_nobj, rand_seed, CTF)

Note: when the 2D input images have been aligned (see sxali2d), the program applies the 2D alignment parameters (xform.align2d) stored in the headers prior to clustering.

MPI Note: MPI version is under development.

Examples:

sxk_means_stable.py data.hdf kmeans_stab mask2d_26.hdf --K=8 --nb_part=5

mpirun -np 5 sxk_means_stable.py 'bdb:data' kmeans_stab mask2d_26.hdf --K=8 --th_nobj=5 --MPI

Input

stack
the input stack of images (bdb, hdf or txt)
outdir
name of the directory where the results are written
maskfile
optional mask file (bdb or hdf)
Parameters preceded with -- are optional, and their default values are given in parentheses.

K
the requested number of clusters (default 2)
nb_part
number of partitions used to select stable averages (default 5). In the MPI version, nb_part is determined internally by the number of CPUs used.
th_nobj
minimum number of objects per class average required in the stable partition. Classes containing fewer than th_nobj images are not transferred to the final partition (default 1, meaning all averages are kept)
rand_seed
the seed used to generate random numbers (default 0)
CTF
if set, the CTF information stored in the file headers is used (default: no CTF)
MPI
use the MPI version of k_means_stable
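The effect of th_nobj on a stable partition can be illustrated with the minimal sketch below. It assumes that a partition is represented as a list of groups, each group being a list of image IDs; this representation and the helper name filter_stable_groups are hypothetical and are not part of the sxk_means_stable API.

```python
from typing import List


def filter_stable_groups(groups: List[List[int]], th_nobj: int) -> List[List[int]]:
    """Keep only groups holding at least th_nobj images (hypothetical helper).

    Groups with fewer than th_nobj images are dropped, mirroring the rule that
    classes with fewer than th_nobj images are not transferred to the final
    partition. With th_nobj=1 every non-empty group is kept.
    """
    return [g for g in groups if len(g) >= th_nobj]


if __name__ == "__main__":
    stable = [[0, 3, 7, 9], [1, 4], [2, 5, 6]]
    print(filter_stable_groups(stable, th_nobj=3))  # [[0, 3, 7, 9], [2, 5, 6]]
```

Raising th_nobj trades coverage (fewer images end up in the final averages) for averages built from more images.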

Output

outdir/main_log.txt
the main log file; all steps are written to it so that the progress of the program can be followed
outdir/averages.hdf
the final stable class averages
outdir/averages_**.hdf
intermediate class averages produced by the independent clustering runs. '**' is the partition number, formatted as '00'; for example, if the number of partitions is 5, the directory outdir/ will contain averages_00.hdf through averages_04.hdf
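The naming scheme of the output files, with a two-digit, zero-padded partition index, can be made concrete with the sketch below. It assumes only the file names listed above; the helper name output_files is hypothetical.

```python
import os
from typing import List


def output_files(outdir: str, nb_part: int) -> List[str]:
    """List the files described above for a run with nb_part partitions (hypothetical helper)."""
    files = [os.path.join(outdir, "main_log.txt"), os.path.join(outdir, "averages.hdf")]
    # Intermediate averages: one file per partition, index formatted as '00', '01', ...
    files += [os.path.join(outdir, "averages_%02d.hdf" % i) for i in range(nb_part)]
    return files


if __name__ == "__main__":
    for name in output_files("kmeans_stab", 5):
        print(name)
```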

Txt case

This function can also take a text file as input. The file must contain one 'image' per line, with the values of each image separated by spaces, e.g.:

0.34 5.46 2.34 6.78
3.78 2.23 1.78 5.67

This file contains 2 'images' of 4 'pixels' each. Since the data are converted to an image structure, you can still use an hdf file to mask it. However, the output directory will contain only text files with the membership of each group for all partitions:

outdir/kmeans_grp_***.txt
the final stable membership: for each group *** (formatted as '000'), the file stores the list of IDs of the images in that group. If K is equal to 3, there will be three files: k_means_grp_000.txt, k_means_grp_001.txt and k_means_grp_002.txt
outdir/k_means_part_**_grp_***.txt
intermediate membership produced by the independent clustering runs. For each partition ** (formatted as '00'), a file is produced for every group, containing the list of IDs of its images.
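The text input format and the per-group membership files can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the program's own I/O code: the helper names read_txt_images and write_partition_groups are hypothetical, each membership file is assumed to hold one image ID per line, and the file name pattern is taken from the k_means_part_**_grp_***.txt entry above.

```python
import os
import tempfile
from typing import Dict, List


def read_txt_images(path: str) -> List[List[float]]:
    """Read one 'image' per line, values separated by whitespace (hypothetical helper)."""
    images = []
    with open(path) as f:
        for line in f:
            if line.strip():  # skip blank lines
                images.append([float(v) for v in line.split()])
    return images


def write_partition_groups(outdir: str, part: int, assign: List[int]) -> List[str]:
    """Write one file per group listing the IDs of its images (hypothetical helper).

    assign[i] is the group index of image i. One ID per line is an assumption.
    """
    groups: Dict[int, List[int]] = {}
    for img_id, grp in enumerate(assign):
        groups.setdefault(grp, []).append(img_id)
    paths = []
    for grp in sorted(groups):
        path = os.path.join(outdir, "k_means_part_%02d_grp_%03d.txt" % (part, grp))
        with open(path, "w") as f:
            f.write("\n".join(str(i) for i in groups[grp]) + "\n")
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "data.txt")
        with open(src, "w") as f:
            f.write("0.34 5.46 2.34 6.78\n3.78 2.23 1.78 5.67\n")
        images = read_txt_images(src)
        print(len(images), "images of", len(images[0]), "pixels")  # 2 images of 4 pixels
        print([os.path.basename(p) for p in write_partition_groups(tmp, 0, [1, 0])])
```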

Description

k_means_stable uses the function k_means to repeat independent clusterings of the data set. For complete details of the k-means parameters (K, optimization method, ...) see the function page sxk_means. The random seed of each clustering is written to the main_log.txt file. After nb_part clusterings have been run, all partitions are matched together to compare their memberships. If there are two partitions, the matching is done with the optimal Hungarian algorithm; if there are more than two, an in-house branching algorithm is used. Images that fall into the same cluster in every partition are kept to create the stable averages. A stability coefficient, in percent, is written to main_log.txt; its value reflects the similarity between the memberships. If the number of images in a stable group is below th_nobj, those images are considered unused, and the corresponding average is removed from the final stable class averages. th_nobj thus allows the removal of uninformative averages that contain too few images. The number of final averages is also written to main_log.txt.
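For two partitions, the matching and the selection of stable images can be sketched as below. This is a minimal sketch under several assumptions: each partition is a list of group labels, one per image; the optimal one-to-one matching is found by exhaustive search over label permutations, which gives an optimal matching like the Hungarian algorithm but is only practical for small K; the stability coefficient is taken here as the percentage of images that land in matched groups, which may differ from the definition used by the program; and the helper names match_two_partitions and stable_groups are hypothetical. The in-house branching algorithm for more than two partitions is not reproduced.

```python
from itertools import permutations
from typing import List, Tuple


def match_two_partitions(a: List[int], b: List[int], k: int) -> Tuple[int, ...]:
    """Return perm such that group g of partition a is matched to group perm[g] of b.

    Maximizes the total number of shared images by exhaustive search over all
    k! one-to-one matchings (a brute-force stand-in for the Hungarian algorithm).
    """
    # overlap[g][h] = number of images in group g of a and in group h of b
    overlap = [[0] * k for _ in range(k)]
    for ga, gb in zip(a, b):
        overlap[ga][gb] += 1
    return max(permutations(range(k)),
               key=lambda perm: sum(overlap[g][perm[g]] for g in range(k)))


def stable_groups(a: List[int], b: List[int], k: int, th_nobj: int = 1):
    """Keep images whose groups are matched in both partitions, then apply th_nobj."""
    perm = match_two_partitions(a, b, k)
    groups = [[i for i, (ga, gb) in enumerate(zip(a, b)) if ga == g and gb == perm[g]]
              for g in range(k)]
    n_stable = sum(len(g) for g in groups)
    stability = 100.0 * n_stable / len(a)  # assumed definition of the coefficient
    kept = [g for g in groups if len(g) >= th_nobj]
    return kept, stability


if __name__ == "__main__":
    a = [0, 0, 0, 1, 1, 1, 2, 2]
    b = [2, 2, 0, 1, 1, 1, 0, 0]
    kept, stab = stable_groups(a, b, k=3, th_nobj=1)
    print(kept, stab)  # [[0, 1], [3, 4, 5], [6, 7]] 87.5
```

Note that group labels are arbitrary in each run (group 0 of one run may be group 2 of another), which is why the partitions must be matched before memberships can be compared.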

Reference

This program uses the Munkres algorithm (also known as the Hungarian algorithm) when the number of partitions is two: http://www.clapper.org/bmc, BSD-like license, copyright (c) 2008 Brian M. Clapper. Otherwise, the in-house branching algorithm is used.

Author / Maintainer

Julien Bert

Keywords

Files

applications.py

See also

sxk_means, sxk_means_group

Maturity

beta
works for the author, often works for others.

Bugs

HDF file limitation on the number of attributes; see the bugs listed for sxk_means.

sxk_means_stable (last edited 2015-05-28 19:55:17 by penczek)