sxk_means_stable - Collect stable class averages with several independent runs of k-means


Usage in command line: stack outdir <maskfile> --K=2 --nb_part=5 --th_nobj=10 --rand_seed=10 --maxit=1000 --normalize --CTF --MPI

Usage in python programming:

Normal version:

k_means_stab_stream(stack, outdir, maskfile, K, nb_part, th_nobj, rand_seed, CTF)

MPI version:

k_means_stab_MPI_stream(stack, outdir, maskfile, K, nb_part, th_nobj, rand_seed, CTF)

Note: when the 2D input images have been aligned (see sxali2d), the program applies the 2D alignment parameters (xform.align2d) stored in the headers prior to clustering.

MPI Note: MPI version is under development.

Examples: data.hdf kmeans_stab mask2d_26.hdf --K=8 --nb_part=5

mpirun -np 5 'bdb:data' kmeans_stab mask2d_26.hdf --K=8 --th_nobj=5 --MPI


  • stack
    the input stack of images (bdb, hdf or txt)
  • outdir
    name of the directory where the results are written
  • maskfile
    optional mask file to be used (bdb or hdf)
  • Parameters preceded by -- are optional; their default values are given in parentheses.

  • K
    the requested number of clusters (default 2)
  • nb_part
    number of partitions used to select stable averages (default 5). In the MPI version, nb_part is determined internally by the number of CPUs used.
  • th_nobj
    minimum number of objects per class average required in the stable partition. Classes with fewer than th_nobj images are not transferred to the final partition (default 1, meaning keep all averages)
  • rand_seed
    the seed used to generate random numbers (set to 0)
  • CTF
    if set, CTF information stored in file headers will be used (default: no CTF)
  • MPI
    use the MPI version of k_means_stable
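
The effect of th_nobj can be illustrated with a minimal Python sketch. It is not the program's own code: the function name drop_small_classes and the list-of-lists representation of the classes are assumptions made for the illustration only.

```python
def drop_small_classes(classes, th_nobj=1):
    """Keep only the classes that hold at least th_nobj image ids.

    `classes` is a hypothetical representation: a list with one list of
    image ids per class. It is not the data structure used by the program.
    """
    return [members for members in classes if len(members) >= th_nobj]


# Three stable classes holding 4, 2 and 5 images.
stable_classes = [[0, 3, 7, 9], [1, 4], [2, 5, 6, 8, 10]]

# With th_nobj=3 the class with only 2 images is dropped.
kept = drop_small_classes(stable_classes, th_nobj=3)
print(len(kept), "classes kept out of", len(stable_classes))
```

With th_nobj=1 every non-empty class passes the test, which matches the "keep all averages" behaviour of the default.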


    the main logfile (mainlog.txt); all steps are written to it so that the progress of the program can be followed
    the final stable class averages
    intermediate class averages produced by the independent clustering runs, written as averages_**.hdf, where '**' is the partition number in the format '00'. For example, if the number of partitions is 5, the directory outdir/ will contain averages_00.hdf through averages_04.hdf
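
As a small illustration of this naming scheme, the following Python sketch lists the intermediate file names expected for a given number of partitions. The helper partition_average_names is hypothetical and only reproduces the averages_00.hdf ... averages_04.hdf pattern quoted above.

```python
def partition_average_names(nb_part, outdir="outdir"):
    """Return the expected intermediate average file names.

    Hypothetical helper: it only formats names following the
    'averages_' + two-digit partition number + '.hdf' pattern.
    """
    return ["%s/averages_%02d.hdf" % (outdir, i) for i in range(nb_part)]


names = partition_average_names(5)
for name in names:
    print(name)
```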

    Txt case

    This function is able to use a text file as input. The file must contain one 'image' per line, with the values of the image separated by spaces, e.g.:

    0.34 5.46 2.34 6.78

    3.78 2.23 1.78 5.67

    This file contains 2 'images' of 4 'pixels'. Because the data are converted to an image structure, an hdf file can still be used to mask them. The output directory, however, will contain only text files with the membership of each group for all partitions:
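
The input layout described above (one 'image' per line, values separated by spaces) can be checked with a short Python sketch before running the program. The function read_text_stack is an assumption made for this illustration and is not part of the program. Skipping blank lines and requiring equal line lengths are also choices of the sketch.

```python
def read_text_stack(lines):
    """Parse lines of space-separated numbers into a list of 'images'.

    Hypothetical helper: blank lines are skipped, and every remaining
    line must hold the same number of values ('pixels').
    """
    images = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore empty lines between images
        images.append([float(value) for value in fields])
    if len(set(len(image) for image in images)) > 1:
        raise ValueError("all lines must contain the same number of values")
    return images


sample = """0.34 5.46 2.34 6.78

3.78 2.23 1.78 5.67
"""
images = read_text_stack(sample.splitlines())
print(len(images), "images of", len(images[0]), "pixels")
```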

    the final stable membership: for each group *** (format '000'), the file stores the list of ids of the images in the group. If K is equal to 3, there will be three files: k_means_grp_000.txt, k_means_grp_001.txt and k_means_grp_002.txt
    intermediate membership produced by the independent clustering runs: for each partition ** (format '00'), one file is produced per group, containing the list of ids of the images in that group.


    K_means_stable uses the function k_means to run repeated independent clusterings of the data set. For complete details of the k-means parameters (K, option method, ...) see the page for sxk_means. The random seed of each clustering is written to mainlog.txt. After nb_part clusterings, all partitions are matched with each other to compare their memberships. If there are two partitions, the matching uses the optimal Hungarian algorithm; if there are more than two, it uses an in-house branching algorithm. Images that fall in the same cluster in every partition are kept to create the stable averages. A coefficient of stability, in percent, is written to mainlog.txt; its value reflects the similarity between the memberships. If the number of images in a stable group is below th_nobj, those images are considered not used and the corresponding average is removed from the final stable class averages; th_nobj thus allows removing averages that contain too few images. The number of final averages is also written to mainlog.txt.
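
The two-partition case can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the program's implementation: the matching is found by a brute-force search over all label permutations (practical only for small K) rather than by the Hungarian algorithm, the stability coefficient is assumed to be the percentage of images whose matched labels agree, and the names match_two_partitions and stable_groups are hypothetical.

```python
from itertools import permutations


def match_two_partitions(part_a, part_b, K):
    """Find the relabeling of B's classes that best agrees with A.

    part_a and part_b hold one class label (0..K-1) per image.
    Brute force over permutations stands in for the Hungarian algorithm.
    """
    # overlap[i][j] = number of images in class i of A and class j of B
    overlap = [[0] * K for _ in range(K)]
    for a, b in zip(part_a, part_b):
        overlap[a][b] += 1
    best_perm, best_score = None, -1
    for perm in permutations(range(K)):
        score = sum(overlap[i][perm[i]] for i in range(K))
        if score > best_score:
            best_perm, best_score = perm, score
    return best_perm, best_score


def stable_groups(part_a, part_b, K, th_nobj=1):
    """Return the stable groups and an assumed stability percentage."""
    perm, agree = match_two_partitions(part_a, part_b, K)
    groups = [[] for _ in range(K)]
    for img_id, (a, b) in enumerate(zip(part_a, part_b)):
        if perm[a] == b:  # image landed in matched classes both times
            groups[a].append(img_id)
    stability = 100.0 * agree / len(part_a)  # assumed definition
    # Drop groups that hold fewer than th_nobj images.
    groups = [g for g in groups if len(g) >= th_nobj]
    return groups, stability


part_a = [0, 0, 0, 1, 1, 1, 2, 2]
part_b = [2, 2, 1, 0, 0, 0, 1, 1]
groups, stability = stable_groups(part_a, part_b, K=3, th_nobj=2)
print(groups, stability)
```

In this toy example image 2 changes cluster between the two runs, so it is left out of the stable groups while the other seven images are kept.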


    This program uses an implementation of the Munkres (Hungarian) algorithm, BSD-like licence, copyright (c) 2008 Brian M. Clapper, when the number of partitions is two. Otherwise the in-house branching algorithm is used.

    Author / Maintainer

    Julien Bert



    See also

    sxk_means, sxk_means_group


    works for author, often works for others.


    HDF files limit the number of attributes; see the bugs listed for sxk_means.

    sxk_means_stable (last edited 2015-05-28 19:55:17 by penczek)