Summary of kmeans Usage in OpenCV


1. K-Means clustering in OpenCV

K-Means is an algorithm to detect clusters in a given set of points. It does this without you supervising or correcting the results. It works with any number of dimensions as well (that is, it works on a plane, 3D space, 4D space and any other finite dimensional spaces). And OpenCV comes with this algorithm built right into it!

K-means with OpenCV’s C++ interface

The function you need to call to execute the algorithm is:

double kmeans(const Mat& samples,
              int clusterCount,
              Mat& labels,
              TermCriteria termcrit,
              int attempts,
              int flags,
              Mat* centers)

This function is in the cv namespace. So you can call it as cv::kmeans, or simply bring the cv namespace into scope. If you know how K-means works, the parameters should be self-explanatory.

Parameters

  • samples(input) The actual data points that you need to cluster. It should contain exactly one point per row. That is, if you have 50 points in a 2D plane, then you should have a matrix with 50 rows and 2 columns.
  • clusterCount(input) The number of clusters in the data points.
  • labels(output) Returns the cluster each point belongs to. It can also be used to indicate the initial guess for each point.
  • termcrit(input) This is an iterative algorithm, so you need to specify the termination criteria (maximum number of iterations and/or desired accuracy).
  • attempts(input) The number of times the algorithm is run with different center placements
  • flags(input) Possible values include:
    • KMEANS_RANDOM_CENTERS: Centers are generated randomly
    • KMEANS_PP_CENTERS: Uses the kmeans++ center initialization
    • KMEANS_USE_INITIAL_LABELS: The first attempt uses the supplied labels to calculate centers. Later attempts use random or semi-random centers (use the above two flags for that).
  • centers(output) This matrix holds the center of each cluster.
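
To make the parameter shapes concrete, here is a minimal sketch (lines that would sit inside a main() or similar function) of preparing the inputs for the 50-points-on-a-2D-plane case mentioned above. The variable names and the random fill are illustrative only, not part of the API:

#include "opencv2/core/core.hpp"
using namespace cv;

// 50 points in a 2D plane: one point per row, one column per dimension.
// kmeans expects floating-point data, so CV_32F is used.
Mat samples(50, 2, CV_32F);
randu(samples, Scalar(0), Scalar(100));   // placeholder coordinates for the sketch

int clusterCount = 3;                     // how many clusters to look for
Mat labels;                               // receives one cluster index per point
Mat centers;                              // receives clusterCount rows of cluster centers

// stop after 10 iterations or when the centers move by less than 1.0
TermCriteria termcrit(CV_TERMCRIT_ITER + CV_TERMCRIT_EPS, 10, 1.0);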

Returns

The function returns the compactness of the final clustering. What is compactness? It is a measure of how well the labeling was done: the sum of squared distances from each point to its assigned cluster center. The smaller, the better.

When attempts is 1, the value returned is the compactness of the only run that happened. If attempts is greater than 1, the labeling returned is the one with the smallest compactness.
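
Continuing the sketch above, a call with several attempts could look like the following. Note that newer OpenCV versions take centers as an output array rather than the Mat* shown in the signature quoted earlier, and that is the form used here:

double compactness = kmeans(samples, clusterCount, labels,
                            termcrit,
                            5,                   // attempts: run 5 times with different initial centers
                            KMEANS_PP_CENTERS,   // kmeans++ initialization
                            centers);
// compactness is the sum of squared distances of every point to its assigned
// center, for the best (most compact) of the 5 attempts.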

Source: http://www.aishack.in/2010/08/k-means-clustering-in-opencv/


2. Kmeans clustering in OpenCV with C++


Kmeans clustering is one of the most widely used unsupervised learning algorithms. If you are not sure what Kmeans is, refer to this article. Also, if you have heard of the term Vector Quantization, Kmeans is closely related to it (refer to this article to know more about it). Autonlab has a great PPT on Kmeans clustering.

First, I'll talk about the kmeans usage in OpenCV with C++, and then I'll explain it with a program. If you are not yet comfortable with OpenCV in C++, please refer to this article; pretty much everything else is the same as in the C API (where you use IplImage*, etc.).

The function call in the C++ API of OpenCV accepts the input in the following format:
double kmeans(const Mat& samples, int clusterCount, Mat& labels, TermCriteria termcrit, int attempts, int flags, Mat* centers);

Parameters explained as follows:

  1. samples: It contains the data. Each row represents a feature vector, and each column in a row represents a dimension, so we can have multiple dimensions of data in the feature vector. For example, if we have 50 five-dimensional feature vectors, we will have 50 rows and 5 columns in this matrix. One interesting thing I've noticed is that kmeans doesn't work with the CV_64F type.
  2. clusterCount: It should be specified beforehand. We need to know how many clusters to divide the data into. It is an integer.
  3. labels: It is an output matrix. If we had a matrix of the size specified above (i.e. 50 x 5), we will get a 50 x 1 output matrix. It tells which cluster each feature vector belongs to, with indices running from 0 to (number of clusters - 1).
  4. TermCriteria: It determines the criteria for terminating the algorithm: maximum iterations, desired accuracy, etc.
  5. attempts: The number of attempts made with different initial labellings. Refer to the documentation for more detailed information on this parameter.
  6. flags: It can be
    KMEANS_RANDOM_CENTERS   (for random initialization of cluster centers).
    KMEANS_PP_CENTERS   (for kmeans++ version of initializing cluster centers)
    KMEANS_USE_INITIAL_LABELS   (for user defined initialization).
  7. centers: Matrix holding the center of each cluster. If we divide the 50 x 5 feature vectors into 2 clusters, we will have 2 centers, each in 5 dimensions.
A sample program is explained as follows:

#include "opencv2/highgui/highgui.hpp"#include "opencv2/core/core.hpp"#include <iostream> using namespace cv;using namespace std;  int main( int /*argc*/, char** /*argv*/ ){ cout << "\n Usage in C++ API:\n double kmeans(const Mat& samples, int clusterCount, Mat& labels, TermCriteria termcrit, int attempts, int flags, Mat* centers) \n\n\n" << endl;  Mat points(sampleCount,dimensions, CV_32F,Scalar(10)); Mat labels; Mat centers(clusterCount, 1, points.type());  int clusterCount = 2; int dimensions = 5; int sampleCount = 50;  // values of 1st half of data set is set to 10 //change the values of 2nd half of the data set; i.e. set it to 20  for(int i =24;i<points.rows;i++) {  for(int j=0;j<points.cols;j++)  {   points.at<float>(i,j)=20;  } }   kmeans(points, clusterCount, labels, TermCriteria( CV_TERMCRIT_EPS+CV_TERMCRIT_ITER, 10, 1.0), 3, KMEANS_PP_CENTERS, centers);    // we can print the matrix directly. cout<<"Data: \n"<<points<<endl; cout<<"Center: \n"<<centers<<endl; cout<<"Labels: \n"<<labels<<endl; return 0;}

Source: http://www.developerstation.org/2012/01/kmeans-clustering-in-opencv-with-c.html


3. kmeans


Finds centers of clusters and groups input samples around the clusters.

C++: double kmeans(InputArray data, int K, InputOutputArray bestLabels, TermCriteria criteria, int attempts, int flags, OutputArray centers=noArray())
Python: cv2.kmeans(data, K, criteria, attempts, flags[, bestLabels[, centers]]) → retval, bestLabels, centers
C: int cvKMeans2(const CvArr* samples, int cluster_count, CvArr* labels, CvTermCriteria termcrit, int attempts=1, CvRNG* rng=0, int flags=0, CvArr* _centers=0, double* compactness=0 )
Python: cv.KMeans2(samples, nclusters, labels, termcrit, attempts=1, flags=0, centers=None) → float
Parameters:
  • samples – Floating-point matrix of input samples, one row per sample.
  • data – Data for clustering.
  • cluster_count – Number of clusters to split the set by.
  • K – Number of clusters to split the set by.
  • labels – Input/output integer array that stores the cluster indices for every sample.
  • criteria – The algorithm termination criteria, that is, the maximum number of iterations and/or the desired accuracy. The accuracy is specified as criteria.epsilon. As soon as each of the cluster centers moves by less than criteria.epsilon on some iteration, the algorithm stops.
  • termcrit – The algorithm termination criteria, that is, the maximum number of iterations and/or the desired accuracy.
  • attempts – Flag to specify the number of times the algorithm is executed using different initial labellings. The algorithm returns the labels that yield the best compactness (see the last function parameter).
  • rng – CvRNG state initialized by RNG().
  • flags –

    Flag that can take the following values:

    • KMEANS_RANDOM_CENTERS Select random initial centers in each attempt.
    • KMEANS_PP_CENTERS Use kmeans++ center initialization by Arthur and Vassilvitskii [Arthur2007].
    • KMEANS_USE_INITIAL_LABELS During the first (and possibly the only) attempt, use the user-supplied labels instead of computing them from the initial centers. For the second and further attempts, use the random or semi-random centers. Use one of the KMEANS_*_CENTERS flags to specify the exact method.
  • centers – Output matrix of the cluster centers, one row per each cluster center.
  • _centers – Output matrix of the cluster centers, one row per each cluster center.
  • compactness – The returned value that is described below.

The function kmeans implements a k-means algorithm that finds the centers of cluster_count clusters and groups the input samples around the clusters. As an output, labels_i contains a 0-based cluster index for the sample stored in the i-th row of the samples matrix.

The function returns the compactness measure that is computed as

\sum_i \| \text{samples}_i - \text{centers}_{\text{labels}_i} \|^2

after every attempt. The best (minimum) value is chosen and the corresponding labels and the compactness value are returned by the function. Basically, you can use only the core of the function, set the number of attempts to 1, initialize labels each time using a custom algorithm, pass them with the (flags = KMEANS_USE_INITIAL_LABELS) flag, and then choose the best (most-compact) clustering.
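
As a sketch of the usage pattern described above (custom initial labels, a single attempt), a call could look like the following. The data, the sizes, and the round-robin initial labeling are placeholders invented for the example:

#include "opencv2/core/core.hpp"
#include <iostream>
using namespace cv;

int main()
{
    const int N = 100, dims = 2, K = 4;            // illustrative sizes
    Mat data(N, dims, CV_32F);
    randu(data, Scalar(0), Scalar(255));           // placeholder samples

    // custom initial guess: round-robin cluster assignment
    Mat labels(N, 1, CV_32S);
    for (int i = 0; i < N; i++)
        labels.at<int>(i) = i % K;

    Mat centers;
    double compactness = kmeans(data, K, labels,
                                TermCriteria(CV_TERMCRIT_EPS + CV_TERMCRIT_ITER, 10, 1.0),
                                1,                          // a single attempt
                                KMEANS_USE_INITIAL_LABELS,  // start from the supplied labels
                                centers);

    std::cout << "compactness: " << compactness << std::endl;
    return 0;
}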

Note

  • An example on K-means clustering can be found at opencv_source_code/samples/cpp/kmeans.cpp
  • (Python) An example on K-means clustering can be found at opencv_source_code/samples/python2/kmeans.py

partition

Splits an element set into equivalency classes.

C++: template<typename _Tp, class _EqPredicate> int partition(const vector<_Tp>& vec, vector<int>& labels, _EqPredicate predicate=_EqPredicate())
Parameters:
  • vec – Set of elements stored as a vector.
  • labels – Output vector of labels. It contains as many elements as vec. Each label labels[i] is a 0-based cluster index of vec[i] .
  • predicate – Equivalence predicate (pointer to a boolean function of two arguments or an instance of the class that has the method bool operator()(const _Tp& a, const _Tp& b) ). The predicate returns true when the elements are certainly in the same class, and returns false if they may or may not be in the same class.

The generic function partition implements an O(N^2) algorithm for splitting a set of N elements into one or more equivalency classes, as described in http://en.wikipedia.org/wiki/Disjoint-set_data_structure . The function returns the number of equivalency classes.
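
For illustration, here is a minimal sketch of partition that groups 2D points whose pairwise distance is below a chosen threshold; the points, the threshold, and the lambda predicate (which requires a C++11 compiler) are made up for the example:

#include "opencv2/core/core.hpp"   // in newer OpenCV, cv::partition is declared in opencv2/core/utility.hpp
#include <iostream>
#include <vector>
using namespace cv;

int main()
{
    std::vector<Point2f> pts;
    pts.push_back(Point2f(0, 0));
    pts.push_back(Point2f(1, 1));       // close to the first point -> same class
    pts.push_back(Point2f(100, 100));   // far away -> its own class

    std::vector<int> labels;
    // two points are "equivalent" if they are closer than 5 pixels
    int nClasses = partition(pts, labels,
                             [](const Point2f& a, const Point2f& b) {
                                 float dx = a.x - b.x, dy = a.y - b.y;
                                 return dx * dx + dy * dy < 5 * 5;
                             });

    std::cout << nClasses << " classes" << std::endl;   // expected: 2
    for (size_t i = 0; i < labels.size(); i++)
        std::cout << "point " << i << " -> class " << labels[i] << std::endl;
    return 0;
}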

[Arthur2007] Arthur, D. and Vassilvitskii, S. k-means++: The Advantages of Careful Seeding. Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 2007.

Source: http://docs.opencv.org/modules/core/doc/clustering.html

