The RANSAC Algorithm and a Code Walkthrough

Source: Internet. Editor: 程序博客网. Time: 2024/05/16 15:08

Introduction to the RANSAC Algorithm

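Random sample consensus (RANSAC) is an iterative algorithm for estimating the parameters of a mathematical model from a data set contaminated by noise. It serves the same purpose as the least squares method, but the two suit different situations. RANSAC assumes that the data set contains both inliers and outliers, and that only the points judged to be inliers should be used to compute the model; outliers should have no influence on the estimated parameters. The algorithm was published in 1981 by Fischler and Bolles, then at SRI International, who used it to solve the Location Determination Problem (LDP): determining the position from which an image was taken, given landmarks with known locations. Today RANSAC is widely used in computer vision for problems such as image matching and panorama stitching, for example to compute the projective transformation (homography) matrix between two images from a set of matched feature point pairs. OpenCV uses the algorithm in the implementation of its stitching class.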
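RANSAC differs from the least squares method in three main ways:
1. Least squares always uses all of the data points to estimate the parameters, whereas RANSAC uses only the inliers.
2. Least squares is deterministic: given a data set, it yields the same model parameters every time. RANSAC is a randomized algorithm, so depending on the iteration count and other factors, the parameters generally differ from run to run.
3. In general, RANSAC first separates inliers from outliers according to some criterion and then fits a model to the inliers. The fitting method can be least squares or another optimization algorithm, so from this angle RANSAC is an extension of least squares.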
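The algorithm proceeds as follows:

1. Randomly select a subset of the data set and hypothesize that its points are inliers (the subset must be large enough to solve for all of the model parameters), then compute a set of model parameters from it.

2. Test all of the other data points against the resulting model. If a point's error is within the chosen error threshold, classify it as an inlier, otherwise as an outlier. Keep only the model with the most inliers so far and record it as the best model.

3. Repeat steps 1 and 2 enough times (that is, until the preset iteration count is reached), then use the inliers of the best model to compute the final model parameters. This step can use least squares or another optimization algorithm.

4. Finally, the model can be evaluated by estimating the error of the inliers relative to the model.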
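The steps above can be sketched as a randomized line fit. This is a minimal illustration under stated assumptions, not the implementation analyzed below: the types `Pt`, `Line`, `RansacResult` and the functions `lineThrough`, `pointLineDistance`, `refit`, `ransacLine` are all hypothetical names, the line is stored as a*x + b*y + c = 0 with a unit normal, and the final refit uses orthogonal least squares.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

struct Pt { double x, y; };
struct Line { double a, b, c; };  // a*x + b*y + c = 0, with a*a + b*b = 1
struct RansacResult { Line line; std::vector<std::size_t> inliers; };

// Exact fit: the line through two distinct points.
Line lineThrough(const Pt& p, const Pt& q) {
    double a = q.y - p.y, b = p.x - q.x;  // normal vector of segment pq
    const double norm = std::hypot(a, b);
    a /= norm;
    b /= norm;
    return {a, b, -(a * p.x + b * p.y)};
}

double pointLineDistance(const Line& l, const Pt& p) {
    return std::fabs(l.a * p.x + l.b * p.y + l.c);
}

// Orthogonal least-squares refit over the selected inlier indices.
Line refit(const std::vector<Pt>& data, const std::vector<std::size_t>& idx) {
    double mx = 0, my = 0;
    for (std::size_t i : idx) { mx += data[i].x; my += data[i].y; }
    mx /= idx.size();
    my /= idx.size();
    double sxx = 0, sxy = 0, syy = 0;
    for (std::size_t i : idx) {
        const double dx = data[i].x - mx, dy = data[i].y - my;
        sxx += dx * dx; sxy += dx * dy; syy += dy * dy;
    }
    const double theta = 0.5 * std::atan2(2.0 * sxy, sxx - syy);  // line direction
    const double a = -std::sin(theta), b = std::cos(theta);       // unit normal
    return {a, b, -(a * mx + b * my)};
}

RansacResult ransacLine(const std::vector<Pt>& data, double threshold,
                        int iterations, unsigned seed) {
    RansacResult result{};
    if (data.size() < 2) return result;  // not enough data for an exact fit
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, data.size() - 1);
    for (int it = 0; it < iterations; ++it) {
        // Step 1: draw a minimal sample (two points) and fit a candidate model.
        const std::size_t i = pick(rng), j = pick(rng);
        if (data[i].x == data[j].x && data[i].y == data[j].y) continue;
        const Line model = lineThrough(data[i], data[j]);
        // Step 2: collect the points that agree with the candidate model.
        std::vector<std::size_t> cur;
        for (std::size_t k = 0; k < data.size(); ++k)
            if (pointLineDistance(model, data[k]) <= threshold) cur.push_back(k);
        if (cur.size() > result.inliers.size()) result.inliers.swap(cur);
    }
    // Step 3: refit using only the inliers of the best model.
    if (result.inliers.size() >= 2) result.line = refit(data, result.inliers);
    return result;
}
```

A caller can cover step 4 by inspecting `result.inliers.size()` relative to `data.size()`, or by summing `pointLineDistance` over the inliers. Passing the seed explicitly makes a run repeatable, which is convenient when comparing thresholds.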

C++ Implementation

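Ziv Yaniv wrote a C++ implementation of RANSAC that uses the simplest case, line fitting, as its example. It depends on no other libraries, but it leans heavily on C++ itself: it uses standard containers such as vector and set, and it traverses the candidate samples recursively. For the full code, see the RANSAC code example.
Part of the code, with comments, follows: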

// Excerpt only; memset and memcpy require <cstring>, std::vector requires <vector>.

// Top-level variable definitions
vector<double> lineParameters;        // stores the model parameters
LineParamEstimator lpEstimator(0.5);  // error threshold set to 0.5
vector<Point2D> pointData;            // the data points
int numForEstimate = 2;               // minimum number of samples for one parameter estimate; 2 for a line fit

// Definition of the top-level function
/**
 * Estimate the model parameters using the maximal consensus set by going over ALL possible
 * subsets (brute force approach).
 * Given: n - data.size()
 *        k - numForEstimate
 * We go over all n choose k subsets       n!
 *                                     ------------
 *                                      (n-k)! * k!
 * @param parameters A vector which will contain the estimated parameters.
 *                   If there is an error in the input then this vector will be empty.
 *                   Errors are: 1. Less data objects than required for an exact fit.
 * @param paramEstimator An object which can estimate the desired parameters using either an exact fit or a
 *                       least squares fit.
 * @param data The input from which the parameters will be estimated.
 * @param numForEstimate The number of data objects required for an exact fit.
 * @return Returns the fraction of the data used in the least squares estimate.
 *
 * NOTE: This method should be used only when n choose k is small (i.e. k or (n-k) are approximately equal to n)
 */
// T is the data type; in this example it is a 2D point, represented by the author's own class Point2D.
// S is the parameter type, here double.
template <class T, class S>
double Ransac<T, S>::compute(std::vector<S> &parameters,
                             ParameterEsitmator<T, S> *paramEstimator,
                             std::vector<T> &data,
                             int numForEstimate)
{
    std::vector<T *> leastSquaresEstimateData;
    int numDataObjects = data.size();  // size of the data set (100 in this example)
    int numVotesForBest = -1;          // inlier count of the best model; -1 so the first model evaluated becomes the best

    parameters.clear();  // the documented contract: empty on error

    // there are fewer data objects than the minimum required for an exact fit;
    // checked before allocating so the early return cannot leak memory
    if (numDataObjects < numForEstimate)
        return 0;

    int *arr = new int[numForEstimate];            // indexes of the samples used for one estimate (2 here)
    short *curVotes = new short[numDataObjects];   // one if data[i] agrees with the current model, otherwise zero
    short *bestVotes = new short[numDataObjects];  // one if data[i] agrees with the best model, otherwise zero

    // computeAllChoices finds the model with the most inliers and stores
    // its inlier flags in bestVotes; that model is taken as the final one
    computeAllChoices(paramEstimator, data, numForEstimate,
        bestVotes, curVotes, numVotesForBest, 0, data.size(), numForEstimate, 0, arr);

    // collect all inliers into leastSquaresEstimateData
    for (int j = 0; j < numDataObjects; j++) {
        if (bestVotes[j])
            leastSquaresEstimateData.push_back(&(data[j]));
    }

    // least squares estimate over all inliers; the result is stored in parameters
    paramEstimator->leastSquaresEstimate(leastSquaresEstimateData, parameters);

    // release the dynamic arrays
    delete[] arr;
    delete[] bestVotes;
    delete[] curVotes;

    // the return value is the ratio of inliers to all data points
    return (double)leastSquaresEstimateData.size() / (double)numDataObjects;
}

// The function that searches for the best model is defined as follows.
// It uses recursion to go over the data set n!/((n-k)!k!) times.
template<class T, class S>
void Ransac<T, S>::computeAllChoices(ParameterEsitmator<T, S> *paramEstimator, std::vector<T> &data, int numForEstimate,
    short *bestVotes, short *curVotes, int &numVotesForBest, int startIndex, int n, int k, int arrIndex, int *arr)
{
    // we have a new choice of indexes
    // each time k has counted down from 2 to 0, two new data points have been
    // chosen and one parameter estimate can be run
    if (k == 0) {
        estimate(paramEstimator, data, numForEstimate, bestVotes, curVotes, numVotesForBest, arr);
        return;
    }

    // continue to recursively generate the choice of indexes
    int endIndex = n - k;
    for (int i = startIndex; i <= endIndex; i++) {
        arr[arrIndex] = i;
        computeAllChoices(paramEstimator, data, numForEstimate, bestVotes, curVotes, numVotesForBest,
            i + 1, n, k - 1, arrIndex + 1, arr);  // recursive call
    }
}

// Runs one parameter estimate and updates the current best model when appropriate.
// The inlier flags of the best model are stored in bestVotes and its inlier count in numVotesForBest.
// arr holds the indexes in data of the two sample points for this round.
template<class T, class S>
void Ransac<T, S>::estimate(ParameterEsitmator<T, S> *paramEstimator, std::vector<T> &data, int numForEstimate,
    short *bestVotes, short *curVotes, int &numVotesForBest, int *arr)
{
    std::vector<T *> exactEstimateData;
    std::vector<S> exactEstimateParameters;
    int numDataObjects;
    int numVotesForCur;  // inlier count of the current model
    int j;

    numDataObjects = data.size();
    memset(curVotes, '\0', numDataObjects * sizeof(short));  // mark every point as an outlier to begin with
    numVotesForCur = 0;

    for (j = 0; j < numForEstimate; j++)
        exactEstimateData.push_back(&(data[arr[j]]));  // take the addresses of the two sample points
    paramEstimator->estimate(exactEstimateData, exactEstimateParameters);  // fit a set of parameters to the two points

    for (j = 0; j < numDataObjects; j++) {
        // decide for each point whether it is an inlier
        if (paramEstimator->agree(exactEstimateParameters, data[j])) {
            curVotes[j] = 1;
            numVotesForCur++;
        }
    }

    // if the current model has more inliers than the best model so far,
    // it replaces the best model and the bookkeeping is updated
    if (numVotesForCur > numVotesForBest) {
        numVotesForBest = numVotesForCur;
        memcpy(bestVotes, curVotes, numDataObjects * sizeof(short));
    }
}
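The recursion in computeAllChoices can be hard to follow while it is entangled with the estimator. Below is a standalone sketch of the same index-generation pattern. The names `forEachChoice` and `countChoices` and the use of `std::function` are assumptions made for this illustration; the original code calls `estimate` directly instead of a callback.

```cpp
#include <functional>
#include <vector>

// Hypothetical standalone version of the recursion: calls visit(arr) once for
// every k-element subset of {0, ..., n-1}, with the indexes in increasing order.
void forEachChoice(int startIndex, int n, int k, int arrIndex, std::vector<int>& arr,
                   const std::function<void(const std::vector<int>&)>& visit) {
    if (k == 0) {  // a complete choice of indexes is ready
        visit(arr);
        return;
    }
    const int endIndex = n - k;  // leave room for the k-1 indexes still to pick
    for (int i = startIndex; i <= endIndex; ++i) {
        arr[arrIndex] = i;
        forEachChoice(i + 1, n, k - 1, arrIndex + 1, arr, visit);
    }
}

// Counts how many subsets the recursion visits, i.e. n choose k.
long countChoices(int n, int k) {
    std::vector<int> arr(k);
    long count = 0;
    forEachChoice(0, n, k, 0, arr, [&count](const std::vector<int>&) { ++count; });
    return count;
}
```

With the 100 data points and 2 points per estimate used in the example above, `countChoices(100, 2)` visits 4950 subsets, so the brute-force version runs 4950 exact fits, each followed by a pass over all 100 points.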

Advantages and Disadvantages of RANSAC

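The algorithm's greatest strength is its robustness: it can produce good model parameters even when the data set contains clearly erroneous data. It does, however, require that there not be too many outliers. I have not run experiments on this myself, but as a general rule the outliers should make up no more than 50% of the data. Improved variants of RANSAC have since appeared, such as the Optimal RANSAC algorithm proposed by Anders Hast in 2013.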
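RANSAC's drawbacks are equally apparent. In the example given in this article, with N data points and K points needed per estimate, going through every case requires N!/((N−K)!K!) parameter estimations, which is very expensive. In practice one usually does not enumerate every case but instead sets a reasonable iteration count based on experience, which clearly cannot guarantee that the final parameters are the best ones. The error threshold also has to be chosen, which adds to the complexity of using the algorithm.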
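Besides experience, a widely used rule for the iteration count is k = log(1 − p) / log(1 − w^n), where p is the desired probability of drawing at least one outlier-free sample, w is the assumed inlier ratio and n is the sample size. The sketch below computes it; the function name `ransacIterations` is hypothetical, and the formula treats the n sample points as independent draws, which is only an approximation.

```cpp
#include <cmath>

// Number of random samples needed so that, with probability p, at least one
// sample consists only of inliers. Assumes 0 < p < 1 and 0 < w < 1.
//   p: desired success probability
//   w: assumed fraction of inliers in the data
//   n: number of points in one minimal sample (2 for a line)
int ransacIterations(double p, double w, int n) {
    const double allInliers = std::pow(w, n);  // chance that one sample is outlier-free
    return static_cast<int>(std::ceil(std::log(1.0 - p) / std::log(1.0 - allInliers)));
}
```

For a line fit (n = 2) with half the data being outliers (w = 0.5) and p = 0.99, this gives 17 iterations, compared with 4950 exact fits for exhaustive enumeration over 100 points. The catch is that w is usually unknown and has to be guessed, so the result is a guideline and not a guarantee.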

For a more detailed introduction to the RANSAC algorithm, see Wikipedia.

0 0