Real-Time Pose Estimation of a Textured Object


Nowadays, augmented reality is one of the hottest research topics in the computer vision and robotics fields. The most fundamental problem in augmented reality is estimating the camera pose with respect to an object: in computer vision this is done to perform some later 3D rendering, while in robotics it is done to obtain an object's pose in order to grasp and manipulate it. However, this is not a trivial problem to solve: the most common issue in image processing is the computational cost of applying many algorithms or mathematical operations to solve a problem that is basic and immediate for humans.

Goal:

This tutorial explains how to build a real-time application that estimates the camera pose and tracks a textured object with six degrees of freedom, given a 2D image and its 3D textured model.

The application has the following functional parts:

1) Read the 3D textured object model and object mesh
2) Take input from a camera or a video file
3) Extract ORB features and descriptors from the scene
4) Match scene descriptors with model descriptors using a Flann-based matcher
5) Pose estimation using PnP + RANSAC
6) Linear Kalman Filter for rejecting bad pose estimates


Theory

In computer vision, estimating the camera pose from n pairs of 3D-to-2D point correspondences is a fundamental and well-understood problem. In its most general version, it requires estimating the six degrees of freedom of the pose and five calibration parameters: focal length, principal point, aspect ratio and skew. It can be established with a minimum of 6 correspondences using the well-known Direct Linear Transform (DLT) algorithm. There are, though, several simplifications of the problem that improve the accuracy of the DLT; the most common one is to assume known calibration parameters, which leads to the so-called Perspective-n-Point (PnP) problem:



Problem Formulation:

Given a set of correspondences between 3D points p_i expressed in a world reference frame and their 2D projections u_i onto the image, we seek to retrieve the pose (R and t) of the camera with respect to the world and the focal length f. OpenCV provides four different approaches to solve the Perspective-n-Point problem, which return R and t. Then, using the following formula, a 3D point can be projected onto the image plane:

s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} =
\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}

The complete documentation of how to manage these formulas can be found in Camera Calibration and 3D Reconstruction.
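
As a minimal, self-contained sketch of the relation above (this is not part of the tutorial code, and the variable names are illustrative): cv::solvePnP estimates R and t from a set of 3D/2D correspondences and a known intrinsic matrix, and cv::projectPoints then applies the projection formula to map 3D points back onto the image plane.

#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <vector>

void estimateAndProject(const std::vector<cv::Point3f> &objectPoints,  // 3D points in world coordinates
                        const std::vector<cv::Point2f> &imagePoints,   // their 2D projections
                        const cv::Mat &K)                              // 3x3 intrinsic matrix
{
    cv::Mat rvec, tvec;                                                // pose: Rodrigues rotation vector and translation
    cv::solvePnP(objectPoints, imagePoints, K, cv::Mat(), rvec, tvec); // solve the PnP problem

    std::vector<cv::Point2f> projected;                                // reprojected 2D points
    cv::projectPoints(objectPoints, rvec, tvec, K, cv::Mat(), projected);
}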


Source code

The source code can be found in the OpenCV sources (OpenCV 3.1.0) at samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/

Download: real_time_pose_estimation


The tutorial consists of two main programs:

1) Model registration

This application is intended for users who do not yet have a 3D textured model of the object to be detected. You can use this program to create your own textured 3D model. It currently works only for planar objects (objects whose surface is flat); if you need a model of an object with a more complex shape, you will have to use more sophisticated software to create it.

The application takes an image of the object and its corresponding 3D mesh as input. It also needs the intrinsic parameters of the camera with which the image was captured, so that the image can be undistorted before being fed to the algorithm (see the sketch below). All the files have to be specified with an absolute path or a path relative to the working directory. If no files are specified, the program will try to use default parameters.
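
A minimal sketch of that undistortion step (my assumption of how it can be done; the registration program has its own I/O, and the numbers below are placeholders, not the sample's calibration):

#include <opencv2/opencv.hpp>

cv::Mat undistortInput(const cv::Mat &img_in)
{
    double fx = 800, fy = 800, cx = 320, cy = 240;                          // placeholder intrinsics
    cv::Mat K = (cv::Mat_<double>(3,3) << fx, 0, cx, 0, fy, cy, 0, 0, 1);   // camera matrix
    cv::Mat distCoeffs = cv::Mat::zeros(5, 1, CV_64F);                      // placeholder distortion coefficients
    cv::Mat img_out;
    cv::undistort(img_in, img_out, K, distCoeffs);                          // corrected image fed to the algorithm
    return img_out;
}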

The application starts by extracting ORB features and descriptors from the input image, and then uses the mesh together with the Möller–Trumbore intersection algorithm to compute the 3D coordinates of each feature. Finally, the 3D points and their descriptors are stored in different lists in a YAML file, with one entry per point (a minimal sketch of this step follows). Technical background on how to store and read these files can be found in File Input and Output using XML and YAML files.
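
A minimal sketch of that storage step, assuming the registration program has already filled list_points3d (std::vector<cv::Point3f>) and descriptors (cv::Mat); the keys match what Model::load() reads back later in this tutorial:

cv::FileStorage storage("cookies_ORB.yml", cv::FileStorage::WRITE);  // output model file
storage << "points_3d" << cv::Mat(list_points3d);                    // one 3D point per registered feature
storage << "descriptors" << descriptors;                             // one ORB descriptor row per 3D point
storage.release();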


2) Model detection

The aim of this application is to estimate, in real time, the pose of an object given its 3D textured model.

When the model detection program runs, it first loads the 3D textured model from a YAML file whose structure was explained in the model registration program. From the scene, ORB features and descriptors are detected and extracted. Then, cv::FlannBasedMatcher together with cv::flann::GenericIndex is used to match the scene descriptors against the model descriptors. With the found matches, cv::solvePnPRansac computes the R and t of the camera. Finally, a KalmanFilter is applied in order to reject bad pose estimates.

If you built OpenCV with the samples enabled, the compiled application can be found at opencv/build/bin/cpp-tutorial-pnp_detection. You can then run it and play with different parameters:

This program shows how to detect an object given its 3D textured model. You can choose to use a recorded video or the webcam.
Usage:
  ./cpp-tutorial-pnp_detection -help
Keys:
  'esc' - to quit.
--------------------------------------------------------------------------
Usage: cpp-tutorial-pnp_detection [params]

  -c, --confidence (value:0.95)
      RANSAC confidence
  -e, --error (value:2.0)
      RANSAC reprojection errror
  -f, --fast (value:true)
      use of robust fast match
  -h, --help (value:true)
      print this message
  --in, --inliers (value:30)
      minimum inliers for Kalman update
  --it, --iterations (value:500)
      RANSAC maximum iterations count
  -k, --keypoints (value:2000)
      number of keypoints to detect
  --mesh
      path to ply mesh
  --method, --pnp (value:0)
      PnP method: (0) ITERATIVE - (1) EPNP - (2) P3P - (3) DLS
  --model
      path to yml model
  -r, --ratio (value:0.7)
      threshold for ratio test
  -v, --video
      path to recorded video

For example, you can change the PnP method when running the application:

./cpp-tutorial-pnp_detection --method=2


Explanation

1) Read 3D textured object model and object mesh

In order to load the textured model, the Model class was implemented; its load() function opens a YAML file and reads the stored 3D points and their corresponding descriptors. You can find an example 3D textured model in the OpenCV sources at samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/Data/cookies_ORB.yml.

/* Load a YAML file using OpenCV */
void Model::load(const std::string path)
{
    cv::Mat points3d_mat;

    cv::FileStorage storage(path, cv::FileStorage::READ);
    storage["points_3d"] >> points3d_mat;
    storage["descriptors"] >> descriptors_;

    points3d_mat.copyTo(list_points3d_in_);

    storage.release();
}

In the main program the model is loaded as follows:

Model model;               // instantiate Model object
model.load(yml_read_path); // load a 3D textured object model

In order to load the object mesh, the Mesh class was implemented; its load() function opens a *.ply file and reads the stored 3D points of the object as well as the triangles it is composed of. You can find an example mesh in the OpenCV sources at samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/Data/box.ply

/* Load a CSV with *.ply format */
void Mesh::load(const std::string path)
{
    // Create the reader
    CsvReader csvReader(path);

    // Clear previous data
    list_vertex_.clear();
    list_triangles_.clear();

    // Read from .ply file
    csvReader.readPLY(list_vertex_, list_triangles_);

    // Update mesh attributes
    num_vertexs_ = list_vertex_.size();
    num_triangles_ = list_triangles_.size();
}

In the main program the mesh is loaded as follows:

Mesh mesh;                // instantiate Mesh object
mesh.load(ply_read_path); // load an object mesh

You can also load different model and mesh files:

./cpp-tutorial-pnp_detection --mesh=/absolute_path_to_your_mesh.ply --model=/absolute_path_to_your_model.yml


2) Take input from camera or video

To detect the object it is necessary to capture video. This is done by loading a recorded video, passing the absolute path of its location on your machine. In order to test the application you can find a recorded video in samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/Data/box.mp4.

cv::VideoCapture cap;                // instantiate VideoCapture
cap.open(video_read_path);           // open a recorded video

if(!cap.isOpened())                  // check if we succeeded
{
   std::cout << "Could not open the camera device" << std::endl;
   return -1;
}
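
A small variation (my sketch, not shown in the tutorial's listing): if no video path is given, the same cv::VideoCapture object can open the default webcam instead.

if (video_read_path.empty())
    cap.open(0);                     // device index 0 = default webcam
else
    cap.open(video_read_path);       // recorded video file

if (!cap.isOpened())
{
    std::cout << "Could not open the camera device or video file" << std::endl;
    return -1;
}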

Then the algorithm is run frame by frame:

cv::Mat frame, frame_vis;

while(cap.read(frame) && cv::waitKey(30) != 27)    // capture frame until ESC is pressed
{
    frame_vis = frame.clone();                     // refresh visualisation frame

    // MAIN ALGORITHM
}

You can also load a different recorded video:

./cpp-tutorial-pnp_detection --video=/absolute_path_to_your_video.mp4

3) Extract ORB features and descriptors from the scene

The next step is to detect the scene features and extract their descriptors. For this task I implemented a class RobustMatcher which has functions for keypoint detection and feature extraction. You can find it in samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/src/RobusMatcher.cpp. In your RobustMatcher object you can use any of the 2D feature detectors of OpenCV. In this case I used cv::ORB features, because it is based on cv::FAST to detect the keypoints and on cv::xfeatures2d::BriefDescriptorExtractor to extract the descriptors, which means that it is fast and robust to rotations. You can find more detailed information about ORB in the documentation.

The following code shows how to instantiate and set the feature detector and the descriptor extractor:

RobustMatcher rmatcher;                                                     // instantiate RobustMatcher

cv::FeatureDetector * detector = new cv::OrbFeatureDetector(numKeyPoints); // instantiate ORB feature detector
cv::DescriptorExtractor * extractor = new cv::OrbDescriptorExtractor();    // instantiate ORB descriptor extractor

rmatcher.setFeatureDetector(detector);                                     // set feature detector
rmatcher.setDescriptorExtractor(extractor);                                // set descriptor extractor

The features and descriptors will be computed by the RobustMatcher inside the matching function.
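
Note that the listing above uses the OpenCV 2.4 feature API. With OpenCV 3.x, which this sample ships with, the equivalent setup would look like the following sketch (assuming the RobustMatcher setters accept cv::Ptr handles, as they do in the 3.x version of the sample):

cv::Ptr<cv::FeatureDetector> orb_detector = cv::ORB::create(numKeyPoints);      // ORB keypoint detector
cv::Ptr<cv::DescriptorExtractor> orb_extractor = cv::ORB::create(numKeyPoints); // ORB descriptor extractor

rmatcher.setFeatureDetector(orb_detector);        // set feature detector
rmatcher.setDescriptorExtractor(orb_extractor);   // set descriptor extractor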


4) Match scene descriptors with model descriptors using a Flann matcher

This is the first step of our detection algorithm. The main idea is to match the scene descriptors against our model descriptors in order to know the 3D coordinates of the features found in the current scene.

Firstly, we have to set which matcher we want to use. In this case cv::FlannBasedMatcher is used, which in terms of computational cost becomes faster than cv::BFMatcher as the trained collection of features grows. For the Flann-based matcher, the index created is Multi-Probe LSH: Efficient Indexing for High-Dimensional Similarity Search, because ORB descriptors are binary.

You can tune the LSH and search parameters to improve the matching efficiency:

cv::Ptr<cv::flann::IndexParams> indexParams = cv::makePtr<cv::flann::LshIndexParams>(6, 12, 1); // instantiate LSH index parameters
cv::Ptr<cv::flann::SearchParams> searchParams = cv::makePtr<cv::flann::SearchParams>(50);       // instantiate flann search parameters

cv::DescriptorMatcher * matcher = new cv::FlannBasedMatcher(indexParams, searchParams);         // instantiate FlannBased matcher
rmatcher.setDescriptorMatcher(matcher);

Secondly, we have to call the matcher by using the robustMatch() or fastRobustMatch() function. The difference between these two functions is their computational cost. The first method is slower but more robust at filtering good matches, because it uses two ratio tests and a symmetry test. In contrast, the second method is faster but less robust, because it only applies a single ratio test to the matches (a sketch of such a ratio test follows).
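
As a minimal sketch of what such a ratio test does (the real RobustMatcher::ratioTest in RobusMatcher.cpp has its own signature and uses a stored ratio threshold): for each keypoint, the k-NN match is kept only if the best distance is sufficiently smaller than the second-best one.

int ratioTest(std::vector<std::vector<cv::DMatch> > &matches, float ratio)
{
    int removed = 0;
    for (size_t i = 0; i < matches.size(); ++i)
    {
        if (matches[i].size() < 2 ||                                   // not enough neighbours
            matches[i][0].distance > ratio * matches[i][1].distance)   // ambiguous match
        {
            matches[i].clear();                                        // discard it
            ++removed;
        }
    }
    return removed;
}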

The following code gets the model 3D points and their descriptors and then calls the matcher in the main program:

// Get the MODEL INFO
std::vector<cv::Point3f> list_points3d_model = model.get_points3d();  // list with model 3D coordinates
cv::Mat descriptors_model = model.get_descriptors();                  // list with descriptors of each 3D coordinate

// -- Step 1: Robust matching between model descriptors and scene descriptors
std::vector<cv::DMatch> good_matches;       // to obtain the model 3D points in the scene
std::vector<cv::KeyPoint> keypoints_scene;  // to obtain the 2D points of the scene

if(fast_match)
{
    rmatcher.fastRobustMatch(frame, good_matches, keypoints_scene, descriptors_model);
}
else
{
    rmatcher.robustMatch(frame, good_matches, keypoints_scene, descriptors_model);
}

The following code corresponds to the robustMatch() function, which belongs to the RobustMatcher class. This function detects the keypoints and extracts the descriptors of the given image, matches the extracted descriptors against the given model descriptors using two nearest neighbours, and vice versa. Then, a ratio test is applied to the matches in both directions in order to remove those matches whose distance ratio between the first and second best match is larger than a given threshold. Finally, a symmetry test is applied in order to remove non-symmetrical matches.

void RobustMatcher::robustMatch( const cv::Mat& frame, std::vector<cv::DMatch>& good_matches,
                                 std::vector<cv::KeyPoint>& keypoints_frame,
                                 const std::vector<cv::KeyPoint>& keypoints_model, const cv::Mat& descriptors_model )
{
    // 1a. Detection of the ORB features
    this->computeKeyPoints(frame, keypoints_frame);

    // 1b. Extraction of the ORB descriptors
    cv::Mat descriptors_frame;
    this->computeDescriptors(frame, keypoints_frame, descriptors_frame);

    // 2. Match the two image descriptors
    std::vector<std::vector<cv::DMatch> > matches12, matches21;

    // 2a. From image 1 to image 2
    matcher_->knnMatch(descriptors_frame, descriptors_model, matches12, 2); // return 2 nearest neighbours

    // 2b. From image 2 to image 1
    matcher_->knnMatch(descriptors_model, descriptors_frame, matches21, 2); // return 2 nearest neighbours

    // 3. Remove matches for which NN ratio is > than threshold
    // clean image 1 -> image 2 matches
    int removed1 = ratioTest(matches12);
    // clean image 2 -> image 1 matches
    int removed2 = ratioTest(matches21);

    // 4. Remove non-symmetrical matches
    symmetryTest(matches12, matches21, good_matches);
}

After the match filtering, we have to extract the 2D and 3D correspondences from the found scene keypoints and our 3D model using the obtained DMatches vector. For more information about cv::DMatch check the documentation.

// -- Step 2: Find out the 2D/3D correspondences
std::vector<cv::Point3f> list_points3d_model_match;    // container for the model 3D coordinates found in the scene
std::vector<cv::Point2f> list_points2d_scene_match;    // container for the model 2D coordinates found in the scene

for(unsigned int match_index = 0; match_index < good_matches.size(); ++match_index)
{
    cv::Point3f point3d_model = list_points3d_model[ good_matches[match_index].trainIdx ];  // 3D point from model
    cv::Point2f point2d_scene = keypoints_scene[ good_matches[match_index].queryIdx ].pt;   // 2D point from the scene
    list_points3d_model_match.push_back(point3d_model);                                     // add 3D point
    list_points2d_scene_match.push_back(point2d_scene);                                     // add 2D point
}

You can also change the ratio test threshold and the number of keypoints to detect, as well as enable or disable the robust matcher:

./cpp-tutorial-pnp_detection --ratio=0.8 --keypoints=1000 --fast=false

5) Pose estimation using PnP + RANSAC

Once we have the 2D and 3D correspondences, we have to apply a PnP algorithm in order to estimate the camera pose. The reason why we have to use cv::solvePnPRansac instead of cv::solvePnP is that, after the matching, not all the found correspondences are correct and, as like as not, there are false correspondences, also called outliers. Random Sample Consensus, or RANSAC, is a non-deterministic iterative method which estimates the parameters of a mathematical model from observed data, producing an increasingly accurate result as the number of iterations grows. After applying RANSAC, all the outliers are eliminated and the camera pose is then estimated with a certain probability of obtaining a good solution.

For the camera pose estimation I have implemented a class PnPProblem. This class has four attributes: a given calibration matrix, the rotation matrix, the translation matrix and the rotation-translation matrix. The intrinsic calibration parameters of the camera used to estimate the pose are required. In order to obtain them, you can check the Camera calibration with square chessboard and Camera calibration With OpenCV tutorials.

The following code shows how to declare the PnPProblem class in the main program:

// Intrinsic camera parameters: UVC WEBCAM
double f = 55;                           // focal length in mm
double sx = 22.3, sy = 14.9;             // sensor size
double width = 640, height = 480;        // image size

double params_WEBCAM[] = { width*f/sx,   // fx
                           height*f/sy,  // fy
                           width/2,      // cx
                           height/2};    // cy

PnPProblem pnp_detection(params_WEBCAM); // instantiate PnPProblem class

The following code shows how the PnPProblem class initialises its attributes:

// Custom constructor given the intrinsic camera parameters
PnPProblem::PnPProblem(const double params[])
{
  _A_matrix = cv::Mat::zeros(3, 3, CV_64FC1);   // intrinsic camera parameters
  _A_matrix.at<double>(0, 0) = params[0];       //      [ fx   0  cx ]
  _A_matrix.at<double>(1, 1) = params[1];       //      [  0  fy  cy ]
  _A_matrix.at<double>(0, 2) = params[2];       //      [  0   0   1 ]
  _A_matrix.at<double>(1, 2) = params[3];
  _A_matrix.at<double>(2, 2) = 1;

  _R_matrix = cv::Mat::zeros(3, 3, CV_64FC1);   // rotation matrix
  _t_matrix = cv::Mat::zeros(3, 1, CV_64FC1);   // translation matrix
  _P_matrix = cv::Mat::zeros(3, 4, CV_64FC1);   // rotation-translation matrix
}

OpenCV provides four PnP methods: ITERATIVE, EPNP, P3P and DLS. Which estimation method to choose depends on the application. For a real-time application, the more suitable methods are EPNP and P3P, because they are faster than ITERATIVE and DLS at finding an optimal solution. However, EPNP and P3P are not especially robust with planar surfaces, and the pose estimation sometimes shows a mirror effect. Therefore, this tutorial uses the ITERATIVE method, because the object to be detected has planar surfaces; the sketch below shows how the --method values map to the OpenCV flags.
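
The following sketch (not the sample's exact parsing code; `method` stands for the value of the --method/--pnp option) shows how those choices map to the OpenCV 3.x flags passed to cv::solvePnPRansac:

int pnpMethod;
switch (method)
{
    case 1:  pnpMethod = cv::SOLVEPNP_EPNP;      break;  // fast, closed-form solution
    case 2:  pnpMethod = cv::SOLVEPNP_P3P;       break;  // uses exactly 4 points per hypothesis
    case 3:  pnpMethod = cv::SOLVEPNP_DLS;       break;
    default: pnpMethod = cv::SOLVEPNP_ITERATIVE; break;  // 0: used in this tutorial (planar object)
}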

The OpenCV RANSAC implementation expects you to provide three parameters: the maximum number of iterations until the algorithm stops, the maximum allowed distance between the observed and computed point projections for a point to be considered an inlier, and the confidence of obtaining a good result. You can tune these parameters to improve the performance of your algorithm. Increasing the number of iterations gives a more accurate solution but takes more time. Increasing the reprojection error reduces the computation time but makes the solution less accurate. Decreasing the confidence makes the algorithm faster but, again, less accurate.

The following parameters work for this application:

// RANSAC parameters
int iterationsCount = 500;        // number of Ransac iterations.
float reprojectionError = 2.0;    // maximum allowed distance to consider it an inlier.
float confidence = 0.95;          // ransac successful confidence.

The following code corresponds to the estimatePoseRANSAC() function, which belongs to the PnPProblem class. This function estimates the rotation and translation matrices given a set of 2D/3D correspondences, the desired PnP method, the output inliers container and the RANSAC parameters:

// Estimate the pose given a list of 2D/3D correspondences with RANSAC and the method to use
void PnPProblem::estimatePoseRANSAC( const std::vector<cv::Point3f> &list_points3d,        // list with model 3D coordinates
                                     const std::vector<cv::Point2f> &list_points2d,        // list with scene 2D coordinates
                                     int flags, cv::Mat &inliers, int iterationsCount,     // PnP method; inliers container
                                     float reprojectionError, float confidence )           // Ransac parameters
{
    cv::Mat distCoeffs = cv::Mat::zeros(4, 1, CV_64FC1);    // vector of distortion coefficients
    cv::Mat rvec = cv::Mat::zeros(3, 1, CV_64FC1);          // output rotation vector
    cv::Mat tvec = cv::Mat::zeros(3, 1, CV_64FC1);          // output translation vector

    bool useExtrinsicGuess = false;   // if true the function uses the provided rvec and tvec values as
                                      // initial approximations of the rotation and translation vectors

    cv::solvePnPRansac( list_points3d, list_points2d, _A_matrix, distCoeffs, rvec, tvec,
                        useExtrinsicGuess, iterationsCount, reprojectionError, confidence,
                        inliers, flags );

    Rodrigues(rvec, _R_matrix);                  // converts Rotation Vector to Matrix
    _t_matrix = tvec;                            // set translation matrix

    this->set_P_matrix(_R_matrix, _t_matrix);    // set rotation-translation matrix
}

The following code covers the 3rd and 4th steps of the main algorithm: first calling the above function, and then taking the inliers vector output by RANSAC to get the 2D scene points for drawing purposes. As seen in the code, we must make sure we only apply RANSAC when there are matches; otherwise the function cv::solvePnPRansac crashes due to an OpenCV bug.

if(good_matches.size() > 0) // None matches, then RANSAC crashes
{
    // -- Step 3: Estimate the pose using RANSAC approach
    pnp_detection.estimatePoseRANSAC( list_points3d_model_match, list_points2d_scene_match,
                                      pnpMethod, inliers_idx, iterationsCount, reprojectionError, confidence );

    // -- Step 4: Catch the inliers keypoints to draw
    for(int inliers_index = 0; inliers_index < inliers_idx.rows; ++inliers_index)
    {
        int n = inliers_idx.at<int>(inliers_index);          // i-inlier
        cv::Point2f point2d = list_points2d_scene_match[n];  // i-inlier point 2D
        list_points2d_inliers.push_back(point2d);            // add i-inlier to list
    }
}

Finally, once the camera pose has been estimated, we can use R and t to compute the 2D projection onto the image of a given 3D point expressed in a world reference frame, using the formula shown in the Theory section.

The following code corresponds to the backproject3DPoint() function, which belongs to the PnPProblem class. The function backprojects a given 3D point expressed in a world reference frame onto the 2D image:

// Backproject a 3D point to 2D using the estimated pose parameters
cv::Point2f PnPProblem::backproject3DPoint(const cv::Point3f &point3d)
{
    // 3D point vector [x y z 1]'
    cv::Mat point3d_vec = cv::Mat(4, 1, CV_64FC1);
    point3d_vec.at<double>(0) = point3d.x;
    point3d_vec.at<double>(1) = point3d.y;
    point3d_vec.at<double>(2) = point3d.z;
    point3d_vec.at<double>(3) = 1;

    // 2D point vector [u v 1]'
    cv::Mat point2d_vec = cv::Mat(4, 1, CV_64FC1);
    point2d_vec = _A_matrix * _P_matrix * point3d_vec;

    // Normalization of [u v]'
    cv::Point2f point2d;
    point2d.x = point2d_vec.at<double>(0) / point2d_vec.at<double>(2);
    point2d.y = point2d_vec.at<double>(1) / point2d_vec.at<double>(2);

    return point2d;
}

The above function is used to project all the 3D points of the object mesh and show the pose of the object, as sketched below.
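
As a rough sketch of that idea (the accessor names getNumVertices() and getVertex() are assumptions here; the actual drawing is done by drawObjectMesh() in the sample's Utils.cpp): every mesh vertex is backprojected with the estimated pose, and the resulting 2D points can then be connected according to the mesh triangles.

std::vector<cv::Point2f> mesh_points2d;
for (int i = 0; i < mesh.getNumVertices(); ++i)   // assumed accessor names
{
    cv::Point3f vertex = mesh.getVertex(i);                            // 3D vertex in object coordinates
    mesh_points2d.push_back(pnp_detection.backproject3DPoint(vertex)); // its projection with the estimated pose
}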

You can also change RANSAC parameters and PnP method:

./cpp-tutorial-pnp_detection --error=0.25 --confidence=0.90 --iterations=250 --method=3




6) Linear Kalman Filter for bad poses rejection

It is common in computer vision or robotics that, after applying detection or tracking techniques, bad results are obtained due to sensor errors. In order to reject these bad detections, this tutorial explains how to implement a Linear Kalman Filter. The Kalman Filter is applied only after a given number of inliers has been detected.

You can find more information elsewhere about what a Kalman Filter is. In this tutorial the OpenCV implementation, cv::KalmanFilter, is used, with a linear Kalman filter for position and orientation tracking defining the dynamics and measurement models.

Firstly, we have to define our state vector, which has 18 states: the positional data (x, y, z) with its first and second derivatives (velocity and acceleration), plus the rotation in the form of three Euler angles (roll, pitch, yaw) together with their first and second derivatives (angular velocity and acceleration):

X=(x,y,z,x˙,y˙,z˙,x¨,y¨,z¨,ψ,θ,ϕ,ψ˙,θ˙,ϕ˙,ψ¨,θ¨,ϕ¨)T

Secondly, we have to define the number of measurements, which is 6: from R and t we can extract (x, y, z) and (ψ, θ, ϕ). In addition, we have to define the number of control actions applied to the system, which in this case is zero. Finally, we have to define the differential time between measurements, which is 1/T, where T is the frame rate of the video.

cv::KalmanFilter KF;         // instantiate Kalman Filter

int nStates = 18;            // the number of states
int nMeasurements = 6;       // the number of measured states
int nInputs = 0;             // the number of action control
double dt = 0.125;           // time between measurements (1/FPS)

initKalmanFilter(KF, nStates, nMeasurements, nInputs, dt);    // init function
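
The sample hard-codes dt = 0.125 (i.e. 8 frames per second). A small sketch of deriving it from the capture source instead (cv::CAP_PROP_FPS is a standard VideoCapture property; some sources report 0, hence the fallback):

double fps = cap.get(cv::CAP_PROP_FPS);       // frame rate reported by the video source
double dt = (fps > 0) ? 1.0 / fps : 0.125;    // time between measurements (1/FPS), with a fallback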

The following code corresponds to the Kalman Filter initialisation. Firstly, the process noise, the measurement noise and the error covariance matrix are set. Secondly, the transition matrix, which is the dynamic model, is set, and finally the measurement matrix, which is the measurement model.

You can tune the process and measurement noise to improve the Kalman Filter performance. The smaller the measurement noise, the faster the filter converges, but it also becomes more sensitive to bad measurements.

void initKalmanFilter(cv::KalmanFilter &KF, int nStates, int nMeasurements, int nInputs, double dt)
{
  KF.init(nStates, nMeasurements, nInputs, CV_64F);                 // init Kalman Filter

  cv::setIdentity(KF.processNoiseCov, cv::Scalar::all(1e-5));       // set process noise
  cv::setIdentity(KF.measurementNoiseCov, cv::Scalar::all(1e-4));   // set measurement noise
  cv::setIdentity(KF.errorCovPost, cv::Scalar::all(1));             // error covariance

  /* DYNAMIC MODEL */
  //  [1 0 0 dt  0  0 dt2   0   0 0 0 0  0  0  0   0   0   0]
  //  [0 1 0  0 dt  0   0 dt2   0 0 0 0  0  0  0   0   0   0]
  //  [0 0 1  0  0 dt   0   0 dt2 0 0 0  0  0  0   0   0   0]
  //  [0 0 0  1  0  0  dt   0   0 0 0 0  0  0  0   0   0   0]
  //  [0 0 0  0  1  0   0  dt   0 0 0 0  0  0  0   0   0   0]
  //  [0 0 0  0  0  1   0   0  dt 0 0 0  0  0  0   0   0   0]
  //  [0 0 0  0  0  0   1   0   0 0 0 0  0  0  0   0   0   0]
  //  [0 0 0  0  0  0   0   1   0 0 0 0  0  0  0   0   0   0]
  //  [0 0 0  0  0  0   0   0   1 0 0 0  0  0  0   0   0   0]
  //  [0 0 0  0  0  0   0   0   0 1 0 0 dt  0  0 dt2   0   0]
  //  [0 0 0  0  0  0   0   0   0 0 1 0  0 dt  0   0 dt2   0]
  //  [0 0 0  0  0  0   0   0   0 0 0 1  0  0 dt   0   0 dt2]
  //  [0 0 0  0  0  0   0   0   0 0 0 0  1  0  0  dt   0   0]
  //  [0 0 0  0  0  0   0   0   0 0 0 0  0  1  0   0  dt   0]
  //  [0 0 0  0  0  0   0   0   0 0 0 0  0  0  1   0   0  dt]
  //  [0 0 0  0  0  0   0   0   0 0 0 0  0  0  0   1   0   0]
  //  [0 0 0  0  0  0   0   0   0 0 0 0  0  0  0   0   1   0]
  //  [0 0 0  0  0  0   0   0   0 0 0 0  0  0  0   0   0   1]

  // position
  KF.transitionMatrix.at<double>(0,3) = dt;
  KF.transitionMatrix.at<double>(1,4) = dt;
  KF.transitionMatrix.at<double>(2,5) = dt;
  KF.transitionMatrix.at<double>(3,6) = dt;
  KF.transitionMatrix.at<double>(4,7) = dt;
  KF.transitionMatrix.at<double>(5,8) = dt;
  KF.transitionMatrix.at<double>(0,6) = 0.5*pow(dt,2);
  KF.transitionMatrix.at<double>(1,7) = 0.5*pow(dt,2);
  KF.transitionMatrix.at<double>(2,8) = 0.5*pow(dt,2);

  // orientation
  KF.transitionMatrix.at<double>(9,12) = dt;
  KF.transitionMatrix.at<double>(10,13) = dt;
  KF.transitionMatrix.at<double>(11,14) = dt;
  KF.transitionMatrix.at<double>(12,15) = dt;
  KF.transitionMatrix.at<double>(13,16) = dt;
  KF.transitionMatrix.at<double>(14,17) = dt;
  KF.transitionMatrix.at<double>(9,15) = 0.5*pow(dt,2);
  KF.transitionMatrix.at<double>(10,16) = 0.5*pow(dt,2);
  KF.transitionMatrix.at<double>(11,17) = 0.5*pow(dt,2);

  /* MEASUREMENT MODEL */
  //  [1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  //  [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  //  [0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  //  [0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0]
  //  [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
  //  [0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0]

  KF.measurementMatrix.at<double>(0,0) = 1;  // x
  KF.measurementMatrix.at<double>(1,1) = 1;  // y
  KF.measurementMatrix.at<double>(2,2) = 1;  // z
  KF.measurementMatrix.at<double>(3,9) = 1;  // roll
  KF.measurementMatrix.at<double>(4,10) = 1; // pitch
  KF.measurementMatrix.at<double>(5,11) = 1; // yaw
}

The following code covers the 5th step of the main algorithm. When the number of inliers obtained after RANSAC is above the threshold, the measurements matrix is filled and the Kalman Filter is updated:

// -- Step 5: Kalman Filter

// GOOD MEASUREMENT
if( inliers_idx.rows >= minInliersKalman )
{
    // Get the measured translation
    cv::Mat translation_measured(3, 1, CV_64F);
    translation_measured = pnp_detection.get_t_matrix();

    // Get the measured rotation
    cv::Mat rotation_measured(3, 3, CV_64F);
    rotation_measured = pnp_detection.get_R_matrix();

    // fill the measurements vector
    fillMeasurements(measurements, translation_measured, rotation_measured);
}

// Instantiate estimated translation and rotation
cv::Mat translation_estimated(3, 1, CV_64F);
cv::Mat rotation_estimated(3, 3, CV_64F);

// update the Kalman filter with good measurements
updateKalmanFilter( KF, measurements,
                    translation_estimated, rotation_estimated);

The following code corresponds to the fillMeasurements() function, which converts the measured rotation matrix to Euler angles and fills the measurements matrix together with the measured translation vector:

void fillMeasurements( cv::Mat &measurements,
                       const cv::Mat &translation_measured, const cv::Mat &rotation_measured)
{
    // Convert rotation matrix to euler angles
    cv::Mat measured_eulers(3, 1, CV_64F);
    measured_eulers = rot2euler(rotation_measured);

    // Set measurement to predict
    measurements.at<double>(0) = translation_measured.at<double>(0); // x
    measurements.at<double>(1) = translation_measured.at<double>(1); // y
    measurements.at<double>(2) = translation_measured.at<double>(2); // z
    measurements.at<double>(3) = measured_eulers.at<double>(0);      // roll
    measurements.at<double>(4) = measured_eulers.at<double>(1);      // pitch
    measurements.at<double>(5) = measured_eulers.at<double>(2);      // yaw
}
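
For reference, a standard rotation-matrix-to-Euler-angles conversion is sketched below (the tutorial's own rot2euler() lives in Utils.cpp and may use a different angle convention; this version returns [roll, pitch, yaw] for a rotation R = Rz(yaw)·Ry(pitch)·Rx(roll)):

#include <cmath>
#include <opencv2/core.hpp>

cv::Mat rot2eulerSketch(const cv::Mat &R)        // hypothetical helper, not the sample's rot2euler()
{
    cv::Mat euler(3, 1, CV_64F);
    double sy = std::sqrt(R.at<double>(0,0)*R.at<double>(0,0) + R.at<double>(1,0)*R.at<double>(1,0));

    if (sy > 1e-6)   // regular case
    {
        euler.at<double>(0) = std::atan2(R.at<double>(2,1), R.at<double>(2,2)); // roll
        euler.at<double>(1) = std::atan2(-R.at<double>(2,0), sy);               // pitch
        euler.at<double>(2) = std::atan2(R.at<double>(1,0), R.at<double>(0,0)); // yaw
    }
    else             // gimbal lock: pitch close to +/- 90 degrees
    {
        euler.at<double>(0) = std::atan2(-R.at<double>(1,2), R.at<double>(1,1));
        euler.at<double>(1) = std::atan2(-R.at<double>(2,0), sy);
        euler.at<double>(2) = 0;
    }
    return euler;
}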

The following code corresponds to the updateKalmanFilter() function, which updates the Kalman Filter and sets the estimated rotation matrix and translation vector. The estimated rotation matrix is obtained by converting the estimated Euler angles back into a rotation matrix.

void updateKalmanFilter( cv::KalmanFilter &KF, cv::Mat &measurement,
                         cv::Mat &translation_estimated, cv::Mat &rotation_estimated )
{
    // First predict, to update the internal statePre variable
    cv::Mat prediction = KF.predict();

    // The "correct" phase that is going to use the predicted value and our measurement
    cv::Mat estimated = KF.correct(measurement);

    // Estimated translation
    translation_estimated.at<double>(0) = estimated.at<double>(0);
    translation_estimated.at<double>(1) = estimated.at<double>(1);
    translation_estimated.at<double>(2) = estimated.at<double>(2);

    // Estimated euler angles
    cv::Mat eulers_estimated(3, 1, CV_64F);
    eulers_estimated.at<double>(0) = estimated.at<double>(9);
    eulers_estimated.at<double>(1) = estimated.at<double>(10);
    eulers_estimated.at<double>(2) = estimated.at<double>(11);

    // Convert estimated euler angles to rotation matrix
    rotation_estimated = euler2rot(eulers_estimated);
}

The 6th step is to set the estimated rotation-translation matrix:

// -- Step 6: Set estimated projection matrix
pnp_detection_est.set_P_matrix(rotation_estimated, translation_estimated);

The last, optional step is to draw the current and estimated poses as well as the object's coordinate axes on the visualisation frame:

// -- Step X: Draw pose
drawObjectMesh(frame_vis, &mesh, &pnp_detection, green);                // draw current pose
drawObjectMesh(frame_vis, &mesh, &pnp_detection_est, yellow);           // draw estimated pose

double l = 5;
std::vector<cv::Point2f> pose_points2d;
pose_points2d.push_back(pnp_detection_est.backproject3DPoint(cv::Point3f(0,0,0)));    // axis center
pose_points2d.push_back(pnp_detection_est.backproject3DPoint(cv::Point3f(l,0,0)));    // axis x
pose_points2d.push_back(pnp_detection_est.backproject3DPoint(cv::Point3f(0,l,0)));    // axis y
pose_points2d.push_back(pnp_detection_est.backproject3DPoint(cv::Point3f(0,0,l)));    // axis z
draw3DCoordinateAxes(frame_vis, pose_points2d);

You can also modify the minimum number of inliers required to update the Kalman Filter:

./cpp-tutorial-pnp_detection --inliers=20

Results

The following video shows the result of real-time pose estimation, using the detection algorithm explained above with the following parameters:

// Robust Matcher parameters
int numKeyPoints = 2000;        // number of detected keypoints
float ratio = 0.70f;            // ratio test
bool fast_match = true;         // fastRobustMatch() or robustMatch()

// RANSAC parameters
int iterationsCount = 500;      // number of Ransac iterations.
float reprojectionError = 2.0;  // maximum allowed distance to consider it an inlier.
float confidence = 0.95;        // ransac successful confidence.

// Kalman Filter parameters
int minInliersKalman = 30;      // Kalman threshold updating


The video can be found on YouTube: YouTube.

