Hi everyone,

I recently started working with PhotoScan to evaluate its Structure From Motion performance (camera alignment performance) as well as its Python API. A lot has been said in this forum about these concepts, and I will do my best to cite all the posts I used, to give credit to the original authors. If you find any missing citations, please forgive me; I will edit the post to add them.

Scope

This post is meant to be a mix of:
  • A go-to post for frequently asked questions about quality SFM processing.
  • A place to describe SFM concepts, share opinions about them, and centralize Agisoft staff's explanations.
This post is not meant to be:
  • A go-to post for MVS, mesh generation, texture generation, color correction, point classification, or ortho-rectification of images.
  • A one-way dissertation. Please, feel free to participate.

SFM vs MVS

I am only interested in talking about Structure From Motion (SFM) and not Multi-View Stereo (MVS). The difference, although somewhat subjective and dependent on the literature, could be summarized as follows:

SFM: SFM focuses on building a joint estimate of camera parameters, including pose and sometimes calibration, as well as 3D world points, based on a set of images and, potentially, initial estimates of camera calibration, point positions, markers, etc. For those coming from other fields, it can be thought of as offline visual SLAM.

MVS: MVS focuses on building a dense, colored and unified point cloud of the scene. Density is the priority here, then coloring, and finally filtering and matching of disjoint chunks of points. In short, it can be thought of as multi-camera stereo, which is essentially what it is.

Python API vs GUI

There is one simple reason why I want to include the Python API in this post. There are many types of users of this kind of software, each with a very different background. Some swear only by experience, others by theoretical definitions. I am a fervent defender of the need to build a bridge between the two in order to really, fully understand something. Having access to the pieces of computation, grounded in theory, that explain why one's intuition or experience is right or wrong is uplifting.

SFM Pipeline

There may be as many variations of the SFM pipeline as people trying to implement it. To be consistent with the spirit of the post, I will try to describe the key concepts as presented in PhotoScan, which are probably known to all of you.

Feature Detection: For each image, extract singular and hopefully unique visual attributes, called key points. These key points should be scale invariant, i.e. distinguishable independently of their distance from the camera; rotation invariant, i.e. distinguishable independently of their orientation with respect to the camera; as unique as possible; as sharp and noise-free as possible; and finally as abundant as possible.
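
PhotoScan does not expose its feature detector separately through the Python API, so here is a small stand-in illustration using OpenCV's SIFT (a comparable scale- and rotation-invariant detector), just to show what the attributes of a key point look like. The image file name is made up, and this is not PhotoScan's actual detector.

Code:
# Illustrative only: OpenCV SIFT as a stand-in for PhotoScan's proprietary detector.
# Requires opencv-python >= 4.4 (where SIFT lives in the main module).
import cv2

img = cv2.imread("IMG_0001.JPG", cv2.IMREAD_GRAYSCALE)  # hypothetical file name
sift = cv2.SIFT_create(nfeatures=40000)                 # cap comparable to a key point limit
keypoints, descriptors = sift.detectAndCompute(img, None)

print("key points found:", len(keypoints))
kp = keypoints[0]
print("position:", kp.pt)        # (x, y) in pixels
print("size:", kp.size)          # diameter of the detected blob
print("orientation:", kp.angle)  # degrees, gives rotation invariance
print("response:", kp.response)  # detector strength, a proxy for sharpness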

Feature Matching: This stage consists of a piece of computation per image pair. Whether these pairs are found through brute-force methods, quick-and-dirty matching, or reference information is only relevant to processing time and, to some extent, to output quality. For each image pair, try to match their respective key points in a coherent way. This is better achieved with high quality key points, and can be improved with additional information such as priors on relative camera poses and initial camera calibrations. These matches are called tie points.
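
As a minimal sketch of how this stage is driven from the Python API, detection and matching are both performed by a single MatchPhotos call. Keyword and enum names below are as I read them in the 1.2 reference; check them against your version, and treat the limits as placeholders.

Code:
import PhotoScan

doc = PhotoScan.app.document
chunk = doc.chunk

# Accuracy controls the image scale used for feature detection;
# preselection controls how candidate image pairs are chosen.
chunk.matchPhotos(accuracy=PhotoScan.HighAccuracy,
                  preselection=PhotoScan.GenericPreselection,
                  keypoint_limit=40000,
                  tiepoint_limit=4000)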

Structure Estimation: This stage is where camera calibration really comes into play. Whether you have initially calibrated cameras or not, each camera will have an intrinsic matrix and intrinsic distortion coefficients associated with it. The real issue is whether you want to (re)calibrate your cameras as part of the estimation. Needless to say, including this (re)calibration not only makes the process more time consuming, it can also decrease quality and, in some cases, cause the process to fail or diverge. This stage consists of global or bundled pieces of computation per image n-tuple, although the basic concept is better explained using a single image pair. For each image pair, and for each tie point, we need to find its position in the world. Assuming that all tie points come from correctly matched key points, i.e. there are no mismatches or outliers, each one of them defines a geometrical constraint on both cameras, and hence a constraint on their parameters. These constraints, in conjunction with some error function, are then used to estimate the structure of the scene, including the tie point positions in the world. When some of the tie points come from mismatched key points, additional procedures like RANSAC are used.
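
To make the "each tie point constrains both cameras" idea concrete, here is a purely illustrative numpy sketch (not PhotoScan code) that triangulates one tie point from two cameras with known projection matrices using the linear (DLT) method; all numbers are toy values. In PhotoScan, this whole stage is triggered by chunk.alignCameras().

Code:
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    # Each observed (u, v) gives two linear constraints on the homogeneous
    # world point X: u * P[2] - P[0] and v * P[2] - P[1].
    u1, v1 = uv1
    u2, v2 = uv2
    A = np.stack([u1 * P1[2] - P1[0],
                  v1 * P1[2] - P1[1],
                  u2 * P2[2] - P2[0],
                  v2 * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                 # null-space solution
    return X[:3] / X[3]        # dehomogenize

# Toy setup: focal length 1000 px, camera 1 at the origin,
# camera 2 shifted by a 1-unit baseline along x.
K = np.diag([1000.0, 1000.0, 1.0])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.3, -0.2, 5.0])
x1 = P1 @ np.append(X_true, 1.0)
x2 = P2 @ np.append(X_true, 1.0)

print(triangulate(P1, P2, x1[:2] / x1[2], x2[:2] / x2[2]))  # ~[0.3, -0.2, 5.0]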

Structure Optimization: This stage is what its name suggests: an optimization of the scene. The process is similar to that of structure estimation, but differs in that additional intrinsic distortion coefficients can be used and tie points can be left out due to poor performance, measured using some quality control metrics. Finally, it is expected that this stage builds upon the results of previously applied structure estimations and optimizations.
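
From the Python API this stage boils down to a single call; a minimal sketch, again assuming the 1.2.x API. The keyword flags that select which intrinsic and distortion parameters are (re)fitted vary between API versions, so I leave them out here and refer you to the reference.

Code:
import PhotoScan

chunk = PhotoScan.app.document.chunk

# Re-runs the bundle adjustment over the currently selected cameras and
# tie points, using the fit flags configured in the project.
chunk.optimizeCameras()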

Quality Control Metrics

The quality control metrics mentioned earlier, some of them used in structure optimization, can and should be used to evaluate the overall performance. After all, if we are not using ground control points or markers to close the loop, they are our only references. There are five such metrics in PhotoScan: image quality; image count and effective overlap; projection accuracy; reconstruction uncertainty; reprojection error. Some of these metrics are only descriptive, others are actionable. In PhotoScan, they are accessible either in the Photos pane, in the Show Info window, or in the Gradual Selection option.
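
For completeness, here is a hedged sketch of scripted gradual selection, using the PointCloud.Filter class as I understand it from the 1.2 reference (please verify the class and method names for your version); the threshold value is a placeholder, not a recommendation.

Code:
import PhotoScan

chunk = PhotoScan.app.document.chunk

f = PhotoScan.PointCloud.Filter()
f.init(chunk, criterion=PhotoScan.PointCloud.Filter.ReconstructionUncertainty)
f.selectPoints(50)                        # placeholder threshold
chunk.point_cloud.removeSelectedPoints()  # drop the selected tie points

# Other criteria follow the same pattern:
#   PhotoScan.PointCloud.Filter.ReprojectionError
#   PhotoScan.PointCloud.Filter.ProjectionAccuracy
#   PhotoScan.PointCloud.Filter.ImageCount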

Let's start with a little bit of RTFM.

Quote
Alignment Accuracy: Higher accuracy settings help to obtain more accurate camera position estimates. Lower accuracy settings can be used to get the rough camera positions in a shorter period of time. While at High accuracy setting the software works with the photos of the original size, Medium setting causes image downscaling by factor of 4 (2 times by each side), at Low accuracy source files are downscaled by factor of 16, and Lowest value means further downscaling by 4 times more. Highest accuracy setting upscales the image by factor of 4. Since tie point positions are estimated on the basis of feature spots found on the source images, it may be meaningful to upscale a source photo to accurately localize a tie point. However, Highest accuracy setting is recommended only for very sharp image data and mostly for research purposes due to the corresponding processing being quite time consuming.

Image quality: Poor input, e. g. vague photos, can influence alignment results badly. To help you to exclude poorly focused images from processing PhotoScan suggests automatic image quality estimation feature. Images with quality value of less than 0.5 units are recommended to be disabled and thus excluded from photogrammetric processing, providing that the rest of the photos cover the whole scene to be reconstructed. PhotoScan estimates image quality for each input image. The value of the parameter is calculated based on the sharpness level of the most focused part of the picture.

Image count: PhotoScan reconstruct all the points that are visible at least on two photos. However, points that are visible only on two photos are likely to be located with poor accuracy. Image count filtering enables to remove such unreliable points from the cloud.

Expected overlap: In case of aerial photography the overlap requirement can be put in the following figures: 60% of side overlap + 80% of forward overlap.

Projection Accuracy: This criterion allows to filter out points which projections were relatively poorer localised due to their bigger size.

Reconstruction uncertainty: High reconstruction uncertainty is typical for points, reconstructed from nearby photos with small baseline. Such points can noticeably deviate from the object surface, introducing noise in the point cloud. While removal of such points should not affect the accuracy of optimization, it may be useful to remove them before building geometry in Point Cloud mode or for better visual appearance of the point cloud.

Reprojection error: High reprojection error usually indicates poor localization accuracy of the corresponding point projections at the point matching step. It is also typical for false matches. Removing such points can improve accuracy of the subsequent optimization step.

Now, some explanation.

Image Quality: The first stage in the SFM pipeline consists of extracting quality features, or key points, from images. It is therefore expected that images should be sharp, textured, and free of unwanted noise. Common image problems that should be avoided are: blur, lack of focus, noise, chromatic aberration, specularities, exposure variation, saturation, vignetting, and residual distortion. In other words, images should not be blurry, and all non-blurriness should come from recognizable and repeatable texture. PhotoScan matches images at different scales, forming image pyramids, to improve robustness with blurred or difficult-to-match images. The Accuracy parameter of the MatchPhotos method sets the minimum scale at which images are processed. Using blurred images produces a similar effect to lowering the Accuracy value.
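
A hedged sketch of scripting this check from the Python API: I am assuming the estimated value is stored under the "Image/Quality" meta key of each photo, which you should verify against your API reference (along with the estimateImageQuality signature). The 0.5 cut-off is the manual's suggestion.

Code:
import PhotoScan

chunk = PhotoScan.app.document.chunk
chunk.estimateImageQuality(chunk.cameras)

for camera in chunk.cameras:
    quality = float(camera.photo.meta["Image/Quality"])  # assumed meta key
    if quality < 0.5:
        camera.enabled = False   # exclude poorly focused images from processing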

Image Count and Effective Overlap: For a world point, the number of key points is equal to the number of cameras from which it is observed. This number is, by definition, the image count. When the image count of a world point increases, its uncertainty tends to decrease. That's why a 90%-overlap set of blurry images may give a low number of low quality points, whereas a 60%-overlap set of high quality images may give a high number of high quality points. Averaging the image counts over all world points gives an indicator of the effective overlap of the images. It is called effective because only useful key points are considered, not image areas. See Figure 1.
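
A hedged sketch of computing the image count per tie point, and the effective overlap as their mean, assuming the point_cloud/projections structures and attribute names from the 1.2 reference:

Code:
import PhotoScan
from collections import defaultdict

chunk = PhotoScan.app.document.chunk
point_cloud = chunk.point_cloud

# Count, for every track, in how many aligned photos it was observed.
counts = defaultdict(int)
for camera in chunk.cameras:
    if camera.transform is None:       # skip unaligned photos
        continue
    for proj in point_cloud.projections[camera]:
        counts[proj.track_id] += 1

image_counts = [counts[p.track_id] for p in point_cloud.points if p.valid]
print("effective overlap:", sum(image_counts) / len(image_counts))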

Projection Accuracy: Each detected key point has a coordinate in the image and a size. Given a camera and a key point in the image plane, the error in projecting that key point into the world can be viewed as a cone whose diameter is directly proportional to the key point size. Assuming that its coordinate projects towards the exact world point position, the size is directly related to projection accuracy. For a world point coming from multiple key points, its projection accuracy is calculated as the mean key point size. See Figure 2.
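
A back-of-the-envelope illustration of why key point size matters, using nothing but pinhole similar triangles: a key point of size s pixels, seen at depth Z with focal length f (in pixels), covers a world-space footprint of roughly s * Z / f. The focal length, depth and sizes below are made up.

Code:
f_px = 4000.0    # focal length in pixels (made-up value)
Z = 30.0         # distance to the surface in metres (made-up value)
for s_px in (2.0, 6.0, 12.0):
    footprint_cm = s_px * Z / f_px * 100
    print(f"key point size {s_px} px -> footprint ~{footprint_cm:.1f} cm")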

Reconstruction Uncertainty: When intersecting key point projections with differing directions and accuracies, an uncertainty volume surrounding the world point estimate can be defined using a contour surface containing equally uncertain points in the world. By approximating the uncertainty as Gaussian noise, the surface becomes an ellipsoid, represented by a 3D matrix calculated from a PCA approximation of the original uncertainty. This matrix already includes the information coming from all key points. We can then directly define the reconstruction uncertainty as the condition number of this matrix, calculated as the ratio of its largest and smallest eigenvalues. As the condition number gets larger, so does the reconstruction uncertainty. See Figure 3.
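
An illustrative numpy sketch of this definition: approximate the uncertainty of a world point by a 3x3 matrix (here a fabricated covariance) and take the ratio of its largest to smallest eigenvalue. PhotoScan builds the actual matrix from the geometry of the intersecting projections; this only shows the arithmetic.

Code:
import numpy as np

# A fabricated covariance for a point triangulated from a small baseline:
# large variance along the viewing direction, small across it.
cov = np.diag([0.10**2, 0.02**2, 0.02**2])

eigvals = np.linalg.eigvalsh(cov)        # eigenvalues in ascending order
uncertainty = eigvals[-1] / eigvals[0]   # condition number of the matrix
print("reconstruction uncertainty ~", round(uncertainty, 1))  # 25.0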

Reprojection Error: Let’s assume that we have, on the one hand, the estimated pose of a camera and the estimated world position of a point corresponding to one of the key points of this camera, and, on the other hand, a (u, v) coordinate for that key point in the image. We can reproject the imperfectly but optimally estimated world point into the image and compare its (x, y) coordinate to (u, v). The discrepancy, in the image plane, between these values is what is called the reprojection error. For a world point coming from multiple key points, the reprojection error is obtained by aggregating the reprojection errors of all its key points; we can then compute a maximum and an RMS value. Because key points are not actually points but rather areas in the image, there is a normalization issue involved: normalizing by key point size makes the error relative to key point sizes. See Figure 4.
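
An illustrative numpy sketch of the computation for one point seen by one camera, using a plain pinhole model without distortion; all numbers are made up, and the final line only mimics the size normalization described above.

Code:
import numpy as np

K = np.array([[4000.0,    0.0, 3000.0],
              [   0.0, 4000.0, 2000.0],
              [   0.0,    0.0,    1.0]])   # intrinsic matrix
R, t = np.eye(3), np.zeros(3)              # estimated camera pose
X = np.array([0.5, -0.3, 20.0])            # estimated world point

x = K @ (R @ X + t)
xy = x[:2] / x[2]                          # reprojected (x, y)

uv = np.array([3099.3, 1940.2])            # observed key point coordinate (u, v)
error = np.linalg.norm(xy - uv)            # reprojection error in pixels

keypoint_size = 4.0                        # made-up key point size
print("error:", error, "px, normalized:", error / keypoint_size)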

Acceptable Values

Should we use a higher number of key points and then filter them out, or look for a lower number of points and hope that optimization improves the result? What should the minimum image quality be? What about the other quality control metrics? This is a matter of much debate, and the most common answer probably is: it depends. Many answers are based on experience, and that’s fine by me, but I hope that this post helps enlighten that experience.

Camera Calibration

Camera calibration involves all intrinsic parameters, including distortion parameters. When running multiple projects with the same camera, its calibration parameters are re-estimated over and over, based on potentially very different scenes, thus providing dissimilar results. Repeating the same project may even give different calibration outputs.
To prevent these discrepancies from occurring, it is recommended to pre-calibrate the camera as well as you can and use these constant values as a fixed calibration for all your projects. This provides quality, consistency, and unity among all your projects. You can use Agisoft Lens for this purpose. Plus, using a fixed pattern provides more reliable values.
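
A hedged sketch of how this could be scripted, assuming the 1.2.x API names (Calibration.load, Sensor.user_calib, Sensor.fixed) and a calibration exported from Agisoft Lens as XML; the file path is hypothetical.

Code:
import PhotoScan

chunk = PhotoScan.app.document.chunk

calib = PhotoScan.Calibration()
calib.load("my_camera_calibration.xml")   # hypothetical pre-calibration file

for sensor in chunk.sensors:
    sensor.user_calib = calib   # use the pre-calibrated values
    sensor.fixed = True         # do not re-estimate them during alignment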

Questions

  • Is there a way to define an absolute quality metric for images? (i.e. not dependent on the whole image set, maybe by adding a reference image)
  • Could anyone give me pointers on how to implement reconstruction uncertainty using PhotoScan Python API?
  • Why is effective overlap computed using points marked as not valid, while mean key point size is not?
  • What are tracks and how do they relate to points?
  • Once a good scene structure has been found, there is plenty of new information for key point detection and matching. Nevertheless, all this information is discarded when repeating the process. Do you plan to change this behaviour in future releases?
  • Is camera referencing used in the matching stage, or is it only used as an initial estimate for camera alignment?
  • Why does Agisoft Lens provide fewer distortion parameters than Agisoft PhotoScan, when its process is more reliable?

References

Optimization workflow: http://www.agisoft.com/forum/index.php?topic=738.msg3821#msg3821
Image quality: http://www.agisoft.com/forum/index.php?topic=5325.msg26216#msg26216, http://www.agisoft.com/forum/index.php?topic=2179.msg11596#msg11596, http://www.agisoft.com/forum/index.php?topic=1924 (!)
Precision accuracy: http://www.agisoft.com/forum/index.php?topic=4149.msg22550#msg22550
Reconstruction uncertainty: http://www.agisoft.com/forum/index.php?topic=738.msg3575#msg3575, http://www.agisoft.com/forum/index.php?topic=2653.msg14014#msg14014
Acceptable values: http://www.agisoft.com/forum/index.php?topic=4279.msg21997#msg21997, http://www.agisoft.com/forum/index.php?topic=4513.msg22877#msg22877, http://www.agisoft.com/forum/index.php?topic=3559 (!)
Camera Calibration: http://www.agisoft.com/forum/index.php?topic=3747 (!)
Reference manual: http://www.agisoft.com/pdf/photoscan-pro_1_2_en.pdf
Python API: http://www.agisoft.com/pdf/photoscan_python_api_1_2_5.pdf
