-
Notifications
You must be signed in to change notification settings - Fork 12
Lens Calibration Notes
In order to compute an LEDs real world position relative to a web camera we need a mathematical model that describes how a light ray travels from a point in the world through the camera lens and arrives at cell in the image sensor (typically a CCD chip a few mm in size). For most web cameras the pinhole camera model is used, which is a function of "focal length" and "distortion".
The "focal length" of a camera is used to relate the size of an object in the real world to the size of it's projection on the camera image sensor. The focal length of a camera is inversely proportional to it's field of view. Thus given the size of an object's projection (image size in pixels), the size of each cell in the camera CCD (in mm), and the focal length (in mm) you can solve for it size in the real world. If you know how big the object actually is (like a PSMove Controller bulb) then you can solve for distance from the camera.
However camera lenses aren't perfect and will distort the image for light rays more as they approach the edge of the lens. The amount of distortion is pretty slight for most web cameras, but it's enough that it can through of positional estimates enough that's it's worth accounting for. There are two types of distortion that can occur: radial and tangential.
Radial distortion warps the image symmetrically around the center of the camera lens. It is typically the only kind of distortion we see in most web cameras:
Tangential distortion is caused by the image sensor not being parallel to the alignment of the camera lens. This is usually not common.
If you have the full hardware spec sheet for a camera it's possible to directly calculate the focal length. This has been done in the past for the PS3Eye camera. I've personally never seen any information posted about lens properties though. In practice focal length and distortion calibration has been done using a calibration app. The OpenCV computer vision library has calibrateCamera function that uses several images of a checkerboard pattern of a known size. Since the size of the squares in the calibration mat is known the focal length can be computed. And since the grid is straight we can measure the amount of distortion we see. It's basically a non-linear optimization problem turning the focal length and distortion parameters to best fit the measured data. You can see the process in action here:
I've implemented this calibration process in the PSVRConfigTool here. You input the size of the grid square (in mm) and then take 20 snapshots of the grid pattern from various angles. It's probably apparent from the above video that this can be a very error prone process because it's so sensitive to quality of the samples that the user takes. Because of this I recommend building a database of calibrations for known cameras rather than making users also calibrate from the start. If they really want to add support for a camera they can. Also I've noticed that the calibrateCamera() method has only returned consistent and reasonable calibration results with the following flags:
- cv::CALIB_FIX_ASPECT_RATIO = The functions considers only Y focal length as a free parameter
- cv::CALIB_FIX_PRINCIPAL_POINT = The principal point is not changed during the global optimization
- cv::CALIB_ZERO_TANGENT_DIST = Tangential distortion coefficients (p1,p2) are set to zeros and stay zero
- cv::CALIB_RATIONAL_MODEL = Radial distortion coefficients k4, k5, and k6 are enabled
- cv::CALIB_FIX_K3 = Radial distortion coefficients k3 is not changed during the optimization
- cv::CALIB_FIX_K4 = Radial distortion coefficients k4 is not changed during the optimization
- cv::CALIB_FIX_K5 = = Radial distortion coefficients k5 is not changed during the optimization
In PSVRTracker all of this camera lens calibration info is stored in the resources\supported_trackers
folder. Here is what the calibration file looks like for the Sony_PS3EYE_MonoCamera.json
:
{
"version": 1,
"friendly_name": "Sony PS3EYE Camera",
"usb_product_id": 8192,
"usb_vendor_id": 5141,
"supported_modes":[
{
"mode_name": "640x480(60FPS)",
"is_frame_mirrored": false,
"is_buffer_mirrored": false,
"frame_rate": 60.0,
"frame_width": 640,
"frame_height": 480,
"buffer_format": "BEYER",
"intrinsics_type": "mono",
"hfov": 60.0,
"vfov": 45.0,
"zNear": 10.0,
"zFar": 200.0,
"camera_matrix": [ 604.1783809256024, 0.0, 320.0, 0.0, 604.1783809256024, 240.0, 0.0, 0.0, 1.0 ],
"distortion.k1": 0.015861621530855057,
"distortion.k2": 0.09840206266336657,
"distortion.k3": 0.0,
"distortion.k4": 0.0,
"distortion.k5": 0.0,
"distortion.k6": 0.4512905272272733,
"distortion.p1": 0.0,
"distortion.p2": 0.0
}
]
}
As mentioned earlier, if you have two lenses in stereo camera configuration and either the scene you are capturing is static OR if the left and right video frames are captured at the same instant, you can use the principal of image disparity to compute the depth of every pixel in the stereo image. This is done by comparing the location of a pixel in the left image with it's corresponding pixel in the right image. But for a given pixel in the left image, how do you find it's corresponding pixel in the right image?
If you know the position and rotation of the right camera relative to the left camera you can use epipolar geometry to find corresponding pixel. Imagine a chopstick pointed directly at the left camera such that it appears as a dot in the left image. On the right camera it will appear as a tilted line because it's not directly facing the right camera. The line that the chopstick forms in the right camera is called the "epipolar line" of the point in the left image. If you know position and rotation of the right camera relative to the left camera (also called the "Fundamental Matrix" or 'F') you can compute this epipolar line.
So to find a corresponding pixel in the right image you simply need to search for a matching feature (color, edge, etc) along the epipolar line for the left pixel. It turns out that when you have two perfectly horizontally aligned cameras the epipolar line in the right image is a horizontal line at the same y-coordinate as the pixel in the left image. The left and right images are said to be "rectified" in this case. Once we find the corresponding pixel in the right image the difference in the x positions left and right image combined with the focal length of the lenses allows you to compute the distance of that pixel from the camera:
You can actually combine the Fundamental Matrix with the camera lens properties into a "Stereo Reprojection Matrix" or 'Q'. This 'Q' matrix combined with the disparity you calculated for a pixel can be used to calculate the 3D location of the point that corresponds to the projected pixel location. If you are curious, the is a derivation of the 'Q' matrix and it's usage here. However the 'Q' matrix can be computed by OpenCV's stereoRectify function. An example of taking the disparity of two pixels and computing the 3d position can be found in PSVRTracker's triangulate_stereo_projection function.
As was the case for monocular web cameras, OpenCV has a function for computing part of the calibration parameters for a stereo camera. The stereoCalibrate function uses several images of the same checkerboard pattern of a known size.
I've implemented the stereo calibration process in the PSVRConfigTool here. You input the size of the grid square (in mm) and then take 20 snapshots of the grid pattern from various angles, making sure that the pattern if fully visible to both left and right cameras. It's probably apparent from the above video that this can be a very error prone process because it's so sensitive to quality of the samples that the user takes. Because of this I recommend building a database of calibrations for known stereo cameras rather than making users also calibrate from the start. If they really want to add support for a camera they can. Also I've noticed that the stereoCalibrate() method has only returned consistent and reasonable calibration results with the following flags:
- cv::CALIB_FIX_ASPECT_RATIO = The functions considers only Y focal length as a free parameter
- cv::CALIB_SAME_FOCAL_LENGTH = Assume both left and right camera have the same focal lengths
- cv::CALIB_ZERO_TANGENT_DIST = Tangential distortion coefficients (p1,p2) are set to zeros and stay zero
- cv::CALIB_RATIONAL_MODEL = Radial distortion coefficients k4, k5, and k6 are enabled
- cv::CALIB_FIX_K3 = Radial distortion coefficients k3 is not changed during the optimization
- cv::CALIB_FIX_K4 = Radial distortion coefficients k4 is not changed during the optimization
- cv::CALIB_FIX_K5 = = Radial distortion coefficients k5 is not changed during the optimization
The stereoCalibrate() will compute the focal lengths, distortion coefficients and the transform of the right camera relative to the left camera. But there is one more step needed. Recall earlier that we wanted "rectified stereo images" so that the epipolar lines were horizontal and lined up in both images? The stereoRectify is used to compute additional rectification transforms (rotation and translation) for points on the left and right images. This function will also compute the Stereo Reprojection Matrix 'Q' mentioned earlier that is used to take a disparity measurement and compute a 3d point.
In PSVRTracker all of this stereo camera lens calibration info is stored in the resources\supported_trackers
folder. Here is what the calibration file looks like for the Sony_PS4_StereoCamera.json
:
{
"version": 1,
"friendly_name": "Sony PS4 USB 3.0 Stereo Camera",
"usb_product_id": 1418,
"usb_vendor_id": 1449,
"supported_modes":[
{
"mode_name": "1748x408(60FPS)",
"is_frame_mirrored": true,
"is_buffer_mirrored": false,
"frame_rate": 60.0,
"frame_width": 640,
"frame_height": 400,
"buffer_pixel_width": 1748,
"buffer_pixel_height": 408,
"frame_sections": [{"x":48,"y":0},{"x":688,"y":0}],
"buffer_format": "YUY2",
"intrinsics_type": "stereo",
"hfov": 60.0,
"vfov": 45.0,
"zNear": 10.0,
"zFar": 200.0,
"left_camera_matrix": [ 697.9107197227819, 0.0, 253.93118993999929, 0.0, 697.9107197227819, 38.333404331472714, 0.0, 0.0, 1.0 ],
"right_camera_matrix": [ 697.9107197227819, 0.0, 150.66790851956384, 0.0, 697.9107197227819, -4.4209951004440615, 0.0, 0.0, 1.0 ],
"left_distortion_cofficients.k1": -0.2480108519723502,
"left_distortion_cofficients.k2": -0.18718527831596735,
"left_distortion_cofficients.k3": 0.1478780085676167,
"left_distortion_cofficients.p1": 0.0,
"left_distortion_cofficients.p2": 0.0,
"right_distortion_cofficients.k1": -0.3106652390551932,
"right_distortion_cofficients.k2": 0.10730969655612807,
"right_distortion_cofficients.k3": -0.033731268270381344,
"right_distortion_cofficients.p1": 0.0,
"right_distortion_cofficients.p2": 0.0,
"left_rectification_rotation": [ 0.9988758153367525, 0.046224961515971955, 0.010505163881860916, -0.04648084111511802, 0.9985915852983812, 0.025580797104410816, -0.009307896892907522, -0.026040328417913445, 0.9996175590451172 ],
"right_rectification_rotation": [ 0.9898104831718865, -0.0064323878398521135, -0.1422456740632716, 0.0027483698571475223, 0.9996560729879099, -0.02608030294146847, 0.14236451055710864, 0.025423613532858953, 0.9894877391907223 ],
"left_rectification_projection": [ 626.6889475904168, 0.0, 241.70604515075684, 0.0, 0.0, 626.6889475904168, -14.245504140853882, 0.0, 0.0, 0.0, 1.0, 0.0 ],
"right_rectification_projection": [ 626.6889475904168, 0.0, 241.70604515075684, -39408.1200582833, 0.0, 626.6889475904168, -14.245504140853882, 0.0, 0.0, 0.0, 1.0, 0.0 ],
"rotation_between_cameras": [ 0.9872448926790524, 0.04479123189539892, 0.1527784913671389, -0.05312665211981772, 0.9972887665518325, 0.05091831639070963, -0.15008357909400416, -0.05838545756279675, 0.9869478494994133 ],
"translation_between_cameras": [ -62.242314158823326, 0.40448824449349574, 8.944843566828835 ],
"essential_matrix": [ 0.4145025490060706, -8.944208238995348, -0.05624757175054573, -0.5107981528847709, -3.233395429460345, 62.79649781257273, 2.907396817923292, -62.091678241542645, -3.2310709489983704 ],
"fundamental_matrix": [ -1.5767063911319926e-06, 3.402244528496942e-05, -0.000754498027723598, 1.9430006260830975e-06, 1.2299357991674101e-05, -0.1676738143056787, -0.007472260198639971, 0.15976611514786102, 1.0 ],
"reprojection_matrix": [ 1.0, 0.0, 0.0, -241.70604515075684, 0.0, 1.0, 0.0, 14.245504140853882, 0.0, 0.0, 0.0, 626.6889475904168, 0.0, 0.0, 0.015902533454109576, -0.0 ]
}
]
}
The calibrateCamera() does a pretty decent job at computing radial distortion coefficients, but I've had a hard time getting completely accurate focal length. It gets close, but when you try use the computed focal length to estimate a distance to a known sized object I'll see errors of around a few cm at a distance of about a meter from the camera. If you are just using a single camera for tracking this isn't too much of a problem, but can become a problem when you use multiple cameras of different types for triangulation (more on that later). I think it might be better to have a separate focal length tool that measures a PSMove controller bulb (known size) at regular known distances from the camera being calibrated (along a ruler pointed away from the camera).