
Photogrammetry Methods


The connection between the locations of objects in the world and their corresponding location in an image is described by photogrammetric relationships. The form of these relationships that we use is taken from Hartley and Zisserman [2003] and uses the concept of homogeneous coordinates. It is well described in several references but is summarized here for completeness and since many of the implementation details require this knowledge. Much of this content is taken from a UAV methods paper by Holman, Brodie and Spore [2017].

By convention, objects in the world are described by the 3D coordinates, [x, y, z] (cross-shore, longshore, vertical), while their image locations are described by the 2D coordinates, [U, V] (both are right-hand coordinate systems). In a homogeneous formulation, the two are related through a 3 by 4 projective transformation matrix, P, such that

$$\begin{bmatrix} U \\ V \\ 1 \end{bmatrix} \sim P \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} \qquad (1)$$

The normal 2D and 3D vectors are each augmented by an additional coordinate, set to the value of 1. Thus, for any particular world location, if P is known, the image location is found by the multiplication in equation (1). In homogeneous coordinates, the vector on the left is known only up to a multiplicative constant. That means the literal product of the multiplication in (1) will generally have a last component that is not 1, but dividing the vector through by that last value gives an equivalent result whose first two components are the image coordinates of the object. Thus, computation of image coordinates requires first the multiplication, then the normalization to make the last element equal to 1. The many benefits of the homogeneous form make up for the inconvenience of this second step.
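As an illustration of equation (1), a minimal numpy sketch (the matrix P below is a placeholder, not a real camera) might look like:

```python
import numpy as np

def world_to_image(P, xyz):
    """Project a world point [x, y, z] to image coordinates [U, V]
    using a 3x4 projective matrix P (equation 1)."""
    xyz1 = np.append(np.asarray(xyz, dtype=float), 1.0)  # augment with a 1
    uvw = P @ xyz1                                        # homogeneous image point
    return uvw[:2] / uvw[2]                               # normalize so the last element is 1

# Placeholder P for illustration only; a real P comes from equation (2).
P = np.array([[1000.0,    0.0, 640.0,  500.0],
              [   0.0, 1000.0, 360.0, -200.0],
              [   0.0,    0.0,   1.0,   10.0]])

print(world_to_image(P, [1.0, 2.0, 5.0]))   # pixel coordinates [U, V]
```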

The projective matrix is composed of three factor matrices,

$$P = K\,R\,[\,I \mid -C\,] \qquad (2)$$

K contains the intrinsic parameters of the camera, those that convert from angle away from the center of view into camera coordinates. R is the rotation matrix describing the 3D viewing direction of the camera compared to the world coordinate system. The final bracketed term is a 3 by 3 identity matrix, I, augmented by -C, where C is a 3 by 1 vector of the camera location in world coordinates. Taking the multiplication (equation 1) in steps, first multiplying the bracketed term in (2) by the object world coordinates subtracts the camera location from the object location, effectively putting the object in camera-centric coordinates. Then multiplying by the rotation matrix rotates into directions relative to the camera look direction. Finally, multiplying by K, the intrinsic matrix, converts into pixel units for the particular lens and sensing chip. The steps that convert from world to image coordinates are well illustrated in Hartley and Zisserman [2003] on pages 153-157.
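A minimal sketch of the composition in equation (2), again with placeholder values for K, R and C:

```python
import numpy as np

def build_P(K, R, C):
    """Compose the 3x4 projective matrix P = K R [I | -C] (equation 2).
    K: 3x3 intrinsic matrix; R: 3x3 rotation matrix;
    C: camera location in world coordinates (length 3)."""
    C = np.asarray(C, dtype=float).reshape(3, 1)
    IC = np.hstack([np.eye(3), -C])   # [I | -C]; multiplying by [x, y, z, 1]
                                      # subtracts the camera location
    return K @ R @ IC

# Placeholder values only; a real K comes from lens calibration (equation 3)
# and a real R from the camera azimuth, tilt and roll.
K = np.array([[1200.0,    0.0, 640.0],
              [   0.0, 1200.0, 360.0],
              [   0.0,    0.0,   1.0]])
R = np.eye(3)
C = [0.0, 0.0, 10.0]

P = build_P(K, R, C)
```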

The intrinsic parameter matrix, K, is a function of the camera lens and chip and is not a function of the specific installation, i.e. the camera location and viewing angles. As a consequence, the parameters in K are found during a lens calibration prior to camera installation. We use the excellent Caltech calibration package (http://www.vision.caltech.edu/bouguetj/calib_doc/). Lens calibration is described here.

$$K = \begin{bmatrix} f_U & s & U_0 \\ 0 & f_V & V_0 \\ 0 & 0 & 1 \end{bmatrix} \qquad (3)$$

Here fU and fV are the focal lengths in the U and V directions, expressed in pixels, U0 and V0 are the coordinates of the principal point (geometric image center), and s is the image skewness (cosine of the angle between the U and V axes) and is assumed to be 0.0. K has 5 degrees of freedom (DOF) with values returned during the calibration process. Because the number of degrees of freedom will be important to the following discussions, we will use numbers rather than words to enumerate them.
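For example, assuming calibration output in the Caltech toolbox's fc/cc/alpha_c form (check these variable names against your own calibration results), K could be assembled as:

```python
import numpy as np

def intrinsic_matrix(fc, cc, alpha_c=0.0):
    """Assemble K (equation 3) from Caltech-toolbox-style calibration output:
    fc = [fU, fV] focal lengths in pixels, cc = [U0, V0] principal point,
    alpha_c = skew coefficient (assumed 0, so the skew entry vanishes)."""
    return np.array([[fc[0], alpha_c * fc[0], cc[0]],
                     [  0.0,          fc[1], cc[1]],
                     [  0.0,            0.0,   1.0]])

# Example numbers only, not from a real calibration.
K = intrinsic_matrix(fc=[1210.3, 1208.7], cc=[635.2, 358.9])
```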

Note that the lens calibration process also computes estimates of lens distortion parameters, used to convert between image locations from the camera and those that would have been returned by a perfect camera with no lens distortion. Some cameras, such as those with fisheye lenses, exhibit severe barrel distortion, for example a highly curved horizon that must be corrected for. But even fairly accurate lenses require calibration and distortion removal. This process is always used but is not described further in the discussion below (see the Caltech toolbox).
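Purely for orientation, here is a sketch of the radial-tangential distortion model used by that toolbox, written for normalized camera coordinates and assuming the kc = [k1, k2, p1, p2, k3] parameter ordering; in practice the toolbox routines themselves should be used:

```python
import numpy as np

def distort(xn, yn, kc):
    """Apply a Bouguet/OpenCV-style radial-tangential distortion model to
    normalized (undistorted) camera coordinates. kc = [k1, k2, p1, p2, k3]."""
    k1, k2, p1, p2, k3 = kc
    r2 = xn**2 + yn**2
    radial = 1 + k1*r2 + k2*r2**2 + k3*r2**3
    xd = xn*radial + 2*p1*xn*yn + p2*(r2 + 2*xn**2)
    yd = yn*radial + p1*(r2 + 2*yn**2) + 2*p2*xn*yn
    return xd, yd

def undistort(xd, yd, kc, n_iter=20):
    """Invert the model (distorted -> undistorted) by fixed-point iteration,
    since the closed form only maps undistorted -> distorted."""
    xn, yn = xd, yd
    for _ in range(n_iter):
        xt, yt = distort(xn, yn, kc)
        xn, yn = xn - (xt - xd), yn - (yt - yd)
    return xn, yn
```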

The rotation matrix, R, represents the 3D rotation between world and camera coordinates. There are 3 degrees of freedom, the azimuth (taken here as the compass-like rotation clockwise from the positive y-axis), the tilt (zero at nadir, rising to 90° at the horizon), and roll (rotation about the look direction, positive in the counter-clockwise direction as viewed from the camera). The details can be found on page 612 in Wolf [1983].
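One way to assemble R from the three angles is to compose elementary rotations. The sketch below is illustrative only; the axis order and sign conventions must be matched to Wolf [1983] and the angle definitions above before real use:

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0,   c,  -s],
                     [0.0,   s,   c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[  c,  -s, 0.0],
                     [  s,   c, 0.0],
                     [0.0, 0.0, 1.0]])

def angles_to_R(azimuth, tilt, roll):
    """One plausible composition of azimuth, tilt and roll (radians) into a
    world-to-camera rotation matrix; conventions are assumed, not confirmed."""
    return rot_z(roll) @ rot_x(tilt) @ rot_z(azimuth)
```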

Finally, the camera location, C, has 3 degrees of freedom, its 3D world location.

Thus, there are 11 total unknowns, of which 5 can be solved for during calibration and 6 must be found after camera placement (the 3 coordinates of the camera location and the 3 rotation angles). In general, these values are found using Ground Control Points (GCPs), points whose world coordinates are known by survey and whose image coordinates can be accurately digitized from an image. Combining equations (1) and (2) and applying them to a set of such points, the only unknowns are the 6 camera parameters, so these can be found by a standard nonlinear solver (comparing measured and predicted image coordinates for a guess at the 6 unknowns, then searching for the optimum values that minimize the sum of squared differences).
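A sketch of such a fit, reusing world_to_image, build_P and angles_to_R from the sketches above together with scipy's least_squares (GCP coordinates, a calibrated K, and an initial guess are assumed available):

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(params, K, gcp_xyz, gcp_uv):
    """Reprojection error for a guess at the 6 extrinsic unknowns:
    params = [xc, yc, zc, azimuth, tilt, roll] (angles in radians).
    gcp_xyz is N x 3 surveyed world coordinates; gcp_uv is N x 2 digitized
    image coordinates."""
    C, angles = params[:3], params[3:]
    P = build_P(K, angles_to_R(*angles), C)
    predicted = np.array([world_to_image(P, xyz) for xyz in gcp_xyz])
    return (predicted - gcp_uv).ravel()

def solve_extrinsics(K, gcp_xyz, gcp_uv, guess):
    """Least-squares fit of camera location and viewing angles to the GCPs."""
    fit = least_squares(residuals, guess, args=(K, gcp_xyz, gcp_uv))
    return fit.x   # [xc, yc, zc, azimuth, tilt, roll]
```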

Since there are 6 unknowns, we need at least 6 knowns for a solution. Each control point contributes 2 values (U and V coordinates), so at least three points are needed. We prefer to be over-determined, so we use at least four points in the following tests. For terrestrial applications it is typically easy to find or place an abundance of GCPs throughout the view to allow solution of camera extrinsic geometries. This is at the heart of Structure from Motion algorithms like those from Agisoft or Pix4D. However, surf zone images usually contain only a minimal amount of land by design, so GCP options are often limited and poorly distributed over the image, often lying in a line along the dune crest, a configuration that makes the inverse solution ill-posed. For these cases, common for nearshore studies, we must rely on alternate sources of information to reduce the number of degrees of freedom and the requirements on GCP layout.

For a fixed station, the locations of each camera are usually known by survey so that only the 3 rotation angles are still unknown. These can be solved by nonlinear fit to at least two, but preferably more GCPs.
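This is the same fit as the sketch above with the surveyed camera location held fixed; assuming the same helper functions, for example:

```python
import numpy as np
from scipy.optimize import least_squares

def residuals_angles_only(angles, K, C_known, gcp_xyz, gcp_uv):
    """Reprojection error with the surveyed camera location C_known held
    fixed, so only azimuth, tilt and roll remain free."""
    P = build_P(K, angles_to_R(*angles), C_known)
    predicted = np.array([world_to_image(P, xyz) for xyz in gcp_xyz])
    return (predicted - gcp_uv).ravel()

# angles = least_squares(residuals_angles_only, angle_guess,
#                        args=(K, C_known, gcp_xyz, gcp_uv)).x
```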

For a UAV, discussed in Holman, Brodie and Spore [2017], it is rare to have sufficiently accurate information about the azimuth and tilt of an airborne camera, so these variables almost always must be solved for. However, the camera location is often available in the imagery, based on an onboard GPS system, and can be extracted, for example using exiftool or other image information packages. Vertical position is also often returned in the image metadata and could be used if no better GCPs are available. However, it is usually less accurate than the horizontal position data. For example, altitude may be expressed relative to the takeoff point, rather than in a global coordinate system, or it may be a low-quality uncorrected GPS measurement. Finally, for a well-stabilized gimbal such as that on the Phantom 3, it is reasonable to assume that roll is stable and can be taken as approximately 0°. Thus it is possible to reduce the problem to as few as two unknowns, which can be solved in a least squares sense with just two GCPs anywhere in the image, or exactly (without least squares) with just a single GCP. The relative accuracy of these alternate assumptions is tested in the Holman et al. UAV paper.
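For example, the standard GPS tags can be read with exiftool; tag names beyond this basic set (such as a takeoff-referenced relative altitude on DJI aircraft) vary by platform and firmware, so treat them as platform-dependent:

```python
import json
import subprocess

def camera_gps(image_path):
    """Read camera GPS position from image metadata with exiftool
    (-n requests numeric values, -json requests JSON output)."""
    out = subprocess.run(
        ["exiftool", "-n", "-json",
         "-GPSLatitude", "-GPSLongitude", "-GPSAltitude", image_path],
        capture_output=True, text=True, check=True)
    return json.loads(out.stdout)[0]

# e.g. camera_gps("DJI_0001.JPG") -> {"SourceFile": ..., "GPSLatitude": ..., ...}
```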
