When you submit your work, follow the instructions on the submission workflow page scrupulously for full credit.
Instead of handing out sample exams in preparation for the midterm and final, homework assignments include several problems that are marked Exam Style. Their format and contents are similar to those of exam problems, and they are graded like the other problems. Please use these problems as examples of what may come up in an exam.
As a further exercise in preparation for exams, you may want to think of other plausible problems with a similar style. If you collect these problems in a separate notebook as the course progresses, you will have a useful sample you can use to rehearse for the exams.
Is this assignment properly formatted in the PDF submission, and did you tell Gradescope accurately where each answer is?
This question requires no explicit answer from you, and no space is provided for an answer in the template file.
However, the rubric entry for this problem in Gradescope will be used to deduct points for poor formatting of the assignment (PDF is not readable, code or text is clipped, math is not typeset, ...) and for incorrect indications in Gradescope of the location of each answer.
Sorry we have to do this, but poor formatting and bad answer links in Gradescope make our grading much harder.
Given an integer $k$ with $0 \leq k < 8$, let $\mathbf{P}_k = (X_k, Y_k, Z_k)^T$ be the bits in the binary representation of $k$, listed from most significant to least significant. Thus, $\mathbf{P}_0 = (0, 0, 0)^T, \mathbf{P}_1 = (0, 0, 1)^T, \ldots, \mathbf{P}_7 = (1, 1, 1)^T$. All world coordinates are in meters.
The eight points $\mathbf{P}_k$ are the corners of a cube in a world reference system that has origin $\mathbf{P}_0$ and unit axes $\mathbf{P}_4$, $\mathbf{P}_2$, $\mathbf{P}_1$. This reference system is system number 0.
The $Z$ axis of the world reference system is vertical and points up. That is, it points in the direction opposite to gravity.
Consider now an ideal pinhole camera with camera reference system number 1. The center of projection is at point $\mathbf{t}_1 = (10.5, 11.5, 2.5)^T$ in the world reference frame. The camera fixates the centroid of the cube. This means that the centroid $\mathbf{C} = (1/2, 1/2, 1/2)^T$ projects to the principal point of the image.
In addition, the $X_1$ axis of this reference system is horizontal relative to gravity. The $X_1$ axis points to the right when looking along the positive direction of the $Z_1$ axis.
All coordinates in the camera reference system are in meters.
Give the numerical values of the rotation matrix $R_1$ of the camera reference system written in the world reference system. Show your derivation and the result with three decimal digits after the period.
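If you want to sanity-check your hand derivation numerically, a sketch like the following builds a candidate $R_1$. It assumes that the columns of $R_1$ are the camera axes expressed in world coordinates and that the camera frame is right-handed (so $Y_1 = Z_1 \times X_1$); check these assumptions against the class notes before relying on the output.

import numpy as np

# Sketch for checking the derivation of R_1 numerically (assumptions noted above).
t1 = np.array([10.5, 11.5, 2.5])            # center of projection, world coordinates
centroid = np.array([0.5, 0.5, 0.5])        # fixated point, world coordinates
z1 = centroid - t1
z1 /= np.linalg.norm(z1)                    # Z_1 points from the camera toward the centroid
x1 = np.cross(z1, np.array([0., 0., 1.]))   # horizontal, to the right of the viewing direction
x1 /= np.linalg.norm(x1)
y1 = np.cross(z1, x1)                       # completes a right-handed frame
R1 = np.column_stack((x1, y1, z1))
print(np.round(R1, 3))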
The pixels in the camera are square and are arranged in an array of height $4.5$ millimeters and width $6$ millimeters with 1200 columns. The focal distance is $f = 35$ millimeters. The principal point is exactly in the center of the image.
Compute the image (column, row) coordinates in pixels of the vertex $\mathbf{P}_5$ of the cube. Show your derivation and give the approximate numerical result with three decimal digits after the period.
It's (column, row), not (row, column).
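As a further numerical check, the sketch below projects a world point to pixel (column, row) coordinates. It reuses the R1 and t1 from the sketch above and assumes square pixels of side $6\,\text{mm}/1200 = 0.005$ mm (hence a $900 \times 1200$ array), the principal point at (600, 450), and that column and row indices grow with the camera $X_1$ and $Y_1$ axes, respectively; adjust these conventions to match the class notes.

# Sanity-check sketch only; R1 and t1 come from the previous sketch.
f = 0.035                                    # focal distance, meters
s = 0.006 / 1200                             # pixel side, meters (6 mm / 1200 columns)
pp = np.array([600., 450.])                  # principal point, (column, row)

P5 = np.array([1., 0., 1.])                  # cube vertex P_5
Pc = R1.T @ (P5 - t1)                        # P_5 in camera coordinates
x, y = f * Pc[0] / Pc[2], f * Pc[1] / Pc[2]  # image-plane coordinates, meters
col, row = pp[0] + x / s, pp[1] + y / s      # pixel coordinates, (column, row)
print(np.round([col, row], 3))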
Write code that computes and displays the image of the eight corners of the cube in the scenario above. Use matplotlib.pyplot.text to place the number of each point at its image position. For instance, instead of plotting the image of $\mathbf{P}_5$ as a dot, put the number 5 there. Pass the arguments ha='center', va='center', fontsize=18 to matplotlib.pyplot.text for proper positioning and visibility.
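A minimal sketch of the display part, assuming a helper project(P) that returns the (column, row) pixel coordinates of a world point as in the previous problem (the helper name is hypothetical):

import numpy as np
import matplotlib.pyplot as plt

corners = [np.array([(k >> 2) & 1, (k >> 1) & 1, k & 1], dtype=float) for k in range(8)]
plt.figure()
for k, P in enumerate(corners):
    col, row = project(P)                    # hypothetical helper from the previous part
    plt.text(col, row, str(k), ha='center', va='center', fontsize=18)
plt.xlim(0, 1200)
plt.ylim(900, 0)                             # flip the vertical axis so rows grow downward
plt.xlabel('column')
plt.ylabel('row')
plt.show()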
A certain camera pair $(a, b)$ has essential matrix
$$ E = \left[\begin{array}{rrr} 0 & 1 & 2 \\ -1 & 0 & -1 \\ -2 & 1 & 0 \end{array}\right] $$
and the focal lengths of the cameras are 3 units of measure. All lengths are in the same unit of measure.
Give one rotation matrix $R$ and all possible translation vectors $\mathbf{t}$ consistent with $R$ and the essential matrix $E$ given above. Justify your answer.
$R$ and $\mathbf{t}$ are abbreviations for ${^a\!R_b}$ and ${^a\mathbf{t}_b}$.
Does the format of $E$ look familiar?
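A quick numerical check of a candidate answer is sketched below. It assumes the convention that $E$ equals $[\mathbf{t}]_\times R$ up to a nonzero scale factor; confirm the exact definition in the class notes before using it.

import numpy as np

E = np.array([[0., 1., 2.], [-1., 0., -1.], [-2., 1., 0.]])

def cross_matrix(t):
    # [t]_x, so that cross_matrix(t) @ v equals np.cross(t, v)
    return np.array([[0., -t[2], t[1]], [t[2], 0., -t[0]], [-t[1], t[0], 0.]])

def compatible(R, t, E):
    # (R, t) is compatible with E if [t]_x R equals E up to a nonzero scale factor
    M = cross_matrix(t) @ R
    scale = np.sum(E * M) / np.sum(M * M)
    return np.allclose(scale * M, E)

# Example use: compatible(R_candidate, t_candidate, E) for your hand-derived answer.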
Give the epipoles ${^a\mathbf{e}_b}$ and ${^b\mathbf{e}_a}$ for that camera pair, expressed in canonical image coordinates (not in canonical camera coordinates).
Remember that ${^a\mathbf{e}_b}$ is a point in image $a$.
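Each epipole spans a null space of $E$; which one spans the left null space and which the right depends on how the class notes orient the epipolar constraint, so the sketch below simply extracts both. It assumes that a canonical image point has third homogeneous coordinate equal to the focal distance, here $f = 3$.

import numpy as np

E = np.array([[0., 1., 2.], [-1., 0., -1.], [-2., 1., 0.]])
f = 3.0

U, _, Vt = np.linalg.svd(E)
left_null, right_null = U[:, 2], Vt[2, :]    # null vectors of E^T and of E

def to_image(v, f):
    # Rescale a homogeneous direction so its third coordinate equals f and
    # read off the first two entries as canonical image coordinates.
    return f * v[:2] / v[2]

print(to_image(left_null, f), to_image(right_null, f))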
To simplify the mechanics of writing algebra in LaTeX, let $(x_a, y_a)$ be canonical image coordinates for camera $a$ in the camera pair with the essential matrix $E$ given above, and let $(x_b, y_b)$ be canonical image coordinates for camera $b$. In other words, we use right subscripts to denote the reference system, rather than left superscripts. This should not be confusing, since we won't need to refer to multiple points in each image.
Let $\ell_a$ be an epipolar line in image $a$ with equation
$$ u_a x_a + v_a y_a + w_a f = 0 $$
where $f$ is the focal distance common to the two cameras.
Give an equation for the epipolar line $\ell_b$ in image $b$ that corresponds to $\ell_a$. That is, if $\ell_b$ has equation
$$ u_b x_b + v_b y_b + w_b f = 0\;, $$
what are the values of $u_b, v_b, w_b$ in terms of $u_a, v_a, w_a$ for the specific values of $E$ and $f$ given above?
You may assume that $v_a \neq 0$, that is, that $\ell_a$ is not vertical.
Show your reasoning.
What would an equation for $\ell_b$ be if you knew the canonical image coordinates $(x_a, y_a)$ of any point on $\ell_a$ other than the epipole? Can you come up with such a point given $\ell_a$?
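As a numerical sanity check for this part, the sketch below picks a point of $\ell_a$ and maps it to a line in image $b$. It assumes the epipolar constraint is written as $(x_a, y_a, f)\, E\, (x_b, y_b, f)^T = 0$; if the class notes orient the constraint the other way, replace $E$ by $E^T$.

import numpy as np

E = np.array([[0., 1., 2.], [-1., 0., -1.], [-2., 1., 0.]])
f = 3.0

def transfer_line(ua, va, wa, E, f):
    # Point of ell_a with x_a = 0, valid since v_a != 0; if it happens to be the
    # epipole, pick a different point on the line instead.
    pa = np.array([0.0, -wa * f / va, f])
    ub, vb, wb = pa @ E                      # coefficients of ell_b under the assumed convention
    return ub, vb, wb

print(transfer_line(1.0, 2.0, 3.0, E, f))    # example coefficients for ell_a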
The camera reference system of camera 0 in a certain camera pair is chosen as the world reference system.
The camera reference system of camera 1 in the pair has its origin and unit axes identified by the transformation
$$ \mathbf{t} = \left[\begin{array}{c} 3\\4\\0 \end{array}\right] \;\;\;\text{and}\;\;\; R = \frac{1}{2} \left[\begin{array}{rrr} \sqrt{3} & 1 & 0 \\ -1 & \sqrt{3} & 0 \\ 0 & 0 & 2 \end{array}\right] $$
where $\mathbf{t}$ and $R$ are abbreviations for $\mathbf{t}_1$ and $R_1$.
Write an essential matrix $E$ for this camera pair with all entries given as exact algebraic expressions. Then also write $E$ with entries that are approximate numerical values with three digits after the decimal period.
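To double-check the algebra numerically, a sketch like the one below multiplies the factors. The formula $E = [\mathbf{t}]_\times R$ used here is only one common convention and is an assumption; substitute whatever definition the class notes give (for example $R\,[\mathbf{t}]_\times$, or a transposed form).

import numpy as np

t = np.array([3., 4., 0.])
R = 0.5 * np.array([[np.sqrt(3), 1., 0.],
                    [-1., np.sqrt(3), 0.],
                    [0., 0., 2.]])
tx = np.array([[0., -t[2], t[1]],
               [t[2], 0., -t[0]],
               [-t[1], t[0], 0.]])           # [t]_x
E = tx @ R                                   # assumed convention; adjust to the class notes
print(np.round(E, 3))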
The focal distance of both cameras in the pair above is $f = 50$ millimeters. A point in the world projects to points $\mathbf{p}$ and $\mathbf{q}$ in the images taken by the two cameras. The point in image 0, expressed in coordinates of its own camera canonical image reference system, is
$$ \mathbf{p} = (20, 10) \;\;\;\text{millimeters.} $$
Write an equation of the epipolar line $\ell$ of $\mathbf{p}$ in image 1 in the form
$$ ax + by + c = 0 $$
where $a, b, c$ are given first as exact algebraic expressions and then as approximate numerical values with three digits after the period.
Consider the points in image 1 of the pair above that have coordinates
$$ \mathbf{q}_1 = \left[\begin{array}{c} 150 \sqrt{3} - 200 \\ 200 \frac{\sqrt{3}}{3} + 50 \end{array}\right] \approx \left[\begin{array}{c} 59.808 \\ 165.470 \end{array}\right] \;\;\;\text{and}\;\;\; \mathbf{q}_2 = \left[\begin{array}{c} 200 \frac{\sqrt{3}}{3} + 50 \\ 150 \sqrt{3} - 200 \end{array}\right] \approx \left[\begin{array}{c} 165.470 \\ 59.808 \end{array}\right] $$
in millimeters and in the canonical image reference frame of camera 1.
For each of these two points, state whether it is possible for it to correspond to the point $\mathbf{p}$ in image 0 whose coordinates were given in the previous problem. Justify your answer.
If you determined that it is possible for a point (either one or both) to correspond to $\mathbf{p}$, does that point correspond to $\mathbf{p}$ with certainty? Justify your answer.
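One way to check geometric consistency numerically is sketched below; it assumes the coefficients $a, b, c$ of the epipolar line $\ell$ from the previous problem are available, in whatever convention the class notes specify.

import numpy as np

def on_line(q, a, b, c, tol=1e-6):
    # A point can correspond to p only if it lies (numerically) on p's epipolar line.
    return abs(a * q[0] + b * q[1] + c) < tol * np.hypot(a, b)

q1 = np.array([150 * np.sqrt(3) - 200, 200 * np.sqrt(3) / 3 + 50])
q2 = q1[::-1]
# Example use: on_line(q1, a, b, c) and on_line(q2, a, b, c) with your a, b, c.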
Two images of a certain scene were taken with two identical cameras with the following parameters:
A set of $n = 108$ pairs of corresponding points were found in the two images, and the coordinates of these points in the pixel reference system of each of the cameras can be retrieved as follows:
import urllib.request
import ssl
from os import path as osp
import shutil


def retrieve(file_name, semester='spring23', homework=5):
    if osp.exists(file_name):
        print('Using previously downloaded file {}'.format(file_name))
    else:
        context = ssl._create_unverified_context()
        fmt = 'https://www2.cs.duke.edu/courses/{}/compsci527/homework/{}/{}'
        url = fmt.format(semester, homework, file_name)
        with urllib.request.urlopen(url, context=context) as response:
            with open(file_name, 'wb') as file:
                shutil.copyfileobj(response, file)
        print('Downloaded file {}'.format(file_name))
import pickle

corr_file = 'correspondences.pkl'
retrieve(corr_file)
with open(corr_file, 'rb') as file:
    pixel_points = pickle.load(file)
Downloaded file correspondences.pkl
Specifically, the resulting variable pixel_points names a list of two numpy arrays of shape $(2, n)$ corresponding to the points in image 0 and image 1. Column $k$ of each array has the (column, row) coordinates (note the ordering) of a point in image pixel coordinates. Point pixel_points[0][:, k] in image 0 corresponds to point pixel_points[1][:, k] in image 1.
Write functions with headers
def pixel_to_canonical(pixels, f, s, pp):
def essential_matrix(points):
The function pixel_to_canonical takes a floating-point numpy array pixels such as pixel_points[0], a floating-point focal distance f, a floating-point numpy vector s = $(s_x, s_y)$ with the two pixel scale factors, and a floating-point numpy vector pp = $(\xi_0, \eta_0)$ with the (column, row) coordinates of the principal point.
The function pixel_to_canonical outputs a floating-point numpy array with the same shape as pixels with the $(x, y)$ canonical image coordinates of each point. These coordinates are in units of focal distance. That is, both the canonical camera reference system and the canonical image coordinate system are scaled so that one unit is equal to one focal distance. In other words, divide all coordinates by $f$.
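A minimal sketch of pixel_to_canonical under the description above; whether any sign flips are needed (for example, if the canonical $y$ axis points opposite to the row direction) depends on the course conventions and is left out here as an assumption to check.

import numpy as np

def pixel_to_canonical(pixels, f, s, pp):
    # Shift by the principal point, convert from pixels to length units with the
    # pixel sizes s = (s_x, s_y), and divide by f so that one unit is one focal
    # distance. No axis flips are applied here.
    return (pixels - pp.reshape(2, 1)) * s.reshape(2, 1) / f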
The function essential_matrix takes a list of two floating-point numpy arrays of image points in the two canonical image reference systems, such as those produced by pixel_to_canonical, and uses the eight-point algorithm to compute the essential matrix $E$ for the camera pair.
Show your code and the essential matrix it produces from pixel_points and the camera parameters given above. Print the essential matrix with five decimal digits after the period.
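A sketch of the eight-point algorithm under the assumption that the epipolar constraint is written as $\mathbf{p}_0^T E\, \mathbf{p}_1 = 0$ with homogeneous canonical points whose third coordinate is 1 (one focal distance). If the class notes orient the constraint the other way, swap the two point sets or transpose the result; a rank-2 enforcement step, if required, is also omitted.

import numpy as np

def essential_matrix(points):
    a, b = points                            # canonical image points, each of shape (2, n)
    n = a.shape[1]
    pa = np.vstack((a, np.ones(n)))          # homogeneous coordinates, third entry 1
    pb = np.vstack((b, np.ones(n)))
    # One constraint row per correspondence: the nine products of the coordinates
    # of pa and pb, since pa^T E pb is linear in the entries of E.
    A = np.stack([np.outer(pa[:, k], pb[:, k]).ravel() for k in range(n)])
    e = np.linalg.svd(A)[2][-1]              # least-squares null vector of A
    return e.reshape(3, 3)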
The function split below takes an essential matrix E and returns a list
[(t_0, R_0), (t_1, R_1), (t_2, R_2), (t_3, R_3)]
of rigid transformations (translation, rotation). Each translation vector has unit norm, and each transformation is compatible with E, as discussed in the class notes. The function cross is a helper function.
import numpy as np


def cross(t):
    # Matrix [t]_x such that cross(t) @ v equals np.cross(t, v).
    return np.array(((0, -t[2], t[1]),
                     (t[2], 0, -t[0]),
                     (-t[1], t[0], 0)))


def split(E):
    VET = np.linalg.svd(E)[2]
    t = VET[2, :]                    # unit-norm translation, determined up to sign
    tx = cross(t)
    UF, _, VFT = np.linalg.svd(np.dot(E, tx))
    R1 = np.dot(UF, VFT)
    R1 *= np.linalg.det(R1)          # force a proper rotation (determinant +1)
    UF[:, 2] = -UF[:, 2]
    R2 = np.dot(UF, VFT)
    R2 *= np.linalg.det(R2)
    return [(t, R1), (t, R2), (-t, R1), (-t, R2)]
The function triangulate below takes a list points like the image_points list you produced earlier, a translation vector t, and a rotation matrix R. The function computes a pair P, Q of 3D points consistent with points, t, and R. The arrays P and Q have shape $3\times n$.
def triangulate(points, t, R):
    p, q = points
    n = p.shape[1]
    assert n == q.shape[1], 'p and q must have the same number of columns'
    P = np.zeros((3, n))
    i, j, k = R[0], R[1], R[2]       # rows of R
    kt = np.dot(k, t)
    proj = np.vstack((i, j))
    proj_t = np.dot(proj, t)
    C = np.array(((1, 0, 0), (0, 1, 0), (0, 0, 0), (0, 0, 0))).astype(float)
    c = np.zeros(4)
    for m in range(n):
        # Two equations from camera 0 and two from camera 1, solved jointly
        # in the least-squares sense for the point in the frame of camera 0.
        C[:2, 2] = -p[:2, m]
        C[2:, :] = np.outer(q[:2, m], k) - proj
        c[2:] = kt * q[:2, m] - proj_t
        x = np.linalg.lstsq(C, c, rcond=None)
        P[:, m] = x[0]
    Q = np.dot(R, P - np.outer(t, np.ones((1, n))))   # the same points in the frame of camera 1
    return P, Q
As explained in the class notes, only one of the four choices in the list returned by split produces mostly points that are in front of both cameras. The others produce some points that are behind at least one of the cameras. Use this observation to write a function with header
def pick(xforms, points):
that takes the output xforms from split and points as in image_points and returns the correct translation t, rotation R, and points P, Q corresponding to them.
Show your code and print the translation t and rotation R chosen by pick. Show five digits after the decimal point.
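A sketch of one possible pick, relying on numpy (as np) and on the triangulate function above: it triangulates with each candidate transformation and keeps the one for which the most points have positive depth in both camera frames. This acceptance criterion is an assumption based on the observation above.

def pick(xforms, points):
    best, best_count = None, -1
    for t, R in xforms:
        P, Q = triangulate(points, t, R)
        # Count the points that are in front of both cameras (positive z in each frame).
        count = np.sum((P[2, :] > 0) & (Q[2, :] > 0))
        if count > best_count:
            best, best_count = (t, R, P, Q), count
    return best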