When you submit your work, follow the instructions on the submission workflow page scrupulously for full credit.
Instead of handing out sample exams in preparation for the midterm and final, homework assignments include several problems that are marked Exam Style. These are problems whose format and contents are similar to those in the exams, and they are graded like the other problems. Please use these problems as examples of what may come up in an exam.
As a further exercise in preparation for exams, you may want to think of other plausible problems with a similar style. If you collect these problems in a separate notebook as the course progresses, you will have a useful sample you can use to rehearse for the exams.
Is this assignment properly formatted in the PDF submission, and did you tell Gradescope accurately where each answer is?
This question requires no explicit answer from you, and no space is provided for an answer in the template file.
However, the rubric entry for this problem in Gradescope will be used to deduct points for poor formatting of the assignment (PDF is not readable, code or text is clipped, math is not typeset, ...) and for incorrect indications in Gradescope of the location of each answer.
Sorry we have to do this, but poor formatting and bad answer links in Gradescope make our grading much harder.
Consider the following matrices:
$$ A = \left[\begin{array}{rrr} 1 & 0 & 1 \\ 0 & 1 & 0 \end{array}\right] \;\;\;\;\;\; B = \frac{1}{2} \left[\begin{array}{rrr} \sqrt{3} & 0 & 1 \\ 0 & 2 & 0 \end{array}\right] \;\;\;\;\;\; C = \frac{\sqrt{2}}{2} \left[\begin{array}{rr} 1 & 1 \\ -1 & 1 \end{array}\right] \;\;\;\;\;\; D = \frac{1}{2}\left[\begin{array}{rrrr} 1 & -1 & 1 & 1 \end{array}\right] \;\;\;\;\;\; $$and their transposes $A^T$, $B^T$, $C^T$, $D^T$.
List all and only the orthogonal matrices among these eight matrices according to the definition of orthogonality given in the class notes.
Let $V$ be an orthogonal matrix with 3 rows and 2 columns. We saw in class that this implies that $V$ has a left inverse, that is,
$$ V^TV = I_2 $$where $I_2$ denotes the $2\times 2$ identity matrix.
We also saw that $V$ cannot have a right inverse, because such a matrix would have to be the solution $X$ to the system
$$ VX = I_3 $$where $X$ is $2\times 3$ and $I_3$ is the $3\times 3$ identity matrix. The solution $X$ cannot exist because $V$ has rank 2 and therefore the product $VX$ cannot have a rank greater than 2, while $I_3$ has rank 3.
Carlo now argues that perhaps $V$ has a "pseudo right inverse," that is, a $2\times 2$ matrix $Y$ such that
$$ VY = I_{32} \;\;\;\text{where}\;\;\; I_{32} = \left[\begin{array}{cc} 1 & 0\\ 0 & 1\\ 0 & 0 \end{array}\right] $$is the "$3\times 2$ identity."
Carlo's argument is that since $V$ is orthogonal and has a left inverse $V^T$, we can multiply both sides of the equation above by $V^T$ from the left to obtain
$$ V^TVY = V^T I_{32} $$that is,
$$ Y = V^T I_{32} $$because $V^TV = I_2$.
If
$$ V = \left[\begin{array}{cc} v_{11} & v_{12} \\ v_{21} & v_{22} \\ v_{31} & v_{32} \end{array}\right] $$then direct calculation shows that
$$ Y = \left[\begin{array}{cc} v_{11} & v_{21} \\ v_{12} & v_{22} \end{array}\right] \;. $$In other words, the "pseudo right inverse" is formed by the first two columns of $V^T$.
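Before tackling (a) through (c), it may help to see Carlo's claim fail numerically. The following sketch (not part of the assignment; names are illustrative) builds a random $3\times 2$ matrix with orthonormal columns via QR and compares $VY$ with $I_{32}$:

import numpy as np

rng = np.random.default_rng(0)
v = np.linalg.qr(rng.standard_normal((3, 2)))[0]  # random V with orthonormal columns
i32 = np.eye(3, 2)                                # the "3 x 2 identity"
y = v.T @ i32                                     # Carlo's Y: first two columns of V^T
print(np.round(v @ y, 3))                         # generally not equal to i32
print(np.round(v.T @ v, 3))                       # the left-inverse property does hold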
(a) Compute $Y$ for the orthogonal matrix
$$ V = \frac{1}{2} \left[\begin{array}{rr} \sqrt{3} & 0 \\ 1 & 0 \\ 0 & 2 \end{array}\right] $$(b) Compute the exact value of $VY$. As you will see, this is not equal to $I_{32}$, and this contradicts Carlo's assertion.
(c) Explain as clearly as you can why Carlo's argument is wrong.
The problems in this part refer to the situation shown in the figure below. The point $\mathbf{c}$ is an arbitrary point on the line $\ell$ on the plane, and the vector $\mathbf{v}$ is an arbitrary but nonzero vector parallel to $\ell$.
The point $\mathbf{q}$ is an arbitrary point on the plane and the point $\mathbf{p}$ is the point on $\ell$ that is closest to $\mathbf{q}$ in Euclidean distance. The point $\mathbf{p}$ is called the projection of $\mathbf{q}$ onto $\ell$.
In some problems below, but not all, the line $\ell$ goes through the origin and $\mathbf{c}$ is then set to be the origin itself, for convenience. Also, in some problems below, but not all, the vector $\mathbf{v}$ has unit norm.
A line $\ell$ on the plane has parametric equation
$$ \mathbf{y}(\alpha) = \mathbf{c} + \mathbf{v} \alpha $$where $\mathbf{c}, \mathbf{v}$ are fixed vectors in $\mathbb{R}^2$, $\mathbf{v}$ is not the zero vector, and $\alpha$ is a real-valued variable. The vectors $\mathbf{c}$ and $\mathbf{v}$ are not unique for a given $\ell$.
This equation means that $\ell$ is the set of all points $\mathbf{y}(\alpha)$ obtained as $\alpha$ varies in $\mathbb{R}$.
As mentioned above, given a point $\mathbf{q}$ on the plane, the projection of $\mathbf{q}$ onto $\ell$ is the point $\mathbf{p}$ of $\ell$ that is closest to $\mathbf{q}$ in Euclidean distance.
Use calculus to derive an expression for the vector $\mathbf{p}$ as a function of $\mathbf{c}$, $\mathbf{v}$, and $\mathbf{q}$. Show your derivation and use vector notation throughout (that is, do not "spell out" vectors in terms of their coordinates).
Here, "use calculus" means to use standard methods based on derivatives to find the minimum of a function. This requirement rules out using intuitions such as "this is perpendicular to that." While these intuitions may be correct, they need to be proven, so we cannot rely on them. There are proofs with a geometric flavor, but you are asked for one based on differentiation and minimization.
Consider the expression for $\mathbf{p}$ you found in problem 1.2 in the special case in which the line $\ell$ passes through the origin and $\mathbf{c}$ is the zero vector. In this case, it turns out that you can write
$$ \mathbf{p} = P \mathbf{q}\;. $$Write an expression for the matrix $P$, which is called the projection matrix onto $\ell$ (where it is assumed that $\ell$ goes through the origin).
Prove that the projection matrix $P$ you found above is symmetric, has rank 1, and is idempotent. A matrix $P$ is idempotent if
$$ P = P^2 $$(and therefore $P = P^k$ for every positive integer $k$).
Give the simplest possible expression for $P$ in the special case in which $\mathbf{v}$ is a unit vector.
Suppose that you are given $n$ points $\mathbf{q}_0,\ldots, \mathbf{q}_{n-1}$ on the plane arranged into the rows of an $n\times 2$ matrix $Q$:
$$ Q = \left[\begin{array}{c} \mathbf{q}_0^T \\ \vdots \\ \mathbf{q}_{n-1}^T \end{array}\right] \;. $$If you suspect that the points are not too far from being aligned, it makes sense to look for a line that fits the points as well as possible. Using linear regression, one would write the line in the form
$$ a y_0 + b = y_1 $$where a point $\mathbf{y}$ on the plane has coordinates $y_0, y_1$, and then find coefficients $a$ and $b$ that minimize the residual of the following over-constrained linear system of $n$ equations in 2 unknowns:
$$ \left[\begin{array}{cc} \boldsymbol{\gamma}_0 & \mathbf{1} \end{array}\right] \left[\begin{array}{cc}a\\b \end{array}\right] = \boldsymbol{\gamma}_1 \;. $$In this expression, $\boldsymbol{\gamma}_0$ and $\boldsymbol{\gamma}_1$ are the columns of $Q$ and $\mathbf{1}$ is a column vector of $n$ ones.
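For concreteness, here is how one might solve this system in numpy (a sketch, under the assumption that the points are in an $n\times 2$ array q; the function name is illustrative):

import numpy as np

def fit_line_regression(q):
    # Least-squares solution of [gamma_0  1] [a, b]^T = gamma_1.
    matrix = np.stack((q[:, 0], np.ones(len(q))), axis=1)
    a, b = np.linalg.lstsq(matrix, q[:, 1], rcond=None)[0]
    return a, b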
There are at least two limitations to this approach:

- The line cannot be vertical, since a vertical line would require an infinitely large coefficient $a$.
- The residual being minimized measures the mismatch along the $y_1$ coordinate only, rather than the perpendicular distance from each point to the line.
A more general approach is Principal Component Analysis (PCA), which addresses the following problem:
Given $n$ points in $\mathbb{R}^d$ as the rows of an $n\times d$ matrix $Q$ and an integer dimension $e$ with $0 < e < d$, fit an $e$-dimensional affine subspace $\cal{A}$ of $\mathbb{R}^d$ to the points in $Q$.
An $e$-dimensional affine subspace $\cal{A}$ of $\mathbb{R}^d$ can be represented in parametric form as follows:
$$ \mathbf{y} = \mathbf{c} + V \boldsymbol{\alpha} $$where

- $\mathbf{c}$ is a point of $\cal{A}$, expressed as a vector in $\mathbb{R}^d$;
- $V$ is a $d\times e$ matrix with orthonormal columns;
- $\boldsymbol{\alpha}$ is a vector of $e$ real parameters.
Thus, all points in $\cal{A}$ are equal to $\mathbf{c}$ plus a linear combination (with coefficients in $\boldsymbol{\alpha}$) of the columns of $V$, which form an orthonormal basis for $\cal{A}$.
Please spend a minute to verify that when $d = 2$ and $e = 1$ the affine space $\cal{A}$ is the same as the line $\ell$ in Part 2 of this assignment if the vector $\mathbf{v}$ defined there has unit norm. Also note that even if $\mathbf{y}$ happens to be on an $e$-dimensional space, we still need $d$ coordinates for it when we express it in the reference system for $\mathbb{R}^d$.
An algorithm for computing $\mathbf{c}$ and $V$, that is, for fitting an affine $e$-dimensional space to a set of $d$-dimensional points, is as follows (given without proof).
The point $\mathbf{c}$ is the centroid of all the points in $Q$:
$$ \mathbf{c} = \frac{1}{n} \sum_{i=0}^{n-1} \mathbf{q}_i \;. $$Subtract the centroid from every row of $Q$ to obtain the matrix of centered points
$$ \tilde{Q} = Q - \mathbf{1} \mathbf{c}^T $$where $\mathbf{1}$ is a column vector of $n$ ones. (Of course, in Python you would just say $\tilde{Q} = Q - \mathbf{c}$ because of the broadcasting rules of numpy.)
Let
$$ \tilde{Q} = \tilde{U} \tilde{\Sigma} \tilde{V}^T $$be the SVD of $\tilde{Q}$. Then, an orthonormal basis for the affine space $\cal{A}$ is given by the matrix
$$ V = \tilde{V}[:,\ :\!e] $$made of the first $e$ columns of $\tilde{V}$.
The point $\mathbf{c}$ and the matrix $V$ computed by PCA are the elements of a parametric representation of the affine $e$-dimensional space that best fits the points in $Q$:
$$ \mathbf{y}(\boldsymbol{\alpha}) = \mathbf{c} + V\boldsymbol{\alpha} $$where $\boldsymbol{\alpha}$ is an $e$-dimensional vector of parameters.
Compare this equation with the parametric equation of the line $\ell$ in Part 2 of this assignment. Note, however, that in this new equation $\mathbf{c}$ and $V$ are no longer arbitrary: $\mathbf{c}$ is the centroid of the points in $Q$ and $V$ is orthonormal.
Just as you did in Part 2, but now in more dimensions, if you have $\mathbf{c}$ and $V$ you can also find the projection $\mathbf{p}$ of a point $\mathbf{q}\in\mathbb{R}^d$ as follows:
$$ \mathbf{p} = \mathbf{c} + VV^T (\mathbf{q} - \mathbf{c}) \;. $$Incidentally, this may be a good time to check that the results you found in Part 2 are consistent with this equation when $d=2$, $e=1$, and $\|\mathbf{v}\| = 1$.
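The algorithm above translates almost line by line into numpy. A minimal sketch (function names are illustrative, and e is assumed to be valid):

import numpy as np

def fit_affine_subspace(q, e):
    c = np.mean(q, axis=0)                                      # centroid of the rows of q
    v_transpose = np.linalg.svd(q - c, full_matrices=False)[2]  # SVD of the centered data
    return c, v_transpose.T[:, :e]                              # first e right singular vectors

def project(q_point, c, v):
    return c + v @ (v.T @ (q_point - c))                        # p = c + V V^T (q - c)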
You will need some utility functions and some data for this Part. To this end, run the two code cells below.
import urllib.request
import ssl
from os import path as osp
import pickle
import shutil

def retrieve(file_name, semester='spring23', homework=4):
    # Download file_name from the course website unless it is already present.
    if osp.exists(file_name):
        print('Using previously downloaded file {}'.format(file_name))
    else:
        context = ssl._create_unverified_context()
        fmt = 'https://www2.cs.duke.edu/courses/{}/compsci527/homework/{}/{}'
        url = fmt.format(semester, homework, file_name)
        with urllib.request.urlopen(url, context=context) as response:
            with open(file_name, 'wb') as file:
                shutil.copyfileobj(response, file)
        print('Downloaded file {}'.format(file_name))

retrieve('hw4_utils.py')

mnist_file = 'mnist_subset.pkl'
retrieve(mnist_file)
with open(mnist_file, 'rb') as file:
    digits = pickle.load(file)
Downloaded file hw4_utils.py Downloaded file mnist_subset.pkl
Write a function with header
def pca(q, e):
that takes an $n\times d$ numpy array q of floating-point numbers and an integer e between 0 and $d$ exclusive, and returns a dictionary with keys 'centroid' and 'basis'. The values corresponding to these keys are the centroid $\mathbf{c}$ (a numpy vector with $d$ entries) and the $d\times e$ matrix $V$ of the PCA of q.
Then test your code by fitting a line to a set of points on the plane. To this end, call
from hw4_utils import noisy_plane_points
q0, c0, v0, rect = noisy_plane_points(30, sigma=0.1)
The values returned by noisy_plane_points are as follows:

- An array q0 representing 30 points on the plane. These are noisy versions of points that, in the absence of noise, would be on a vertical line. In this way, you can verify that PCA can handle vertical lines with no problem.
- Vectors c0 and v0 (both numpy 2-vectors) that represent the ideal line.
- A dictionary rect with keys left, right, bottom, top that specify the sides of a rectangular box to be used to display the points. Each value is a coordinate.

Then, call your pca function with appropriate parameters to compute the PCA of q0, so that the PCA represents a line.
Finally, make a figure that displays the following:

- The points in q0, drawn as red dots.
- The ideal line represented by c0 and v0.
- The line found by pca, in blue.
- The projections of the points onto the PCA line, drawn as blue dots.

Make the dots large enough to be visible, especially the blue ones. One way to assemble this figure is sketched after the programming notes below.
To draw the lines, you can say
from hw4_utils import plot_line
and then
plot_line(c0, v0, rect)
for the ideal line, and a similar command for the line found by PCA.
Programming notes:

- Do not use sklearn.decomposition.PCA. Instead, use numpy.linalg.svd.
- Keep in mind that numpy.linalg.svd returns the transpose of $V$, not $V$ itself. Also, you don't need $U$ or $\Sigma$.
- Have your pca function include an assert statement that checks that e is within proper bounds.
- The function plot_line does not make a figure or call plt.show() or plt.draw(). That call is up to you.
- Make sure your plot uses the correct aspect ratio for the box rect. To this end, you can say

from hw4_utils import box_aspect
aspect = box_aspect(rect)
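One possible way to assemble the figure is sketched below. How the value returned by box_aspect is meant to be used is an assumption here (set_box_aspect is one plausible choice), and the styling details are up to you:

import matplotlib.pyplot as plt
from hw4_utils import plot_line, box_aspect

plt.figure()
plt.plot(q0[:, 0], q0[:, 1], '.r', markersize=10)  # the data points in red
plot_line(c0, v0, rect)                            # the ideal line
# ... draw the PCA line and the blue projection dots here ...
plt.gca().set_box_aspect(box_aspect(rect))         # assumed use of the aspect value
plt.show()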
Besides fitting affine subspaces, another common use of PCA is data compression. To understand this concept, let us go back to the projection equation:
$$ \mathbf{p} = \mathbf{c} + VV^T (\mathbf{q} - \mathbf{c}) \;. $$We can break down this equation into two steps:
$$ \begin{eqnarray*} \text{Compression:} & \phantom{wide} & \boldsymbol{\alpha} = V^T (\mathbf{q} - \mathbf{c}) \\ \text{Embedding:} & & \mathbf{p} = \mathbf{c} + V\boldsymbol{\alpha} \end{eqnarray*} $$which we can understand intuitively as follows.
Note that $\mathbf{q}$, $\mathbf{c}$, and $\mathbf{p}$ are in $\mathbb{R}^d$ (the large space) while $\boldsymbol{\alpha}$ is in $\mathbb{R}^e$ (the small space).
Specifically, the compression equation above shows that entry $\alpha_j$ of $\boldsymbol{\alpha}$ is the inner product of $\tilde{\mathbf{q}} = \mathbf{q} - \mathbf{c}$ with column $j$ of $V$ (or row $j$ of $V^T$). That is, $\alpha_j$ is the $j$-th coordinate of $\tilde{\mathbf{q}}$ in the orthonormal reference frame represented by the columns of $V$.
In this sense, this mapping compresses a $d$-dimensional point $\mathbf{q}$ to an approximate representation $\boldsymbol{\alpha}$ of it in the lower-dimensional space $\cal{A}$ of dimension $e$, for which a parametric representation $(\mathbf{c}, V)$ is known.
For instance, in Problem 3.1, the blue points are approximations to the red points. They have two coordinates ($\mathbf{p}$) because they are points on the plane. However, if we know that they are on the blue line, for which we have computed a parametric representation, a single number per point ($\alpha$) is sufficient.
The vector of coordinates $\boldsymbol{\alpha}$ represents the projection $\mathbf{p}$ of $\mathbf{q}$ onto $\cal{A}$, not $\mathbf{q}$ itself. This is why $\boldsymbol{\alpha}$ is an approximate representation.
That projection can be expressed in the original reference system for $\mathbb{R}^d$ (the large space) by the change of basis given in the embedding equation. We do not call this transformation a "decompression" because the final vector $\mathbf{p}$ is not equal to $\mathbf{q}$ but rather to the projection of $\mathbf{q}$ onto $\cal{A}$.
Therefore, $\mathbf{p}$ is merely an approximation to $\mathbf{q}$, and the approximation is good only to the extent that $\mathbf{q}$ is not too far from $\cal{A}$.
For instance, in Problem 3.1, the blue dots are good approximations of the red dots only to the extent that the red dots are approximately aligned to each other (and therefore to the PCA line) to begin with.
An extreme version of compression arises when each data point is, say, an MNIST digit image with $d = 784$ dimensions (recall that MNIST images have $28\times 28$ pixels) and $e = 2$. Then, the vectors $\boldsymbol{\alpha}$ have two coordinates each, and can be plotted on the plane to give a very crude approximation of the relative arrangement of the original 784-dimensional points. Let us see how this works.
In the preamble to this part of the assignment, you ran a code cell that retrieved a small portion of the MNIST data set and loaded it into the dictionary digits.

The value for digits['x'] is a numpy floating-point array $Q$ of shape $n\times d = 2000\times 784$. Each row of $Q$ is a flattened MNIST image. As usual, digits['y'] is a numpy array of $n$ integers that specify the true label for each image.
Write a function with header
def compress(q, space):
that applies the compression operation described above to the rows of q (an $n\times d$ numpy array) and produces the coordinates alpha (an $n\times e$ numpy array). The argument space is a PCA dictionary returned by your pca function.

Use your pca function to compute the PCA of digits['x'] for $e = 2$ dimensions and use the result to compress the array digits['x'] itself.
Make a plot of all the alpha points, one dot per point. Each dot should have one of ten distinct colors that denotes what digit that point represents (that is, its true label). Your plot should include a legend that shows which color corresponds to which digit.
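A generic pattern for this kind of labeled scatter plot is sketched below (assuming alpha holds the compressed coordinates):

import matplotlib.pyplot as plt

plt.figure()
for digit in range(10):
    mask = digits['y'] == digit                    # points with this true label
    plt.plot(alpha[mask, 0], alpha[mask, 1], '.', label=str(digit))
plt.legend()
plt.show()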
Your picture will show a rather disorganized jumble of points, although some of the digits appear to cluster together. Answer the following questions:
Which digit forms the tightest and most well-separated cluster in your plot?
Most of the dots for different digits seem to be interspersed with each other. Why is your plot misleading in this regard? That is, why is it still possible that the actual data form more well-separated clusters than the plot seems to suggest?
Since the points $\mathbf{q}_i$ are in the rows of $Q$, we rewrite the compression and embedding equations (for later reference) by transposing them and collecting them into matrix form:
$$ \begin{eqnarray*} \text{Compression:} & \phantom{wide} & A = (Q - \mathbf{1}\mathbf{c}^T) V \\ \text{Embedding:} & & P = \mathbf{1} \mathbf{c}^T + A V^T \end{eqnarray*} $$where $\mathbf{1}$ is a vector of $n$ ones. In Python, you can essentially ignore this vector, given standard numpy broadcasting rules. Remember to use matrix multiplication (@) where appropriate.
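These two matrix equations translate directly to numpy. A sketch, assuming space is the dictionary described in Problem 3.1 (your implementation may differ in details):

def compress(q, space):
    # A = (Q - 1 c^T) V, with the ones vector handled by broadcasting
    return (q - space['centroid']) @ space['basis']

def embed(alpha, space):
    # P = 1 c^T + A V^T, again relying on broadcasting
    return space['centroid'] + alpha @ space['basis'].T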
We now explore how well the compressed images above approximate the true ones.
Write a function with header
def embed(alpha, space):
that implements the embedding transformation. The argument alpha is the array output by your compress function and space has the same meaning as before.

Then select one image and its corresponding label out of digits for each label in $\{0, \ldots, 9\}$ (your choice) and use these ten digit samples to make a figure with a $2\times 10$ grid of subplots. Subplot (1, k) has the true digit image for label k from your sample, with the true label as title. Subplot (2, k) has the digit image reconstructed by embed.
Some digit approximations are recognizable, while others are more confused. This is not surprising, since we are compressing 784 dimensions down to 2. To visualize where the confusion comes from, note that the embedding equation
$$ \mathbf{p} = \mathbf{c} + V\boldsymbol{\alpha} $$for this example can be rewritten as follows:
$$ \mathbf{p} = \mathbf{c} + \alpha_0 \mathbf{v}_0 + \alpha_1 \mathbf{v}_1 $$where $\mathbf{v}_0$ and $\mathbf{v}_1$ are the two columns of $V$.
Thus, the approximate image $\mathbf{p}$ is the centroid $\mathbf{c}$ of all the images plus a linear combination of two "singular images" $\mathbf{v}_0$ and $\mathbf{v}_1$.
All these vectors have 784 entries, so they can all be reshaped into $28\times 28$ images.
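For instance, assuming c holds the centroid vector, displaying it might look like this (a sketch):

import matplotlib.pyplot as plt

plt.imshow(c.reshape(28, 28), cmap=plt.cm.gray)
plt.axis('off')
plt.show()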
Display the vectors $\mathbf{c}$, $\mathbf{v}_0$, and $\mathbf{v}_1$ as $28\times 28$ images in a figure with a 1 by 3 grid of subplots. Put a title above each image.
Programming note: use the function signed_image imported from hw4_utils to get a color image. In this image, positive values are left as is (gray) and negative values are shown in red.

Redraw the figure you made in Problem 3.3, but using $e=30$ instead of $e=2$. Of course, projecting to a larger subspace reduces the approximation error.
Conceptually, the aperture problem in computer vision is the fact that the motion between two images can be determined through local analysis at a given point only if the image texture in the near vicinity of that point is rich enough.
Mathematically, in a differential description of the relation between image intensities and motion, the aperture problem refers to the fact that the Brightness Change Constraint Equation (BCCE)
$$ \mathbf{u}^T \nabla_{\mathbf{x}} I + I_t = 0 $$is a single scalar equation in the two components of the velocity vector $\mathbf{u}$, and therefore cannot constrain $\mathbf{u}$ uniquely.
If on the other hand motion is described as a finite displacement $\mathbf{d}$, then the aperture problem refers to the fact that at a single point $\mathbf{x}$ there may be multiple connected local minima of the image residual
$$ \ell(\mathbf{x}, \mathbf{d}) = [J(\mathbf{x} + \mathbf{d}) - I(\mathbf{x})]^2 $$as a function of $\mathbf{d}$. Specifically, the two scalar equations provided by the vector normal equation
$$ \nabla_{\mathbf{d}}\ell(\mathbf{x}, \mathbf{d}) = \mathbf{0} $$may be linearly dependent.
Because of the aperture problem, the differential version is typically formulated as a global problem in which the motion field $\mathbf{u}(\mathbf{x})$ is required to be smooth, and the finite-displacement version is typically formulated under the assumption that the displacement vector $\mathbf{d}$ is constant in a small window around $\mathbf{x}$. If image texture is rich enough, these additional assumptions provide new constraints that pin down a solution.
This part of this assignment explores different facets of the aperture problem. All images considered here are black-and-white.
If the spatial gradient $\nabla_\mathbf{x} I$ of image intensity $I$ is nonzero at a certain point $\mathbf{x}$ and the unit vector along that gradient is denoted as
$$ \mathbf{g} = \frac{\nabla_\mathbf{x} I}{\|\nabla_\mathbf{x} I\|} $$then the normal component of the image motion $\mathbf{u}$ at $\mathbf{x}$ is the velocity
$$ v = \mathbf{g}^T \mathbf{u} \;. $$Give the values of $\mathbf{u}_0$ and $\mathbf{b}$ in a parametric representation
$$ \mathbf{u} = \mathbf{u}_0 + \alpha \mathbf{b} $$of the line of all motions $\mathbf{u}$ that are consistent with a given normal component $v$ and unit image gradient $\mathbf{g}$. In your representation, $\mathbf{u}_0$ should have the smallest norm possible and $\mathbf{b}$ must be one of two possible unit vectors.
A certain telescope with an electronic image sensor is pointed at a star in the night sky, and the image of the star is at the origin of a reference system on the sensor. The telescope is slightly out of focus, so the irradiance produced by the light from the star on the sensor plane is an isotropic Gaussian function of the position $\mathbf{x}$ on the sensor:
$$ I(\mathbf{x}) = e^{-\frac{\|\mathbf{x}\|^2}{\sigma^2}} $$for $\sigma = 7$ millimeters.
Two images $I_0$ and $I_1$ are taken a minute apart, and as a result the peak of the star image and the Gaussian around it translate from position $(0, 0)$ to a new position $\mathbf{d}$ on the sensor.
The images turn out to be saturated (too bright), and therefore measurements of the position of the peak of the Gaussian are unreliable. Instead, measurements are taken at positions away from the peak, where both images are better exposed, and these measurements reveal the following:
Give the exact numerical value of the displacement $\mathbf{d}$ in millimeters. Show your derivation.
The figure below shows six different $31\times 31$ windows in the two images fauci_0.png and fauci_1.png retrieved and loaded in the code cell below. Each window has a name, and the names and center coordinates of the windows are stored in the dictionary points imported from hw4_utils below. The variable win_half_size defined below is bound to 15, which is half the window size (excluding the midpoint).
import skimage as sk
import numpy as np

names = ['fauci_0.png', 'fauci_1.png']
for name in names:
    retrieve(name)
images = [sk.io.imread(name) for name in names]

from hw4_utils import points
win_half_size = 15
Downloaded file fauci_0.png Downloaded file fauci_1.png
You can compute the gradient of the image images[0]
as follows, using a Gaussian kernel with a half-width of 5 pixels:
from hw4_utils import gradient
gauss_half_size = 5
grad_r, grad_c = gradient(images[0], gauss_half_size)
The two numpy arrays grad_r and grad_c are the vertical and horizontal components of the gradient, and have the same shape as images[0] and images[1].
You can also make a $31\times 31$ array of Gaussian weights as follows:
from hw4_utils import gauss
gauss_vector = gauss(win_half_size)[0]
gauss_weights = np.outer(gauss_vector, gauss_vector)
Write a function with header
def gramian(gr, gc, p, w):
to be called as follows:
a = gramian(grad_r, grad_c, point, gauss_weights)
where all variables other than point were defined above. The variable point is one of the values in the dictionary points imported above.

Your function gramian should return a numpy array of floats representing the $2\times 2$ matrix $A$ defined in equation (15) of the class notes on image motion.
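If equation (15) is the usual Lucas-Kanade structure tensor, that is, the weighted sum over the window of the outer products of the image gradient with itself, then the computation can be sketched as follows. This form is an assumption about the notes, p is assumed to hold the (row, column) coordinates of the window center, and the function name is illustrative:

import numpy as np

def gramian_sketch(gr, gc, p, w):
    # Assumed form: A = sum over the window of w(x) * grad(x) grad(x)^T.
    h = win_half_size
    r, c = p
    gr_win = gr[r - h:r + h + 1, c - h:c + h + 1]  # vertical gradient in the window
    gc_win = gc[r - h:r + h + 1, c - h:c + h + 1]  # horizontal gradient in the window
    a_rr = np.sum(w * gr_win * gr_win)
    a_rc = np.sum(w * gr_win * gc_win)
    a_cc = np.sum(w * gc_win * gc_win)
    return np.array([[a_rr, a_rc], [a_rc, a_cc]])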
Then write code that produces a printout and two figures.
- For each point in points, the printout should contain a single line with the name of that point and the eigenvalues $\lambda$ (use np.linalg.eigvals) of the corresponding matrix $A$, sorted in decreasing order.
- The first figure contains one text string per point in points, produced with a plt.text instruction. The string is that point's name and the center of the string should be at coordinates $\log(\lambda_0), \log(\lambda_1)$ of the plot. Label the axes. Since $\lambda_0 \geq \lambda_1$, all points are under the identity function. For reference, also draw the line for the identity function on the plot, and make sure that the scaling of the two axes is the same (plt.axis('equal')).
- The second figure shows the six windows of images[0] centered at the values specified in points. Use plt.imshow with parameters cmap=plt.cm.gray, vmin=0, vmax=255 for proper display. Give each subplot a title with the name of the window (a key from points) and use commands plt.axis('image') and plt.axis('off').

The point names in your plot should be arranged roughly in three groups:
- jacket close to the bottom left of the plot;
- background and tie close to each other somewhere near the middle of the plot;
- the remaining point names close to the top right of the plot.

Explain this arrangement. Specifically:
What information do $\lambda_0$ and $\lambda_1$ (or equivalently their logarithms) provide about the distribution of gradients in a window? Specifically, what does this distribution look like, qualitatively and in general, when

- both $\lambda_0$ and $\lambda_1$ are small;
- $\lambda_0$ is large and $\lambda_1$ is small;
- both $\lambda_0$ and $\lambda_1$ are large?
Also tie your explanations to the aperture problem.
What explains the division into three groups based on $\lambda_0$ (or equivalently $\log(\lambda_0)$) in terms of the appearance of the specific windows in your last figure? Explain what is going on in each group.
If you are told that Lucas-Kanade can track four of these six windows reliably, which four would you pick, and why? What numerical criterion could you use to tell them apart from the other two?