When you submit your work, follow the instructions on the submission workflow page scrupulously for full credit.
Instead of handing out sample exams in preparation for the midterm and final, homework assignments include several problems that are marked Exam Style. These are problems whose format and contents are similar to those on the exams, and they are graded like the other problems. Please use these problems as examples of what may come up in an exam.
As a further exercise in preparation for exams, you may want to think of other plausible problems with a similar style. If you collect these problems in a separate notebook as the course progresses, you will have a useful sample you can use to rehearse for the exams.
Is this assignment properly formatted in the PDF submission, and did you tell Gradescope accurately where each answer is?
This question requires no explicit answer from you, and no space is provided for an answer in the template file.
However, the rubric entry for this problem in Gradescope will be used to deduct points for poor formatting of the assignment (PDF is not readable, code or text is clipped, math is not typeset, ...) and for incorrect indications in Gradescope of the location of each answer.
Sorry we have to do this, but poor formatting and bad answer links in Gradescope make our grading much harder.
Let
$$ I = \left[\begin{array}{rrrrrr} 1 & 0 & -3 & 2 & 1 & 4 \end{array}\right] \;\;\;\text{,}\;\;\; H = \left[\begin{array}{ccc} -2 & 1 & 3 \end{array}\right] $$
be an "image" and a kernel, respectively.
In the problems that follow, pay attention to the difference between convolution and correlation. When padding is needed, pad with zeros distributed symmetrically around $I$. Do all calculations by hand in this part.
It may be tempting to use software for these calculations. This choice has two issues:
The calculations may be tedious, but the point is to get you to do enough of them so that they sink in permanently. If you do this, you won't have any residual doubts as to what needs to be done.
After completing the calculations by hand, you may check the results with software and correct them if needed. Do not hand in your checks.
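For instance, a check along these lines (not to be handed in) could use numpy, with the I and H defined above. Keep in mind that numpy's 'same' mode chooses its own padding alignment, which may differ from the symmetric zero padding specified in this assignment, and that convolution flips the kernel while correlation does not.

import numpy as np

I = np.array([1, 0, -3, 2, 1, 4])
H = np.array([-2, 1, 3])

# 'full' keeps every overlap; 'same' returns a result as long as I.
print(np.convolve(I, H, mode='full'))     # convolution (kernel flipped)
print(np.convolve(I, H, mode='same'))
print(np.correlate(I, H, mode='same'))    # correlation (no flip)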
To render
$$ X = \left[\begin{array}{rrr} 1 & -2 & 3 \\ -4 & 5 & -6 \end{array}\right] $$
in math mode (that is, between single or double dollar signs), you write
X = \left[\begin{array}{rrr}
1 & -2 & 3 \\
-4 & 5 & -6
\end{array}\right]
When matrices are row vectors, you can choose this same format, or just use brackets (or parentheses) and commas. That is, all of the following formats are acceptable in this Part of the assignment:
$$ Z = \left[\begin{array}{rrr} 1 & -2 & 3 \end{array}\right] \;\;\;,\;\;\; Z = [1, -2, 3] \;\;\;,\;\;\; Z = (1, -2, 3) $$

Write the numerical values of the following arrays:
Compute the 'same' convolution $C$ of
$$ A = \left[\begin{array}{rrrrr} 1 & 2 & 3 & 0 & 2\\ 4 & 0 & 5 & 1 & 3 \\ 2 & 3 & 1 & 5 & 0 \end{array}\right] $$
with
$$ B = \left[\begin{array}{rrr} 1 & 3 & 0\\ 2 & 0 & 4\\ 0 & 1 & 2 \end{array}\right] \;. $$
'Same' here means that $A$ and $C$ have the same shape.
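Similarly, a minimal check of the 'same' 2D convolution (again, not to be handed in, and only after doing the work by hand) could use scipy.signal.convolve2d with the A and B above; mode='same' pads with zeros and returns an output with the shape of A.

import numpy as np
from scipy import signal

A = np.array([[1, 2, 3, 0, 2],
              [4, 0, 5, 1, 3],
              [2, 3, 1, 5, 0]])
B = np.array([[1, 3, 0],
              [2, 0, 4],
              [0, 1, 2]])

# 'same' convolution with zero padding: C has the same shape as A.
C = signal.convolve2d(A, B, mode='same', boundary='fill', fillvalue=0)
print(C)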
The following code downloads, reads, and displays a few black-and-white images from the class web site.
import urllib.request
from os import path as osp
def retrieve(file_name, semester='spring23', homework=1):
    if osp.exists(file_name):
        print('Using previously downloaded file {}'.format(file_name))
    else:
        fmt = 'https://www2.cs.duke.edu/courses/{}/compsci527/homework/{}/{}'
        url = fmt.format(semester, homework, file_name)
        urllib.request.urlretrieve(url, file_name)
        print('Downloaded file {}'.format(file_name))

for name in ('coins', 'nickel', 'dime', 'quarter'):
    retrieve(name + '.png')
Downloaded file coins.png
Downloaded file nickel.png
Downloaded file dime.png
Downloaded file quarter.png
from matplotlib import pyplot as plt
%matplotlib inline
import skimage.io as io
coins = io.imread('coins.png')
plt.figure(figsize=(10, 6.5), tight_layout=True)
plt.imshow(coins, cmap=plt.cm.gray)
plt.axis('off')
plt.show()
coin_info = {
    'nickel': {'color': 'yellow', 'value': 5},
    'dime': {'color': 'orange', 'value': 10},
    'quarter': {'color': 'red', 'value': 25},
}

plt.figure(figsize=(10, 3.8), tight_layout=True)
for plot, name in enumerate(coin_info.keys()):
    template = io.imread(name + '.png')
    plt.subplot(1, 3, plot + 1)
    plt.imshow(template, cmap=plt.cm.gray)
    plt.axis('off')
    plt.title(name)
plt.show()
All images are black-and-white and are stored as numpy arrays with values of type numpy.uint8 (unsigned bytes).
All four images were taken with the same camera and settings, including the distance between the camera and the table that supported the coins. The camera looks straight down, perpendicularly to the table surface.
All four images were blurred to remove irrelevant detail. This blurring ends up making the task in this assignment easier.
The dictionary coin_info defined above associates a value and a color to each of the three types of coin (nickels, dimes, and quarters). The meaning of the value is obvious. The color will be used in some of the plots below.
The task we explore in this part is to detect and count nickels, dimes, and quarters by matching the template images named after these coins against windows in the bigger coins image. Your code will eventually report the total amount of money in dollars and cents. The goal of this set of problems is to explore template matching and also to get accustomed to working with images in Python.
We will perform the task in steps, to make sure we are all on the same page in terms of the overall strategy.
Give at least two reasons why this template matching task, with the images given above, is easier than the denticle-matching task discussed in class and in the notes.
The main workhorses of template matching for this assignment are the two functions match_template and peak_local_max imported from skimage.feature, a module of the scikit-image Python package.
Read the manual pages for these two functions online. After our discussion of template matching, it should be quite obvious what these functions do.
Once you understand how these functions work, write code that does the following for each template (nickel, dime, quarter):
- Read the template from its .png file, similarly to how coins was read from coins.png above.
- Compute the normalized cross-correlation ncc between coins and the template using match_template.
- Find the local maxima of ncc by using peak_local_max. This yields an $n\times 2$ numpy array of integer (row, column) locations of the maxima.
- Use the function show_ncx_maxima provided below to display the ncc array and the maxima. The first argument to show_ncx_maxima is ncc and the second is the array of locations returned by peak_local_max. The function show_ncx_maxima overlays red dots at the locations of the local maxima. The argument title is bound to the name of the coin (nickel, dime, or quarter), which the function uses as the title.

Show code and outputs.
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.colors import ListedColormap
%matplotlib inline
def show_ncx_maxima(ncx, max_points, title):
    plt.imshow(ncx)
    plt.plot(max_points[:, 1], max_points[:, 0], '.r')
    plt.axis('off')
    plt.title(title)
- Structure your code as a for loop or a sequence of function calls. In other words, do not program by cutting and pasting. If you find yourself repeating code in later problems, come back to this problem and wrap the relevant code into a function, which you can call later without rewriting code.
- Set pad_input to True in match_template, in order to ensure that the output image has the same size as the input image. Leave all other keyword arguments to their default values.
- Set min_distance in peak_local_max to a single, appropriate value to be used for all templates. This value depends on the sizes of the coins (in pixels), and it is OK to hard-wire this number into your code. All three templates have 149 rows and 149 columns. A good choice for min_distance will simplify the rest of this task. You should be able to find a good value by reasoning, but trial-and-error is fine as well. The value of min_distance is not critical, in that something a bit bigger or a bit smaller than optimal will work as well. However, very poor choices will lead to trouble. It is not necessary to justify your choice.
- Leave all other keyword arguments to peak_local_max at their default values.
- Display the three plots side by side in a single figure using plt.subplot(1, 3, k) (where plt is matplotlib.pyplot). The function show_ncx_maxima does not create a figure or any subplots, but merely draws into the current axes. It is your task to create the figure and subplots, and then call plt.show().
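To make the structure concrete, here is a minimal sketch along the lines of these hints. It assumes coins, coin_info, io, plt, and show_ncx_maxima as defined earlier, and the min_distance value below is only a placeholder, not a recommendation.

from skimage.feature import match_template, peak_local_max

min_distance = 100   # placeholder: choose your own value based on coin size

plt.figure(figsize=(10, 3.8), tight_layout=True)
for k, name in enumerate(coin_info):
    template = io.imread(name + '.png')                       # read the template
    ncc = match_template(coins, template, pad_input=True)     # NCC, same shape as coins
    maxima = peak_local_max(ncc, min_distance=min_distance)   # (row, column) peaks
    plt.subplot(1, 3, k + 1)
    show_ncx_maxima(ncc, maxima, name)                        # draws into current axes
plt.show()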
Since the templates for the three coin denominations are very similar, a point that is a local maximum for, say, the nickels NCC array is also likely to be a local maximum for the NCC arrays for the other types of coins. The objective of this problem is to sort out which maximum to associate with which type of coin.
It may be tempting to first compute the values of the local maxima in the three images and then reason about both locations and values of the local maxima for the three coin types to sort this out.
However, this approach is tricky to implement, because the location of a local maximum for, say, nickels may be very close to one for, say, dimes, but not quite the same. Reasoning about these approximate equalities is not straightforward.
A better approach is to take the pixelwise maximum of the three NCC arrays. Let us call this the ncc_max array. Each pixel in ncc_max is the maximum of the three corresponding pixels in the initial NCC arrays. We also make an array ncc_arg that for each entry of ncc_max remembers which of the three types of coin the corresponding maximum value comes from.
Write code that computes ncc_max and ncc_arg and then uses peak_local_max to compute the local maxima with the same criteria you used in the previous problem. Use the integer values 0, 1, 2 to correspond to nickel, dime, and quarter in ncc_arg.
Finally, use the function show_overall_maxima provided below to display these arrays. The first two arguments correspond to ncc_max and ncc_arg, the third is the array of peak locations, and the info argument is bound to the coin_info dictionary defined in the preamble to this Part of the assignment.
Show code and outputs.
- Make sure that ncc_max is an array of float values, while ncc_arg is an array of integers. This happens automatically if you use numpy.max and numpy.argmax on a properly constructed three-dimensional array that contains the three starting NCC arrays. The function numpy.stack can be used to create this array.
- The function show_overall_maxima makes the figure and subplots, and calls plt.show(), so all you need to do to display the figure is to call the function with appropriate arguments.

def show_max_args(args, colors, title):
    ax = plt.gca()
    colormap = ListedColormap(colors)
    ax.pcolor(args, cmap=colormap)
    plt.axis('image')
    plt.axis('off')
    ax.invert_yaxis()
    plt.title(title)


def show_overall_maxima(ncx_max, ncx_arg, max_points, info):
    plt.figure(figsize=(10, 3.5), tight_layout=True)
    title = 'maximum correlation over coin types'
    plt.subplot(1, 2, 1)
    show_ncx_maxima(ncx_max, max_points, title)
    plt.subplot(1, 2, 2)
    title = '; '.join(['{}: {}'.format(i['color'], n)
                       for n, i in info.items()])
    colors = [i['color'] for i in info.values()]
    show_max_args(ncx_arg, colors, title)
    plt.show()
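For concreteness, here is a minimal sketch of this computation. It assumes that the per-template NCC arrays from the previous problem were collected in a list nccs ordered as (nickel, dime, quarter), and it reuses the min_distance chosen earlier.

import numpy as np
from skimage.feature import peak_local_max

ncc_stack = np.stack(nccs, axis=0)       # shape (3, rows, columns)
ncc_max = np.max(ncc_stack, axis=0)      # float: best NCC value at each pixel
ncc_arg = np.argmax(ncc_stack, axis=0)   # int: 0 = nickel, 1 = dime, 2 = quarter
max_points = peak_local_max(ncc_max, min_distance=min_distance)
show_overall_maxima(ncc_max, ncc_arg, max_points, coin_info)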
Write code that displays the original image coins and overlays a string nickel, dime, or quarter on each coin detected by your method. Each string is lowercase, written in the color specified in coin_info, in bold font, and centered on top of the corresponding coin.
Then write code that prints out the total amount of money in the image in dollars and cents, as well as the number of coins of each type.
Show code and outputs.
- Display the image with plt.imshow(coins, cmap=plt.cm.gray). This will also make the y axis of the figure point downwards for subsequent plots.
- Use plt.text to place the strings at the detected peak locations. The string is centered at the specified location if you use the keyword arguments ha='center' and va='center' for plt.text.
- Set the color of each string with the keyword argument color=c, where c is one of the color strings in coin_info.
- Make the font bold with the keyword argument weight='bold'.
- peak_local_max returns an array whose two columns correspond to image row and column coordinates respectively. In contrast, plt.text takes x, y arguments where x varies horizontally and y vertically, so you need to bind the column to x and the row to y.
- Turn the axes off with plt.axis('off').
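One possible way to put these hints together is sketched below. It assumes max_points and ncc_arg from the previous problem, and coins, coin_info, and plt as defined earlier.

names = list(coin_info.keys())           # ['nickel', 'dime', 'quarter']
counts = {name: 0 for name in names}

plt.figure(figsize=(10, 6.5), tight_layout=True)
plt.imshow(coins, cmap=plt.cm.gray)
for row, col in max_points:
    name = names[ncc_arg[row, col]]      # which coin type won at this peak
    counts[name] += 1
    plt.text(col, row, name, color=coin_info[name]['color'],
             ha='center', va='center', weight='bold')
plt.axis('off')
plt.show()

total_cents = sum(counts[n] * coin_info[n]['value'] for n in names)
print('total: ${:.2f}'.format(total_cents / 100))
for name in names:
    print('{}: {}'.format(name, counts[name]))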
The following code retrieves two Python files that contain some basic utility routines to make convolution kernels and plot one-dimensional signals. Feel free to peruse those files to understand what they do.
retrieve('kernels.py')
retrieve('plot.py')
Downloaded file kernels.py
Downloaded file plot.py
The following code uses the code in kernels.py and plot.py to make a discrete one-dimensional Gaussian kernel. The kernel is properly normalized, as discussed in the class notes, and can be used to smooth an image thanks to the separability of a 2D Gaussian kernel.
from kernels import reals, make_samples, discrete_gaussian_kernel
from plot import plot_signal
sigma = 2.5
g_kernel, g_grid = discrete_gaussian_kernel(sigma)
Now let us plot the discrete kernel g_kernel using the plot_signal function from plot.py.
title = 'truncated, sampled, normalized Gaussian kernel'
plot_signal(g_kernel, g_grid, title=title)
For this Part of the assignment we will also need an image. Consider the following function of the two real-valued variables $x$ and $y$:
$$ f(x, y) = \frac{1}{1 + e^{-x}} \;. $$
This is called the logistic function in the variable $x$. Since $y$ does not appear in the right-hand side, the function $f(x, y)$ is constant with respect to $y$.
Here is a plot of the logistic function itself.
def logistic(x):
    return 1. / (1. + np.exp(-x))
We can use plot_signal also to plot a continuous function. Of course, nothing is continuous on a computer, so we first use make_samples from kernels.py to obtain a dense sampling x of the reals between -5 and 5 and evaluate logistic on those samples. The values are in z. The function plot_signal knows what to do with a "continuous" signal if you set its optional argument discrete to False.
z, x = make_samples(logistic, -5., 5., reals)
plot_signal(z, x, discrete=False, title='the logistic function')
The function $f(x, y)$ can then be displayed as an image, as done below. Specifically, the vector z above has 101 samples of the logistic. If we use 50 pixels in the row direction ($r$, increasing vertically down), we obtain a $50\times 101$ image with floating-point pixel values:
sigmoid = z.reshape((1, -1)).repeat(50, axis=0)
plt.figure(figsize=(10, 5), tight_layout=True)
plt.imshow(sigmoid, cmap=plt.cm.gray)
plt.axis('off')
plt.show()
print('This image has {} rows and {} columns'.format(sigmoid.shape[0], sigmoid.shape[1]))
This image has 50 rows and 101 columns
The row index $r$ in the discrete image corresponds to the variable $y$ (pointing downwards) of $f$ and the column index $c$ corresponds to $x$ (pointing to the right). Thus, if the image sigmoid is $S$, we have
$$ S(r, c) = f(x, y) \;\;\;\text{with}\;\;\; x = \frac{c - 50}{10} $$
(and $y$ immaterial, since $f$ does not depend on it). The number 10 comes from how densely the logistic was sampled to construct the image (see the reals function in kernels.py). Note the switch in the order of arguments between $f$ and $S$.
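A quick way to convince yourself of this correspondence is to compare a pixel against the logistic directly (this check assumes the mapping written above):

# Column c corresponds to x = (c - 50) / 10; the row index does not matter,
# since f is constant in y.
c = 72
print(np.isclose(sigmoid[0, c], logistic((c - 50) / 10.)))   # expect True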
Write a function with header

convolve2d(image, v, h)

which takes a 2D numpy array image and two 1D numpy vectors v and h and uses the function scipy.signal.convolve2d to convolve image vertically with v and horizontally with h.
Then convolve the image sigmoid with both v and h set to a Gaussian kernel with a (rather extreme) value of sigma = 10. This will create a smooth version of the image. Let us call that smooth.
Use plot_signal to show a single plot with sigmoid and smooth on it. This plot can be obtained with the following calls (these are available, commented out, in your homework template file):
middle_row = sigmoid.shape[0] // 2
plot_signal(np.stack((sigmoid[middle_row], smooth[middle_row]), axis=0), x,
            discrete=False, label=('sigmoid', 'smooth'))
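A minimal sketch of one possible convolve2d is shown below; see the hints that follow for the keyword arguments it uses. Reshaping the 1D kernels into a column and a row is what exploits separability.

import numpy as np
from scipy import signal

def convolve2d(image, v, h):
    # Vertical pass with v as a column kernel, then horizontal pass with h as
    # a row kernel; 'same' output shape, symmetric boundary padding.
    out = signal.convolve2d(image, np.reshape(v, (-1, 1)),
                            mode='same', boundary='symm')
    out = signal.convolve2d(out, np.reshape(h, (1, -1)),
                            mode='same', boundary='symm')
    return out

# Example use: a heavily smoothed version of the sigmoid image.
g10_kernel, g10_grid = discrete_gaussian_kernel(10.)
smooth = convolve2d(sigmoid, g10_kernel, g10_kernel)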
- Use the keyword arguments mode='same', boundary='symm' in scipy.signal.convolve2d. The first option makes sure that the output has the same shape as the input. The second pads the image with values that come from the input image itself, so things degrade more gracefully near the image boundaries.
- Make a new kernel g_kernel and corresponding sampling grid g_grid with the new sigma using discrete_gaussian_kernel, as done above for the smaller value of sigma.
- scipy.signal.convolve2d requires the kernel to be 2D as well. So your function must convert the input 1D vectors into 2D arrays of shape $(1, n)$ or $(n, 1)$ as appropriate.

If you did things correctly, the smooth signal should be a bit shallower than the input, that is, it varies a bit more slowly. (With a smaller sigma, this effect would be harder to notice from the plots.)
Explain briefly and in intuitive terms why the smooth signal is shallower than the input.
You may want to think separately of what happens on convex and on concave parts of the function, and remember that a smoothing filter averages values in a neighborhood of each pixel.
Give an algebraic expression for the (exact) gradient $\nabla f(x, y)$ of $f(x, y)$, where the function $f$ was defined above.
If the Python function discrete_gaussian_kernel imported from kernels is given the optional argument derivative=True, then instead of returning a Gaussian kernel and its sampling grid it returns the kernel for the derivative of a Gaussian, properly normalized, and its sampling grid:
sigma = 2.5
dg_kernel, dg_grid = discrete_gaussian_kernel(sigma, derivative=True)
title = 'truncated, sampled, normalized derivative of a Gaussian'
plot_signal(dg_kernel, dg_grid, title=title)
Write a function with header

def gradient(image, sigma):

that uses discrete_gaussian_kernel to make suitable kernels and then uses your function convolve2d to compute the gradient of the image.

Then

- Compute the gradient of sigmoid with sigma=2.5.
- Use plot_signal to plot the two components of the gradient along the middle row of the image in a single plot.
- Use plot_signal to plot both your expression for the derivative of the logistic and the horizontal component of the gradient computed by gradient along the middle row of the image. See "Comparing Gradients" below on how to scale the two curves for a fair comparison.

Derivatives, including gradients, have units. For instance, speeds (derivatives of distance with respect to time) can be measured in kilometers per hour or in meters per second. A speed of 36 km/h is exactly the same as a speed of 10 m/s, even if 36 and 10 are different numbers.
When you computed the derivative of the logistic function $f(x)$, you used the units of ordinate change per unit of $x$. On the other hand, the differentiating kernel returned by discrete_gaussian_kernel with derivative=True is normalized so as to compute derivatives in ordinate change (or image brightness change) per pixel.

Thus, to compare the derivatives in the array grad returned by your function gradient with the derivatives returned by your formula for $f'(x)$ you need to multiply grad by the number of pixels per unit change in $x$. In this experiment, this number is 10.
Again, to get credit for this problem you must exploit separability.

The output of gradient should be a numpy array grad of shape (2, rows, columns), where rows and columns are the height and width of the image. Use numpy.stack to make this array. grad[0] should be the derivative in the vertical image direction, with positive derivatives for images that increase in value top-down. grad[1] should be the derivative in the horizontal image direction, with positive derivatives for images that increase in value left-to-right.
Use plot_signal with options discrete=False, label=('vertical', 'horizontal') to plot the two components of the gradient. The second argument to plot_signal is the same x defined earlier.

If your Python function that computes the derivative of the logistic is called logistic_derivative, you can make the samples for plotting that function with the command

dy, _ = make_samples(logistic_derivative, -5., 5., reals)

You don't need the second output from make_samples (the sampling grid), because that was already computed above as x.
Use plot_signal with discrete=False, label=('analytical', 'convolution') for the comparison of your formula and gradient. Keep in mind the need to normalize the output from gradient appropriately, as discussed above. The second argument to plot_signal is still x.

You may use np.stack to create the first argument to plot_signal, which is an array of shape (2, columns).
A simple and popular way to estimate, say, the horizontal derivative of an image (that is, the component of the gradient in the direction of the column index, which varies along the rows) is to use what is called a Sobel filter as the convolution kernel. This kernel is defined as follows:
$$ H = \frac{1}{8} \left[\begin{array}{ccc} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{array}\right] $$
As you see, it averages vertically for a modicum of noise suppression and differentiates horizontally (remember that kernels must be flipped for convolution).
The vertical derivative of the image can be estimated with the transpose of this array as the kernel.
Is the Sobel filter separable? If yes, show how. If not, explain why not.
Here is a noisy version of sigmoid:
noise_sigma = 0.01
noisy_sigmoid = sigmoid + noise_sigma * np.random.randn(*sigmoid.shape)
middle_row = noisy_sigmoid.shape[0] // 2
plot_signal(noisy_sigmoid[middle_row, :], x, discrete=False)
Make a function with header

def sobel(image):

with the same behavior as gradient except that sobel uses Sobel kernels (which do not require specifying a sigma).
Then use plot_signal to compare the horizontal gradient components computed by gradient and by sobel on the middle row of noisy_sigmoid. The function gradient should use sigma=2.5.
Use options discrete=False, label=('Gauss', 'Sobel') in plot_signal for a proper legend.
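For reference, the same structure carries over to the Sobel version. The sketch below assumes the convolve2d, gradient, noisy_sigmoid, plot_signal, and x defined earlier; whether the 1D factors used here really reproduce the 2D Sobel kernels is exactly the separability question asked above.

import numpy as np

def sobel(image):
    smooth_k = np.array([1., 2., 1.]) / 4.   # averaging factor
    diff_k = np.array([1., 0., -1.]) / 2.    # central-difference factor (convolution kernel)
    d_vertical = convolve2d(image, diff_k, smooth_k)    # vertical derivative
    d_horizontal = convolve2d(image, smooth_k, diff_k)  # horizontal derivative
    return np.stack((d_vertical, d_horizontal), axis=0)

# Example comparison on the middle row of the noisy image.
middle_row = noisy_sigmoid.shape[0] // 2
gauss_h = gradient(noisy_sigmoid, 2.5)[1][middle_row]
sobel_h = sobel(noisy_sigmoid)[1][middle_row]
plot_signal(np.stack((gauss_h, sobel_h), axis=0), x,
            discrete=False, label=('Gauss', 'Sobel'))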