When you submit your work, follow the instructions on the submission workflow page scrupulously for full credit.
Instead of handing out sample exams in preparation for the midterm and final, homework assignments include several problems that are marked Exam Style. These are problems whose format and contents are similar to those on the exams, and they are graded like the other problems. Please use these problems as examples of what may come up in an exam.
As a further exercise in preparation for exams, you may want to think of other plausible problems with a similar style. If you collect these problems in a separate notebook as the course progresses, you will have a useful sample you can use to rehearse for the exams.
Is this assignment properly formatted in the PDF submission, and did you tell Gradescope accurately where each answer is?
This question requires no explicit answer from you, and no space is provided for an answer in the template file.
However, the rubric entry for this problem in Gradescope will be used to deduct points for poor formatting of the assignment (PDF is not readable, code or text is clipped, math is not typeset, ...) and for incorrect indications in Gradescope of the location of each answer.
Sorry we have to do this, but poor formatting and bad answer links in Gradescope make our grading much harder.
Let
$$ I = \left[\begin{array}{rrrrrr} 1 & 0 & -3 & 2 & 1 & 4 \end{array}\right] \;\;\;\text{,}\;\;\; H = \left[\begin{array}{ccc} -2 & 1 & 3 \end{array}\right] $$
be an "image" and a kernel, respectively.
In the problems that follow, pay attention to the difference between convolution and correlation. When padding is needed, pad with zeros distributed symmetrically around $I$. Do all calculations by hand in this part.
It may be tempting to use software for these calculations. This choice has two issues:
The calculations may be tedious, but the point is to get you to do enough of them so that they sink in permanently. If you do this, you won't have any residual doubts as to what needs to be done.
After completing the calculations by hand, you may check the results with software and correct them if needed. Do not hand in your checks.
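For instance, a check along these lines (not to be handed in) could use numpy, with the I and H defined above. Keep in mind that numpy's 'same' mode chooses its own padding alignment, which may differ from the symmetric zero padding specified in this assignment, and that convolution flips the kernel while correlation does not.

import numpy as np

I = np.array([1, 0, -3, 2, 1, 4])
H = np.array([-2, 1, 3])

# 'full' keeps every overlap; 'same' returns a result as long as I.
print(np.convolve(I, H, mode='full'))     # convolution (kernel flipped)
print(np.convolve(I, H, mode='same'))
print(np.correlate(I, H, mode='same'))    # correlation (no flip)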
To render
$$ X = \left[\begin{array}{rrr} 1 & -2 & 3 \\ -4 & 5 & -6 \end{array}\right] $$
in math mode (that is, between single or double dollar signs), you write
X = \left[\begin{array}{rrr}
1 & -2 & 3 \\
-4 & 5 & -6
\end{array}\right]
When matrices are row vectors, you can choose this same format, or just use brackets (or parentheses) and commas. That is, all of the following formats are acceptable in this Part of the assignment:
$$ Z = \left[\begin{array}{rrr} 1 & -2 & 3 \end{array}\right] \;\;\;,\;\;\; Z = [1, -2, 3] \;\;\;,\;\;\; Z = (1, -2, 3) $$

Write the numerical values of the following arrays:
Compute the 'same' convolution $C$ of
$$ A = \left[\begin{array}{rrrrr} 1 & 2 & 3 & 0 & 2\\ 4 & 0 & 5 & 1 & 3 \\ 2 & 3 & 1 & 5 & 0 \end{array}\right] $$
with
$$ B = \left[\begin{array}{rrr} 1 & 3 & 0\\ 2 & 0 & 4\\ 0 & 1 & 2 \end{array}\right] \;. $$
'Same' here means that $A$ and $C$ have the same shape.
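Similarly, a minimal check of the 'same' 2D convolution (again, not to be handed in, and only after doing the work by hand) could use scipy.signal.convolve2d with the A and B above; mode='same' pads with zeros and returns an output with the shape of A.

import numpy as np
from scipy import signal

A = np.array([[1, 2, 3, 0, 2],
              [4, 0, 5, 1, 3],
              [2, 3, 1, 5, 0]])
B = np.array([[1, 3, 0],
              [2, 0, 4],
              [0, 1, 2]])

# 'same' convolution with zero padding: C has the same shape as A.
C = signal.convolve2d(A, B, mode='same', boundary='fill', fillvalue=0)
print(C)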
The following code downloads, reads, and displays a few black-and-white images from the class web site.
import urllib.request
from os import path as osp
def retrieve(file_name, semester='spring23', homework=1):
    if osp.exists(file_name):
        print('Using previously downloaded file {}'.format(file_name))
    else:
        fmt = 'https://www2.cs.duke.edu/courses/{}/compsci527/homework/{}/{}'
        url = fmt.format(semester, homework, file_name)
        urllib.request.urlretrieve(url, file_name)
        print('Downloaded file {}'.format(file_name))

for name in ('coins', 'nickel', 'dime', 'quarter'):
    retrieve(name + '.png')
Downloaded file coins.png
Downloaded file nickel.png
Downloaded file dime.png
Downloaded file quarter.png
from matplotlib import pyplot as plt
%matplotlib inline
import skimage.io as io
coins = io.imread('coins.png')
plt.figure(figsize=(10, 6.5), tight_layout=True)
plt.imshow(coins, cmap=plt.cm.gray)
plt.axis('off')
plt.show()
coin_info = {
    'nickel': {'color': 'yellow', 'value': 5},
    'dime': {'color': 'orange', 'value': 10},
    'quarter': {'color': 'red', 'value': 25},
}

plt.figure(figsize=(10, 3.8), tight_layout=True)
for plot, name in enumerate(coin_info.keys()):
    template = io.imread(name + '.png')
    plt.subplot(1, 3, plot + 1)
    plt.imshow(template, cmap=plt.cm.gray)
    plt.axis('off')
    plt.title(name)
plt.show()
All images are black-and-white and are stored as numpy arrays with values of type numpy.uint8 (unsigned bytes).
All four images were taken with the same camera and settings, including the distance between the camera and the table that supported the coins. The camera looks straight down, perpendicularly to the table surface.
All four images were blurred to remove irrelevant detail. This blurring ends up making the task in this assignment easier.
The dictionary coin_info defined above associates a value and a color to each of the three types of coin (nickels, dimes, and quarters). The meaning of the value is obvious. The color will be used in some of the plots below.
The task we explore in this part is to detect and count nickels, dimes, and quarters by matching the template images named after these coins against windows in the bigger coins image. Your code will eventually report the total amount of money in dollars and cents. The goal of this set of problems is to explore template matching and also to get accustomed to working with images in Python.
We will perform the task in steps, to make sure we are all on the same page in terms of the overall strategy.
Give at least two reasons why this template matching task, with the images given above, is easier than the denticle-matching task discussed in class and in the notes.
The main workhorses of template matching for this assignment are the two functions match_template and peak_local_max imported from skimage.feature, a module of the scikit-image Python package.
Read the manual pages for these two functions online. After our discussion of template matching, it should be quite obvious what these functions do.
Once you understand how these functions work, write code that does the following for each template (nickel, dime, quarter):
- Read the template from its .png file, similarly to how coins was read from coins.png above.
- Compute the normalized cross-correlation ncc between coins and the template using match_template.
- Find the local maxima of ncc by using peak_local_max. This yields an $n\times 2$ numpy array of integer (row, column) locations of the maxima.
- Use the function show_ncx_maxima provided below to display the ncc array and the maxima. The first argument to show_ncx_maxima is ncc and the second is the array of locations returned by peak_local_max. The function show_ncx_maxima overlays red dots at the locations of the local maxima. The argument title is bound to the name of the coin (nickel, dime, or quarter), which the function uses as the title.

Show code and outputs.
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.colors import ListedColormap
%matplotlib inline
def show_ncx_maxima(ncx, max_points, title):
    plt.imshow(ncx)
    plt.plot(max_points[:, 1], max_points[:, 0], '.r')
    plt.axis('off')
    plt.title(title)
- Structure your code as a for loop or a sequence of function calls. In other words, do not program by cutting and pasting. If you find yourself repeating code in later problems, come back to this problem and wrap the relevant code into a function, which you can call later without rewriting code.
- Set pad_input to True in match_template, in order to ensure that the output image has the same size as the input image. Leave all other keyword arguments to their default values.
- Set min_distance in peak_local_max to a single, appropriate value to be used for all templates. This value depends on the sizes of the coins (in pixels), and it is OK to hard-wire this number into your code. All three templates have 149 rows and 149 columns. A good choice for min_distance will simplify the rest of this task. You should be able to find a good value by reasoning, but trial-and-error is fine as well. The value of min_distance is not critical, in that something a bit bigger or a bit smaller than optimal will work as well. However, very poor choices will lead to trouble. It is not necessary to justify your choice.
- Leave all other keyword arguments to peak_local_max at their default values.
- Display the three plots side by side in a single figure using plt.subplot(1, 3, k) (where plt is matplotlib.pyplot). The function show_ncx_maxima does not create a figure or any subplots, but merely draws into the current axes. It is your task to create the figure and subplots, and then call plt.show().
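To make the structure concrete, here is a minimal sketch along the lines of these hints. It assumes coins, coin_info, io, plt, and show_ncx_maxima as defined earlier, and the min_distance value below is only a placeholder, not a recommendation.

from skimage.feature import match_template, peak_local_max

min_distance = 100   # placeholder: choose your own value based on coin size

plt.figure(figsize=(10, 3.8), tight_layout=True)
for k, name in enumerate(coin_info):
    template = io.imread(name + '.png')                       # read the template
    ncc = match_template(coins, template, pad_input=True)     # NCC, same shape as coins
    maxima = peak_local_max(ncc, min_distance=min_distance)   # (row, column) peaks
    plt.subplot(1, 3, k + 1)
    show_ncx_maxima(ncc, maxima, name)                        # draws into current axes
plt.show()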
Since the templates for the three coin denominations are very similar, a point that is a local maximum for, say, the nickels NCC array is also likely to be a local maximum for the NCC arrays for the other types of coins. The objective of this problem is to sort out which maximum to associate with which type of coin.
It may be tempting to first compute the values of the local maxima in the three images and then reason about both locations and values of the local maxima for the three coin types to sort this out.
However, this approach is tricky to implement, because the location of a local maximum for, say, nickels may be very close to one for, say, dimes, but not quite the same. Reasoning about these approximate equalities is not straightforward.
A better approach is to take the pixelwise maximum of the three NCC arrays. Let us call this the ncc_max array. Each pixel in ncc_max is the maximum of the three corresponding pixels in the initial NCC arrays. We also make an array ncc_arg that for each entry of ncc_max remembers which of the three types of coin the corresponding maximum value comes from.
Write code that computes ncc_max and ncc_arg and then uses peak_local_max to compute the local maxima with the same criteria you used in the previous problem. Use the integer values 0, 1, 2 to correspond to nickel, dime, and quarter in ncc_arg.
Finally, use the function show_overall_maxima provided below to display these arrays. The first two arguments correspond to ncc_max and ncc_arg, the third is the array of peak locations, and the info argument is bound to the coin_info dictionary defined in the preamble to this Part of the assignment.
Show code and outputs.
- Make sure that ncc_max is an array of float values, while ncc_arg is an array of integers. This happens automatically if you use numpy.max and numpy.argmax on a properly constructed three-dimensional array that contains the three starting NCC arrays. The function numpy.stack can be used to create this array.
- The function show_overall_maxima makes the figure and subplots, and calls plt.show(), so all you need to do to display the figure is to call the function with appropriate arguments.

def show_max_args(args, colors, title):
    ax = plt.gca()
    colormap = ListedColormap(colors)
    ax.pcolor(args, cmap=colormap)
    plt.axis('image')
    plt.axis('off')
    ax.invert_yaxis()
    plt.title(title)


def show_overall_maxima(ncx_max, ncx_arg, max_points, info):
    plt.figure(figsize=(10, 3.5), tight_layout=True)
    title = 'maximum correlation over coin types'
    plt.subplot(1, 2, 1)
    show_ncx_maxima(ncx_max, max_points, title)
    plt.subplot(1, 2, 2)
    title = '; '.join(['{}: {}'.format(i['color'], n)
                       for n, i in info.items()])
    colors = [i['color'] for i in info.values()]
    show_max_args(ncx_arg, colors, title)
    plt.show()
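For concreteness, here is a minimal sketch of this computation. It assumes that the per-template NCC arrays from the previous problem were collected in a list nccs ordered as (nickel, dime, quarter), and it reuses the min_distance chosen earlier.

import numpy as np
from skimage.feature import peak_local_max

ncc_stack = np.stack(nccs, axis=0)       # shape (3, rows, columns)
ncc_max = np.max(ncc_stack, axis=0)      # float: best NCC value at each pixel
ncc_arg = np.argmax(ncc_stack, axis=0)   # int: 0 = nickel, 1 = dime, 2 = quarter
max_points = peak_local_max(ncc_max, min_distance=min_distance)
show_overall_maxima(ncc_max, ncc_arg, max_points, coin_info)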
Write code that displays the original image coins and overlays a string nickel, dime, or quarter on each coin detected by your method. Each string is lowercase, written in the color specified in coin_info, in bold font, and centered on top of the corresponding coin.
Then write code that prints out the total amount of money in the image in dollars and cents, as well as the number of coins of each type.
Show code and outputs.
- Display the image with plt.imshow(coins, cmap=plt.cm.gray). This will also make the y axis of the figure point downwards for subsequent plots.
- Use plt.text to place the strings at the detected peak locations. The string is centered at the specified location if you use the keyword arguments ha='center' and va='center' for plt.text.
- Set the color of each string with the keyword argument color=c, where c is one of the color strings in coin_info.
- Make the font bold with the keyword argument weight='bold'.
- peak_local_max returns an array whose two columns correspond to image row and column coordinates respectively. In contrast, plt.text takes x, y arguments where x varies horizontally and y vertically, so you need to bind the column to x and the row to y.
- Turn the axes off with plt.axis('off').
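One possible way to put these hints together is sketched below. It assumes max_points and ncc_arg from the previous problem, and coins, coin_info, and plt as defined earlier.

names = list(coin_info.keys())           # ['nickel', 'dime', 'quarter']
counts = {name: 0 for name in names}

plt.figure(figsize=(10, 6.5), tight_layout=True)
plt.imshow(coins, cmap=plt.cm.gray)
for row, col in max_points:
    name = names[ncc_arg[row, col]]      # which coin type won at this peak
    counts[name] += 1
    plt.text(col, row, name, color=coin_info[name]['color'],
             ha='center', va='center', weight='bold')
plt.axis('off')
plt.show()

total_cents = sum(counts[n] * coin_info[n]['value'] for n in names)
print('total: ${:.2f}'.format(total_cents / 100))
for name in names:
    print('{}: {}'.format(name, counts[name]))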
The following code retrieves two Python files that contain some basic utility routines to make convolution kernels and plot one-dimensional signals. Feel free to peruse those files to understand what they do.
retrieve('kernels.py')
retrieve('plot.py')
Downloaded file kernels.py
Downloaded file plot.py
The following code uses the code in kernels.py and plot.py to make a discrete one-dimensional Gaussian kernel. The kernel is properly normalized, as discussed in the class notes, and can be used to smooth an image thanks to the separability of a 2D Gaussian kernel.
from kernels import reals, make_samples, discrete_gaussian_kernel
from plot import plot_signal
sigma = 2.5
g_kernel, g_grid = discrete_gaussian_kernel(sigma)
Now let us plot the discrete kernel g_kernel using the plot_signal function from plot.py.
title = 'truncated, sampled, normalized Gaussian kernel'
plot_signal(g_kernel, g_grid, title=title)
For this Part of the assignment we will also need an image. Consider the following function of the two real-valued variables $x$ and $y$:
$$ f(x, y) = \frac{1}{1 + e^{-x}} \;. $$
This is called the logistic function in the variable $x$. Since $y$ does not appear in the right-hand side, the function $f(x, y)$ is constant with respect to $y$.
Here is a plot of the logistic function itself.
def logistic(x):
    return 1. / (1. + np.exp(-x))
We can use plot_signal also to plot a continuous function. Of course, nothing is continuous on a computer, so we first use make_samples from kernels.py to obtain a dense sampling x of the reals between -5 and 5 and evaluate logistic on those samples. The values are in z. The function plot_signal knows what to do with a "continuous" signal if you set its optional argument discrete to False.
z, x = make_samples(logistic, -5., 5., reals)
plot_signal(z, x, discrete=False, title='the logistic function')
The function $f(x, y)$ can then be displayed as an image, as done below. Specifically, the vector z above has 101 samples of the logistic. If we use 50 pixels in the row direction ($r$, increasing vertically down), we obtain a $50\times 101$ image with floating-point pixel values:
sigmoid = z.reshape((1, -1)).repeat(50, axis=0)
plt.figure(figsize=(10, 5), tight_layout=True)
plt.imshow(sigmoid, cmap=plt.cm.gray)
plt.axis('off')
plt.show()
print('This image has {} rows and {} columns'.format(sigmoid.shape[0], sigmoid.shape[1]))
This image has 50 rows and 101 columns
The row index $r$ in the discrete image corresponds to the variable $y$ (pointing downwards) of $f$ and the column index $c$ corresponds to $x$ (pointing to the right). Thus, if the image sigmoid is $S$, we have
$$ S(r, c) = f(x, y) \;\;\;\text{with}\;\;\; x = \frac{c - 50}{10} $$
(and $y$ immaterial, since $f$ does not depend on it). The number 10 comes from how densely the logistic was sampled to construct the image (see the reals function in kernels.py). Note the switch in the order of arguments between $f$ and $S$.
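A quick way to convince yourself of this correspondence is to compare a pixel against the logistic directly (this check assumes the mapping written above):

# Column c corresponds to x = (c - 50) / 10; the row index does not matter,
# since f is constant in y.
c = 72
print(np.isclose(sigmoid[0, c], logistic((c - 50) / 10.)))   # expect True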
Write a function with header

convolve2d(image, v, h)

which takes a 2D numpy array image and two 1D numpy vectors v and h and uses the function scipy.signal.convolve2d to convolve image vertically with v and horizontally with h.
Then convolve the image sigmoid with both v and h set to a Gaussian kernel with a (rather extreme) value of sigma = 10. This will create a smooth version of the image. Let us call that smooth.
Use plot_signal to show a single plot with sigmoid and smooth on it. This plot can be obtained with the following calls (these are available, commented out, in your homework template file):
middle_row = sigmoid.shape[0] // 2
plot_signal(np.stack((sigmoid[middle_row], smooth[middle_row]), axis=0), x,
            discrete=False, label=('sigmoid', 'smooth'))
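A minimal sketch of one possible convolve2d is shown below; see the hints that follow for the keyword arguments it uses. Reshaping the 1D kernels into a column and a row is what exploits separability.

import numpy as np
from scipy import signal

def convolve2d(image, v, h):
    # Vertical pass with v as a column kernel, then horizontal pass with h as
    # a row kernel; 'same' output shape, symmetric boundary padding.
    out = signal.convolve2d(image, np.reshape(v, (-1, 1)),
                            mode='same', boundary='symm')
    out = signal.convolve2d(out, np.reshape(h, (1, -1)),
                            mode='same', boundary='symm')
    return out

# Example use: a heavily smoothed version of the sigmoid image.
g10_kernel, g10_grid = discrete_gaussian_kernel(10.)
smooth = convolve2d(sigmoid, g10_kernel, g10_kernel)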
- Use the keyword arguments mode='same', boundary='symm' in scipy.signal.convolve2d. The first option makes sure that the output has the same shape as the input. The second pads the image with values that come from the input image itself, so things degrade more gracefully near the image boundaries.
- Make a new kernel g_kernel and corresponding sampling grid g_grid with the new sigma using discrete_gaussian_kernel, as done above for the smaller value of sigma.
- scipy.signal.convolve2d requires the kernel to be 2D as well. So your function must convert the input 1D vectors into 2D arrays of shape $(1, n)$ or $(n, 1)$ as appropriate.

If you did things correctly, the smooth signal should be a bit shallower than the input, that is, it varies a bit more slowly. (With a smaller sigma, this effect would be harder to notice from the plots.)
Explain briefly and in intuitive terms why the smooth signal is shallower than the input.
You may want to think separately of what happens on convex and on concave parts of the function, and remember that a smoothing filter averages values in a neighborhood of each pixel.
Give an algebraic expression for the (exact) gradient $\nabla f(x, y)$ of $f(x, y)$, where the function $f$ was defined above.
If the Python function discrete_gaussian_kernel imported from kernels is given the optional argument derivative=True, then instead of returning a Gaussian kernel and its sampling grid it returns the kernel for the derivative of a Gaussian, properly normalized, and its sampling grid:
sigma = 2.5
dg_kernel, dg_grid = discrete_gaussian_kernel(sigma, derivative=True)
title = 'truncated, sampled, normalized derivative of a Gaussian'
plot_signal(dg_kernel, dg_grid, title=title)
Write a function with header

def gradient(image, sigma):

that uses discrete_gaussian_kernel to make suitable kernels and then uses your function convolve2d to compute the gradient of the image.

Then

- Compute the gradient of sigmoid with sigma=2.5.
- Use plot_signal to plot the two components of the gradient along the middle row of the image in a single plot.
- Use plot_signal to plot both your expression for the derivative of the logistic and the horizontal component of the gradient computed by gradient along the middle row of the image. See "Comparing Gradients" below on how to scale the two curves for a fair comparison.

Derivatives, including gradients, have units. For instance, speeds (derivatives of distance with respect to time) can be measured in kilometers per hour or in meters per second. A speed of 36 km/h is exactly the same as a speed of 10 m/s, even if 36 and 10 are different numbers.
When you computed the derivative of the logistic function $f(x)$, you used the units of ordinate change per unit of $x$. On the other hand, the differentiating kernel returned by discrete_gaussian_kernel with derivative=True is normalized so as to compute derivatives in ordinate change (or image brightness change) per pixel.

Thus, to compare the derivatives in the array grad returned by your function gradient with the derivatives returned by your formula for $f'(x)$ you need to multiply grad by the number of pixels per unit change in $x$. In this experiment, this number is 10.
Again, to get credit for this problem you must exploit separability.

The output of gradient should be a numpy array grad of shape (2, rows, columns), where rows and columns are the height and width of the image. Use numpy.stack to make this array. grad[0] should be the derivative in the vertical image direction, with positive derivatives for images that increase in value top-down. grad[1] should be the derivative in the horizontal image direction, with positive derivatives for images that increase in value left-to-right.
Use plot_signal with options discrete=False, label=('vertical', 'horizontal') to plot the two components of the gradient. The second argument to plot_signal is the same x defined earlier.

If your Python function that computes the derivative of the logistic is called logistic_derivative, you can make the samples for plotting that function with the command

dy, _ = make_samples(logistic_derivative, -5., 5., reals)

You don't need the second output from make_samples (the sampling grid), because that was already computed above as x.
Use plot_signal with discrete=False, label=('analytical', 'convolution') for the comparison of your formula and gradient. Keep in mind the need to normalize the output from gradient appropriately, as discussed above. The second argument to plot_signal is still x.

You may use np.stack to create the first argument to plot_signal, which is an array of shape (2, columns).
A simple and popular way to estimate, say, the horizontal derivative of an image (that is, the component of the gradient in the direction of the column index, which varies along the rows) is to use what is called a Sobel filter as the convolution kernel. This kernel is defined as follows:
$$ H = \frac{1}{8} \left[\begin{array}{ccc} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{array}\right] $$
As you see, it averages vertically for a modicum of noise suppression and differentiates horizontally (remember that kernels must be flipped for convolution).
The vertical derivative of the image can be estimated with the transpose of this array as the kernel.
Is the Sobel filter separable? If yes, show how. If not, explain why not.
Here is a noisy version of sigmoid:
noise_sigma = 0.01
noisy_sigmoid = sigmoid + noise_sigma * np.random.randn(*sigmoid.shape)
middle_row = noisy_sigmoid.shape[0] // 2
plot_signal(noisy_sigmoid[middle_row, :], x, discrete=False)
Make a function with header

def sobel(image):

with the same behavior as gradient except that sobel uses Sobel kernels (which do not require specifying a sigma).
Then use plot_signal to compare the horizontal gradient components computed by gradient and by sobel on the middle row of noisy_sigmoid. The function gradient should use sigma=2.5.
Use options discrete=False, label=('Gauss', 'Sobel') in plot_signal for a proper legend.
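For reference, the same structure carries over to the Sobel version. The sketch below assumes the convolve2d, gradient, noisy_sigmoid, plot_signal, and x defined earlier; whether the 1D factors used here really reproduce the 2D Sobel kernels is exactly the separability question asked above.

import numpy as np

def sobel(image):
    smooth_k = np.array([1., 2., 1.]) / 4.   # averaging factor
    diff_k = np.array([1., 0., -1.]) / 2.    # central-difference factor (convolution kernel)
    d_vertical = convolve2d(image, diff_k, smooth_k)    # vertical derivative
    d_horizontal = convolve2d(image, smooth_k, diff_k)  # horizontal derivative
    return np.stack((d_vertical, d_horizontal), axis=0)

# Example comparison on the middle row of the noisy image.
middle_row = noisy_sigmoid.shape[0] // 2
gauss_h = gradient(noisy_sigmoid, 2.5)[1][middle_row]
sobel_h = sobel(noisy_sigmoid)[1][middle_row]
plot_signal(np.stack((gauss_h, sobel_h), axis=0), x,
            discrete=False, label=('Gauss', 'Sobel'))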