When you submit your work, follow the instructions on the submission workflow page scrupulously for full credit.
Instead of handing out sample exams in preparation for the midterm and final, homework assignments include several problems that are marked Exam Style . These are problems whose format and contents are similar to those in the exams, and are graded like the other problems. Please use these problems as examples of what may come up in an exam.
As a further exercise in preparation for exams, you may want to think of other plausible problems with a similar style. If you collect these problems in a separate notebook as the course progresses, you will have a useful sample you can use to rehearse for the exams.
Is this assignment properly formatted in the PDF submission, and did you tell Gradescope accurately where each answer is?
This question requires no explicit answer from you, and no space is provided for an answer in the template file.
However, the rubric entry for this problem in Gradescope will be used to decuct points for poor formatting of the assignment (PDF is not readable, code or text is clipped, math is not typeset, ...) and for incorrect indications in Gradescope for the location of each answer.
Sorry we have to do this, but poor formatting and bad answer links in Gradescope make our grading much harder.
Let four adjacent pixels in a black-and-white image have the following values
$$ \begin{eqnarray*} I[10, 20] = 60 &\phantom{wide space}& I[10, 21] = 100 \\ I[11, 20] = 80 &\phantom{wide space}& I[11, 21] = 40 \end{eqnarray*} $$In these expressions, the first index for $I$ is the row index and the second is the column index.
Give the value of $I[u, v]$ obtained by bilinear interpolation for row coordinate $u = 10.4$ and column coordinate $v = 20.8$. Show your calculations.
Write a Python function with header
def interpolate(i, r, c):
that takes an array i
of shape (rows, columns)
and two non-empty arrays r
c
of floating-point row, column coordinates respectively. The two arrays have arbitrary shapes, except that the following predicate must be true:
r.shape == c.shape`
The entries in these arrays must satisfy the following predicate:
np.all(r >= 0) and np.all(r < rows - 1) and \
np.all(c >= 0) and np.all(c < columns - 1)
so that they can be interpreted as legitimate row, column indices into i
.
The function interpolate
returns a single array j
of the same shape as r
(and therefore c
) with the values of i
obtained by bilinear interpolation at the (floating-point) positions specified by r
and c
.
Follow the programming notes below to get credit for your code.
Then print the two arrays that interpolate
returns when called with the following image img
and (i) row, column arrays r1, c1
and then (ii) row, column arrays r2, c2
.
import numpy as np
def ramp(rows, columns, r_slope, c_slope):
return np.outer(np.ones(rows), c_slope * np.arange(columns)) + \
np.outer(r_slope * np.arange(rows), np.ones(columns))
img = ramp(5, 5, 2., 3.)
r1 = np.linspace(0.2, 3.8, num=4)
c1 = np.linspace(2., 3.5, num=4)
r2 = np.array([[0.4, 3.1, 1.9], [0.0, 2.1, 3.7], [3.5, 1.0, 0.3]])
c2 = np.array([[2.0, 1.3, 3.2], [0.9, 1.4, 2.9], [1.8, 2.0, 0.7]])
numpy
for this problem. No other packages are allowed. In addition, you may not use any interpolation functions provided in numpy
. In other words, you must write your function from scratch.assert
statements that check for the conditions on r
and c
given above.i
to float
inside the function (using i.astype(float)
). It may already have this format, but it is safer to convert anyway in case it is not.r
and c
. A particularly simple technique is to first ravel
all the arrays and then reshape
the result to the desired shape. If you use this method, it is cleaner to define a tiny helper function that given two integer row, column coordinates returns the corresponding index in the raveled array. However, the technique you use is up to you.for
loops, but use numpy
array operations instead.numpy.floor
, keep in mind that numpy.floor(3.2)
is conceptually an integer but is still represented as a float
. If you use that number for indexing it must be an int
, so you should say numpy.floor(3.2).astype(int)
.Let us now use the function interpolate
developed above to carve a disk out of an image. A disk is a filled circle.
To this end, first write a function with the following header:
def disk(center, radius, num_rho, num_theta):
This function takes a numpy
vector center
with two floating-point numbers; a floating-point number radius
; and two integers num_rho
and num_theta
. The arguments center
and radius
represent a circle and center
is in row, column coordinates.
The function disk
first makes a grid $(\rho, \theta)$ of radii and angles. Each of $\rho$ and $\theta$ is an array of shape (num_rho, num_theta)
and the two arrays together represent a grid of points that cover the given circle in polar coordinates and with num_rho
equally spaced samples in the $\rho$ coordinate (varying from $0$ to radius
) and num_theta
equally spaced samples in the $\theta$ coordinate (varying from $0$ to $2\pi$).
For instance, disk((3, 2), 1.5, 7, 12)
would make two arrays that represent the nodes of the following mesh:
Note that there are 11 spokes are visible even if the last argument is 12. This is because the first value of $\theta$ is 0 and the last one is $2\pi$, so the two directions coincide. The arrays have 12 columns so that the circle is "closed."
The function disk
then converts these polar $(\rho, \theta)$ coordinates to Cartesian $(r, c)$ coordinates, with the row coordinate $r$ pointing vertically down and the column coordinate $c$ pointing horizontally to the right. The function returns these Cartesian coordinates, stored in two arrays each of the same shape as that of $\rho$ or $\theta$.
The following code retrieves a black-and-white image cafe.png
and a Python file display.py
with a few display utility functions.
import urllib.request
from os import path as osp
def retrieve(file_name, semester='spring23', homework=2):
if osp.exists(file_name):
print('Using previously downloaded file {}'.format(file_name))
else:
fmt = 'https://www2.cs.duke.edu/courses/{}/compsci527/homework/{}/{}'
url = fmt.format(semester, homework, file_name)
urllib.request.urlretrieve(url, file_name)
print('Downloaded file {}'.format(file_name))
retrieve('cafe.png')
retrieve('display.py')
Downloaded file cafe.png Downloaded file display.py
Show your code and the result of the following calls (un-commented). These calls are also included in the homework template file.
from matplotlib import pyplot as plt
from display import show_image_excerpt
%matplotlib inline
# img = plt.imread('cafe.png')
# center = [img.shape[0] // 2, img.shape[1] // 2]
# radius = img.shape[0] // 4
# r_disk, c_disk = disk(center, radius, 100, 300)
# d = interpolate(img, r_disk, c_disk)
# show_image_excerpt(img, r_disk, c_disk, d)
The code below retrieves a Python file smoothing.py
that contains some utilities to work with Gaussian pyramids. That file also contains a function match_smoothers
which is run below.
retrieve('smoothing.py')
Downloaded file smoothing.py
from smoothing import match_smoothers
img = plt.imread('cafe.png')
down_sampling_rounds, max_blurring_rounds = 3, 40
match_smoothers(img, down_sampling_rounds, max_blurring_rounds)
best match occurs after 24 rounds of blurring
Here is an explanation of how this plot is made. Feel free to inspect the code in smoothing.py
to understand any unclear part of this explanation.
Suppose that we want to down-sample an image by a factor $\phi = 1/2$, so that the resulting image has half the rows and half the columns as the original one. We will keep this factor fixed throughout this part.
For down-sampling, the down
function you studied in the class notes first blurs an image with a Gaussian to prevent aliasing to occur. The function then takes every other pixel from the blurred image in each direction, an operation called subsampling. In code (taken straight out of smoothing.py
):
def down(x):
smooth = blur(x)
return smooth[::2, ::2]
Please pay attention to the difference between "subsampling" and "down-sampling." We have
down-sampling = blurring + subsampling
Thus, small = down(img)
has a quarter of the pixels in img
(half in each dimension). Because of smoothing, small
also contains less detail than img
.
The image small
can be made to have the same size as img
by running
p_blurred_1 = up(small)
The up
operator, which you also saw in the class notes and is implemented in smoothing.py
, restores resolution (in the sense of "number of pixels") by bilinear interpolation. However, up
cannot reintroduce the detail that blurring removed. Thus, p_blurred_1
is still a blurred version of img
, even if it has the same number of pixels as img
.
If you now want to make a version of img
that is even more blurred, you can apply down
a number of times, that is, traverse several levels of a pyramid, and then apply up
to the result the same number of times, that is, traverse the pyramid in the reverse direction. The prefix p
in p_blurred_1
stands for "pyramid." The suffix 1
means that we blur just once. If we go through n
levels we obtain p_blurred_n
. The image below is a block diagram for p_blurred_3
. The meaning of the symbols should be obvious.
A different way of blurring an image is more straightforward: Just apply blur
as many times as desired. In this way, there is no change in resolution (number of pixels), so we call the sequence of increasingly blurred versions of the image a stack rather than a pyramid.
We call the resulting blurred image s_blurred_k
if we apply blur
k
times. The prefix s
can stand for "stack" (as opposed to "pyramid") or for "straightforward" if you prefer.
The function match_smoothers
is called above on the image cafe.png
you saw earlier. It first produces p_blurred_3
, because the argument down_sampling_rounds
is set to 3. It then computes s_blurred_k
for k
ranging between 1 and 40 (the value of the argument max_blurring_rounds
).
For each value of k
, a central portion of p_blurred_3
is compared with the corresponding portion of s_blurred_k
by taking the 2-norm of the difference between these two portions. The reason for only looking at a central portion is to stay away from image boundaries, where padding may cause unwanted artifacts that would complicate the analysis.
The plot above is the value of this 2-norm as a function of k
.
The conclusion we can draw from the plot above is as follows:
It takes about 24 rounds of
blur
to achieve approximately the same effect as 3 rounds ofdown
and 3 rounds ofup
.
The value of the minimum is not zero mainly because the bilinear interpolation performed by up
does not cancel subsampling (that is, taking every other pixel in each dimension) exactly.
If it did, then it would take 21 rounds of blur
rather than 24.
Your job here is to explain the relationship between the numbers 3 and 24 (or rather 21) in the conclusion above. In other words, why do you need to blur an image about 21 times to achieve an effect that is similar to that of (i) blurring and subsampling 3 times, and then (ii) interpolating your way back up through the 3 levels of the pyramid?
up
were to reverse subsampling exactly is 21, so your explanation should come up with this value instead of 24.up(blur(i[::2, ::2]))
where the Gaussian in blur
has parameter $\sigma$ is approximately equivalent to just blurring i
with a Gaussian with parameter $2\sigma$. Assume that this equivalence is exact: subsampling + blurring with $\sigma$ + up-sampling = blurring with $2\sigma$.p_blurred_3
shown earlier and reason your way towards the left to transform the diagram into a chain of blur
operators only.Successive layers in a Gaussian pyramid contain coarser and coarser image features because of the blur
operator in down
. Another way of saying this is to say that Gaussian smoothing eliminates high spatial frequencies from the image, in the sense that sharp details involve rapid transitions of image brightness from one pixel to the next (high spatial frequencies), and these are reduced by Gaussian blurring.
Because of this behavior, the Gaussian filter (convolution with a Gaussian kernel) is called a low-pass filter: It stops some high frequencies and lets lower frequencies (more slowly varying signals) go through to the output unharmed.
In some scenarios it is also useful to look at the part of the image that a Gaussian filter blocks, that is, the higher spatial frequencies that do not make it through the filter.
As an example, the Figure below shows
img
,low = up(up(up(down(down(down(img))))))
(this is the same transformation that produced p_blurred_3
in the previous Part of this assigment), andhigh = img - low
.By-and-large, the clouds dominate the low-pass image, while the finer detail on the ground dominates the high-pass image.
While the separation is not perfect, this example is meant to suggest that looking at the contents of an image in different frequency bands may be useful in various scenarios.
If the town in the satellite image above were on rolling hills, it is conceivable that the shading the sun creates on the hills would lead to image features at intermediate frequencies: Neither as high as the buildings nor as low as the clouds. The hill image would be a band-pass image.
To create these band-pass images one can start from a Gaussian pyramid and compute appropriate differences. For instance, with three levels:
low_1 = down(img)
band_1 = img - up(low_1)
low_2 = down(low_1)
band_2 = low_1 - up(low_2)
low_3 = down(low_2)
band_3 = low_2 - up(low_3)
The original image img
together with the three low-pass images low_1
, low_2
, low_3
form the Gaussian pyramid. The image band_1
is what we called the "high-pass image." We call it band_1
simply for notational consistency, so we can easily write a for
loop over the levels when we have an arbitrary number of levels in the pyramid instead of three.
The three images band_1
, band_2
, band_3
encode image details at three different levels of detail (think of buildings, hills, and clouds).
The three-level pipeline above involves seven images: img
, low_1
, low_2
, low_3
, band_1
, band_2
, band_3
.
Show that it is possible to reconstruct img
exactly (up to rounding errors) if only the four images band_1
, band_2
, band_3
, low_3
are available.
This is a simple high-school algebra problem: Solve the equations above in reverse order:
low_2
in terms of images that are available (and using available operators).low_1
in terms of images that are available at this point (so low_2
is included).img
in terms of images that are available at this point.Just show the resulting equations in the order in which you solved them:
low_2 = ...
low_1 = ...
img = ...
A Laplacian pyramid is a pyramid of the type above that contains only the images that are necessary to reconstruct the input image img
. In the three-level case, these images are band_1
, band_2
, band_3
, low_3
.
More generally, one can keep creating $k$ levels in the pyramid until down-sampling is no longer possible because the image low_k
is too small.
To make this problem as simple as possible, we assume that images have shape $(2^k, 2^k)$, so that the pyramid has exactly $k$ levels.
To check for this condition you can import the function check_image_size
from smoothing
(you retrieved smoothing.py
earlier).
The resulting pyramid contains the band-pass images band_1
$\ldots$ band_k
as well as the low-pass image low_k
.
Since each application of down
shrinks the image by half, the low-pass image low_k
contains a single pixel. Also, since all Gaussians in down
are properly normalized, the value of low_k
is the average of all pixel values in the original image.
If we put all these images into a Python list, we can compute the Laplacian pyramid as follows.
from smoothing import check_image_size, up, down
def pyramid(x):
d = check_image_size(x)[0]
p = []
for level in range(d):
lower = down(x)
p.append(x - up(lower))
x = lower
p.append(x)
return p
The function composite
imported from display
places all the images in a pyramid into a single composite image for display with show_image
, which can also be imported from display
(you retrieved display.py
earlier). A function overlay_grid
(also from display
) adds some yellow lines to make the separation between the various images in the pyramid clearer. For example:
from display import composite, show_image, overlay_grid
img = plt.imread('cafe.png')
pyr = pyramid(img)
com = composite(pyr)
plt.figure(figsize=(12, 8), tight_layout=True)
show_image(com)
overlay_grid(com.shape[0])
plt.show()
The lower-left corner of the composite image is painted with the value of the single pixel in low_k
.
Write a Python function with header
def reconstruct(p):
that takes a Laplacian pyramid as constructed with pyramid
and reconstructs the original image.
Keep your code as simple as posible. Show your code and the result of the following calls (after uncommenting). These calls are also available in the homework template file.
#from display import show_image
#
# img = plt.imread('cafe.png')
# pyr = pyramid(img)
# rec = reconstruct(pyr)
# show_image(rec, fig=True)
Let $\mathbf{z} = (x, y)$ and
$$ f(\mathbf{z}) = 2 x^2 + xy + 3 y^2 $$and define
$$ \mathbf{z}_0 = (1, 0) \;. $$Write an algebraic expression in terms of $x$ and $y$ for the gradient
$$ \mathbf{g}(\mathbf{z}) = \nabla \mathbf{f}(\mathbf{z}) $$and give the numerical value of $\mathbf{g}(\mathbf{z}_0)$.
Suppose that gradient descent is performed on $f$ starting at $\mathbf{z}_0$. The first iteration of gradient descent determines the next point $\mathbf{z}_1$ by line search.
Give an algebraic expression in terms of $\alpha\in\mathbb{R}^+$ for the function $h(\alpha)$ that is minimized for that line search. Simplify your expression as much as possible.
Then, assume that line search is continued until the difference between the third and the first item of the bracketing triple is at most $10^{-6}$. Give the numerical value $\alpha_0$ that line search would return in that case rounding to four decimal digits after the period. Also give the numerical values of the point $\mathbf{z}_1$ and of the value $f(\mathbf{z}_1)$, rounding to the same number of decimal digits for each.
Important: Do not actually perform line search to answer this question. It is possible to answer the question by calculus alone. Show your calculations.
Line search during minimization of a certain function (unrelated to those in the previous problems) leads to minimization of the function
$$ h(\alpha) = 2 \alpha^3 - \alpha^2 -\alpha + 1 \;. $$The first bracketing triple is found to be
$$ a, b, c = 0, \frac{1}{4}, 1 \;. $$Give the second bracketing triple. Show your reasoning.
The class notes on optimization give pseudo-code for the procedure that shrinks a bracketing triple once (page 9). Turn that pseudo-code into a Python function with header
def shrink_triple(a, b, c, h):
where h
is the function $h(\cdot)$.
The paragraph at the top of page 9 describes a way to initialize the bracketing triple. Turn that description into a Python function with header
def initial_triple(h, step=1.e-4, min_step=1.e-12, max_step=1.e12):
In this header:
step
is the initial step, which is denoted by $\alpha_1$ in the notes.
For the initialization to work, it is necessary that $h(\alpha_1) < h(\alpha_0)$. If on the other hand $h(\alpha_1) \geq h(\alpha_0)$, try again with step
divided by 10. This can be achieved by a recursive call initial_triple(h, step / 10.)
. Stop these attempts if step
goes below min_step
. In that case return the triple None, None, None
to indicate failure.
If $h(\alpha_1) < h(\alpha_0)$, we are in business. Then, as long as $h(\alpha_i) \leq h(\alpha_{i-1})$, step
is multiplied by 10 and $\alpha_{i+1} = \alpha_i + $ step
, and the proper conditions for success are checked. This search succeeds if these conditions are met before step
exceeds max_step
. In that case, return the bracketing triple found.
If the conditions are not met by the deadline (that is, for step
no greater than max_step
), return None, None, min_c
, where min_c
is the smallest value $h(\alpha_i)$ encountered up to that point: This means that the search for a bracketing triple has failed, but the value min_c
is still an acceptable value for $\alpha$, because $h($ min_c
$) < h(0)$ and so some descent has been ensured in the value of the function.
You now have all the ingredients for line search. Write a Python function with header
def line_search(h, precision=1.e-6):
that uses initial_triple
and shrink_triple
and returns a value of $\alpha$ within precision
from a local minimum, or returns $\alpha = $c
if initial_triple
returns None, None, c
.
To return an appropriate value once $c - a$ is at most equal to precision
, just return b
, the second value of the bracketing triple.
If no value of $\alpha$ can be found, line_search
returns None
.
Write the functions above, show your code, and the values of $\alpha$ returned by the calls below (uncommented). These calls are also available in the homework template file.
One of the three calls should fail outright (return None
), one should return a value resulting from failed initialization (when initial_triple
returns None, None, c
), and one should succeed with an appropriate value. Think about which is which, to check your code. No need to let us know.
Do not use numpy
or any other import for this code, just straight Python. Leave all keyword arguments to their default values in all functions.
# print(line_search(lambda x: (x - 1) ** 2))
# print(line_search(lambda x: x ** 2))
# print(line_search(lambda x: -x))
In this part you will write code that uses either line search (which you developed in Part 4) or a fixed learning rate to minimize a function $f(\mathbf{z})$ of two variables (that is, $\mathbf{z}\in\mathbb{R}^2$) by gradient descent.
Restricting this exercise to a two-dimensional domain lets you plot the path $\mathbf{z}_0, \mathbf{z}_1, \ldots $ traversed during descent, and also the sequence of function values $f(\mathbf{z}_0), f(\mathbf{z}_1), \ldots $ (of course, this last sequence can be plotted regardless of the dimensionality of the domain).
Code will also keep track of how many times the function being minimized is called, to determine the cost of each full run of gradient descent. This monitoring of a function's behavior is implemented here through Python decorators and generators. Don't worry if you don't know much about these concepts, as most of the work on those aspects of the code is done for you.
While there is a bit of reading involved here, you will learn about programming techniques that come in handy in a variety of situations.
One way to keep track of descent paths and function call counts would be to include code that does that in your implementation of gradient descent. However, this would clutter the code and make it difficult to read and debug. Instead, Python decorators and generators let you separate what the function is supposed to do from the code that monitors any aspects of its behavior.
As an example, the function you will minimize is decorated with the @Count
decorator as follows, to monitor how many times it is called. Remember that you already retrieved display.py
, from which these decorators are imported.
from display import Display, Count, print_outcome
@Count
def rosenbrock(z):
a, b = z[1:] - z[:-1] ** 2, 1 - z[:-1]
f = np.sum(100. * a ** 2 + b ** 2, axis=0)
g = np.zeros(z.shape)
g[:-1] = - 400. * z[:-1] * a - 2 * b
g[1:] += 200 * a
return f, g
You can then simply reset the call counter by saying
rosenbrock.reset_counter()
and then call the function a few times:
for _ in range(5):
rosenbrock(np.random.rand(2))
To find out how many times the function was called do
print(rosenbrock.calls())
5
Note that the definition of rosenbrock
knows nothing about reset_counter
or calls
. All of that is done for you in the decorator @Count
. Thus, the code of rosenbrock
is kept clean of anything that does not pertain to the function itself.
You won't even have to worry about calling reset_counter
or calls
yourself, since display.py
also includes a print_outcome
function, discussed later, that both prints the number of calls and resets the call counter. So all you have to do is to call print_outcome
, as you will be instructed to do later.
The description of the @Count
decorator above is just to let you know what is going on. Of course, you can always peruse display.py
to see how this is all implemented.
@Display
Decorator and Python Generators¶The @Display
decorator plots the gradient descent path and function values, superimposed on a plot of the iso-contours of the function in some rectangle of the (two-dimensional) domain. This decorator does require a bit of collaboration on your part, as discussed next.
You will be asked to write a function gd
that searches for a local minimum of a function by gradient descent. The @Display
decorator for gd
needs to know which function to draw the iso-contours of, and in which rectangle.
Because of this, before you invoke gd
on a new function you have to say
gd.set_function(fct, box)
where fct
is the function to be minimized and box
is a list of the left, right, lower, and upper bounds of the rectangle. More on this later.
Again, it is not your job to write set_function
, that is done for you in the decorator, which is defined in display.py
.
The @Display
decorator also needs to be given the opportunity to temporarily stop execution of gd
, see the current status of the search (that is, be told the values of $\mathbf{z}_k$ and $f(\mathbf{z}_k)$ which the decorator needs in order to plot the search path), and then resume execution of gd
.
To this end, you will be shown how to transform your gd
, which is initially a function, into a generator, which lets @Display
intrude as described. This transformation is simple and merely involves the following:
return
in your definition of gd
into yield
. Yielding differs from returning in that control is yielded only temporarily to the caller (which is the @Display
decorator). While the caller takes over, gd
still remembers all of its state, so that when the caller calls gd
again this function can just continue execution as if nothing had happened.yield
statements wherever your code changes the values of $\mathbf{z}_k$ or $f(\mathbf{z}_k)$, so that the caller knows to record these values.Write a generator with decorated header
@Display
def gd(f, z, learning_rate=None, min_step=1.e-6,
min_drop=1.e-8, max_iter=100000):
that performs gradient descent on the function f
starting at point z
.
Instructions are given under the Programming Notes below on how to make the generator. You will first write a regular function, then transform it to a decorated generator. So don't worry about the decorator and generator aspects for now. Just start by writing a regular function that satisfies the following specifications.
If learning_rate
is None
, your function performs line search at each iteration, calling the function line_search
you developed earlier.
Otherwise, learning_rate
must be a positive floating point number, and gd
will use that as the learning rate, to be kept fixed throughout all iterations.
The function runs at most max_iter
iterations, and it will therefore contain a loop like this:
for k in range(max_iter):
where k
is the iteration number.
break
out of the loop (do not return
from inside the loop) if eithermin_step
, ormin_drop
, orline_search
fails to find an acceptable value.gd
returns None, None
.gd
returns the pair fz, z
, where z
is the current position $\mathbf{z}_k$ and fz
is the corresponding value $f(\mathbf{z}_k)$.max_iter
, min_step
, max_drop
when you run gd
in this assignment.To write the gd
generator, first comment out the decoration:
# @Display
then write gd
as a function (that is, use return
rather than yield
).
When done, debug your function carefully and test it.
When you write gd
make sure that it has the following structure. This will make it straightforward to turn the function into a generator later.
def gd(...)
# Compute the initial value of the function and its gradient
# and make a copy of z to prevent side effects.
(fz, gz), z = f(z), z.copy()
for k in range(max_iter):
...
# Compute the descent step
...
# Take the step
z -= step
# Check if done
if ...:
# IMPORTANT: Do not return here.
# Instead, break out of the loop
break
# This should be the ONLY return in the entire function
return fz, z
Once you are happy with how gd
works as a function, transform it into a generator by changing return
to yield
and adding extra yield
statements as specified below.
return
to yield
in the last line.yield fz, z
as follows:for
loop).if
block that terminates the for
loop.That's all there is to it. In this way, your code yields to the @Display
decorator every time z
and fz
change, so that @Display
can keep track of these values and then plot them when gd
is done.
To decorate your function, just uncomment the line
# @Display
that you commented out. If you run the function again as shown below, the decorator will display two plots that show how $\mathbf{z}_k$ and $f(\mathbf{z}_k)$ change over time.
gd
¶Here is the code for calling gd
. This code is also available in the homework template file (also commented out).
Rather than using a for
loop on the different learning rates, the calls are separated out into different notebook cells for easier formatting of the PDF file.
# # Part of the function domain to plot contours in
# box = [-1, 2, -1, 2]
# # Tell the decorator which function to use and in what box
# gd.set_function(rosenbrock, box)
# # Starting point for gradient descent
# z0 = np.array([-1 / 2, 1])
# # Run gd with line search
# value, position = gd(rosenbrock, z0)
# print_outcome('line_search', position, value, rosenbrock)
## Run gd again with three different, fixed learning rates
# rate = 0.00001
# value, position = gd(rosenbrock, z0, learning_rate=rate)
# print_outcome('learning rate {}'.format(rate),
# position, value, rosenbrock)
# rate = 0.001
# value, position = gd(rosenbrock, z0, learning_rate=rate)
# print_outcome('learning rate {}'.format(rate),
# position, value, rosenbrock)
# rate = 0.002
# value, position = gd(rosenbrock, z0, learning_rate=rate)
# print_outcome('learning rate {}'.format(rate),
# position, value, rosenbrock)
# rate = 0.003
# value, position = gd(rosenbrock, z0, learning_rate=rate)
# print_outcome('learning rate {}'.format(rate),
# position, value, rosenbrock)
The function rosenbrock
defined earlier takes a numpy
array z
of shape (2,)
(that is, a two-dimensional vector) and returns both the value fz
of the function and its gradient gz
at z
.
The function rosenbrock
has a unique minimum (which is therefore global) at $\mathbf{z}^\ast = (1, 1)$, so you'll know when your implementation of gd
succeeds.
To be precise, the definition of rosenbrock
above takes inputs z
of any dimensionality. However, we will only use it with two-dimensional vectors z
.
gd
correctly and then transform it to a generator.Answer the following questions about the results you obtained in the previous problem. Please number your answers and keep them in the same order as the questions.
Note that for all successful runs of gd
above you can determine the number of iterations, at least approximately, by looking at the abscissa of the function value plots. The number of function calls, on the other hand, is printed out explicily.
Explain why the search with the smallest fixed learning rate above failed.
Explain why the search with the greatest fixed learning rate above failed.
How could the code for gd
be improved so it can recover from the problem that occurred with an excessively large fixed learning rate? (No need to implement this improvement.)
What is the average number of function calls per iteration, approximately, for the run with line search above? An iteration corresponds to one increment of the for
loop variable k
.
What is the average number of function calls per iteration, approximately, in the successful runs with a fixed learning rate above?
What are the main trade-offs in selecting a good learning rate when the rate is fixed?
Since line search does not always lead to the most efficient search, what could be a reason for preferring that method to a fixed learning rate? This question is a bit open-ended, and different valid answers are possible.