COMPSCI 371 Homework 0 (Prerequisites)¶

This homework assignment revisits some concepts from calculus, linear algebra, and probability that you have already encountered in the past. It is also designed to help you get accustomed with Jupyter and Python 3, if you don't know these already, and with the process required to submit your work on Gradescope.

The assignment is due before this semester's add/drop deadline, so you can gauge your familiarity with the prerequisites for this course and decide accordingly.

Homework Submission Workflow¶

When you submit your work, follow the instructions on the submission workflow page scrupulously for full credit.

Anaconda¶

Before you start, install the Anaconda Python distribution, which contains essentially all of the code you will need for this course. Please do not use other installations of Python, and install the version for Python 3, not Python 2. Also, if you install new Python packages later on, do not mix package installation methods, unless you know very thoroughly what you are doing (for instance, do not use pip to install new software, only use conda). This will save you big headaches as the semester progresses.

A conda Environment¶

If you use Python for more than this course this semester, you are strongly advised to create a conda environment for the class, and do all your work for it in that environment. This will prevent interference with other installations. The easiest way to create such an environment is to run the following command in a terminal:

conda env create --file environment.yml

Here is the file environment.yml . Running this file may take some time, but you only need to do this once.

The name of the resulting environment is compsci371 . If you are in the conda base environment, you can activate teaching by the command

conda activate compsci371

When you are done using the environment, you can go back to the base environment, if desired, with the command

conda deactivate

Editing a Notebook¶

Homework assignments in this course, including this one, are made available as HTML files on the class homework page. You can only read and print these files, not edit them. A separate template file is also made available for each assignment. The template is a Jupyter notebook, a file called homework0n.ipynb, where n is the homework number. This is a file you can also edit, and will become the notebook you submit for grading.

To edit a notebook, start the Anaconda Navigator and launch Jupyter Notebook from the Navigator. This will open a new page in your web browser with a listing of your home directory. Navigate to the directory where you put the file homework0n.ipynb and click on it to open it. The file will open in the notebook editor in a new tab of your web browser.

To edit some text or code in the notebook, double-click on it. This will change the display of that cell to edit mode. Try this now.

Working with Cells¶

A cell is a block of content. There are a few different types of cell, but we will be concerned mainly with Markdown cells and Code cells. The former contain text, including mathematics, and the latter contain snippets of code (Python 3 for us, although notebooks can deal with any programming language).

When you run a Markdown cell, its contents are formatted and displayed. A Markdown cell can also contain straight HTML code (with some restrictions). However, there won't be any need for this in this course.

When you run a Code cell, the code in it is executed in the current kernel (a process that runs in the background), and any output from it is displayed after the cell. Any object created during execution (variables, function definitions,...) remains alive in the kernel, so you can assign a value to a variable in one cell and access it in a later cell. You can do various things to the kernel, such as restart it, interrupt it, and so forth. Please browse the Kernel menu at the top of the page.

To make, edit, delete, run, and do several other things to and with cells, you use the commands in the Edit, View, Insert, and Cell menus. Look at these menus and try out some of the commands. The icons just below the menus are shortcuts for some of the more common commands.

In Markdown cells, all text except mathematics is formatted with Markdown syntax, a simplified document annotation language. The mathematics is enclosed between dollar signs when it is inline with the text like the equation $\frac{de^x}{dx} = e^x$, and between double dollar signs when it is displayed in the middle of a line on its own, like this:

$$\int_0^{\infty} e^{x} dx = 1\;.$$

For your reference, here is a list of Markdown syntax elements

For your reference, here is a list of LaTeX commands for mathematics

You may want to bookmark these pages in your browser for later reference.

After you make changes to a cell, you can see the results of your changes by selecting Cell->Run Cells or Cell->Run All. The first command runs only the cell you are in, or the cells you select. The second command runs all the cells in the notebook. The Run button in the menu bar is a shortcut for Cell->Run Cells.

A safer way to test the entire notebook is to run Kernel->Restart and Run All. This is a mandatory step before you submit your work. After running this command, check the entire document to make sure that everything displays as expected.

To make pagination easier when you produce your PDF file for submission, make sure that all cells in your notebook are short. If you write several long paragraphs, make each paragraph a separate cell, so that the conversion programs will have an easy time finding adequate page break points.

Debugging Code¶

Code can in principle be debugged in Jupyter notebooks by using Python's own pdb module, or by using more visual debuggers such as the one provided with PixieDust.

However, using a programming IDE on your computer is much easier and provides more powerful debugging tools. My absolute favorite is PyCharm, which is insanely well conceived and implemented. If you request it from an .edu email account, you can even get the professional edition for free.

You are strongly encouraged to write and test your code in your favorite IDE first, and then copy and paste it into notebook cells. Debugging a five-line function in Jupyter may be OK, but it is virtually impossible to figure out what is wrong with more complex code without a good debugger. If you do not heed this advice you will likely waste large amounts of time during the semester.

The point made earlier on keeping cells short for good pagination applies here as well: Keep functions (and individual lines of code) short by good code structuring, and put each function in its own cell.

Starting your Work¶

To work on this assignment, open the HTML file homework00.html from the class homework page (that's what you are reading now) in one browser window, and open the notebook homework00.ipynb in the notebook editor (in another browser window) next to it.

As you work on homework00.ipynb, add as many cells as you need to answer the questions (use the Insert->Cell Below menu or the + icon in the menu bar). When you do so, make sure you select the proper cell type (Markdown for text, Code for Python code) from the menu, or else your formatting of text will not work, and your code will not run.

Again, keep your cells small in both dimensions for easier pagination.

Part 1: Algebra and Multivariate Calculus¶

The code below displays a contour plot of the following function $\mathbb{R}^2\rightarrow\mathbb{R}$:

$$ f(x, y) = x^3 + y^3 + 3 x y^2 - 15 x - 15y\;. $$

We will use this function and its Python implementation f(x) given below throughout this assignment. In the implementation, the two arguments $x, y$ are packaged into a single numpy vector x.

If this is the first time you use either numpy or matplotlib, study the code below carefully. You may also try some variants on your own (do not hand these in). We will use these packages extensively in this course.

Note¶

In your answers to the problems in this part show your work only if you are asked for it or if you are unsure of your results (for possible partial credit).

Programming Notes¶

  • The directive

      %matplotlib inline
    
    

    makes sure that plotting works correctly in a Jupyter notebook when using matplotlib. It is just needed once per notebook, before the first plot. Remember this for future assignments.

  • Even this very simple piece of code is broken down into small cells, to make pagination easier when the notebook is converted to PDF.

  • Feel free to use the function decorate_axes defined below for your own plots. This function lets you add well-visible labels to the axes and makes the tick marks easily readable.

In [1]:
%matplotlib inline

import numpy as np
from matplotlib import pyplot as plt


def f(x):
    return x[0] ** 3 + x[1] ** 3 + 3. * x[0] * x[1] ** 2 - 15. * x[0] - 15. * x[1]
In [2]:
def decorate_axes(x_label=None, y_label=None, fontsize=16):
    plt.xticks(fontsize=fontsize)
    plt.yticks(fontsize=fontsize)
    if x_label is not None:
        plt.xlabel(x_label, fontsize=fontsize)
    if y_label is not None:
        plt.ylabel(y_label, fontsize=fontsize)
In [3]:
def draw_contours(fct, x_range=(-3., 3.), y_range=(-3., 3.), levels=20):
    step = 0.01
    x = np.arange(x_range[0], x_range[1], step)
    y = np.arange(y_range[0], y_range[1], step)
    xg, yg = np.meshgrid(x, y)
    zg = fct((xg, yg))
    plt.figure(figsize=(10, 10), tight_layout=True)
    cp = plt.contour(xg, yg, zg, levels=levels)
    plt.clabel(cp, inline=True, fontsize=16)
    decorate_axes()
    plt.show()
In [4]:
draw_contours(f)

Problem 1.1¶

Give exact expressions for the two components $f_x(x, y)$ and $f_y(x, y)$ of the gradient

$$ \nabla f(x, y) = \left[\begin{array}{c} f_x(x, y) \\ f_y(x, y) \end{array}\right] $$

of the function $f$ above at an arbitrary point $(x, y)$.

Problem 1.2¶

Write a system of equations whose solutions are critical points (local maxima, local minima, saddles) of the function $f$ above ("critical point" and "stationary point" are used as synonyms in this course). Then use elementary algebra to find exact numerical expressions for all the critical points of $f$.

Note¶

An exact numerical expression can contain algebraic operations but no variable names or approximations. For instance, $\sqrt{3}/2$ is such an expression but $\sqrt{2}x$ or $1.732$ are not (unless $1.732$ happens to be the exact value rather than an approximation for $\sqrt{3}$).

Problem 1.3¶

Give exact expressions for the three distinct components $f_{xx}(x, y)$, $f_{yy}(x, y)$, and $f_{xy}(x, y)$ of the Hessian

$$ H_f(x, y) = \left[\begin{array}{cc} f_{xx}(x, y) & f_{xy}(x, y) \\ f_{yx}(x, y) & f_{yy}(x, y) \end{array}\right] $$

of the function $f$ above at an arbitrary point $(x, y)$. Since all partial derivatives involved are continuous everywhere, we have $f_{xy}(x, y) = f_{yx}(x, y)$.

Problem 1.4¶

Write exact numerical expressions for the Hessian $H_f$ at the critical points you found above. Specify which Hessian is for which point.

Problem 1.5¶

It's easy to make mistakes in hand calculations, so we'll now check our results numerically. We have

$$ f_x(x, y) = \frac{\partial f}{\partial x}(x, y) = \lim_{\delta\rightarrow 0} \frac{f(x + \delta, y) - f(x - \delta, y)}{2\delta} $$

and an analogous expression for $f_y(x, y)$. We could have written

$$ \lim_{\delta\rightarrow 0} \frac{f(x + \delta, y) - f(x, y)}{\delta} $$

instead, but the first fraction above converges more rapidly: The two limits are the same, but the first fraction, which is called the central difference, approaches the limit asymptotically faster than the second, so it is more suitable for numerical calculations.

Of course we cannot compute limits numerically, but we can approximate the limit by using a small value of $\delta$:

$$ f_x(x, y) \approx \frac{f(x + \delta, y) - f(x - \delta, y)}{2\delta} \;. $$

Write a function with header

def gradient(fct, x0, delta):

that takes a function fct from $\mathbb{R}^n$ (not just from from $\mathbb{R}^2$) to $\mathbb{R}$, a numpy vector x0 that represents a point in $\mathbb{R}^n$, and a small positive value delta and uses the central difference approximation with the given delta to compute the gradient of fct at x0.

Then write a function with header

def hessian(fct, x0, delta):

that takes the same arguments as gradient. This function uses gradient to compute the Hessian of fct at x0.

Show your code and the Hessians it computes at the critical points of the function $f$ you found above when delta is 1.e-5.

You may want to check your results against the ones you found by hand, but do not show your check.

Programming Notes¶

  • Actually use gradient to simplify your work here, do not reapply the approximate definition of derivative from scratch. Hint: How does the Hessian relate to the gradient? You don't need to know this, it's easy to figure it out with a small example.

  • Your printout should used three decimal digits after the period (unless fewer digits are sufficient to display a value exactly). This can be achieved with the format specification '{:.3f}' for individual floating-point numbers of with the following context manager for an entire numpy array:

      with np.printoptions(precision=3, suppress=True):

Problem 1.6¶

Compute exact numerical expressions for the eigenvalues of the Hessian(s) you found above and use the results to categorize your critical points by type (local maximum, local minimum, saddle point) whenever possible. Justify your answer.

Notes¶

  • Again, you may want to check your results numerically. However, your eigenvalues need to be exact (no approximations). Do not hand in your checks.
  • While checking your results visually against the contour plot given earlier is a good idea, answers based on eyeballing that plot are not acceptable.

Problem 1.7¶

The function $g(x, y) = x^2 + y^3$ has zero gradient at the origin $(0, 0)$, which is therefore a critical point. The Hessian there is

$$ H_g(0, 0) = \left[\begin{array}{cc} 2 & 0 \\ 0 & 0 \end{array}\right] \;. $$

The eigenvalues of $H_g(0, 0)$ are on its diagonal, and are therefore $\lambda_1 = 2$ and $\lambda_2 = 0$.

If the eigenvalues of $H_g(0, 0)$ are enough to determine the type of the origin as a critical point (local minimum, local maximum, saddle point), state the type and justify your answer.

If the eigenvalues of $H_g(0, 0)$ are not enough, first state why not and then determine the type of critical point by other means. Explain your answer.

Problem 1.8¶

Write code to plot the restriction of the function $f$ defined above to the line $\ell$ that goes through the point $\mathbf{x}_0 = (-1, 0)$ of the $(x, y)$ plane and is tilted 30 degrees away from the $x$ axis in the counter-clockwise direction.

Your plot should show this restriction for points on $\ell$ that are within 4 units of distance from the point $\mathbf{x}_0 = (-1, 0)$ of the $(x, y)$ plane.

Give a brief description of how your code works and show your code and the plot it makes. No need to label the axes, but feel free to.

Part 2: Probability¶

The topic of this part is probability, but these problems also encourage you to program efficiently in numpy.

Let's still use the function $f$ from Part 1, for which code was provided as well. The plot below shows the rectangle $R = [0, 3]\times [0, 2]$ and the single contour curve $f(x, y) = -26$ of that function.

In [5]:
draw_contours(f, x_range=(0. ,3), y_range=(0., 2.), levels=(-26,))

Problem 2.1¶

Let $a$ be the area of the bean-shaped region $A$ in the interior of the closed contour curve shown in the plot above. If you pick a point uniformly at random in the rectangle, what is the probability $p$ that the point will fall inside $A$? Write your answer as an expression in terms of $a$ and $r$, the area of rectangle $R$.

Problem 2.2¶

Write code that uses the fact you established in the previous problem to estimate the area $a$ of region $A$. You can do so by picking $n$ points uniformly at random. You can then estimate $p$ as the fraction of points that fall inside $A$, and from there you can infer $a$, since $r$ is known.

The estimate of $a$ is random, and its quality depends on $n$. For each value of $n$ in the vector

ns = np.logspace(2, 5.5, 50, dtype=int)

estimate the area $a$ a number reps$=30$ of times and compute mean and standard deviation of the 30 values. Then use matplotlib.pyplot.errorbar to plot the mean as a function of $n$, using the standard deviations to plot the error bars.

It's up to you how you organize your code into functions. However, you must comply with the programming notes below for full credit. Show your code and the resulting plot. Label the axes meaningfully.

Hint¶

How is the region $A$ defined in terms of $f$ and the number value$= -26$?

Programming Notes¶

  • If you go wild with nested for loops, your code will take forever to run. Instead, your code should contain a single explicit Python for loop, the one that loops over the vector ns. All other loops should be implicit, that is, performed internally by some numpy function. This is because the numpy library is implemented in C, a compiled language, and C loops are orders of magnitude faster than loops in Python, which is an interpreted language.
  • To understand how to do this, suppose that you create numpy arrays x and y of shape (n, reps) of random numbers using just two calls of numpy.Generator.random followed by appropriate scaling and shifting. These two calls will perform C loops internally to fill the arrays and are very fast.
  • You can then collect x and y into a single numpy array of shape (2, n, reps) and call the function f once with this array as its argument. The function f is written in vectorized form. If you study it, you will see that if called with this array it returns a single array of shape (n, reps) with the desired values: A single Python call will cause very fast C loops to be executed.
  • Everything else can be done in this way, operating on arrays rather than on scalars.
  • My code takes about 4 seconds on an iMac. Yours may take a bit more or a bit less, but if your code takes minutes (or longer) rather than seconds then you are doing something wrong.

Problem 2.3¶

In this problem and the next two we reconcile the empirical observations made in the previous problem with theory.

Let $\sigma_a(n)$ the true (as opposed to empirical) standard deviation of the area $a$ if it is computed with a certain number $n$ of samples. The values you found in the previous problem are empirical estimates of $\sigma_a(n)$, so we call those $\hat{\sigma}_a(n)$.

Given a certain value of $n$, let $k$ be the number of points selected independently, uniformly, and at random in the rectangle that fall within $A$. Thus, $k$ is a random variable that can take on values between 0 and $n$ inclusive. What is the theoretical probability distribution $p_n(k)$?

Specifically, name the distribution and give a formula for its values $p_n(k)$ for $k=0,\ldots, n$. If this distribution has one or more parameters, what are the values of the parameter(s) in terms of $a$ and $r$?

Problem 2.4¶

What is the (theoretical) mean $m_k(n)$ of the random variable $k$? What is the (theoretical) standard deviation $\sigma_k(n)$ of $k$? Feel free to look up these formulas.

Given $m_k(n)$ and $\sigma_k(n)$, what are the (theoretical) mean $m_a(n)$ and standard deviation $\sigma_a(n)$ of the random variable $a$ (the area of region $A$)?

Write your answers as expression involving $a, r, n, k$. Note that we are asking for standard deviations, not variances.

Finally, is the procedure for estimating the area $a$ described earlier biased? How can you tell?

Problem 2.5¶

For the values of $n$ in the vector ns given earlier, write code that creates a single diagram with two plots, one of the theoretical standard deviation values $\sigma_a(n)$ and the other of the empirical standard deviation values $\hat{\sigma}_a(n)$ you found earlier. Rather than using matplotlib.pyplot.plot, use the function matplotlib.pyplot.semilogy to plot $y$ values on a logarithmic scale. This is because the values span a wide range, and they become more easily visible on this type of plot.

For your theoretical plots you need the true value of $a$. Take that to be the last mean value you found in Problem 2.2. This is the mean of 30 estimates computed with more than 300,000 samples each, so it is probably very close to the true value.

Show your code and the diagram it generates. Label the axes meaningfully and add a legend to the diagram that shows which plot is for which quantity. Also print the approximate value of $a$ you use in your calculations, using three decimal digits after the period.

In [ ]: