Homework 9

Homework Submission Workflow

When you submit your work, follow the instructions on the submission workflow page scrupulously for full credit.

Important: Failure to do any of the following will result in lost points:

  • Submit one PDF file and one notebook per group

  • Enter all the group members in your PDF Gradescope submission, not just in your documents. Look at this Piazza Post if you don't know how to do this

  • Match each answer (not question!) with the appropriate page in Gradescope

  • Avoid large blank spaces in your PDF file

Part 1: Losses

Linear SVMs have some advantages over logistic regression classifiers in some scenarios. However, these advantages are difficult to demonstrate in an assignment, because they involve data spaces $X$ of high dimensionality $d$. While it is straightforward to generate synthetic data sets with large $d$, it is difficult to obtain stable estimates of performance, because of the curse of dimensionality: To test a classifier and obtain consistent (that is, low-variance) error rates would require a prohibitively large test set $S$. Because of this difficulty, we will not explore these situations.

Instead, we examine why, in more common scenarios, binary SVMs and binary logistic-regression classifiers perform quite similarly to each other. The main reason is that the loss functions of the two classifiers are not drastically different from each other.

Recall that for logistic regression classifiers the two labels are usually called $0$ and $1$, while for SVMs they are called $-1$ and $1$, and you need to pay attention to using the appropriate values in your formulas.

Problem 1.1

Use the formulas in the class notes to write functions with headers

    def regressionLoss(z, delta):
    def hingeLoss(y, delta):

that take a label $z\in Z = \{0, 1\}$ or $y\in Y = \{-1, 1\}$, respectively, and a value $\delta$ for the signed distance from the separating hyperplane, and that compute the logistic-regression loss and the hinge loss as functions of label and $\delta$. Keep in mind that the logistic-regression loss is the composition of the logistic function and the cross-entropy loss.

Also write a function with header

    def plotLosses(y):

that takes a label $y\in Y = \{-1, 1\}$ and plots the two loss functions for $-3\leq \delta \leq 3$. Show your code and the two plots that result from calling plotLosses, first with argument $1$ and then with argument $-1$.

Programming Notes

  • Use the formulas from the class notes, not "equivalent" expressions you may find on the web.

  • Use the base 2 logarithm for the regression loss. Correct plots have a loss of 1 on the decision boundary.

  • The function plotLosses needs to convert y to z before it calls regressionLoss.

  • The main point of this exercise is to get the formulas right. So write your code cleanly and with meaningful variable names, and double-check everything.

  • Plot the regression loss in blue and the hinge loss in red. Add a legend to your plots to show which curve is for which loss. Also add a title that shows the value of y.

  • Remember to add the magic

    %matplotlib inline
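Putting these notes together, here is a minimal sketch of what the two loss functions and the plotting function might look like. It assumes the standard hinge loss $\max(0, 1 - y\delta)$ and the base-2 cross-entropy applied to the logistic function; the exact formulas in the class notes take precedence, and the implementation details are only illustrative.

    import numpy as np
    import matplotlib.pyplot as plt

    def regressionLoss(z, delta):
        # Logistic function of the signed distance, followed by base-2 cross-entropy.
        # Assumes z is 0 or 1; check against the class-notes formula.
        p = 1.0 / (1.0 + np.exp(-delta))      # probability of the positive class
        return -(z * np.log2(p) + (1 - z) * np.log2(1 - p))

    def hingeLoss(y, delta):
        # Standard hinge loss; assumes y is -1 or 1.
        return np.maximum(0.0, 1.0 - y * delta)

    def plotLosses(y):
        # Convert the SVM label y in {-1, 1} to the logistic-regression label z in {0, 1}.
        z = (y + 1) // 2
        delta = np.linspace(-3, 3, 601)
        plt.figure()
        plt.plot(delta, regressionLoss(z, delta), 'b', label='logistic-regression loss')
        plt.plot(delta, hingeLoss(y, delta), 'r', label='hinge loss')
        plt.legend()
        plt.title('y = {}'.format(y))
        plt.xlabel('delta')
        plt.ylabel('loss')

With these definitions the regression loss equals 1 at $\delta = 0$, as required by the notes above.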

Problem 1.2

Suppose that there is a single data outlier that is misclassified by a very large (negative) margin. Referring to the plots in Problem 1.1, which of the two losses is more sensitive to that outlier, and why?

"More sensitive" here means that the decision boundary moves more as a result of adding the outlier.

Part 2: SVMs

The key advantage of SVMs over logistic-regression classifiers is the ability of the former to model nonlinear boundaries through the introduction of kernels. In some cases, on the other hand, transforming or augmenting the data with appropriate features puts the logistic-regression classifier back in the race, as we will see below.

The following code creates training and test data sets for a circle of positive samples surrounded by a ring of negative ones. The code also plots the test set $S$.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import sklearn.datasets as ds
from sklearn.model_selection import train_test_split
%matplotlib inline
ns = 300
data = {}
data['x'], data['y'] = ds.make_circles(n_samples=ns,
    noise=0.2, factor=0.3, random_state=1)
testFraction = 0.6
T, S = {}, {}
T['x'], S['x'], T['y'], S['y'] = train_test_split(data['x'], data['y'],
    test_size=testFraction, random_state=0)
p, n = S['x'][S['y']==1], S['x'][S['y']!=1]
plt.figure(figsize=(10,10))
plt.plot(p[:, 0], p[:, 1], '.b', fillstyle='none')
plt.plot(n[:, 0], n[:, 1], '.r', fillstyle='none')
plt.axis('equal')
plt.axis('off')

Programming Notes

  • The problems in this part ask you to evaluate and plot the performance of different classifiers. Because of this, it will save you a lot of time and mistakes if you write helper functions for repetitive tasks. One way to do this is to first write code in answer to the first problem, and then encapsulate that code into suitable functions for later reuse.

  • It may be helpful to use and possibly adapt the function evaluate from homework 8 to compute error rates.

  • Instructions on plot format are given in the first problem. Plots for the other problems in this part use the same format, except that, of course, no support vectors are plotted for the logistic-regression classifier.

Problem 2.1

The data in $T$ and $S$ is obviously not linearly separable, so a linear classifier should not be expected to do well. To verify this, train sklearn.svm.SVC with arguments kernel='linear', C=1 on $T$, show its zero-one training and test error rates (on $S$) as percentages with two decimal digits after the period, and plot the data in $T$ together with the decision regions. Warning: Most points will be support vectors for this first plot. This is OK.
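As a rough illustration (not a prescribed solution), training and error-rate computation might look like the sketch below, which reuses the dictionaries T and S created by the data-generation code above; the helper name zeroOneErrorRate is only illustrative.

    import numpy as np
    from sklearn.svm import SVC

    def zeroOneErrorRate(clf, d):
        # Fraction of misclassified samples, expressed as a percentage.
        return 100.0 * np.mean(clf.predict(d['x']) != d['y'])

    clf = SVC(kernel='linear', C=1)
    clf.fit(T['x'], T['y'])
    print('training error rate: {:.2f} percent'.format(zeroOneErrorRate(clf, T)))
    print('test error rate: {:.2f} percent'.format(zeroOneErrorRate(clf, S)))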

Plot Format

  • Your plot should be properly scaled, be legible, and have no axis tick marks or labels. Look at the code above for sizing the figure, scaling the axes equally, and turning axes off.

  • The figure background should show the decision regions: Positive region colored in blue and negative region colored in red. Use a transparency value alpha around 0.3 for good visibility. These colors can be drawn by predicting the label of every point on a fine grid: Find the minimum and maximum coordinates for each axis, add 0.5 to the maximum and subtract 0.5 from the minimum to create a bit of margin, use numpy.meshgrid to make a grid, and then color the grid, perhaps with matplotlib.pyplot.contourf. I separated the grid points by 0.02 units. A sketch of this procedure follows this list.

  • There are six categories of data points, listed below together with the marker and fill style to use for each. The last two categories are not used for logistic-regression classifiers. Use default marker sizes.

    • True positives (classified as positive and actually positive), '.b', 'none'
    • True negatives (classified as negative and actually negative), '.r', 'none'
    • False positives (classified as positive and actually negative), 'xr', 'none'
    • False negatives (classified as negative and actually positive), 'xb', 'none'
    • Support vectors for data points with positive label, 'sb', None
    • Support vectors for data points with negative label, 'sr', None

(Warning: 'none' and None are not the same.)

Pay attention to these definitions and the corresponding styles, and make sure you get it right.
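One possible way to shade the decision regions and overlay the point categories, following the format above, is sketched below. The helper name plotRegions and its details are illustrative assumptions, not requirements; in particular, the sketch assumes the positive label is 1 and the negative label is 0, as produced by make_circles above.

    import numpy as np
    import matplotlib.pyplot as plt
    from matplotlib.colors import ListedColormap

    def plotRegions(clf, d, spacing=0.02, margin=0.5, alpha=0.3):
        # Background: predict the label of every point on a fine grid and color
        # the grid with contourf (red for the negative region, blue for the positive).
        x, y = d['x'], d['y']
        xMin, xMax = x[:, 0].min() - margin, x[:, 0].max() + margin
        yMin, yMax = x[:, 1].min() - margin, x[:, 1].max() + margin
        gx, gy = np.meshgrid(np.arange(xMin, xMax, spacing),
                             np.arange(yMin, yMax, spacing))
        grid = np.stack((gx.ravel(), gy.ravel()), axis=1)
        labels = clf.predict(grid).reshape(gx.shape)
        plt.figure(figsize=(10, 10))
        plt.contourf(gx, gy, labels, cmap=ListedColormap(['r', 'b']), alpha=alpha)
        # Foreground: the point categories from the plot-format list.
        predicted = clf.predict(x)
        positive = (y == 1)
        correct = (predicted == y)
        categories = [(positive & correct, '.b'),     # true positives
                      (~positive & correct, '.r'),    # true negatives
                      (~positive & ~correct, 'xr'),   # false positives
                      (positive & ~correct, 'xb')]    # false negatives
        for select, style in categories:
            plt.plot(x[select, 0], x[select, 1], style, fillstyle='none')
        # support_ holds indices into the training set, so pass the training
        # data T when plotting support vectors (SVC only; LogisticRegression has none).
        if hasattr(clf, 'support_'):
            sv = clf.support_
            svPositive = positive[sv]
            plt.plot(x[sv][svPositive, 0], x[sv][svPositive, 1], 'sb')
            plt.plot(x[sv][~svPositive, 0], x[sv][~svPositive, 1], 'sr')
        plt.axis('equal')
        plt.axis('off')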

Problem 2.2

Do the same with kernel='rbf' (same value for C as before).

Problem 2.3

Each data point in $T$ and $S$ has two features, $x_1$ and $x_2$. Augment these by adding the following redundant features:

\begin{eqnarray*}
x_3 &=& x_1^2\\
x_4 &=& x_2^2\\
x_5 &=& x_1 x_2
\end{eqnarray*}

Then repeat the experiment above with SVC(kernel='linear', C=1) on the augmented data. Of course, plot just $x_1$ and $x_2$, not the other features, unless you have a 5D printer handy.
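A minimal sketch of the augmentation, assuming the dictionaries T and S from the code above; the helper name augment and the names Ta, Sa for the augmented sets are only illustrative.

    import numpy as np

    def augment(x):
        # Append the redundant features x3 = x1^2, x4 = x2^2, x5 = x1 * x2
        # as extra columns of the data matrix.
        x1, x2 = x[:, 0], x[:, 1]
        return np.column_stack((x1, x2, x1 ** 2, x2 ** 2, x1 * x2))

    Ta = {'x': augment(T['x']), 'y': T['y']}
    Sa = {'x': augment(S['x']), 'y': S['y']}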

Problem 2.4

Explain carefully why this type of data augmentation works well for this specific data set.

Problem 2.5

Try the experiment in Problem 2.3 with sklearn.linear_model.LogisticRegression. Use parameters C=1e5, solver='lbfgs', multi_class='multinomial', random_state=0.
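For reference, a minimal sketch of the classifier call on the augmented data, assuming the Ta dictionary from the sketch in Problem 2.3; error rates and plots then proceed as before.

    from sklearn.linear_model import LogisticRegression

    clf = LogisticRegression(C=1e5, solver='lbfgs', multi_class='multinomial',
                             random_state=0)
    clf.fit(Ta['x'], Ta['y'])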