Homework 5: Classification

Which Digit?
Which are Faces?

Many thanks to Dan Klein and John DeNero for providing this framework.

Classification Warm Up

Question 1 (10 points) For this question, you will prove that a network of perceptrons can compute any logical function. All that is necessary to prove this is to show that you can implement AND, OR, and NOT in perceptrons since any logical function can be constructed by a combination of these. (It's actually even easier than this, but we'll stick with AND, OR and NOT for this question.) To answer this question, you should provide 3 sets of weights, one for a perceptron that recognizes AND, one for a perceptron that recognizes OR, and one for a perceptron that recognizes NOT.

Question 2 (10 points) Consider a learning problem where the inputs are X and Y, and the target concept is a ball of radius 1 in the X-Y plane, i.e., everything inside the ball is "True" and everything outside the ball is "False". a) Explain why the true and false classes are not linearly separable if the inputs are just X and Y. b) Suppose the inputs are now X, Y, their squares, and their product. Explain whether the classes are linearly separable now. c) Suppose you were forced to use a classifier with just one feature. Provide a single feature that would make the classes linearly separable.

Programming Basic Classifiers

In this homework, you will design a perceptron classifier. You will test your classifier on two image data sets: a set of scanned handwritten digit images and a set of face images in which edges have already been detected. Even with simple features, your classifiers will be able to do quite well on these tasks when given enough training data.

Optical character recognition (OCR) is the task of extracting text from image sources. The first data set on which you will run your classifiers is a collection of handwritten numerical digits (0-9). This is a very commercially useful technology, similar to the technique used by the US post office to route mail by zip codes. There are systems that can perform with over 99% classification accuracy (see LeNet-5 for an example system in action).

Face detection is the task of localizing faces within video or still images. The faces can be at any location and vary in size. There are many applications for face detection, including human computer interaction and surveillance. You will attempt a simplified face detection task in which your system is presented with an image that has been pre-processed by an edge detection algorithm. The task is to determine whether the edge image is a face or not. There are several systems in use that perform quite well at the face detection task. Your digital camera or the camera on your phone may do some basic face detction.

The code for this homework includes the following files and data, available as a zip file

Data file
data.zip Data file, including the digit and face data.
Files you will edit
perceptron.py The location where you will write your perceptron classifier.
svmClassifier.py The location where you will write your SVM classifier as a bonus question.
Files you should read but NOT edit
classificationMethod.py Abstract super class for the classifiers you will write.
(You should read this file carefully to see how the infrastructure is set up.)
samples.py I/O code to read in the classification data.
util.py Code defining some useful tools. You may be familiar with some of these by now, and they will save you a lot of time.
mostFrequent.py A simple baseline classifier that just labels every instance as the most frequent class.

What to submit: You should submit perceptron.py and and svmClassifier.py (if you do the bonus question).

Evaluation: Your code will be autograded for technical correctness. Please do not change the names of any provided functions or classes within the code, or you will wreak havoc on the autograder.

Getting Started

To try out the classification pipeline, run dataClassifier.py from the command line. This will classify the digit data using the default classifier (mostFrequent) which blindly classifies every example with the most frequent label.

python dataClassifier.py  

As usual, you can learn more about the possible command line options by running:

python dataClassifier.py -h  

We have defined some simple features for you. Our simple feature set includes one feature for each pixel location, which can take values 0 or 1 (off or on). The features are encoded as a Counter where keys are feature locations (represented as (column,row)) and values are 0 or 1. The face recognition data set has value 1 only for those pixels identified by a Canny edge detector.

Implementation Note: You'll find it easiest to hard-code the binary feature assumption. If you do, make sure you don't include any non-binary features. Or, you can write your code more generally, to handle arbitrary feature values, though this will probably involve a preliminary pass through the training set to find all possible feature values (and you'll need an "unknown" option in case you encounter a value in the test data you never saw during training).


A skeleton implementation of a perceptron classifier is provided for you in perceptron.py. You will fill in the train function, and the findHighOddsFeatures function.

A perceptron does not use probabilities to make its decisions. Instead, it keeps a prototype weight vector $w^y$ of each class $y$. Given a feature list $f$, the perceptron computes the class $y$ whose prototype is most similar to the input vector $f$. Formally, given a feature vector $f$ (a map from properties to counts, pixels to intensities), we score each class with:

\mbox{score}(f,y) = \sum_i f_i w^y_i
Then we choose the class with highest score as the predicted label for that data instance. In the code, we will represent $w^y$ as a Counter, which maps features (pixels) to their count in digit $y$'s prototype.

Learning weights

What we need is a method of learning the prototype weights. In the basic multi-class perceptron, we scan over the data, one instance at a time. When we come to an instance $(f, y)$, we find the label with highest score:
y' = \textmd{arg max}_{y''} score(f,y'')

We compare $y'$ to the true label $y$. If $y' = y$, we've gotten the instance correct, and we do nothing. Otherwise, we guessed $y'$ but we should have guessed $y$. That means that the prototype $w^y$ needs to be more like $f$ and the prototype $w^{y'}$ needs to be less like $f$ to help prevent this error in the future. We update these two prototypes accordingly:

w^y += f
w^{y'} -= f

Using the addition, subtraction, and multiplication functionality of the Counter class in util.py, the perceptron updates should be relatively easy to code. Certain implementation issues have been taken care of for you in perceptron.py, such as handling iterations over the training data and ordering the update trials. Furthermore, the code sets up the weights data structure for you. Each legal label needs its own prototype Counter full of weights.

Question 3 (10 points)

Bonus: SVM

In this Bonus part, you are encouraged to train an SVM (support vector machine) classifier and compare its performance to your perceptron classifier. Note that there is no penalty for skipping the bonus question; we'll figure out grades ignoring the bonus question and the bonus will only be checked later to see if it can give those who earn it an extra oomph to push them to the next grade level. Since this assignment only has 3 questions and the bonus is worth 15 points, this is an opportunity to get 1.5X credit for the last assignment.

Don't worry about implementing an SVM by hand. Instead, we'll use the scikit-learn module that provides an easy-to-use SVM. All you need is to get scikit-learn running for our case.

You'll need to write your code in svmClassifier.py. You will complete the train and classify function.

You do not have to know how exactly SVMs work to complete this question, but you are encouraged to read the SVM page on Wikipedia.

Installing scikit-learn

The first step is to install scikit-learn on your machine and make sure you can call import sklearn without error in python. Third-party softwares/packages (when explicitly permitted by the instructor!) can save you time over implementing everything from scratch. The first step is usually installation, which can be quite challenging by itself sometimes. Luckily, scikit-learn is not so hard and you can follow the steps in its installation instruction. We recommend installing it with pip or easy_install, which are distribution tools for conveniently installing python modules.


Training SVM

Question 4 (15 points) Fill in the train and classify method in svmClassifier.py. Compare the peformance of your SVM classifier to your perceptron classifier with 100 training samples. Run your code with:

python dataClassifier.py -c svmClassifier 

Hints and observations: