In this assignment, you will implement a minimal library for reverse-mode automatic differentiation and use it to implement and train a neural network to recognize digits in (a subset of) the MNIST-digits dataset.
Answer the questions below in a plain-text file (“answers.txt”), a Markdown file (“answers.md”), or a PDF (“answers.pdf”). I will not accept Microsoft Word, OS X Pages, or OpenOffice documents. (I prefer Markdown, since I can read it directly in your GitHub repository.)
In addition, submit whatever code you use to answer the questions below.
Finish the implementation of reverse-mode autodiff in `autodiff.py`, and write the helper functions `relu()` and `softmax()`.
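As a mental model for what `autodiff.py` has to do, here is a minimal sketch of scalar reverse-mode autodiff. It is not the assignment's API: the class `Var`, its `backward()` method, and the helpers `exp`, `log`, `reciprocal`, `relu`, and `softmax` below are hypothetical, and the real library may work on vectors and use different names. Each node stores its parents together with the local derivative, and `backward()` visits the graph in reverse topological order, applying the chain rule.

```python
import math


class Var:
    """Hypothetical scalar graph node (not the autodiff.py API)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        # parents: tuple of (parent_node, local derivative d(self)/d(parent))
        self.parents = parents

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __rmul__ = __mul__

    def backward(self):
        # Iterative post-order DFS: every node lands in `order` after its parents.
        order, seen, stack = [], set(), [(self, False)]
        while stack:
            node, finished = stack.pop()
            if finished:
                order.append(node)
            elif id(node) not in seen:
                seen.add(id(node))
                stack.append((node, True))
                for parent, _ in node.parents:
                    stack.append((parent, False))
        # Walk from the output back to the inputs, accumulating gradients.
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad


def exp(x):
    e = math.exp(x.value)
    return Var(e, ((x, e),))


def log(x):
    return Var(math.log(x.value), ((x, 1.0 / x.value),))


def reciprocal(x):
    return Var(1.0 / x.value, ((x, -1.0 / x.value ** 2),))


def relu(x):
    # Derivative is 1 for x > 0 and 0 otherwise (0 is chosen at the kink x == 0).
    return Var(max(x.value, 0.0), ((x, 1.0 if x.value > 0 else 0.0),))


def softmax(xs):
    # Subtracting the max (a constant) avoids overflow and leaves the result unchanged.
    m = max(x.value for x in xs)
    exps = [exp(x + (-m)) for x in xs]
    total = exps[0]
    for e in exps[1:]:
        total = total + e
    inv = reciprocal(total)
    return [e * inv for e in exps]
```

Note that gradients accumulate across calls to `backward()`, so in a training loop they would need to be reset to zero before each example.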
Implement a fully-connected multi-layer neural network (with ReLU nonlinearities) to classify the MNIST-digits dataset. Use the multiclass cross-entropy loss to train your neural network, and train it using plain stochastic gradient descent with a mini-batch size of 1. Experiment with at least three different neural network architectures, and at least two different numbers of layers. You will need to experiment with the learning rate to find a good value.
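To make the loss and the update rule concrete, here is a minimal sketch of the multiclass cross-entropy loss and SGD with a mini-batch size of 1, in plain Python. To keep it short it swaps in a single linear layer with the hand-derived gradient dL/dz_k = p_k - y_k (where p is the softmax of the logits z and y is the one-hot label) instead of a multi-layer ReLU network trained through autodiff. In the assignment, hidden ReLU layers would sit between the input and the logits, and the autodiff library would supply the gradients. All function names here are hypothetical.

```python
import math
import random


def log_softmax(logits):
    # log p_k = z_k - logsumexp(z); subtracting the max keeps exp() from overflowing.
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def cross_entropy(logits, label):
    # Multiclass cross-entropy for one example: -log p_label.
    return -log_softmax(logits)[label]


def linear_logits(W, b, x):
    # z_k = sum_j W[k][j] * x[j] + b[k]
    return [sum(w * v for w, v in zip(row, x)) + b_k for row, b_k in zip(W, b)]


def predict(W, b, x):
    z = linear_logits(W, b, x)
    return max(range(len(z)), key=z.__getitem__)


def sgd_step(W, b, x, label, lr):
    """One SGD update on a single example; returns the loss before the update."""
    logits = linear_logits(W, b, x)
    probs = [math.exp(lp) for lp in log_softmax(logits)]
    for k in range(len(b)):
        delta = probs[k] - (1.0 if k == label else 0.0)  # dL/dz_k
        b[k] -= lr * delta
        for j in range(len(x)):
            W[k][j] -= lr * delta * x[j]
    return cross_entropy(logits, label)


def train(data, n_classes, n_features, lr=0.1, epochs=20, seed=0):
    rng = random.Random(seed)
    data = list(data)  # copy so shuffling does not reorder the caller's list
    W = [[0.0] * n_features for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        rng.shuffle(data)  # visit the examples in a fresh random order each epoch
        for x, label in data:
            sgd_step(W, b, x, label, lr)
    return W, b
```

The learning rate `lr` is the knob the assignment asks you to tune: too large and the per-example updates oscillate, too small and an epoch makes little progress.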
During training, monitor the misclassification rate on the validation dataset, and keep the model from the epoch with the lowest validation misclassification rate over a fixed number of epochs. (You can choose that number manually.)
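A minimal sketch of this model-selection loop, assuming two hypothetical hooks: `train_one_epoch()`, which runs one SGD pass and returns the current parameters, and `predict_with(params, x)`, which returns a class label. Neither is part of the provided code.

```python
import copy


def misclassification_rate(predict, dataset):
    """Fraction of (x, label) pairs in dataset that predict(x) gets wrong."""
    errors = sum(1 for x, label in dataset if predict(x) != label)
    return errors / len(dataset)


def select_best_epoch(train_one_epoch, predict_with, validation, epochs):
    """Train for a fixed number of epochs and keep the parameter snapshot
    with the lowest validation misclassification rate."""
    best_epoch, best_rate, best_params = None, float("inf"), None
    for epoch in range(1, epochs + 1):
        params = train_one_epoch()
        rate = misclassification_rate(lambda x: predict_with(params, x), validation)
        print(f"epoch {epoch}: validation misclassification rate {rate:.4f}")
        if rate < best_rate:
            # Deep-copy: later epochs keep mutating the live parameters in place.
            best_epoch, best_rate, best_params = epoch, rate, copy.deepcopy(params)
    return best_epoch, best_rate, best_params
```

The deep copy matters: without it, `best_params` would silently track whatever the final epoch produced.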
What is the performance (in terms of misclassification rate) that you obtain on training data, validation data, and testing data?
Do the architectures matter significantly in this case?
What are the easy classes and hard classes? What classes tend to get confused with one another?
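A confusion matrix is a standard way to answer these questions: its diagonal shows which classes are easy, and its largest off-diagonal entries show which pairs get confused. A minimal sketch, assuming labels and predictions are integer class indices (0-9 for digits); the function names are hypothetical.

```python
def confusion_matrix(labels, predictions, n_classes=10):
    # counts[t][p] = number of examples with true class t predicted as class p
    counts = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(labels, predictions):
        counts[t][p] += 1
    return counts


def per_class_error(counts):
    # Misclassification rate within each true class (0.0 for classes never seen).
    rates = []
    for k, row in enumerate(counts):
        total = sum(row)
        rates.append((total - row[k]) / total if total else 0.0)
    return rates


def most_confused(counts, top=5):
    # Off-diagonal cells sorted by count: returns (true, predicted, count) triples.
    n = len(counts)
    cells = [(counts[t][p], t, p) for t in range(n) for p in range(n)
             if t != p and counts[t][p] > 0]
    cells.sort(reverse=True)
    return [(t, p, c) for c, t, p in cells[:top]]
```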
Attempt to the best of your ability to make your network overfit the training data. Can you? What architecture and training procedure achieves that?
Attempt to the best of your ability to make your network significantly underfit the training data. Can you? What architecture and training procedure achieves that? What does that say about this dataset?
You can expect upwards of 95% accuracy on the training data on this dataset.
Make sure you understand the data. `test_mnist.py` requires `matplotlib` and `scipy` to be installed, but you can use it to inspect the training set one image at a time.
Plan for this to take a while to run! With `python3`, a 3-layer network with about 30 neurons per layer takes about three minutes per epoch; with `pypy3`, it takes about one minute. (To give you an idea of how inefficient this library is, in PyTorch this would take a couple of seconds at most.)
Use the `test_*` functions (and consider writing your own!) to develop the automatic differentiation classes in `autodiff.py` before moving on to the `NN` class in `nn.py`!
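One test worth writing yourself is a finite-difference gradient check: compare each gradient your autodiff code produces with the central difference (f(x + h·e_i) - f(x - h·e_i)) / 2h. A minimal sketch, assuming the function being checked takes a list of floats and returns a float; `numerical_gradient` and `check_gradient` are hypothetical helpers, not part of the provided test files. With ReLU, avoid evaluating exactly at 0, where the function has a kink and the two sides disagree.

```python
import math


def numerical_gradient(f, xs, h=1e-5):
    # Central differences: error is O(h^2) for smooth f.
    grads = []
    for i in range(len(xs)):
        plus, minus = list(xs), list(xs)
        plus[i] += h
        minus[i] -= h
        grads.append((f(plus) - f(minus)) / (2 * h))
    return grads


def check_gradient(f, grad_f, xs, tol=1e-6):
    """True if grad_f(xs) matches the numerical gradient to within tol."""
    numeric = numerical_gradient(f, xs)
    analytic = grad_f(xs)
    for n, a in zip(numeric, analytic):
        # Relative error, compared absolutely when both gradients are small.
        if abs(n - a) / max(1.0, abs(n), abs(a)) > tol:
            return False
    return True


# Example: f(x0, x1) = x0 * x1 + exp(x0), so df/dx0 = x1 + exp(x0) and df/dx1 = x0.
def f(xs):
    return xs[0] * xs[1] + math.exp(xs[0])


def grad_f(xs):
    return [xs[1] + math.exp(xs[0]), xs[0]]


print(check_gradient(f, grad_f, [0.5, -1.5]))  # expected: True
```

In practice `grad_f` would build the expression with your autodiff classes, call the backward pass, and read off the input gradients.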
The dataset for this assignment comes from LeCun, Cortes, and Burges.