In this assignment, you will implement a minimal library for reverse-mode automatic differentiation and use it to implement and train a neural network to recognize digits in (a subset of) the MNIST-digits dataset.
Answer the questions below in an “answers.txt” plain-text file, an “answers.md” Markdown file, or an “answers.pdf” PDF. I will not accept Microsoft Word, OS X Pages, or OpenOffice documents. (I prefer Markdown, so I can see it from your repository on GitHub directly.)
In addition, submit whatever code you use to answer the questions below.
Finish the implementation of reverse-mode autodiff in
autodiff.py, and write
the helper functions for
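As a rough illustration of what reverse-mode autodiff does, here is a minimal scalar sketch. The `Var` class and the `backward` function are hypothetical names chosen for this example; they are not the classes or signatures provided in autodiff.py, which may be organized quite differently.

```python
class Var:
    """Hypothetical scalar graph node; NOT the class layout of autodiff.py."""

    def __init__(self, value, parents=()):
        self.value = value
        # Each entry is (parent_node, d(self)/d(parent)).
        self.parents = parents
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def relu(self):
        slope = 1.0 if self.value > 0 else 0.0
        return Var(max(0.0, self.value), ((self, slope),))


def backward(output):
    """Accumulate d(output)/d(node) into node.grad for every ancestor."""
    order, seen = [], set()

    def visit(node):
        # Post-order DFS: a node is appended after all of its parents.
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.grad = 1.0
    # Reversed post-order processes each node before any of its parents,
    # so node.grad is complete by the time it is propagated further.
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad


x = Var(3.0)
y = Var(-2.0)
z = (x * y + x).relu() + x * x   # z = relu(x*y + x) + x^2
backward(z)
print(z.value, x.grad, y.grad)
```

Here the ReLU input is negative, so only the x^2 term contributes to the gradient. The key design point is that one backward sweep yields the derivative with respect to every input at once, which is what makes this mode suitable for training networks with many parameters.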
Implement a fully-connected multi-layer neural network (with ReLU
nonlinearities) to classify the
mnist-digits dataset. Use the
to train your neural network. Train the neural network using simple
stochastic gradient descent, with a mini-batch size of 1. Experiment
with at least three different neural network architectures, and at
least two different numbers of layers. You will need to experiment
with the learning rate to find a good value.
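The update rule for stochastic gradient descent with a mini-batch size of 1 can be sketched as follows. This is only an illustration on a toy linear model with a hand-derived gradient; `sgd` and `squared_error_grad` are hypothetical names, and in the assignment the gradients would come from your autodiff library rather than from a formula.

```python
import random


def sgd(params, grad_fn, data, learning_rate, epochs, seed=0):
    """Plain SGD: one parameter update per training example."""
    rng = random.Random(seed)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)              # visit examples in a new order each epoch
        for x, y in data:              # mini-batch size 1
            grads = grad_fn(params, x, y)
            params = [p - learning_rate * g for p, g in zip(params, grads)]
    return params


def squared_error_grad(params, x, y):
    """Gradient of (w*x + b - y)^2 with respect to [w, b] (toy stand-in)."""
    w, b = params
    err = (w * x + b) - y
    return [2 * err * x, 2 * err]


# Noise-free data generated from y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in [-1.0, -0.5, 0.0, 0.5, 1.0]]
w, b = sgd([0.0, 0.0], squared_error_grad, data, learning_rate=0.1, epochs=200)
print(w, b)
```

Rerunning with a much larger or much smaller `learning_rate` is a quick way to see why the value needs tuning: too large and the updates diverge, too small and progress per epoch is negligible.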
During training, monitor the misclassification rate on the validation dataset, and choose the best model over a certain number of epochs. (You can determine this manually.)
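One way to organize this bookkeeping is sketched below. The helpers `misclassification_rate` and `best_epoch` are hypothetical, and `predict` stands in for whatever prediction function your network exposes; the error values in the usage lines are made-up placeholders, not results from this dataset.

```python
def misclassification_rate(predict, examples):
    """Fraction of (input, label) pairs for which predict(input) != label."""
    wrong = sum(1 for x, label in examples if predict(x) != label)
    return wrong / len(examples)


def best_epoch(validation_errors):
    """Return (epoch_index, error) for the lowest error; earliest epoch wins ties."""
    best = min(range(len(validation_errors)), key=lambda i: validation_errors[i])
    return best, validation_errors[best]


# Toy classifier and toy examples, only to exercise the helpers.
toy_predict = lambda x: int(x > 0)
toy_examples = [(1, 1), (-1, 0), (2, 0), (-3, 1)]
toy_rate = misclassification_rate(toy_predict, toy_examples)

# Placeholder per-epoch validation errors (illustrative numbers only).
errors_by_epoch = [0.21, 0.12, 0.08, 0.09, 0.08]
epoch, error = best_epoch(errors_by_epoch)
print(toy_rate, epoch, error)
```

In practice you would also save a copy of the parameters at each epoch (or at each new best), so that the chosen model can be evaluated on the test data afterwards.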
What is the performance (in terms of misclassification rate) that you obtain on training data, validation data, and testing data?
Do the architectures matter significantly in this case?
What are the easy classes and hard classes? What classes tend to get confused with one another?
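A confusion matrix is a convenient way to answer these questions. The sketch below assumes labels are integers from 0 to 9; the function names are hypothetical and the label lists are toy values, not predictions from a trained network.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes=10):
    """matrix[t][p] counts examples of true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def per_class_error(matrix):
    """Misclassification rate for each true class (0.0 for unseen classes)."""
    rates = []
    for c, row in enumerate(matrix):
        total = sum(row)
        rates.append((total - row[c]) / total if total else 0.0)
    return rates


def most_confused_pairs(matrix, top=3):
    """Largest off-diagonal counts as (true_class, predicted_class, count)."""
    pairs = [(count, t, p)
             for t, row in enumerate(matrix)
             for p, count in enumerate(row)
             if t != p and count > 0]
    pairs.sort(reverse=True)
    return [(t, p, count) for count, t, p in pairs[:top]]


# Toy labels for illustration only.
true_labels = [4, 4, 9, 9, 9, 1, 1]
predicted = [4, 9, 9, 4, 4, 1, 1]
cm = confusion_matrix(true_labels, predicted)
rates = per_class_error(cm)
pairs = most_confused_pairs(cm)
print(rates[1], rates[4], rates[9], pairs)
```

Classes with low per-class error are the easy ones, and the largest off-diagonal entries show which digits get mistaken for which.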
Attempt to the best of your ability to make your network overfit the training data. Can you? What architecture and training procedure achieve that?
Attempt to the best of your ability to make your network significantly underfit the training data. Can you? What architecture and training procedure achieve that? What does that say about this dataset?
You can expect upwards of 95% accuracy on the training data on this dataset.
Make sure you understand the data.
scipy to be installed in your Python setup; if
you have those libraries, then you can use
inspect the training set one image at a time.
Plan for this to take a while to run! With python3, a 3-layer network
with about 30 neurons per layer takes about three minutes per epoch.
With pypy3, it takes about one minute. (To give you an idea of how
inefficient this library is, in PyTorch this would take a couple of
seconds at most.)
test_* functions (and consider writing your own!) to
develop the automatic differentiation classes in
before moving to the development of the
NN class in
The dataset for this assignment comes from LeCun, Cortes, and Burges.