In this assignment, you will implement the basic linear perceptron, use it to predict labels for the datasets we have been working on, and also perform a small amount of feature selection and engineering in order to create a good set of features for another dataset.
You will find useful helper code in the files perceptron.py and transform.py in the starter repo. Specifically, you will implement the class LinearPerceptron in perceptron.py, and you will create a subclass of FeatureTransform in transform.py (by finishing the skeleton code in perceptron-transform.py) to find a suitable transformation of one of the datasets below.
Answer the questions below in an “answers.txt” plain-text file, an “answers.md” Markdown file, or an “answers.pdf” PDF file. I will not accept Microsoft Word, OS X Pages, or OpenOffice documents. (I prefer Markdown, so that I can read it directly in your repository on GitHub.)
In addition, submit whatever code you use to answer the questions below.
After implementing the linear perceptron described in class, use it to create classifiers for the agaricus-lepiota and primary-tumor datasets. Remember that primary-tumor has multiple classes, and that the perceptron algorithm only works for binary classification. The helper code now includes code (in Dataset.convert_labels_to_numerical) to convert multiple labels to a binary label.
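As an illustration of what converting a multi-class label to a binary one means, here is a minimal standalone sketch. It does not use the starter repo: the function name to_binary_labels and the example class names are hypothetical, and the real Dataset.convert_labels_to_numerical may have a different signature and encoding.

```python
import numpy as np

def to_binary_labels(labels, positive_label):
    """Map the chosen label to +1 and every other label to -1.

    Hypothetical helper for illustration only; it is not part of the
    starter code.
    """
    labels = np.asarray(labels)
    return np.where(labels == positive_label, 1, -1)

# Made-up class names, one binary problem per possible label.
labels = ["lung", "colon", "lung", "kidney"]
for positive in sorted(set(labels)):
    print(positive, to_binary_labels(labels, positive))
```

Each choice of positive label gives a separate binary problem, so a dataset with k classes yields k perceptron classifiers to evaluate.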
Report the accuracy you get for each possible label, and describe how the perceptron hyperparameters (the number of passes) affect training and validation/test accuracy. How do these numbers compare to the ones you’ve seen before? Explain.
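One way to study the effect of the number of passes is to train at several settings and print training and validation accuracy side by side. The sketch below assumes the classic mistake-driven perceptron update and labels in {-1, +1}; the names train_perceptron and accuracy and the synthetic data are assumptions made for illustration, not the starter code's LinearPerceptron interface, and the version described in class may differ in its details.

```python
import numpy as np

def train_perceptron(X, y, passes):
    """Classic perceptron rule: update only on misclassified points.

    X is an (n, d) float array; y holds labels in {-1, +1}.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(passes):
        for x_i, y_i in zip(X, y):
            if y_i * (w @ x_i + b) <= 0:  # wrong side of, or on, the boundary
                w += y_i * x_i
                b += y_i
    return w, b

def accuracy(w, b, X, y):
    """Fraction of points whose predicted sign matches the label."""
    predictions = np.where(X @ w + b > 0, 1, -1)
    return float(np.mean(predictions == y))

# Synthetic, linearly separable data standing in for a real dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + 2 * X[:, 1] > 0.3, 1, -1)
X_train, y_train = X[:150], y[:150]
X_val, y_val = X[150:], y[150:]

# Sweep the number of passes and report both accuracies.
for passes in (1, 2, 5, 10, 50):
    w, b = train_perceptron(X_train, y_train, passes)
    print(passes, accuracy(w, b, X_train, y_train), accuracy(w, b, X_val, y_val))
```

Reporting both numbers for each setting makes it easy to see whether extra passes keep helping on held-out data or only on the training set.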
Run the perceptron classifier you wrote on mystery-dataset.pickle. What accuracy do you get? Do you get better accuracy with the decision tree or k-nearest-neighbor classifiers you’ve written before?
Improve the accuracy of the perceptron classifier by engineering better features for the perceptron to use. It should be possible for you to attain effectively 100% accuracy on this dataset. Implement the method transform_features of MysteryTransform in perceptron-transform.py. Note that you will have to come up with a good feature transformation yourself, possibly by inspecting the training data and thinking hard.
What transformation did you implement, and what accuracy do you get?
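To see why a feature transformation can help a linear classifier at all, consider a toy problem that has nothing to do with the mystery dataset: points near the origin are labeled +1 and points far from it are labeled -1. No line in the original two coordinates separates them, but adding the squared distance from the origin as a third feature does. The function add_squared_radius and the toy data are hypothetical; the FeatureTransform interface in the starter repo is not used here.

```python
import numpy as np

def add_squared_radius(X):
    """Append x1^2 + x2^2 as an extra column (illustrative transform only)."""
    radius_sq = np.sum(X ** 2, axis=1, keepdims=True)
    return np.hstack([X, radius_sq])

# Toy data: 8 points on a circle of radius 0.5 (+1) and 8 on radius 2 (-1).
angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
unit_circle = np.column_stack([np.cos(angles), np.sin(angles)])
X = np.vstack([0.5 * unit_circle, 2.0 * unit_circle])
y = np.array([1] * 8 + [-1] * 8)

X_new = add_squared_radius(X)

# In the transformed space a single threshold on the new feature is a
# linear decision rule that separates the two classes.
predictions = np.where(X_new[:, 2] < 1.0, 1, -1)
print(np.mean(predictions == y))
```

The right transformation for the mystery dataset will likely be different; the point is only that a perceptron can become accurate once the classes are linearly separable in the transformed feature space.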