In this assignment, you will implement a gradient descent procedure for predictors based on three losses: the quadratic loss, the logistic loss, and the hinge loss.
You will use a quadratic loss for a regression problem, and a logistic and hinge loss for a classification problem.
Answer the questions below in a plain-text file “answers.txt”, a Markdown file “answers.md”, or a PDF “answers.pdf”. I will not accept Microsoft Word, OS X Pages, or OpenOffice documents. (I prefer Markdown, so I can read it directly from your repository on GitHub.)
In addition, submit whatever code you use to answer the questions below.
Implement a gradient descent procedure to minimize the quadratic loss with an $l_2$ regularizer. Stop your gradient descent procedure when the squared norm of the gradient goes below $10^{-4}$.
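As a rough starting point, the loop and stopping rule described above could be sketched as follows. The function names, the $\frac{1}{2n}$ and $\frac{\lambda}{2}$ scaling conventions, the default learning rate, and the synthetic data are all assumptions made for illustration; they are not prescribed by the assignment.

```python
import numpy as np

def quadratic_loss_and_grad(w, X, y, lam):
    """Return (1/2n)||Xw - y||^2 + (lam/2)||w||^2 and its gradient.

    The 1/2n and lam/2 scalings are one common convention (an assumption here).
    """
    n = X.shape[0]
    residual = X @ w - y
    loss = 0.5 * np.mean(residual ** 2) + 0.5 * lam * (w @ w)
    grad = X.T @ residual / n + lam * w
    return loss, grad

def gradient_descent(loss_and_grad, w0, lr=0.1, tol=1e-4, max_iters=100_000):
    """Plain gradient descent; stops when the squared gradient norm drops below tol."""
    w = w0.copy()
    losses = []
    for _ in range(max_iters):
        loss, grad = loss_and_grad(w)
        losses.append(loss)
        if grad @ grad < tol:  # squared norm of the gradient
            break
        w = w - lr * grad
    return w, losses

# Synthetic data standing in for the real dataset (hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=200)

w, losses = gradient_descent(
    lambda w: quadratic_loss_and_grad(w, X, y, lam=0.01), np.zeros(3)
)
```

Passing the loss as a closure keeps `gradient_descent` independent of the particular loss, so the same loop can be reused for other losses.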
Run your procedure on the dataset regression-dataset.pickle
under the following regularization values: $\lambda \in \{0, 0.0001,
0.01\}$. You might have to experiment with the learning rate for
the algorithm to converge. What are the validation and training
losses you obtain?
Design a procedure for early stopping, based on monitoring the validation error. What validation errors do you obtain for the regularization values above? Discuss.
Note that it might be easier to store a list of the losses and plot the loss curve over time in order to see what’s happening.
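One possible shape for such a procedure is a "patience" rule: record the validation loss at every step, remember the best weights seen so far, and stop once the validation loss has failed to improve for a fixed number of steps. This is only a sketch of one design among many; `train_with_early_stopping`, `patience`, and the toy objectives below are hypothetical names and choices.

```python
import numpy as np

def train_with_early_stopping(grad_fn, val_loss_fn, w0, lr=0.1,
                              patience=20, max_iters=10_000):
    """Gradient descent that returns the weights with the lowest validation loss.

    Stops after `patience` consecutive steps without a new best validation loss.
    """
    w = w0.copy()
    best_w, best_val, since_best = w.copy(), np.inf, 0
    val_losses = []  # kept so the curve can be plotted afterwards
    for _ in range(max_iters):
        val = val_loss_fn(w)
        val_losses.append(val)
        if val < best_val:
            best_w, best_val, since_best = w.copy(), val, 0
        else:
            since_best += 1
            if since_best >= patience:
                break
        w = w - lr * grad_fn(w)
    return best_w, best_val, val_losses

# Toy example: the training objective is minimized at w = 1,
# but the validation loss is minimized at w = 0.5.
grad_fn = lambda w: w - 1.0
val_loss_fn = lambda w: float(np.sum((w - 0.5) ** 2))

best_w, best_val, val_losses = train_with_early_stopping(
    grad_fn, val_loss_fn, np.zeros(1)
)
```

In the toy example the iterates pass through the validation optimum on their way to the training optimum, so the returned weights sit near 0.5 rather than 1.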
Implement a gradient descent optimization procedure to find a classifier that minimizes the logistic loss function with an $l_2$ regularizer. Stop your gradient descent procedure when the squared norm of the gradient goes below $10^{-4}$.
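A minimal sketch of the regularized logistic loss and its gradient follows, assuming labels are encoded as $-1$ and $+1$ and that the loss is averaged over examples (both are assumptions, as are the function names). It uses `np.logaddexp` to avoid overflow for large margins, and includes a finite-difference check, which is a handy way to catch sign or scaling mistakes in a hand-derived gradient.

```python
import numpy as np

def logistic_loss_and_grad(w, X, y, lam):
    """Mean of log(1 + exp(-y * Xw)) plus (lam/2)||w||^2, with its gradient.

    Assumes y contains only -1 and +1.
    """
    n = X.shape[0]
    margins = y * (X @ w)
    loss = np.mean(np.logaddexp(0.0, -margins)) + 0.5 * lam * (w @ w)
    # sigma = 1 / (1 + exp(margins)), computed without overflow
    sigma = np.exp(-np.logaddexp(0.0, margins))
    grad = -(X.T @ (y * sigma)) / n + lam * w
    return loss, grad

def numerical_grad(f, w, eps=1e-6):
    """Central finite-difference approximation of the gradient of f at w."""
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

# Random data for the check (hypothetical, not the assignment dataset).
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
y = rng.choice([-1.0, 1.0], size=50)
w = rng.normal(size=4)

_, analytic = logistic_loss_and_grad(w, X, y, lam=0.01)
numeric = numerical_grad(lambda v: logistic_loss_and_grad(v, X, y, 0.01)[0], w)
```

If the two gradients disagree beyond rounding error, the analytic gradient is the likelier culprit.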
Run the classifier on the agaricus-lepiota dataset under the
same regularization values as above. You might have to experiment
with the learning rate for the algorithm to converge. What are the
validation and training losses you obtain?
Does early stopping help in this case?
Implement a gradient descent optimization procedure to find a classifier that minimizes the hinge loss function with an $l_2$ regularizer. Stop your gradient descent procedure when the squared norm of the gradient goes below $10^{-4}$.
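The hinge loss $\max(0, 1 - y\, w^\top x)$ is not differentiable where the margin equals exactly 1, so in practice a subgradient takes the place of the gradient. The sketch below shows one valid choice, assuming labels in $\{-1, +1\}$ and a loss averaged over examples; the function name and scaling are illustrative assumptions. With a fixed step size the subgradient norm may not shrink below the threshold, so a cap on the number of iterations is worth keeping.

```python
import numpy as np

def hinge_loss_and_subgrad(w, X, y, lam):
    """Mean of max(0, 1 - y * Xw) plus (lam/2)||w||^2, with one subgradient.

    Assumes y contains only -1 and +1. Points with margin exactly 1
    contribute zero, which is a valid subgradient choice.
    """
    n = X.shape[0]
    margins = y * (X @ w)
    active = margins < 1.0  # points that incur a positive hinge loss
    loss = np.mean(np.maximum(0.0, 1.0 - margins)) + 0.5 * lam * (w @ w)
    subgrad = -(X[active].T @ y[active]) / n + lam * w
    return loss, subgrad

# Tiny hand-checkable example (hypothetical data).
X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0, -1.0])
w = np.array([2.0, 0.0])

loss, subgrad = hinge_loss_and_subgrad(w, X, y, lam=0.0)
# Margins are [2, 0, 2]: only the second point is active, so the loss is 1/3.
```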
Run the classifier on the agaricus-lepiota dataset. Experiment
with different regularization values. Which values produce more or
fewer training points inside the margin? Which values produce
better training or validation error?
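To count the points inside the margin, one option is to count the training examples whose functional margin $y\, w^\top x$ is below 1. The helper below is a small sketch under the assumption that labels are encoded as $-1$ and $+1$; the name `count_inside_margin` and the example data are hypothetical.

```python
import numpy as np

def count_inside_margin(w, X, y):
    """Number of examples with y * (w . x) < 1 (inside the margin or misclassified)."""
    return int(np.sum(y * (X @ w) < 1.0))

# Tiny example: the same data scored with two different weight vectors.
X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0, -1.0])

n_large = count_inside_margin(np.array([2.0, 0.0]), X, y)  # margins [2, 0, 2]
n_small = count_inside_margin(np.array([0.5, 0.0]), X, y)  # margins [0.5, 0, 0.5]
```

Calling this on the learned weights for each regularization value gives the counts to compare.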
Does early stopping help in this case?