Bias and Fairness

We’ve so far studied a variety of techniques and mechanisms for building programs that classify unseen data in a way that closely matches a “training set.”

This works well when we assume that the classification we want to carry out should in fact directly replicate the distribution seen in the training set. At first glance, it would seem obvious that this has to be the goal. But are you sure that replicating the data you collected is actually what you want your ML program to do?
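To make this concrete, here is a toy sketch (all numbers and the hiring scenario are invented for illustration) of why faithfully matching the training distribution can be exactly the wrong goal: if past human decisions were biased, a model that reproduces those labels perfectly reproduces the bias perfectly.

```python
# Hypothetical historical hiring data: each row is (group, hired?).
# Past decisions hired group A at 60% and group B at 20% -- the labels
# themselves encode the historical disparity.
rows = ([("A", 1)] * 60 + [("A", 0)] * 40 +
        [("B", 1)] * 20 + [("B", 0)] * 80)

def selection_rate(rows, group):
    """Fraction of positive labels for the given group."""
    labels = [y for g, y in rows if g == group]
    return sum(labels) / len(labels)

# A classifier that "matches the training set" perfectly will select
# the two groups at these very different rates:
print(selection_rate(rows, "A"))  # 0.6
print(selection_rate(rows, "B"))  # 0.2
```

Nothing in the training procedure flags this as a problem: by the usual accuracy criterion, the model that reproduces the 0.6/0.2 gap is the *best* model.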

There are (at least) two central concerns about fairness in ML.

Although a full discussion of these issues is far beyond the scope of this class, we will engage with a number of proposals in the literature. It’s also worth noting that the literature on this topic is quite recent, which means we’ll be reading research papers in addition to textbook chapters.

Reading Material