[title: line-fit] Machine Learning: for whom, by whom, to whom?
Carlos Scheidegger, HDC Lab
Who we are
https://hdc.cs.arizona.edu
Computing is cheap
Storage is cheap
Software is expensive!
- We spend US$312 billion per year on debugging alone
Machine Learning is a way out
- Instead of writing code, we come up with examples of the expected behavior.
- Then, we write one (pretty weird) program once, and have the computer adapt it so that it reproduces the behavior in the data.
- Then we need data!
[slide-data: backgroundImage /talks/2020-02-08/jason-pacheco.jpg] [title: bg-black] The Promise of Machine Learning
- Milstein, Pacheco, et al.
Intracortical Brain-Computer
Interfaces, NeurIPS 2017
[title: line-fit] The Peril of Machine Learning, 2009
[title: line-fit] The Peril of Machine Learning, 2018
But how does it all work?
[title: line-fit] Yet another AI CS admissions app
- YAICS, for short
- AI will determine which PhD applicants to accept
- … using features associated with good and bad applications
- … using historical data
- It is data-driven, it will be objective!
[slide-data: backgroundImage /talks/2020-02-08/fifa.jpg] [title: bg-black line-fit] The features are obvious and exact, right?
In reality…
- GPA
- GRE scores
- Relevant Major?
- Good School?
- Research Experience?
- …
- Our goal: evaluate application quality.
YAICS
- create a rule that assigns a score to each candidate
- select the high-scoring candidates
- What rule?
Here’s the data
| Ajit | Blake | Cedric | Daniela |
GPA | 3.75 | 4.0 | 3.5 | 3.8 |
GRE-V | 120 | 105 | 120 | 95 |
GRE-Q | 110 | 117 | 100 | 130 |
GRE-A | 5 | 4 | 3 | 6 |
major | CS | CS | Math | ECE |
school | MIT | ASU | NAU | UA |
research? | yes | no | no | yes |
PhD in 6? | yes | no | yes | yes |
How do we evaluate our rule?
| Ajit | Blake | Cedric | Daniela |
GPA | 3.75 | 4.0 | 3.5 | 3.8 |
GRE-V | 120 | 105 | 120 | 95 |
GRE-Q | 110 | 117 | 100 | 130 |
GRE-A | 5 | 4 | 3 | 6 |
major | CS | CS | Math | ECE |
school | MIT | ASU | NAU | UA |
research? | yes | no | no | yes |
PhD in 6? | yes | no | yes | yes |
- Don't assess your rule on the data you used to compute it!
- Overfitting: you don't want your model to simply memorize the training data
- Split the data into training and testing sets
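The split above can be sketched in a few lines of plain Python. The applicant records here are made up for illustration (they loosely mirror the table's columns), and the 75/25 split ratio is an arbitrary choice:

```python
import random

# Hypothetical applicant records: (features, completed PhD in 6 years?)
# These values are illustrative only.
data = [
    ({"gpa": 3.75, "research": 1}, True),
    ({"gpa": 4.00, "research": 0}, False),
    ({"gpa": 3.50, "research": 0}, True),
    ({"gpa": 3.80, "research": 1}, True),
]

random.seed(0)      # make the split reproducible
random.shuffle(data)

split = int(0.75 * len(data))            # e.g. 75% train / 25% test
train, test = data[:split], data[split:]

# Fit the rule on `train` only; measure its accuracy on `test` only,
# so that a model that merely memorized the training set gets no credit.
```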
[slide-data: backgroundImage /talks/2020-02-08/yaicsv1.png] How about a very simple rule?
[slide-data: backgroundImage /talks/2020-02-08/yaicsv2.png] … Maybe more complicated?
| Ajit | Blake | Cedric | Daniela |
GPA | 3.75 | 4.0 | 3.5 | 3.8 |
GRE-V | 120 | 105 | 120 | 95 |
GRE-Q | 110 | 117 | 100 | 130 |
GRE-A | 5 | 4 | 3 | 6 |
major | CS | CS | Math | ECE |
school | MIT | ASU | NAU | UA |
research? | yes | no | no | yes |
PhD in 6? | yes | no | yes | yes |
$p_1 \textrm{GPA} + p_2 \textrm{GRE-V} + p_3 \textrm{GRE-Q} + p_4 \textrm{GRE-A} + $
$p_5 \textrm{major} + p_6\textrm{school} + p_7\textrm{research}$
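The linear rule above is just a weighted sum. A minimal sketch, where the numeric encodings for major and school, and all the weight values, are made-up assumptions for illustration:

```python
def score(applicant, p):
    """Weighted sum: p1*GPA + p2*GRE-V + ... + p7*research."""
    features = [
        applicant["gpa"],
        applicant["gre_v"],
        applicant["gre_q"],
        applicant["gre_a"],
        applicant["major"],     # e.g. CS = 1, other = 0 (an assumption)
        applicant["school"],    # some numeric encoding (an assumption)
        applicant["research"],  # 1 = yes, 0 = no
    ]
    return sum(w * x for w, x in zip(p, features))

# Ajit's row from the table, with illustrative categorical encodings
ajit = {"gpa": 3.75, "gre_v": 120, "gre_q": 110, "gre_a": 5,
        "major": 1, "school": 1, "research": 1}
p = [1.0, 0.01, 0.01, 0.1, 0.5, 0.5, 1.0]   # arbitrary illustrative weights
print(score(ajit, p))
```

Finding good values for the parameters $p_1, \dots, p_7$ is exactly what training does.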
[slide-data: backgroundImage /talks/2020-02-08/yaicsv3.png]
[slide-data: backgroundImage /talks/2020-02-08/yaicsv4.png] [title: line-fit] And this is a “deep” neural network
How do we find the right values?
- Often, it’s just gradient descent
- Force the (bad) model to make a prediction at a random data point
- Measure the error; take the gradient of the error with respect to the parameters
- Nudge the parameters in the negative gradient direction, and repeat
- Eventually, stop
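The steps above can be sketched with a toy one-parameter model. Everything here, the data, the learning rate, the step count, is an arbitrary choice for illustration:

```python
import random

# Stochastic gradient descent on squared error, fitting a single
# weight w in the toy model  prediction = w * x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # true relationship: y = 2x

w = 0.0       # start with a (bad) model
lr = 0.05     # learning rate: size of each nudge
random.seed(0)

for step in range(500):
    x, y = random.choice(data)     # pick a random data point
    pred = w * x                   # force the model to predict
    grad = 2 * (pred - y) * x      # d(error^2)/dw at this point
    w -= lr * grad                 # nudge in the negative gradient direction

print(round(w, 3))   # close to 2.0
```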
So now that you know how to do it… (often) don’t!
Where did you get that data?
- Who decided if the previous admission process was good?
- What if (and hear me out here) there were times when the admissions committee made a mistake?
- What if your recruiting efforts were imbalanced with respect to gender?
- What if you were giving out mortgages instead?
[slide-data: backgroundImage /talks/2020-02-08/redlining.jpg]
[slide-data: backgroundImage /talks/2020-02-08/bullshit-gaydar.png]
[slide-data: backgroundImage /talks/2020-02-08/criminality-paper.png]
[slide-data: backgroundImage /talks/2020-02-08/phrenology.png]
We don’t really understand this!
- Computer says:
- Left: "stop"
- Right: "45 mph"
- "Robust Physical-World Attacks on Deep Learning Visual Classification", CVPR 2018
[title: line-fit] So you’re going to use ML. Great!
- But if it’s about people, please ask yourself this:
- Who benefits from it? (For whom?)
- Who are the targets, and who suffers from mistakes? (To whom?)
[title: line-fit] So you’re going to use ML. Great!
- But if it’s about people, please ask yourself this:
- Is your data going to repeat our racist, sexist past?
- Did you ask them if they want it?
- Are you ready to stop if they say no?
- Do you truly know the domain? Why you? (By whom?)
ML is not an excuse to ignore ethics, history, and society!
Thank you!
- and thank you to
- my colleagues Stephen Kobourov and Jason Pacheco for their examples and materials
- McCallum and Blok for storage data
- Come to Stephen’s and my talk at Centennial Hall, Feb 25th!
- @scheidegger, https://cscheid.net