This is a collection of small notes on data science, slowly sprawling into “notes on math” as I see the need. They are here because I’ve found myself explaining some things to students (and to myself!) over and over again. These notes are more informal than you may be comfortable with, and should by no means replace actual textbooks on probability, statistics, data mining, and machine learning.
I have found that students often lack the intuition that ties these fields together. These notes try to explain some of the underlying concepts in a way that I hope helps build that intuition.
Methods which use data to make predictions about new, unseen data.
Methods which, given a dataset that has a complex representation, create simpler versions of the dataset.
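To make the two descriptions above a bit more concrete, here is a minimal sketch in Python with NumPy. Everything in it is my own choice for illustration: the synthetic data, the variable names, a least-squares line as the prediction method, and a projection onto the first principal direction (via SVD) as the simplification method. Many other methods fit either description equally well.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# --- Prediction: learn from seen data, then predict for unseen inputs ---
# Synthetic (assumed) data: y is roughly 2*x + 1, plus a little noise.
x_train = rng.uniform(0, 10, size=50)
y_train = 2.0 * x_train + 1.0 + rng.normal(0, 0.1, size=50)

# Fit a straight line by least squares; polyfit returns [slope, intercept].
slope, intercept = np.polyfit(x_train, y_train, deg=1)

# Inputs the model never saw during fitting.
x_new = np.array([11.0, 12.0])
y_pred = slope * x_new + intercept

# --- Simplification: replace a complex representation by a simpler one ---
# Synthetic (assumed) data: 100 points in 3D that lie close to a single line.
t = rng.normal(size=100)
X = np.outer(t, [1.0, 2.0, 3.0]) + rng.normal(0, 0.01, size=(100, 3))

# Center the data and find its main direction of variation with an SVD.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Keep one coordinate per point instead of three.
X_simple = X_centered @ Vt[:1].T

# Fraction of the total variance kept by that single coordinate.
variance_kept = S[0] ** 2 / np.sum(S ** 2)

print("predictions for unseen x:", y_pred)
print("simplified shape:", X_simple.shape)
print("variance kept:", variance_kept)
```

In the first half, the fitted line is only ever shown `x_train` and `y_train`, yet it is asked about `x_new`. In the second half, each point goes from three numbers to one while keeping almost all of the variation in the data, which is only possible because the synthetic points were built to lie near a line.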