Notes on data science

This is a collection of short notes on data science, slowly sprawling into “notes on math” as the need arises. They are here because I’ve found myself explaining some things to students over and over again. These notes are more informal than what you may be used to, and they should by no means replace actual textbooks on probability, statistics, data mining, and machine learning.

I have found that students often lack the intuition that ties these fields together. These notes try to explain some of the basic concepts with that intuition in mind, in a way that I hope will be useful.

Basics

Supervised Learning

Methods that learn from labeled examples (inputs paired with known outputs) in order to make predictions about new, unseen data.
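As a minimal sketch, not taken from any particular library, fitting a straight line by ordinary least squares is about the simplest supervised method. The training data below are hypothetical (x, y) pairs, and the fitted line is then used to predict y for an x it has never seen. The names fit_line and predict are illustrative only.

```python
def fit_line(xs, ys):
    """Fit y = a + b*x to labeled examples by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of the inputs, and how inputs and labels vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


def predict(model, x):
    """Use the fitted line to predict the label of a new input."""
    a, b = model
    return a + b * x


# Hypothetical training data: inputs paired with known labels (roughly y = 2x + 1).
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [1.1, 2.9, 5.2, 6.8, 9.0]

model = fit_line(train_x, train_y)
prediction = predict(model, 10.0)  # about 20.76 (intercept ~1.06, slope ~1.97)
```

The input x = 10.0 lies well outside the training data; the model still produces a prediction, which is exactly the point of supervised learning, and also why such extrapolations deserve some suspicion.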

Unsupervised Learning

Methods that, given a dataset without labels and with a complex representation, create simpler representations of it (for example, with fewer dimensions or as a small number of groups).
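A minimal sketch of one such method, assuming principal component analysis (PCA) as the example technique: each 2-D point is replaced by a single number, its position along the direction in which the data vary most. The points are hypothetical, and the helper names pca_1d and reconstruct are illustrative. The 2×2 eigenvector is computed in closed form so the code needs only the standard library.

```python
import math


def pca_1d(points):
    """Summarize 2-D points by one score each along their first principal component.

    Returns (mean, direction, scores).
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # Entries of the 2x2 (population) covariance matrix [[a, b], [b, c]].
    a = sum(dx * dx for dx, _ in centered) / n
    b = sum(dx * dy for dx, dy in centered) / n
    c = sum(dy * dy for _, dy in centered) / n
    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)
    # Matching eigenvector; if b == 0 the matrix is diagonal, so use an axis.
    if b != 0:
        vx, vy = b, lam - a
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    direction = (vx / norm, vy / norm)
    # One number per point: its coordinate along the principal direction.
    scores = [dx * direction[0] + dy * direction[1] for dx, dy in centered]
    return (mx, my), direction, scores


def reconstruct(mean, direction, score):
    """Map a 1-D score back to an approximate 2-D point."""
    return (mean[0] + score * direction[0], mean[1] + score * direction[1])


# Hypothetical data that happen to lie exactly on the line y = 2x.
points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
mean, direction, scores = pca_1d(points)
approx = [reconstruct(mean, direction, s) for s in scores]
```

Because these points lie exactly on a line, one score per point recovers them without loss; for noisy data the reconstruction is only approximate, and the discarded variation is the price of the simpler representation.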

Other

Data visualization