1.4 Topics covered (and not covered) in the course

The main topics that this course will cover are:

  • data preparation: how to clean up and massage a data set into an appropriate shape for plotting and analysis

  • data visualization: how to select relevant aspects of data for informative visualization

  • statistical models: what a statistical model is, and why it’s beneficial to think in terms of models, not tests

  • statistical inference: what it is, and how it’s done in frequentist and Bayesian approaches

  • hypothesis testing: how to test assumptions about a model’s parameters

  • generalized regression: how to apply generalized regression models to different types of data sets

There is, obviously, a lot that we will not cover in this course. For instance, we will not dwell at any length on the specifics of algorithms for computing statistical inferences or model fits. We will deal with the history of statistics and with questions from the philosophy of science only at the very end of the course, and only to the extent that they help us better understand the theoretical notions and practical habits that are important in the context of this course. We will also not do extremely heavy math.

There are at least two different motivations for data analysis, and it is important to keep them apart. This course focuses on data analysis for explanation, i.e., routines that help us understand reality through inspection of empirical data. We will only glance at the alternative approach, which is data analysis for prediction, i.e., using models to predict future observations, as commonly practiced in machine learning and its applications. In sloppy slogan form, this course treats data science for scientific knowledge gain, not data science as an engineering application.