6 Data Visualization
Numerical summaries of complex data always incur information loss. Visualization is also lossy but, if done well, less so. Any serious data analysis should start with a phase in which the analyst becomes intimately familiar with the data at hand, and visualization is an integral part of that familiarization.
Section 6.1 demonstrates how summary statistics can be misleading and how a simple visualization can be much more revealing.
Section 6.2 offers some reflection on what makes a data visualization successful.
Section 6.3 introduces the basics of data visualization with the ggplot2 package, an integral part of the tidyverse.[24]
This first exposition is based on a scatter plot for the avocado price data.
Going beyond scatter plots, Section 6.4 looks at some common types of plots and how to realize them using the geom_ family of functions in ggplot2.
The learning goals for this chapter are:
- obtain a basic understanding of better/worse plotting
  - understand the idea of hypothesis-driven visualization
- develop a basic understanding of the ‘grammar of graphs’
- get familiar with frequent visualization strategies
  - bar plots, densities, violins, error bars, etc.
- be able to fine-tune graphs for better visualization
[24] It is possible to create ggplot-like graphs with similar syntax in Python. The Gadfly package for Julia uses very similar ideas (of incremental composition) but a different syntax.