4 Data Wrangling



The information relevant to our analysis goals is not always directly accessible. Sometimes we must first extract it, with some effort, from an inconvenient representation. Sometimes data must also be cleaned, ideally by criteria specified a priori, by removing data points deemed of insufficient quality for a particular goal. All of this, and more, is the domain of data wrangling: preprocessing, cleaning, reshaping, renaming, etc. Section 4.1 describes how to read data from and write data to files. Section 4.2 introduces the concept of tidy data. We then look at a few common tricks of data manipulation in Section 4.3. We will learn about grouping operations in Section 4.4. Finally, we look at a concrete application in Section 4.5.

The learning goals for this chapter are:

  • be able to read data from and write data to files
  • understand the notion of tidy data
  • be able to solve common problems of data preprocessing
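
To make these goals a little more concrete before the details, here is a minimal sketch of a wrangling pass: read data, clean it by a criterion fixed in advance, summarize by group, and write the result. It uses only the Python standard library purely for illustration and is not the tooling the following sections rely on. The data, the column names (`participant`, `condition`, `rt`), the threshold `MIN_RT_MS`, and all function names are hypothetical.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical raw data: reaction times (ms) per participant and condition.
RAW_CSV = """participant,condition,rt
p1,easy,420
p1,hard,655
p2,easy,390
p2,hard,12
p3,easy,450
p3,hard,701
"""

# Cleaning criterion fixed before looking at the data (an assumed threshold).
MIN_RT_MS = 100


def read_rows(text):
    """Read CSV text into a list of dicts, converting rt to a number."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["rt"] = float(row["rt"])
    return rows


def clean(rows, min_rt=MIN_RT_MS):
    """Drop observations that fail the pre-specified quality criterion."""
    return [row for row in rows if row["rt"] >= min_rt]


def mean_rt_by_condition(rows):
    """Group rows by condition and compute the mean rt per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["condition"]].append(row["rt"])
    return {condition: mean(values) for condition, values in groups.items()}


def write_summary(summary):
    """Write the grouped summary back out as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["condition", "mean_rt"])
    for condition, value in sorted(summary.items()):
        writer.writerow([condition, value])
    return out.getvalue()


rows = clean(read_rows(RAW_CSV))
summary = mean_rt_by_condition(rows)
print(write_summary(summary))
```

In this sketch, the implausibly fast 12 ms observation is removed because it falls below the threshold chosen beforehand, not because of how it affects the result. In-memory text stands in for real files so that the example is self-contained.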