August recap: Data Cleaning with R, Part 2

Data cleaning is the process of preparing your data for analysis, ensuring that it is technically correct and in the desired format. Data cleaning can often be more time-consuming than the actual analysis! This was our second meetup on the topic. A recap of our first data cleaning meetup, held in June, is also available.

Reshaping data

We began with an introduction to reshaping data from Alice, based on the DataCamp tutorial Long to Wide Data in R. Data can be in long format (one measurement per row) or wide format (many measurements in one row). It is important to be able to convert between the two because different functions expect different formats. In R, a variety of functions can be used for this task:

Function        Package     To long format                    To wide format
stack/unstack   utils       stack                             unstack
reshape         stats       reshape(direction = "long", ...)  reshape(direction = "wide", ...)
melt/dcast      reshape2    melt                              dcast
gather/spread   tidyr       gather                            spread

Functions for converting to long format and wide format (adapted from Table 34 from Long to Wide Data in R)
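
As a rough illustration of the first row of the table, here is a minimal sketch using only base R. The data frame and its values are made up for this example and do not come from the meetup materials.

```r
# Hypothetical wide data: one column per measurement (made-up values)
wide <- data.frame(
  height = c(150, 160, 170),
  weight = c(50, 60, 70)
)

# Wide to long: stack() returns two columns,
# `values` (the measurements) and `ind` (the original column name)
long <- stack(wide)

# Long to wide: unstack() splits `values` by `ind` again
wide_again <- unstack(long)
```

Note that stack() and unstack() do not keep track of an identifier column, so they are best suited to simple cases where every column is a measurement.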

Alice provided a few examples of how to use these functions. The script is available on GitHub.
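
The following is not Alice's script, just a small sketch of how reshape() from the stats package might be used when the data has an identifier column. The column names (id, score.1, score.2, visit) and the values are assumptions made for illustration.

```r
# Hypothetical wide data: one row per subject, one column per visit (made-up values)
wide <- data.frame(
  id = 1:3,
  score.1 = c(5, 6, 7),
  score.2 = c(8, 9, 10)
)

# Wide to long: one row per subject and visit
long <- reshape(
  wide,
  direction = "long",
  varying = c("score.1", "score.2"),  # columns holding the repeated measurement
  v.names = "score",                  # name of the measurement column in long format
  timevar = "visit",                  # new column recording which visit a row belongs to
  times = 1:2,
  idvar = "id"
)

# Long to wide: one column per visit again
wide_again <- reshape(
  long,
  direction = "wide",
  v.names = "score",
  timevar = "visit",
  idvar = "id"
)
```

The reshape2 and tidyr functions in the table cover the same conversions with a different interface. Which one is most convenient depends on the packages already in use in a project.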

The R-Ladies Data Cleaning Gauntlet!

Next up was a series of data cleaning challenges, which we tackled in small groups. The challenges, created by Alice, gave us a chance to put into practice approaches and techniques from both data cleaning meetups. We cleaned the Philly farmers’ markets data, which also featured in our June meetup.

The materials are available on GitHub.

R-Ladies working on the data cleaning challenges


We would like to thank WeWork for hosting us!

“WeWork is a community for creators. We transform buildings into beautiful, collaborative workspaces and provide the infrastructure, services, events and technology so our members can focus on doing what they love. WeWork currently has 111 locations in 29 cities across the world with over 70,000 members. Book a tour at now!”

This post was authored by Amy Goodwin Davies. For more information contact