Part of the data workflow is preparing the data for analysis. Some of this involves data cleaning, where errors in the data are identified and corrected or formatting is made consistent. This step must be taken with the same care and attention to reproducibility as the analysis itself.
OpenRefine (formerly Google Refine) is a powerful, free and open-source tool for working with messy data: cleaning it and transforming it from one format into another.
By the end of this lesson, you will be able to:
- create, export and import a project in OpenRefine
- view and work on subsets of rows using facets and text filters
- reduce variations in data through clustering, bulk editing and transformations
- undo and redo actions and export the history of actions
- save cleaned data in a widely supported file format
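To give a feel for what "reducing variations through clustering" means, here is a minimal Python sketch of a key-collision approach: each value is reduced to a normalized key, and values that share a key are grouped as likely variants of the same entry. This is a simplified illustration, not OpenRefine's actual implementation (which applies further normalization, such as folding accented characters). The function names `fingerprint` and `cluster` and the sample values are hypothetical.

```python
import string
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a simplified normalized key for a text value (illustrative only)."""
    cleaned = value.strip().lower()
    # Drop ASCII punctuation so "Tanzania." and "Tanzania" produce the same key.
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    # Sort the unique tokens so word order and repeated words do not matter.
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values that share a key; keep only groups with more than one variant."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical sample values, not taken from the lesson dataset.
names = ["Mozambique", "mozambique ", "Tanzania", "Tanzania.", "Kenya"]
clusters = cluster(names)
print(clusters)
```

Each group returned is a set of spellings that probably refer to the same thing. In OpenRefine you review such groups interactively and choose which spelling to keep, rather than writing code.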
This lesson will teach you to use OpenRefine to clean and format data effectively and to automatically track any changes that you make. Many users report that the tool saves them months of work they would otherwise spend making these edits by hand.
Note that this lesson does not cover all of OpenRefine's functionality, nor does it correct every error in the provided dataset.
Data Carpentry's teaching is hands-on, so participants are encouraged to use their own computers to ensure the proper setup of tools for an efficient workflow.
These lessons assume no prior knowledge of the skills or tools.
To get the most out of these materials, please install everything before working through this lesson.