Reproducible Research - Data in a jar

The aim of a good research protocol is that the research can be easily reproduced, by ourselves and by others, today or in three years.

A really good summary of what reproducible research is, and why it is so highly recommended, can be found in the online course book for Colorado State University’s R Programming for Research courses, written by Brooke Anderson, Rachel Severson, and Nicholas Good:

A data analysis is reproducible if all the information (data, files, etc.) required is available for someone else to re-do your entire analysis. This includes: (1) Data available, (2) All code for cleaning raw data, (3) All code and software (specific versions, packages) for analysis.
https://geanders.github.io/RProgrammingForResearch/reproducible-research-1.html
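
The "specific versions, packages" part of that list can be covered in R by saving the session information next to the results. The following is only a minimal sketch: the file name `session_info.txt` and the use of a temporary folder are assumptions made for illustration, and in a real project the file would live inside the project itself.

```r
# Record the R version and the loaded package versions used for an analysis.
# out_dir and the file name are hypothetical; in a real project, point
# out_dir at a folder inside the project instead of a temporary directory.
out_dir <- tempdir()
info_file <- file.path(out_dir, "session_info.txt")

# sessionInfo() describes the running R session; capture.output() turns
# its printed form into a character vector that can be written to disk.
session_lines <- capture.output(sessionInfo())
writeLines(session_lines, info_file)
```

Re-running this whenever the analysis is re-run keeps the recorded versions in step with the results.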

Basically, we want to be able to return to a specific piece of research some years later and figure out what we did, and to let others come in and reproduce it. We want to be able to review our code and analysis (internally or externally) at any step, and re-run steps of the analysis pipeline as needed. We want to be able to easily make improvements to code or data and re-run the whole analysis. We want to be able to share the whole project, or share useful parts that can help others with their work and extend our own.

Reproducible research in R

R is an excellent tool for this, both because it is free and because you can keep all the data and scripts a project needs in one project environment. You basically build a pipeline of scripts that are run step by step, from cleaning the data to the final analysis, including any tables and figures. The idea is that no step is left unscripted or kept outside the project environment. An external step, such as adjusting something in Excel by hand, is not easily reproducible and is therefore avoided.
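
To make the idea of a scripted pipeline concrete, here is a toy sketch in base R. Everything in it is an assumption made for illustration: the folder layout, the file names, the made-up data, and the function names `clean_data` and `summarise_data` are all hypothetical. A real project would split the steps into separate script files and use its own data.

```r
# A toy pipeline: every step is scripted and run in order.
# All names (project_dir, clean_data, summarise_data, ...) are hypothetical.

project_dir <- file.path(tempdir(), "toy_project")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project_dir, "output"), recursive = TRUE, showWarnings = FALSE)

# Stand-in for raw data that would normally already exist in the project.
raw_file <- file.path(project_dir, "data", "raw.csv")
raw <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, NA, 3, 5, 4)
)
write.csv(raw, raw_file, row.names = FALSE)

# Step 1: cleaning is done in code, so the raw file is never edited by hand.
clean_data <- function(path) {
  dat <- read.csv(path, stringsAsFactors = FALSE)
  dat[!is.na(dat$value), ]  # drop rows with a missing value
}

# Step 2: analysis, here simply the mean value per group.
summarise_data <- function(dat) {
  aggregate(value ~ group, data = dat, FUN = mean)
}

# Step 3: run the steps in order and save the resulting table in the project.
cleaned <- clean_data(raw_file)
result <- summarise_data(cleaned)
result_file <- file.path(project_dir, "output", "group_means.csv")
write.csv(result, result_file, row.names = FALSE)
```

Because the raw file is only ever read, never overwritten, a correction to the data or to a cleaning rule just means changing the script and running it again from the top.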
