Yesterday, I had the honour of presenting at The Data Science Conference in Chicago. My topic was Reproducible Data Science with R, and while the specific practices in the talk are aimed at R users, my intent was to make a general argument for doing data science within a reproducible workflow. Whatever your tools, a reproducible process:
- Saves time,
- Produces better science,
- Creates more trusted research,
- Reduces the risk of errors, and
- Encourages collaboration.
Sadly, there's no recording of this presentation, but my hope is that the slides are sufficiently self-contained. Some of the images in the slides link to further references, too. You can browse the slides below, or download them (CC-BY) from the SlideShare page.
Thanks to all who attended for the interesting questions and discussion during the panel session!