Reproducible Software Stack


You may have heard it echoed at conferences: “Data Science is a team sport”. So how do you collaborate when each member of the team works on their own PC? What runs on my machine might not run on yours.

Replicating someone else's digital environment is the starting point for collaboration.

repo2docker

It takes the URL of a Git repository (which should already be part of your workflow) and builds a suitable Docker image from it.

It gives data scientists the benefits of containerization technology without needing to learn Docker itself.

This lets you replicate data science environments and share them, so your team can verify the results of analyses.
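As a rough sketch of what this looks like in practice, the snippet below assembles a `jupyter-repo2docker` command line from Python. The repository URL and image name are made up for illustration, and the helper `repo2docker_command` is my own wrapper, not part of repo2docker. It only builds and prints the command; actually running it assumes repo2docker and Docker are installed.

```python
import shlex


def repo2docker_command(repo_url, ref=None, image_name=None, run=False):
    """Build the argument list for a jupyter-repo2docker invocation.

    Hypothetical helper for illustration; only the CLI flags are repo2docker's own.
    """
    cmd = ["jupyter-repo2docker"]
    if ref:
        # Pin the build to a specific branch, tag, or commit.
        cmd += ["--ref", ref]
    if image_name:
        # Give the resulting Docker image a predictable name.
        cmd += ["--image-name", image_name]
    if not run:
        # Build the image only; do not start a container afterwards.
        cmd.append("--no-run")
    cmd.append(repo_url)
    return cmd


# Hypothetical repository URL, used purely for illustration.
cmd = repo2docker_command(
    "https://github.com/example-org/example-analysis",
    ref="main",
    image_name="example-analysis",
)
print(shlex.join(cmd))
```

To execute the build, the list could be passed to `subprocess.run(cmd, check=True)` on a machine where both tools are available.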


Checkpoint

This R package from Microsoft is designed to make it easy to write reproducible R code by allowing you to go backward (or forward) in time to retrieve the exact versions of the packages you need.

https://mran.microsoft.com/documents/rro/reproducibility

MyBinder

MyBinder is JupyterHub combined with repo2docker. It lets you host interactive Jupyter notebooks that you can share. It's a site that runs IPython/Jupyter notebooks from GitHub for free.

Resource limits: a maximum of 2GB RAM, a 10-minute inactivity timeout, and a 12-hour session limit.
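Sharing a notebook through MyBinder comes down to handing out a launch URL for the repository. Here is a minimal sketch that builds one for a GitHub repo, assuming the usual `https://mybinder.org/v2/gh/<owner>/<repo>/<ref>` pattern. The owner and repository names are placeholders, `binder_launch_url` is a hypothetical helper, and percent-encoding each part is my own precaution rather than a documented requirement.

```python
from urllib.parse import quote


def binder_launch_url(owner, repo, ref="HEAD"):
    """Return a mybinder.org launch URL for a GitHub repository.

    Hypothetical helper; assumes the /v2/gh/<owner>/<repo>/<ref> URL layout.
    """
    # Encode each part so that special characters cannot alter the path.
    parts = [quote(part, safe="") for part in (owner, repo, ref)]
    return "https://mybinder.org/v2/gh/" + "/".join(parts)


# Placeholder owner and repository names, for illustration only.
url = binder_launch_url("example-org", "example-analysis", ref="main")
print(url)
```

Anyone who opens the resulting link gets their own temporary session, subject to the resource limits listed above.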


Yap Shiao Shyan
