Sunday 1:30 p.m.–2:20 p.m.

Creating Reproducible Data Science Workflows using Docker Containers

Aly Sivji

Audience level:
Intermediate

Description

Jupyter notebooks make it easy to create reproducible workflows that can be distributed across groups and organizations. This is a simple process provided that our end-users have access to the data along with a compatible Python environment. Learn how to use Docker to package a shareable image containing the libraries, code, and data required to reproduce every calculation.

Abstract

Containerization technologies such as Docker enable software to run across various computing environments. Data Science requires auditable workflows where we can easily share and reproduce results. Docker is a useful tool that we can use to package libraries, code, and data into a single image.
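
As a rough illustration of what packaging libraries, code, and data into a single image can look like, here is a minimal Python sketch that renders the text of a Dockerfile. The function name `render_dockerfile`, the `requirements.txt` file, the `notebooks` and `data` directories, and the working directory are all assumptions made for this example; they are not taken from the talk.

```python
def render_dockerfile(base_image,
                      requirements="requirements.txt",
                      paths=("notebooks", "data"),
                      workdir="/home/jovyan/work"):
    """Return Dockerfile text that bundles libraries, code, and data.

    All defaults are hypothetical placeholders; adjust them to the project.
    """
    lines = [
        f"FROM {base_image}",            # base image providing Python/Jupyter
        f"WORKDIR {workdir}",            # directory the files are copied into
        f"COPY {requirements} ./",       # list of pinned library versions
        f"RUN pip install --no-cache-dir -r {requirements}",
    ]
    # Copy each project directory (code and data) into the image.
    for path in paths:
        lines.append(f"COPY {path}/ ./{path}/")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumes a Jupyter Docker Stacks base image; pin a specific tag in practice.
    print(render_dockerfile("jupyter/scipy-notebook"), end="")
```

Saving the output as `Dockerfile` next to the project files, the image could then be built with `docker build -t my-analysis .` and started with `docker run -p 8888:8888 my-analysis`, where `my-analysis` is a placeholder image name.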

This talk will cover the basics of Docker, discuss how containers fit into Data Science workflows, and provide a quick-start guide that can be used as a template to create a shareable Docker image!

Learn how to leverage the power of Docker without having to worry about the underlying details of the technology. Although this session is geared towards data scientists, the underlying concepts have many other use cases (come find me afterward to discuss).