Parsimonious Pythonic Pipelines with Provenance

Ben Mabey, Recursion Pharmaceuticals

Audience level: Intermediate
Topic area: Misc

Description

Productionizing an ML model doesn't need to be an exercise in learning a complex workflow system. Instead, by decorating your functions with the provenance library you can quickly set up a pipeline with serialization and provenance (lineage) tracking. Using the same system, you can also share models and features to facilitate team collaboration in a research setting. Learn how in this talk!

Abstract:

This talk will introduce the provenance library and illustrate how it addresses common pain points in data science projects and teams. We will cover:

  • How to decorate existing functions in your project to create a pipeline that is as easy to run from a notebook as from a build server.

  • How to put the artifacts created by the pipeline into production with a few lines of code.

  • The concept of provenance and how the library provides it, so that you always know where a model (or a prediction from that model!) came from.

  • How to experiment locally on a model and then share it with a team member while preserving all of the provenance. Reproducible research!
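
To make the decorator idea in the points above concrete, here is a minimal standard-library sketch. It is not the provenance library's API: the names `tracked`, `Artifact`, `STORE`, and `lineage` are hypothetical, and the in-memory dictionary stands in for whatever persistent storage a real setup would use. It only illustrates the general technique of keying each result on a hash of the function name and its inputs, so that repeated calls are served from the store and every result can be traced back to the calls that produced it.

```python
import functools
import hashlib
import json
from dataclasses import dataclass

# Hypothetical in-memory store (artifact id -> Artifact); a stand-in for real storage.
STORE = {}


@dataclass(frozen=True)
class Artifact:
    id: str          # hash of the function name and described inputs
    value: object    # the wrapped function's return value
    function: str    # name of the function that produced the value
    inputs: dict     # plain values, or {"artifact": id} for upstream artifacts


def _describe(arg):
    # Record upstream artifacts by id so the lineage can be followed later.
    return {"artifact": arg.id} if isinstance(arg, Artifact) else arg


def _unwrap(arg):
    # The wrapped function sees plain values, never Artifact objects.
    return arg.value if isinstance(arg, Artifact) else arg


def tracked(func):
    """Assumed decorator name: cache results by input hash and record their inputs."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        inputs = {
            "args": [_describe(a) for a in args],
            "kwargs": {k: _describe(v) for k, v in kwargs.items()},
        }
        key = json.dumps(
            {"function": func.__name__, "inputs": inputs},
            sort_keys=True,
            default=repr,
        )
        artifact_id = hashlib.sha1(key.encode("utf-8")).hexdigest()
        if artifact_id not in STORE:
            value = func(
                *[_unwrap(a) for a in args],
                **{k: _unwrap(v) for k, v in kwargs.items()},
            )
            STORE[artifact_id] = Artifact(artifact_id, value, func.__name__, inputs)
        return STORE[artifact_id]

    return wrapper


def lineage(artifact):
    """Return function names from this artifact back to its roots, depth-first."""
    names = [artifact.function]
    parents = list(artifact.inputs["args"]) + list(artifact.inputs["kwargs"].values())
    for parent in parents:
        if isinstance(parent, dict) and parent.get("artifact") in STORE:
            names.extend(lineage(STORE[parent["artifact"]]))
    return names


@tracked
def load_features(n):
    return [float(i) for i in range(n)]


@tracked
def fit_mean_model(features):
    return sum(features) / len(features)


model = fit_mean_model(load_features(4))
model_again = fit_mean_model(load_features(4))  # served from STORE, not recomputed
```

In this sketch, `lineage(model)` walks from the fitted "model" back to the feature-loading step, and the second call returns the same stored artifact because its inputs hash to the same id. Sharing results with a teammate would amount to pointing both users at the same store, which this toy version does not attempt.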