
Parsimonious Pythonic Pipelines with Provenance

Ben Mabey, Recursion Pharmaceuticals

Audience level: Intermediate
Topic area: Misc


Productionizing an ML model doesn't need to be an exercise in learning a complex workflow system. Instead, by decorating your functions with the provenance library, you can quickly set up a pipeline with serialization and provenance (lineage) tracking. Using the same system, you can also share models and features to facilitate team collaboration in a research setting. Learn how in this talk!


This talk will introduce the provenance library and illustrate how it addresses common pain points in data science projects and teams. We will cover:

  • How to decorate existing functions in your project to create a pipeline that is as easy to run from a notebook as from a build server.

  • How to put the artifacts created by the pipeline into production with a few lines of code.

  • The concept of provenance and how the library provides it, so that you always know where a model (or a prediction from that model!) came from.

  • How to experiment locally on a model and then share it with a team member while preserving all of the provenance. Reproducible research!
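
To make the decorator idea in the list above concrete, here is a minimal standard-library sketch. It is not the provenance library's API: the names `tracked`, `Artifact`, `ARTIFACTS`, `lineage`, `load_features`, and `fit_model` are all hypothetical, the store is an in-memory dict rather than real serialization, and inputs are assumed to be JSON-serializable. It only illustrates the general pattern of caching a function's output under a hash of its inputs and recording which upstream artifacts it came from.

```python
import functools
import hashlib
import json

# Hypothetical in-memory artifact store: artifact id -> Artifact.
ARTIFACTS = {}


class Artifact:
    """A computed value plus a record of how it was produced."""

    def __init__(self, artifact_id, value, fn_name, inputs):
        self.id = artifact_id
        self.value = value
        self.fn_name = fn_name
        self.inputs = inputs  # plain values, or {"artifact": id} for upstream artifacts


def _describe(arg):
    # Upstream artifacts are recorded by id so lineage can be followed later.
    return {"artifact": arg.id} if isinstance(arg, Artifact) else arg


def _unwrap(arg):
    # The wrapped function receives plain values, never Artifact objects.
    return arg.value if isinstance(arg, Artifact) else arg


def tracked(fn):
    """Cache a function's result and record which inputs produced it."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        inputs = {
            "args": [_describe(a) for a in args],
            "kwargs": {k: _describe(v) for k, v in kwargs.items()},
        }
        # The artifact id is a hash of the function name and its inputs.
        key = json.dumps({"fn": fn.__name__, "inputs": inputs}, sort_keys=True)
        artifact_id = hashlib.sha1(key.encode("utf-8")).hexdigest()
        if artifact_id not in ARTIFACTS:
            value = fn(
                *[_unwrap(a) for a in args],
                **{k: _unwrap(v) for k, v in kwargs.items()},
            )
            ARTIFACTS[artifact_id] = Artifact(artifact_id, value, fn.__name__, inputs)
        return ARTIFACTS[artifact_id]

    return wrapper


def lineage(artifact):
    """Return the names of the functions behind an artifact, newest first."""
    names = [artifact.fn_name]
    upstream = list(artifact.inputs["args"]) + list(artifact.inputs["kwargs"].values())
    for item in upstream:
        if isinstance(item, dict) and "artifact" in item:
            names.extend(lineage(ARTIFACTS[item["artifact"]]))
    return names


@tracked
def load_features(scale):
    return [scale * x for x in (1, 2, 3)]


@tracked
def fit_model(features, bias=0):
    # Stand-in for training: the "model" is just the mean plus a bias.
    return sum(features) / len(features) + bias


features = load_features(2)
model = fit_model(features, bias=1)
print(model.value, lineage(model))
```

Calling `fit_model(features, bias=1)` a second time returns the stored artifact instead of recomputing it, and `lineage(model)` walks back through the recorded inputs to the functions that produced them. A real system would also need to persist artifacts and track code versions, which this sketch leaves out.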