
Pomegranate: Fast and Flexible Probabilistic Modeling in Python

Jacob Schreiber, Paul G. Allen School of Computer Science, University of Washington

Audience level: Intermediate
Topic area: Modeling

Description

We will describe the Python package pomegranate, which implements flexible probabilistic modeling. We will highlight several supported models, including mixtures, hidden Markov models, and Bayesian networks. At each step we will show how this flexibility makes it easy to construct complex models. We will also demonstrate the parallel and out-of-core APIs.
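As a taste of the API covered in the talk, below is a minimal sketch of fitting a single distribution and a mixture model. It assumes the pomegranate 0.x API from around the time of this talk, and the data is synthetic, made up purely for illustration.

```python
import numpy
from pomegranate import NormalDistribution, GeneralMixtureModel

# Synthetic one-dimensional data with two modes (illustrative only).
numpy.random.seed(0)
X = numpy.concatenate([numpy.random.normal(0, 1, 500),
                       numpy.random.normal(5, 1, 500)]).reshape(-1, 1)

# Fit a single normal distribution directly from the samples.
d = NormalDistribution.from_samples(X[:, 0])

# Fit a two-component mixture of normals from the same samples,
# using the same from_samples idiom.
model = GeneralMixtureModel.from_samples(NormalDistribution, n_components=2, X=X)

print(d.log_probability(0.0))                            # density under the single fit
print(model.predict_proba(numpy.array([[0.0], [5.0]])))  # per-component posteriors
```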

SLIDES: https://github.com/jmschrei/pomegranate/blob/master/slides/pomegranate%20data%20intelligence%202017.pdf

Abstract:

In this talk we will describe pomegranate, a flexible probabilistic modeling package for Python. We will highlight its extensive library of probability distributions and its compositional models, such as mixture models, Bayes classifiers, hidden Markov models, and Bayesian networks. At each step we will emphasize the flexibility provided by pomegranate and how it allows more complicated models to be constructed easily. We will compare these implementations to others in the open source community. In addition, we will show how the underlying modularity of the code allows models to be stacked to produce models such as mixtures of Bayesian networks, or HMMs with complicated mixture emissions. Lastly, we will show how easy it is to use the built-in out-of-core and parallel APIs to allow for multithreaded training on massive amounts of data that can't fit in memory, all without the user having to think about any implementation details. An accompanying Jupyter notebook will allow users to follow along, see code examples for all figures presented, and try out modifications.
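To make the stacking point concrete, here is one hedged sketch: a Bayes classifier whose per-class distributions are themselves mixture models. It again assumes the 0.x API; the class structure and data are invented for the example.

```python
import numpy
from pomegranate import NormalDistribution, GeneralMixtureModel, BayesClassifier

numpy.random.seed(0)

# Invented data: each class is bimodal, so a single distribution per
# class would underfit; a mixture per class captures both modes.
X0 = numpy.concatenate([numpy.random.normal(0, 1, 300),
                        numpy.random.normal(4, 1, 300)]).reshape(-1, 1)
X1 = numpy.concatenate([numpy.random.normal(8, 1, 300),
                        numpy.random.normal(12, 1, 300)]).reshape(-1, 1)

# Fit one mixture model per class...
m0 = GeneralMixtureModel.from_samples(NormalDistribution, n_components=2, X=X0)
m1 = GeneralMixtureModel.from_samples(NormalDistribution, n_components=2, X=X1)

# ...then stack them as the per-class distributions of a Bayes classifier.
clf = BayesClassifier([m0, m1])
print(clf.predict(numpy.array([[1.0], [11.0]])))  # expected: [0, 1]
```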

Some highlights of the tutorial are the following:

General

  • BLAS and Cython are used to speed up calculations

  • multithreaded parallel processing is natively supported

  • out-of-core computing for large data sets
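The out-of-core point deserves a sketch, since the mechanism is simple: distributions and models expose summarize(), which accumulates sufficient statistics from a batch, and from_summaries(), which updates the parameters from those statistics. Assuming the 0.x API, with the batches standing in for reads from disk:

```python
import numpy
from pomegranate import NormalDistribution

numpy.random.seed(0)
d = NormalDistribution(0, 1)

# Stream batches through summarize(); only the sufficient statistics
# are kept, so the full data set never needs to fit in memory.
for _ in range(100):                       # stand-in for reading chunks from disk
    batch = numpy.random.normal(3, 2, 10000)
    d.summarize(batch)

# One final parameter update from the accumulated statistics.
d.from_summaries()
print(d.parameters)                        # approximately [3, 2]
```

The parallel API is similarly terse: the fitting and prediction methods accept an n_jobs argument that splits the work across that many threads.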

Models

  • models can be stacked within each other, such as a Bayes classifier of mixtures or a mixture of hidden Markov models

  • models are frequently faster and more representationally flexible than comparable implementations in other packages

  • Bayesian network structure learning is fully supported, including an exact DP/A* algorithm, constraint graphs for encoding prior knowledge, and approximate algorithms
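As a sketch of the structure learning API (again assuming the 0.x BayesianNetwork.from_samples interface, with invented data):

```python
import numpy
from pomegranate import BayesianNetwork

# Invented binary data set in which column 2 copies column 0, so the
# learned structure should recover an edge between those variables.
numpy.random.seed(0)
X = numpy.random.randint(2, size=(1000, 3))
X[:, 2] = X[:, 0]

# Exact structure learning via the DP/A* shortest-path formulation.
model = BayesianNetwork.from_samples(X, algorithm='exact')
print(model.structure)   # tuple of parent sets, one entry per variable
```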

The talk will roughly follow this breakdown:

  • Introduction to pomegranate

  • Probability distributions

    • probability distributions supported
    • general API for these distributions
    • comparison to numpy / scipy
    • natural out-of-core API from sufficient statistics
  • Bayes classifiers

    • Bayes' rule, priors/likelihood/posteriors
    • Naive Bayes vs Bayes Classifier
  • Mixture models

    • Perspective as an unsupervised Bayes classifier
    • Homogeneous mixtures

      • Univariate and multivariate mixtures
      • Comparison to sklearn for multivariate Gaussians
    • Heterogeneous mixtures

      • Mixtures of distributions
      • Mixtures of models
  • Hidden Markov models (see the code sketch after this outline)

    • Perspective as a mixture model with a transition matrix
    • Supported features

      • Sparse underlying implementation
      • Silent states
      • Distributions can differ for different nodes
    • Supported algorithms

      • Viterbi
      • Maximum-A-Posteriori
      • Training

        • Baum-Welch
        • Viterbi
        • Labeled
    • Comparison to hmmlearn

  • Bayesian networks

    • What is a Bayesian network?
    • Belief propagation through a factor graph representation
    • Structure learning

      • DP/A* algorithm
      • Constraint graphs for prior knowledge
  • Parallel API
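To ground the hidden Markov model portion of the outline, here is a minimal sketch of building an HMM by hand and running the Viterbi and maximum-a-posteriori algorithms on it. The two-state coin model and the sequence are made up for illustration, and the code assumes the 0.x API:

```python
from pomegranate import HiddenMarkovModel, State, DiscreteDistribution

# Two states emitting over a coin-flip alphabet; note that, per the
# outline, each state could carry a different distribution type.
fair = State(DiscreteDistribution({'H': 0.5, 'T': 0.5}), name="fair")
biased = State(DiscreteDistribution({'H': 0.9, 'T': 0.1}), name="biased")

model = HiddenMarkovModel()
model.add_states(fair, biased)
model.add_transition(model.start, fair, 0.5)
model.add_transition(model.start, biased, 0.5)
model.add_transition(fair, fair, 0.9)
model.add_transition(fair, biased, 0.1)
model.add_transition(biased, biased, 0.9)
model.add_transition(biased, fair, 0.1)
model.bake()

seq = list('HTHHHHHHHHTH')

# Viterbi: the single most likely path of states.
logp, path = model.viterbi(seq)
print([state.name for _, state in path])

# Maximum a posteriori: the most likely state for each observation.
print(model.predict(seq, algorithm='map'))
```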