
**Audience level:**
Intermediate

**Topic area:**
Modeling

We will describe the Python package `pomegranate`, which implements flexible probabilistic modeling. We will highlight several supported models, including mixtures, hidden Markov models, and Bayesian networks. At each step we will show how the supported flexibility allows complex models to be constructed easily. We will also demonstrate the parallel and out-of-core APIs.

In this talk we will describe `pomegranate`, a flexible probabilistic modeling package for Python. We will highlight its wide library of probability distributions and its compositional models, such as mixture models, Bayes classifiers, hidden Markov models, and Bayesian networks. At each step we will emphasize the flexibility pomegranate provides and how it allows more complicated models to be constructed easily. We will compare these implementations to others in the open source community. In addition, we will show how the underlying modularity of the code allows models to be stacked, producing models such as mixtures of Bayesian networks or HMMs with complicated mixture emissions. Lastly, we will show how easy it is to use the built-in out-of-core and parallel APIs for multithreaded training on massive amounts of data that can't fit in memory, all without the user having to think about any implementation details. An accompanying Jupyter notebook will allow users to follow along, see code examples for all figures presented, and try out modifications.

Some highlights of the tutorial are the following:

General

- BLAS and Cython are used to speed up calculations
- multithreaded parallel processing is natively supported
- out-of-core computing is available for data sets too large to fit in memory

Models

- models can be stacked within each other, such as a Bayes classifier of mixtures or a mixture of hidden Markov models
- models are faster and more representationally flexible than those in comparable packages
- Bayesian network structure learning is fully supported, including an exact DP/A* algorithm, constraint graphs for encoding prior knowledge, and approximate algorithms

The talk will roughly follow this breakdown:

Introduction to pomegranate

Probability distributions

- probability distributions supported
- general API for these distributions
- comparison to numpy / scipy
- natural out-of-core API from sufficient statistics
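The out-of-core API follows from the last bullet: distributions with additive sufficient statistics can be fit from batches without ever holding the full data set in memory. A pure-Python sketch of the idea (the `summarize`/`from_summaries` names echo pomegranate's API, but this class is illustrative, not the library's code):

```python
import math

# Sketch (not pomegranate's implementation): a normal distribution whose
# parameters are recovered from sufficient statistics accumulated one
# batch at a time, so the full data set never needs to fit in memory.
class StreamingNormal:
    def __init__(self):
        self.n = 0.0      # count
        self.sx = 0.0     # sum of x
        self.sxx = 0.0    # sum of x^2

    def summarize(self, batch):
        # Accumulate sufficient statistics for one batch.
        for x in batch:
            self.n += 1
            self.sx += x
            self.sxx += x * x

    def from_summaries(self):
        # Recover mean and standard deviation from the accumulated sums.
        mean = self.sx / self.n
        var = self.sxx / self.n - mean * mean
        return mean, math.sqrt(var)

d = StreamingNormal()
for batch in ([1.0, 2.0, 3.0], [4.0, 5.0]):  # batches streamed from disk
    d.summarize(batch)
mean, std = d.from_summaries()
print(mean, std)  # same result as fitting the concatenated data
```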

Bayes classifiers

- Bayes rule, priors/likelihood/posteriors
- Naive Bayes vs Bayes Classifier
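Both classifiers are direct applications of Bayes rule; the naive variant additionally assumes the features are independent given the class. A sketch with Gaussian features (illustrative code and numbers, not pomegranate's API):

```python
import math

def normal_pdf(x, mu, sigma):
    # Univariate Gaussian density.
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior(x, priors, params):
    # Bayes rule: P(class | x) is proportional to P(x | class) * P(class).
    # Naive assumption: the class likelihood factorizes over features.
    joint = []
    for prior, feats in zip(priors, params):
        lik = 1.0
        for xi, (mu, sigma) in zip(x, feats):
            lik *= normal_pdf(xi, mu, sigma)
        joint.append(prior * lik)
    total = sum(joint)
    return [j / total for j in joint]

# Two classes, two features: class 0 centred at (0, 0), class 1 at (4, 4).
priors = [0.5, 0.5]
params = [[(0.0, 1.0), (0.0, 1.0)],
          [(4.0, 1.0), (4.0, 1.0)]]
post = posterior([0.2, -0.1], priors, params)
print(post)  # overwhelmingly class 0
```

A full Bayes classifier drops the factorization and lets each class carry any joint model, such as a full-covariance Gaussian or a mixture.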

Mixture models

- Perspective as an unsupervised Bayes classifier

Homogeneous mixtures

- Univariate and multivariate mixtures
- Comparison to sklearn for multivariate Gaussians
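The "unsupervised Bayes classifier" perspective is concrete in EM: the E-step computes exactly the posteriors a Bayes classifier would, and the M-step refits each component weighted by them. A minimal pure-Python sketch for two univariate Gaussians (illustrative, not pomegranate's implementation; sklearn's `GaussianMixture` is the multivariate analogue):

```python
import math

def normal_pdf(x, mu, sigma):
    # Univariate Gaussian density.
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def em_two_gaussians(data, mu, iterations=50):
    # mu holds the initial means; weights and scales start equal.
    w, sigma = [0.5, 0.5], [1.0, 1.0]
    for _ in range(iterations):
        # E-step: per-point posteriors, exactly a Bayes classifier's output.
        resp = []
        for x in data:
            joint = [w[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: refit each component, weighting points by responsibility.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            sigma[k] = max(math.sqrt(var), 1e-3)  # floor to avoid collapse
            w[k] = nk / len(data)
    return w, mu, sigma

data = [-2.1, -1.9, -2.0, 3.9, 4.1, 4.0]
weights, means, scales = em_two_gaussians(data, mu=[-1.0, 1.0])
print(means)  # converges to roughly [-2.0, 4.0]
```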

Heterogeneous mixtures

- Mixtures of distributions
- Mixtures of models

Hidden Markov models

- Perspective as a mixture model with a transition matrix

Supported features

- Sparse underlying implementation
- Silent states
- Distributions can differ for different nodes

Supported algorithms

- Viterbi
- Maximum-A-Posteriori
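Viterbi decoding is a dynamic program over log-probabilities: keep, for each state, the score of the best path ending there, then trace back. A sketch on a toy two-state "occasionally dishonest casino" model (all names and probabilities here are invented for illustration, not pomegranate's API):

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    # V[t][s]: log-probability of the best state path that ends in
    # state s after emitting obs[:t+1].
    V = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []
    for o in obs[1:]:
        scores, ptr = {}, {}
        for s in states:
            prev, best = max(
                ((p, V[-1][p] + log_trans[p][s]) for p in states),
                key=lambda kv: kv[1])
            scores[s] = best + log_emit[s][o]
            ptr[s] = prev
        V.append(scores)
        back.append(ptr)
    # Trace the best final state back to the start.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

ln = math.log
states = ['fair', 'loaded']
log_start = {'fair': ln(0.5), 'loaded': ln(0.5)}
log_trans = {'fair': {'fair': ln(0.9), 'loaded': ln(0.1)},
             'loaded': {'fair': ln(0.1), 'loaded': ln(0.9)}}
# 'x' stands for any non-six roll.
log_emit = {'fair': {'6': ln(1 / 6), 'x': ln(5 / 6)},
            'loaded': {'6': ln(0.8), 'x': ln(0.2)}}
path = viterbi(['x', 'x', '6', '6', '6'], states, log_start, log_trans, log_emit)
print(path)  # ['fair', 'fair', 'loaded', 'loaded', 'loaded']
```

MAP decoding differs in that it picks the individually most probable state at each position from the forward-backward posteriors, rather than the single best path.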

Training

- Baum-Welch
- Viterbi
- Labeled

Comparison to hmmlearn

Bayesian networks

- What is a Bayesian network?
- Belief propagation through a factor graph representation
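Message-passing machinery aside, the quantity a Bayesian network computes is a posterior over unobserved variables given evidence. On a two-node network it can be produced by direct enumeration of the joint (illustrative numbers):

```python
# Two-node network: Rain -> WetGrass, with illustrative probabilities.
p_rain = {True: 0.2, False: 0.8}
p_wet_given_rain = {True: 0.9, False: 0.1}

# Observing wet grass: P(rain | wet) by enumerating the joint distribution.
joint = {r: p_rain[r] * p_wet_given_rain[r] for r in (True, False)}
total = sum(joint.values())
posterior = {r: joint[r] / total for r in joint}
print(posterior[True])  # 0.18 / 0.26, about 0.69
```

On larger graphs enumeration is exponential in the number of variables, which is where message passing on the factor graph comes in.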

Structure learning

- DP/A* algorithm
- Constraint graphs for prior knowledge

Parallel API
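Because sufficient statistics are additive, parallel training reduces to summarizing chunks independently and merging the results. A pure-Python sketch of that pattern (illustrative only; real speedups come from a compiled core that can run outside the GIL, not from Python-level threads):

```python
from concurrent.futures import ThreadPoolExecutor

def summarize(batch):
    # Per-chunk sufficient statistics for a normal distribution.
    return (len(batch), sum(batch), sum(x * x for x in batch))

def merge(stats):
    # Sufficient statistics are additive, so per-chunk results just sum.
    n = sum(s[0] for s in stats)
    sx = sum(s[1] for s in stats)
    sxx = sum(s[2] for s in stats)
    mean = sx / n
    var = sxx / n - mean ** 2
    return mean, var

chunks = [[1.0, 2.0], [3.0, 4.0], [5.0]]
with ThreadPoolExecutor() as pool:
    stats = list(pool.map(summarize, chunks))
mean, var = merge(stats)
print(mean, var)  # identical to a single-threaded fit on all the data
```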