Welcome to pymdp’s documentation!
pymdp is a Python package for simulating active inference agents in discrete space and time, using partially-observed Markov Decision Processes (POMDPs) as a generative model class. The package is designed to be modular and flexible, to enable users to design and simulate bespoke active inference models with varying levels of specificity to a given task.
For a theoretical overview of active inference and the motivations for developing this package, please see our companion paper: “pymdp: A Python library for active inference in discrete state spaces”.
Installation
We recommend installing pymdp using the package installer pip, which will install the package locally as well as its dependencies. This can also be done in a virtual environment (e.g. one created using venv or conda). When pip installing pymdp, use the full package name, inferactively-pymdp:
(.venv) $ pip install inferactively-pymdp
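Once installed (note that the install name inferactively-pymdp differs from the import name, pymdp), you can check that the package is importable from Python:
import pymdp
from pymdp.agent import Agent  # the main high-level interface used throughout these tutorials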
pymdp Fundamentals
Author: Conor Heins
Note: When running this notebook in Google Colab, you may have to run !pip install inferactively-pymdp at the top of the notebook before you can import pymdp. That cell is left commented out below, in case you are running this notebook from Google Colab.
# ! pip install inferactively-pymdp
Very brief pymdp and active inference background
pymdp is a Python library, written primarily in NumPy, for solving Partially-Observed Markov Decision Processes (POMDPs) using active inference, a modelling framework for perception, learning and action originally derived from the Free Energy Principle. POMDP models have traditionally been used in disciplines ranging from control theory and robotics to biological fields concerned with empirically modelling human and animal behavior.
For a theoretical overview of active inference and the motivations for developing this package, please see our companion paper: “pymdp: A Python library for active inference in discrete state spaces”.
More in-depth coverage of applying active inference to POMDPs can be found in the following two papers: Active Inference: A Process Theory and Active Inference in Discrete State Spaces: A Synthesis
Below are links to a few useful papers, talks, and blog posts that concern active inference and the free energy principle more broadly:
Blog post: Tutorial on Active Inference by Oleg Solopchuk
Talk: The variational foundations of movement by Karl Friston
Podcast: Karl Friston: Neuroscience and the Free Energy Principle hosted on Lex Fridman Podcast #99
Paper: A tutorial on the free-energy framework for modelling perception and learning by Rafal Bogacz
Paper: The free energy principle for action and perception: A mathematical review by Christopher Buckley, Chang Sub Kim, Simon McGregor, and Anil Seth
Paper: A free energy principle for the brain by Karl Friston
Paper: A Step-by-Step Tutorial on Active Inference and its Application to Empirical Data by Ryan Smith and Christopher Whyte
Also see Jared Tumiel’s blog post and Beren Millidge’s github repository for more literature and pedagogical materials related to active inference and the free energy principle.
Data structures in pymdp
In this tutorial notebook we’ll be walking through the core functionalities and data structures used by pymdp, most of which will be easy to grasp for anyone who’s familiar with array programming in numpy or even MATLAB, where much of the existing code related to active inference currently exists.
We’ll start by running through a full usage example of quickly building and running a random active inference agent in pymdp.
In the interest of focusing on data structures and syntax, we’ll just initialize the agent to have a random POMDP generative model – we don’t need to analogize this generative model to any hypothetical task environment that our agent will be behaving within. The agent will just have random beliefs about how the world evolves and how it gives rise to observations. The focus here is on how we store these objects (e.g. components of the generative model) in pymdp.
For more in-depth tutorials that actually work through full, tutorial-style examples of building active inference agents and running them, please see the Active inference from scratch tutorial and Using the agent class tutorial.
Other examples in the Examples section (e.g. the T-Maze demo or the Epistemic chaining demo) provide walk-throughs of building active inference agents in particular task environments.
""" Quickly build a random active inference agent, give it an observation and have it do hidden state and policy inference """
import pymdp
from pymdp import utils
from pymdp.agent import Agent
num_obs = [3, 5] # observation modality dimensions
num_states = [4, 2, 3] # hidden state factor dimensions
num_controls = [4, 1, 1] # control state factor dimensions
A_array = utils.random_A_matrix(num_obs, num_states) # create sensory likelihood (A matrix)
B_array = utils.random_B_matrix(num_states, num_controls) # create transition likelihood (B matrix)
C_vector = utils.obj_array_uniform(num_obs) # uniform preferences
# instantiate a quick agent using your A, B and C arrays
my_agent = Agent( A = A_array, B = B_array, C = C_vector)
# give the agent a random observation and get the optimized posterior beliefs
observation = [1, 4] # a list specifying the indices of the observation, for each observation modality
qs = my_agent.infer_states(observation) # get posterior over hidden states (a multi-factor belief)
# Do active inference
q_pi, neg_efe = my_agent.infer_policies() # return the policy posterior and return (negative) expected free energies of each policy as well
action = my_agent.sample_action() # sample an action from the posterior over policies
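To get a feel for what these methods return, you can inspect the outputs – qs is an object array with one posterior marginal per hidden state factor, q_pi has one probability per policy, and (in this multi-factor setup) sample_action() returns one action index per control state factor. A minimal sketch of such an inspection:
print(len(qs), qs[0].shape)  # 3 marginal posteriors, one per hidden state factor; the first has shape (4,)
print(q_pi.shape)            # one posterior probability per policy
print(action)                # one sampled action index per control state factor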
numpy object arrays
In pymdp, both generative model distributions (e.g. A or B) as well as posterior distributions over hidden states (e.g. qs) and observations are represented as what we call “object arrays”. Note that other conventions may refer to these arrays as “arrays-of-arrays”, “nested arrays”, or “ragged/jagged arrays”.
Object arrays differ from more “classical” numpy.ndarrays that have numerical (e.g. np.float64) data types. A typical numpy.ndarray of float or integer data type can only have scalar values as its entries. Object arrays (numpy.ndarray with dtype = object), however, do not have this restriction: their array elements can be arbitrary Python data structures or objects. The entries of an object array can thus have arbitrary type and dimensionality.
Deeper dive on object arrays
Object arrays can be initialized with standard numpy constructors, but by explicitly declaring dtype = object.
import numpy as np

my_empty_array = np.empty(5, dtype = object)
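To illustrate how flexible the contents of an object array can be, here is a minimal sketch (using a hypothetical variable name) that fills one with numpy arrays of different shapes – something a standard numerical ndarray cannot hold:
my_object_array = np.empty(3, dtype = object)
my_object_array[0] = np.array([0.5, 0.5])      # a 1-D vector with 2 elements
my_object_array[1] = np.ones((3, 4))           # a 2-D matrix
my_object_array[2] = np.zeros((2, 3, 4))       # a 3-D array
print([arr.shape for arr in my_object_array])  # each element keeps its own shape and dimensionality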
The pymdp.utils module provides a set of functions that allow you to quickly construct, convert between, and perform mathematical operations on these object arrays. Fundamentally underpinning these utils operations is the assumption that the elements of the input object array are numpy.ndarrays with numerical data types. In other words, we simply use object arrays to store collections of multi-dimensional arrays, or standard numpy.ndarrays. This construction is chosen to allow pymdp to easily perform probabilistic computations on collections of multi-dimensional arrays, where each constituent array within the collection represents a particular factor or marginal distribution within a factorized representation. In the next section we’ll get into why this construction is useful and how it’s mathematically motivated.
Constructing factorized distributions with object arrays
For example, let’s imagine a joint categorical probability distribution over some variable $\mathbf{s}$ that factorizes into the product of a set of 3 marginal (i.e. univariate) distributions. Mathematically, we can express that factorization as:
$$ P(\mathbf{s}) = \prod_i^3 P(s_i) $$
In pymdp, instead of encoding this distribution as the fully-enumerated product of marginals $P(\mathbf{s}) = P(s_1) \times P(s_2) \times P(s_3)$, we exploit the factorization and only store the separate marginals $P(s_i)$ of the full distribution. Specifically, we store each marginal as a single numpy.ndarray of ndim == 1 (i.e. a row vector) in a different entry of a numpy.ndarray of dtype object. The object array representation can be easily constructed and manipulated using utils functions.
num_factors = 3 # we have 3 factors total
dim_of_each_factor = [2, 2, 4] # factor 1 has 2 levels, factor 2 has 2 levels, factor 3 has 4 levels
P_s = utils.random_single_categorical(dim_of_each_factor)
print(P_s[0])
[0.49922246 0.50077754]
Here we used the utility function random_single_categorical(shape_list) to create a random collection of marginal categorical distributions, representing the different factorized components of the full distribution $P(\mathbf{s})$. Each of these marginal categorical distributions is stored in the corresponding entry of the object array P_s.
This representation is advantageous for several reasons, but one of the main advantages is sparsity and reduced memory overhead. If we had to store the entire joint distribution $P(\mathbf{s})$ as a product of all the marginals, we could either store it as a single 1-D categorical distribution with 2 * 2 * 4 different levels (the combinatorics of all the hidden state dimensionalities), or as a multi-dimensional array of shape (2, 2, 4), where each entry of the joint Ps[i,j,k] stores the joint probability $P(s^1 = i, s^2 = j, s^3 = k)$. Either way, we’d have to store many more values: $2 \times 2 \times 4 = 16$ is twice the size of $2 + 2 + 4 = 8$, and the gap grows with the number and size of the factors. But since we know that the marginals are independent by construction, we can just store them separately without losing any information.
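A quick way to see the savings for this example is to count the number of stored values under each representation – a minimal sketch, reusing the dim_of_each_factor list defined above:
joint_size = np.prod(dim_of_each_factor)    # 2 * 2 * 4 = 16 values needed for the full joint distribution
marginal_size = np.sum(dim_of_each_factor)  # 2 + 2 + 4 = 8 values needed for the separate marginals
print(joint_size, marginal_size)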
Side note: Here’s an alternative way to generate multi-factor categorical distributions.
If you knew what your factors were to begin with, and you wanted to construct the full distribution from the factors, you could use another utility function to build the object array from a list of the factorized distributions.
import numpy as np
ps1 = np.array([0.5, 0.5])
ps2 = np.array([0.75, 0.25])
ps3 = np.array([0.5, 1./6., 1./6., 1./6.])
P_s = utils.obj_array_from_list([ps1, ps2, ps3])
print(P_s[2])
[0.5 0.16666667 0.16666667 0.16666667]
Object arrays and conditional distributions
Understanding the representation of factorized probability distributions as object arrays is critical for understanding and constructing generative models in pymdp. This object array representation applies not only to multi-factor, unconditional categorical distributions (like the example above: a joint categorical distribution factorized into a product of univariate marginals), but is also used to represent collections of conditional categorical distributions.
In particular, we use object arrays to encode the observation and transition models (also known as the observation likelihood and transition likelihood) of the agent’s generative model. Object arrays are necessary in this case, due to the convention of factorizing the observation space into multiple observation modalities and the hidden states into multiple hidden state factors. Mathematically, this can be expressed as follows:
$$ \mathbf{o}_t = \prod_{i = 1}^{M} \delta_{O_i}(o^{i}_t) \hspace{15 mm} \mathbf{s}_t = \prod_{i = 1}^{F} \delta_{S_i}(s^{i}_t) $$
where any particular observation $\mathbf{o}_t$ is actually factorized into a product of Dirac delta functions over different modality-specific observations $o^m_t$, i.e. “one-hot” vectors or unit vectors in $\mathbb{R}^{O_m}$. The same factorization applies to multi-factorial hidden states $\mathbf{s}_t$. The notation $\delta_{X_i}(x^{i})$ denotes a Dirac delta function with discrete support $X_i$ evaluated at point $x^i$.
This factorization of observations across modalities and of hidden states across hidden state factors carries forward into the specification of the A and B arrays, the representations of the conditional distributions $P(\mathbf{o}_t|\mathbf{s}_t)$ and $P(\mathbf{s}_t|\mathbf{s}_{t-1}, \mathbf{u}_{t-1})$ in pymdp. These two arrays of conditional distributions can also be factorized by modality and factor, respectively.
The A array, for instance, contains the agent’s observation model, which relates hidden states $\mathbf{s}_t$ to observations $\mathbf{o}_t$:
$$ \mathbf{A} = \{A^1, A^2, \dots, A^M \}, \hspace{5mm} A^m = P(o^m_t | s^1_t, s^2_t, \dots, s^F_t) $$
Therefore, we represent the A array as an object array whose constituent arrays are multidimensional numpy.ndarrays that encode conjunctive relationships between combinations of hidden states and observations.
This is best explained using the original example code at the top of this tutorial. It is customary to build lists of the dimensionalities of the modalities (resp. hidden state factors) of your model, typically named num_obs (the dimensionalities of the observation modalities) and num_states (the dimensionalities of the hidden state factors). These lists can then be used to automatically construct the A array with the correct shape.
num_obs = [3, 5] # observation modality dimensions
num_states = [4, 2, 3] # hidden state factor dimensions
A_array = utils.random_A_matrix(num_obs, num_states) # create sensory likelihood (A array)
for m, _ in enumerate(num_obs):
    print(f'Dimensionality of A array for modality {m + 1}: {A_array[m].shape}')
Dimensionality of A array for modality 1: (3, 4, 2, 3)
Dimensionality of A array for modality 2: (5, 4, 2, 3)
For example, A_array[0] stores the conditional relationships between the hidden states $\mathbf{s}$ and the observations of the first modality, $o^1_t$, which has dimensionality 3. This explains why the shape of A_array[0] is (3, 4, 2, 3) – it stores the conditional relationships between each setting of the hidden state factors (which have dimensionalities [4, 2, 3]) and the observations within the first modality, which has dimensionality 3. Crucially, each sub-array A[m] stores the conditional dependencies between all the hidden state factor combinations (configurations of $s^1, s^2, …, s^F$) and the observations along modality m.
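To make this conjunctive structure concrete, you can slice out the conditional distribution over observations for one particular combination of hidden states – for a properly normalized A array, each such slice is a categorical distribution that sums to 1. A minimal sketch using the random A_array created above:
p_o_given_s = A_array[0][:, 2, 0, 1]  # distribution over the 3 levels of modality 0, given hidden states (s1 = 2, s2 = 0, s3 = 1)
print(p_o_given_s)
print(p_o_given_s.sum())  # should print 1.0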
In this case, we used the pymdp.utils function random_A_matrix() to generate a random A array, but in most cases users will want to design their own bespoke observation models (or at least initialize them to some reasonable starting values). In such a scenario, the usual route is to initialize the A array to a series of identically-valued multidimensional arrays (e.g. arrays filled with 0’s or uniform values), and then fill out the conditional probability entries “by hand”, according to the task the user is interested in modelling.
For this purpose, utility functions like obj_array_zeros and obj_array_uniform come in handy. These functions take as input a list of shapes, where each shape gives the dimensionality (e.g. [2, 3, 4]) of one of the multi-dimensional arrays that will populate the final object array. For example, creating this shape list for the A array, given num_obs and num_states, is quite straightforward:
A_shapes = [[o_dim] + num_states for o_dim in num_obs]
# initialize the A array to all 0's
empty_A = utils.obj_array_zeros(A_shapes)
# initialize the A array to uniform distributions as columns
# uniform_A = utils.obj_array_uniform(A_shapes)
# then fill out your A matrix elements as appropriate, depending on the task
# empty_A[0][:,0,1,2] = np.array([0.7, 0.15, 0.15])
# empty_A[0][:,...] = ....
Similarly, we can create B arrays that encode the temporal dependencies between the hidden state factors over time, further conditioned on control factors (or state factors that correspond to action-states). Mathematically, the B arrays are expressed as follows:
$$ \mathbf{B} = \{B^1, B^2, \dots, B^F \}, \hspace{5mm} B^f = P(s^f_t | s^f_{t-1}, u^f_{t-1}) $$
where $u^f_{t-1}$ denotes the control state for control factor $f$, taken at time $t-1$.
We can construct these arrays by introducing an additional list, num_controls, that stores the dimensionalities of each control factor; in combination with num_states, this can be used to specify the B array.
num_states = [4, 2, 3] # hidden state factor dimensions
num_controls = [4, 1, 1] # control state factor dimensions
B_array = utils.random_B_matrix(num_states, num_controls) # create transition likelihood (B matrix)
for f, _ in enumerate(num_states):
    print(f'Dimensionality of B array for hidden state factor {f + 1}: {B_array[f].shape}')
Dimensionality of B array for hidden state factor 1: (4, 4, 4)
Dimensionality of B array for hidden state factor 2: (2, 2, 1)
Dimensionality of B array for hidden state factor 3: (3, 3, 1)
As we can see here, each factor-specific B array (e.g. B[2]) has shape (num_states[f], num_states[f], num_controls[f]), encoding the conditional dependencies between the states of a given factor f at subsequent timepoints, further conditioned on the actions or control states along that factor. This construction means that hidden state factors are statistically independent of one another – i.e. hidden state factor f is only affected by its own state at the previous timestep, as well as by the state of the f-th control factor.
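As a quick sanity check of this factorized structure, you can propagate a belief about a single hidden state factor forward in time using only that factor’s B array and its own control state – a minimal sketch using the random B_array created above (the action index is an arbitrary choice for illustration):
qs_f0 = utils.onehot(0, num_states[0])  # belief that hidden state factor 0 is in its first level
action_f0 = 2                           # an arbitrary action index along control factor 0
qs_f0_next = B_array[0][:, :, action_f0].dot(qs_f0)  # predicted marginal over factor 0 at the next timestep
print(qs_f0_next)  # a categorical distribution over the 4 levels of factor 0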
And of course, as with the A array, we can also pre-allocate our B arrays to have the correct shape and then fill them out by hand, using the following template:
B_shapes = [[s_dim, s_dim, num_controls[f]] for f, s_dim in enumerate(num_states)]
# initialize the B array to all 0's
empty_B = utils.obj_array_zeros(B_shapes)
# initialize the B array to uniform distributions as columns
# uniform_B = utils.obj_array_uniform(B_shapes)
# then assign your B matrix elements as appropriate, depending on the task
# empty_B[0][0,:,0] = 1.
# empty_B[0][1,:,1] = 1.
Handy functions for dealing with object arrays
The utils library is also equipped with a set of handy functions for dealing with object arrays – this includes operations like normalization and sampling.
Normalization
For instance, let’s imagine you generated an A array by hand with the following structure:
num_obs = [3]
num_states = [3, 2]
A_array = utils.obj_array(len(num_obs)) # create an empty object array with as many elements as there are observation modalities (here, only 1 modality)
A_array[0] = np.zeros(num_obs + num_states) # initialize the likelihood mapping for the single modality
A_array[0][:,:,0] = np.eye(3) # assign the mapping between hidden states of factor 0 and observations, given factor 1 == 0
A_array[0][0,:,1] = 0.5 # assign the mapping between hidden states of factor 0 and observation of level 0, given factor 1 == 1
This A array corresponds to an observation model with a single modality and two hidden state factors.
We can check whether this A array is a proper conditional distribution by using the utils function is_normalized, which checks that each column of each sub-array within A sums to 1:
utils.is_normalized(A_array)
False
This A is not properly normalized because of this line:
A_array[0][0,:,1] = 0.5
This is problematic, because we’ve only assigned a single outcome to have 50% probability under all the hidden state configurations of factor 0, given factor 1 == level 1. That means each column of A_array[0][:,:,1] is not a proper conditional distribution, since there’s still 50% of ‘unaccounted’ probability. In particular, the distributions encoded in the columns of this array do not each sum to 1.
We can fix this by hand (by changing our code to allocate the remaining 50% of probability to each of those columns) or, depending on the intended final distributions, by using the norm_dist_obj_arr function, which column-normalizes every numpy.ndarray contained within an object array.
A_array = utils.norm_dist_obj_arr(A_array)
print(f'P(o0 | s0, s1 == 1) after normalization: \n {A_array[0][:,:,1]}\n')
print(f'Is the full distribution now normalized? {"Yes!" if utils.is_normalized(A_array) else "No!"}')
P(o0 | s0, s1 == 1) after normalization:
[[1. 1. 1.]
[0. 0. 0.]
[0. 0. 0.]]
Is the full distribution now normalized? Yes!
Sampling
Sometimes we will also want to sample from categorical probability distributions, for instance when sampling observations from the generative process or sampling actions from the agent’s posterior over policies. We have two functions in utils, sample and sample_obj_array, that can handle sampling from either single probability distributions (numpy.ndarray of ndim == 1) or collections of factorized distributions stored in object arrays.
# sample from a single random categorical distribution
my_categorical = np.random.rand(3)
my_categorical = utils.norm_dist(my_categorical) # the `utils.norm_dist` function works on single vectors or matrices that represent non-factorized probability distributions
sampled_outcome = utils.sample(my_categorical)
print(sampled_outcome)
1
# sample from a multi-factor categorical distribution
my_multifactor_categorical = utils.random_single_categorical([3, 4, 10]) # creates a collection of 3 marginal categorical distributions, each of different dimension
sampled_outcome_list = utils.sample_obj_array(my_multifactor_categorical)
print(sampled_outcome_list)
[2, 2, 1]
Note that when sampling from categorical distributions, the returned values are integers that represent the index of the support (i.e. which “outcome” or “level”) that was sampled. In the case of multi-factor distributions, the returned value is a list of integers, one sampled outcome per marginal categorical distribution.
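If you need the sample in vector form (e.g. as a one-hot observation vector), you can convert the sampled index with utils.onehot – a minimal sketch:
sampled_index = utils.sample(my_categorical)  # integer index of the sampled level
sampled_onehot = utils.onehot(sampled_index, my_categorical.shape[0])  # one-hot vector encoding of that sample
print(sampled_onehot)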
Conditional expectation
We can also compute conditional expectations of a single random variable that depends on a multi-factor random variable, using a function from the pymdp.maths module called spm_dot, which is based on the original function of the same name from SPM, where active inference was first implemented.
Let’s imagine we have an observation model as above, with one observation modality of dimensionality 3 and two hidden state factors, with dimensionalities 3 and 2. Given a factorized distribution over the two hidden state factors (i.e. an object array of marginal categorical distributions, one with 3 levels and the other with 2 levels), we want to generate a conditional expectation over the values of the observation distribution, which has dimension 3. In other words, we want to evaluate:
$$ \mathbb{E}_{P(\mathbf{s})}[P(o | \mathbf{s})] = \mathbb{E}_{P(s^1)P(s^2)}[P(o^1 | s^1, s^2)] $$
given a conditional distribution $P(o^1 | s^1, s^2)$ and some “priors” $P(s^1), P(s^2)$ over hidden states.
Implementing this conditional expectation in practice amounts to a multi-dimensional dot product of the modality-specific A array (A[0]) with each of the hidden state factor distributions, $P(s^1)$ and $P(s^2)$. Under the hood, the spm_dot function of pymdp.maths achieves this using an optimized numpy function called einsum, or Einstein summation. Let’s see what computing this conditional expectation with spm_dot looks like in practice:
from pymdp import maths
num_obs = [3] # dimensionalities of observation modalities (just one here, for simplicity)
num_states = [3, 2] # dimensionalities of hidden state factors
A_shapes = [num_obs + num_states] # shape of the A arrays (only one shape needed, (3, 3, 2))
A_array = utils.obj_array_zeros(A_shapes) # create our conditional distribution using the provided shape list
A_array[0][:,:,0] = np.eye(3) # when s2 == 0, s1 fully determines the value of o1
A_array[0][1,:,1] = 1. # when s2 == 1, o1 always takes its second level (index 1), regardless of the setting of s1
Ps = utils.obj_array_from_list([np.array([0.5, 0.0, 0.5]), np.array([0.75, 0.25])]) # create our "prior" or beliefs about hidden states, that will be used in the expectation
E_o_ps = maths.spm_dot(A_array[0], Ps)
print(f"Expected distribution over observations, given the value of Ps:\n {E_o_ps.reshape(-1,1)}")
Expected distribution over observations, given the value of Ps:
[[0.375]
[0.25 ]
[0.375]]
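To connect this back to the einsum machinery mentioned above, you can verify the result by writing the same multi-dimensional dot product explicitly with numpy – a minimal sketch, where the subscripts assume the (3, 3, 2) shape of A_array[0] used here:
E_o_ps_manual = np.einsum('ijk,j,k->i', A_array[0], Ps[0], Ps[1])  # sum over both hidden state factors
print(np.allclose(E_o_ps, E_o_ps_manual))  # should print True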
Tutorial 1: Active inference from scratch
Author: Conor Heins
This tutorial guides you through the construction of an active inference agent “from scratch” in a simple grid-world environment. This tutorial is designed to walk you through the various mathematical operations required for running active inference in discrete state spaces, as well as introduce you to the way discrete probability distributions are represented in pymdp (through the use of numpy vectors and matrices). We also show you how to compute the expected free energy in terms of risk and ambiguity, and how to use it to plan sequences of actions.
By the end of this tutorial, you should be able to write simple active inference models from scratch using the main components of any POMDP generative model (the A, B, C, and D arrays), as well as be able to write the active inference loop for an agent engaged in a perception-action exchange with its environment.
Note: When running this notebook in Google Colab, you may have to run !pip install inferactively-pymdp at the top of the notebook before you can import pymdp. That cell is left commented out below, in case you are running this notebook from Google Colab.
# ! pip install inferactively-pymdp
Imports
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
Define some auxiliary functions
Here are some plotting functions that will come in handy throughout the tutorial.
def plot_likelihood(matrix, xlabels = list(range(9)), ylabels = list(range(9)), title_str = "Likelihood distribution (A)"):
    """
    Plots a 2-D likelihood matrix as a heatmap
    """
    if not np.isclose(matrix.sum(axis=0), 1.0).all():
        raise ValueError("Distribution not column-normalized! Please normalize (ensure matrix.sum(axis=0) == 1.0 for all columns)")
    fig = plt.figure(figsize = (6,6))
    ax = sns.heatmap(matrix, xticklabels = xlabels, yticklabels = ylabels, cmap = 'gray', cbar = False, vmin = 0.0, vmax = 1.0)
    plt.title(title_str)
    plt.show()
def plot_grid(grid_locations, num_x = 3, num_y = 3 ):
    """
    Plots the spatial coordinates of GridWorld as a heatmap, with each (X, Y) coordinate
    labeled with its linear index (its `state id`)
    """
    grid_heatmap = np.zeros((num_x, num_y))
    for linear_idx, location in enumerate(grid_locations):
        y, x = location
        grid_heatmap[y, x] = linear_idx
    sns.set(font_scale=1.5)
    sns.heatmap(grid_heatmap, annot=True, cbar = False, fmt='.0f', cmap='crest')
def plot_point_on_grid(state_vector, grid_locations):
    """
    Plots the current location of the agent on the grid world
    """
    state_index = np.where(state_vector)[0][0]
    y, x = grid_locations[state_index]
    grid_heatmap = np.zeros((3,3))
    grid_heatmap[y,x] = 1.0
    sns.heatmap(grid_heatmap, cbar = False, fmt='.0f')
def plot_beliefs(belief_dist, title_str=""):
    """
    Plot a categorical distribution or belief distribution, stored in the 1-D numpy vector `belief_dist`
    """
    if not np.isclose(belief_dist.sum(), 1.0):
        raise ValueError("Distribution not normalized! Please normalize")
    plt.grid(zorder=0)
    plt.bar(range(belief_dist.shape[0]), belief_dist, color='r', zorder=3)
    plt.xticks(range(belief_dist.shape[0]))
    plt.title(title_str)
    plt.show()
The Basics: categorical distributions
Let’s start by creating a simple categorical distribution $P(x)$. We will represent this distribution numerically with a 1-D numpy array (i.e. a numpy.ndarray where ndim == 1). The discrete entries of this vector can be thought of as the probabilities of seeing each of the $K$ alternative outcomes (the support of the distribution), if we were to sample once from the distribution. We can write the distribution as a vector of probabilities, as follows:
$$ P(x) = \begin{bmatrix} P(x = 0) \\ P(x = 1) \\ \vdots \\ P(x = K) \end{bmatrix} $$
We can use numpy and the norm_dist function from the utils library of pymdp to quickly create a random categorical distribution.
from pymdp import utils
my_categorical = np.random.rand(3)
# my_categorical = np.array([0.5, 0.3, 0.8]) # could also just write in your own numbers
my_categorical = utils.norm_dist(my_categorical) # normalizes the distribution so it integrates to 1.0
print(my_categorical.reshape(-1,1)) # we reshape it to display it like a column vector
print(f'Integral of the distribution: {round(my_categorical.sum(), 2)}')
[[0.15716471]
[0.25851934]
[0.58431596]]
Integral of the distribution: 1.0
We can sample a discrete outcome from this distribution using the sample() function from the utils module.
sampled_outcome = utils.sample(my_categorical)
print(f'Sampled outcome: {sampled_outcome}')
Sampled outcome: 1
Now plot the beliefs using our plot_beliefs() function that we defined in the beginning.
plot_beliefs(my_categorical, title_str = "A random (unconditional) Categorical distribution")

Now let’s move onto conditional categorical distributions or likelihoods, i.e. how we represent the distribution of one discrete random variable X, conditioned on the settings of another discrete random variable Y.
We would write these mathematically as:
$$ \begin{align} P(X | Y) \end{align} $$
And we can represent them using 2-D numpy.ndarrays, i.e. matrices:
# initialize it with random numbers
p_x_given_y = np.random.rand(3, 4)
print(p_x_given_y.round(3))
[[0.917 0.81 0.725 0.798]
[0.24 0.574 0.809 0.695]
[0.701 0.471 0.678 0.27 ]]
# normalize it
p_x_given_y = utils.norm_dist(p_x_given_y)
print(p_x_given_y.round(3))
[[0.494 0.437 0.328 0.453]
[0.129 0.309 0.366 0.394]
[0.377 0.254 0.307 0.153]]
Now print the first column (i.e. $P(X | Y = 0)$) and show that it’s a proper categorical distribution
print(p_x_given_y[:,0].reshape(-1,1))
print(f'Integral of P(X|Y=0): {p_x_given_y[:,0].sum()}')
[[0.49361113]
[0.12937199]
[0.37701688]]
Integral of P(X|Y=0): 1.0
So column i of the matrix p_x_given_y represents the conditional probability of $X$, given the i-th level of the random variable $Y$, i.e. $P(X | Y = i)$.
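Because norm_dist normalizes column-wise, every column of p_x_given_y should now be a proper categorical distribution – a quick check:
print(np.allclose(p_x_given_y.sum(axis=0), 1.0))  # True if every column sums to 1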
Next: taking expectations of random variables using matrix-vector products
""" Create a P(Y) and P(X|Y) using the same numbers from the slides """
p_y = np.array([0.75, 0.25]) # this is already normalized - you don't need to `utils.norm_dist()` it!
# the columns here are already normalized - you don't need to `utils.norm_dist()` it!
p_x_given_y = np.array([[0.6, 0.5],
[0.15, 0.41],
[0.25, 0.09]])
print(p_y.round(3).reshape(-1,1))
print(p_x_given_y.round(3))
[[0.75]
[0.25]]
[[0.6 0.5 ]
[0.15 0.41]
[0.25 0.09]]
Calculate the expected value of $X$, given our current belief about $Y$ (i.e. $P(Y)$), using a simple matrix-vector product of the form $\mathbf{A}\mathbf{x}$.
""" Calculate the expectation using numpy's dot product functionality """
# first version of the dot product (using the method of a numpy array)
E_x_wrt_y = p_x_given_y.dot(p_y)
# second version of the dot product (using the function np.dot with two arguments)
# E_x_wrt_y = np.dot(p_x_given_y, p_y)
Now print it and verify the result is a proper categorical distribution (integrates to 1.0)
print(E_x_wrt_y)
print(f'Integral: {E_x_wrt_y.sum().round(3)}')
[0.575 0.215 0.21 ]
Integral: 1.0
A simple environment: Grid-world
Now that we understand categorical distributions and how to take conditional expectations of random variables with categorical conditional and prior distributions, let’s move on to a worked example of Active Inference in a simple environment: so-called ‘Grid-World’.
Before we write down the generative model $P(o,s)$ of an active inference agent who will navigate Grid-World, let’s start by establishing what exactly this environment is and how we’re going to represent it mathematically.
import itertools
""" Create the grid locations in the form of a list of (Y, X) tuples -- HINT: use itertools """
grid_locations = list(itertools.product(range(3), repeat = 2))
print(grid_locations)
[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]
plot_grid(grid_locations)

Building the generative model: $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$, and $\mathbf{D}$
1. The A matrix or $P(o\mid s)$.
The generative model’s “prior beliefs” about how hidden states relate to observations
""" Create variables for the storing the dimensionalities of the hidden states and the observations """
n_states = len(grid_locations)
n_observations = len(grid_locations)
print(f'Dimensionality of hidden states: {n_states}')
print(f'Dimensionality of observations: {n_observations}')
Dimensionality of hidden states: 9
Dimensionality of observations: 9
""" Create the A matrix """
A = np.zeros( (n_states, n_observations) )
""" Create an umambiguous or 'noise-less' mapping between hidden states and observations """
np.fill_diagonal(A, 1.0)
# alternative:
# A = np.eye(n_observations, n_states)
Likelihood matrices show relationships between indices of the state space, which are indexed linearly. Remember that the (Y, X) coordinates of the grid world don’t necessarily map neatly onto the locations in the matrix here, which has ‘unwrapped’ the Grid World’s (Y, X) locations into columns/rows.
plot_likelihood(A, title_str = "A matrix or $P(o|s)$")

How could we introduce uncertainty into this A matrix? I.e. how do we introduce ambiguity or uncertainty into the agent’s model of the world?
""" Remind yourself of the mapping between linear indices (0 through 8) and grid locations (Y, X) """
plot_grid(grid_locations)

A_noisy = A.copy()
# this line says: the probability of seeing yourself in location 0, given you're in location 0, is 1/3, AKA P(o == 0 | s == 0) = 0.3333....
A_noisy[0,0] = 1 / 3.0 # corresponds to location (0,0)
# this line says: the probability of seeing yourself in location 1, given you're in location 0, is 1/3, AKA P(o == 1 | s == 0) = 0.3333....
A_noisy[1,0] = 1 / 3.0 # corresponds to location (0,1), one step to the right of (0,0)
# this line says: the probability of seeing yourself in location 3, given you're in location 0, is 1/3, AKA P(o == 3 | s == 0) = 0.3333....
A_noisy[3,0] = 1 / 3.0 # corresponds to location (1,0), one step down from (0,0)
plot_likelihood(A_noisy, title_str = 'modified A matrix where location (0,0) is "blurry"')

""" Let's make ake one grid location "ambiguous" in the sense that it could be easily confused with neighbouring locations """
my_A_noisy = A_noisy.copy()
# locations 3 and 7 are the nearest neighbours to location 6
my_A_noisy[3,6] = 1.0 / 3.0
my_A_noisy[6,6] = 1.0 / 3.0
my_A_noisy[7,6] = 1.0 / 3.0
# Alternatively: you could have the probability spread among locations 3, 4, 6, and 7. This is basically saying, that whole lower-left corner of grid-world is blurry, if you're in location 6
# Remember to make sure the A matrix is column normalized. So if you do it this way, with the probabilities spread among 4 perceived locations, then you'll have to make sure the probabilities sum to 1.0
# my_A_noisy[3,6] = 1.0 / 4.0
# my_A_noisy[4,6] = 1.0 / 4.0
# my_A_noisy[6,6] = 1.0 / 4.0
# my_A_noisy[7,6] = 1.0 / 4.0
Now plot the new A matrix, that now has two “blurry” locations
plot_likelihood(my_A_noisy, title_str = "Noisy A matrix now with TWO ambiguous locations")

2. The B matrix or $P(s_{t}\mid s_{t-1}, u_{t-1})$.
The generative model’s “prior beliefs” about (controllable) transitions between hidden states over time. Namely, how do hidden states at time $t$ result from hidden states at some previous time $t-1$. These transition dynamics are further conditioned on some past action $u_{t-1}$.
actions = ["UP", "DOWN", "LEFT", "RIGHT", "STAY"]
def create_B_matrix():
    B = np.zeros( (len(grid_locations), len(grid_locations), len(actions)) )
    for action_id, action_label in enumerate(actions):
        for curr_state, grid_location in enumerate(grid_locations):
            y, x = grid_location
            if action_label == "UP":
                next_y = y - 1 if y > 0 else y
                next_x = x
            elif action_label == "DOWN":
                next_y = y + 1 if y < 2 else y
                next_x = x
            elif action_label == "LEFT":
                next_x = x - 1 if x > 0 else x
                next_y = y
            elif action_label == "RIGHT":
                next_x = x + 1 if x < 2 else x
                next_y = y
            elif action_label == "STAY":
                next_x = x
                next_y = y
            new_location = (next_y, next_x)
            next_state = grid_locations.index(new_location)
            B[next_state, curr_state, action_id] = 1.0
    return B
B = create_B_matrix()
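Before using it, it can be reassuring to check that every column of every action-conditioned slice of B is a proper categorical distribution over next states – a quick check:
print(np.allclose(B.sum(axis=0), 1.0))  # each column B[:, s, u] should sum to 1.0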
Let’s now explore what it looks like to “take” an action, using a matrix-vector product of an action-conditioned “slice” of the $\mathbf{B}$ array and a previous state vector.
""" Define a starting location"""
starting_location = (1,0)
"""get the linear index of the state"""
state_index = grid_locations.index(starting_location)
""" and create a state vector out of it """
starting_state = utils.onehot(state_index, n_states)
plot_point_on_grid(starting_state, grid_locations)

plot_beliefs(starting_state, "Categorical distribution over the starting state")

Now let’s imagine we’re moving “RIGHT” - write the conditional expectation that will create the state vector corresponding to the new state after taking a step to the right.
""" Redefine the action here, just for reference """
actions = ["UP", "DOWN", "LEFT", "RIGHT", "STAY"]
""" Generate the next state vector, given the starting state and the B matrix"""
right_action_idx = actions.index("RIGHT")
next_state = B[:,:, right_action_idx].dot(starting_state) # input the indices to the B matrix
""" Plot the next state, after taking the action """
plot_point_on_grid(next_state, grid_locations)

Now let’s imagine we’re moving “DOWN” from where we just landed - write the conditional expectation that will create the state vector corresponding to the new state after taking a step down.
""" Generate the next state vector, given the previous state and the B matrix"""
prev_state = next_state.copy()
down_action_index = actions.index("DOWN")
next_state = B[:,:,down_action_index].dot(prev_state)
""" Plot the new state vector, after making the movement """
plot_point_on_grid(next_state, grid_locations)

3. The prior over observations: the $\mathbf{C}$ vector or $\tilde{P}(o)$
The (biased) generative model’s prior preference for particular observations, encoded in terms of probabilities.
""" Create an empty vector to store the preferences over observations """
C = np.zeros(n_observations)
""" Choose an observation index to be the 'desired' rewarding index, and fill out the C vector accordingly """
desired_location = (2,2) # choose a desired location
desired_location_index = grid_locations.index(desired_location) # get the linear index of the grid location, in terms of 0 through 8
C[desired_location_index] = 1.0 # set the preference for that location to be 100%, i.e. 1.0
""" Let's look at the prior preference distribution """
plot_beliefs(C, title_str = "Preferences over observations")

Action Selection and the Expected Free Energy
Now we’ll write functions to compute the expected states $Q(s_{t+1}|u_t)$, expected observations $Q(o_{t+1}|u_t)$, the entropy of $P(o|s)$: $\mathbf{H}\left[\mathbf{A}\right]$, and the KL divergence between the expected observations and the prior preferences $\mathbf{C}$.
""" define component functions for computing expected free energy """
def get_expected_states(B, qs_current, action):
""" Compute the expected states one step into the future, given a particular action """
qs_u = B[:,:,action].dot(qs_current)
return qs_u
def get_expected_observations(A, qs_u):
""" Compute the expected observations one step into the future, given a particular action """
qo_u = A.dot(qs_u)
return qo_u
def entropy(A):
""" Compute the entropy of a set of conditional distributions, i.e. one entropy value per column """
H_A = - (A * log_stable(A)).sum(axis=0)
return H_A
def kl_divergence(qo_u, C):
""" Compute the Kullback-Leibler divergence between two 1-D categorical distributions"""
return (log_stable(qo_u) - log_stable(C)).dot(qo_u)
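The functions above (and the action posterior computed below) rely on a numerically stable log and a softmax. If these helpers aren’t already defined earlier in your notebook, a minimal sketch of them could look like the following (the 1e-16 constant is just a small value chosen to avoid taking log(0)):
def log_stable(arr):
    """ Natural log of an array, with a small constant added for numerical stability """
    return np.log(arr + 1e-16)

def softmax(x):
    """ Normalized exponential (softmax) of a 1-D array """
    output = np.exp(x - x.max())  # subtract the max for numerical stability
    return output / output.sum()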
Now let’s imagine we’re in some starting state, like (1,1). N.B. This is the generative process we’re talking about – i.e. the true state of the world.
""" Get state index, create state vector for (1,1) """
state_idx = grid_locations.index((1,1))
state_vector = utils.onehot(state_idx, n_states)
plot_point_on_grid(state_vector, grid_locations)

And let’s furthermore assume we start with the (accurate) belief that we are in location (1,1). So let’s just make our current $Q(s_t)$ equal to the true state vector. You could think of this as if we had just done one ‘step’ of inference using our current observation along with precise A/B matrices.
""" Make qs_current identical to the true starting state """
qs_current = state_vector.copy()
plot_beliefs(qs_current, title_str ="Where do we believe we are?")

And we prefer to be in state (1,2). So we express that as high probability over the corresponding entry of the $\mathbf{C}$ vector – that happens to be index 5.
plot_grid(grid_locations)

""" Create a preference to be in (1,2) """
desired_idx = grid_locations.index((1,2))
C = utils.onehot(desired_idx, n_observations)
plot_beliefs(C, title_str = "Preferences")

Now, to keep things simple, let’s evaluate the expected free energies $\mathbf{G}$ of just two actions that we could take from our current position – either moving LEFT or moving RIGHT. Let’s remind ourselves of which action indices those are.
left_idx = actions.index("LEFT")
right_idx = actions.index("RIGHT")
print(f'Action index of moving left: {left_idx}')
print(f'Action index of moving right: {right_idx}')
Action index of moving left: 2
Action index of moving right: 3
And now let’s use the functions we’ve just defined, in combination with our A, B, and C arrays and our current posterior qs_current, to compute the expected free energies for the two actions.
""" Compute the expected free energies for moving left vs. moving right """
G = np.zeros(2) # store the expected free energies for each action in here
"""
Compute G for MOVE LEFT here
"""
qs_u_left = get_expected_states(B, qs_current, left_idx)
# alternative
# qs_u_left = B[:,:,left_idx].dot(qs_current)
H_A = entropy(A)
qo_u_left = get_expected_observations(A, qs_u_left)
# alternative
# qo_u_left = A.dot(qs_u_left)
predicted_uncertainty_left = H_A.dot(qs_u_left)
predicted_divergence_left = kl_divergence(qo_u_left, C)
G[0] = predicted_uncertainty_left + predicted_divergence_left
"""
Compute G for MOVE RIGHT here
"""
qs_u_right = get_expected_states(B, qs_current, right_idx)
# alternative
# qs_u_right = B[:,:,right_idx].dot(qs_current)
H_A = entropy(A)
qo_u_right = get_expected_observations(A, qs_u_right)
# alternative
# qo_u_right = A.dot(qs_u_right)
predicted_uncertainty_right = H_A.dot(qs_u_right)
predicted_divergence_right = kl_divergence(qo_u_right, C)
G[1] = predicted_uncertainty_right + predicted_divergence_right
""" Now let's print the expected free energies for the two actions, that we just calculated """
print(f'Expected free energy of moving left: {G[0]}\n')
print(f'Expected free energy of moving right: {G[1]}\n')
Expected free energy of moving left: 36.84136148790473
Expected free energy of moving right: 0.0
Now let’s use the formula for the posterior over actions, i.e.
$$ \begin{align} Q(u_t) = \sigma(-\mathbf{G}) \end{align} $$
to compute the probabilities of each action
Q_u = softmax(-G)
""" and print the probability of each action """
print(f'Probability of moving left: {Q_u[0]}')
print(f'Probability of moving right: {Q_u[1]}')
Probability of moving left: 1.0000000000000037e-16
Probability of moving right: 1.0
For use later on, let’s wrap the expected free energy calculations into a function.
def calculate_G(A, B, C, qs_current, actions):
    G = np.zeros(len(actions)) # vector of expected free energies, one per action
    H_A = entropy(A) # entropy of the observation model, P(o|s)
    for action_i in range(len(actions)):
        qs_u = get_expected_states(B, qs_current, action_i) # expected states, under the action we're currently looping over
        qo_u = get_expected_observations(A, qs_u) # expected observations, under the action we're currently looping over
        pred_uncertainty = H_A.dot(qs_u) # predicted uncertainty, i.e. expected entropy of the A matrix
        pred_div = kl_divergence(qo_u, C) # predicted divergence
        G[action_i] = pred_uncertainty + pred_div # sum them together to get expected free energy
    return G
Complete Recipe for Active Inference
Sample an observation $o_t$ from the current state of the environment
Perform inference over hidden states, i.e. optimize $q(s)$ through free-energy minimization
Calculate expected free energy of actions $\mathbf{G}$
Sample an action $a_t$ from the posterior over actions $Q(u_t) = \sigma(-\mathbf{G})$.
Use the sampled action $a_t$ to perturb the generative process and go back to step 1.
Let’s start by creating a class that will represent the Grid World environment (i.e., the generative process) that our active inference agent will navigate within. Note that we don’t need to specify the generative process in terms of A and B matrices - the generative process will be as arbitrary and complex as the environment is. The A and B matrices are just the agent’s representation of the world and the task, which in this case happen to capture the Markovian, noiseless dynamics of the world perfectly.
class GridWorldEnv():

    def __init__(self, starting_state = (0,0)):
        self.init_state = starting_state
        self.current_state = self.init_state
        print(f'Starting state is {starting_state}')

    def step(self, action_label):
        (Y, X) = self.current_state
        if action_label == "UP":
            Y_new = Y - 1 if Y > 0 else Y
            X_new = X
        elif action_label == "DOWN":
            Y_new = Y + 1 if Y < 2 else Y
            X_new = X
        elif action_label == "LEFT":
            Y_new = Y
            X_new = X - 1 if X > 0 else X
        elif action_label == "RIGHT":
            Y_new = Y
            X_new = X + 1 if X < 2 else X
        elif action_label == "STAY":
            Y_new, X_new = Y, X
        self.current_state = (Y_new, X_new) # store the new grid location
        obs = self.current_state # agent always directly observes the grid location they're in
        return obs

    def reset(self):
        self.current_state = self.init_state
        print(f'Re-initialized location to {self.init_state}')
        obs = self.current_state
        print(f'..and sampled observation {obs}')
        return obs
env = GridWorldEnv()
Starting state is (0, 0)
Now equipped with an environment and a generative model (our A, B, C, and D), we can code up the entire active inference loop.
To have everything in one place, let’s re-create the whole generative model
""" Fill out the components of the generative model """
A = np.eye(n_observations, n_states)
B = create_B_matrix()
C = utils.onehot(grid_locations.index( (2, 2) ), n_observations) # make the agent prefer location (2,2) (lower right corner of grid world)
D = utils.onehot(grid_locations.index( (1,2) ), n_states) # start the agent with the prior belief that it starts in location (1,2)
actions = ["UP", "DOWN", "LEFT", "RIGHT", "STAY"]
Now let’s initialize the environment with starting state (1,2), so that the agent has accurate beliefs about where it’s starting.
env = GridWorldEnv(starting_state = (1,2))
Starting state is (1, 2)
Let’s write a function that runs the whole active inference loop, and then run it for T = 5 timesteps.
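The loop below calls an infer_states function to update beliefs about hidden states, given the current observation and prior. If that function hasn’t been defined earlier in your notebook, a minimal sketch (using the log_stable and softmax helpers from above) could be:
def infer_states(observation_index, A, prior):
    """ Combine the log likelihood of the observation (a row of A) with the log prior, then softmax to get a posterior over hidden states """
    log_likelihood = log_stable(A[observation_index, :])
    log_prior = log_stable(prior)
    qs = softmax(log_likelihood + log_prior)
    return qs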
""" Write a function that, when called, runs the entire active inference loop for a desired number of timesteps"""
def run_active_inference_loop(A, B, C, D, actions, env, T = 5):
""" Initialize the prior that will be passed in during inference to be the same as `D` """
prior = D.copy() # initial prior should be the D vector
""" Initialize the observation that will be passed in during inference - hint use env.reset()"""
obs = env.reset() # initialize the `obs` variable to be the first observation you sample from the environment, before `step`-ing it.
for t in range(T):
print(f'Time {t}: Agent observes itself in location: {obs}')
# convert the observation into the agent's observational state space (in terms of 0 through 8)
obs_idx = grid_locations.index(obs)
# perform inference over hidden states
qs_current = infer_states(obs_idx, A, prior)
plot_beliefs(qs_current, title_str = f"Beliefs about location at time {t}")
# calculate expected free energy of actions
G = calculate_G(A, B, C, qs_current, actions)
# compute action posterior
Q_u = softmax(-G)
# sample action from probability distribution over actions
chosen_action = utils.sample(Q_u)
# compute prior for next timestep of inference
prior = B[:,:,chosen_action].dot(qs_current)
# update generative process
action_label = actions[chosen_action]
obs = env.step(action_label)
return qs_current
""" Run the function we just wrote, for T = 5 timesteps """
qs = run_active_inference_loop(A, B, C, D, actions, env, T = 5)
Re-initialized location to (1, 2)
..and sampled observation (1, 2)
Time 0: Agent observes itself in location: (1, 2)

Time 1: Agent observes itself in location: (2, 2)

Time 2: Agent observes itself in location: (2, 2)

Time 3: Agent observes itself in location: (2, 2)

Time 4: Agent observes itself in location: (2, 2)

Planning
But there’s a problem. Imagine we started the agent at the top left corner (coordinate (0,0)) rather than one step away from the target location.
D = utils.onehot(grid_locations.index((0,0)), n_states) # let's have the agent believe it starts in location (0,0)
env = GridWorldEnv(starting_state = (0,0))
qs = run_active_inference_loop(A, B, C, D, actions, env, T = 5)
Starting state is (0, 0)
Re-initialized location to (0, 0)
..and sampled observation (0, 0)
Time 0: Agent observes itself in location: (0, 0)

Time 1: Agent observes itself in location: (0, 0)

Time 2: Agent observes itself in location: (0, 0)

Time 3: Agent observes itself in location: (0, 0)

Time 4: Agent observes itself in location: (0, 0)

Because the expected free energy of each action is only evaluated one timestep into the future, the agent has no way of knowing which action to take first in order to get closer to its final “goal” of (2,2). This is because the expected divergence term $\operatorname{D}_{KL}(Q(o|u) \parallel \mathbf{C})$ is identical for all considered actions (which are all just 1-step moves away from (0,0)). This speaks to the importance of planning, i.e. multi-timestep policies.
Now let’s do active inference with multi-step policies
We can rely on a useful function from pymdp’s control module called construct_policies() to automatically generate a list of all the policies we want to entertain, for a given number of control states (actions) and a desired temporal horizon.
from pymdp.control import construct_policies
policy_len = 4
n_actions = len(actions)
# we have to wrap `n_states` and `n_actions` in a list for reasons that will become clear in Part II
all_policies = construct_policies([n_states], [n_actions], policy_len = policy_len)
print(f'Total number of policies for {n_actions} possible actions and a planning horizon of {policy_len}: {len(all_policies)}')
Total number of policies for 5 possible actions and a planning horizon of 4: 625
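Each element of all_policies is an array of shape (policy_len, num_control_factors) – here (4, 1) – whose entries are action indices (this is the same policy[t,0] indexing convention used in the function below). A quick way to inspect one policy:
print(all_policies[0].shape)   # (4, 1): 4 timesteps, 1 control factor
print(all_policies[0][:, 0])   # the sequence of action indices entailed by this policy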
Let’s re-write our expected free energy function, but now we loop over policies (sequences of actions), rather than actions (1-step policies)
def calculate_G_policies(A, B, C, qs_current, policies):
    G = np.zeros(len(policies)) # initialize the vector of expected free energies, one per policy
    H_A = entropy(A) # can calculate the entropy of the A matrix beforehand, since it'll be the same for all policies
    for policy_id, policy in enumerate(policies): # loop over policies - policy_id will be the linear index of the policy (0, 1, 2, ...) and `policy` will be a column vector where `policy[t,0]` indexes the action entailed by that policy at time `t`
        t_horizon = policy.shape[0] # temporal depth of the policy
        G_pi = 0.0 # initialize expected free energy for this policy
        for t in range(t_horizon): # loop over temporal depth of the policy
            action = policy[t,0] # action entailed by this particular policy, at time `t`
            # get the past predictive posterior - which is either your current posterior at the current time (not the policy time) or the predictive posterior entailed by this policy, one timestep ago (in policy time)
            if t == 0:
                qs_prev = qs_current
            else:
                qs_prev = qs_pi_t
            qs_pi_t = get_expected_states(B, qs_prev, action) # expected states, under the action entailed by the policy at this particular time
            qo_pi_t = get_expected_observations(A, qs_pi_t) # expected observations, under the action entailed by the policy at this particular time
            kld = kl_divergence(qo_pi_t, C) # Kullback-Leibler divergence between expected observations and the prior preferences C
            G_pi_t = H_A.dot(qs_pi_t) + kld # predicted uncertainty + predicted divergence, for this policy & timepoint
            G_pi += G_pi_t # accumulate the expected free energy for each timepoint into the overall EFE for the policy
        G[policy_id] += G_pi
    return G
Let’s write a function for computing the action posterior, given the posterior probability of each policy:
def compute_prob_actions(actions, policies, Q_pi):
    P_u = np.zeros(len(actions)) # initialize the vector of probabilities of each action
    for policy_id, policy in enumerate(policies):
        P_u[int(policy[0,0])] += Q_pi[policy_id] # get the marginal probability for the given action, entailed by this policy at the first timestep
    P_u = utils.norm_dist(P_u) # normalize the action probabilities
    return P_u
Now we can write a new active inference function, that uses temporally-deep planning
def active_inference_with_planning(A, B, C, D, n_actions, env, policy_len = 2, T = 5):

    """ Initialize prior, first observation, and policies """
    prior = D # initial prior should be the D vector
    obs = env.reset() # get the initial observation
    policies = construct_policies([n_states], [n_actions], policy_len = policy_len)

    for t in range(T):
        print(f'Time {t}: Agent observes itself in location: {obs}')

        # convert the observation into the agent's observational state space (in terms of 0 through 8)
        obs_idx = grid_locations.index(obs)

        # perform inference over hidden states
        qs_current = infer_states(obs_idx, A, prior)
        plot_beliefs(qs_current, title_str = f"Beliefs about location at time {t}")

        # calculate expected free energy of policies
        G = calculate_G_policies(A, B, C, qs_current, policies)

        # to get action posterior, we marginalize P(u|pi) with the probabilities of each policy Q(pi), given by \sigma(-G)
        Q_pi = softmax(-G)

        # compute the probability of each action
        P_u = compute_prob_actions(actions, policies, Q_pi)

        # sample action from probability distribution over actions
        chosen_action = utils.sample(P_u)

        # compute prior for next timestep of inference
        prior = B[:,:,chosen_action].dot(qs_current)

        # step the generative process and get new observation
        action_label = actions[chosen_action]
        obs = env.step(action_label)

    return qs_current
Now run it to see if the agent is able to navigate to location (2,2)
D = utils.onehot(grid_locations.index((0,0)), n_states) # let's have the agent believe it starts in location (0,0) (upper left corner)
env = GridWorldEnv(starting_state = (0,0))
qs_final = active_inference_with_planning(A, B, C, D, n_actions, env, policy_len = 3, T = 10)
Starting state is (0, 0)
Re-initialized location to (0, 0)
..and sampled observation (0, 0)
Time 0: Agent observes itself in location: (0, 0)

Time 1: Agent observes itself in location: (0, 0)

Time 2: Agent observes itself in location: (0, 0)

Time 3: Agent observes itself in location: (0, 0)

Time 4: Agent observes itself in location: (1, 0)

Time 5: Agent observes itself in location: (1, 1)

Time 6: Agent observes itself in location: (2, 1)

Time 7: Agent observes itself in location: (2, 2)

Time 8: Agent observes itself in location: (2, 2)

Time 9: Agent observes itself in location: (2, 2)

Tutorial 2: the Agent API
Author: Conor Heins
This tutorial introduces the Agent class, the main API offered by pymdp, which allows you to abstract away all the mathematical nuances involved in running active inference processes. All you have to do is build a generative model in terms of A, B, C, and D arrays, plug them into the Agent() constructor, and then start running active inference processes using the desired functions of the Agent class, like self.infer_states() and self.infer_policies().
We demonstrate the use of the Agent class by building an active inference agent to play a simple explore/exploit task, what we call the “epistemic 2-armed bandit”. The task structure is identical to the “explore/exploit” task described in Smith et al. (2022): “A Step-by-Step Tutorial on Active Inference Modelling and its Application to Empirical Data”. The task is a modified version of a classic multi-armed bandit problem, with particular changes that make it well suited to demonstrating the information-availing nature of active inference in discrete state spaces.
Note: When running this notebook in Google Colab, you may have to run !pip install inferactively-pymdp at the top of the notebook before you can import pymdp. That cell is left commented out below, in case you are running this notebook from Google Colab.
# ! pip install inferactively-pymdp
Imports
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
Define some auxiliary functions
Here are some plotting functions that will come in handy throughout the tutorial.
def plot_likelihood(matrix, title_str = "Likelihood distribution (A)"):
    """
    Plots a 2-D likelihood matrix as a heatmap
    """
    if not np.isclose(matrix.sum(axis=0), 1.0).all():
        raise ValueError("Distribution not column-normalized! Please normalize (ensure matrix.sum(axis=0) == 1.0 for all columns)")
    fig = plt.figure(figsize = (6,6))
    ax = sns.heatmap(matrix, cmap = 'gray', cbar = False, vmin = 0.0, vmax = 1.0)
    plt.title(title_str)
    plt.show()
def plot_beliefs(belief_dist, title_str=""):
"""
Plot a categorical distribution or belief distribution, stored in the 1-D numpy vector `belief_dist`
"""
if not np.isclose(belief_dist.sum(), 1.0):
raise ValueError("Distribution not normalized! Please normalize")
plt.grid(zorder=0)
plt.bar(range(belief_dist.shape[0]), belief_dist, color='r', zorder=3)
plt.xticks(range(belief_dist.shape[0]))
plt.title(title_str)
plt.show()
More complex generative models
In this notebook we will build a more complicated generative model, with the following goals in mind:
Demonstrate the unique behavior of active inference agents, as opposed to other agent-based approaches to solving POMDPs (e.g. utility-maximization / reinforcement learning)
Provide a task example that is more relevant to decision-making research and computational psychiatry.
Take advantage of the functionalities of pymdp, especially the Agent() class, to abstract away all the mathematical operations involved with inference and planning under active inference.
Before we dive into specifying this generative model, we need to talk about observation modalities and hidden state factors…
The data structure we use in pymdp to represent different hidden state factors and observation modalities is the object array (referred to in other contexts as ‘arrays of arrays’ or ‘jagged/ragged arrays’).
Object arrays are no different from standard numpy ndarrays (they can have arbitrary multidimensional shape), except for the fact that their contents are unrestricted in terms of type. Their dtype is object, which means that their elements can be any kind of Python data structure – including other arrays!
In case you’re coming from a MATLAB background: these object arrays are the equivalent of cell arrays.
In pymdp, we represent a multi-factor $\mathbf{B}$ array as an object array which has as many elements as there are hidden state factors. Therefore, each element B[i] of the object array contains the B matrix for hidden state factor i.
import pymdp
from pymdp import utils
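To make this concrete, here is a minimal sketch (variable names chosen purely for illustration) showing that the sub-arrays of an object array created with utils.obj_array can have completely different shapes:
# a minimal sketch: an object array whose two elements are ordinary numpy arrays with different shapes
example_obj_array = utils.obj_array(2)      # an empty object array with 2 slots
example_obj_array[0] = np.eye(3)            # a 3 x 3 identity matrix in the first slot
example_obj_array[1] = np.zeros((4, 2, 5))  # a 3-D array with a different shape in the second slot
print(example_obj_array.dtype)              # object
print(example_obj_array[0].shape, example_obj_array[1].shape)  # (3, 3) (4, 2, 5)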
Explore/exploit task with an epistemic two-armed bandit
Now we’re going to build a generative model for an active inference agent playing a two-armed bandit task. The multi-armed bandit (or MAB) is a classic decision-making task that captures the core features of the “explore/exploit tradeoff”. The multi-armed bandit formulation is ubiquitous across problem spaces that require sequential decision-making under uncertainty – this includes disciplines ranging from economics, neuroscience, machine learning, and engineering all the way to advertising.
In the standard MAB problem formulation, an agent must choose between mutually-exclusive alternatives (also known as ‘arms’) in order to maximize reward over time. The probability of reward depends on which arm the agent chooses. A common real-world analogy for a MAB problem is a special slot machine with three possible levers to pull (rather than the usual one), where each lever has a different probability of paying off (e.g. showing a winning combination of symbols or a bonus). In fact, the ‘standard’ slot machine, which usually has only one lever, was historically referred to as a ‘one-armed bandit’ – this was the direct ancestor of the name for the generic machine learning / decision-making problem class, the MAB.
Crucially, MAB problems are interesting and difficult because in general, the reward statistics of each arm are unknown or only partially known. In a probabilistic or Bayesian context, an agent must therefore act based on beliefs about the reward statistics, since they don’t have perfect access to this information.
The inherent partial observability of the task creates a conflict between exploitation (choosing the arm that is currently believed to be most rewarding) and exploration (gathering information about the remaining arms, in the hope of discovering a potentially more rewarding option).
The fact that expected reward or utility is contextualized by beliefs – i.e. which arm is currently thought to be the most rewarding – motivates the use of active inference in this context. This is because the key objective function for action-selection, the expected free energy $\mathbf{G}$, depends on the agent’s beliefs about the world. And not only that, but expected free energy balances the desire to maximize rewards with the drive to resolve uncertainty about unknown parts of the agent’s model. The more accurate the agent’s beliefs are, the more faithfully decision-making can be guided by maximizing expected utility or rewards.
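One standard way to express this (a sketch of the usual formulation; notation varies slightly across papers) is to decompose the expected free energy of a policy $\pi$ into an epistemic, information-seeking term and an extrinsic, reward-seeking term:
$G(\pi) = -\underbrace{\mathbb{E}_{Q(o, s \mid \pi)}\left[\ln Q(s \mid o, \pi) - \ln Q(s \mid \pi)\right]}_{\text{epistemic value (expected information gain)}} - \underbrace{\mathbb{E}_{Q(o \mid \pi)}\left[\ln P(o \mid \mathbf{C})\right]}_{\text{extrinsic value (expected utility)}}$
Since policies are scored with $Q(\pi) = \sigma(-G)$, policies that are expected both to resolve uncertainty and to yield preferred observations become more probable.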
MAB with an epistemic twist
In the MAB formulation we’ll be exploring in this tutorial, the agent must choose to play among two possible arms, each of which has unknown reward statistics. These reward statistics take the form of Bernoulli distributions over two possible reward outcomes: “Loss” and “Reward”. One of the arms has probability $p$ of yielding “Reward” and probability $(1-p)$ of yielding “Loss”; the other arm has the swapped statistics. In this example, the agent knows that the bandit has this general reward structure, except they don’t know which of the two arms is the rewarding one (the arm where the reward probability is $p$, assuming $p \in [0.5, 1]$).
However, we introduce an additional feature in the environment that induces an explicit trade-off in decision-making between exploration (information-seeking) and exploitation (reward-seeking). In this special “epistemic bandit” problem, there is an additional action available to the agent, besides playing the two arms. We call this third action “Get Hint”; it allows the agent to acquire information that reveals (potentially probabilistically) which arm is the more rewarding one. There is a trade-off here, because by choosing to acquire the hint, the agent forgoes the possibility of playing an arm and thus the possibility of getting a reward at that moment. The mutual exclusivity of hint-acquisition and arm-playing imbues the system with an explore/exploit trade-off, which active inference is particularly equipped to handle, when compared to simple reinforcement learning schemes (e.g. epsilon-greedy reward maximization).
Specify the dimensionalities of the hidden state factors, the control factors, and the observation modalities
context_names = ['Left-Better', 'Right-Better']
choice_names = ['Start', 'Hint', 'Left Arm', 'Right Arm']
""" Define `num_states` and `num_factors` below """
num_states = [len(context_names), len(choice_names)]
num_factors = len(num_states)
context_action_names = ['Do-nothing']
choice_action_names = ['Move-start', 'Get-hint', 'Play-left', 'Play-right']
""" Define `num_controls` below """
num_controls = [len(context_action_names), len(choice_action_names)]
hint_obs_names = ['Null', 'Hint-left', 'Hint-right']
reward_obs_names = ['Null', 'Loss', 'Reward']
choice_obs_names = ['Start', 'Hint', 'Left Arm', 'Right Arm']
""" Define `num_obs` and `num_modalities` below """
num_obs = [len(hint_obs_names), len(reward_obs_names), len(choice_obs_names)]
num_modalities = len(num_obs)
The A array
Note: Unlike in Tutorial 1: Active inference from scratch, here we will be building a more complex generative model with more than one hidden state factor and observation modality. This leads to accordingly higher-dimensional A and B arrays. See pymdp fundamentals for more details on how we specify these high-dimensional arrays in NumPy.
""" Generate the A array """
A = utils.obj_array( num_modalities )
Fill out the hint modality, a sub-array of A which we’ll call A_hint
p_hint = 0.7 # accuracy of the hint, according to the agent's generative model (how much does the agent trust the hint?)
A_hint = np.zeros( (len(hint_obs_names), len(context_names), len(choice_names)) )
for choice_id, choice_name in enumerate(choice_names):
if choice_name == 'Start':
A_hint[0,:,choice_id] = 1.0
elif choice_name == 'Hint':
A_hint[1:,:,choice_id] = np.array([[p_hint, 1.0 - p_hint],
[1.0 - p_hint, p_hint]])
elif choice_name == 'Left Arm':
A_hint[0,:,choice_id] = 1.0
elif choice_name == 'Right Arm':
A_hint[0,:,choice_id] = 1.0
A[0] = A_hint
plot_likelihood(A[0][:,:,1], title_str = "Probability of the two hint types, for the two game states")

Fill out the reward modality, a sub-array of A which we’ll call A_reward
p_reward = 0.8 # probability of getting a rewarding outcome, if you are sampling the more rewarding bandit
A_reward = np.zeros((len(reward_obs_names), len(context_names), len(choice_names)))
for choice_id, choice_name in enumerate(choice_names):
if choice_name == 'Start':
A_reward[0,:,choice_id] = 1.0
elif choice_name == 'Hint':
A_reward[0,:,choice_id] = 1.0
elif choice_name == 'Left Arm':
A_reward[1:,:,choice_id] = np.array([ [1.0-p_reward, p_reward],
[p_reward, 1.0-p_reward]])
elif choice_name == 'Right Arm':
A_reward[1:, :, choice_id] = np.array([[ p_reward, 1.0- p_reward],
[1- p_reward, p_reward]])
A[1] = A_reward
plot_likelihood(A[1][:,:,2], 'Payoff structure if playing the Left Arm, for the two contexts')

Fill out the choice observation modality, a sub-array of A which we’ll call A_choice
A_choice = np.zeros((len(choice_obs_names), len(context_names), len(choice_names)))
for choice_id in range(len(choice_names)):
A_choice[choice_id, :, choice_id] = 1.0
A[2] = A_choice
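Before plotting, it can be worth verifying that each modality-specific likelihood is a proper conditional distribution, i.e. that every column sums to one. Here is a quick sanity check using plain numpy (a sketch; the plotting helper above performs a similar check internally):
# sanity check (sketch): every column of each modality-specific A[m] should sum to 1.0
for m in range(num_modalities):
    assert np.isclose(A[m].sum(axis=0), 1.0).all(), f"A[{m}] is not column-normalized!"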
""" Condition on context (first hidden state factor) and display the remaining indices (outcome and choice state) """
plot_likelihood(A[2][:,0,:], "Mapping between sensed states and true states")

The B array
B = utils.obj_array(num_factors)
Fill out the context state factor dynamics, a sub-array of B which we’ll call B_context
B_context = np.zeros( (len(context_names), len(context_names), len(context_action_names)) )
B_context[:,:,0] = np.eye(len(context_names))
B[0] = B_context
Fill out the choice factor dynamics, a sub-array of B which we’ll call B_choice
B_choice = np.zeros( (len(choice_names), len(choice_names), len(choice_action_names)) )
for choice_i in range(len(choice_names)):
B_choice[choice_i, :, choice_i] = 1.0
B[1] = B_choice
The C vectors
""" Explain `obj_array_zeros` and how you don't have to populate them necessarily """
C = utils.obj_array_zeros(num_obs)
from pymdp.maths import softmax
C_reward = np.zeros(len(reward_obs_names))
C_reward[1] = -4.0
C_reward[2] = 2.0
C[1] = C_reward
plot_beliefs(softmax(C_reward), title_str = "Prior preferences")

The D vectors
D = utils.obj_array(num_factors)
D_context = np.array([0.5,0.5])
D[0] = D_context
D_choice = np.zeros(len(choice_names))
D_choice[choice_names.index("Start")] = 1.0
D[1] = D_choice
print(f'Beliefs about which arm is better: {D[0]}')
print(f'Beliefs about starting location: {D[1]}')
Beliefs about which arm is better: [0.5 0.5]
Beliefs about starting location: [1. 0. 0. 0.]
Constructing an Agent
from pymdp.agent import Agent
my_agent = Agent(A = A, B = B, C = C, D = D)
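Once constructed, the Agent instance stores the generative model arrays along with quantities derived from them as attributes. As a quick sketch (attribute names such as num_states and policies are inferred from the arrays you pass in), you can inspect them like this:
# inspect some of the agent's attributes (sketch)
print(my_agent.num_states)     # dimensionalities of the hidden state factors, e.g. [2, 4]
print(my_agent.num_obs)        # dimensionalities of the observation modalities, e.g. [3, 3, 4]
print(my_agent.num_controls)   # dimensionalities of the control factors, e.g. [1, 4]
print(len(my_agent.policies))  # the number of policies the agent will evaluate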
Define a class for the 2-armed bandit environment (AKA the generative process)
class TwoArmedBandit(object):
def __init__(self, context = None, p_hint = 1.0, p_reward = 0.8):
self.context_names = ["Left-Better", "Right-Better"]
if context is None:
self.context = self.context_names[utils.sample(np.array([0.5, 0.5]))] # randomly sample which bandit arm is better (Left or Right)
else:
self.context = context
self.p_hint = p_hint
self.p_reward = p_reward
self.reward_obs_names = ['Null', 'Loss', 'Reward']
self.hint_obs_names = ['Null', 'Hint-left', 'Hint-right']
def step(self, action):
if action == "Move-start":
observed_hint = "Null"
observed_reward = "Null"
observed_choice = "Start"
elif action == "Get-hint":
if self.context == "Left-Better":
observed_hint = self.hint_obs_names[utils.sample(np.array([0.0, self.p_hint, 1.0 - self.p_hint]))]
elif self.context == "Right-Better":
observed_hint = self.hint_obs_names[utils.sample(np.array([0.0, 1.0 - self.p_hint, self.p_hint]))]
observed_reward = "Null"
observed_choice = "Hint"
elif action == "Play-left":
observed_hint = "Null"
observed_choice = "Left Arm"
if self.context == "Left-Better":
observed_reward = self.reward_obs_names[utils.sample(np.array([0.0, 1.0 - self.p_reward, self.p_reward]))]
elif self.context == "Right-Better":
observed_reward = self.reward_obs_names[utils.sample(np.array([0.0, self.p_reward, 1.0 - self.p_reward]))]
elif action == "Play-right":
observed_hint = "Null"
observed_choice = "Right Arm"
if self.context == "Right-Better":
observed_reward = self.reward_obs_names[utils.sample(np.array([0.0, 1.0 - self.p_reward, self.p_reward]))]
elif self.context == "Left-Better":
observed_reward = self.reward_obs_names[utils.sample(np.array([0.0, self.p_reward, 1.0 - self.p_reward]))]
obs = [observed_hint, observed_reward, observed_choice]
return obs
Now we’ll write a function that will take the agent, the environment, and a time length and run the active inference loop
def run_active_inference_loop(my_agent, my_env, T = 5):
""" Initialize the first observation """
obs_label = ["Null", "Null", "Start"] # agent observes itself seeing a `Null` hint, getting a `Null` reward, and seeing itself in the `Start` location
obs = [hint_obs_names.index(obs_label[0]), reward_obs_names.index(obs_label[1]), choice_obs_names.index(obs_label[2])]
for t in range(T):
qs = my_agent.infer_states(obs)
plot_beliefs(qs[0], title_str = f"Beliefs about the context at time {t}")
q_pi, efe = my_agent.infer_policies()
chosen_action_id = my_agent.sample_action()
movement_id = int(chosen_action_id[1])
choice_action = choice_action_names[movement_id]
obs_label = my_env.step(choice_action)
obs = [hint_obs_names.index(obs_label[0]), reward_obs_names.index(obs_label[1]), choice_obs_names.index(obs_label[2])]
print(f'Action at time {t}: {choice_action}')
print(f'Reward at time {t}: {obs_label[1]}')
Now all we have to do is define the two-armed bandit environment, choose the length of the simulation, and run the function we wrote above.
Try playing with the hint accuracy and/or reward statistics of the environment - remember this is different than the agent’s representation of the reward statistics (i.e. the agent’s generative model, e.g. the A or B matrices).
p_hint_env = 1.0 # this is the "true" accuracy of the hint - i.e. how often does the hint actually signal which arm is better. REMEMBER: THIS IS INDEPENDENT OF HOW YOU PARAMETERIZE THE A MATRIX FOR THE HINT MODALITY
p_reward_env = 0.7 # this is the "true" reward probability - i.e. how often does the better arm actually return a reward, as opposed to a loss. REMEMBER: THIS IS INDEPENDENT OF HOW YOU PARAMETERIZE THE A MATRIX FOR THE REWARD MODALITY
env = TwoArmedBandit(p_hint = p_hint_env, p_reward = p_reward_env)
print(f'Context: {env.context}')
T = 10
# my_agent = Agent(A = A, B = B, C = C, D = D) # in case you want to re-define the agent, you can run this again
run_active_inference_loop(my_agent, env, T = T)
Context: Left-Better

Action at time 0: Get-hint
Reward at time 0: Null

Action at time 1: Get-hint
Reward at time 1: Null

Action at time 2: Play-left
Reward at time 2: Loss

Action at time 3: Get-hint
Reward at time 3: Null

Action at time 4: Play-left
Reward at time 4: Loss

Action at time 5: Get-hint
Reward at time 5: Null

Action at time 6: Get-hint
Reward at time 6: Null

Action at time 7: Play-left
Reward at time 7: Reward

Action at time 8: Play-left
Reward at time 8: Reward

Action at time 9: Play-left
Reward at time 9: Loss
Now let’s try manipulating the agent’s prior preferences over reward observations ($\mathbf{C}[1]$) in order to examine the tension between exploration and exploitation.
# change the 'shape' of the agent's reward function
C[1][1] = 0.0 # makes the Loss outcome "less aversive" than before (a higher prior probability is assigned to seeing the Loss outcome). This should make the agent less risk-averse / more willing to explore both arms, under uncertainty
my_agent = Agent(A = A, B = B, C = C, D = D) # redefine the agent with the new preferences
env = TwoArmedBandit(p_hint = 0.8, p_reward = 0.8) # re-initialize the environment -- this time, the hint is not always accurate (`p_hint = 0.8`)
print(f'Context: {env.context}')
run_active_inference_loop(my_agent, env, T = T)
Context: Right-Better

Action at time 0: Play-left
Reward at time 0: Loss

Action at time 1: Play-right
Reward at time 1: Loss

Action at time 2: Play-right
Reward at time 2: Reward

Action at time 3: Play-right
Reward at time 3: Loss

Action at time 4: Play-left
Reward at time 4: Loss

Action at time 5: Play-right
Reward at time 5: Loss

Action at time 6: Play-left
Reward at time 6: Loss

Action at time 7: Play-right
Reward at time 7: Reward

Action at time 8: Play-right
Reward at time 8: Reward

Action at time 9: Play-right
Reward at time 9: Reward
Active Inference Demo: T-Maze Environment
Author: Conor Heins
This demo notebook provides a full walk-through of active inference using the Agent() class of pymdp. The canonical example used here is the ‘T-maze’ task, often used in the active inference literature in discussions of epistemic behavior (see, for example, “Active Inference and Epistemic Value”).
Note: When running this notebook in Google Colab, you may have to run !pip install inferactively-pymdp
at the top of the notebook, before you can import pymdp
. That cell is left commented out below, in case you are running this notebook from Google Colab.
# ! pip install inferactively-pymdp
Imports
First, import pymdp
and the modules we’ll need.
import os
import sys
import pathlib
import numpy as np
import copy
from pymdp.agent import Agent
from pymdp.utils import plot_beliefs, plot_likelihood
from pymdp import utils
from pymdp.envs import TMazeEnv
Auxiliary Functions
Define some utility functions that will be helpful for plotting.
Environment
Here we consider an agent navigating a three-armed ‘T-maze,’ with the agent starting in a central location of the maze. The bottom arm of the maze contains an informative cue, which signals in which of the two top arms (‘Left’ or ‘Right’, the ends of the ‘T’) a reward is likely to be found.
At each timestep, the environment is described by the joint occurrence of two qualitatively-different ‘kinds’ of states (hereafter referred to as hidden state factors). These hidden state factors are independent of one another.
The first hidden state factor (Location) is a discrete random variable with 4 levels that encodes the current position of the agent. Its four levels map to the following values: {CENTER, RIGHT ARM, LEFT ARM, CUE LOCATION}. The random variable Location taking a particular value can be represented as a one-hot vector with a 1 at the appropriate level, and 0’s everywhere else. For example, if the agent is in the CUE LOCATION, the current state of this factor would be $s_1 = \begin{bmatrix} 0 & 0 & 0 & 1 \end{bmatrix}$.
The second hidden state factor (Reward Condition) is a discrete random variable with 2 levels that encodes the reward condition of the trial: {Reward on Right, Reward on Left}. A trial where the reward condition is Reward on Left is thus encoded as the state $s_2 = \begin{bmatrix} 0 & 1\end{bmatrix}$.
The environment is designed such that when the agent is located in the RIGHT ARM and the reward condition is Reward on Right, the agent has a specified probability $a$ (where $a > 0.5$) of receiving a reward, and a low probability $b = 1 - a$ of receiving a ‘loss’ (we can think of this as an aversive or unpreferred stimulus). If the agent is in the LEFT ARM for the same reward condition, the reward probabilities are swapped: the agent experiences loss with probability $a$, and reward with the lower probability $b = 1 - a$. These reward contingencies are intuitively swapped for the Reward on Left condition.
For instance, we can encode the state of the environment at the first time step in a Reward on Right
trial with the following pair of hidden state vectors: $s_1 = \begin{bmatrix} 1 & 0 & 0 & 0\end{bmatrix}$, $s_2 = \begin{bmatrix} 1 & 0\end{bmatrix}$, where we assume the agent starts sitting in the central location. If the agent moved to the right arm, then the corresponding hidden state vectors would now be $s_1 = \begin{bmatrix} 0 & 1 & 0 & 0\end{bmatrix}$, $s_2 = \begin{bmatrix} 1 & 0\end{bmatrix}$. This highlights the independence of the two hidden state factors. In other words, the location of the agent ($s_1$) can change without affecting the identity of the reward condition ($s_2$).
Initialize environment
Now we can initialize the T-maze environment using the built-in TMazeEnv
class from the pymdp.envs
module.
Choose reward probabilities $a$ and $b$, where $a$ and $b$ are the probabilities of reward / loss in the ‘correct’ arm, and the probabilities of loss / reward in the ‘incorrect’ arm. Which arm counts as ‘correct’ vs. ‘incorrect’ depends on the reward condition (state of the 2nd hidden state factor).
reward_probabilities = [0.98, 0.02] # probabilities used in the original SPM T-maze demo
Initialize an instance of the T-maze environment
env = TMazeEnv(reward_probs = reward_probabilities)
Structure of the state -> outcome mapping
We can ‘peer into’ the rules encoded by the environment (also known as the generative process) by looking at the probability distributions that map from hidden states to observations. Following the SPM version of active inference, we refer to this collection of probabilistic relationships as the A array. In the case of the true rules of the environment, we refer to this array as A_gp (where the suffix _gp denotes the generative process).
It is worth outlining what constitutes the agent’s observations in this task. In this T-maze demo, we have three sensory channels or observation modalities: Location, Reward, and Cue.
The Location observation values are identical to the Location hidden state values. In this case, the agent always unambiguously observes its own state - if the agent is in RIGHT ARM, it receives a RIGHT ARM observation in the corresponding modality. This might be analogized to a ‘proprioceptive’ sense of one’s own place.
The Reward observation modality assumes the values No Reward, Reward or Loss. The No Reward (index 0) observation is observed whenever the agent isn’t occupying one of the two T-maze arms (the right or left arms). The Reward (index 1) and Loss (index 2) observations are observed in the right and left arms of the T-maze, with associated probabilities that depend on the reward condition (i.e. on the value of the second hidden state factor).
The Cue observation modality assumes the values Cue Right, Cue Left. When the agent is in the CUE LOCATION, this observation unambiguously signals the reward condition of the trial, and therefore in which arm the Reward observation is more probable. When the agent occupies the other locations, the Cue observation will be Cue Right or Cue Left with equal probability. However (as we’ll see below when we initialise the agent), the agent’s beliefs about the likelihood mapping render these observations uninformative and irrelevant to state inference.
In pymdp, we store the set of probability distributions encoding the conditional probabilities of observations, under different configurations of hidden states, as a set of matrices referred to as the likelihood mapping or A array (this is a convention borrowed from SPM). The likelihood mapping for a single modality is stored as a single multidimensional array (a numpy.ndarray) A[m] within the larger likelihood array, where m is the index of the corresponding modality. Each modality-specific A array has num_obs[m] rows, and as many lagging dimensions (e.g. columns, ‘slices’ and higher-order dimensions) as there are hidden state factors. num_obs[m] tells you the number of observation values (also known as “levels”) for observation modality m.
# here, we can get the likelihood mapping directly from the environment class. So this is the likelihood mapping that truly describes the relationship between the
# environment's hidden states and the observations the agent will get
A_gp = env.get_likelihood_dist()
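As a quick sanity check (a sketch, not part of the original demo), we can inspect the shape of each modality-specific likelihood: the first dimension is the number of observation levels for that modality, and the remaining dimensions match the hidden state factor dimensionalities.
# inspect the shapes of the modality-specific likelihoods (sketch)
for m, A_m in enumerate(A_gp):
    print(f"Modality {m}: shape {A_m.shape}")
# e.g. the Reward modality A_gp[1] should have shape (3, 4, 2): 3 reward outcomes x 4 locations x 2 reward conditions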
plot_likelihood(A_gp[1][:,:,0],'Reward Right')

plot_likelihood(A_gp[1][:,:,1],'Reward Left')

plot_likelihood(A_gp[2][:,3,:],'Cue Mapping')

Transition Dynamics
We represent the dynamics of the environment (e.g. changes in the location of the agent and changes to the reward condition) as conditional probability distributions that encode the likelihood of transitions between the states of a given hidden state factor. These distributions are collected into the so-called B array, also known as the transition likelihood or transition distribution. As with the A array, we denote the true probabilities describing the environmental dynamics as B_gp. Each sub-matrix B_gp[f] of the larger array encodes the transition probabilities between state-values of a given hidden state factor with index f. These matrices encode dynamics as Markovian transition probabilities, such that the entry $(i,j)$ of a given matrix encodes the probability of transitioning to state $i$ at time $t+1$, given state $j$ at time $t$.
# here, we can get the transition mapping directly from the environmental class. So this is the transition mapping that truly describes the environment's dynamics
B_gp = env.get_transition_dist()
For example, we can inspect the ‘dynamics’ of the Reward Condition factor by indexing into the appropriate sub-matrix of B_gp.
plot_likelihood(B_gp[1][:,:,0],'Reward Condition Transitions')

The above transition array is the ‘trivial’ identity matrix, meaning that the reward condition doesn’t change over time (it’s mapped from whatever its current value is to the same value at the next timestep).
(Controllable-) Transition Dynamics
Importantly, some hidden state factors are controllable by the agent, meaning that the probability of being in state $i$ at $t+1$ isn’t merely a function of the state at $t$, but also of the action (or, from the agent’s perspective, the control state) at $t$. So now each transition likelihood encodes conditional probability distributions over states at $t+1$, where the conditioning variables are both the state at $t$ and the action at $t$. This extra conditioning on actions is encoded via an optional third dimension to each factor-specific B array.
For example, in our case the first hidden state factor (Location) is under the control of the agent, which means the corresponding transition likelihoods B[0] are index-able by both previous state and action.
plot_likelihood(B_gp[0][:,:,0],'Transition likelihood for "Move to Center"')

plot_likelihood(B_gp[0][:,:,1],'Transition likelihood for "Move to Right Arm"')

plot_likelihood(B_gp[0][:,:,2],'Transition likelihood for "Move to Left Arm"')

plot_likelihood(B_gp[0][:,:,3],'Transition likelihood for "Move to Cue Location"')

The generative model
Now we can move onto setting up the generative model of the agent - namely, the agent’s beliefs or assumptions about how hidden states give rise to observations, and how hidden states transition among each other.
In almost all MDPs, the critical building blocks of this generative model are the agent’s representation of the observation likelihood, which we’ll refer to as A_gm, and its representation of the transition likelihood, which we’ll call B_gm.
Here, we assume the agent has a veridical representation of the rules of the T-maze (namely, how hidden states cause observations), as well as of its ability to control its own movements with certain consequences (i.e. ‘noiseless’ transitions). In other words, the agent will have a veridical representation of the “rules” of the environment, as encoded in the arrays A_gp and B_gp of the generative process.
A_gm = copy.deepcopy(A_gp) # make a copy of the true observation likelihood to initialize the observation model
B_gm = copy.deepcopy(B_gp) # make a copy of the true transition likelihood to initialize the transition model
Important Concept to Note Here!!!
It is not necessary, or even in many cases important, that the generative model is a veridical representation of the generative process. This distinction between the generative model (essentially, the beliefs entertained by the agent about its interaction with the world) and the generative process (the actual dynamical system ‘out there’ generating sensations) is of crucial importance to the active inference formalism and (in our experience) often overlooked in code.
It is for notational and computational convenience that we encode the generative process using A and B matrices. Doing so simply puts the rules of the environment in a data structure that can easily be converted into the Markovian-style conditional distributions useful for encoding the agent’s generative model.
Strictly speaking, however, all the generative process needs to do is generate observations that are ‘digestible’ by the agent, and be ‘perturbable’ by actions issued by the agent. The way in which it does so can be arbitrarily complex, non-linear, and inaccessible to the agent. Namely, it doesn’t have to be encoded by A and B arrays (which amount to Markovian conditional probability tables), but could be described by arbitrarily complex, nonlinear or non-differentiable transformations of hidden states that generate observations.
Introducing the Agent() class
In pymdp, we have abstracted much of the computation required for active inference into the Agent class, a flexible object that can be used to store necessary aspects of the generative model, as well as the agent’s instantaneous observations and actions, and to perform action / perception using functions like Agent.infer_states and Agent.infer_policies.
An instance of Agent is straightforwardly initialized with a call to the Agent() constructor with a list of optional arguments.
In our call to Agent(), we need to constrain the default behavior with some of our T-maze-specific needs. For example, we want to make sure that the agent’s beliefs about transitions are constrained by the fact that it can only control the Location factor - not the Reward Condition (which we assume is stationary across an epoch of time). Therefore we specify this using a list of indices that will be passed as the control_fac_idx argument of the Agent() constructor.
Each element in the list specifies a hidden state factor (in terms of its index) that is controllable by the agent. Hidden state factors whose indices are not in this list are assumed to be uncontrollable.
controllable_indices = [0] # this is a list of the indices of the hidden state factors that are controllable
Now we can construct our agent…
agent = Agent(A=A_gm, B=B_gm, control_fac_idx=controllable_indices)
Now we can inspect (and change) properties of the agent as we see fit. Let’s look at the initial beliefs the agent has about its starting location and reward condition, encoded in the prior over hidden states $P(s)$, known in SPM-lingo as the D array.
plot_beliefs(agent.D[0],"Beliefs about initial location")

plot_beliefs(agent.D[1],"Beliefs about reward condition")

Let’s make it so that agent starts with precise and accurate prior beliefs about its starting location.
agent.D[0] = utils.onehot(0, agent.num_states[0])
The onehot(value, dimension) function within pymdp.utils is a handy function for quickly generating a one-hot vector of length dimension with a 1 at the index value.
And now confirm that our agent knows (i.e. has accurate beliefs about) its initial state by visualizing its priors again.
plot_beliefs(agent.D[0],"Beliefs about initial location")

Another thing we want to do in this case is make sure the agent has a ‘sense’ of reward / loss and thus a motivation to be in the ‘correct’ arm (the arm that maximizes the probability of getting the reward outcome).
We can do this by changing the prior beliefs about observations, the C array (also known as the prior preferences). This is represented as a collection of distributions over observations, one for each modality, and is initialized by default to be all 0s, meaning the agent has no preference for particular outcomes. Since the second modality (index 1 of the C array) is the Reward modality, with the index of the Reward outcome being 1 and that of the Loss outcome being 2, we populate the corresponding entries with values whose relative magnitudes encode the preference for one outcome over another (technically, this is encoded directly in terms of relative log-probabilities).
The fact that we can give the agent prior beliefs that it tends to observe the outcome with index 1 in the Reward modality more often than the outcome with index 2 is what makes this modality a Reward modality in the first place – otherwise, it would just be an arbitrary observation with no extrinsic value per se.
agent.C[1][1] = 3.0
agent.C[1][2] = -3.0
plot_beliefs(agent.C[1],"Prior beliefs about observations")

Active Inference
Now we can start off the T-maze with an initial observation and run active inference via a loop over a desired time interval.
T = 5 # number of timesteps
obs = env.reset() # reset the environment and get an initial observation
# these are useful for displaying read-outs during the loop over time
reward_conditions = ["Right", "Left"]
location_observations = ['CENTER','RIGHT ARM','LEFT ARM','CUE LOCATION']
reward_observations = ['No reward','Reward!','Loss!']
cue_observations = ['Cue Right','Cue Left']
msg = """ === Starting experiment === \n Reward condition: {}, Observation: [{}, {}, {}]"""
print(msg.format(reward_conditions[env.reward_condition], location_observations[obs[0]], reward_observations[obs[1]], cue_observations[obs[2]]))
for t in range(T):
qx = agent.infer_states(obs)
q_pi, efe = agent.infer_policies()
action = agent.sample_action()
msg = """[Step {}] Action: [Move to {}]"""
print(msg.format(t, location_observations[int(action[0])]))
obs = env.step(action)
msg = """[Step {}] Observation: [{}, {}, {}]"""
print(msg.format(t, location_observations[obs[0]], reward_observations[obs[1]], cue_observations[obs[2]]))
=== Starting experiment ===
Reward condition: Right, Observation: [CENTER, No reward, Cue Left]
[Step 0] Action: [Move to CUE LOCATION]
[Step 0] Observation: [CUE LOCATION, No reward, Cue Right]
[Step 1] Action: [Move to RIGHT ARM]
[Step 1] Observation: [RIGHT ARM, Reward!, Cue Right]
[Step 2] Action: [Move to RIGHT ARM]
[Step 2] Observation: [RIGHT ARM, Reward!, Cue Right]
[Step 3] Action: [Move to RIGHT ARM]
[Step 3] Observation: [RIGHT ARM, Reward!, Cue Right]
[Step 4] Action: [Move to RIGHT ARM]
[Step 4] Observation: [RIGHT ARM, Reward!, Cue Right]
The agent begins by moving to the CUE LOCATION to resolve its uncertainty about the reward condition - this is because it knows it will get an informative cue in this location, which will signal the true reward condition unambiguously. At the beginning of the next timestep, the agent then uses this observation to update its posterior beliefs about states qx[1] to reflect the true reward condition. Having resolved its uncertainty about the reward condition, the agent then moves to RIGHT ARM to maximize utility and continues to do so, given its (correct) beliefs about the reward condition and the mapping between hidden states and reward observations.
Notice, perhaps confusingly, that the agent continues to receive observations in the 3rd modality (i.e. samples from A_gp[2]). These are observations of the form Cue Right or Cue Left. However, these ‘cue’ observations are random and totally ambiguous unless the agent is in the CUE LOCATION - this is reflected by totally entropic distributions in the corresponding columns of A_gp[2] (and the agent’s beliefs about this ambiguity, reflected in A_gm[2]). See below.
plot_likelihood(A_gp[2][:,:,0],'Cue Observations when condition is Reward on Right, for Different Locations')

plot_likelihood(A_gp[2][:,:,1],'Cue Observations when condition is Reward on Left, for Different Locations')

The final column on the right side of these matrices represents the distribution over cue observations, conditioned on the agent being in CUE LOCATION
and the appropriate Reward Condition. This demonstrates that cue observations are uninformative / lacking epistemic value for the agent, unless they are in CUE LOCATION.
Now we can inspect the agent’s final beliefs about the reward condition characterizing the ‘trial,’ having undergone 5 timesteps of active inference.
plot_beliefs(qx[1],"Final posterior beliefs about reward condition")

Active Inference Demo: Epistemic Chaining
Author: Conor Heins
This demo notebook builds a generative model from scratch, constructs an Agent
instance using the constructed generative model, and then runs an active inference simulation in a simple environment.
The environment used here is similar in spirit to the T-Maze demo, but the task structure is more complex. Here, we analogize the agent to a rat tasked with solving a spatial puzzle. The rat must sequentially visit a sequence of two cues located at different locations in a 2-D grid world, in order to ultimately reveal the locations of two (opposite) reward outcomes: one location will give the rat a reward (“Cheese”) and the other location will give the rat a punishment (“Shock”).
Using active inference to solve a POMDP representation of this task, the rat can successfully forage the correct cues in sequence, in order to ultimately discover the location of the “Cheese”, and avoid the “Shock”.
Note: When running this notebook in Google Colab, you may have to run !pip install inferactively-pymdp
at the top of the notebook, before you can import pymdp
. That cell is left commented out below, in case you are running this notebook from Google Colab.
# ! pip install inferactively-pymdp
Imports
import os
import sys
import pathlib
import numpy as np
from pymdp.agent import Agent
from pymdp import utils, maths
Grid World Parameters
Let’s begin by initializing several variables related to the physical environment inhabited by the agent. These variables will encode things like the dimensions of the grid, the possible locations of the different cues, and the possible locations of the reward or punishment.
Having these variables defined will also come in handy when setting up the generative model of our agent and when creating the environment class.
We will create a grid world with dimensions $5 \times 7$. Particular locations of the grid are indexed as (y, x) tuples, that select a particular row and column respectively of that location in the grid.
By design of the task, one location in the grid world contains a cue: Cue 1. There will be four additional locations that serve as possible locations for a second cue: Cue 2. Crucially, only one of these four additional locations will actually contain Cue 2 - the other 3 will be empty. When the agent visits Cue 1 by moving to its location, one of four signals is presented, each of which unambiguously signals which of the 4 possible locations Cue 2 occupies – we can refer to these Cue-2-location signals with obvious names: L1, L2, L3, L4. Once Cue 2’s location has been revealed, visiting that location yields one of two possible signals that indicate where the hidden reward is located (and, conversely, where the hidden punishment lies): “TOP” (meaning the “Cheese” reward is in the upper of the two reward locations) or “BOTTOM” (meaning the “Cheese” reward is in the lower of the two locations).
In this way, the most efficient and risk-sensitive way to achieve reward in this task is to first visit Cue 1, in order to figure out the location of Cue 2, in order to figure out the location of the reward.
Tip: When setting up pymdp generative models and task environments, we recommend creating additional variables, like lists of strings or dicts with string-valued keys, that help you relate the values of various aspects of the task to semantically-meaningful labels. These come in handy when generating print statements during debugging or labels for plotting. For example, below we create a list called reward_conditions that stores the “names” of the two reward conditions: “TOP” and “BOTTOM”.
grid_dims = [5, 7] # dimensions of the grid (number of rows, number of columns)
num_grid_points = np.prod(grid_dims) # total number of grid locations (rows X columns)
# create a look-up table `loc_list` that maps linear indices to tuples of (y, x) coordinates
grid = np.arange(num_grid_points).reshape(grid_dims)
it = np.nditer(grid, flags=["multi_index"])
loc_list = []
while not it.finished:
loc_list.append(it.multi_index)
it.iternext()
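As a quick sketch, loc_list gives a two-way mapping between linear state indices and (y, x) coordinates: indexing into the list converts an index into a coordinate, and the list's .index() method converts a coordinate back into an index.
print(loc_list[0])             # (0, 0) -- the upper-left corner of the grid
print(loc_list.index((2, 0)))  # 14 -- the linear index of the Cue 1 location in a 5 x 7 grid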
# (y, x) coordinate of the first cue's location, and then a list of the (y, x) coordinates of the possible locations of the second cue, and their labels (`L1`, `L2`, ...)
cue1_location = (2, 0)
cue2_loc_names = ['L1', 'L2', 'L3', 'L4']
cue2_locations = [(0, 2), (1, 3), (3, 3), (4, 2)]
# names of the reward conditions and their locations
reward_conditions = ["TOP", "BOTTOM"]
reward_locations = [(1, 5), (3, 5)]
Visualize the grid world
Let’s quickly use the variables we just defined to visualize the grid world, including the Cue 1 location, the possible Cue 2 locations, and the possible reward locations (in gray, since we don’t know which one has the “Cheese” and which one has the “Shock”)
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import matplotlib.cm as cm
fig, ax = plt.subplots(figsize=(10, 6))
# create the grid visualization
X, Y = np.meshgrid(np.arange(grid_dims[1]+1), np.arange(grid_dims[0]+1))
h = ax.pcolormesh(X, Y, np.ones(grid_dims), edgecolors='k', vmin = 0, vmax = 30, linewidth=3, cmap = 'coolwarm')
ax.invert_yaxis()
# Put gray boxes around the possible reward locations
reward_top = ax.add_patch(patches.Rectangle((reward_locations[0][1],reward_locations[0][0]),1.0,1.0,linewidth=5,edgecolor=[0.5, 0.5, 0.5],facecolor=[0.5, 0.5, 0.5]))
reward_bottom = ax.add_patch(patches.Rectangle((reward_locations[1][1],reward_locations[1][0]),1.0,1.0,linewidth=5,edgecolor=[0.5, 0.5, 0.5],facecolor=[0.5, 0.5, 0.5]))
text_offsets = [0.4, 0.6]
cue_grid = np.ones(grid_dims)
cue_grid[cue1_location[0],cue1_location[1]] = 15.0
for ii, loc_ii in enumerate(cue2_locations):
row_coord, column_coord = loc_ii
cue_grid[row_coord, column_coord] = 5.0
ax.text(column_coord+text_offsets[0], row_coord+text_offsets[1], cue2_loc_names[ii], fontsize = 15, color='k')
h.set_array(cue_grid.ravel())

Generative model
The hidden states $\mathbf{s}$ of the generative model are factorized into three hidden state factors:
a Location hidden state factor with as many levels as there are grid locations. This encodes the agent’s location in the grid world.
a Cue 2 Location hidden state factor with 4 levels – this encodes in which of the four possible locations Cue 2 is located.
a Reward Condition hidden state factor with 2 levels – this encodes which of the two reward locations (“TOP” or “BOTTOM”) the “Cheese” is to be found in. When the Reward Condition level is “TOP”, then the “Cheese” reward is the upper of the two locations, and the “Shock” punishment is on the lower of the two locations. The locations are switched in the “BOTTOM” level of the Reward Condition factor.
The observations $\mathbf{o}$ of the generative model are factorized into four different observation modalities:
a Location observation modality with as many levels as there are grid locations, representing the agent’s observation of its location in the grid world.
a Cue 1 observation modality with 5 levels – this is an observation, only obtained at the Cue 1 location, that signals in which of the 4 possible locations Cue 2 is located. When not at the Cue 1 location, the agent sees Null, a meaningless observation.
a Cue 2 observation modality with 3 levels – this is an observation, only obtained at the Cue 2 location, that signals in which of the two reward locations (“TOP” or “BOTTOM”) the “Cheese” is located. When not at the Cue 2 location, the agent sees Null, a meaningless observation.
a Reward observation modality with 3 levels – this is an observation that signals whether the agent is receiving “Cheese”, “Shock” or nothing at all (“Null”). The agent only receives “Cheese” or “Shock” when occupying one of the two reward locations, and Null otherwise.
As is the usual convention in pymdp
, let’s create a list that contains the dimensionalities of the hidden state factors, named num_states
, and a list that contains the dimensionalities of the observation modalities, named num_obs
.
# list of dimensionalities of the hidden states -- useful for creating generative model later on
num_states = [num_grid_points, len(cue2_locations), len(reward_conditions)]
# Names of the cue1 observation levels, the cue2 observation levels, and the reward observation levels
cue1_names = ['Null'] + cue2_loc_names # signals for the possible Cue 2 locations, that only are seen when agent is visiting Cue 1
cue2_names = ['Null', 'reward_on_top', 'reward_on_bottom']
reward_names = ['Null', 'Cheese', 'Shock']
num_obs = [num_grid_points, len(cue1_names), len(cue2_names), len(reward_names)]
The observation model: A array
Now using num_states
and num_obs
we can initialize A
, the observation model
A_m_shapes = [ [o_dim] + num_states for o_dim in num_obs] # list of shapes of modality-specific A[m] arrays
A = utils.obj_array_zeros(A_m_shapes) # initialize A array to an object array of all-zero subarrays
Let’s fill out the various modalities of the A array, encoding the agent’s beliefs about how hidden states probabilistically cause observations within each modality.
Starting with the 0th modality, the Location observation modality: A[0]
# make the location observation only depend on the location state (proprioceptive observation modality)
A[0] = np.tile(np.expand_dims(np.eye(num_grid_points), (-2, -1)), (1, 1, num_states[1], num_states[2]))
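As a quick check (a sketch), the resulting sub-array should have shape (35, 35, 4, 2) -- one row and column per grid location, tiled across the 4 possible Cue 2 locations and the 2 reward conditions -- and should be an identity mapping over locations for any setting of the other two factors:
print(A[0].shape)  # (35, 35, 4, 2)
assert np.allclose(A[0][:, :, 0, 0], np.eye(num_grid_points))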
Now we can build the 1st modality, the Cue 1 observation modality: A[1]
# make the cue1 observation depend on the location (being at cue1_location) and the true location of cue2
A[1][0,:,:,:] = 1.0 # default makes Null the most likely observation everywhere
# Make the Cue 1 signal depend on 1) being at the Cue 1 location and 2) the location of Cue 2
for i, cue_loc2_i in enumerate(cue2_locations):
A[1][0,loc_list.index(cue1_location),i,:] = 0.0
A[1][i+1,loc_list.index(cue1_location),i,:] = 1.0
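To see what this encodes, here is a small sketch: when the agent stands at the Cue 1 location and Cue 2 is truly in its third possible location (index 2), all of the probability mass in the Cue 1 modality falls on the corresponding 'L3' signal.
# sketch: distribution over Cue 1 observations at the Cue 1 location, when Cue 2 is at location index 2
print(A[1][:, loc_list.index(cue1_location), 2, 0])  # [0. 0. 0. 1. 0.] -- all mass on 'L3' (observation index 3)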
Now we can build the 2nd modality, the Cue 2 observation modality: A[2]
# make the cue2 observation depend on the location (being at the correct cue2_location) and the reward condition
A[2][0,:,:,:] = 1.0 # default makes Null the most likely observation everywhere
for i, cue_loc2_i in enumerate(cue2_locations):
# if the cue2-location is the one you're currently at, then you get a signal about where the reward is
A[2][0,loc_list.index(cue_loc2_i),i,:] = 0.0
A[2][1,loc_list.index(cue_loc2_i),i,0] = 1.0
A[2][2,loc_list.index(cue_loc2_i),i,1] = 1.0
Finally, we build the 3rd modality, the Reward observation modality: A[3]
# make the reward observation depend on the location (being at reward location) and the reward condition
A[3][0,:,:,:] = 1.0 # default makes Null the most likely observation everywhere
rew_top_idx = loc_list.index(reward_locations[0]) # linear index of the location of the "TOP" reward location
rew_bott_idx = loc_list.index(reward_locations[1]) # linear index of the location of the "BOTTOM" reward location
# fill out the contingencies when the agent is in the "TOP" reward location
A[3][0,rew_top_idx,:,:] = 0.0
A[3][1,rew_top_idx,:,0] = 1.0
A[3][2,rew_top_idx,:,1] = 1.0
# fill out the contingencies when the agent is in the "BOTTOM" reward location
A[3][0,rew_bott_idx,:,:] = 0.0
A[3][1,rew_bott_idx,:,1] = 1.0
A[3][2,rew_bott_idx,:,0] = 1.0
The transition model: B array
To create the B
array or transition model, we have to further specify num_controls
, which like num_states
/ num_obs
is a list, but this time of the dimensionalities of each control factor, which are the hidden state factors that are controllable by the agent. Uncontrollable hidden state factors can be encoded as control factors of dimension 1. Once num_controls
is defined, we can then use it and num_states
to specify the dimensionality of the B
arrays. Recall that in pymdp
hidden state factors are conditionally independent of each other, meaning that each sub-array B[f]
describes the dynamics of only a single hidden state factor, and its probabilistic dependence on both its own state (at the previous time) and the state of its corresponding control factor.
In the current grid world task, the agent has the ability to make movements in the 4 cardinal directions (UP, DOWN, LEFT, RIGHT), as well as the option to stay in the same place (STAY). This means we will associate a single 5-dimensional control state factor with the first hidden state factor.
Note: Make sure the indices of the num_controls
variables “lines up” with those of num_states
.
# initialize `num_controls`
num_controls = [5, 1, 1]
# initialize the shapes of each sub-array `B[f]`
B_f_shapes = [ [ns, ns, num_controls[f]] for f, ns in enumerate(num_states)]
# create the `B` array and fill it out
B = utils.obj_array_zeros(B_f_shapes)
Fill out B[0]
according to the expected consequences of each of the 5 actions. Note that we also create a list that stores the names of each action, for interpretability.
actions = ["UP", "DOWN", "LEFT", "RIGHT", "STAY"]
# fill out `B[0]` using the movement rules of the grid world: each action shifts the location deterministically, and moves off the edge of the grid leave the agent in place
for action_id, action_label in enumerate(actions):
for curr_state, grid_location in enumerate(loc_list):
y, x = grid_location
if action_label == "UP":
next_y = y - 1 if y > 0 else y
next_x = x
elif action_label == "DOWN":
next_y = y + 1 if y < (grid_dims[0]-1) else y
next_x = x
elif action_label == "LEFT":
next_x = x - 1 if x > 0 else x
next_y = y
elif action_label == "RIGHT":
next_x = x + 1 if x < (grid_dims[1]-1) else x
next_y = y
elif action_label == "STAY":
next_x = x
next_y = y
new_location = (next_y, next_x)
next_state = loc_list.index(new_location)
B[0][next_state, curr_state, action_id] = 1.0
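A couple of quick checks (a sketch) can confirm that the transition model encodes both ordinary movements and the boundary conditions of the grid:
# moving RIGHT from (0, 0) should deterministically lead to (0, 1)
assert B[0][loc_list.index((0, 1)), loc_list.index((0, 0)), actions.index("RIGHT")] == 1.0
# moving UP from the top row should leave the agent where it is (boundary condition)
assert B[0][loc_list.index((0, 3)), loc_list.index((0, 3)), actions.index("UP")] == 1.0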
Fill out B[1]
and B[2]
as identity matrices, encoding the fact that those hidden states are uncontrollable
B[1][:,:,0] = np.eye(num_states[1])
B[2][:,:,0] = np.eye(num_states[2])
Prior preferences: the C vectors
Now we specify the agent’s prior over observations, also known as the “prior preferences” or “goal vector.” This is not technically a part of the same generative model used for inference of hidden states, but part of a special predictive generative model used for policy inference.
Since the prior preferences are defined in pymdp as priors over observations, not states, C will be an object array whose sub-arrays correspond to the priors over specific observation modalities, e.g. C[3] encodes the prior preferences for different levels of the Reward observation modality.
C = utils.obj_array_zeros(num_obs)
C[3][1] = 2.0 # make the agent want to encounter the "Cheese" observation level
C[3][2] = -4.0 # make the agent not want to encounter the "Shock" observation level
Generative process
Now we need to write down the “rules” of the game, i.e. the environment that the agent will actually be interacting with. The most concise way to do this in pymdp is by adopting a similar format to what’s used in frameworks like OpenAI Gym – namely, we create an env class that takes actions as inputs to a self.step() method and returns observations for the agent as outputs. In active inference we refer to this agent-independent, physical “reality” in which the agent operates as the generative process, to be distinguished from the agent’s representation of that reality, the generative model (the A, B, C and D arrays that we just wrote down above).
Writing a custom env
Now we’ll define an environment class called GridWorldEnv
. The constructor for this class allows you to establish various parameters of the generative process, like where the agent starts in the grid-world at the beginning of the trial (starting_loc
), the location of Cue 1 (cue1_loc
), the location of Cue 2 (cue2_loc
), and the reward condition (reward_condition
).
Note: Remember the distinction between the generative model and the generative process: one can build the environment class to be as arbitrarily different from the agent’s generative model as desired. For example, for the GridWorldEnv
example, you could construct the agent’s A
array such that the agent believes Cue 1 is in Location (1,0)
, but in fact the cue is located somewhere else like (3,0)
(as would be set by the cue1_loc
argument to the GridWorldEnv
constructor). Similarly, one could write the internal step
method of the GridWorldEnv
class so that the way the reward_condition
is signalled is opposite from what the agent expects – so when the agent sees a particular signal at the Cue 2 location, they assume (via the A
array) it means that the “Cheese” is located on the "TOP"
location, but in fact the rule is switched so that “Shock” is at the "TOP"
location in reality, and “Cheese” is actually at the "BOTTOM"
location.
class GridWorldEnv():
def __init__(self,starting_loc = (0,0), cue1_loc = (2, 0), cue2 = 'L1', reward_condition = 'TOP'):
self.init_loc = starting_loc
self.current_location = self.init_loc
self.cue1_loc = cue1_loc
self.cue2_name = cue2
self.cue2_loc_names = ['L1', 'L2', 'L3', 'L4']
self.cue2_loc = cue2_locations[self.cue2_loc_names.index(self.cue2_name)]
self.reward_condition = reward_condition
print(f'Starting location is {self.init_loc}, Reward condition is {self.reward_condition}, cue is located in {self.cue2_name}')
def step(self,action_label):
(Y, X) = self.current_location
if action_label == "UP":
Y_new = Y - 1 if Y > 0 else Y
X_new = X
elif action_label == "DOWN":
Y_new = Y + 1 if Y < (grid_dims[0]-1) else Y
X_new = X
elif action_label == "LEFT":
Y_new = Y
X_new = X - 1 if X > 0 else X
elif action_label == "RIGHT":
Y_new = Y
X_new = X +1 if X < (grid_dims[1]-1) else X
elif action_label == "STAY":
Y_new, X_new = Y, X
self.current_location = (Y_new, X_new) # store the new grid location
loc_obs = self.current_location # agent always directly observes the grid location they're in
if self.current_location == self.cue1_loc:
cue1_obs = self.cue2_name
else:
cue1_obs = 'Null'
if self.current_location == self.cue2_loc:
cue2_obs = cue2_names[reward_conditions.index(self.reward_condition)+1]
else:
cue2_obs = 'Null'
# @NOTE: here we use the same variable `reward_locations` to create both the agent's generative model (the `A` matrix) as well as the generative process.
# This is just for simplicity, but it's not necessary - you could have the agent believe that the Cheese/Shock are actually stored in arbitrary, incorrect locations.
if self.current_location == reward_locations[0]:
if self.reward_condition == 'TOP':
reward_obs = 'Cheese'
else:
reward_obs = 'Shock'
elif self.current_location == reward_locations[1]:
if self.reward_condition == 'BOTTOM':
reward_obs = 'Cheese'
else:
reward_obs = 'Shock'
else:
reward_obs = 'Null'
return loc_obs, cue1_obs, cue2_obs, reward_obs
def reset(self):
self.current_location = self.init_loc
print(f'Re-initialized location to {self.init_loc}')
loc_obs = self.current_location
cue1_obs = 'Null'
cue2_obs = 'Null'
reward_obs = 'Null'
return loc_obs, cue1_obs, cue2_obs, reward_obs
Active Inference
Now that we have a generative model and generative process set up, we can quickly run active inference in pymdp
. In order to do this, all we need to do is to create an Agent
using the Agent()
constructor and create a generative process / environment using our custom GridWorldEnv
class. Then we just exchange observations and actions between the two in a loop over time, where the agent updates its beliefs and actions using the Agent
methods like infer_states()
and infer_policies()
.
Initialize an Agent and an instance of GridWorldEnv
We can quickly construct an instance of Agent
using our generative model arrays as inputs: A
, B
, C
, and D
. Since we are dealing with a spatially-extended navigation example, we will also use a policy_len
parameter that lets the agent plan its movements forward in time. This sort of temporally deep planning is needed because of A) the local nature of the agent’s action repertoire (only being able to move UP, LEFT, RIGHT, and DOWN one step at a time), and B) the physical distance between the cues and reward locations in the grid world.
We can also initialize the GridWorldEnv
class using a desired starting location, a Cue 1 location, Cue 2 location, and reward condition. We can get the first (multi-modality) observation of the simulation by using env.reset()
my_agent = Agent(A = A, B = B, C = C, D = D, policy_len = 4)
my_env = GridWorldEnv(starting_loc = (0,0), cue1_loc = (2, 0), cue2 = 'L4', reward_condition = 'BOTTOM')
loc_obs, cue1_obs, cue2_obs, reward_obs = my_env.reset()
Starting location is (0, 0), Reward condition is BOTTOM, cue is located in L4
Re-initialized location to (0, 0)
Run an active inference loop over time
Run the loop below, saving the history of the rat’s locations as you go. Include some print statements if you want to see the agent’s choices as they unfold.
history_of_locs = [loc_obs]
obs = [loc_list.index(loc_obs), cue1_names.index(cue1_obs), cue2_names.index(cue2_obs), reward_names.index(reward_obs)]
T = 10 # number of total timesteps
for t in range(T):
qs = my_agent.infer_states(obs)
my_agent.infer_policies()
chosen_action_id = my_agent.sample_action()
movement_id = int(chosen_action_id[0])
choice_action = actions[movement_id]
print(f'Action at time {t}: {choice_action}')
loc_obs, cue1_obs, cue2_obs, reward_obs = my_env.step(choice_action)
obs = [loc_list.index(loc_obs), cue1_names.index(cue1_obs), cue2_names.index(cue2_obs), reward_names.index(reward_obs)]
history_of_locs.append(loc_obs)
print(f'Grid location at time {t}: {loc_obs}')
print(f'Reward at time {t}: {reward_obs}')
Action at time 0: DOWN
Grid location at time 0: (1, 0)
Reward at time 0: Null
Action at time 1: DOWN
Grid location at time 1: (2, 0)
Reward at time 1: Null
Action at time 2: RIGHT
Grid location at time 2: (2, 1)
Reward at time 2: Null
Action at time 3: DOWN
Grid location at time 3: (3, 1)
Reward at time 3: Null
Action at time 4: DOWN
Grid location at time 4: (4, 1)
Reward at time 4: Null
Action at time 5: RIGHT
Grid location at time 5: (4, 2)
Reward at time 5: Null
Action at time 6: RIGHT
Grid location at time 6: (4, 3)
Reward at time 6: Null
Action at time 7: RIGHT
Grid location at time 7: (4, 4)
Reward at time 7: Null
Action at time 8: UP
Grid location at time 8: (3, 4)
Reward at time 8: Null
Action at time 9: RIGHT
Grid location at time 9: (3, 5)
Reward at time 9: Cheese
Visualization
Now let’s do a quick visualization of the rat’s movements over a single trial. We’ll indicate the grid location and time of its movements using a hot colormap (so hotter colors means later in the trial), and indicate the Cue 1 and Cue 2 locations with purple outlined boxes. Each of the possible Cue 2 locations will be highlighted in a light blue.
Try changing the initial settings of the generative process (the locations of Cue 1 and Cue 2, the reward condition, etc.) to see how, and to what extent, the active inference agent adapts its behavior to the changed environmental contingencies.
all_locations = np.vstack(history_of_locs).astype(float) # create a matrix containing the agent's Y/X locations over time (each coordinate in one row of the matrix)
fig, ax = plt.subplots(figsize=(10, 6))
# create the grid visualization
X, Y = np.meshgrid(np.arange(grid_dims[1]+1), np.arange(grid_dims[0]+1))
h = ax.pcolormesh(X, Y, np.ones(grid_dims), edgecolors='k', vmin = 0, vmax = 30, linewidth=3, cmap = 'coolwarm')
ax.invert_yaxis()
# get generative process global parameters (the locations of the Cues, the reward condition, etc.)
cue1_loc, cue2_loc, reward_condition = my_env.cue1_loc, my_env.cue2_loc, my_env.reward_condition
reward_top = ax.add_patch(patches.Rectangle((reward_locations[0][1],reward_locations[0][0]),1.0,1.0,linewidth=5,edgecolor=[0.5, 0.5, 0.5],facecolor='none'))
reward_bottom = ax.add_patch(patches.Rectangle((reward_locations[1][1],reward_locations[1][0]),1.0,1.0,linewidth=5,edgecolor=[0.5, 0.5, 0.5],facecolor='none'))
reward_loc = reward_locations[0] if reward_condition == "TOP" else reward_locations[1]
if reward_condition == "TOP":
reward_top.set_edgecolor('g')
reward_top.set_facecolor('g')
reward_bottom.set_edgecolor([0.7, 0.2, 0.2])
reward_bottom.set_facecolor([0.7, 0.2, 0.2])
elif reward_condition == "BOTTOM":
reward_bottom.set_edgecolor('g')
reward_bottom.set_facecolor('g')
reward_top.set_edgecolor([0.7, 0.2, 0.2])
reward_top.set_facecolor([0.7, 0.2, 0.2])
reward_top.set_zorder(1)
reward_bottom.set_zorder(1)
text_offsets = [0.4, 0.6]
cue_grid = np.ones(grid_dims)
cue_grid[cue1_loc[0],cue1_loc[1]] = 15.0
for ii, loc_ii in enumerate(cue2_locations):
row_coord, column_coord = loc_ii
cue_grid[row_coord, column_coord] = 5.0
ax.text(column_coord+text_offsets[0], row_coord+text_offsets[1], cue2_loc_names[ii], fontsize = 15, color='k')
h.set_array(cue_grid.ravel())
cue1_rect = ax.add_patch(patches.Rectangle((cue1_loc[1],cue1_loc[0]),1.0,1.0,linewidth=8,edgecolor=[0.5, 0.2, 0.7],facecolor='none'))
cue2_rect = ax.add_patch(patches.Rectangle((cue2_loc[1],cue2_loc[0]),1.0,1.0,linewidth=8,edgecolor=[0.5, 0.2, 0.7],facecolor='none'))
ax.plot(all_locations[:,1]+0.5,all_locations[:,0]+0.5, 'r', zorder = 2)
temporal_colormap = cm.hot(np.linspace(0,1,T+1))
dots = ax.scatter(all_locations[:,1]+0.5,all_locations[:,0]+0.5, 450, c = temporal_colormap, zorder=3)
ax.set_title(f"Cue 1 located at {cue2_loc}, Cue 2 located at {cue2_loc}, Cheese on {reward_condition}", fontsize=16)
Text(0.5, 1.0, 'Cue 1 located at (2, 0), Cue 2 located at (4, 2), Cheese on BOTTOM')

Experimenting with different environmental structure
Try changing around the locations of the rewards, the cues, the agent’s beliefs, etc. For example, below we’ll change the location of the rewards, both in the generative model and the generative process.
# names of the reward conditions and their locations
reward_conditions = ["LEFT", "RIGHT"]
reward_locations = [(2, 2), (2, 4)] # DIFFERENT REWARD LOCATIONS
## reset `A[3]`, the reward observation model
A[3] = np.zeros([num_obs[3]] + num_states)
# make the reward observation depend on the location (being at reward location) and the reward condition
A[3][0,:,:,:] = 1.0 # default makes Null the most likely observation everywhere
rew_left_idx = loc_list.index(reward_locations[0]) # linear index of the "LEFT" reward location
rew_right_idx = loc_list.index(reward_locations[1]) # linear index of the "RIGHT" reward location
# fill out the contingencies when the agent is in the "LEFT" reward location
A[3][0,rew_left_idx,:,:] = 0.0
A[3][1,rew_left_idx,:,0] = 1.0
A[3][2,rew_left_idx,:,1] = 1.0
# fill out the contingencies when the agent is in the "RIGHT" reward location
A[3][0,rew_right_idx,:,:] = 0.0
A[3][1,rew_right_idx,:,1] = 1.0
A[3][2,rew_right_idx,:,0] = 1.0
class GridWorldEnv():
def __init__(self,starting_loc = (4,0), cue1_loc = (2, 0), cue2 = 'L1', reward_condition = 'LEFT'):
self.init_loc = starting_loc
self.current_location = self.init_loc
self.cue1_loc = cue1_loc
self.cue2_name = cue2
self.cue2_loc_names = ['L1', 'L2', 'L3', 'L4']
self.cue2_loc = cue2_locations[self.cue2_loc_names.index(self.cue2_name)]
self.reward_condition = reward_condition
print(f'Starting location is {self.init_loc}, Reward condition is {self.reward_condition}, cue is located in {self.cue2_name}')
def step(self,action_label):
(Y, X) = self.current_location
if action_label == "UP":
Y_new = Y - 1 if Y > 0 else Y
X_new = X
elif action_label == "DOWN":
Y_new = Y + 1 if Y < (grid_dims[0]-1) else Y
X_new = X
elif action_label == "LEFT":
Y_new = Y
X_new = X - 1 if X > 0 else X
elif action_label == "RIGHT":
Y_new = Y
X_new = X +1 if X < (grid_dims[1]-1) else X
elif action_label == "STAY":
Y_new, X_new = Y, X
self.current_location = (Y_new, X_new) # store the new grid location
loc_obs = self.current_location # agent always directly observes the grid location they're in
if self.current_location == self.cue1_loc:
cue1_obs = self.cue2_name
else:
cue1_obs = 'Null'
if self.current_location == self.cue2_loc:
cue2_obs = cue2_names[reward_conditions.index(self.reward_condition)+1]
else:
cue2_obs = 'Null'
# @NOTE: here we use the same variable `reward_locations` to create both the agent's generative model (the `A` matrix) as well as the generative process.
# This is just for simplicity, but it's not necessary - you could have the agent believe that the Cheese/Shock are actually stored in arbitrary, incorrect locations.
if self.current_location == reward_locations[0]:
if self.reward_condition == 'LEFT':
reward_obs = 'Cheese'
else:
reward_obs = 'Shock'
elif self.current_location == reward_locations[1]:
if self.reward_condition == 'RIGHT':
reward_obs = 'Cheese'
else:
reward_obs = 'Shock'
else:
reward_obs = 'Null'
return loc_obs, cue1_obs, cue2_obs, reward_obs
def reset(self):
self.current_location = self.init_loc
print(f'Re-initialized location to {self.init_loc}')
loc_obs = self.current_location
cue1_obs = 'Null'
cue2_obs = 'Null'
reward_obs = 'Null'
return loc_obs, cue1_obs, cue2_obs, reward_obs
my_agent = Agent(A = A, B = B, C = C, D = D, policy_len = 4)
my_env = GridWorldEnv(starting_loc = (0,0), cue1_loc = (2, 0), cue2 = 'L1', reward_condition = 'RIGHT')
loc_obs, cue1_obs, cue2_obs, reward_obs = my_env.reset()
history_of_locs = [loc_obs]
obs = [loc_list.index(loc_obs), cue1_names.index(cue1_obs), cue2_names.index(cue2_obs), reward_names.index(reward_obs)]
T = 10 # number of total timesteps
for t in range(T):
qs = my_agent.infer_states(obs)
my_agent.infer_policies()
chosen_action_id = my_agent.sample_action()
movement_id = int(chosen_action_id[0])
choice_action = actions[movement_id]
print(f'Action at time {t}: {choice_action}')
loc_obs, cue1_obs, cue2_obs, reward_obs = my_env.step(choice_action)
obs = [loc_list.index(loc_obs), cue1_names.index(cue1_obs), cue2_names.index(cue2_obs), reward_names.index(reward_obs)]
history_of_locs.append(loc_obs)
print(f'Grid location at time {t}: {loc_obs}')
print(f'Reward at time {t}: {reward_obs}')
Starting location is (0, 0), Reward condition is RIGHT, cue is located in L1
Re-initialized location to (0, 0)
Action at time 0: DOWN
Grid location at time 0: (1, 0)
Reward at time 0: Null
Action at time 1: DOWN
Grid location at time 1: (2, 0)
Reward at time 1: Null
Action at time 2: UP
Grid location at time 2: (1, 0)
Reward at time 2: Null
Action at time 3: RIGHT
Grid location at time 3: (1, 1)
Reward at time 3: Null
Action at time 4: UP
Grid location at time 4: (0, 1)
Reward at time 4: Null
Action at time 5: RIGHT
Grid location at time 5: (0, 2)
Reward at time 5: Null
Action at time 6: RIGHT
Grid location at time 6: (0, 3)
Reward at time 6: Null
Action at time 7: DOWN
Grid location at time 7: (1, 3)
Reward at time 7: Null
Action at time 8: DOWN
Grid location at time 8: (2, 3)
Reward at time 8: Null
Action at time 9: RIGHT
Grid location at time 9: (2, 4)
Reward at time 9: Cheese
all_locations = np.vstack(history_of_locs).astype(float) # create a matrix containing the agent's Y/X locations over time (each coordinate in one row of the matrix)
fig, ax = plt.subplots(figsize=(10, 6))
# create the grid visualization
X, Y = np.meshgrid(np.arange(grid_dims[1]+1), np.arange(grid_dims[0]+1))
h = ax.pcolormesh(X, Y, np.ones(grid_dims), edgecolors='k', vmin = 0, vmax = 30, linewidth=3, cmap = 'coolwarm')
ax.invert_yaxis()
# get generative process global parameters (the locations of the Cues, the reward condition, etc.)
cue1_loc, cue2_loc, reward_condition = my_env.cue1_loc, my_env.cue2_loc, my_env.reward_condition
reward_top = ax.add_patch(patches.Rectangle((reward_locations[0][1],reward_locations[0][0]),1.0,1.0,linewidth=5,edgecolor=[0.5, 0.5, 0.5],facecolor='none'))
reward_bottom = ax.add_patch(patches.Rectangle((reward_locations[1][1],reward_locations[1][0]),1.0,1.0,linewidth=5,edgecolor=[0.5, 0.5, 0.5],facecolor='none'))
reward_loc = reward_locations[0] if reward_condition == "LEFT" else reward_locations[1]
if reward_condition == "LEFT":
reward_top.set_edgecolor('g')
reward_top.set_facecolor('g')
reward_bottom.set_edgecolor([0.7, 0.2, 0.2])
reward_bottom.set_facecolor([0.7, 0.2, 0.2])
elif reward_condition == "RIGHT":
reward_bottom.set_edgecolor('g')
reward_bottom.set_facecolor('g')
reward_top.set_edgecolor([0.7, 0.2, 0.2])
reward_top.set_facecolor([0.7, 0.2, 0.2])
reward_top.set_zorder(1)
reward_bottom.set_zorder(1)
text_offsets = [0.4, 0.6]
cue_grid = np.ones(grid_dims)
cue_grid[cue1_loc[0],cue1_loc[1]] = 15.0
for ii, loc_ii in enumerate(cue2_locations):
row_coord, column_coord = loc_ii
cue_grid[row_coord, column_coord] = 5.0
ax.text(column_coord+text_offsets[0], row_coord+text_offsets[1], cue2_loc_names[ii], fontsize = 15, color='k')
h.set_array(cue_grid.ravel())
cue1_rect = ax.add_patch(patches.Rectangle((cue1_loc[1],cue1_loc[0]),1.0,1.0,linewidth=8,edgecolor=[0.5, 0.2, 0.7],facecolor='none'))
cue2_rect = ax.add_patch(patches.Rectangle((cue2_loc[1],cue2_loc[0]),1.0,1.0,linewidth=8,edgecolor=[0.5, 0.2, 0.7],facecolor='none'))
ax.plot(all_locations[:,1]+0.5,all_locations[:,0]+0.5, 'r', zorder = 2)
temporal_colormap = cm.hot(np.linspace(0,1,T+1))
dots = ax.scatter(all_locations[:,1]+0.5,all_locations[:,0]+0.5, 450, c = temporal_colormap, zorder=3)
ax.set_title(f"Cue 1 located at {cue2_loc}, Cue 2 located at {cue2_loc}, Cheese on {reward_condition}", fontsize=16)
Text(0.5, 1.0, 'Cue 1 located at (2, 0), Cue 2 located at (0, 2), Cheese on RIGHT')

Inference
The inference.py
module contains the functions for performing inference of discrete hidden states (categorical distributions) in POMDP generative models.
- pymdp.inference.average_states_over_policies(qs_pi, q_pi)
This function computes an expected posterior over hidden states with respect to the posterior over policies, also known as the ‘Bayesian model average of states with respect to policies’.
- Parameters
qs_pi (
numpy.ndarray
of dtype object) – Posterior beliefs over hidden states for each policy. Nesting structure is policies, factors, where e.g.qs_pi[p][f]
stores the marginal belief about factorf
under policyp
.q_pi (
numpy.ndarray
of dtype object) – Posterior beliefs about policies wherelen(q_pi) = num_policies
- Returns
qs_bma – Marginal posterior over hidden states for the current timepoint, averaged across policies according to their posterior probability given by
q_pi
- Return type
numpy.ndarray
of dtype object
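For illustration, here is a minimal usage sketch (toy dimensionalities chosen arbitrarily; it assumes the helpers pymdp.utils.obj_array and pymdp.maths.softmax, used elsewhere in this documentation, to build the nested object-array structure):
import numpy as np
from pymdp import utils
from pymdp.maths import softmax
from pymdp.inference import average_states_over_policies

num_policies = 3
num_states = [4, 2]  # two hidden state factors (toy example)

# qs_pi[p][f] stores the categorical belief about factor f under policy p
qs_pi = utils.obj_array(num_policies)
for p in range(num_policies):
    qs_pi[p] = utils.obj_array(len(num_states))
    for f, ns in enumerate(num_states):
        qs_pi[p][f] = softmax(np.random.randn(ns))  # a random normalized belief

q_pi = softmax(np.random.randn(num_policies))  # posterior over the 3 policies

qs_bma = average_states_over_policies(qs_pi, q_pi)
print(qs_bma[0])  # policy-averaged belief about the first hidden state factor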
- pymdp.inference.update_posterior_states(A, obs, prior=None, **kwargs)
Update marginal posterior over hidden states using mean-field fixed point iteration (FPI).
See the following links for details: http://www.cs.cmu.edu/~guestrin/Class/10708/recitations/r9/VI-view.pdf, slides 13- 18, and http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.137.221&rep=rep1&type=pdf, slides 24 - 38.
- Parameters
A (
numpy.ndarray
of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each elementA[m]
stores an np.ndarray
multidimensional array for observation modalitym
, whose entriesA[m][i, j, k, ...]
store the probability of observation leveli
given hidden state levelsj, k, ...
obs (1D
numpy.ndarray
,numpy.ndarray
of dtype object, int or tuple) – The observation (generated by the environment). If single modality, this can be a 1Dnp.ndarray
(one-hot vector representation) or anint
(observation index) If multi-modality, this can benp.ndarray
of dtype object whose entries are 1D one-hot vectors, or a tuple (ofint
)prior (1D
numpy.ndarray
ornumpy.ndarray
of dtype object, default None) – Prior beliefs about hidden states, to be integrated with the marginal likelihood to obtain a posterior distribution. If not provided, prior is set to be equal to a flat categorical distribution (at the level of the individual inference functions).**kwargs (keyword arguments) – List of keyword/parameter arguments corresponding to parameter values for the fixed-point iteration algorithm
algos.fpi.run_vanilla_fpi.py
- Returns
qs – Marginal posterior beliefs over hidden states at current timepoint
- Return type
1D
numpy.ndarray
ornumpy.ndarray
of dtype object
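A minimal sketch of calling this function directly (toy dimensionalities; utils.random_A_matrix is assumed to return a randomly generated, normalized observation model, as in the usage examples earlier in these docs):
from pymdp import utils
from pymdp.inference import update_posterior_states

num_obs = [3, 5]     # two observation modalities
num_states = [4, 2]  # two hidden state factors

A = utils.random_A_matrix(num_obs, num_states)  # random, normalized observation model

obs = (0, 2)  # multi-modality observation given as a tuple of observation indices

# with prior=None, a flat prior over hidden states is used internally
qs = update_posterior_states(A, obs, prior=None)
print([q.shape for q in qs])  # one marginal per hidden state factor, e.g. [(4,), (2,)]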
- pymdp.inference.update_posterior_states_factorized(A, obs, num_obs, num_states, mb_dict, prior=None, **kwargs)
Update marginal posterior over hidden states using mean-field fixed point iteration (FPI). This version identifies the Markov blanket of each hidden state factor using A_factor_list.
See the following links for details: http://www.cs.cmu.edu/~guestrin/Class/10708/recitations/r9/VI-view.pdf, slides 13- 18, and http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.137.221&rep=rep1&type=pdf, slides 24 - 38.
- Parameters
A (
numpy.ndarray
of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each elementA[m]
stores an np.ndarray
multidimensional array for observation modalitym
, whose entriesA[m][i, j, k, ...]
store the probability of observation leveli
given hidden state levelsj, k, ...
obs (1D
numpy.ndarray
,numpy.ndarray
of dtype object, int or tuple) – The observation (generated by the environment). If single modality, this can be a 1Dnp.ndarray
(one-hot vector representation) or anint
(observation index) If multi-modality, this can benp.ndarray
of dtype object whose entries are 1D one-hot vectors, or a tuple (ofint
)num_obs (
list
ofint
) – List of dimensionalities of each observation modalitynum_states (
list
ofint
) – List of dimensionalities of each hidden state factormb_dict (
Dict
) – Dictionary with two keys (A_factor_list
andA_modality_list
), that stores the factor indices that influence each modality (A_factor_list
) and the modality indices influenced by each factor (A_modality_list
).prior (1D
numpy.ndarray
ornumpy.ndarray
of dtype object, default None) – Prior beliefs about hidden states, to be integrated with the marginal likelihood to obtain a posterior distribution. If not provided, prior is set to be equal to a flat categorical distribution (at the level of the individual inference functions).**kwargs (keyword arguments) – List of keyword/parameter arguments corresponding to parameter values for the fixed-point iteration algorithm
algos.fpi.run_vanilla_fpi.py
- Returns
qs – Marginal posterior beliefs over hidden states at current timepoint
- Return type
1D
numpy.ndarray
ornumpy.ndarray
of dtype object
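A sketch of how the mb_dict and the reduced A array fit together (toy dimensionalities; the arrays are built by hand and normalized with plain numpy here, just to illustrate the expected structure):
import numpy as np
from pymdp import utils
from pymdp.inference import update_posterior_states_factorized

num_obs = [3, 4]     # two observation modalities
num_states = [2, 3]  # two hidden state factors

# reduced observation model: modality 0 depends only on factor 0,
# modality 1 depends on both factors
A = utils.obj_array(len(num_obs))
raw0 = np.random.rand(num_obs[0], num_states[0])
raw1 = np.random.rand(num_obs[1], num_states[0], num_states[1])
A[0] = raw0 / raw0.sum(axis=0, keepdims=True)
A[1] = raw1 / raw1.sum(axis=0, keepdims=True)

mb_dict = {
    "A_factor_list": [[0], [0, 1]],    # factors each modality depends on
    "A_modality_list": [[0, 1], [1]],  # modalities each factor influences
}

obs = (1, 2)  # one observation index per modality
qs = update_posterior_states_factorized(A, obs, num_obs, num_states, mb_dict)
print([q.shape for q in qs])  # e.g. [(2,), (3,)]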
- pymdp.inference.update_posterior_states_full(A, B, prev_obs, policies, prev_actions=None, prior=None, policy_sep_prior=True, **kwargs)
Update posterior over hidden states using marginal message passing
- Parameters
A (
numpy.ndarray
of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each elementA[m]
stores a numpy.ndarray
multidimensional array for observation modalitym
, whose entriesA[m][i, j, k, ...]
store the probability of observation leveli
given hidden state levelsj, k, ...
B (
numpy.ndarray
of dtype object) – Dynamics likelihood mapping or ‘transition model’, mapping from hidden states att
to hidden states att+1
, given some control stateu
. Each elementB[f]
of this object array stores a 3-D tensor for hidden state factorf
, whose entriesB[f][s, v, u]
store the probability of hidden state levels
at the current time, given hidden state levelv
and actionu
at the previous time.prev_obs (
list
) – List of observations over time. Each observation in the list can be anint
, alist
of ints, atuple
of ints, a one-hot vector or an object array of one-hot vectors.policies (
list
of 2Dnumpy.ndarray
) – List that stores each policy inpolicies[p_idx]
. Shape ofpolicies[p_idx]
is(num_timesteps, num_factors)
where num_timesteps is the temporal depth of the policy andnum_factors
is the number of control factors.prior (
numpy.ndarray
of dtype object, defaultNone
) – If provided, this is a numpy.ndarray
of dtype object, with one sub-array per hidden state factor, that stores the prior beliefs about initial states. IfNone
, this defaults to a flat (uninformative) prior over hidden states.policy_sep_prior (
Bool
, defaultTrue
) – Flag determining whether the prior beliefs from the past are unconditioned on policy, or separated by /conditioned on the policy variable.**kwargs (keyword arguments) – Optional keyword arguments for the function
algos.mmp.run_mmp
- Returns
qs_seq_pi (
numpy.ndarray
of dtype object) – Posterior beliefs over hidden states for each policy. Nesting structure is policies, timepoints, factors, where e.g.qs_seq_pi[p][t][f]
stores the marginal belief about factorf
at timepointt
under policyp
.F (1D
numpy.ndarray
) – Vector of variational free energies for each policy
- pymdp.inference.update_posterior_states_full_factorized(A, mb_dict, B, B_factor_list, prev_obs, policies, prev_actions=None, prior=None, policy_sep_prior=True, **kwargs)
Update posterior over hidden states using marginal message passing
- Parameters
A (
numpy.ndarray
of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each elementA[m]
stores a numpy.ndarray
multidimensional array for observation modalitym
, whose entriesA[m][i, j, k, ...]
store the probability of observation leveli
given hidden state levelsj, k, ...
mb_dict (
Dict
) – Dictionary with two keys (A_factor_list
andA_modality_list
), that stores the factor indices that influence each modality (A_factor_list
) and the modality indices influenced by each factor (A_modality_list
).B (
numpy.ndarray
of dtype object) – Dynamics likelihood mapping or ‘transition model’, mapping from hidden states att
to hidden states att+1
, given some control stateu
. Each elementB[f]
of this object array stores a 3-D tensor for hidden state factorf
, whose entriesB[f][s, v, u]
store the probability of hidden state levels
at the current time, given hidden state levelv
and actionu
at the previous time.B_factor_list (
list
oflist
ofint
) – List of lists of hidden state factors each hidden state factor depends on. Each elementB_factor_list[i]
is a list of the factor indices that factor i’s dynamics depend on.prev_obs (
list
) – List of observations over time. Each observation in the list can be anint
, alist
of ints, atuple
of ints, a one-hot vector or an object array of one-hot vectors.policies (
list
of 2Dnumpy.ndarray
) – List that stores each policy inpolicies[p_idx]
. Shape ofpolicies[p_idx]
is(num_timesteps, num_factors)
where num_timesteps is the temporal depth of the policy andnum_factors
is the number of control factors.prior (
numpy.ndarray
of dtype object, defaultNone
) – If provided, this is a numpy.ndarray
of dtype object, with one sub-array per hidden state factor, that stores the prior beliefs about initial states. IfNone
, this defaults to a flat (uninformative) prior over hidden states.policy_sep_prior (
Bool
, defaultTrue
) – Flag determining whether the prior beliefs from the past are unconditioned on policy, or separated by /conditioned on the policy variable.**kwargs (keyword arguments) – Optional keyword arguments for the function
algos.mmp.run_mmp
- Returns
qs_seq_pi (
numpy.ndarray
of dtype object) – Posterior beliefs over hidden states for each policy. Nesting structure is policies, timepoints, factors, where e.g.qs_seq_pi[p][t][f]
stores the marginal belief about factorf
at timepointt
under policyp
.F (1D
numpy.ndarray
) – Vector of variational free energies for each policy
Control
The control.py
module contains the functions for performing inference of policies (sequences of control states) in POMDP generative models,
according to active inference.
- pymdp.control.backwards_induction(H, B, B_factor_list, threshold, depth)
Runs backwards induction to compute the probability of reaching a goal state H, given a transition model B.
- Parameters
H (
numpy.ndarray
of dtype object) – Prior over statesB (
numpy.ndarray
of dtype object) – Dynamics likelihood mapping or ‘transition model’, mapping from hidden states att
to hidden states att+1
, given some control stateu
. Each elementB[f]
of this object array stores a 3-D tensor for hidden state factorf
, whose entriesB[f][s, v, u]
store the probability of hidden state levels
at the current time, given hidden state levelv
and actionu
at the previous time.B_factor_list (
list
oflist
ofint
) – List of lists of hidden state factors each hidden state factor depends on. Each elementB_factor_list[i]
is a list of the factor indices that factor i’s dynamics depend on.threshold (
float
) – The threshold for pruning transitions that are below a certain probabilitydepth (
int
) – The temporal depth of the backward induction
- Returns
I – For each state factor, contains a 2D
numpy.ndarray
whose element i,j yields the probability of reaching the goal state backwards from state j after i steps.- Return type
numpy.ndarray
of dtype object
- pymdp.control.calc_expected_utility(qo_pi, C)
Computes the expected utility of a policy, using the observation distribution expected under that policy and a prior preference vector.
- Parameters
qo_pi (
list
ofnumpy.ndarray
of dtype object) – Predictive posterior beliefs over observations expected under the policy, whereqo_pi[t]
stores the beliefs about observations expected under the policy at timet
C (
numpy.ndarray
of dtype object) – Prior over observations or ‘prior preferences’, storing the “value” of each outcome in terms of relative log probabilities. This is softmaxed to form a proper probability distribution before being used to compute the expected utility.
- Returns
expected_util – Utility (reward) expected under the policy in question
- Return type
float
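A minimal sketch of the expected data structures (toy numbers; qo_pi is built by hand here rather than via get_expected_obs, and utils.obj_array / obj_array_zeros are assumed to behave as their names suggest):
import numpy as np
from pymdp import utils
from pymdp.control import calc_expected_utility

num_obs = [3]  # a single observation modality with 3 outcome levels

# prior preferences C: the agent prefers outcome 2 (relative log-probability scale)
C = utils.obj_array_zeros(num_obs)
C[0][2] = 3.0

# predictive posterior over observations for a 2-step policy:
# qo_pi[t][m] is the predicted outcome distribution for modality m at time t
qo_pi = [utils.obj_array(1) for _ in range(2)]
qo_pi[0][0] = np.array([0.7, 0.2, 0.1])  # mostly expects outcome 0 at t = 0
qo_pi[1][0] = np.array([0.1, 0.1, 0.8])  # mostly expects the preferred outcome at t = 1

print(calc_expected_utility(qo_pi, C))  # larger when predictions overlap with preferences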
- pymdp.control.calc_inductive_cost(qs, qs_pi, I, epsilon=0.001)
Computes the inductive cost of a state.
- Parameters
qs (
numpy.ndarray
of dtype object) – Marginal posterior beliefs over hidden states at a given timepoint.qs_pi (
list
ofnumpy.ndarray
of dtype object) – Predictive posterior beliefs over hidden states expected under the policy, whereqs_pi[t]
stores the beliefs about states expected under the policy at timet
I (
numpy.ndarray
of dtype object) – For each state factor, contains a 2Dnumpy.ndarray
whose element i,j yields the probability of reaching the goal state backwards from state j after i steps.
- Returns
inductive_cost – Cost of visiting this state, computed using backwards induction, under the policy in question
- Return type
float
- pymdp.control.calc_pA_info_gain(pA, qo_pi, qs_pi)
Compute expected Dirichlet information gain about parameters
pA
under a policy- Parameters
pA (
numpy.ndarray
of dtype object) – Dirichlet parameters over observation model (same shape asA
)qo_pi (
list
ofnumpy.ndarray
of dtype object) – Predictive posterior beliefs over observations expected under the policy, whereqo_pi[t]
stores the beliefs about observations expected under the policy at timet
qs_pi (
list
ofnumpy.ndarray
of dtype object) – Predictive posterior beliefs over hidden states expected under the policy, whereqs_pi[t]
stores the beliefs about hidden states expected under the policy at timet
- Returns
infogain_pA – Surprise (about Dirichlet parameters) expected under the policy in question
- Return type
float
- pymdp.control.calc_pA_info_gain_factorized(pA, qo_pi, qs_pi, A_factor_list)
Compute expected Dirichlet information gain about parameters
pA
under a policy. In this version of the function, we assume that the observation model is factorized, i.e. that each observation modality depends on a subset of the hidden state factors.- Parameters
pA (
numpy.ndarray
of dtype object) – Dirichlet parameters over observation model (same shape asA
)qo_pi (
list
ofnumpy.ndarray
of dtype object) – Predictive posterior beliefs over observations expected under the policy, whereqo_pi[t]
stores the beliefs about observations expected under the policy at timet
qs_pi (
list
ofnumpy.ndarray
of dtype object) – Predictive posterior beliefs over hidden states expected under the policy, whereqs_pi[t]
stores the beliefs about hidden states expected under the policy at timet
A_factor_list (
list
oflist
ofint
) – List of lists, whereA_factor_list[m]
is a list of the hidden state factor indices that observation modality with the indexm
depends on
- Returns
infogain_pA – Surprise (about Dirichlet parameters) expected under the policy in question
- Return type
float
- pymdp.control.calc_pB_info_gain(pB, qs_pi, qs_prev, policy)
Compute expected Dirichlet information gain about parameters
pB
under a given policy- Parameters
pB (
numpy.ndarray
of dtype object) – Dirichlet parameters over transition model (same shape asB
)qs_pi (
list
ofnumpy.ndarray
of dtype object) – Predictive posterior beliefs over hidden states expected under the policy, whereqs_pi[t]
stores the beliefs about hidden states expected under the policy at timet
qs_prev (
numpy.ndarray
of dtype object) – Posterior over hidden states at beginning of trajectory (before receiving observations)policy (2D
numpy.ndarray
) – Array that stores actions entailed by a policy over time. Shape is(num_timesteps, num_factors)
wherenum_timesteps
is the temporal depth of the policy andnum_factors
is the number of control factors.
- Returns
infogain_pB – Surprise (about dirichlet parameters) expected under the policy in question
- Return type
float
- pymdp.control.calc_pB_info_gain_interactions(pB, qs_pi, qs_prev, B_factor_list, policy)
Compute expected Dirichlet information gain about parameters
pB
under a given policy- Parameters
pB (
numpy.ndarray
of dtype object) – Dirichlet parameters over transition model (same shape asB
)qs_pi (
list
ofnumpy.ndarray
of dtype object) – Predictive posterior beliefs over hidden states expected under the policy, whereqs_pi[t]
stores the beliefs about hidden states expected under the policy at timet
qs_prev (
numpy.ndarray
of dtype object) – Posterior over hidden states at beginning of trajectory (before receiving observations)B_factor_list (
list
oflist
ofint
) – List of lists, whereB_factor_list[f]
is a list of the hidden state factor indices that hidden state factor with the indexf
depends onpolicy (2D
numpy.ndarray
) – Array that stores actions entailed by a policy over time. Shape is(num_timesteps, num_factors)
wherenum_timesteps
is the temporal depth of the policy andnum_factors
is the number of control factors.
- Returns
infogain_pB – Surprise (about dirichlet parameters) expected under the policy in question
- Return type
float
- pymdp.control.calc_states_info_gain(A, qs_pi)
Computes the Bayesian surprise, or information gain about hidden states, expected under a policy, using the observation model and the hidden state distribution expected under that policy.
- Parameters
A (
numpy.ndarray
of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each elementA[m]
stores a numpy.ndarray
multidimensional array for observation modalitym
, whose entriesA[m][i, j, k, ...]
store the probability of observation leveli
given hidden state levelsj, k, ...
qs_pi (
list
ofnumpy.ndarray
of dtype object) – Predictive posterior beliefs over hidden states expected under the policy, whereqs_pi[t]
stores the beliefs about hidden states expected under the policy at timet
- Returns
states_surprise – Bayesian surprise (about states) or salience expected under the policy in question
- Return type
float
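For illustration, a sketch that combines this function with get_expected_states (documented below) to score a candidate policy (toy dimensionalities; utils.obj_array_uniform is assumed to return flat categorical beliefs of the given sizes):
import numpy as np
from pymdp import utils
from pymdp.control import get_expected_states, calc_states_info_gain

num_obs = [4]
num_states = [3]
num_controls = [3]

A = utils.random_A_matrix(num_obs, num_states)       # random observation model
B = utils.random_B_matrix(num_states, num_controls)  # random transition model

qs = utils.obj_array_uniform(num_states)  # flat current posterior over states
policy = np.array([[0], [2]])             # a 2-step policy for the single control factor

qs_pi = get_expected_states(qs, B, policy)  # predicted states under the policy
print(calc_states_info_gain(A, qs_pi))      # expected information gain (salience)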
- pymdp.control.calc_states_info_gain_factorized(A, qs_pi, A_factor_list)
Computes the Bayesian surprise, or information gain about hidden states, expected under a policy, using the observation model and the hidden state distribution expected under that policy.
- Parameters
A (
numpy.ndarray
of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each elementA[m]
stores a numpy.ndarray
multidimensional array for observation modalitym
, whose entriesA[m][i, j, k, ...]
store the probability of observation leveli
given hidden state levelsj, k, ...
qs_pi (
list
ofnumpy.ndarray
of dtype object) – Predictive posterior beliefs over hidden states expected under the policy, whereqs_pi[t]
stores the beliefs about hidden states expected under the policy at timet
A_factor_list (
list
oflist
ofint
) – List of lists, whereA_factor_list[m]
is a list of the hidden state factor indices that observation modality with the indexm
depends on
- Returns
states_surprise – Bayesian surprise (about states) or salience expected under the policy in question
- Return type
float
- pymdp.control.construct_policies(num_states, num_controls=None, policy_len=1, control_fac_idx=None)
Generate a
list
of policies. The returned arraypolicies
is alist
that stores one policy per entry. A particular policy (policies[i]
) has shape(num_timesteps, num_factors)
wherenum_timesteps
is the temporal depth of the policy andnum_factors
is the number of control factors.- Parameters
num_states (
list
ofint
) –list
of the dimensionalities of each hidden state factornum_controls (
list
ofint
, defaultNone
) –list
of the dimensionalities of each control state factor. IfNone
, then is automatically computed as the dimensionality of each hidden state factor that is controllablepolicy_len (
int
, default 1) – temporal depth (“planning horizon”) of policiescontrol_fac_idx (
list
ofint
) –list
of indices of the hidden state factors that are controllable (i.e. those state factorsi
wherenum_controls[i] > 1
)
- Returns
policies –
list
that stores each policy as a 2D array inpolicies[p_idx]
. Shape ofpolicies[p_idx]
is(num_timesteps, num_factors)
wherenum_timesteps
is the temporal depth of the policy andnum_factors
is the number of control factors.- Return type
list
of 2Dnumpy.ndarray
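For example, a short sketch of the expected behavior (toy dimensionalities), enumerating all two-step policies when only the first of two hidden state factors is controllable:
from pymdp.control import construct_policies

num_states = [3, 2]    # two hidden state factors
num_controls = [3, 1]  # only the first factor is controllable

policies = construct_policies(num_states, num_controls=num_controls, policy_len=2)
print(len(policies))      # (3 * 1) ** 2 = 9 candidate policies
print(policies[0].shape)  # (policy_len, num_factors) = (2, 2)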
- pymdp.control.get_expected_obs(qs_pi, A)
Compute the expected observations under a policy, also known as the posterior predictive density over observations
- Parameters
qs_pi (
list
ofnumpy.ndarray
of dtype object) – Predictive posterior beliefs over hidden states expected under the policy, whereqs_pi[t]
stores the beliefs about hidden states expected under the policy at timet
A (
numpy.ndarray
of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each elementA[m]
stores a numpy.ndarray
multidimensional array for observation modalitym
, whose entriesA[m][i, j, k, ...]
store the probability of observation leveli
given hidden state levelsj, k, ...
- Returns
qo_pi – Predictive posterior beliefs over observations expected under the policy, where
qo_pi[t]
stores the beliefs about observations expected under the policy at timet
- Return type
list
ofnumpy.ndarray
of dtype object
- pymdp.control.get_expected_obs_factorized(qs_pi, A, A_factor_list)
Compute the expected observations under a policy, also known as the posterior predictive density over observations
- Parameters
qs_pi (
list
ofnumpy.ndarray
of dtype object) – Predictive posterior beliefs over hidden states expected under the policy, whereqs_pi[t]
stores the beliefs about hidden states expected under the policy at timet
A (
numpy.ndarray
of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each elementA[m]
stores a numpy.ndarray
multidimensional array for observation modalitym
, whose entriesA[m][i, j, k, ...]
store the probability of observation leveli
given hidden state levelsj, k, ...
A_factor_list (
list
oflist
ofint
) – List of lists of hidden state factor indices that each observation modality depends on. Each elementA_factor_list[i]
is a list of the factor indices that modality i’s observation model depends on.
- Returns
qo_pi – Predictive posterior beliefs over observations expected under the policy, where
qo_pi[t]
stores the beliefs about observations expected under the policy at timet
- Return type
list
ofnumpy.ndarray
of dtype object
- pymdp.control.get_expected_states(qs, B, policy)
Compute the expected states under a policy, also known as the posterior predictive density over states
- Parameters
qs (
numpy.ndarray
of dtype object) – Marginal posterior beliefs over hidden states at a given timepoint.B (
numpy.ndarray
of dtype object) – Dynamics likelihood mapping or ‘transition model’, mapping from hidden states att
to hidden states att+1
, given some control stateu
. Each elementB[f]
of this object array stores a 3-D tensor for hidden state factorf
, whose entriesB[f][s, v, u]
store the probability of hidden state levels
at the current time, given hidden state levelv
and actionu
at the previous time.policy (2D
numpy.ndarray
) – Array that stores actions entailed by a policy over time. Shape is(num_timesteps, num_factors)
wherenum_timesteps
is the temporal depth of the policy andnum_factors
is the number of control factors.
- Returns
qs_pi – Predictive posterior beliefs over hidden states expected under the policy, where
qs_pi[t]
stores the beliefs about hidden states expected under the policy at timet
- Return type
list
ofnumpy.ndarray
of dtype object
- pymdp.control.get_expected_states_interactions(qs, B, B_factor_list, policy)
Compute the expected states under a policy, also known as the posterior predictive density over states
- Parameters
qs (
numpy.ndarray
of dtype object) – Marginal posterior beliefs over hidden states at a given timepoint.B (
numpy.ndarray
of dtype object) – Dynamics likelihood mapping or ‘transition model’, mapping from hidden states att
to hidden states att+1
, given some control stateu
. Each elementB[f]
of this object array stores a 3-D tensor for hidden state factorf
, whose entriesB[f][s, v, u]
store the probability of hidden state levels
at the current time, given hidden state levelv
and actionu
at the previous time.B_factor_list (
list
oflist
ofint
) – List of lists of hidden state factors each hidden state factor depends on. Each elementB_factor_list[i]
is a list of the factor indices that factor i’s dynamics depend on.policy (2D
numpy.ndarray
) – Array that stores actions entailed by a policy over time. Shape is(num_timesteps, num_factors)
wherenum_timesteps
is the temporal depth of the policy andnum_factors
is the number of control factors.
- Returns
qs_pi – Predictive posterior beliefs over hidden states expected under the policy, where
qs_pi[t]
stores the beliefs about hidden states expected under the policy at timet
- Return type
list
ofnumpy.ndarray
of dtype object
- pymdp.control.get_num_controls_from_policies(policies)
Calculates the
list
of dimensionalities of control factors (num_controls
) from thelist
or array of policies. This assumes a policy space such that for each control factor, there is at least one policy that entails taking the action with the maximum index along that control factor.- Parameters
policies (
list
of 2Dnumpy.ndarray
) –list
that stores each policy as a 2D array inpolicies[p_idx]
. Shape ofpolicies[p_idx]
is(num_timesteps, num_factors)
wherenum_timesteps
is the temporal depth of the policy andnum_factors
is the number of control factors.- Returns
num_controls –
list
of the dimensionalities of each control state factor, computed here automatically from alist
of policies.- Return type
list
ofint
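A short sketch of the round trip with construct_policies (toy dimensionalities):
from pymdp.control import construct_policies, get_num_controls_from_policies

policies = construct_policies(num_states=[3, 2], num_controls=[3, 1], policy_len=2)
print(get_num_controls_from_policies(policies))  # recovers [3, 1]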
- pymdp.control.sample_action(q_pi, policies, num_controls, action_selection='deterministic', alpha=16.0)
Computes the marginal posterior over actions and then samples an action from it, one action per control factor.
- Parameters
q_pi (1D
numpy.ndarray
) – Posterior beliefs over policies, i.e. a vector containing one posterior probability per policy.policies (
list
of 2Dnumpy.ndarray
) –list
that stores each policy as a 2D array inpolicies[p_idx]
. Shape ofpolicies[p_idx]
is(num_timesteps, num_factors)
wherenum_timesteps
is the temporal depth of the policy andnum_factors
is the number of control factors.num_controls (
list
ofint
) –list
of the dimensionalities of each control state factor.action_selection (
str
, default “deterministic”) – String indicating whether the selected action is chosen as the maximum of the posterior over actions, or whether it’s sampled from the posterior marginal over actions.alpha (
float
, default 16.0) – Action selection precision – the inverse temperature of the softmax that is used to scale the action marginals before sampling. This is only used ifaction_selection
argument is “stochastic”
- Returns
selected_policy – Vector containing the indices of the actions for each control factor
- Return type
1D
numpy.ndarray
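A minimal sketch using a random posterior over one-step policies (toy dimensionalities; pymdp.maths.softmax is used to normalize the random vector):
import numpy as np
from pymdp.maths import softmax
from pymdp.control import construct_policies, get_num_controls_from_policies, sample_action

policies = construct_policies(num_states=[3], num_controls=[3], policy_len=1)
q_pi = softmax(np.random.randn(len(policies)))  # toy posterior over the 3 policies

num_controls = get_num_controls_from_policies(policies)
action = sample_action(q_pi, policies, num_controls, action_selection="deterministic")
print(action)  # one action index per control factor, e.g. [2.]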
- pymdp.control.sample_policy(q_pi, policies, num_controls, action_selection='deterministic', alpha=16.0)
Samples a policy from the posterior over policies, taking the action (per control factor) entailed by the first timestep of the selected policy.
- Parameters
q_pi (1D
numpy.ndarray
) – Posterior beliefs over policies, i.e. a vector containing one posterior probability per policy.policies (
list
of 2Dnumpy.ndarray
) –list
that stores each policy as a 2D array inpolicies[p_idx]
. Shape ofpolicies[p_idx]
is(num_timesteps, num_factors)
wherenum_timesteps
is the temporal depth of the policy andnum_factors
is the number of control factors.num_controls (
list
ofint
) –list
of the dimensionalities of each control state factor.action_selection (string, default "deterministic") – String indicating whether whether the selected policy is chosen as the maximum of the posterior over policies, or whether it’s sampled from the posterior over policies.
alpha (float, default 16.0) – Action selection precision – the inverse temperature of the softmax that is used to scale the policy posterior before sampling. This is only used if
action_selection
argument is “stochastic”
- Returns
selected_policy – Vector containing the indices of the actions for each control factor
- Return type
1D
numpy.ndarray
- pymdp.control.select_highest(options_array)
Selects the highest value among the provided ones. If the highest value occurs more than once (i.e. several entries are within 1e-5 of the maximum), a random choice is made among them. :param options_array: The array to examine :type options_array:
numpy.ndarray
- Returns
- Return type
The highest value in the given list
- pymdp.control.update_posterior_policies(qs, A, B, C, policies, use_utility=True, use_states_info_gain=True, use_param_info_gain=False, pA=None, pB=None, E=None, I=None, gamma=16.0)
Update posterior beliefs about policies by computing expected free energy of each policy and integrating that with the prior over policies
E
. This is intended to be used in conjunction with theupdate_posterior_states
method of theinference
module, since only the posterior about the hidden states at the current timestepqs
is assumed to be provided, unconditional on policies. The predictive posterior over hidden states under all policies Q(s, pi) is computed using the starting posterior about states at the current timestepqs
and the generative model (e.g.A
,B
,C
)- Parameters
qs (
numpy.ndarray
of dtype object) – Marginal posterior beliefs over hidden states at current timepoint (unconditioned on policies)A (
numpy.ndarray
of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each elementA[m]
stores a numpy.ndarray
multidimensional array for observation modalitym
, whose entriesA[m][i, j, k, ...]
store the probability of observation leveli
given hidden state levelsj, k, ...
B (
numpy.ndarray
of dtype object) – Dynamics likelihood mapping or ‘transition model’, mapping from hidden states att
to hidden states att+1
, given some control stateu
. Each elementB[f]
of this object array stores a 3-D tensor for hidden state factorf
, whose entriesB[f][s, v, u]
store the probability of hidden state levels
at the current time, given hidden state levelv
and actionu
at the previous time.C (
numpy.ndarray
of dtype object) – Prior over observations or ‘prior preferences’, storing the “value” of each outcome in terms of relative log probabilities. This is softmaxed to form a proper probability distribution before being used to compute the expected utility term of the expected free energy.policies (
list
of 2Dnumpy.ndarray
) –list
that stores each policy inpolicies[p_idx]
. Shape ofpolicies[p_idx]
is(num_timesteps, num_factors)
where num_timesteps is the temporal depth of the policy andnum_factors
is the number of control factors.use_utility (
Bool
, defaultTrue
) – Boolean flag that determines whether expected utility should be incorporated into computation of EFE.use_states_info_gain (
Bool
, defaultTrue
) – Boolean flag that determines whether state epistemic value (info gain about hidden states) should be incorporated into computation of EFE.use_param_info_gain (
Bool
, defaultFalse
) – Boolean flag that determines whether parameter epistemic value (info gain about generative model parameters) should be incorporated into computation of EFE.pA (
numpy.ndarray
of dtype object, optional) – Dirichlet parameters over observation model (same shape asA
)pB (
numpy.ndarray
of dtype object, optional) – Dirichlet parameters over transition model (same shape asB
)E (1D
numpy.ndarray
, optional) – Vector of prior probabilities of each policy (what’s referred to in the active inference literature as “habits”)I (
numpy.ndarray
of dtype object) – For each state factor, contains a 2Dnumpy.ndarray
whose element i,j yields the probability of reaching the goal state backwards from state j after i steps.gamma (float, default 16.0) – Prior precision over policies, scales the contribution of the expected free energy to the posterior over policies
- Returns
q_pi (1D
numpy.ndarray
) – Posterior beliefs over policies, i.e. a vector containing one posterior probability per policy.G (1D
numpy.ndarray
) – Negative expected free energies of each policy, i.e. a vector containing one negative expected free energy per policy.
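A sketch of the intended usage pattern, pairing this function with update_posterior_states and construct_policies (toy dimensionalities; random generative model arrays built with the utils helpers assumed above):
from pymdp import utils
from pymdp.inference import update_posterior_states
from pymdp.control import construct_policies, update_posterior_policies

num_obs = [3]
num_states = [4]
num_controls = [4]

A = utils.random_A_matrix(num_obs, num_states)
B = utils.random_B_matrix(num_states, num_controls)

C = utils.obj_array_zeros(num_obs)
C[0][2] = 2.0  # prefer outcome 2 of the single modality

obs = 0
qs = update_posterior_states(A, obs)  # policy-unconditioned posterior over states
policies = construct_policies(num_states, num_controls=num_controls, policy_len=1)

q_pi, G = update_posterior_policies(qs, A, B, C, policies)
print(q_pi)  # posterior over the 4 one-step policies
print(G)     # negative expected free energy of each policy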
- pymdp.control.update_posterior_policies_factorized(qs, A, B, C, A_factor_list, B_factor_list, policies, use_utility=True, use_states_info_gain=True, use_param_info_gain=False, pA=None, pB=None, E=None, I=None, gamma=16.0)
Update posterior beliefs about policies by computing expected free energy of each policy and integrating that with the prior over policies
E
. This is intended to be used in conjunction with theupdate_posterior_states
method of theinference
module, since only the posterior about the hidden states at the current timestepqs
is assumed to be provided, unconditional on policies. The predictive posterior over hidden states under all policies Q(s, pi) is computed using the starting posterior about states at the current timestepqs
and the generative model (e.g.A
,B
,C
)- Parameters
qs (
numpy.ndarray
of dtype object) – Marginal posterior beliefs over hidden states at current timepoint (unconditioned on policies)A (
numpy.ndarray
of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each elementA[m]
stores a numpy.ndarray
multidimensional array for observation modalitym
, whose entriesA[m][i, j, k, ...]
store the probability of observation leveli
given hidden state levelsj, k, ...
B (
numpy.ndarray
of dtype object) – Dynamics likelihood mapping or ‘transition model’, mapping from hidden states att
to hidden states att+1
, given some control stateu
. Each elementB[f]
of this object array stores a 3-D tensor for hidden state factorf
, whose entriesB[f][s, v, u]
store the probability of hidden state levels
at the current time, given hidden state levelv
and actionu
at the previous time.C (
numpy.ndarray
of dtype object) – Prior over observations or ‘prior preferences’, storing the “value” of each outcome in terms of relative log probabilities. This is softmaxed to form a proper probability distribution before being used to compute the expected utility term of the expected free energy.A_factor_list (
list
oflist``s of ``int
) –list
that stores the hidden state factor indices that each observation modality depends on. For example, if A_factor_list[m] = [0, 1]
, then observation modalitym
depends on hidden state factors 0 and 1.B_factor_list (
list
oflist``s of ``int
) –list
that stores the hidden state factor indices that each hidden state factor depends on. For example, if B_factor_list[f] = [0, 1]
, then the transitions in hidden state factorf
depend on hidden state factors 0 and 1.policies (
list
of 2Dnumpy.ndarray
) –list
that stores each policy inpolicies[p_idx]
. Shape ofpolicies[p_idx]
is(num_timesteps, num_factors)
where num_timesteps is the temporal depth of the policy andnum_factors
is the number of control factors.use_utility (
Bool
, defaultTrue
) – Boolean flag that determines whether expected utility should be incorporated into computation of EFE.use_states_info_gain (
Bool
, defaultTrue
) – Boolean flag that determines whether state epistemic value (info gain about hidden states) should be incorporated into computation of EFE.use_param_info_gain (
Bool
, defaultFalse
) – Boolean flag that determines whether parameter epistemic value (info gain about generative model parameters) should be incorporated into computation of EFE.pA (
numpy.ndarray
of dtype object, optional) – Dirichlet parameters over observation model (same shape asA
)pB (
numpy.ndarray
of dtype object, optional) – Dirichlet parameters over transition model (same shape asB
)E (1D
numpy.ndarray
, optional) – Vector of prior probabilities of each policy (what’s referred to in the active inference literature as “habits”)I (
numpy.ndarray
of dtype object) – For each state factor, contains a 2Dnumpy.ndarray
whose element i,j yields the probability of reaching the goal state backwards from state j after i steps.gamma (float, default 16.0) – Prior precision over policies, scales the contribution of the expected free energy to the posterior over policies
- Returns
q_pi (1D
numpy.ndarray
) – Posterior beliefs over policies, i.e. a vector containing one posterior probability per policy.G (1D
numpy.ndarray
) – Negative expected free energies of each policy, i.e. a vector containing one negative expected free energy per policy.
- pymdp.control.update_posterior_policies_full(qs_seq_pi, A, B, C, policies, use_utility=True, use_states_info_gain=True, use_param_info_gain=False, prior=None, pA=None, pB=None, F=None, E=None, I=None, gamma=16.0)
Update posterior beliefs about policies by computing expected free energy of each policy and integrating that with the variational free energy of policies
F
and prior over policiesE
. This is intended to be used in conjunction with theupdate_posterior_states_full
method ofinference.py
, since the full posterior over future timesteps, under all policies, is assumed to be provided in the input arrayqs_seq_pi
.- Parameters
qs_seq_pi (
numpy.ndarray
of dtype object) – Posterior beliefs over hidden states for each policy. Nesting structure is policies, timepoints, factors, where e.g.qs_seq_pi[p][t][f]
stores the marginal belief about factorf
at timepointt
under policyp
.A (
numpy.ndarray
of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each elementA[m]
stores a numpy.ndarray
multidimensional array for observation modalitym
, whose entriesA[m][i, j, k, ...]
store the probability of observation leveli
given hidden state levelsj, k, ...
B (
numpy.ndarray
of dtype object) – Dynamics likelihood mapping or ‘transition model’, mapping from hidden states att
to hidden states att+1
, given some control stateu
. Each elementB[f]
of this object array stores a 3-D tensor for hidden state factorf
, whose entriesB[f][s, v, u]
store the probability of hidden state levels
at the current time, given hidden state levelv
and actionu
at the previous time.C (
numpy.ndarray
of dtype object) – Prior over observations or ‘prior preferences’, storing the “value” of each outcome in terms of relative log probabilities. This is softmaxed to form a proper probability distribution before being used to compute the expected utility term of the expected free energy.policies (
list
of 2Dnumpy.ndarray
) –list
that stores each policy inpolicies[p_idx]
. Shape ofpolicies[p_idx]
is(num_timesteps, num_factors)
where num_timesteps is the temporal depth of the policy andnum_factors
is the number of control factors.use_utility (
Bool
, defaultTrue
) – Boolean flag that determines whether expected utility should be incorporated into computation of EFE.use_states_info_gain (
Bool
, defaultTrue
) – Boolean flag that determines whether state epistemic value (info gain about hidden states) should be incorporated into computation of EFE.use_param_info_gain (
Bool
, defaultFalse
) – Boolean flag that determines whether parameter epistemic value (info gain about generative model parameters) should be incorporated into computation of EFE.prior (
numpy.ndarray
of dtype object, defaultNone
) – If provided, this is anumpy
object array with one sub-array per hidden state factor, that stores the prior beliefs about initial states. IfNone
, this defaults to a flat (uninformative) prior over hidden states.pA (
numpy.ndarray
of dtype object, defaultNone
) – Dirichlet parameters over observation model (same shape asA
)pB (
numpy.ndarray
of dtype object, defaultNone
) – Dirichlet parameters over transition model (same shape asB
)F (1D
numpy.ndarray
, defaultNone
) – Vector of variational free energies for each policyE (1D
numpy.ndarray
, defaultNone
) – Vector of prior probabilities of each policy (what’s referred to in the active inference literature as “habits”). IfNone
, this defaults to a flat (uninformative) prior over policies.I (
numpy.ndarray
of dtype object) – For each state factor, contains a 2Dnumpy.ndarray
whose element i,j yields the probability of reaching the goal state backwards from state j after i steps.gamma (
float
, default 16.0) – Prior precision over policies, scales the contribution of the expected free energy to the posterior over policies
- Returns
q_pi (1D
numpy.ndarray
) – Posterior beliefs over policies, i.e. a vector containing one posterior probability per policy.G (1D
numpy.ndarray
) – Negative expected free energies of each policy, i.e. a vector containing one negative expected free energy per policy.
- pymdp.control.update_posterior_policies_full_factorized(qs_seq_pi, A, B, C, A_factor_list, B_factor_list, policies, use_utility=True, use_states_info_gain=True, use_param_info_gain=False, prior=None, pA=None, pB=None, F=None, E=None, I=None, gamma=16.0)
Update posterior beliefs about policies by computing expected free energy of each policy and integrating that with the variational free energy of policies
F
and prior over policiesE
. This is intended to be used in conjunction with theupdate_posterior_states_full
method ofinference.py
, since the full posterior over future timesteps, under all policies, is assumed to be provided in the input arrayqs_seq_pi
.- Parameters
qs_seq_pi (
numpy.ndarray
of dtype object) – Posterior beliefs over hidden states for each policy. Nesting structure is policies, timepoints, factors, where e.g.qs_seq_pi[p][t][f]
stores the marginal belief about factorf
at timepointt
under policyp
.A (
numpy.ndarray
of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each elementA[m]
stores a numpy.ndarray
multidimensional array for observation modalitym
, whose entriesA[m][i, j, k, ...]
store the probability of observation leveli
given hidden state levelsj, k, ...
B (
numpy.ndarray
of dtype object) – Dynamics likelihood mapping or ‘transition model’, mapping from hidden states att
to hidden states att+1
, given some control stateu
. Each elementB[f]
of this object array stores a 3-D tensor for hidden state factorf
, whose entriesB[f][s, v, u]
store the probability of hidden state levels
at the current time, given hidden state levelv
and actionu
at the previous time.C (
numpy.ndarray
of dtype object) – Prior over observations or ‘prior preferences’, storing the “value” of each outcome in terms of relative log probabilities. This is softmaxed to form a proper probability distribution before being used to compute the expected utility term of the expected free energy.A_factor_list (
list
oflist``s of ``int
) –list
that stores the hidden state factor indices that each observation modality depends on. For example, if A_factor_list[m] = [0, 1]
, then observation modalitym
depends on hidden state factors 0 and 1.B_factor_list (
list
oflist``s of ``int
) –list
that stores the hidden state factor indices that each hidden state factor depends on. For example, if B_factor_list[f] = [0, 1]
, then the transitions in hidden state factorf
depend on hidden state factors 0 and 1.policies (
list
of 2Dnumpy.ndarray
) –list
that stores each policy inpolicies[p_idx]
. Shape ofpolicies[p_idx]
is(num_timesteps, num_factors)
where num_timesteps is the temporal depth of the policy andnum_factors
is the number of control factors.use_utility (
Bool
, defaultTrue
) – Boolean flag that determines whether expected utility should be incorporated into computation of EFE.use_states_info_gain (
Bool
, defaultTrue
) – Boolean flag that determines whether state epistemic value (info gain about hidden states) should be incorporated into computation of EFE.use_param_info_gain (
Bool
, defaultFalse
) – Boolean flag that determines whether parameter epistemic value (info gain about generative model parameters) should be incorporated into computation of EFE.prior (
numpy.ndarray
of dtype object, defaultNone
) – If provided, this is anumpy
object array with one sub-array per hidden state factor, that stores the prior beliefs about initial states. IfNone
, this defaults to a flat (uninformative) prior over hidden states.pA (
numpy.ndarray
of dtype object, defaultNone
) – Dirichlet parameters over observation model (same shape asA
)pB (
numpy.ndarray
of dtype object, defaultNone
) – Dirichlet parameters over transition model (same shape asB
)F (1D
numpy.ndarray
, defaultNone
) – Vector of variational free energies for each policyE (1D
numpy.ndarray
, defaultNone
) – Vector of prior probabilities of each policy (what’s referred to in the active inference literature as “habits”). IfNone
, this defaults to a flat (uninformative) prior over policies.I (
numpy.ndarray
of dtype object) – For each state factor, contains a 2Dnumpy.ndarray
whose element i,j yields the probability of reaching the goal state backwards from state j after i steps.gamma (
float
, default 16.0) – Prior precision over policies, scales the contribution of the expected free energy to the posterior over policies
- Returns
q_pi (1D numpy.ndarray) – Posterior beliefs over policies, i.e. a vector containing one posterior probability per policy.
G (1D numpy.ndarray) – Negative expected free energies of each policy, i.e. a vector containing one negative expected free energy per policy.
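To make the expected nesting of the qs_seq_pi argument concrete, here is a minimal sketch (not taken from the library, sizes invented for illustration) that builds a placeholder object array with the documented policies -> timepoints -> factors structure, filled with flat beliefs:

import numpy as np

num_policies, num_timesteps = 2, 3
num_states = [4]  # one hidden state factor with 4 levels

qs_seq_pi = np.empty(num_policies, dtype=object)
for p in range(num_policies):
    qs_seq_pi[p] = np.empty(num_timesteps, dtype=object)
    for t in range(num_timesteps):
        qs_seq_pi[p][t] = np.empty(len(num_states), dtype=object)
        for f, ns in enumerate(num_states):
            qs_seq_pi[p][t][f] = np.ones(ns) / ns  # flat marginal belief over factor f

# qs_seq_pi[p][t][f] is the marginal belief about factor f at timepoint t under policy p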
Learning
The learning.py
module contains the functions for updating parameters of Dirichlet posteriors (that parameterise categorical priors and likelihoods) in POMDP generative models.
- pymdp.learning.update_obs_likelihood_dirichlet(pA, A, obs, qs, lr=1.0, modalities='all')
Update Dirichlet parameters of the observation likelihood distribution.
- Parameters
pA (numpy.ndarray of dtype object) – Prior Dirichlet parameters over the observation model (same shape as A).
A (numpy.ndarray of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each element A[m] of this object array stores a numpy.ndarray multidimensional array for observation modality m, whose entries A[m][i, j, k, ...] store the probability of observation level i given hidden state levels j, k, ...
obs (1D numpy.ndarray, numpy.ndarray of dtype object, int or tuple) – The observation (generated by the environment). If single modality, this can be a 1D numpy.ndarray (one-hot vector representation) or an int (observation index). If multi-modality, this can be a numpy.ndarray of dtype object whose entries are 1D one-hot vectors, or a tuple of ints.
qs (1D numpy.ndarray or numpy.ndarray of dtype object, default None) – Marginal posterior beliefs over hidden states at current timepoint.
lr (float, default 1.0) – Learning rate, scale of the Dirichlet pseudo-count update.
modalities (list, default “all”) – Indices (ranging from 0 to n_modalities - 1) of the observation modalities to include in learning. Defaults to “all”, meaning that modality-specific sub-arrays of pA are all updated using the corresponding observations.
- Returns
qA – Posterior Dirichlet parameters over the observation model (same shape as A), after having updated it with observations.
- Return type
numpy.ndarray of dtype object
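As a concrete illustration, here is a minimal sketch (assuming a single observation modality and a single hidden state factor, with made-up numbers) of how the arguments described above can be assembled and passed to this function:

import numpy as np
from pymdp import learning

# Single-modality observation model P(o|s): 2 observation levels x 2 hidden state levels
A = np.empty(1, dtype=object)
A[0] = np.array([[0.8, 0.2],
                 [0.2, 0.8]])

# Flat Dirichlet prior counts over the observation model (same shape as A)
pA = np.empty(1, dtype=object)
pA[0] = np.ones_like(A[0])

# Posterior beliefs over the (single) hidden state factor at the current timepoint
qs = np.empty(1, dtype=object)
qs[0] = np.array([0.9, 0.1])

obs = 0  # index of the observation that was actually seen

qA = learning.update_obs_likelihood_dirichlet(pA, A, obs, qs, lr=1.0)

Conceptually, the update adds learning-rate-scaled pseudo-counts to pA for the observed outcome, weighted by the posterior beliefs in qs.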
- pymdp.learning.update_obs_likelihood_dirichlet_factorized(pA, A, obs, qs, A_factor_list, lr=1.0, modalities='all')
Update Dirichlet parameters of the observation likelihood distribution, in a case where the observation model is reduced (factorized) and only represents the conditional dependencies between the observation modalities and particular hidden state factors (whose indices are specified in each modality-specific entry of A_factor_list).
- Parameters
pA (numpy.ndarray of dtype object) – Prior Dirichlet parameters over the observation model (same shape as A).
A (numpy.ndarray of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each element A[m] of this object array stores a numpy.ndarray multidimensional array for observation modality m, whose entries A[m][i, j, k, ...] store the probability of observation level i given hidden state levels j, k, ...
obs (1D numpy.ndarray, numpy.ndarray of dtype object, int or tuple) – The observation (generated by the environment). If single modality, this can be a 1D numpy.ndarray (one-hot vector representation) or an int (observation index). If multi-modality, this can be a numpy.ndarray of dtype object whose entries are 1D one-hot vectors, or a tuple of ints.
qs (1D numpy.ndarray or numpy.ndarray of dtype object, default None) – Marginal posterior beliefs over hidden states at current timepoint.
A_factor_list (list of list of int) – List of lists, where each list with index m contains the indices of the hidden state factors that observation modality m depends on.
lr (float, default 1.0) – Learning rate, scale of the Dirichlet pseudo-count update.
modalities (list, default “all”) – Indices (ranging from 0 to n_modalities - 1) of the observation modalities to include in learning. Defaults to “all”, meaning that modality-specific sub-arrays of pA are all updated using the corresponding observations.
- Returns
qA – Posterior Dirichlet parameters over the observation model (same shape as A), after having updated it with observations.
- Return type
numpy.ndarray of dtype object
- pymdp.learning.update_state_likelihood_dirichlet(pB, B, actions, qs, qs_prev, lr=1.0, factors='all')
Update Dirichlet parameters of the transition distribution.
- Parameters
pB (numpy.ndarray of dtype object) – Prior Dirichlet parameters over the transition model (same shape as B).
B (numpy.ndarray of dtype object) – Dynamics likelihood mapping or ‘transition model’, mapping from hidden states at t to hidden states at t+1, given some control state u. Each element B[f] of this object array stores a 3-D tensor for hidden state factor f, whose entries B[f][s, v, u] store the probability of hidden state level s at the current time, given hidden state level v and action u at the previous time.
actions (1D numpy.ndarray) – A vector with length equal to the number of control factors, where each element contains the index of the action (for that control factor) performed at a given timestep.
qs (1D numpy.ndarray or numpy.ndarray of dtype object) – Marginal posterior beliefs over hidden states at current timepoint.
qs_prev (1D numpy.ndarray or numpy.ndarray of dtype object) – Marginal posterior beliefs over hidden states at previous timepoint.
lr (float, default 1.0) – Learning rate, scale of the Dirichlet pseudo-count update.
factors (list, default “all”) – Indices (ranging from 0 to n_factors - 1) of the hidden state factors to include in learning. Defaults to “all”, meaning that factor-specific sub-arrays of pB are all updated using the corresponding hidden state distributions and actions.
- Returns
qB – Posterior Dirichlet parameters over the transition model (same shape as B), after having updated it with state beliefs and actions.
- Return type
numpy.ndarray of dtype object
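The following is a minimal, hypothetical sketch of a transition-model update for a single hidden state factor with two levels and two actions; all numbers are made up for illustration:

import numpy as np
from pymdp import learning

# Transition model B[f][s, v, u]: next state s, previous state v, action u
B = np.empty(1, dtype=object)
B[0] = np.zeros((2, 2, 2))
B[0][:, :, 0] = np.eye(2)                # action 0: states persist
B[0][:, :, 1] = np.array([[0.0, 1.0],
                          [1.0, 0.0]])   # action 1: states switch

# Flat Dirichlet prior counts over the transition model (same shape as B)
pB = np.empty(1, dtype=object)
pB[0] = np.ones_like(B[0])

# Beliefs at the previous and current timepoints, and the action that was taken
qs_prev = np.empty(1, dtype=object); qs_prev[0] = np.array([0.9, 0.1])
qs      = np.empty(1, dtype=object); qs[0]      = np.array([0.1, 0.9])
actions = np.array([1])  # action index for the (single) control factor

qB = learning.update_state_likelihood_dirichlet(pB, B, actions, qs, qs_prev, lr=1.0)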
- pymdp.learning.update_state_likelihood_dirichlet_interactions(pB, B, actions, qs, qs_prev, B_factor_list, lr=1.0, factors='all')
Update Dirichlet parameters of the transition distribution, in the case when ‘interacting’ hidden state factors are present, i.e. the dynamics of a given hidden state factor f are no longer independent of the dynamics of other hidden state factors.
- Parameters
pB (numpy.ndarray of dtype object) – Prior Dirichlet parameters over the transition model (same shape as B).
B (numpy.ndarray of dtype object) – Dynamics likelihood mapping or ‘transition model’, mapping from hidden states at t to hidden states at t+1, given some control state u. Each element B[f] of this object array stores a 3-D tensor for hidden state factor f, whose entries B[f][s, v, u] store the probability of hidden state level s at the current time, given hidden state level v and action u at the previous time.
actions (1D numpy.ndarray) – A vector with length equal to the number of control factors, where each element contains the index of the action (for that control factor) performed at a given timestep.
qs (1D numpy.ndarray or numpy.ndarray of dtype object) – Marginal posterior beliefs over hidden states at current timepoint.
qs_prev (1D numpy.ndarray or numpy.ndarray of dtype object) – Marginal posterior beliefs over hidden states at previous timepoint.
B_factor_list (list of list of int) – A list of lists, where each element B_factor_list[f] is a list of the indices of hidden state factors that are needed to predict the dynamics of hidden state factor f.
lr (float, default 1.0) – Learning rate, scale of the Dirichlet pseudo-count update.
factors (list, default “all”) – Indices (ranging from 0 to n_factors - 1) of the hidden state factors to include in learning. Defaults to “all”, meaning that factor-specific sub-arrays of pB are all updated using the corresponding hidden state distributions and actions.
- Returns
qB – Posterior Dirichlet parameters over the transition model (same shape as B), after having updated it with state beliefs and actions.
- Return type
numpy.ndarray of dtype object
- pymdp.learning.update_state_prior_dirichlet(pD, qs, lr=1.0, factors='all')
Update Dirichlet parameters of the initial hidden state distribution (prior beliefs about hidden states at the beginning of the inference window).
- Parameters
pD (numpy.ndarray of dtype object) – Prior Dirichlet parameters over the initial hidden state prior (same shape as qs).
qs (1D numpy.ndarray or numpy.ndarray of dtype object) – Marginal posterior beliefs over hidden states at current timepoint.
lr (float, default 1.0) – Learning rate, scale of the Dirichlet pseudo-count update.
factors (list, default “all”) – Indices (ranging from 0 to n_factors - 1) of the hidden state factors to include in learning. Defaults to “all”, meaning that factor-specific sub-vectors of pD are all updated using the corresponding hidden state distributions.
- Returns
qD – Posterior Dirichlet parameters over the initial hidden state prior (same shape as qs), after having updated it with state beliefs.
- Return type
numpy.ndarray of dtype object
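For completeness, here is a correspondingly minimal sketch for the initial-state prior update, again assuming a single hidden state factor and made-up beliefs:

import numpy as np
from pymdp import learning

pD = np.empty(1, dtype=object)
pD[0] = np.ones(2)              # flat Dirichlet prior counts over the 2 state levels

qs = np.empty(1, dtype=object)
qs[0] = np.array([0.75, 0.25])  # beliefs about hidden states at the start of the window

qD = learning.update_state_prior_dirichlet(pD, qs, lr=1.0)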
Algos
The algos.py
library contains the functions for implementing message passing algorithms for variational inference on POMDP generative models
Sub-libraries
FPI (Fixed Point Iteration)
- pymdp.algos.fpi.run_vanilla_fpi(A, obs, num_obs, num_states, prior=None, num_iter=10, dF=1.0, dF_tol=0.001, compute_vfe=True)
Update marginal posterior beliefs over hidden states using mean-field variational inference, via fixed point iteration.
- Parameters
A (numpy.ndarray of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each element A[m] of this object array stores a np.ndarray multidimensional array for observation modality m, whose entries A[m][i, j, k, ...] store the probability of observation level i given hidden state levels j, k, ...
obs (numpy 1D array or numpy ndarray of dtype object) – The observation (generated by the environment). If single modality, this should be a 1D np.ndarray (one-hot vector representation). If multi-modality, this should be a np.ndarray of dtype object whose entries are 1D one-hot vectors.
num_obs (list of ints) – List of dimensionalities of each observation modality
num_states (list of ints) – List of dimensionalities of each hidden state factor
prior (numpy ndarray of dtype object, default None) – Prior over hidden states. If absent, prior is set to be the log uniform distribution over hidden states (identical to the initialisation of the posterior)
num_iter (int, default 10) – Number of variational fixed-point iterations to run until convergence.
dF (float, default 1.0) – Initial free energy gradient (dF/dt) before updating in the course of gradient descent.
dF_tol (float, default 0.001) – Threshold value of the time derivative of the variational free energy (dF/dt), to be checked at each iteration. If dF <= dF_tol, the iterations are halted pre-emptively and the final marginal posterior belief(s) is(are) returned
compute_vfe (bool, default True) – Whether to compute the variational free energy at each iteration. If False, the function runs through all variational iterations.
- Returns
qs – Marginal posterior beliefs over hidden states at current timepoint
- Return type
numpy 1D array, numpy ndarray of dtype object, optional
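A minimal sketch of calling this function directly, assuming one observation modality (2 outcome levels) and one hidden state factor (2 levels), following the argument formats described above; the import path follows the documented module path pymdp.algos.fpi:

import numpy as np
from pymdp.algos import fpi

# Single-modality observation model, stored in an object array
A = np.empty(1, dtype=object)
A[0] = np.array([[0.8, 0.2],
                 [0.2, 0.8]])

obs = np.array([1.0, 0.0])  # one-hot representation of observing outcome 0
num_obs = [2]               # dimensionality of the single observation modality
num_states = [2]            # dimensionality of the single hidden state factor

qs = fpi.run_vanilla_fpi(A, obs, num_obs, num_states)  # uniform prior by default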
- pymdp.algos.fpi.run_vanilla_fpi_factorized(A, obs, num_obs, num_states, mb_dict, prior=None, num_iter=10, dF=1.0, dF_tol=0.001, compute_vfe=True)
Update marginal posterior beliefs over hidden states using mean-field variational inference, via fixed point iteration.
- Parameters
A (numpy.ndarray of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each element A[m] of this object array stores a np.ndarray multidimensional array for observation modality m, whose entries A[m][i, j, k, ...] store the probability of observation level i given hidden state levels j, k, ...
obs (numpy 1D array or numpy ndarray of dtype object) – The observation (generated by the environment). If single modality, this should be a 1D np.ndarray (one-hot vector representation). If multi-modality, this should be a np.ndarray of dtype object whose entries are 1D one-hot vectors.
num_obs (list of ints) – List of dimensionalities of each observation modality
num_states (list of ints) – List of dimensionalities of each hidden state factor
mb_dict (Dict) – Dictionary with two keys (A_factor_list and A_modality_list), that stores the factor indices that influence each modality (A_factor_list) and the modality indices influenced by each factor (A_modality_list).
prior (numpy ndarray of dtype object, default None) – Prior over hidden states. If absent, prior is set to be the log uniform distribution over hidden states (identical to the initialisation of the posterior)
num_iter (int, default 10) – Number of variational fixed-point iterations to run until convergence.
dF (float, default 1.0) – Initial free energy gradient (dF/dt) before updating in the course of gradient descent.
dF_tol (float, default 0.001) – Threshold value of the time derivative of the variational free energy (dF/dt), to be checked at each iteration. If dF <= dF_tol, the iterations are halted pre-emptively and the final marginal posterior belief(s) is(are) returned
compute_vfe (bool, default True) – Whether to compute the variational free energy at each iteration. If False, the function runs through all variational iterations.
- Returns
qs – Marginal posterior beliefs over hidden states at current timepoint
- Return type
numpy 1D array, numpy ndarray of dtype object, optional
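The mb_dict argument is a plain dictionary; the sketch below shows what it might look like for a hypothetical model with two observation modalities and two hidden state factors (the indices are invented purely for illustration):

# Modality 0 depends only on factor 0; modality 1 depends on factors 0 and 1.
# Correspondingly, factor 0 influences both modalities, while factor 1 influences only modality 1.
mb_dict = {
    "A_factor_list": [[0], [0, 1]],
    "A_modality_list": [[0, 1], [1]],
}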
MMP (Marginal Message Passing)
- pymdp.algos.mmp.run_mmp(lh_seq, B, policy, prev_actions=None, prior=None, num_iter=10, grad_descent=True, tau=0.25, last_timestep=False)
Marginal message passing scheme for updating marginal posterior beliefs about hidden states over time, conditioned on a particular policy.
- Parameters
lh_seq (numpy.ndarray of dtype object) – Log likelihoods of hidden states under a sequence of observations over time. This is assumed to already be log-transformed. Each lh_seq[t] contains the log likelihood of hidden states for a particular observation at time t.
B (numpy.ndarray of dtype object) – Dynamics likelihood mapping or ‘transition model’, mapping from hidden states at t to hidden states at t+1, given some control state u. Each element B[f] of this object array stores a 3-D tensor for hidden state factor f, whose entries B[f][s, v, u] store the probability of hidden state level s at the current time, given hidden state level v and action u at the previous time.
policy (2D numpy.ndarray) – Matrix of shape (policy_len, num_control_factors) that indicates the indices of each action (control state index) upon timestep t and control factor f in the element policy[t, f], for a given policy.
prev_actions (numpy.ndarray, default None) – If provided, should be a matrix of previous actions of shape (infer_len, num_control_factors) that indicates the indices of each action (control state index) taken in the past (up until the current timestep).
prior (numpy.ndarray of dtype object, default None) – If provided, the prior beliefs about initial states (at t = 0, relative to infer_len). If None, this defaults to a flat (uninformative) prior over hidden states.
num_iter (int, default 10) – Number of variational iterations.
grad_descent (bool, default True) – Flag for whether to use gradient descent (free energy gradient updates) instead of the fixed-point solution for the posterior beliefs.
tau (float, default 0.25) – Decay constant for use in the grad_descent version. Tunes the size of the gradient descent updates to the posterior.
last_timestep (bool, default False) – Flag for whether we are at the last timestep of belief updating.
- Returns
qs_seq (numpy.ndarray of dtype object) – Posterior beliefs over hidden states under the policy. Nesting structure is timepoints, factors, where e.g. qs_seq[t][f] stores the marginal belief about factor f at timepoint t under the policy in question.
F (float) – Variational free energy of the policy.
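To illustrate the structure of the lh_seq argument only (not the full set of inputs to this function), here is a sketch that builds log likelihoods over a single hidden state factor from a single-modality likelihood array and a short sequence of observation indices; all values are made up:

import numpy as np

A_single = np.array([[0.8, 0.2],
                     [0.2, 0.8]])  # P(o | s) for one modality and one hidden state factor
obs_sequence = [0, 0, 1]           # observation indices over 3 timesteps

lh_seq = np.empty(len(obs_sequence), dtype=object)
for t, o in enumerate(obs_sequence):
    lh_seq[t] = np.log(A_single[o, :] + 1e-16)  # log likelihood of hidden states at time t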
- pymdp.algos.mmp.run_mmp_factorized(lh_seq, mb_dict, B, B_factor_list, policy, prev_actions=None, prior=None, num_iter=10, grad_descent=True, tau=0.25, last_timestep=False)
Marginal message passing scheme for updating marginal posterior beliefs about hidden states over time, conditioned on a particular policy.
- Parameters
lh_seq (numpy.ndarray of dtype object) – Log likelihoods of hidden states under a sequence of observations over time. This is assumed to already be log-transformed. Each lh_seq[t] contains the log likelihood of hidden states for a particular observation at time t.
mb_dict (Dict) – Dictionary with two keys (A_factor_list and A_modality_list), that stores the factor indices that influence each modality (A_factor_list) and the modality indices influenced by each factor (A_modality_list).
B (numpy.ndarray of dtype object) – Dynamics likelihood mapping or ‘transition model’, mapping from hidden states at t to hidden states at t+1, given some control state u. Each element B[f] of this object array stores a 3-D tensor for hidden state factor f, whose entries B[f][s, v, u] store the probability of hidden state level s at the current time, given hidden state level v and action u at the previous time.
B_factor_list (list of list of int) – List of lists of the hidden state factors that each hidden state factor depends on. Each element B_factor_list[i] is a list of the factor indices that factor i’s dynamics depend on.
policy (2D numpy.ndarray) – Matrix of shape (policy_len, num_control_factors) that indicates the indices of each action (control state index) upon timestep t and control factor f in the element policy[t, f], for a given policy.
prev_actions (numpy.ndarray, default None) – If provided, should be a matrix of previous actions of shape (infer_len, num_control_factors) that indicates the indices of each action (control state index) taken in the past (up until the current timestep).
prior (numpy.ndarray of dtype object, default None) – If provided, the prior beliefs about initial states (at t = 0, relative to infer_len). If None, this defaults to a flat (uninformative) prior over hidden states.
num_iter (int, default 10) – Number of variational iterations.
grad_descent (bool, default True) – Flag for whether to use gradient descent (free energy gradient updates) instead of the fixed-point solution for the posterior beliefs.
tau (float, default 0.25) – Decay constant for use in the grad_descent version. Tunes the size of the gradient descent updates to the posterior.
last_timestep (bool, default False) – Flag for whether we are at the last timestep of belief updating.
- Returns
qs_seq (numpy.ndarray of dtype object) – Posterior beliefs over hidden states under the policy. Nesting structure is timepoints, factors, where e.g. qs_seq[t][f] stores the marginal belief about factor f at timepoint t under the policy in question.
F (float) – Variational free energy of the policy.
Agent class
- class pymdp.agent.Agent(A, B, C=None, D=None, E=None, H=None, pA=None, pB=None, pD=None, num_controls=None, policy_len=1, inference_horizon=1, control_fac_idx=None, policies=None, gamma=16.0, alpha=16.0, use_utility=True, use_states_info_gain=True, use_param_info_gain=False, action_selection='deterministic', sampling_mode='marginal', inference_algo='VANILLA', inference_params=None, modalities_to_learn='all', lr_pA=1.0, factors_to_learn='all', lr_pB=1.0, lr_pD=1.0, use_BMA=True, policy_sep_prior=False, save_belief_hist=False, A_factor_list=None, B_factor_list=None)
The Agent class, the highest-level API that wraps together processes for action, perception, and learning under active inference.
The basic usage is as follows:
>>> my_agent = Agent(A = A, B = B, <more_params>)
>>> observation = env.step(initial_action)
>>> qs = my_agent.infer_states(observation)
>>> q_pi, G = my_agent.infer_policies()
>>> next_action = my_agent.sample_action()
>>> next_observation = env.step(next_action)
This represents one timestep of an active inference process. Wrapping this step in a loop with an Env() class that returns observations and takes actions as inputs would entail a dynamic agent-environment interaction.
- get_future_qs()
Returns the last self.policy_len timesteps of each policy-conditioned belief over hidden states. This is a step of pre-processing that needs to be done before computing the expected free energy of policies. We do this to avoid computing the expected free energy of policies using beliefs about hidden states in the past (so-called “post-dictive” beliefs).
- Returns
future_qs_seq – Posterior beliefs over hidden states under a policy, in the future. This is a nested numpy.ndarray object array, with one sub-array future_qs_seq[p_idx] for each policy. The indexing structure is policy -> timepoint -> factor, so that future_qs_seq[p_idx][t_idx][f_idx] refers to beliefs about marginal factor f_idx expected under policy p_idx at future timepoint t_idx, relative to the current timestep.
- Return type
numpy.ndarray of dtype object
- infer_policies()
Perform policy inference by optimizing a posterior (categorical) distribution over policies. This distribution is computed as the softmax of G * gamma + lnE, where G is the negative expected free energy of policies, gamma is a policy precision and lnE is the (log) prior probability of policies. This function returns the posterior over policies as well as the negative expected free energy of each policy. In this version of the function, the expected free energy of policies is computed using known factorized structure in the model, which speeds up computation (particularly the state information gain calculations).
- Returns
q_pi (1D numpy.ndarray) – Posterior beliefs over policies, i.e. a vector containing one posterior probability per policy.
G (1D numpy.ndarray) – Negative expected free energies of each policy, i.e. a vector containing one negative expected free energy per policy.
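The policy posterior described above can be reproduced in a couple of lines; the numbers below for G, lnE and gamma are arbitrary and only illustrate the softmax(G * gamma + lnE) computation:

import numpy as np
from pymdp.maths import softmax

G = np.array([-3.2, -2.8, -4.1])  # negative expected free energies of 3 policies
lnE = np.log(np.ones(3) / 3)      # log of a flat prior over policies ("habits")
gamma = 16.0                      # policy precision

q_pi = softmax(G * gamma + lnE)   # posterior over policies, sums to 1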
- infer_policies_old()
Perform policy inference by optimizing a posterior (categorical) distribution over policies. This distribution is computed as the softmax of G * gamma + lnE, where G is the negative expected free energy of policies, gamma is a policy precision and lnE is the (log) prior probability of policies. This function returns the posterior over policies as well as the negative expected free energy of each policy.
- Returns
q_pi (1D numpy.ndarray) – Posterior beliefs over policies, i.e. a vector containing one posterior probability per policy.
G (1D numpy.ndarray) – Negative expected free energies of each policy, i.e. a vector containing one negative expected free energy per policy.
- infer_states(observation, distr_obs=False)
Update approximate posterior over hidden states by solving variational inference problem, given an observation.
- Parameters
observation (list or tuple of ints) – The observation input. Each entry observation[m] stores the index of the discrete observation for modality m.
distr_obs (bool) – Whether the observation is a distribution over possible observations, rather than a single observation.
- Returns
qs – Posterior beliefs over hidden states. Depending on the inference algorithm chosen, the resulting qs variable will have additional sub-structure to reflect whether beliefs are additionally conditioned on timepoint and policy. For example, in the case that self.inference_algo == 'MMP', the indexing structure is policy -> timepoint -> factor, so that qs[p_idx][t_idx][f_idx] refers to beliefs about marginal factor f_idx expected under policy p_idx at timepoint t_idx.
- Return type
numpy.ndarray of dtype object
- reset(init_qs=None)
Resets the posterior beliefs about hidden states of the agent to a uniform distribution, and resets time to first timestep of the simulation’s temporal horizon. Returns the posterior beliefs about hidden states.
- Returns
qs – Initialized posterior over hidden states. Depending on the inference algorithm chosen and other parameters (such as the parameters stored within edge_handling_params), the resulting qs variable will have additional sub-structure to reflect whether beliefs are additionally conditioned on timepoint and policy. For example, in the case that self.inference_algo == 'MMP', the indexing structure of qs is policy -> timepoint -> factor, so that qs[p_idx][t_idx][f_idx] refers to beliefs about marginal factor f_idx expected under policy p_idx at timepoint t_idx. In this case, the returned qs will only have entries filled out for the first timestep, i.e. for qs[p_idx][0], for all policy indices p_idx. Subsequent entries qs[:][1, 2, ...] will be initialized to empty numpy.ndarray objects.
- Return type
numpy.ndarray of dtype object
- sample_action()
Sample or select a discrete action from the posterior over control states. This function both sets (or caches) the action as an internal variable of the agent and returns it. This function also updates the time variable (and thus manages the consequences of updating the moving reference frame of beliefs) using self.step_time().
- Returns
action – Vector containing the indices of the actions for each control factor.
- Return type
1D numpy.ndarray
- set_latest_beliefs(last_belief=None)
Both sets and returns the penultimate belief before the first timestep of the backwards inference horizon. In the case that the inference horizon includes the first timestep of the simulation, then the latest_belief is simply the first belief of the whole simulation, or the prior (self.D). The particular structure of the latest_belief depends on the value of self.edge_handling_params['use_BMA'].
- Returns
latest_belief – Penultimate posterior beliefs over hidden states at the timestep just before the first timestep of the inference horizon. Depending on the value of self.edge_handling_params['use_BMA'], the shape of this output array will differ. If self.edge_handling_params['use_BMA'] == True, then latest_belief will be a Bayesian model average of beliefs about hidden states, where the average is taken with respect to posterior beliefs about policies. Otherwise, latest_belief will be the full, policy-conditioned belief about hidden states, and will have indexing structure policies -> factors, such that latest_belief[p_idx][f_idx] refers to the penultimate belief about marginal factor f_idx under policy p_idx.
- Return type
numpy.ndarray of dtype object
- step_time()
Advances time by one step. This involves updating self.prev_actions, and in the case of a moving inference horizon, this also shifts the history of post-dictive beliefs forward in time (using self.set_latest_beliefs()), so that the penultimate belief before the beginning of the horizon is correctly indexed.
- Returns
curr_timestep – The index in absolute simulation time of the current timestep.
- Return type
int
- update_A(obs)
Update approximate posterior beliefs about Dirichlet parameters that parameterise the observation likelihood or A array.
- Parameters
observation (list or tuple of ints) – The observation input. Each entry observation[m] stores the index of the discrete observation for modality m.
- Returns
qA – Posterior Dirichlet parameters over the observation model (same shape as A), after having updated it with observations.
- Return type
numpy.ndarray of dtype object
- update_B(qs_prev)
Update posterior beliefs about Dirichlet parameters that parameterise the transition likelihood
- Parameters
qs_prev (1D numpy.ndarray or numpy.ndarray of dtype object) – Marginal posterior beliefs over hidden states at previous timepoint.
- Returns
qB – Posterior Dirichlet parameters over the transition model (same shape as B), after having updated it with state beliefs and actions.
- Return type
numpy.ndarray of dtype object
- update_D(qs_t0=None)
Update Dirichlet parameters of the initial hidden state distribution (prior beliefs about hidden states at the beginning of the inference window).
- Parameters
qs_t0 (1D numpy.ndarray, numpy.ndarray of dtype object, or None) – Marginal posterior beliefs over hidden states at current timepoint. If None, the value of qs_t0 is set to self.qs_hist[0] (i.e. the initial hidden state beliefs at the first timepoint). If self.inference_algo == "MMP", then qs_t0 is set to be the Bayesian model average of beliefs about hidden states at the first timestep of the backwards inference horizon, where the average is taken with respect to posterior beliefs about policies.
- Returns
qD – Posterior Dirichlet parameters over the initial hidden state prior (same shape as qs_t0), after having updated it with state beliefs.
- Return type
numpy.ndarray of dtype object
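Putting the Agent methods above together, here is a hedged end-to-end sketch that builds a random single-modality, single-factor generative model and runs one step of the perception-action loop. It assumes pymdp's utility constructors utils.random_A_matrix, utils.random_B_matrix and utils.obj_array_zeros behave as their names suggest; the sizes are arbitrary:

import numpy as np
from pymdp import utils
from pymdp.agent import Agent

num_obs, num_states, num_controls = [3], [3], [3]

A = utils.random_A_matrix(num_obs, num_states)       # random observation model
B = utils.random_B_matrix(num_states, num_controls)  # random transition model
C = utils.obj_array_zeros(num_obs)                   # flat (log) preferences over outcomes

my_agent = Agent(A=A, B=B, C=C)

observation = [0]                        # index of the observed outcome in the single modality
qs = my_agent.infer_states(observation)  # perception
q_pi, G = my_agent.infer_policies()      # policy inference
action = my_agent.sample_action()        # action selection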
Env
The OpenAIGym-inspired Env base class is the main API that represents the environmental dynamics or “generative process” with which agents exchange observations and actions.
Base class
- class pymdp.envs.Env
The Env base class, loosely inspired by the analogous env class of the OpenAIGym framework.
A typical workflow is as follows:
>>> my_env = MyCustomEnv(<some_params>)
>>> initial_observation = my_env.reset(initial_state)
>>> my_agent.infer_states(initial_observation)
>>> my_agent.infer_policies()
>>> next_action = my_agent.sample_action()
>>> next_observation = my_env.step(next_action)
This would be the first step of an active inference process, where a sub-class of Env, MyCustomEnv, is initialized, an initial observation is produced, and these observations are fed into an instance of Agent in order to produce an action, which can then be fed back into the Env instance.
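To make this workflow concrete, here is a hedged sketch of a toy Env subclass (not part of pymdp) together with the agent-environment loop it enables. It assumes only the reset()/step() interface shown above, single-modality observations returned as a list containing one index, and pymdp's utils.random_A_matrix / utils.random_B_matrix helpers for a throwaway agent:

import numpy as np
from pymdp import utils
from pymdp.agent import Agent
from pymdp.envs import Env

class CoinFlipEnv(Env):
    """Toy two-outcome environment, written purely to illustrate the reset()/step() pattern."""
    def __init__(self, p_heads=0.8):
        self.p_heads = p_heads
    def reset(self, state=None):
        return self._observe()
    def step(self, action):
        # this toy environment ignores the action and just emits a fresh observation
        return self._observe()
    def _observe(self):
        return [0 if np.random.rand() < self.p_heads else 1]

# a random agent whose (single) observation modality matches the environment's 2 outcomes
A = utils.random_A_matrix([2], [2])
B = utils.random_B_matrix([2], [2])
my_agent = Agent(A=A, B=B)

my_env = CoinFlipEnv()
observation = my_env.reset()
for t in range(10):
    qs = my_agent.infer_states(observation)
    q_pi, G = my_agent.infer_policies()
    action = my_agent.sample_action()
    observation = my_env.step(action)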
Specific environment implementations
All of the following environments inherit from Env and have the same general usage as above.
- 2-dimensional grid-world implementation with 5 actions (the 4 cardinal directions and staying put).
- 1-dimensional grid-world implementation with 3 possible movement actions ("LEFT", "STAY", "RIGHT").
- Implementation of the visual foraging environment used for scene construction simulations.
- Implementation of the 3-arm T-Maze environment.
- Implementation of the 3-arm T-Maze environment where there is an additional null outcome within the cue modality, so that the agent doesn't get a random cue observation, but a null one, when it visits non-cue locations.
Computing the variational free energy in categorical models
Author: Conor Heins
This notebook walks the user through the computation of the variational free energy (or VFE) for discrete generative models, composed of categorical distributions. We can represent these sorts of generative models easily in array-focused programming languages like NumPy
and MATLAB
using matrices and vectors.
In the discrete state-space formulation, the variational free energy is very straightforward to compute when we take advantage of the NDarray representation of probability distributions. The highly-optimized matrix algebra built into today’s numerical computing paradigms can be exploited to write simple, one-line expressions for information-theoretic terms (like Kullback-Leibler divergences, cross-entropies, log-likelihoods, etc.). All these terms end up being variants of things like inner products, matrix-vector products or matrix-matrix multiplications.
This notebook is inspired by the supplementary MATLAB script for the paper “A Step-by-Step Tutorial on Active Inference Modelling and its Application to Empirical Data” by Ryan Smith, Karl Friston, and Christopher Whyte. The original MATLAB script is called VFE_calculation_example.m
, and can be found on Ryan Smith’s GitHub repository, which also contains a set of tutorial scripts that are supplementary to their paper.
Note: When running this notebook in Google Colab, you may have to run !pip install inferactively-pymdp
at the top of the notebook, before you can import pymdp
. That cell is left commented out below, in case you are running this notebook from Google Colab.
# ! pip install inferactively-pymdp
Imports
First, import pymdp
and the modules we’ll need.
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display, Latex
from pymdp import utils, maths
A simple generative model composed of a likelihood and a prior
We begin by constructing a simple generative model composed of two sets of categorical distributions: a likelihood distribution $P(o|s)$ and a prior distribution $P(s)$.
Because both of these distributions are categorical distributions, we can represent them easily in code using matrices and vectors. The likelihood distribution is a conditional distribution that tells the probability that the generative model assigns to different observations, given different hidden states. Since observations and hidden states are both discrete, this can be efficiently represented as a matrix, each of whose columns stores the conditional probabilities of all the observations (the rows), given a particular hidden state. Each hidden state setting is thus indexed by a column. For this reason, each column of this matrix is a categorical probability distribution and its entries must sum to one.
The prior is represented as a 1-D vector and stores the prior probabilities that the generative model assigns to each hidden state.
Finally, to set ourselves up later for showing model inversion and free energy calculations, we initialize an observation that we assume the agent or generative model has “gathered” or “seen.” In pymdp we can represent these observations either as one-hot vectors (a 1 in one entry, and 0’s everywhere else) or more sparsely as simple indices (integers) that represent which of the observations was observed. Because observations are not distributions (or technically, they are Dirac distributions), you don’t need to store a whole vector of elements; you can just store a particular index.
Below we create 3 main variables:
An observation that represents a real observation that the agent or generative model “sees”.
The prior distribution that represents the agent’s prior beliefs about the hidden states.
The likelihood_dist that represents the agent’s beliefs about how hidden states probabilistically relate to observations.
## Define a quick helper function for printing arrays nicely (see https://gist.github.com/braingineer/d801735dac07ff3ac4d746e1f218ab75)
def matprint(mat, fmt="g"):
    col_maxes = [max([len(("{:"+fmt+"}").format(x)) for x in col]) for col in mat.T]
    for x in mat:
        for i, y in enumerate(x):
            print(("{:"+str(col_maxes[i])+fmt+"}").format(y), end=" ")
        print("")
num_obs = 2 # dimensionality of observations (2 possible observation levels)
num_states = 2 # dimensionality of hidden states (2 possible hidden state levels)
observation = 0 # index representation
# observation = np.array([1, 0]) # one hot representation
# observation = utils.onehot(0, num_obs) # one hot representation, but use one of pymdp's utility functions to create a one-hot
prior = np.array([0.5, 0.5]) # simply set it by hand
# prior = np.ones(num_states) / num_states # create a uniform prior distribution by creating a vector of ones and dividing each by the number of states
# prior = utils.norm_dist(np.ones(num_states)) # use one of pymdp's utility functions to normalize any vector into a probability distribution
likelihood_dist = np.array([ [0.8, 0.2],
[0.2, 0.8] ])
if isinstance(observation, np.ndarray):
    print(f'Observation:{observation}\n')
else:
    print(f'Observation: {observation}\n')
print(f'Prior:{prior}\n')
print('Likelihood:')
matprint(likelihood_dist)
Observation: 0
Prior:[0.5 0.5]
Likelihood:
0.8 0.2
0.2 0.8
Bayesian inference
Before we define variational free energy for this generative model, let’s step through the computations involved in inverting this model using Bayes’ rule, i.e. computing the optimal Bayesian posterior over hidden states, given our sensory observation (which we will formally refer to as $o_t$). By performing exact Bayesian inference first, we can directly compare its results to those of variational free energy minimization. This will help us better understand the meaning of the free energy calculations that come later.
$ \hspace{60mm} P(s | o_t) = \frac{P(o = o_t | s)P(s)}{P(o = o_t)} $
Computing the posterior via Bayes’ rule (also known as “model inversion”) is the essence of optimal statistical inference, when one has a generative model of how one’s data is caused, and some data.
1. The likelihood $P(o = o_t | s)$
To start with, let’s compute the likelihood term $P(o = o_t | s)$. We can represent the computation of the likelihood with this line of NumPy: likelihood_dist[observation,:]. By doing so, we are providing a numerical answer to the question: “What is the likelihood that State 1 vs State 2 caused the actual sensory data $o_t$ that I’ve observed?”.
Mathematically, our likelihood function $P(o = o_t | s)$ (in the statistics literature also referred to as $L(\theta)$) is a function of the hidden states. The particular function over hidden states we use is defined by the observation $o_t$. So you can think of the likelihood as a multi-dimensional function or simply a space of functions, where the particular function you use for inference is determined by your data.
This act of computing the likelihood via an indexing operation, likelihood_dist[observation,:], is similar in spirit to the operation of conditioning. We are, in some sense, “conditioning” on a particular observation (slicing out a “row” of our likelihood_dist matrix), and then looking at the induced likelihood over possible causes (hidden states) that could have been the latent source of that observation. This operation captures the essence of what we mean when we speak of “inverting” generative models. Generative models are referred to as generative because they can be used to generate fictive observations, given some fixed setting of latent parameters. Inversion is thus moving in the opposite direction – given some sensory data, what is the consequent likelihood over the causes or hidden states that might have generated that data?
Note: We can also compute the likelihood using a matrix-vector product, if the observation is represented as a one-hot vector.
likelihood_s = likelihood_dist.T @ observation
You can think of this as a weighted average of the rows of the likelihood matrix, where the weights are provided by the observation. Since we have insisted that the observation is a one-hot vector, this is equivalent to just selecting a single row of the likelihood matrix (the one indexed by the entry of the observation that is 1.0).
if isinstance(observation, np.ndarray): # use the matrix-vector version if your observation is a one-hot vector
    likelihood_s = likelihood_dist.T @ observation
elif isinstance(observation, int): # use the index version if your observation is an integer index
    likelihood_s = likelihood_dist[observation,:]
display(Latex(f'$P(o=o_t|s):$'))
print(f'{likelihood_s}')
print('==================\n')
[0.8 0.2]
==================
2. The joint distribution $P(o = o_t, s) = P(o = o_t|s)P(s)$
To compute the full numerator on the right-hand side of the Bayes’ rule equation, we need to compute the joint distribution $P(o = o_t|s)P(s)$.
The joint distribution, which also fully defines the generative model, is the product of the likelihood and prior distributions. The joint distribution expresses the joint probability of observations and hidden states, given the likelihood distribution and the prior over hidden states. If we condition on an actual observation $o_t$, as we have done here, then this joint distribution is, like the likelihood, a function of hidden states.
joint_prob = likelihood_s * prior # element-wise product of the likelihood of each hidden state, given the observation, with the prior probability assigned to each hidden state
display(Latex(f'$P(o = o_t,s) = P(o = o_t|s)P(s):$'))
print(f'{joint_prob}')
print('==================\n')
[0.4 0.1]
==================
3. Marginal probability or marginal likelihood: $\sum_s P(o = o_t, s) = P(o = o_t)$
Let’s remind ourselves of Bayes’ rule below:
$ \hspace{60mm} P(s | o_t) = \frac{P(o = o_t | s)P(s)}{P(o = o_t)} $
So far, using the likelihood, the observation, and the prior, we have computed the numerator of this term: the unnormalized joint distribution $P(o = o_t,s)$. All that’s left to do is divide this term by the denominator $P(o=o_t)$. This term is also referred to as the model evidence or the marginal likelihood.
This can be thought of as the probability of having seen the particular observation you saw, under all possible settings of the hidden states. This can be computed by marginalizing the joint distribution, using the prior probabilities of the hidden states:
$ \hspace{60mm}P(o) = \sum_s P(o|s)P(s)$
p_o = joint_prob.sum()
4. Compute the posterior $P(s|o=o_t)$
Now to compute the Bayes-optimal posterior, simply divide the numerator by the denominator!
posterior = joint_prob / p_o # divide the joint by the marginal
display(Latex('$P(s|o = o_t) = \\frac{P(o = o_t,s)}{P(o=o_t)}$:'))
print(f'Posterior over hidden states: {posterior}')
print('==================')
Posterior over hidden states: [0.8 0.2]
==================
The variational free energy
The variational free energy is an upper bound on surprise: surprise is defined as $-\ln P(o)$, and is also known as the negative (log) evidence. By minimizing free energy, we are minimizing surprise and thus maximizing log evidence. This evidence is simply the (log of the) term we computed in the previous step, the normalizing denominator used in Bayes’ rule. We can thus first compute the surprise associated with the observation we saw earlier:
surprise = - np.log(p_o)
display(Latex(f'$- \ln P(o=o_t)$:'))
print(f'Surprise: {surprise.round(3)}')
print('==================')
Surprise: 0.693
==================
By minimizing variational free energy with respect to the parameters of an arbitrary, parametrized distribution, also known as the approximate or variational posterior $Q(s)$, we are forcing the surprise to go down, and the model evidence to go up. In doing so, the approximate posterior distribution will get “closer” to the true posterior distribution (the $P(s|o)$ we computed above), $Q(s) \approx P(s|o)$.
This idea of minimizing a bound on surprise is the core principle underlying the variational free energy principle, active inference, and the statistical methodology of variational principles more generally. This comes in handy because in many realistic scenarios where agents or systems need to make inferences about the hidden causes of sensory perturbations, performing Bayesian inference exactly becomes intractable (due mostly to the computation of the $P(o)$ denominator term, also known as the normalizing constant). Variational inference turns this intractable marginalization problem into a tractable optimization problem, where one only needs to change the approximate posterior $Q(s)$ in order to minimize a bound on surprise.
But what is the variational free energy exactly, mathematically? It is commonly written as the expectation, under the approximate posterior, of the log ratio between the approximate posterior and the generative model:
$ \hspace{60mm} \mathcal{F} = \mathbb{E}_{Q(s)}[\ln \frac{Q(s)}{P(o, s)}] $
Because of the rules of logarithms, we can also express this as an expected difference between the log posterior and the log generative model:
$ \hspace{60mm} \mathcal{F} = \mathbb{E}_{Q(s)}[\ln Q(s) - \ln P(o, s)] $
By taking advantage of some more mathematical relationships (the factorization of the joint, rules for logarithms of products, and the definition of the conditional expectation), we can re-write this as an upper bound on surprise:
$ \hspace{60mm} \mathcal{F} = \mathbb{E}_{Q(s)}[\ln Q(s) - \ln P(s|o)] - \ln P(o) $
By re-arranging this equality and taking advantage of the fact that the quantity $\mathbb{E}_{Q(s)}[\ln Q(s) - \ln P(s|o)]$ (also known as the Kullback-Leibler divergence) is greater than or equal to 0, we can easily show that the VFE is an upper bound on surprise:
$ \hspace{60mm} \mathcal{F} \geq - \ln P(o) $
Let’s now compute the VFE numerically for our generative model and fixed observation. Given we already defined those two things ($o_t$ and $P(o,s)$), the last thing we need to do, is to define an initial setting of our approximate posterior $Q(s)$, and compute the variational free energy using one of the formulas above.
# to begin with, set our initial Q(s) equal to our prior distribution -- i.e. a flat/uninformative belief about hidden states
initial_qs = np.array([0.5, 0.5])
# redefine generative model and observation, for ease
observation = 0
likelihood_dist = np.array([[0.8, 0.2], [0.2, 0.8]])
prior = utils.norm_dist(np.ones(2))
# compute the joint or generative model using the factorization: P(o=o|s)P(s)
joint_prob = likelihood_dist[observation,:] * prior
# compute the variational free energy using the expected log difference formulation
initial_F = initial_qs.dot(np.log(initial_qs) - np.log(joint_prob))
## @NOTE: because of log-rules, `initial_F` can also be computed using the division inside the logarithm:
# initial_F = initial_qs.dot(np.log(initial_qs/joint_prob))
display(Latex('$\mathcal{F} = \mathbb{E}_{Q(s)}[\ln \\frac{Q(s)}{P(o, s)}] = \mathbb{E}_{Q(s)}[\ln Q(s) - \ln P(o, s)]:$'))
print(f'Variational free energy (F) = {initial_F.round(3)}')
print('==================')
Variational free energy (F) = 0.916
==================
Now let’s then change the approximate posterior $Q(s)$ and see what happens to the variational free energy.
By definition, optimal inference (and ‘perfect’ VFE minimization) obtains when we set the approximate posterior $Q(s)$ to be equal to the true posterior, $Q(s) = P(s|o)$.
As an extreme case, let’s do this and then measure the variational free energy. We can see that it has decreased, relative to its initial setting when the approximate posterior was equal to the prior $P(s)$.
final_qs = posterior.copy() # now we just assert that the approximate posterior is equal to the true posterior
final_F = final_qs.dot(np.log(final_qs) - np.log(joint_prob))
display(Latex('$\mathcal{F} = \mathbb{E}_{Q(s)}[\ln \\frac{Q(s)}{P(o, s)}] = \mathbb{E}_{Q(s)}[\ln Q(s) - \ln P(o, s)]:$'))
print(f'F = {final_F.round(3)}')
print('==================')
F = 0.693
==================
Notice that the free energy is 0.693 – if you recall, this is exactly the surprise or negative log marginal likelihood $-\ln P(o = o_t)$, related to the marginal likelihood quantity $P(o=o_t)$ that we had to compute when performing exact Bayesian inference earlier.
It is easy to see why this equality should hold, when we consider the VFE in terms of the surprise bound:
$ \hspace{60mm} \mathcal{F} = \mathbb{E}_{Q(s)}[\ln Q(s) - \ln P(s|o)] - \ln P(o) $
When $Q(s) = P(s|o)$, as we have enforced with the line final_qs = posterior.copy() above, the first term $\mathbb{E}_{Q(s)}[\ln Q(s) - \ln P(s|o)]$ will become 0. This is a property of the Kullback-Leibler divergence between two distributions, $\operatorname{D}_{KL}(Q \parallel P) = \mathbb{E}_{Q}[\ln Q - \ln P]$.
So since that KL divergence is 0, this is a case when the bound on surprise is exact:
$ \hspace{60mm} \mathcal{F} = 0 - \ln P(o) = - \ln P(o)$
In other words, minimizing free energy “as much as possible” in the case of this generative model, means that the approximate posterior equals the true posterior, and the free energy bound becomes “tight”, i.e. the free energy simply IS the negative log evidence!
# compute the surprise (which we can do analytically for this simple generative model)
p_o = joint_prob.sum()
surprise = - np.log(p_o)
display(Latex('$-\ln P(o):$'))
print(f'{surprise.round(3)}')
print('==================\n')
0.693
==================
Bonus material: auto-differentiation to perform gradient descent on the variational free energy
In this section, we are going to numerically differentiate the variational free energy with respect to the parameters of the approximate posterior. We can take advantage of modern auto-differentiation software, which can calculate the derivatives of arbitrary computer programs, to compute these derivatives without doing any analytic differentiation.
import autograd.numpy as np_auto # Thinly-wrapped version of Numpy that is auto-differentiable
from autograd import grad # this is the function that we use to evaluate derivatives
from functools import partial
First, let’s create a variational free energy function that takes as input the variational posterior qs, an observation obs, the likelihood matrix likelihood, and a prior distribution prior.
By defining this computation as a function, we can calculate its derivatives automatically using the autodifferentiation package autograd, which allows users to pass gradients through arbitrary Python code. It includes a special version of numpy (autograd.numpy) that has a differentiable back-end. We will thus use this “special” version of numpy when we instantiate our arrays and perform numerical operations on them.
# define the variational free energy as a function of the approximate posterior, an observation, and the generative model
def vfe(qs, obs, likelihood, prior):
"""
Quick docstring below on inputs
Arguments:
=========
`qs` [1D np_auto.ndarray]: variational posterior over hidden states
`obs` [int]: index of the observation
`likelihood` [2D np_auto.ndarray]: likelihood distribution P(o|s), relating hidden states probabilistically to observations
`prior` [1D np_auto.ndarray]: prior over hidden states
"""
likelihood_s = likelihood[obs,:]
joint = likelihood_s * prior
vfe = qs @ (np_auto.log(qs) - np_auto.log(joint))
return vfe
# initialize an observation, an initial variational posterior, a prior, and a likelihood matrix
obs = 0
init_qs = np_auto.array([0.5, 0.5])
prior = np_auto.array([0.5, 0.5])
likelihood_dist = np_auto.array([ [0.8, 0.2],
[0.2, 0.8] ])
# this use of `partial` creates a version of the vfe function that is a function of Qs only,
# with the other parameters (the observation, the generative model) fixed as constant parameters
vfe_qs = partial(vfe, obs = obs, likelihood = likelihood_dist, prior = prior)
# By calling `grad` on a function, we get out a function that can be used to compute the gradients of the VFE with respect to its input (in our case, `qs`)
grad_vfe_qs = grad(vfe_qs)
Now that we have a variational free energy function, and a derivative function that will give us the gradients of the VFE with respect to $Q(s)$, we can use it to perform a gradient descent on the VFE, with respect to the parameters of $Q(s)$. In other words, we will implement this discretized differential equation to update the approximate posterior:
$ \hspace{60mm} Q(s)_{t+1} = Q(s)_{t} - \frac{\partial \mathcal{F}}{\partial Q(s)_{t}} $
Where the gradient term $\frac{\partial \mathcal{F}}{\partial Q(s)_{t}}$ will simply be the output of the gradient function grad_vfe_qs
we’ve defined above.
# number of iterations of gradient descent to perform
n_iter = 40
qs_hist = np_auto.zeros((n_iter, 2))
qs_hist[0,:] = init_qs
vfe_hist = np_auto.zeros(n_iter)
vfe_hist[0] = vfe_qs(qs = init_qs)
learning_rate = 0.1 # learning rate to prevent gradient steps that are too big (overshooting)
for i in range(n_iter-1):
    dFdqs = grad_vfe_qs(qs_hist[i,:])
    ln_qs = np_auto.log(qs_hist[i,:]) - learning_rate * dFdqs # transform qs to log-space to perform gradient descent
    qs_hist[i+1,:] = maths.softmax(ln_qs) # re-normalize to make it a proper, categorical Q(s) again
    vfe_hist[i+1] = vfe_qs(qs = qs_hist[i+1,:]) # measure the variational free energy after this update
Since we were storing the history of the variational free energy over the course of the gradient descent, we can plot its trajectory below.
fig = plt.figure(figsize=(8,6))
plt.plot(vfe_hist)
plt.ylabel('$\\mathcal{F}$', fontsize = 22)
plt.xlabel("Iteration", fontsize = 22)
plt.xlim(0, n_iter)
plt.ylim(vfe_hist[-1], vfe_hist[0])
plt.title('Gradient descent on VFE', fontsize = 24)
Text(0.5, 1.0, 'Gradient descent on VFE')

final_qs, initial_F, final_F = qs_hist[-1,:], vfe_hist[0], vfe_hist[-1]
display(Latex('$\mathcal{F} = \mathbb{E}_{Q(s)}[\ln \\frac{Q(s)}{P(o, s)}] = \mathbb{E}_{Q(s)}[\ln Q(s) - \ln P(o, s)]:$'))
print(f'Initial F = {initial_F.round(3)}')
print('==================')
print('Final posterior:')
print(f'{final_qs.round(1)}') # note that because of numerical imprecision in the gradient descent (constant learning rate, etc.), the approximate posterior will not exactly be the optimal posterior
print('==================')
print(f'Final F = {vfe_qs(final_qs).round(3)}')
print('==================')
Initial F = 0.916
==================
Final posterior:
[0.8 0.2]
==================
Final F = 0.693
==================