Tutorial 1: Active inference from scratch

Author: Conor Heins

This tutorial guides you through the construction of an active inference agent “from scratch” in a simple grid-world environment. This tutorial is designed to walk you through the various mathematical operations required for running active inference in discrete state spaces, as well as introduce you to the way discrete probability distributions are represented in pymdp (through the use of numpy vectors and matrices). We also show you how to compute the expected free energy in terms of risk and ambiguity, and how to use it to plan sequences of actions.

By the end of this tutorial, you should be able to write simple active inference models from scratch using the main components of any POMDP generative model: the A, B, C, and D arrays, as well as be able to write the active inference loop for an agent engaged in a perception-action exchange with its environment.

Note: When running this notebook in Google Colab, you may have to run !pip install inferactively-pymdp at the top of the notebook, before you can import pymdp. That cell is left commented out below, in case you are running this notebook from Google Colab.

# ! pip install inferactively-pymdp

Imports

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Define some auxiliary functions

Here are some plotting functions that will come in handy throughout the tutorial.

def plot_likelihood(matrix, xlabels = list(range(9)), ylabels = list(range(9)), title_str = "Likelihood distribution (A)"):
    """
    Plots a 2-D likelihood matrix as a heatmap
    """

    if not np.isclose(matrix.sum(axis=0), 1.0).all():
      raise ValueError("Distribution not column-normalized! Please normalize (ensure matrix.sum(axis=0) == 1.0 for all columns)")
    
    fig = plt.figure(figsize = (6,6))
    ax = sns.heatmap(matrix, xticklabels = xlabels, yticklabels = ylabels, cmap = 'gray', cbar = False, vmin = 0.0, vmax = 1.0)
    plt.title(title_str)
    plt.show()

def plot_grid(grid_locations, num_x = 3, num_y = 3 ):
    """
    Plots the spatial coordinates of GridWorld as a heatmap, with each (X, Y) coordinate 
    labeled with its linear index (its `state id`)
    """

    grid_heatmap = np.zeros((num_x, num_y))
    for linear_idx, location in enumerate(grid_locations):
      y, x = location
      grid_heatmap[y, x] = linear_idx
    sns.set(font_scale=1.5)
    sns.heatmap(grid_heatmap, annot=True, cbar = False, fmt='.0f', cmap='crest')

def plot_point_on_grid(state_vector, grid_locations):
    """
    Plots the current location of the agent on the grid world
    """
    state_index = np.where(state_vector)[0][0]
    y, x = grid_locations[state_index]
    grid_heatmap = np.zeros((3,3))
    grid_heatmap[y,x] = 1.0
    sns.heatmap(grid_heatmap, cbar = False, fmt='.0f')

def plot_beliefs(belief_dist, title_str=""):
    """
    Plot a categorical distribution or belief distribution, stored in the 1-D numpy vector `belief_dist`
    """

    if not np.isclose(belief_dist.sum(), 1.0):
      raise ValueError("Distribution not normalized! Please normalize")

    plt.grid(zorder=0)
    plt.bar(range(belief_dist.shape[0]), belief_dist, color='r', zorder=3)
    plt.xticks(range(belief_dist.shape[0]))
    plt.title(title_str)
    plt.show()

The Basics: categorical distributions

Let’s start by creating a simple categorical distribution $P(x)$. We will represent this distribution numerically with a 1-D numpy array (i.e. a numpy.ndarray where ndim == 1). The discrete entries of this vector can be thought of as the probabilities of seeing each of the $K$ alternative outcomes (the support of the distribution), if we were to sample once from the distribution. We can write the distribution as a vector of probabilities, follows:

$$ P(x) = \begin{bmatrix} P(x = 0) \ P(x = 1)\ \vdots \ P(x = K) \end{bmatrix} $$

We can use numpy and the norm_dist function from utils library of pymdp to quickly create a random categorical distribution.

from pymdp import utils

my_categorical = np.random.rand(3)
# my_categorical = np.array([0.5, 0.3, 0.8]) # could also just write in your own numbers
my_categorical = utils.norm_dist(my_categorical) # normalizes the distribution so it integrates to 1.0

print(my_categorical.reshape(-1,1)) # we reshape it to display it like a column vector
print(f'Integral of the distribution: {round(my_categorical.sum(), 2)}')

[[0.32596814]
 [0.2566914 ]
 [0.41734046]]
Integral of the distribution: 1.0

We can sample a discrete outcome from this distribution using the sample() function from the utils module

sampled_outcome = utils.sample(my_categorical)
print(f'Sampled outcome: {sampled_outcome}')

Sampled outcome: 0

Now plot the beliefs using our plot_beliefs() function that we defined in the beginning.

plot_beliefs(my_categorical, title_str = "A random (unconditional) Categorical distribution")

../_images/1fab43e26043d4c7e212bc147ec8dae0754c85e7f09760b3d61449342de0d522.png

Now let’s move onto conditional categorical distributions or likelihoods, i.e. how we represent the distribution of one discrete random variable X, conditioned on the settings of another discrete random variable Y.

We would write these mathematically as:

$$ \begin{align} P(X | Y) \end{align} $$

And can represent them using 2-D numpy.ndarrays i.e. matrices

# initialize it with random numbers
p_x_given_y = np.random.rand(3, 4)
print(p_x_given_y.round(3))

[[0.019 0.696 0.755 0.644]
 [0.89  0.965 0.703 0.219]
 [0.721 0.109 0.95  0.075]]

# normalize it
p_x_given_y = utils.norm_dist(p_x_given_y)
print(p_x_given_y.round(3))

[[0.012 0.393 0.313 0.686]
 [0.546 0.545 0.292 0.234]
 [0.442 0.062 0.395 0.08 ]]

Now print the first column (i.e. $P(X | Y = 0)$) and show that it’s a proper categorical distribution

print(p_x_given_y[:,0].reshape(-1,1))
print(f'Integral of P(X|Y=0): {p_x_given_y[:,0].sum()}')

[[0.01165754]
 [0.54609769]
 [0.44224478]]
Integral of P(X|Y=0): 1.0

So column i of the matrix p_x_given_y represents the conditional probability of $X$, given the i-th level of the random variable $Y$, i.e. $P(X | Y = i)$

Next: taking expectations of random variables using matrix-vector products

""" Create a P(Y) and P(X|Y) using the same numbers from the slides """

p_y = np.array([0.75, 0.25]) # this is already normalized - you don't need to `utils.norm_dist()` it!

# the columns here are already normalized - you don't need to `utils.norm_dist()` it!
p_x_given_y = np.array([[0.6, 0.5],
                        [0.15, 0.41], 
                        [0.25, 0.09]])

print(p_y.round(3).reshape(-1,1))
print(p_x_given_y.round(3))

[[0.75]
 [0.25]]
[[0.6  0.5 ]
 [0.15 0.41]
 [0.25 0.09]]

Calculate expected value of $X$, given our current belief about $Y$, i.e. $P(Y)$ using a simple matrix-vector product, of the form $\mathbf{A}\mathbf{x}$.

""" Calculate the expectation using numpy's dot product functionality """

# first version of the dot product (using the method of a numpy array)
E_x_wrt_y = p_x_given_y.dot(p_y)

# second version of the dot product (using the function np.dot with two arguments)
# E_x_wrt_y = np.dot(p_x_given_y, p_y)

Now print it and verify the result is a proper categorical distribution (integrates to 1.0)

print(E_x_wrt_y)
print(f'Integral: {E_x_wrt_y.sum().round(3)}')

[0.575 0.215 0.21 ]
Integral: 1.0

A simple environment: Grid-world

Now that we understand categorical distributions and how to take conditional expectations of random variables, with categorical conditional and prior distributions, let’s move onto a worked example of Active Inference, so-called ‘Grid-World’.

Before we write down the generative model $P(o,s)$ of an active inference agent who will navigate Grid-World, let’s start by establishing what exactly this environment is and how we’re going to represent it mathematically.

import itertools

""" Create  the grid locations in the form of a list of (Y, X) tuples -- HINT: use itertools """
grid_locations = list(itertools.product(range(3), repeat = 2))
print(grid_locations)

[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]

plot_grid(grid_locations)

../_images/46b0d87c6788af87531ae2a58066b2b49a73a18dea7d83cb74203d493e045527.png

Building the generative model: $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$, and $\mathbf{D}$

1. The A matrix or $P(o\mid s)$.

The generative model’s “prior beliefs” about how hidden states relate to observations

""" Create variables for the storing the dimensionalities of the hidden states and the observations """

n_states = len(grid_locations)
n_observations = len(grid_locations)

print(f'Dimensionality of hidden states: {n_states}')
print(f'Dimensionality of observations: {n_observations}')

Dimensionality of hidden states: 9
Dimensionality of observations: 9

""" Create the A matrix  """

A = np.zeros( (n_states, n_observations) )

""" Create an umambiguous or 'noise-less' mapping between hidden states and observations """

np.fill_diagonal(A, 1.0)

# alternative:
# A = np.eye(n_observations, n_states)

Likelihood matrices show relationships between indices of the state space, which are indexed linearly. Remember the (Y, X) coordinates of the grid world don’t necessarily neatly map onto to the locations in the matrix here, which has ‘unwrapped’ the Grid World’s (Y, X) locations into columns/rows.

plot_likelihood(A, title_str = "A matrix or $P(o|s)$")

../_images/8b907543092ec6db57140680e056da526dcca2efb2ceb847d0852f2d09067654.png

How could we introduce uncertainty into this A matrix? I.e. how do we introduce ambiguity or uncertainty into the agent’s model of the world?

""" Remind yourself of the mapping between linear indices (0 through 8) and grid locations (Y, X) """
plot_grid(grid_locations)

A_noisy = A.copy()

# this line says: the probability of seeing yourself in location 0, given you're in location 0, is 1/3, AKA P(o == 0 | s == 0) = 0.3333....
A_noisy[0,0] = 1 / 3.0 # corresponds to location (0,0)

# this line says: the probability of seeing yourself in location 1, given you're in location 0, is 1/3, AKA P(o == 1 | s == 0) = 0.3333....
A_noisy[1,0] = 1 / 3.0 # corresponds to one step to the right from (0, 1)

# this line says: the probability of seeing yourself in location 3, given you're in location 0, is 1/3, AKA P(o == 3 | s == 0) = 0.3333....
A_noisy[3,0] = 1 / 3.0 # corresponds to one step down from (1, 0)

plot_likelihood(A_noisy, title_str = 'modified A matrix where location (0,0) is "blurry"')

../_images/fad228935979ce417110db8b684076558e99dba074605a6bcbbc60fb1d91cbca.png

""" Let's make ake one grid location "ambiguous" in the sense that it could be easily confused with neighbouring locations """
my_A_noisy = A_noisy.copy()

# locations 3 and 7 are the nearest neighbours to location 6
my_A_noisy[3,6] = 1.0 / 3.0
my_A_noisy[6,6] = 1.0 / 3.0
my_A_noisy[7,6] = 1.0 / 3.0

# Alternatively: you could have the probability spread among locations 3, 4, 6, and 7. This is basically saying, that whole lower-left corner of grid-world is blurry, if you're in location 6
# Remember to make sure the A matrix is column normalized. So if you do it this way, with the probabilities spread among 4 perceived locations, then you'll have to make sure the probabilities sum to 1.0
# my_A_noisy[3,6] = 1.0 / 4.0
# my_A_noisy[4,6] = 1.0 / 4.0
# my_A_noisy[6,6] = 1.0 / 4.0
# my_A_noisy[7,6] = 1.0 / 4.0

Now plot the new A matrix, that now has two “blurry” locations

plot_likelihood(my_A_noisy, title_str = "Noisy A matrix now with TWO ambiguous locations")

../_images/ca3849a9641e6d2bf7c15719b6d447c0e9b5477612648e9f07e84432777a1ea4.png

2. The B matrix or $P(s_{t}\mid s_{t-1}, u_{t-1})$.

The generative model’s “prior beliefs” about (controllable) transitions between hidden states over time. Namely, how do hidden states at time $t$ result from hidden states at some previous time $t-1$. These transition dynamics are further conditioned on some past action $u_t$.

actions = ["UP", "DOWN", "LEFT", "RIGHT", "STAY"]

def create_B_matrix():
  B = np.zeros( (len(grid_locations), len(grid_locations), len(actions)) )

  for action_id, action_label in enumerate(actions):

    for curr_state, grid_location in enumerate(grid_locations):

      y, x = grid_location

      if action_label == "UP":
        next_y = y - 1 if y > 0 else y 
        next_x = x
      elif action_label == "DOWN":
        next_y = y + 1 if y < 2 else y 
        next_x = x
      elif action_label == "LEFT":
        next_x = x - 1 if x > 0 else x 
        next_y = y
      elif action_label == "RIGHT":
        next_x = x + 1 if x < 2 else x 
        next_y = y
      elif action_label == "STAY":
        next_x = x
        next_y = y
      new_location = (next_y, next_x)
      next_state = grid_locations.index(new_location)
      B[next_state, curr_state, action_id] = 1.0
  return B

B = create_B_matrix()

Let’s now explore what it looks to “take” an action, using matrix-vector product of an action-conditioned “slice” of the $\mathbf{B}$ array and a previous state vector

""" Define a starting location""" 
starting_location = (1,0)

"""get the linear index of the state"""
state_index = grid_locations.index(starting_location)

"""  and create a state vector out of it """
starting_state = utils.onehot(state_index, n_states)

plot_point_on_grid(starting_state, grid_locations)

../_images/31536a46fc94904d17851543eb4604a877671e72a306ea1575cb05d6050c721c.png

plot_beliefs(starting_state, "Categorical distribution over the starting state")

../_images/a723ddc92dd0dcd677fbe9e73f0d7847127a80a326eccf34b5f389d21420b8c1.png

Now let’s imagine we’re moving “RIGHT” - write the conditional expectation, that will create the state vector corresponding to the new state after taking a step to the right.

""" Redefine the action here, just for reference """
actions = ["UP", "DOWN", "LEFT", "RIGHT", "STAY"]

""" Generate the next state vector, given the starting state and the B matrix"""
right_action_idx = actions.index("RIGHT") 
next_state = B[:,:, right_action_idx].dot(starting_state) # input the indices to the B matrix

""" Plot the next state, after taking the action """
plot_point_on_grid(next_state, grid_locations)

../_images/60200faaca2f1fb735ce9b779b82d268fb4313ce36a023a7558e93819db6ec87.png

Now let’s imagine we’re moving “DOWN” from where we just landed - write the conditional expectation, that will create the state vector corresponding to the new state after taking a step down.

""" Generate the next state vector, given the previous state and the B matrix"""
prev_state = next_state.copy()
down_action_index = actions.index("DOWN")
next_state = B[:,:,down_action_index].dot(prev_state)

"""  Plot the new state vector, after making the movement """
plot_point_on_grid(next_state, grid_locations)

../_images/291b91fff7dce162b888667f0c47c9ec93f6e5d3ab0d7ddc88a16e27d31c188a.png

3. The prior over observations: the $\mathbf{C}$ vector or $\tilde{P}(o)$

The (biased) generative model’s prior preference for particular observations, encoded in terms of probabilities.

""" Create an empty vector to store the preferences over observations """
C = np.zeros(n_observations)

""" Choose an observation index to be the 'desired' rewarding index, and fill out the C vector accordingly """
desired_location = (2,2) # choose a desired location
desired_location_index = grid_locations.index(desired_location) # get the linear index of the grid location, in terms of 0 through 8

C[desired_location_index] = 1.0 # set the preference for that location to be 100%, i.e. 1.0

"""  Let's look at the prior preference distribution """
plot_beliefs(C, title_str = "Preferences over observations")

../_images/0ef52fe27c40321dc4c5e7465ea91a5d3baa62a68586797c401513a99cd2087b.png

3. The prior over hidden states: the $\mathbf{D}$ vector or $P(s)$

The generative model’s prior belief over hidden states at the first timestep.

""" Create a D vector, basically a belief that the agent has about its own starting location """

# create a one-hot / certain belief about initial state
D = utils.onehot(0, n_states)

# demonstrate hwo belief about initial state can also be uncertain / spread among different possible initial states
# alternative, where you have a degenerate/noisy prior belief
# D = utils.norm_dist(np.ones(n_states))

""" Let's look at the prior over hidden states """
plot_beliefs(D, title_str = "Prior beliefs over states")

../_images/0014df13bb9664fe4800a1fedad5dab3a7210c7bab42014dfe5e906b51e0177e.png

Hidden state inference

Hidden state inference proceeds by finding the setting of the optimal variational posterior $q(s_t)$ that minimizes the variational free energy. For the simple POMDP generative model and posterior we’re considering here, this update reduces to a remarkably simple update rule, that is essentially proportional to Bayes` rule:

$$ \begin{align} q(s_t) = \sigma\left(\ln \mathbf{A}[o,:] + \ln\mathbf{B}[:,:,u] \cdot q(s_{t-1})\right) \end{align} $$

Import the softmax function and a numerically-stable version of np.log from pymdp.maths

from pymdp.maths import softmax
from pymdp.maths import spm_log_single as log_stable

Below we implement the infer_states() function

""" Create an infer states function that implements the math we just discussed"""

def infer_states(observation_index, A, prior):

  """ Implement inference here -- NOTE: prior is already passed in, so you don't need to do anything with the B matrix. """
  """ This function has already been given P(s_t). The conditional expectation that creates "today's prior", using "yesterday's posterior", will happen *before calling* this function"""
  
  log_likelihood = log_stable(A[observation_index,:])

  log_prior = log_stable(prior)

  qs = softmax(log_likelihood + log_prior)
   
  return qs

Let’s do a simulated single timestep of hidden state inference

qs_past = utils.onehot(4, n_states) # agent believes they were at location 4 -- i.e. (1,1) one timestep ago

last_action = "UP" # the agent knew it moved "UP" one timestep ago
action_id = actions.index(last_action) # get the action index for moving "UP"

Get “today’s prior” using the past posterior and the past action, i.e. calculate:

$$ \begin{align} P(s_t) = \mathbf{E}{q(s{t-1})}\left[P(s_t | s_{t-1}, u_{t-1})\right] \end{align} $$

and choose an observation that is consistent with the new location

prior = B[:,:,action_id].dot(qs_past)

observation_index = 1

Now run the infer_states() function, using the observation, the $\mathbf{A}$ matrix, and the prior we just calculated above

qs_new = infer_states(observation_index, A, prior)
plot_beliefs(qs_new, title_str = "Beliefs about hidden states")

../_images/4d8e3adff1aaedd7069b7e8295ee0ad0a9aef0b6d1ce2d663f2e2a1916e8cb5b.png

Let’s see what happens if observation and the prior “disagree” with eachother?

""" Get an observation that 'conflicts' with the prior """
observation_index = 2 # this is like the agent is seeing itself in location (0, 2)
qs_new = infer_states(observation_index, A, prior)
plot_beliefs(qs_new)

../_images/91adc30d8143b5500ea60f1893725cbcb540ba1ea545ee818fdc6cb4ccc0d3c0.png

What happens if likelihood and prior are partially ambiguous?

""" Create an ambiguous A matrix """
A_partially_ambiguous = softmax(A)
print(A_partially_ambiguous.round(3))

[[0.254 0.093 0.093 0.093 0.093 0.093 0.093 0.093 0.093]
 [0.093 0.254 0.093 0.093 0.093 0.093 0.093 0.093 0.093]
 [0.093 0.093 0.254 0.093 0.093 0.093 0.093 0.093 0.093]
 [0.093 0.093 0.093 0.254 0.093 0.093 0.093 0.093 0.093]
 [0.093 0.093 0.093 0.093 0.254 0.093 0.093 0.093 0.093]
 [0.093 0.093 0.093 0.093 0.093 0.254 0.093 0.093 0.093]
 [0.093 0.093 0.093 0.093 0.093 0.093 0.254 0.093 0.093]
 [0.093 0.093 0.093 0.093 0.093 0.093 0.093 0.254 0.093]
 [0.093 0.093 0.093 0.093 0.093 0.093 0.093 0.093 0.254]]

""" ... and a noisy prior """
noisy_prior = softmax(prior) 
plot_beliefs(noisy_prior)

../_images/b074e663912023a31fb8f8100e5f4e31571607e6300436ab5a0ba25c9c2c6bbe.png

""" Do inference with the new, partially-ambiguous A matrix and the noised-up prior """
qs_new = infer_states(observation_index, A_partially_ambiguous, noisy_prior)
plot_beliefs(qs_new)

../_images/6522574bc0e017c1669abf7f64d12251852909f00584fc251c50c86bc8a8821e.png

Action Selection and the Expected Free Energy

Now we’ll write functions to compute the expected states $Q(s_{t+1}|u_t)$, expected observations $Q(o_{t+1}|u_t)$, the entropy of $P(o|s)$: $\mathbf{H}\left[\mathbf{A}\right]$, and the KL divergence between the expected observations and the prior preferences $\mathbf{C}$.

""" define component functions for computing expected free energy """

def get_expected_states(B, qs_current, action):
  """ Compute the expected states one step into the future, given a particular action """
  qs_u = B[:,:,action].dot(qs_current)

  return qs_u

def get_expected_observations(A, qs_u):
  """ Compute the expected observations one step into the future, given a particular action """

  qo_u = A.dot(qs_u)

  return qo_u

def entropy(A):
  """ Compute the entropy of a set of conditional distributions, i.e. one entropy value per column """

  H_A = - (A * log_stable(A)).sum(axis=0)

  return H_A

def kl_divergence(qo_u, C):
  """ Compute the Kullback-Leibler divergence between two 1-D categorical distributions"""
  
  return (log_stable(qo_u) - log_stable(C)).dot(qo_u)

Now let’s imagine we’re in some starting state, like (1,1) N.B. This is the generative process we’re talking about – i.e. the true state of the world

""" Get state index, create state vector for (1,1) """

state_idx = grid_locations.index((1,1))
state_vector = utils.onehot(state_idx, n_states)
plot_point_on_grid(state_vector, grid_locations)

And let’s furthermore assume we start with the (accurate) belief about our location, that we are in location (1,1). So let’s just make our current $Q(s_t)$ equal to the true state vector. You could think of this as if we just did one ‘step’ of inference using our current observation along with precise A/B matrices.

""" Make qs_current identical to the true starting state """ 
qs_current = state_vector.copy()
plot_beliefs(qs_current, title_str ="Where do we believe we are?")

../_images/29cf9d50d2cd346b8b4151c233e81cedb34c74114c550cc64939c074c4151580.png

And we prefer to be in state (1,2). So we express that as high probability over the corresponding entry of the $\mathbf{C}$ vector – that happens to be index 5.

plot_grid(grid_locations)

""" Create a preference to be in (1,2) """

desired_idx = grid_locations.index((1,2))

C = utils.onehot(desired_idx, n_observations)

plot_beliefs(C, title_str = "Preferences")

../_images/2925c5e689de321a22a0462c8c7bf7e4df700fc1c9429c30d12c9ace5ea4d4de.png

Now to keep things simple, let’s evaluate the expected free energies $\mathbf{G}$ of just two actions that we could take from our current position – either moving LEFT or moving RIGHT. Let’s remind ourselves of what action indices those are

left_idx = actions.index("LEFT")
right_idx = actions.index("RIGHT")

print(f'Action index of moving left: {left_idx}')
print(f'Action index of moving right: {right_idx}')

Action index of moving left: 2
Action index of moving right: 3

And now let’s use the functions we’ve just defined, in combination with our A, B, C arrays and our current posterior qs_current, to compute the expected free energies for the two actions

""" Compute the expected free energies for moving left vs. moving right """
G = np.zeros(2) # store the expected free energies for each action in here

"""
Compute G for MOVE LEFT here 
"""

qs_u_left = get_expected_states(B, qs_current, left_idx)
# alternative
# qs_u_left = B[:,:,left_idx].dot(qs_current)

H_A = entropy(A)
qo_u_left = get_expected_observations(A, qs_u_left)
# alternative
# qo_u_left = A.dot(qs_u_left)

predicted_uncertainty_left = H_A.dot(qs_u_left)
predicted_divergence_left = kl_divergence(qo_u_left, C)
G[0] = predicted_uncertainty_left + predicted_divergence_left

"""
Compute G for MOVE RIGHT here 
"""

qs_u_right = get_expected_states(B, qs_current, right_idx)
# alternative
# qs_u_right = B[:,:,right_idx].dot(qs_current)

H_A = entropy(A)
qo_u_right = get_expected_observations(A, qs_u_right)
# alternative
# qo_u_right = A.dot(qs_u_right)

predicted_uncertainty_right = H_A.dot(qs_u_right)
predicted_divergence_right = kl_divergence(qo_u_right, C)
G[1] = predicted_uncertainty_right + predicted_divergence_right



""" Now let's print the expected free energies for the two actions, that we just calculated """
print(f'Expected free energy of moving left: {G[0]}\n')
print(f'Expected free energy of moving right: {G[1]}\n')

Expected free energy of moving left: 36.84136148790473

Expected free energy of moving right: 0.0

Now let’s use formula for the posterior over actions, i.e.

$$ \begin{align} Q(u_t) = \sigma(-\mathbf{G}) \end{align} $$

to compute the probabilities of each action

Q_u = softmax(-G)

""" and print the probability of each action """
print(f'Probability of moving left: {Q_u[0]}')
print(f'Probability of moving right: {Q_u[1]}')

Probability of moving left: 1.0000000000000037e-16
Probability of moving right: 1.0

For usefulness later on, let’s wrap the expected free energy calculations into a function

def calculate_G(A, B, C, qs_current, actions):

  G = np.zeros(len(actions)) # vector of expected free energies, one per action

  H_A = entropy(A) # entropy of the observation model, P(o|s)

  for action_i in range(len(actions)):
    
    qs_u = get_expected_states(B, qs_current, action_i) # expected states, under the action we're currently looping over
    qo_u = get_expected_observations(A, qs_u)           # expected observations, under the action we're currently looping over

    pred_uncertainty = H_A.dot(qs_u) # predicted uncertainty, i.e. expected entropy of the A matrix
    pred_div = kl_divergence(qo_u, C) # predicted divergence

    G[action_i] = pred_uncertainty + pred_div # sum them together to get expected free energy
  
  return G

Complete Recipe for Active Inference

Sample an observation $o_t$ from the current state of the environment
Perform inference over hidden states i.e., optimize $q(s)$ through free-energy minimization
Calculate expected free energy of actions $\mathbf{G}$
Sample action from the posterior over actions $Q(u_t) \sim \sigma(-\mathbf{G})$.
Use the sampled action $a_t$ to perturb the generative process and go back to step 1.

Let’s start by creating a class that will represent the Grid World environment (i.e., generative process) that our active inference agent will navigate within. Note that we don’t need to specify the generative process in terms of A and B matrices - the generative process will be as arbitrary and complex as the environment is. The A and B matrices are just the agent’s representation of the world and the task, which in this case happen to capture the Markovian, noiseless dynamics of the world perfectly.

class GridWorldEnv():
    
    def __init__(self,starting_state = (0,0)):

        self.init_state = starting_state
        self.current_state = self.init_state
        print(f'Starting state is {starting_state}')
    
    def step(self,action_label):

        (Y, X) = self.current_state

        if action_label == "UP": 
          
          Y_new = Y - 1 if Y > 0 else Y
          X_new = X

        elif action_label == "DOWN": 

          Y_new = Y + 1 if Y < 2 else Y
          X_new = X

        elif action_label == "LEFT": 
          Y_new = Y
          X_new = X - 1 if X > 0 else X

        elif action_label == "RIGHT": 
          Y_new = Y
          X_new = X +1 if X < 2 else X

        elif action_label == "STAY":
          Y_new, X_new = Y, X 
        
        self.current_state = (Y_new, X_new) # store the new grid location

        obs = self.current_state # agent always directly observes the grid location they're in 

        return obs

    def reset(self):
        self.current_state = self.init_state
        print(f'Re-initialized location to {self.init_state}')
        obs = self.current_state
        print(f'..and sampled observation {obs}')

        return obs
    
env = GridWorldEnv()

Starting state is (0, 0)

Now equipped with an environment and a generative model (our A, B, C, and D), we can code up the entire active inference loop

To have everything in one place, let’s re-create the whole generative model

""" Fill out the components of the generative model """

A = np.eye(n_observations, n_states)

B = create_B_matrix()

C = utils.onehot(grid_locations.index( (2, 2) ), n_observations) # make the agent prefer location (2,2) (lower right corner of grid world)

D = utils.onehot(grid_locations.index( (1,2) ), n_states) # start the agent with the prior belief that it starts in location (1,2) 

actions = ["UP", "DOWN", "LEFT", "RIGHT", "STAY"]

Now let’s initialize the environment with starting state (1,2), so that the agent has accurate beliefs about where it’s starting.

env = GridWorldEnv(starting_state = (1,2))

Starting state is (1, 2)

Let’s write a function that runs the whole active inference loop, and then run it for T = 5 timesteps.

""" Write a function that, when called, runs the entire active inference loop for a desired number of timesteps"""

def run_active_inference_loop(A, B, C, D, actions, env, T = 5):

  """ Initialize the prior that will be passed in during inference to be the same as `D` """
  prior = D.copy() # initial prior should be the D vector

  """ Initialize the observation that will be passed in during inference - hint use env.reset()"""
  obs = env.reset() # initialize the `obs` variable to be the first observation you sample from the environment, before `step`-ing it.

  for t in range(T):

    print(f'Time {t}: Agent observes itself in location: {obs}')

    # convert the observation into the agent's observational state space (in terms of 0 through 8)
    obs_idx = grid_locations.index(obs)

    # perform inference over hidden states
    qs_current = infer_states(obs_idx, A, prior)

    plot_beliefs(qs_current, title_str = f"Beliefs about location at time {t}")

    # calculate expected free energy of actions
    G = calculate_G(A, B, C, qs_current, actions)
    
    # compute action posterior
    Q_u = softmax(-G)

    # sample action from probability distribution over actions
    chosen_action = utils.sample(Q_u)

    # compute prior for next timestep of inference
    prior = B[:,:,chosen_action].dot(qs_current) 

    # update generative process
    action_label = actions[chosen_action]

    obs = env.step(action_label)
  
  return qs_current

""" Run the function we just wrote, for T = 5 timesteps """
qs = run_active_inference_loop(A, B, C, D, actions, env, T = 5)

Re-initialized location to (1, 2)
..and sampled observation (1, 2)
Time 0: Agent observes itself in location: (1, 2)

../_images/d8261b3652ec5a4c478182e30f1471b62eeee71ddb9b1ad5b22db42236f7394d.png

Time 1: Agent observes itself in location: (2, 2)

../_images/15f9a1c3cc1d2548eeffc5fba960346b4e8999f272e333a877121de584860fc6.png

Time 2: Agent observes itself in location: (2, 2)

../_images/186bfe2d3adcfcfa4f742465d2332d4eae45e5b9832b4e7e496e9b860f0a868c.png

Time 3: Agent observes itself in location: (2, 2)

../_images/d514e476da21251a18dc6b288032c00e48b0392661bbd2f04d58478f5f449237.png

Time 4: Agent observes itself in location: (2, 2)

../_images/4cbd89ca0dd0fe14f4ca0c896ca045441d7fa2e5b54a10939f22ff1d07873c28.png

Planning

But there’s a problem. Imagine we started the agent at the top left corner (coordinate (0,0)) rather than one step away from the target location

D = utils.onehot(grid_locations.index((0,0)), n_states) # let's have the agent believe it starts in location (0,0)

env = GridWorldEnv(starting_state = (0,0))
qs = run_active_inference_loop(A, B, C, D, actions, env, T = 5)

Starting state is (0, 0)
Re-initialized location to (0, 0)
..and sampled observation (0, 0)
Time 0: Agent observes itself in location: (0, 0)

../_images/a94e0fec52adff8d24b03f2417c40dd601d043e350e6c6c88ed9d695037a97c0.png

Time 1: Agent observes itself in location: (0, 1)

../_images/27a0858276e35abaebbbaadfbea801adfb1c66ade235eb94a16c6cf7021839d4.png

Time 2: Agent observes itself in location: (0, 0)

../_images/330b16896839e7cbbc2d94a55277071a8ea9aaef7f16461ef48efccf228ec80f.png

Time 3: Agent observes itself in location: (0, 1)

../_images/485382c37db11d1030911b055001efa2673083b2a849d6c944d52ffffea9049d.png

Time 4: Agent observes itself in location: (0, 1)

../_images/1fb8bd89f4b1152036b560046661b1602dfb428c6d7b606698680d75f136772a.png

Because the expected free energy of each action is only evaluated for one timestep in the future, the agent has no way of knowing which action to initially take, to get closer to its final “goal” of (2,2). This is because the expected divergence term for all actions $\operatorname{D}_{KL}(Q(o|u) \parallel \mathbf{C})$ is identical for all considered actions (which are just 1-step moves away from starting from (0,0) ). This speaks to the importance of planning, or multiple timestep policies.

Now let’s do active inference with multi-step policies

We can rely on a useful function from pymdp’s control module called construct_policies() to automatically generate a list of all the policies we want to entertain, for a given number of control states (actions) and a desired temporal horizon.

from pymdp.control import construct_policies

policy_len = 4
n_actions = len(actions)

# we have to wrap `n_states` and `n_actions` in a list for reasons that will become clear in Part II
all_policies = construct_policies([n_states], [n_actions], policy_len = policy_len)

print(f'Total number of policies for {n_actions} possible actions and a planning horizon of {policy_len}: {len(all_policies)}')

Total number of policies for 5 possible actions and a planning horizon of 4: 625

Let’s re-write our expected free energy function, but now we loop over policies (sequences of actions), rather than actions (1-step policies)

def calculate_G_policies(A, B, C, qs_current, policies):

  G = np.zeros(len(policies)) # initialize the vector of expected free energies, one per policy
  H_A = entropy(A)            # can calculate the entropy of the A matrix beforehand, since it'll be the same for all policies

  for policy_id, policy in enumerate(policies): # loop over policies - policy_id will be the linear index of the policy (0, 1, 2, ...) and `policy` will be a column vector where `policy[t,0]` indexes the action entailed by that policy at time `t`

    t_horizon = policy.shape[0] # temporal depth of the policy

    G_pi = 0.0 # initialize expected free energy for this policy

    for t in range(t_horizon): # loop over temporal depth of the policy

      action = policy[t,0] # action entailed by this particular policy, at time `t`

      # get the past predictive posterior - which is either your current posterior at the current time (not the policy time) or the predictive posterior entailed by this policy, one timstep ago (in policy time)
      if t == 0:
        qs_prev = qs_current 
      else:
        qs_prev = qs_pi_t
        
      qs_pi_t = get_expected_states(B, qs_prev, action) # expected states, under the action entailed by the policy at this particular time
      qo_pi_t = get_expected_observations(A, qs_pi_t)   # expected observations, under the action entailed by the policy at this particular time

      kld = kl_divergence(qo_pi_t, C) # Kullback-Leibler divergence between expected observations and the prior preferences C

      G_pi_t = H_A.dot(qs_pi_t) + kld # predicted uncertainty + predicted divergence, for this policy & timepoint

      G_pi += G_pi_t # accumulate the expected free energy for each timepoint into the overall EFE for the policy

    G[policy_id] += G_pi
  
  return G

Let’s write a function for computing the action posterior, given the posterior probability of each policy:

def compute_prob_actions(actions, policies, Q_pi):
  P_u = np.zeros(len(actions)) # initialize the vector of probabilities of each action

  for policy_id, policy in enumerate(policies):
    P_u[int(policy[0,0])] += Q_pi[policy_id] # get the marginal probability for the given action, entailed by this policy at the first timestep
  
  P_u = utils.norm_dist(P_u) # normalize the action probabilities
  
  return P_u

Now we can write a new active inference function, that uses temporally-deep planning

def active_inference_with_planning(A, B, C, D, n_actions, env, policy_len = 2, T = 5):

  """ Initialize prior, first observation, and policies """
  
  prior = D # initial prior should be the D vector

  obs = env.reset() # get the initial observation

  policies = construct_policies([n_states], [n_actions], policy_len = policy_len)

  for t in range(T):

    print(f'Time {t}: Agent observes itself in location: {obs}')

    # convert the observation into the agent's observational state space (in terms of 0 through 8)
    obs_idx = grid_locations.index(obs)

    # perform inference over hidden states
    qs_current = infer_states(obs_idx, A, prior)
    plot_beliefs(qs_current, title_str = f"Beliefs about location at time {t}")

    # calculate expected free energy of actions
    G = calculate_G_policies(A, B, C, qs_current, policies)

    # to get action posterior, we marginalize P(u|pi) with the probabilities of each policy Q(pi), given by \sigma(-G)
    Q_pi = softmax(-G)

    # compute the probability of each action
    P_u = compute_prob_actions(actions, policies, Q_pi)

    # sample action from probability distribution over actions
    chosen_action = utils.sample(P_u)

    # compute prior for next timestep of inference
    prior = B[:,:,chosen_action].dot(qs_current) 

    # step the generative process and get new observation
    action_label = actions[chosen_action]
    obs = env.step(action_label)
  
  return qs_current

Now run it to see if the agent is able to navigate to location (2,2)

D = utils.onehot(grid_locations.index((0,0)), n_states) # let's have the agent believe it starts in location (0,0) (upper left corner) 
env = GridWorldEnv(starting_state = (0,0))
qs_final = active_inference_with_planning(A, B, C, D, n_actions, env, policy_len = 3, T = 10)

Starting state is (0, 0)
Re-initialized location to (0, 0)
..and sampled observation (0, 0)
Time 0: Agent observes itself in location: (0, 0)

Time 1: Agent observes itself in location: (0, 0)

../_images/df416ae9be6058e9b83eee53102add72c0ffecec7a782b307109a6d1ba146b7e.png

Time 2: Agent observes itself in location: (0, 1)

../_images/324a5663982ab4fc14c08804a3fcdbfcf6683848f18a6afdfac07d9c87faa284.png

Time 3: Agent observes itself in location: (1, 1)

../_images/23b520edfe74f8c14c52bd4ad7b86fbe6344bc5b03d7b747eba140848eb3bbec.png

Time 4: Agent observes itself in location: (1, 2)

../_images/cf1297a58066558afd5d27970e6bfa97e45c6ecc8aca9411b64f5bbde60082c6.png

Time 5: Agent observes itself in location: (2, 2)

../_images/c07eb235e4c00a106fbbf56df9db81b45620bb235e03c4a5e43059b51015cb66.png

Time 6: Agent observes itself in location: (2, 2)

../_images/9dffb987fb29e9beaed82bbb2d99c31e8a2196e4eb804a16b50385d64aa3f852.png

Time 7: Agent observes itself in location: (2, 2)

../_images/21ab444fb6a67272e487c33f681b6a3e5f60813c497f3c45872fbef21322a588.png

Time 8: Agent observes itself in location: (2, 2)

../_images/468d71cb9f55288e89b3df18bc5baa074550af43c19de6af3a9079fd8bb171b8.png

Time 9: Agent observes itself in location: (2, 2)

../_images/b5b4e9602a329b3bf1641bc2f60311f4c3fbce0e007eec4f9d53556a5b0aa089.png