Control

The control.py module contains functions for performing inference over policies (sequences of control states) in POMDP generative models, according to active inference.

pymdp.control.calc_expected_utility(qo_pi, C)

Computes the expected utility of a policy, using the observation distribution expected under that policy and a prior preference vector.

Parameters
  • qo_pi (list of numpy.ndarray of dtype object) – Predictive posterior beliefs over observations expected under the policy, where qo_pi[t] stores the beliefs about observations expected under the policy at time t

  • C (numpy.ndarray of dtype object) – Prior over observations or ‘prior preferences’, storing the “value” of each outcome in terms of relative log probabilities. This is softmaxed to form a proper probability distribution before being used to compute the expected utility.

Returns

expected_util – Utility (reward) expected under the policy in question

Return type

float
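
Below is a minimal usage sketch, not taken from the library's own examples: the model sizes, the random A and B arrays, and the preference value assigned to C are all made up for illustration.

    import numpy as np
    from pymdp import utils, control

    # hypothetical toy model: one hidden state factor, one observation modality
    num_states, num_obs, num_controls = [3], [3], [3]
    A = utils.random_A_matrix(num_obs, num_states)        # random observation model
    B = utils.random_B_matrix(num_states, num_controls)   # random transition model

    # prior preferences: favour observation level 2 (in relative log-probability terms)
    C = utils.obj_array_zeros(num_obs)
    C[0][2] = 2.0

    # uniform posterior over states, rolled out under a one-step policy
    qs = utils.obj_array(1)
    qs[0] = np.ones(3) / 3
    policy = control.construct_policies(num_states, num_controls, policy_len=1)[0]
    qs_pi = control.get_expected_states(qs, B, policy)
    qo_pi = control.get_expected_obs(qs_pi, A)

    expected_util = control.calc_expected_utility(qo_pi, C)   # float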

pymdp.control.calc_pA_info_gain(pA, qo_pi, qs_pi)

Compute expected Dirichlet information gain about parameters pA under a policy

Parameters
  • pA (numpy.ndarray of dtype object) – Dirichlet parameters over observation model (same shape as A)

  • qo_pi (list of numpy.ndarray of dtype object) – Predictive posterior beliefs over observations expected under the policy, where qo_pi[t] stores the beliefs about observations expected under the policy at time t

  • qs_pi (list of numpy.ndarray of dtype object) – Predictive posterior beliefs over hidden states expected under the policy, where qs_pi[t] stores the beliefs about hidden states expected under the policy at time t

Returns

infogain_pA – Surprise (about Dirichlet parameters) expected under the policy in question

Return type

float
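
A sketch of one way this function might be called, assuming a toy single-modality model; the uniform Dirichlet counts in pA are illustrative only.

    import numpy as np
    from pymdp import utils, control

    num_states, num_obs, num_controls = [3], [3], [3]
    A = utils.random_A_matrix(num_obs, num_states)
    B = utils.random_B_matrix(num_states, num_controls)

    # uniform Dirichlet counts over the observation model (same shape as A)
    pA = utils.obj_array(1)
    pA[0] = np.ones_like(A[0])

    qs = utils.obj_array(1)
    qs[0] = np.ones(3) / 3
    policy = control.construct_policies(num_states, num_controls, policy_len=1)[0]
    qs_pi = control.get_expected_states(qs, B, policy)
    qo_pi = control.get_expected_obs(qs_pi, A)

    infogain_pA = control.calc_pA_info_gain(pA, qo_pi, qs_pi)   # float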

pymdp.control.calc_pB_info_gain(pB, qs_pi, qs_prev, policy)

Compute expected Dirichlet information gain about parameters pB under a given policy

Parameters
  • pB (numpy.ndarray of dtype object) – Dirichlet parameters over transition model (same shape as B)

  • qs_pi (list of numpy.ndarray of dtype object) – Predictive posterior beliefs over hidden states expected under the policy, where qs_pi[t] stores the beliefs about hidden states expected under the policy at time t

  • qs_prev (numpy.ndarray of dtype object) – Posterior over hidden states at beginning of trajectory (before receiving observations)

  • policy (2D numpy.ndarray) – Array that stores actions entailed by a policy over time. Shape is (num_timesteps, num_factors) where num_timesteps is the temporal depth of the policy and num_factors is the number of control factors.

Returns

infogain_pB – Surprise (about Dirichlet parameters) expected under the policy in question

Return type

float
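
A comparable sketch for the transition-model information gain, again with made-up sizes and uniform Dirichlet counts in pB.

    import numpy as np
    from pymdp import utils, control

    num_states, num_controls = [3], [3]
    B = utils.random_B_matrix(num_states, num_controls)

    # uniform Dirichlet counts over the transition model (same shape as B)
    pB = utils.obj_array(1)
    pB[0] = np.ones_like(B[0])

    # posterior at the start of the trajectory, and a two-step policy rolled out from it
    qs_prev = utils.obj_array(1)
    qs_prev[0] = np.ones(3) / 3
    policy = control.construct_policies(num_states, num_controls, policy_len=2)[0]
    qs_pi = control.get_expected_states(qs_prev, B, policy)

    infogain_pB = control.calc_pB_info_gain(pB, qs_pi, qs_prev, policy)   # float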

pymdp.control.calc_states_info_gain(A, qs_pi)

Computes the Bayesian surprise or information gain about hidden states expected under a policy, using the observation model and the hidden state distribution expected under that policy.

Parameters
  • A (numpy.ndarray of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each element A[m] stores a multidimensional numpy.ndarray for observation modality m, whose entries A[m][i, j, k, ...] store the probability of observation level i given hidden state levels j, k, ...

  • qs_pi (list of numpy.ndarray of dtype object) – Predictive posterior beliefs over hidden states expected under the policy, where qs_pi[t] stores the beliefs about hidden states expected under the policy at time t

Returns

states_surprise – Bayesian surprise (about states) or salience expected under the policy in question

Return type

float
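
A short sketch, assuming the same kind of made-up toy model as above (random A and B, uniform starting beliefs):

    import numpy as np
    from pymdp import utils, control

    num_states, num_obs, num_controls = [3], [3], [3]
    A = utils.random_A_matrix(num_obs, num_states)
    B = utils.random_B_matrix(num_states, num_controls)

    qs = utils.obj_array(1)
    qs[0] = np.ones(3) / 3
    policy = control.construct_policies(num_states, num_controls, policy_len=1)[0]
    qs_pi = control.get_expected_states(qs, B, policy)

    states_surprise = control.calc_states_info_gain(A, qs_pi)   # float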

pymdp.control.construct_policies(num_states, num_controls=None, policy_len=1, control_fac_idx=None)

Generate a list of policies. The returned array policies is a list that stores one policy per entry. A particular policy (policies[i]) has shape (num_timesteps, num_factors) where num_timesteps is the temporal depth of the policy and num_factors is the number of control factors.

Parameters
  • num_states (list of int) – list of the dimensionalities of each hidden state factor

  • num_controls (list of int, default None) – list of the dimensionalities of each control state factor. If None, it is automatically computed as the dimensionality of each hidden state factor that is controllable

  • policy_len (int, default 1) – temporal depth (“planning horizon”) of policies

  • control_fac_idx (list of int) – list of indices of the hidden state factors that are controllable (i.e. those state factors i where num_controls[i] > 1)

Returns

policies – list that stores each policy as a 2D array in policies[p_idx]. Shape of policies[p_idx] is (num_timesteps, num_factors) where num_timesteps is the temporal depth of the policy and num_factors is the number of control factors.

Return type

list of 2D numpy.ndarray
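
For example, with hypothetical factor sizes (two hidden state factors, only the first controllable), the full policy space over a two-step horizon can be enumerated as follows:

    from pymdp import control

    num_states = [3, 2]
    num_controls = [3, 1]   # only factor 0 has more than one control state

    policies = control.construct_policies(num_states, num_controls, policy_len=2)
    print(len(policies))        # 9 policies: 3 controllable actions ** 2 timesteps
    print(policies[0].shape)    # (2, 2): (num_timesteps, num_factors)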

pymdp.control.get_expected_obs(qs_pi, A)

Compute the expected observations under a policy, also known as the posterior predictive density over observations

Parameters
  • qs_pi (list of numpy.ndarray of dtype object) – Predictive posterior beliefs over hidden states expected under the policy, where qs_pi[t] stores the beliefs about hidden states expected under the policy at time t

  • A (numpy.ndarray of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each element A[m] stores a multidimensional numpy.ndarray for observation modality m, whose entries A[m][i, j, k, ...] store the probability of observation level i given hidden state levels j, k, ...

Returns

qo_pi – Predictive posterior beliefs over observations expected under the policy, where qo_pi[t] stores the beliefs about observations expected under the policy at time t

Return type

list of numpy.ndarray of dtype object
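
A minimal sketch, with made-up modality and factor sizes, showing the nesting of the returned qo_pi:

    import numpy as np
    from pymdp import utils, control

    num_states, num_obs, num_controls = [3], [4], [3]
    A = utils.random_A_matrix(num_obs, num_states)
    B = utils.random_B_matrix(num_states, num_controls)

    qs = utils.obj_array(1)
    qs[0] = np.ones(3) / 3
    policy = control.construct_policies(num_states, num_controls, policy_len=2)[0]

    qs_pi = control.get_expected_states(qs, B, policy)
    qo_pi = control.get_expected_obs(qs_pi, A)
    print(len(qo_pi))          # 2: one entry per policy timestep
    print(qo_pi[0][0].shape)   # (4,): predicted observation distribution for modality 0 at t = 0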

pymdp.control.get_expected_states(qs, B, policy)

Compute the expected states under a policy, also known as the posterior predictive density over states

Parameters
  • qs (numpy.ndarray of dtype object) – Marginal posterior beliefs over hidden states at a given timepoint.

  • B (numpy.ndarray of dtype object) – Dynamics likelihood mapping or ‘transition model’, mapping from hidden states at t to hidden states at t+1, given some control state u. Each element B[f] of this object array stores a 3-D tensor for hidden state factor f, whose entries B[f][s, v, u] store the probability of hidden state level s at the current time, given hidden state level v and action u at the previous time.

  • policy (2D numpy.ndarray) – Array that stores actions entailed by a policy over time. Shape is (num_timesteps, num_factors) where num_timesteps is the temporal depth of the policy and num_factors is the number of control factors.

Returns

qs_pi – Predictive posterior beliefs over hidden states expected under the policy, where qs_pi[t] stores the beliefs about hidden states expected under the policy at time t

Return type

list of numpy.ndarray of dtype object
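
A minimal sketch with two hypothetical hidden state factors, showing the nesting of the returned qs_pi:

    import numpy as np
    from pymdp import utils, control

    num_states, num_controls = [3, 2], [3, 1]
    B = utils.random_B_matrix(num_states, num_controls)

    # uniform starting beliefs over both factors
    qs = utils.obj_array(2)
    qs[0], qs[1] = np.ones(3) / 3, np.ones(2) / 2

    policy = control.construct_policies(num_states, num_controls, policy_len=2)[0]
    qs_pi = control.get_expected_states(qs, B, policy)
    print(len(qs_pi))          # 2: one entry per policy timestep
    print(qs_pi[0][0].shape)   # (3,): predicted beliefs about factor 0 at t = 0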

pymdp.control.get_num_controls_from_policies(policies)

Calculates the list of dimensionalities of control factors (num_controls) from the list or array of policies. This assumes a policy space such that for each control factor, there is at least one policy that entails taking the action with the maximum index along that control factor.

Parameters

policies (list of 2D numpy.ndarray) – list that stores each policy as a 2D array in policies[p_idx]. Shape of policies[p_idx] is (num_timesteps, num_factors) where num_timesteps is the temporal depth of the policy and num_factors is the number of control factors.

Returns

num_controls – list of the dimensionalities of each control state factor, computed here automatically from a list of policies.

Return type

list of int
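
For instance, constructing policies from known control dimensionalities and then recovering them (the sizes below are hypothetical):

    from pymdp import control

    num_states, num_controls = [3, 2], [3, 1]
    policies = control.construct_policies(num_states, num_controls, policy_len=1)

    recovered = control.get_num_controls_from_policies(policies)
    print(recovered)   # [3, 1]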

pymdp.control.sample_action(q_pi, policies, num_controls, action_selection='deterministic', alpha=16.0)

Computes the marginal posterior over actions and then samples an action from it, one action per control factor.

Parameters
  • q_pi (1D numpy.ndarray) – Posterior beliefs over policies, i.e. a vector containing one posterior probability per policy.

  • policies (list of 2D numpy.ndarray) – list that stores each policy as a 2D array in policies[p_idx]. Shape of policies[p_idx] is (num_timesteps, num_factors) where num_timesteps is the temporal depth of the policy and num_factors is the number of control factors.

  • num_controls (list of int) – list of the dimensionalities of each control state factor.

  • action_selection (str, default “deterministic”) – String indicating whether the selected action is chosen as the maximum of the marginal posterior over actions, or sampled from that marginal posterior

  • alpha (float, default 16.0) – Action selection precision – the inverse temperature of the softmax that is used to scale the action marginals before sampling. This is only used if the action_selection argument is “stochastic”

Returns

selected_policy – Vector containing the indices of the actions for each control factor

Return type

1D numpy.ndarray
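
A hedged end-to-end sketch: the toy model below is made up, and the policy posterior q_pi is obtained from update_posterior_policies (documented further down) purely to have something to sample from.

    import numpy as np
    from pymdp import utils, control

    num_states, num_obs, num_controls = [3], [3], [3]
    A = utils.random_A_matrix(num_obs, num_states)
    B = utils.random_B_matrix(num_states, num_controls)
    C = utils.obj_array_zeros(num_obs)   # flat (indifferent) preferences

    qs = utils.obj_array(1)
    qs[0] = np.ones(3) / 3
    policies = control.construct_policies(num_states, num_controls, policy_len=1)

    q_pi, G = control.update_posterior_policies(qs, A, B, C, policies)
    action = control.sample_action(
        q_pi, policies, num_controls, action_selection="stochastic", alpha=16.0
    )
    print(action)   # one action index per control factor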

pymdp.control.sample_policy(q_pi, policies, num_controls, action_selection='deterministic', alpha=16.0)

Samples a policy from the posterior over policies, taking the action (per control factor) entailed by the first timestep of the selected policy.

Parameters
  • q_pi (1D numpy.ndarray) – Posterior beliefs over policies, i.e. a vector containing one posterior probability per policy.

  • policies (list of 2D numpy.ndarray) – list that stores each policy as a 2D array in policies[p_idx]. Shape of policies[p_idx] is (num_timesteps, num_factors) where num_timesteps is the temporal depth of the policy and num_factors is the number of control factors.

  • num_controls (list of int) – list of the dimensionalities of each control state factor.

  • action_selection (str, default “deterministic”) – String indicating whether the selected policy is chosen as the maximum of the posterior over policies, or sampled from that posterior.

  • alpha (float, default 16.0) – Action selection precision – the inverse temperature of the softmax that is used to scale the policy posterior before sampling. This is only used if the action_selection argument is “stochastic”

Returns

selected_policy – Vector containing the indices of the actions for each control factor

Return type

1D numpy.ndarray
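
The calling pattern mirrors sample_action; a sketch under the same made-up toy model assumptions:

    import numpy as np
    from pymdp import utils, control

    num_states, num_obs, num_controls = [3], [3], [3]
    A = utils.random_A_matrix(num_obs, num_states)
    B = utils.random_B_matrix(num_states, num_controls)
    C = utils.obj_array_zeros(num_obs)

    qs = utils.obj_array(1)
    qs[0] = np.ones(3) / 3
    policies = control.construct_policies(num_states, num_controls, policy_len=2)

    q_pi, G = control.update_posterior_policies(qs, A, B, C, policies)

    # select a whole policy (rather than marginalising over actions) and
    # return the action entailed by its first timestep
    action = control.sample_policy(q_pi, policies, num_controls, action_selection="deterministic")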

pymdp.control.select_highest(options_array)

Selects the highest value among the entries of the provided array. If the maximum occurs more than once (i.e. several entries lie within 1e-5 of it), one of them is chosen at random.

Parameters

options_array (numpy.ndarray) – The array to examine

Returns

The highest value in the given array
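
A tiny illustrative call (the array values are arbitrary):

    import numpy as np
    from pymdp import control

    options = np.array([0.1, 0.7, 0.2])
    result = control.select_highest(options)   # ties within 1e-5 of the maximum are broken at random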

pymdp.control.update_posterior_policies(qs, A, B, C, policies, use_utility=True, use_states_info_gain=True, use_param_info_gain=False, pA=None, pB=None, E=None, gamma=16.0)

Update posterior beliefs about policies by computing the expected free energy of each policy and integrating that with the prior over policies E. This function is intended to be used in conjunction with the update_posterior_states method of the inference module, since only the posterior over hidden states at the current timestep, qs, is assumed to be provided, unconditioned on policies. The predictive posterior over hidden states under all policies, Q(s, pi), is computed using the starting posterior about states at the current timestep qs and the generative model (e.g. A, B, C).

Parameters
  • qs (numpy.ndarray of dtype object) – Marginal posterior beliefs over hidden states at current timepoint (unconditioned on policies)

  • A (numpy.ndarray of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each element A[m] stores a multidimensional numpy.ndarray for observation modality m, whose entries A[m][i, j, k, ...] store the probability of observation level i given hidden state levels j, k, ...

  • B (numpy.ndarray of dtype object) – Dynamics likelihood mapping or ‘transition model’, mapping from hidden states at t to hidden states at t+1, given some control state u. Each element B[f] of this object array stores a 3-D tensor for hidden state factor f, whose entries B[f][s, v, u] store the probability of hidden state level s at the current time, given hidden state level v and action u at the previous time.

  • C (numpy.ndarray of dtype object) – Prior over observations or ‘prior preferences’, storing the “value” of each outcome in terms of relative log probabilities. This is softmaxed to form a proper probability distribution before being used to compute the expected utility term of the expected free energy.

  • policies (list of 2D numpy.ndarray) – list that stores each policy in policies[p_idx]. Shape of policies[p_idx] is (num_timesteps, num_factors) where num_timesteps is the temporal depth of the policy and num_factors is the number of control factors.

  • use_utility (Bool, default True) – Boolean flag that determines whether expected utility should be incorporated into computation of EFE.

  • use_states_info_gain (Bool, default True) – Boolean flag that determines whether state epistemic value (info gain about hidden states) should be incorporated into computation of EFE.

  • use_param_info_gain (Bool, default False) – Boolean flag that determines whether parameter epistemic value (info gain about generative model parameters) should be incorporated into computation of EFE.

  • pA (numpy.ndarray of dtype object, optional) – Dirichlet parameters over observation model (same shape as A)

  • pB (numpy.ndarray of dtype object, optional) – Dirichlet parameters over transition model (same shape as B)

  • E (1D numpy.ndarray, optional) – Vector of prior probabilities of each policy (what’s referred to in the active inference literature as “habits”)

  • gamma (float, default 16.0) – Prior precision over policies, scales the contribution of the expected free energy to the posterior over policies

Returns

  • q_pi (1D numpy.ndarray) – Posterior beliefs over policies, i.e. a vector containing one posterior probability per policy.

  • G (1D numpy.ndarray) – Negative expected free energies of each policy, i.e. a vector containing one negative expected free energy per policy.
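
A minimal sketch of a full policy-evaluation step, assuming a made-up two-factor, two-modality model with random A and B and an arbitrary preference in C:

    import numpy as np
    from pymdp import utils, control

    num_states, num_obs, num_controls = [3, 2], [3, 4], [3, 1]
    A = utils.random_A_matrix(num_obs, num_states)
    B = utils.random_B_matrix(num_states, num_controls)
    C = utils.obj_array_zeros(num_obs)
    C[1][3] = 2.0   # prefer observation level 3 of modality 1

    # current (policy-unconditioned) posterior over hidden states: uniform over both factors
    qs = utils.obj_array(2)
    qs[0], qs[1] = np.ones(3) / 3, np.ones(2) / 2

    policies = control.construct_policies(num_states, num_controls, policy_len=2)
    q_pi, G = control.update_posterior_policies(
        qs, A, B, C, policies, use_utility=True, use_states_info_gain=True, gamma=16.0
    )
    print(q_pi.shape, G.shape)   # both have one entry per policy (9 here)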

pymdp.control.update_posterior_policies_full(qs_seq_pi, A, B, C, policies, use_utility=True, use_states_info_gain=True, use_param_info_gain=False, prior=None, pA=None, pB=None, F=None, E=None, gamma=16.0)

Update posterior beliefs about policies by computing expected free energy of each policy and integrating that with the variational free energy of policies F and prior over policies E. This is intended to be used in conjunction with the update_posterior_states_full method of inference.py, since the full posterior over future timesteps, under all policies, is assumed to be provided in the input array qs_seq_pi.

Parameters
  • qs_seq_pi (numpy.ndarray of dtype object) – Posterior beliefs over hidden states for each policy. Nesting structure is policies, timepoints, factors, where e.g. qs_seq_pi[p][t][f] stores the marginal belief about factor f at timepoint t under policy p.

  • A (numpy.ndarray of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each element A[m] stores a multidimensional numpy.ndarray for observation modality m, whose entries A[m][i, j, k, ...] store the probability of observation level i given hidden state levels j, k, ...

  • B (numpy.ndarray of dtype object) – Dynamics likelihood mapping or ‘transition model’, mapping from hidden states at t to hidden states at t+1, given some control state u. Each element B[f] of this object array stores a 3-D tensor for hidden state factor f, whose entries B[f][s, v, u] store the probability of hidden state level s at the current time, given hidden state level v and action u at the previous time.

  • C (numpy.ndarray of dtype object) – Prior over observations or ‘prior preferences’, storing the “value” of each outcome in terms of relative log probabilities. This is softmaxed to form a proper probability distribution before being used to compute the expected utility term of the expected free energy.

  • policies (list of 2D numpy.ndarray) – list that stores each policy in policies[p_idx]. Shape of policies[p_idx] is (num_timesteps, num_factors) where num_timesteps is the temporal depth of the policy and num_factors is the number of control factors.

  • use_utility (Bool, default True) – Boolean flag that determines whether expected utility should be incorporated into computation of EFE.

  • use_states_info_gain (Bool, default True) – Boolean flag that determines whether state epistemic value (info gain about hidden states) should be incorporated into computation of EFE.

  • use_param_info_gain (Bool, default False) – Boolean flag that determines whether parameter epistemic value (info gain about generative model parameters) should be incorporated into computation of EFE.

  • prior (numpy.ndarray of dtype object, default None) – If provided, this is a numpy object array with one sub-array per hidden state factor, that stores the prior beliefs about initial states. If None, this defaults to a flat (uninformative) prior over hidden states.

  • pA (numpy.ndarray of dtype object, default None) – Dirichlet parameters over observation model (same shape as A)

  • pB (numpy.ndarray of dtype object, default None) – Dirichlet parameters over transition model (same shape as B)

  • F (1D numpy.ndarray, default None) – Vector of variational free energies for each policy

  • E (1D numpy.ndarray, default None) – Vector of prior probabilities of each policy (what’s referred to in the active inference literature as “habits”). If None, this defaults to a flat (uninformative) prior over policies.

  • gamma (float, default 16.0) – Prior precision over policies, scales the contribution of the expected free energy to the posterior over policies

Returns

  • q_pi (1D numpy.ndarray) – Posterior beliefs over policies, i.e. a vector containing one posterior probability per policy.

  • G (1D numpy.ndarray) – Negative expected free energies of each policy, i.e. a vector containing one negative expected free energy per policy.
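
A hedged sketch: in normal use qs_seq_pi comes from update_posterior_states_full in the inference module, but here, purely so the example is self-contained, it is assembled by hand from get_expected_states; the toy model below is made up.

    import numpy as np
    from pymdp import utils, control

    num_states, num_obs, num_controls = [3], [3], [3]
    A = utils.random_A_matrix(num_obs, num_states)
    B = utils.random_B_matrix(num_states, num_controls)
    C = utils.obj_array_zeros(num_obs)

    qs = utils.obj_array(1)
    qs[0] = np.ones(3) / 3
    policies = control.construct_policies(num_states, num_controls, policy_len=2)

    # nesting of qs_seq_pi is policies -> timepoints -> factors
    qs_seq_pi = utils.obj_array(len(policies))
    for p, policy in enumerate(policies):
        qs_seq_pi[p] = control.get_expected_states(qs, B, policy)

    q_pi, G = control.update_posterior_policies_full(qs_seq_pi, A, B, C, policies)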