Control
The control.py module contains the functions for performing inference of policies (sequences of control states) in POMDP generative models, according to active inference.
- pymdp.control.calc_expected_utility(qo_pi, C)
Computes the expected utility of a policy, using the observation distribution expected under that policy and a prior preference vector.
- Parameters
qo_pi (list of numpy.ndarray of dtype object) – Predictive posterior beliefs over observations expected under the policy, where qo_pi[t] stores the beliefs about observations expected under the policy at time t
C (numpy.ndarray of dtype object) – Prior over observations or ‘prior preferences’, storing the “value” of each outcome in terms of relative log probabilities. This is softmaxed to form a proper probability distribution before being used to compute the expected utility.
- Returns
expected_util – Utility (reward) expected under the policy in question
- Return type
float
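As a rough illustration of the computation described above, the following NumPy-only sketch handles a single observation modality. The helper names (`softmax`, `expected_utility_sketch`) and the small constant added inside the logarithm are assumptions made for this example; it is not the library's implementation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax of a 1D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def expected_utility_sketch(qo_pi, C):
    """Sum over timesteps of E_{Q(o_t | pi)}[ln softmax(C)] for one modality.

    qo_pi: list of 1D arrays, one predictive observation distribution per timestep.
    C: 1D array of relative log-preferences over observations.
    """
    ln_C = np.log(softmax(C) + 1e-16)  # assumed epsilon to avoid log(0)
    return float(sum(qo_t @ ln_C for qo_t in qo_pi))

# Hypothetical two-outcome example with a preference for outcome 0
C = np.array([2.0, 0.0])
likely_good = [np.array([0.9, 0.1])]
likely_bad = [np.array([0.1, 0.9])]
util_good = expected_utility_sketch(likely_good, C)
util_bad = expected_utility_sketch(likely_bad, C)
print(util_good > util_bad)  # the policy that expects the preferred outcome scores higher
```

Because the preferences are passed through a softmax and a logarithm, the returned utility is never positive; only differences between policies matter.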
- pymdp.control.calc_pA_info_gain(pA, qo_pi, qs_pi)
Compute expected Dirichlet information gain about parameters pA under a policy.
- Parameters
pA (numpy.ndarray of dtype object) – Dirichlet parameters over observation model (same shape as A)
qo_pi (list of numpy.ndarray of dtype object) – Predictive posterior beliefs over observations expected under the policy, where qo_pi[t] stores the beliefs about observations expected under the policy at time t
qs_pi (list of numpy.ndarray of dtype object) – Predictive posterior beliefs over hidden states expected under the policy, where qs_pi[t] stores the beliefs about hidden states expected under the policy at time t
- Returns
infogain_pA – Surprise (about Dirichlet parameters) expected under the policy in question
- Return type
float
- pymdp.control.calc_pB_info_gain(pB, qs_pi, qs_prev, policy)
Compute expected Dirichlet information gain about parameters pB under a given policy.
- Parameters
pB (numpy.ndarray of dtype object) – Dirichlet parameters over transition model (same shape as B)
qs_pi (list of numpy.ndarray of dtype object) – Predictive posterior beliefs over hidden states expected under the policy, where qs_pi[t] stores the beliefs about hidden states expected under the policy at time t
qs_prev (numpy.ndarray of dtype object) – Posterior over hidden states at beginning of trajectory (before receiving observations)
policy (2D numpy.ndarray) – Array that stores actions entailed by a policy over time. Shape is (num_timesteps, num_factors), where num_timesteps is the temporal depth of the policy and num_factors is the number of control factors.
- Returns
infogain_pB – Surprise (about Dirichlet parameters) expected under the policy in question
- Return type
float
- pymdp.control.calc_states_info_gain(A, qs_pi)
Computes the Bayesian surprise or information gain about states of a policy, using the observation model and the hidden state distribution expected under that policy.
- Parameters
A (numpy.ndarray of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each element A[m] stores a multidimensional numpy.ndarray for observation modality m, whose entries A[m][i, j, k, ...] store the probability of observation level i given hidden state levels j, k, ...
qs_pi (list of numpy.ndarray of dtype object) – Predictive posterior beliefs over hidden states expected under the policy, where qs_pi[t] stores the beliefs about hidden states expected under the policy at time t
- Returns
states_surprise – Bayesian surprise (about states) or salience expected under the policy in question
- Return type
float
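For intuition, with a single observation modality, a single hidden state factor, and a single timestep, the expected Bayesian surprise about states equals the mutual information between states and observations. The sketch below computes that quantity with plain NumPy; `entropy` and `states_info_gain_sketch` are hypothetical helper names, and the code is an illustration under those simplifying assumptions, not the library routine.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a 1D distribution, ignoring zero entries."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-(p[nz] * np.log(p[nz])).sum())

def states_info_gain_sketch(A, qs):
    """Mutual information H[Q(o)] - E_{Q(s)}[H[P(o | s)]].

    A: 2D array with A[i, j] = P(o = i | s = j).
    qs: 1D array, predictive beliefs over hidden states.
    """
    qo = A @ qs  # predictive distribution over observations
    expected_ambiguity = sum(qs[j] * entropy(A[:, j]) for j in range(A.shape[1]))
    return entropy(qo) - expected_ambiguity

qs = np.array([0.5, 0.5])
gain_informative = states_info_gain_sketch(np.eye(2), qs)              # observations identify the state
gain_uninformative = states_info_gain_sketch(np.full((2, 2), 0.5), qs)  # observations carry no information
print(gain_informative, gain_uninformative)
```

An unambiguous likelihood yields the largest gain for a given belief, whereas a flat likelihood yields none, which is why this term favours policies that lead to informative observations.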
- pymdp.control.construct_policies(num_states, num_controls=None, policy_len=1, control_fac_idx=None)
Generate a list of policies. The returned policies object is a list that stores one policy per entry. A particular policy (policies[i]) has shape (num_timesteps, num_factors), where num_timesteps is the temporal depth of the policy and num_factors is the number of control factors.
- Parameters
num_states (list of int) – list of the dimensionalities of each hidden state factor
num_controls (list of int, default None) – list of the dimensionalities of each control state factor. If None, this is automatically computed as the dimensionality of each hidden state factor that is controllable
policy_len (int, default 1) – temporal depth (“planning horizon”) of policies
control_fac_idx (list of int) – list of indices of the hidden state factors that are controllable (i.e. those state factors i where num_controls[i] > 1)
- Returns
policies – list that stores each policy as a 2D array in policies[p_idx]. Shape of policies[p_idx] is (num_timesteps, num_factors), where num_timesteps is the temporal depth of the policy and num_factors is the number of control factors.
- Return type
list of 2D numpy.ndarray
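One way to picture the enumeration is as a Cartesian product over the actions of every control factor at every timestep. The following is a minimal sketch under that assumption; `construct_policies_sketch` is a hypothetical name, and it takes `num_controls` directly instead of deriving it from `num_states` and `control_fac_idx`.

```python
import itertools
import numpy as np

def construct_policies_sketch(num_controls, policy_len=1):
    """Enumerate every action sequence of length policy_len.

    Returns a list of arrays, each of shape (policy_len, num_factors).
    """
    num_factors = len(num_controls)
    # One range of actions per (timestep, factor) pair, ordered timestep-major
    action_ranges = [range(n) for n in num_controls] * policy_len
    return [
        np.array(combo).reshape(policy_len, num_factors)
        for combo in itertools.product(*action_ranges)
    ]

policies = construct_policies_sketch([2, 3], policy_len=2)
print(len(policies), policies[0].shape)
```

The number of policies grows as the product of `num_controls` raised to the power `policy_len`, so deep planning horizons become expensive quickly.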
- pymdp.control.get_expected_obs(qs_pi, A)
Compute the expected observations under a policy, also known as the posterior predictive density over observations
- Parameters
qs_pi (list of numpy.ndarray of dtype object) – Predictive posterior beliefs over hidden states expected under the policy, where qs_pi[t] stores the beliefs about hidden states expected under the policy at time t
A (numpy.ndarray of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each element A[m] stores a multidimensional numpy.ndarray for observation modality m, whose entries A[m][i, j, k, ...] store the probability of observation level i given hidden state levels j, k, ...
- Returns
qo_pi – Predictive posterior beliefs over observations expected under the policy, where qo_pi[t] stores the beliefs about observations expected under the policy at time t
- Return type
list of numpy.ndarray of dtype object
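For a single modality and a single timestep, the expected observation can be sketched as a contraction of each hidden-state axis of the likelihood array with the matching factor marginal. The function name `expected_obs_one_modality` and the random likelihood below are assumptions made for illustration only.

```python
import numpy as np

def expected_obs_one_modality(A_m, qs_t):
    """Marginalise hidden states out of A_m.

    A_m: array of shape (num_obs, num_states_0, num_states_1, ...).
    qs_t: list of 1D arrays, one marginal belief per hidden state factor.
    """
    qo = A_m
    # Contract the last state axis first so the remaining axes keep their order
    for q_f in reversed(qs_t):
        qo = np.dot(qo, q_f)
    return qo

rng = np.random.default_rng(0)
A_m = rng.random((3, 2, 4))
A_m /= A_m.sum(axis=0, keepdims=True)  # each column is a distribution over observations
qs_t = [np.array([0.5, 0.5]), np.array([0.1, 0.2, 0.3, 0.4])]
qo = expected_obs_one_modality(A_m, qs_t)
print(qo.shape, np.isclose(qo.sum(), 1.0))
```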
- pymdp.control.get_expected_states(qs, B, policy)
Compute the expected states under a policy, also known as the posterior predictive density over states
- Parameters
qs (numpy.ndarray of dtype object) – Marginal posterior beliefs over hidden states at a given timepoint.
B (numpy.ndarray of dtype object) – Dynamics likelihood mapping or ‘transition model’, mapping from hidden states at t to hidden states at t+1, given some control state u. Each element B[f] of this object array stores a 3-D tensor for hidden state factor f, whose entries B[f][s, v, u] store the probability of hidden state level s at the current time, given hidden state level v and action u at the previous time.
policy (2D numpy.ndarray) – Array that stores actions entailed by a policy over time. Shape is (num_timesteps, num_factors), where num_timesteps is the temporal depth of the policy and num_factors is the number of control factors.
- Returns
qs_pi – Predictive posterior beliefs over hidden states expected under the policy, where qs_pi[t] stores the beliefs about hidden states expected under the policy at time t
- Return type
list of numpy.ndarray of dtype object
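The rollout described above can be sketched by repeatedly applying, for each factor, the slice of the transition tensor selected by the policy's action. This is a simplified illustration that uses plain Python lists in place of object arrays; `expected_states_sketch` and the toy "stay/flip" transition model are hypothetical.

```python
import numpy as np

def expected_states_sketch(qs, B, policy):
    """Roll beliefs forward under a policy.

    qs: list of 1D arrays (one per hidden state factor), beliefs at the current time.
    B: list of 3D arrays, B[f][s, v, u] = P(s at t+1 | v at t, action u).
    policy: 2D integer array of shape (num_timesteps, num_factors).
    """
    qs_pi = []
    current = list(qs)
    for t in range(policy.shape[0]):
        current = [B[f][:, :, int(policy[t, f])] @ current[f] for f in range(len(B))]
        qs_pi.append(current)
    return qs_pi

# Toy single-factor model: action 0 keeps the state, action 1 flips it
B_f = np.zeros((2, 2, 2))
B_f[:, :, 0] = np.eye(2)
B_f[:, :, 1] = np.array([[0.0, 1.0], [1.0, 0.0]])
qs_now = [np.array([1.0, 0.0])]
policy = np.array([[1], [0]])  # flip, then stay
qs_pi = expected_states_sketch(qs_now, [B_f], policy)
print(qs_pi[0][0], qs_pi[1][0])
```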
- pymdp.control.get_num_controls_from_policies(policies)
Calculates the list of dimensionalities of control factors (num_controls) from the list or array of policies. This assumes a policy space such that for each control factor, there is at least one policy that entails taking the action with the maximum index along that control factor.
- Parameters
policies (list of 2D numpy.ndarray) – list that stores each policy as a 2D array in policies[p_idx]. Shape of policies[p_idx] is (num_timesteps, num_factors), where num_timesteps is the temporal depth of the policy and num_factors is the number of control factors.
- Returns
num_controls – list of the dimensionalities of each control state factor, computed here automatically from a list of policies.
- Return type
list of int
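Under the stated assumption that the maximum action index of every control factor appears in at least one policy, the dimensionalities can be recovered as "largest index plus one". A minimal sketch with a hypothetical function name:

```python
import numpy as np

def num_controls_sketch(policies):
    """Infer the number of actions per control factor from a list of policies."""
    stacked = np.vstack(policies)  # shape: (num_policies * num_timesteps, num_factors)
    return [int(m) + 1 for m in stacked.max(axis=0)]

example_policies = [np.array([[0, 1]]), np.array([[1, 2]])]
inferred = num_controls_sketch(example_policies)
print(inferred)
```

If the policy space never uses the highest action of some factor, this estimate will be too small for that factor.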
- pymdp.control.sample_action(q_pi, policies, num_controls, action_selection='deterministic', alpha=16.0)
Computes the marginal posterior over actions and then samples an action from it, one action per control factor.
- Parameters
q_pi (1D numpy.ndarray) – Posterior beliefs over policies, i.e. a vector containing one posterior probability per policy.
policies (list of 2D numpy.ndarray) – list that stores each policy as a 2D array in policies[p_idx]. Shape of policies[p_idx] is (num_timesteps, num_factors), where num_timesteps is the temporal depth of the policy and num_factors is the number of control factors.
num_controls (list of int) – list of the dimensionalities of each control state factor.
action_selection (str, default “deterministic”) – String indicating whether the selected action is chosen as the maximum of the posterior over actions, or whether it’s sampled from the posterior marginal over actions
alpha (float, default 16.0) – Action selection precision – the inverse temperature of the softmax that is used to scale the action marginals before sampling. This is only used if the action_selection argument is “stochastic”
- Returns
selected_policy – Vector containing the indices of the actions for each control factor
- Return type
1D numpy.ndarray
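To make the "marginal posterior over actions" concrete, the sketch below accumulates the probability of each policy onto the action it prescribes at its first timestep, separately for every control factor, and then picks the most probable action. The function names are hypothetical, and resolving ties with `np.argmax` (first index wins) is a simplification assumed for this example.

```python
import numpy as np

def action_marginals_sketch(q_pi, policies, num_controls):
    """Marginal probability of each first-timestep action, per control factor."""
    marginals = [np.zeros(n) for n in num_controls]
    for p_idx, policy in enumerate(policies):
        for f, action in enumerate(policy[0]):
            marginals[f][int(action)] += q_pi[p_idx]
    return [m / m.sum() for m in marginals]

def select_actions_sketch(q_pi, policies, num_controls):
    """Deterministic selection: the most probable action for each control factor."""
    marginals = action_marginals_sketch(q_pi, policies, num_controls)
    return np.array([int(np.argmax(m)) for m in marginals])

policies_demo = [np.array([[0, 0]]), np.array([[0, 1]]), np.array([[1, 0]])]
q_pi_demo = np.array([0.5, 0.3, 0.2])
marginals_demo = action_marginals_sketch(q_pi_demo, policies_demo, [2, 2])
actions_demo = select_actions_sketch(q_pi_demo, policies_demo, [2, 2])
print(marginals_demo, actions_demo)
```

Note that the selected combination of actions need not coincide with any single policy, because each factor is marginalised independently.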
- pymdp.control.sample_policy(q_pi, policies, num_controls, action_selection='deterministic', alpha=16.0)
Samples a policy from the posterior over policies, taking the action (per control factor) entailed by the first timestep of the selected policy.
- Parameters
q_pi (1D numpy.ndarray) – Posterior beliefs over policies, i.e. a vector containing one posterior probability per policy.
policies (list of 2D numpy.ndarray) – list that stores each policy as a 2D array in policies[p_idx]. Shape of policies[p_idx] is (num_timesteps, num_factors), where num_timesteps is the temporal depth of the policy and num_factors is the number of control factors.
num_controls (list of int) – list of the dimensionalities of each control state factor.
action_selection (str, default “deterministic”) – String indicating whether the selected policy is chosen as the maximum of the posterior over policies, or whether it’s sampled from the posterior over policies.
alpha (float, default 16.0) – Action selection precision – the inverse temperature of the softmax that is used to scale the policy posterior before sampling. This is only used if the action_selection argument is “stochastic”
- Returns
selected_policy – Vector containing the indices of the actions for each control factor
- Return type
1D numpy.ndarray
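The two selection modes can be sketched as follows. This is an illustrative approximation, not the library code: `sample_policy_sketch` is a hypothetical name, the `num_controls` argument is omitted because the sketch does not need it, the `rng` argument and the epsilon inside the logarithm are assumptions, and ties in the deterministic branch go to the first index.

```python
import numpy as np

def sample_policy_sketch(q_pi, policies, action_selection="deterministic",
                         alpha=16.0, rng=None):
    """Pick a policy and return the actions of its first timestep."""
    q_pi = np.asarray(q_pi, dtype=float)
    if action_selection == "deterministic":
        idx = int(np.argmax(q_pi))
    elif action_selection == "stochastic":
        rng = np.random.default_rng() if rng is None else rng
        # Sharpen (alpha > 1) or flatten (alpha < 1) the posterior before sampling
        logits = alpha * np.log(q_pi + 1e-16)
        p = np.exp(logits - logits.max())
        p /= p.sum()
        idx = int(rng.choice(len(q_pi), p=p))
    else:
        raise ValueError(f"unknown action_selection: {action_selection!r}")
    return np.asarray(policies[idx][0])

two_policies = [np.array([[0, 1], [1, 1]]), np.array([[1, 0], [0, 0]])]
chosen = sample_policy_sketch([0.2, 0.8], two_policies)
sampled = sample_policy_sketch([0.2, 0.8], two_policies, "stochastic",
                               alpha=1.0, rng=np.random.default_rng(0))
print(chosen, sampled.shape)
```

In contrast to marginalising over actions, the returned action vector here always comes from one coherent policy.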
- pymdp.control.select_highest(options_array)
Selects the highest value among the provided ones. If the highest value occurs more than once (entries closer than 1e-5 to one another), a random choice is made among them.
- Parameters
options_array (numpy.ndarray) – The array to examine
- Returns
The highest value in the given list
- pymdp.control.update_posterior_policies(qs, A, B, C, policies, use_utility=True, use_states_info_gain=True, use_param_info_gain=False, pA=None, pB=None, E=None, gamma=16.0)
Update posterior beliefs about policies by computing expected free energy of each policy and integrating that with the prior over policies E. This is intended to be used in conjunction with the update_posterior_states method of the inference module, since only the posterior about the hidden states at the current timestep qs is assumed to be provided, unconditional on policies. The predictive posterior over hidden states under all policies Q(s, pi) is computed using the starting posterior about states at the current timestep qs and the generative model (e.g. A, B, C).
- Parameters
qs (numpy.ndarray of dtype object) – Marginal posterior beliefs over hidden states at current timepoint (unconditioned on policies)
A (numpy.ndarray of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each element A[m] stores a multidimensional numpy.ndarray for observation modality m, whose entries A[m][i, j, k, ...] store the probability of observation level i given hidden state levels j, k, ...
B (numpy.ndarray of dtype object) – Dynamics likelihood mapping or ‘transition model’, mapping from hidden states at t to hidden states at t+1, given some control state u. Each element B[f] of this object array stores a 3-D tensor for hidden state factor f, whose entries B[f][s, v, u] store the probability of hidden state level s at the current time, given hidden state level v and action u at the previous time.
C (numpy.ndarray of dtype object) – Prior over observations or ‘prior preferences’, storing the “value” of each outcome in terms of relative log probabilities. This is softmaxed to form a proper probability distribution before being used to compute the expected utility term of the expected free energy.
policies (list of 2D numpy.ndarray) – list that stores each policy in policies[p_idx]. Shape of policies[p_idx] is (num_timesteps, num_factors), where num_timesteps is the temporal depth of the policy and num_factors is the number of control factors.
use_utility (bool, default True) – Boolean flag that determines whether expected utility should be incorporated into computation of EFE.
use_states_info_gain (bool, default True) – Boolean flag that determines whether state epistemic value (info gain about hidden states) should be incorporated into computation of EFE.
use_param_info_gain (bool, default False) – Boolean flag that determines whether parameter epistemic value (info gain about generative model parameters) should be incorporated into computation of EFE.
pA (numpy.ndarray of dtype object, optional) – Dirichlet parameters over observation model (same shape as A)
pB (numpy.ndarray of dtype object, optional) – Dirichlet parameters over transition model (same shape as B)
E (1D numpy.ndarray, optional) – Vector of prior probabilities of each policy (what’s referred to in the active inference literature as “habits”)
gamma (float, default 16.0) – Prior precision over policies, scales the contribution of the expected free energy to the posterior over policies
- Returns
q_pi (1D numpy.ndarray) – Posterior beliefs over policies, i.e. a vector containing one posterior probability per policy.
G (1D numpy.ndarray) – Negative expected free energies of each policy, i.e. a vector containing one negative expected free energy per policy.
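The final step, combining the negative expected free energies G with the policy prior E and the precision gamma, can be sketched as a softmax over `gamma * G + ln E`. This is an illustration of that combination rule only (it does not compute G); the function name and the epsilon inside the logarithm are assumptions.

```python
import numpy as np

def policy_posterior_sketch(G, E=None, gamma=16.0):
    """Posterior over policies from negative expected free energies G and prior E."""
    G = np.asarray(G, dtype=float)
    if E is None:
        E = np.ones(len(G)) / len(G)  # flat prior over policies
    logits = gamma * G + np.log(np.asarray(E, dtype=float) + 1e-16)
    logits = logits - logits.max()  # subtract the maximum for numerical stability
    q_pi = np.exp(logits)
    return q_pi / q_pi.sum()

G_demo = np.array([-1.0, -1.5])  # policy 0 has the lower expected free energy
q_sharp = policy_posterior_sketch(G_demo, gamma=16.0)
q_habit = policy_posterior_sketch(G_demo, E=np.array([0.25, 0.75]), gamma=0.0)
print(q_sharp, q_habit)
```

With a large gamma the posterior concentrates on the policy with the highest G, while gamma equal to zero makes the posterior fall back to the habit prior E.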
- pymdp.control.update_posterior_policies_full(qs_seq_pi, A, B, C, policies, use_utility=True, use_states_info_gain=True, use_param_info_gain=False, prior=None, pA=None, pB=None, F=None, E=None, gamma=16.0)
Update posterior beliefs about policies by computing expected free energy of each policy and integrating that with the variational free energy of policies F and prior over policies E. This is intended to be used in conjunction with the update_posterior_states_full method of inference.py, since the full posterior over future timesteps, under all policies, is assumed to be provided in the input array qs_seq_pi.
- Parameters
qs_seq_pi (numpy.ndarray of dtype object) – Posterior beliefs over hidden states for each policy. Nesting structure is policies, timepoints, factors, where e.g. qs_seq_pi[p][t][f] stores the marginal belief about factor f at timepoint t under policy p.
A (numpy.ndarray of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each element A[m] stores a multidimensional numpy.ndarray for observation modality m, whose entries A[m][i, j, k, ...] store the probability of observation level i given hidden state levels j, k, ...
B (numpy.ndarray of dtype object) – Dynamics likelihood mapping or ‘transition model’, mapping from hidden states at t to hidden states at t+1, given some control state u. Each element B[f] of this object array stores a 3-D tensor for hidden state factor f, whose entries B[f][s, v, u] store the probability of hidden state level s at the current time, given hidden state level v and action u at the previous time.
C (numpy.ndarray of dtype object) – Prior over observations or ‘prior preferences’, storing the “value” of each outcome in terms of relative log probabilities. This is softmaxed to form a proper probability distribution before being used to compute the expected utility term of the expected free energy.
policies (list of 2D numpy.ndarray) – list that stores each policy in policies[p_idx]. Shape of policies[p_idx] is (num_timesteps, num_factors), where num_timesteps is the temporal depth of the policy and num_factors is the number of control factors.
use_utility (bool, default True) – Boolean flag that determines whether expected utility should be incorporated into computation of EFE.
use_states_info_gain (bool, default True) – Boolean flag that determines whether state epistemic value (info gain about hidden states) should be incorporated into computation of EFE.
use_param_info_gain (bool, default False) – Boolean flag that determines whether parameter epistemic value (info gain about generative model parameters) should be incorporated into computation of EFE.
prior (numpy.ndarray of dtype object, default None) – If provided, this is a numpy object array with one sub-array per hidden state factor, that stores the prior beliefs about initial states. If None, this defaults to a flat (uninformative) prior over hidden states.
pA (numpy.ndarray of dtype object, default None) – Dirichlet parameters over observation model (same shape as A)
pB (numpy.ndarray of dtype object, default None) – Dirichlet parameters over transition model (same shape as B)
F (1D numpy.ndarray, default None) – Vector of variational free energies for each policy
E (1D numpy.ndarray, default None) – Vector of prior probabilities of each policy (what’s referred to in the active inference literature as “habits”). If None, this defaults to a flat (uninformative) prior over policies.
gamma (float, default 16.0) – Prior precision over policies, scales the contribution of the expected free energy to the posterior over policies
- Returns
q_pi (1D numpy.ndarray) – Posterior beliefs over policies, i.e. a vector containing one posterior probability per policy.
G (1D numpy.ndarray) – Negative expected free energies of each policy, i.e. a vector containing one negative expected free energy per policy.