Control
The control.py module contains the functions for performing inference of policies (sequences of control states) in POMDP generative models, according to active inference.
- pymdp.control.backwards_induction(H, B, B_factor_list, threshold, depth)
Runs backwards induction of reaching a goal state H given a transition model B.
- Parameters
H (numpy.ndarray of dtype object) – Prior over states
B (numpy.ndarray of dtype object) – Dynamics likelihood mapping or ‘transition model’, mapping from hidden states at t to hidden states at t+1, given some control state u. Each element B[f] of this object array stores a 3-D tensor for hidden state factor f, whose entries B[f][s, v, u] store the probability of hidden state level s at the current time, given hidden state level v and action u at the previous time.
B_factor_list (list of list of int) – List of lists of hidden state factors each hidden state factor depends on. Each element B_factor_list[i] is a list of the factor indices that factor i’s dynamics depend on.
threshold (float) – The threshold for pruning transitions that are below a certain probability
depth (int) – The temporal depth of the backward induction
- Returns
I – For each state factor, contains a 2D numpy.ndarray whose element i,j yields the probability of reaching the goal state backwards from state j after i steps.
- Return type
numpy.ndarray of dtype object
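The idea behind backwards induction can be illustrated for a single hidden state factor with plain Python lists. This is a minimal sketch under stated assumptions, not the library's implementation: the helper name backward_reachability is hypothetical, and reachability is encoded as 0/1 indicators rather than probabilities.

```python
def backward_reachability(goal, B, threshold, depth):
    """Hypothetical helper: goal[s] marks goal states, B[s_next][s_prev][u] are transition probabilities."""
    num_states = len(goal)
    # allowed[j][k] is True if some action moves state j to state k
    # with probability above the pruning threshold.
    allowed = [
        [any(p > threshold for p in B[k][j]) for k in range(num_states)]
        for j in range(num_states)
    ]
    # Row 0 marks the goal states themselves.
    I = [[1.0 if g > 0 else 0.0 for g in goal]]
    for _ in range(1, depth):
        prev = I[-1]
        row = []
        for j in range(num_states):
            # State j reaches the goal within i steps if it already did within
            # i-1 steps, or if one allowed transition leads to such a state.
            reachable = prev[j] > 0 or any(
                allowed[j][k] and prev[k] > 0 for k in range(num_states)
            )
            row.append(1.0 if reachable else 0.0)
        I.append(row)
    return I


# Three-state chain with a single "move right" action: 0 -> 1 -> 2 -> 2.
B_chain = [
    [[0.0], [0.0], [0.0]],
    [[1.0], [0.0], [0.0]],
    [[0.0], [1.0], [1.0]],
]
I_chain = backward_reachability([0, 0, 1], B_chain, threshold=0.5, depth=3)
```

Row i of I_chain marks the states from which the goal (state 2) can be reached within i steps, so the set of marked states grows as i increases.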
- pymdp.control.calc_ambiguity_factorized(qs_pi, A, A_factor_list)
Computes the Ambiguity term.
- Parameters
qs_pi (list of numpy.ndarray of dtype object) – Predictive posterior beliefs over hidden states expected under the policy, where qs_pi[t] stores the beliefs about hidden states expected under the policy at time t
A (numpy.ndarray of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each element A[m] stores a multidimensional numpy.ndarray for observation modality m, whose entries A[m][i, j, k, ...] store the probability of observation level i given hidden state levels j, k, ...
A_factor_list (list of list of int) – List of lists, where A_factor_list[m] is a list of the hidden state factor indices that observation modality with the index m depends on
- Returns
ambiguity
- Return type
float
- pymdp.control.calc_expected_utility(qo_pi, C)
Computes the expected utility of a policy, using the observation distribution expected under that policy and a prior preference vector.
- Parameters
qo_pi (list of numpy.ndarray of dtype object) – Predictive posterior beliefs over observations expected under the policy, where qo_pi[t] stores the beliefs about observations expected under the policy at time t
C (numpy.ndarray of dtype object) – Prior over observations or ‘prior preferences’, storing the “value” of each outcome in terms of relative log probabilities. This is softmaxed to form a proper probability distribution before being used to compute the expected utility.
- Returns
expected_util – Utility (reward) expected under the policy in question
- Return type
float
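The expected utility described above can be sketched for a single observation modality. This is an illustrative stand-in using plain Python lists rather than the library's code: the names softmax and expected_utility are assumptions, and the preference vector is assumed to be the same at every timestep.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def expected_utility(qo_pi, C):
    """Sum over timesteps of the expected log preference under qo_pi[t]."""
    # C holds relative log probabilities; softmax turns it into a distribution.
    log_pref = [math.log(p) for p in softmax(C)]
    return sum(q * lp for qo in qo_pi for q, lp in zip(qo, log_pref))


# A policy expected to produce the preferred outcome scores higher.
good = expected_utility([[1.0, 0.0]], [2.0, 0.0])
bad = expected_utility([[0.0, 1.0]], [2.0, 0.0])
flat = expected_utility([[0.5, 0.5]], [0.0, 0.0])
```

With flat preferences every outcome has the same log probability, so the expected utility is the same for any predicted observation distribution.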
- pymdp.control.calc_inductive_cost(qs, qs_pi, I, epsilon=0.001)
Computes the inductive cost of a state.
- Parameters
qs (numpy.ndarray of dtype object) – Marginal posterior beliefs over hidden states at a given timepoint.
qs_pi (list of numpy.ndarray of dtype object) – Predictive posterior beliefs over hidden states expected under the policy, where qs_pi[t] stores the beliefs about states expected under the policy at time t
I (numpy.ndarray of dtype object) – For each state factor, contains a 2D numpy.ndarray whose element i,j yields the probability of reaching the goal state backwards from state j after i steps.
- Returns
inductive_cost – Cost of visiting this state using backwards induction under the policy in question
- Return type
float
- pymdp.control.calc_pA_info_gain(pA, qo_pi, qs_pi)
Compute expected Dirichlet information gain about parameters pA under a policy
- Parameters
pA (numpy.ndarray of dtype object) – Dirichlet parameters over observation model (same shape as A)
qo_pi (list of numpy.ndarray of dtype object) – Predictive posterior beliefs over observations expected under the policy, where qo_pi[t] stores the beliefs about observations expected under the policy at time t
qs_pi (list of numpy.ndarray of dtype object) – Predictive posterior beliefs over hidden states expected under the policy, where qs_pi[t] stores the beliefs about hidden states expected under the policy at time t
- Returns
infogain_pA – Surprise (about Dirichlet parameters) expected under the policy in question
- Return type
float
- pymdp.control.calc_pA_info_gain_factorized(pA, qo_pi, qs_pi, A_factor_list)
Compute expected Dirichlet information gain about parameters pA under a policy. In this version of the function, we assume that the observation model is factorized, i.e. that each observation modality depends on a subset of the hidden state factors.
- Parameters
pA (numpy.ndarray of dtype object) – Dirichlet parameters over observation model (same shape as A)
qo_pi (list of numpy.ndarray of dtype object) – Predictive posterior beliefs over observations expected under the policy, where qo_pi[t] stores the beliefs about observations expected under the policy at time t
qs_pi (list of numpy.ndarray of dtype object) – Predictive posterior beliefs over hidden states expected under the policy, where qs_pi[t] stores the beliefs about hidden states expected under the policy at time t
A_factor_list (list of list of int) – List of lists, where A_factor_list[m] is a list of the hidden state factor indices that observation modality with the index m depends on
- Returns
infogain_pA – Surprise (about Dirichlet parameters) expected under the policy in question
- Return type
float
- pymdp.control.calc_pB_info_gain(pB, qs_pi, qs_prev, policy)
Compute expected Dirichlet information gain about parameters pB under a given policy
- Parameters
pB (numpy.ndarray of dtype object) – Dirichlet parameters over transition model (same shape as B)
qs_pi (list of numpy.ndarray of dtype object) – Predictive posterior beliefs over hidden states expected under the policy, where qs_pi[t] stores the beliefs about hidden states expected under the policy at time t
qs_prev (numpy.ndarray of dtype object) – Posterior over hidden states at beginning of trajectory (before receiving observations)
policy (2D numpy.ndarray) – Array that stores actions entailed by a policy over time. Shape is (num_timesteps, num_factors), where num_timesteps is the temporal depth of the policy and num_factors is the number of control factors.
- Returns
infogain_pB – Surprise (about Dirichlet parameters) expected under the policy in question
- Return type
float
- pymdp.control.calc_pB_info_gain_interactions(pB, qs_pi, qs_prev, B_factor_list, policy)
Compute expected Dirichlet information gain about parameters pB under a given policy
- Parameters
pB (numpy.ndarray of dtype object) – Dirichlet parameters over transition model (same shape as B)
qs_pi (list of numpy.ndarray of dtype object) – Predictive posterior beliefs over hidden states expected under the policy, where qs_pi[t] stores the beliefs about hidden states expected under the policy at time t
qs_prev (numpy.ndarray of dtype object) – Posterior over hidden states at beginning of trajectory (before receiving observations)
B_factor_list (list of list of int) – List of lists, where B_factor_list[f] is a list of the hidden state factor indices that hidden state factor with the index f depends on
policy (2D numpy.ndarray) – Array that stores actions entailed by a policy over time. Shape is (num_timesteps, num_factors), where num_timesteps is the temporal depth of the policy and num_factors is the number of control factors.
- Returns
infogain_pB – Surprise (about Dirichlet parameters) expected under the policy in question
- Return type
float
- pymdp.control.calc_states_info_gain(A, qs_pi)
Computes the Bayesian surprise or information gain about states of a policy, using the observation model and the hidden state distribution expected under that policy.
- Parameters
A (numpy.ndarray of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each element A[m] stores a multidimensional numpy.ndarray for observation modality m, whose entries A[m][i, j, k, ...] store the probability of observation level i given hidden state levels j, k, ...
qs_pi (list of numpy.ndarray of dtype object) – Predictive posterior beliefs over hidden states expected under the policy, where qs_pi[t] stores the beliefs about hidden states expected under the policy at time t
- Returns
states_surprise – Bayesian surprise (about states) or salience expected under the policy in question
- Return type
float
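For one observation modality and one hidden state factor at a single timestep, the information gain about states can be written as the entropy of the predicted observations minus the expected entropy of the likelihood (the ambiguity). The sketch below illustrates that quantity with plain Python lists; the names entropy and states_info_gain are assumptions for this example, and it is not the library's implementation.

```python
import math


def entropy(p):
    """Shannon entropy in nats, skipping zero-probability entries."""
    return -sum(x * math.log(x) for x in p if x > 0)


def states_info_gain(A, qs):
    """A[o][s] = P(o | s); qs[s] = Q(s) expected under the policy."""
    num_obs, num_states = len(A), len(qs)
    # Predicted observation distribution Q(o) = sum_s P(o | s) Q(s).
    qo = [sum(A[o][s] * qs[s] for s in range(num_states)) for o in range(num_obs)]
    # Ambiguity: expected entropy of P(o | s) under Q(s).
    ambiguity = sum(
        qs[s] * entropy([A[o][s] for o in range(num_obs)])
        for s in range(num_states)
    )
    return entropy(qo) - ambiguity


# A perfectly informative likelihood yields log(2) nats for a uniform belief;
# a flat likelihood yields no information gain at all.
informative = states_info_gain([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5])
uninformative = states_info_gain([[0.5, 0.5], [0.5, 0.5]], [0.5, 0.5])
```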
- pymdp.control.calc_states_info_gain_factorized(A, qs_pi, A_factor_list)
Computes the Bayesian surprise or information gain about states of a policy, using the observation model and the hidden state distribution expected under that policy.
- Parameters
A (numpy.ndarray of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each element A[m] stores a multidimensional numpy.ndarray for observation modality m, whose entries A[m][i, j, k, ...] store the probability of observation level i given hidden state levels j, k, ...
qs_pi (list of numpy.ndarray of dtype object) – Predictive posterior beliefs over hidden states expected under the policy, where qs_pi[t] stores the beliefs about hidden states expected under the policy at time t
A_factor_list (list of list of int) – List of lists, where A_factor_list[m] is a list of the hidden state factor indices that observation modality with the index m depends on
- Returns
states_surprise – Bayesian surprise (about states) or salience expected under the policy in question
- Return type
float
- pymdp.control.construct_policies(num_states, num_controls=None, policy_len=1, control_fac_idx=None)
Generate a list of policies. The returned list policies stores one policy per entry. A particular policy (policies[i]) has shape (num_timesteps, num_factors), where num_timesteps is the temporal depth of the policy and num_factors is the number of control factors.
- Parameters
num_states (list of int) – list of the dimensionalities of each hidden state factor
num_controls (list of int, default None) – list of the dimensionalities of each control state factor. If None, it is automatically computed as the dimensionality of each hidden state factor that is controllable
policy_len (int, default 1) – temporal depth (“planning horizon”) of policies
control_fac_idx (list of int) – list of indices of the hidden state factors that are controllable (i.e. those state factors i where num_controls[i] > 1)
- Returns
policies – list that stores each policy as a 2D array in policies[p_idx]. Shape of policies[p_idx] is (num_timesteps, num_factors), where num_timesteps is the temporal depth of the policy and num_factors is the number of control factors.
- Return type
list of 2D numpy.ndarray
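One way to picture the policy space is as the Cartesian product of the available actions over control factors and timesteps. The sketch below enumerates that product with itertools and nested lists; enumerate_policies is a hypothetical helper for illustration, and it assumes every control factor's dimensionality is given explicitly.

```python
import itertools


def enumerate_policies(num_controls, policy_len=1):
    """Return every policy as a policy_len x num_factors nested list of action indices."""
    num_factors = len(num_controls)
    # One range of action indices per control factor, repeated for each timestep.
    per_step = [range(n) for n in num_controls]
    flat_policies = itertools.product(*(per_step * policy_len))
    return [
        [list(flat[t * num_factors:(t + 1) * num_factors]) for t in range(policy_len)]
        for flat in flat_policies
    ]


# Two control factors with 2 and 3 actions, planned over 2 timesteps:
# (2 * 3) ** 2 = 36 policies, each with shape (2, 2).
example_policies = enumerate_policies([2, 3], policy_len=2)
```

The number of policies grows exponentially with policy_len, which is why deeper planning horizons usually call for pruning.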
- pymdp.control.get_expected_obs(qs_pi, A)
Compute the expected observations under a policy, also known as the posterior predictive density over observations
- Parameters
qs_pi (list of numpy.ndarray of dtype object) – Predictive posterior beliefs over hidden states expected under the policy, where qs_pi[t] stores the beliefs about hidden states expected under the policy at time t
A (numpy.ndarray of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each element A[m] stores a multidimensional numpy.ndarray for observation modality m, whose entries A[m][i, j, k, ...] store the probability of observation level i given hidden state levels j, k, ...
- Returns
qo_pi – Predictive posterior beliefs over observations expected under the policy, where qo_pi[t] stores the beliefs about observations expected under the policy at time t
- Return type
list of numpy.ndarray of dtype object
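For a single modality that depends on a single hidden state factor, the expected observations reduce to a matrix-vector product between the likelihood and the predicted state beliefs at each timestep. A minimal sketch with plain lists follows; the name expected_obs is an assumption for this example.

```python
def expected_obs(qs_pi, A):
    """qs_pi[t][s] = Q(s) at time t; A[o][s] = P(o | s). Returns qo_pi[t][o]."""
    num_obs = len(A)
    return [
        [sum(A[o][s] * qs[s] for s in range(len(qs))) for o in range(num_obs)]
        for qs in qs_pi
    ]


A_example = [[0.9, 0.2],
             [0.1, 0.8]]
# Certain about state 0 at the first step, uncertain at the second.
qo_example = expected_obs([[1.0, 0.0], [0.5, 0.5]], A_example)
```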
- pymdp.control.get_expected_obs_factorized(qs_pi, A, A_factor_list)
Compute the expected observations under a policy, also known as the posterior predictive density over observations
- Parameters
qs_pi (list of numpy.ndarray of dtype object) – Predictive posterior beliefs over hidden states expected under the policy, where qs_pi[t] stores the beliefs about hidden states expected under the policy at time t
A (numpy.ndarray of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each element A[m] stores a multidimensional numpy.ndarray for observation modality m, whose entries A[m][i, j, k, ...] store the probability of observation level i given hidden state levels j, k, ...
A_factor_list (list of list of int) – List of lists of hidden state factor indices that each observation modality depends on. Each element A_factor_list[i] is a list of the factor indices that modality i’s observation model depends on.
- Returns
qo_pi – Predictive posterior beliefs over observations expected under the policy, where qo_pi[t] stores the beliefs about observations expected under the policy at time t
- Return type
list of numpy.ndarray of dtype object
- pymdp.control.get_expected_states(qs, B, policy)
Compute the expected states under a policy, also known as the posterior predictive density over states
- Parameters
qs (numpy.ndarray of dtype object) – Marginal posterior beliefs over hidden states at a given timepoint.
B (numpy.ndarray of dtype object) – Dynamics likelihood mapping or ‘transition model’, mapping from hidden states at t to hidden states at t+1, given some control state u. Each element B[f] of this object array stores a 3-D tensor for hidden state factor f, whose entries B[f][s, v, u] store the probability of hidden state level s at the current time, given hidden state level v and action u at the previous time.
policy (2D numpy.ndarray) – Array that stores actions entailed by a policy over time. Shape is (num_timesteps, num_factors), where num_timesteps is the temporal depth of the policy and num_factors is the number of control factors.
- Returns
qs_pi – Predictive posterior beliefs over hidden states expected under the policy, where qs_pi[t] stores the beliefs about hidden states expected under the policy at time t
- Return type
list of numpy.ndarray of dtype object
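For a single hidden state factor, the expected states under a policy can be obtained by repeatedly pushing the current belief through the transition matrix selected by each action. The sketch below shows that rollout with plain lists; expected_states is a hypothetical name, and the policy is given as a flat list of action indices for the one factor.

```python
def expected_states(qs, B, actions):
    """qs[s] = current belief; B[s_next][s_prev][u] = transition probabilities."""
    num_states = len(qs)
    current = list(qs)
    qs_pi = []
    for u in actions:
        # One-step prediction: Q(s') = sum_s P(s' | s, u) Q(s).
        current = [
            sum(B[s_next][s_prev][u] * current[s_prev] for s_prev in range(num_states))
            for s_next in range(num_states)
        ]
        qs_pi.append(current)
    return qs_pi


# Three-state chain with a single "move right" action: 0 -> 1 -> 2 -> 2.
B_right = [
    [[0.0], [0.0], [0.0]],
    [[1.0], [0.0], [0.0]],
    [[0.0], [1.0], [1.0]],
]
rollout = expected_states([1.0, 0.0, 0.0], B_right, actions=[0, 0])
```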
- pymdp.control.get_expected_states_interactions(qs, B, B_factor_list, policy)
Compute the expected states under a policy, also known as the posterior predictive density over states
- Parameters
qs (numpy.ndarray of dtype object) – Marginal posterior beliefs over hidden states at a given timepoint.
B (numpy.ndarray of dtype object) – Dynamics likelihood mapping or ‘transition model’, mapping from hidden states at t to hidden states at t+1, given some control state u. Each element B[f] of this object array stores a 3-D tensor for hidden state factor f, whose entries B[f][s, v, u] store the probability of hidden state level s at the current time, given hidden state level v and action u at the previous time.
B_factor_list (list of list of int) – List of lists of hidden state factors each hidden state factor depends on. Each element B_factor_list[i] is a list of the factor indices that factor i’s dynamics depend on.
policy (2D numpy.ndarray) – Array that stores actions entailed by a policy over time. Shape is (num_timesteps, num_factors), where num_timesteps is the temporal depth of the policy and num_factors is the number of control factors.
- Returns
qs_pi – Predictive posterior beliefs over hidden states expected under the policy, where qs_pi[t] stores the beliefs about hidden states expected under the policy at time t
- Return type
list of numpy.ndarray of dtype object
- pymdp.control.get_num_controls_from_policies(policies)
Calculates the list of dimensionalities of control factors (num_controls) from the list or array of policies. This assumes a policy space such that for each control factor, there is at least one policy that entails taking the action with the maximum index along that control factor.
- Parameters
policies (list of 2D numpy.ndarray) – list that stores each policy as a 2D array in policies[p_idx]. Shape of policies[p_idx] is (num_timesteps, num_factors), where num_timesteps is the temporal depth of the policy and num_factors is the number of control factors.
- Returns
num_controls – list of the dimensionalities of each control state factor, computed here automatically from a list of policies.
- Return type
list of int
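Under the assumption stated above (the largest action index of every control factor appears in at least one policy), the dimensionalities can be recovered as the maximum index plus one. A minimal sketch with nested lists, using the hypothetical name num_controls_from_policies:

```python
def num_controls_from_policies(policies):
    """policies[p][t][f] is the action index for factor f at timestep t of policy p."""
    num_factors = len(policies[0][0])
    return [
        # Largest action index seen for this factor, plus one.
        max(step[f] for policy in policies for step in policy) + 1
        for f in range(num_factors)
    ]


inferred = num_controls_from_policies([[[0, 0]], [[1, 2]], [[0, 1]]])
```

If the highest action of some factor never appears in the policy list, this approach underestimates that factor's dimensionality.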
- pymdp.control.sample_action(q_pi, policies, num_controls, action_selection='deterministic', alpha=16.0)
Computes the marginal posterior over actions and then samples an action from it, one action per control factor.
- Parameters
q_pi (1D numpy.ndarray) – Posterior beliefs over policies, i.e. a vector containing one posterior probability per policy.
policies (list of 2D numpy.ndarray) – list that stores each policy as a 2D array in policies[p_idx]. Shape of policies[p_idx] is (num_timesteps, num_factors), where num_timesteps is the temporal depth of the policy and num_factors is the number of control factors.
num_controls (list of int) – list of the dimensionalities of each control state factor.
action_selection (str, default “deterministic”) – String indicating whether the selected action is chosen as the maximum of the posterior over actions, or whether it’s sampled from the posterior marginal over actions
alpha (float, default 16.0) – Action selection precision – the inverse temperature of the softmax that is used to scale the action marginals before sampling. This is only used if the action_selection argument is “stochastic”
- Returns
selected_policy – Vector containing the indices of the actions for each control factor
- Return type
1D numpy.ndarray
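The two action selection modes can be sketched as follows. The sketch assumes that the marginal over actions is accumulated from the first timestep of each policy and that stochastic selection samples from a softmax of the log marginals scaled by alpha. The helper names are hypothetical and plain lists stand in for arrays, so treat it as an illustration, not the library's code.

```python
import math
import random


def action_marginals(q_pi, policies, num_controls):
    """Sum policy probabilities onto the action each policy takes first, per factor."""
    marginals = [[0.0] * n for n in num_controls]
    for prob, policy in zip(q_pi, policies):
        for factor, action in enumerate(policy[0]):
            marginals[factor][action] += prob
    return [[m / sum(row) for m in row] for row in marginals]


def select_actions(q_pi, policies, num_controls,
                   action_selection="deterministic", alpha=16.0, rng=random):
    chosen = []
    for row in action_marginals(q_pi, policies, num_controls):
        if action_selection == "deterministic":
            # Highest marginal wins; ties resolve to the lowest index here.
            chosen.append(max(range(len(row)), key=row.__getitem__))
        else:
            # Softmax with inverse temperature alpha over the log marginals.
            logits = [alpha * math.log(p + 1e-16) for p in row]
            top = max(logits)
            weights = [math.exp(l - top) for l in logits]
            chosen.append(rng.choices(range(len(row)), weights=weights)[0])
    return chosen


toy_policies = [[[0]], [[1]]]
marg = action_marginals([0.2, 0.8], toy_policies, [2])
best = select_actions([0.2, 0.8], toy_policies, [2])
sampled = select_actions([0.2, 0.8], toy_policies, [2],
                         action_selection="stochastic", rng=random.Random(0))
```

A large alpha makes stochastic selection behave almost like the deterministic mode, while alpha close to zero approaches uniform sampling.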
- pymdp.control.sample_policy(q_pi, policies, num_controls, action_selection='deterministic', alpha=16.0)
Samples a policy from the posterior over policies, taking the action (per control factor) entailed by the first timestep of the selected policy.
- Parameters
q_pi (1D numpy.ndarray) – Posterior beliefs over policies, i.e. a vector containing one posterior probability per policy.
policies (list of 2D numpy.ndarray) – list that stores each policy as a 2D array in policies[p_idx]. Shape of policies[p_idx] is (num_timesteps, num_factors), where num_timesteps is the temporal depth of the policy and num_factors is the number of control factors.
num_controls (list of int) – list of the dimensionalities of each control state factor.
action_selection (str, default “deterministic”) – String indicating whether the selected policy is chosen as the maximum of the posterior over policies, or whether it’s sampled from the posterior over policies.
alpha (float, default 16.0) – Action selection precision – the inverse temperature of the softmax that is used to scale the policy posterior before sampling. This is only used if the action_selection argument is “stochastic”
- Returns
selected_policy – Vector containing the indices of the actions for each control factor
- Return type
1D numpy.ndarray
- pymdp.control.select_highest(options_array)
Selects the highest value among the provided ones. If the highest value occurs more than once (values closer than 1e-5 are treated as equal), a random choice is made among them.
- Parameters
options_array (numpy.ndarray) – The array to examine
- Returns
The highest value in the given list
- pymdp.control.sophisticated_inference_search(qs, policies, A, B, C, A_factor_list, B_factor_list, I=None, horizon=1, policy_prune_threshold=0.0625, state_prune_threshold=0.0625, prune_penalty=512, gamma=16, inference_params={'compute_vfe': False, 'dF': 1.0, 'dF_tol': 0.001, 'num_iter': 10}, n=0)
Performs sophisticated inference to find the optimal policy for a given generative model and prior preferences.
- Parameters
qs (numpy.ndarray of dtype object) – Marginal posterior beliefs over hidden states at a given timepoint.
policies (list of 1D numpy.ndarray) – list that stores each policy as a 1D array in policies[p_idx]. Shape of policies[p_idx] is (num_factors), where num_factors is the number of control factors.
A (numpy.ndarray of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each element A[m] stores a multidimensional numpy.ndarray for observation modality m, whose entries A[m][i, j, k, ...] store the probability of observation level i given hidden state levels j, k, ...
B (numpy.ndarray of dtype object) – Dynamics likelihood mapping or ‘transition model’, mapping from hidden states at t to hidden states at t+1, given some control state u. Each element B[f] of this object array stores a 3-D tensor for hidden state factor f, whose entries B[f][s, v, u] store the probability of hidden state level s at the current time, given hidden state level v and action u at the previous time.
C (numpy.ndarray of dtype object) – Prior over observations or ‘prior preferences’, storing the “value” of each outcome in terms of relative log probabilities. This is softmaxed to form a proper probability distribution before being used to compute the expected utility term of the expected free energy.
A_factor_list (list of list of int) – List of lists, where A_factor_list[m] is a list of the hidden state factor indices that observation modality with the index m depends on
B_factor_list (list of list of int) – List of lists of hidden state factors each hidden state factor depends on. Each element B_factor_list[i] is a list of the factor indices that factor i’s dynamics depend on.
I (numpy.ndarray of dtype object) – For each state factor, contains a 2D numpy.ndarray whose element i,j yields the probability of reaching the goal state backwards from state j after i steps.
horizon (int) – The temporal depth of the policy
policy_prune_threshold (float) – The threshold for pruning policies that are below a certain probability
state_prune_threshold (float) – The threshold for pruning states in the expectation that are below a certain probability
prune_penalty (float) – Penalty to add to the EFE when a policy is pruned
gamma (float, default 16.0) – Prior precision over policies, scales the contribution of the expected free energy to the posterior over policies
n (int) – timestep in the future we are calculating
- Returns
q_pi (1D numpy.ndarray) – Posterior beliefs over policies, i.e. a vector containing one posterior probability per policy.
G (1D numpy.ndarray) – Negative expected free energies of each policy, i.e. a vector containing one negative expected free energy per policy.
- pymdp.control.update_posterior_policies(qs, A, B, C, policies, use_utility=True, use_states_info_gain=True, use_param_info_gain=False, pA=None, pB=None, E=None, I=None, gamma=16.0)
Update posterior beliefs about policies by computing expected free energy of each policy and integrating that with the prior over policies E. This is intended to be used in conjunction with the update_posterior_states method of the inference module, since only the posterior about the hidden states at the current timestep qs is assumed to be provided, unconditional on policies. The predictive posterior over hidden states under all policies Q(s, pi) is computed using the starting posterior about states at the current timestep qs and the generative model (e.g. A, B, C)
- Parameters
qs (numpy.ndarray of dtype object) – Marginal posterior beliefs over hidden states at current timepoint (unconditioned on policies)
A (numpy.ndarray of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each element A[m] stores a multidimensional numpy.ndarray for observation modality m, whose entries A[m][i, j, k, ...] store the probability of observation level i given hidden state levels j, k, ...
B (numpy.ndarray of dtype object) – Dynamics likelihood mapping or ‘transition model’, mapping from hidden states at t to hidden states at t+1, given some control state u. Each element B[f] of this object array stores a 3-D tensor for hidden state factor f, whose entries B[f][s, v, u] store the probability of hidden state level s at the current time, given hidden state level v and action u at the previous time.
C (numpy.ndarray of dtype object) – Prior over observations or ‘prior preferences’, storing the “value” of each outcome in terms of relative log probabilities. This is softmaxed to form a proper probability distribution before being used to compute the expected utility term of the expected free energy.
policies (list of 2D numpy.ndarray) – list that stores each policy in policies[p_idx]. Shape of policies[p_idx] is (num_timesteps, num_factors), where num_timesteps is the temporal depth of the policy and num_factors is the number of control factors.
use_utility (bool, default True) – Boolean flag that determines whether expected utility should be incorporated into computation of EFE.
use_states_info_gain (bool, default True) – Boolean flag that determines whether state epistemic value (info gain about hidden states) should be incorporated into computation of EFE.
use_param_info_gain (bool, default False) – Boolean flag that determines whether parameter epistemic value (info gain about generative model parameters) should be incorporated into computation of EFE.
pA (numpy.ndarray of dtype object, optional) – Dirichlet parameters over observation model (same shape as A)
pB (numpy.ndarray of dtype object, optional) – Dirichlet parameters over transition model (same shape as B)
E (1D numpy.ndarray, optional) – Vector of prior probabilities of each policy (what’s referred to in the active inference literature as “habits”)
I (numpy.ndarray of dtype object) – For each state factor, contains a 2D numpy.ndarray whose element i,j yields the probability of reaching the goal state backwards from state j after i steps.
gamma (float, default 16.0) – Prior precision over policies, scales the contribution of the expected free energy to the posterior over policies
- Returns
q_pi (1D numpy.ndarray) – Posterior beliefs over policies, i.e. a vector containing one posterior probability per policy.
G (1D numpy.ndarray) – Negative expected free energies of each policy, i.e. a vector containing one negative expected free energy per policy.
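The final step, turning negative expected free energies into a posterior over policies, can be sketched as a softmax of gamma * G plus the log prior over policies. This is a minimal illustration under that assumption: the name policy_posterior is hypothetical, a flat prior is assumed when E is omitted, and the computation of G itself is left out.

```python
import math


def policy_posterior(G, E=None, gamma=16.0):
    """G[p] = negative expected free energy of policy p; E[p] = prior probability of policy p."""
    if E is None:
        E = [1.0 / len(G)] * len(G)  # flat prior over policies ("no habits")
    logits = [gamma * g + math.log(e + 1e-16) for g, e in zip(G, E)]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [x / total for x in exps]


equal = policy_posterior([-1.0, -1.0])
skewed = policy_posterior([0.0, -1.0], gamma=1.0)
habit = policy_posterior([-1.0, -1.0], E=[0.9, 0.1])
```

A higher gamma sharpens the posterior around the policy with the largest G, while a non-uniform E biases the posterior even when all policies have the same expected free energy.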
- pymdp.control.update_posterior_policies_factorized(qs, A, B, C, A_factor_list, B_factor_list, policies, use_utility=True, use_states_info_gain=True, use_param_info_gain=False, pA=None, pB=None, E=None, I=None, gamma=16.0)
Update posterior beliefs about policies by computing expected free energy of each policy and integrating that with the prior over policies E. This is intended to be used in conjunction with the update_posterior_states method of the inference module, since only the posterior about the hidden states at the current timestep qs is assumed to be provided, unconditional on policies. The predictive posterior over hidden states under all policies Q(s, pi) is computed using the starting posterior about states at the current timestep qs and the generative model (e.g. A, B, C)
- Parameters
qs (numpy.ndarray of dtype object) – Marginal posterior beliefs over hidden states at current timepoint (unconditioned on policies)
A (numpy.ndarray of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each element A[m] stores a multidimensional numpy.ndarray for observation modality m, whose entries A[m][i, j, k, ...] store the probability of observation level i given hidden state levels j, k, ...
B (numpy.ndarray of dtype object) – Dynamics likelihood mapping or ‘transition model’, mapping from hidden states at t to hidden states at t+1, given some control state u. Each element B[f] of this object array stores a 3-D tensor for hidden state factor f, whose entries B[f][s, v, u] store the probability of hidden state level s at the current time, given hidden state level v and action u at the previous time.
C (numpy.ndarray of dtype object) – Prior over observations or ‘prior preferences’, storing the “value” of each outcome in terms of relative log probabilities. This is softmaxed to form a proper probability distribution before being used to compute the expected utility term of the expected free energy.
A_factor_list (list of lists of int) – list that stores the indices of the hidden state factors that each observation modality depends on. For example, if A_factor_list[m] = [0, 1], then observation modality m depends on hidden state factors 0 and 1.
B_factor_list (list of lists of int) – list that stores the indices of the hidden state factors that each hidden state factor depends on. For example, if B_factor_list[f] = [0, 1], then the transitions in hidden state factor f depend on hidden state factors 0 and 1.
policies (list of 2D numpy.ndarray) – list that stores each policy in policies[p_idx]. Shape of policies[p_idx] is (num_timesteps, num_factors), where num_timesteps is the temporal depth of the policy and num_factors is the number of control factors.
use_utility (bool, default True) – Boolean flag that determines whether expected utility should be incorporated into computation of EFE.
use_states_info_gain (bool, default True) – Boolean flag that determines whether state epistemic value (info gain about hidden states) should be incorporated into computation of EFE.
use_param_info_gain (bool, default False) – Boolean flag that determines whether parameter epistemic value (info gain about generative model parameters) should be incorporated into computation of EFE.
pA (numpy.ndarray of dtype object, optional) – Dirichlet parameters over observation model (same shape as A)
pB (numpy.ndarray of dtype object, optional) – Dirichlet parameters over transition model (same shape as B)
E (1D numpy.ndarray, optional) – Vector of prior probabilities of each policy (what’s referred to in the active inference literature as “habits”)
I (numpy.ndarray of dtype object) – For each state factor, contains a 2D numpy.ndarray whose element i,j yields the probability of reaching the goal state backwards from state j after i steps.
gamma (float, default 16.0) – Prior precision over policies, scales the contribution of the expected free energy to the posterior over policies
- Returns
q_pi (1D numpy.ndarray) – Posterior beliefs over policies, i.e. a vector containing one posterior probability per policy.
G (1D numpy.ndarray) – Negative expected free energies of each policy, i.e. a vector containing one negative expected free energy per policy.
- pymdp.control.update_posterior_policies_full(qs_seq_pi, A, B, C, policies, use_utility=True, use_states_info_gain=True, use_param_info_gain=False, prior=None, pA=None, pB=None, F=None, E=None, I=None, gamma=16.0)
Update posterior beliefs about policies by computing expected free energy of each policy and integrating that with the variational free energy of policies
F
and prior over policiesE
. This is intended to be used in conjunction with theupdate_posterior_states_full
method ofinference.py
, since the full posterior over future timesteps, under all policies, is assumed to be provided in the input arrayqs_seq_pi
.- Parameters
qs_seq_pi (
numpy.ndarray
of dtype object) – Posterior beliefs over hidden states for each policy. Nesting structure is policies, timepoints, factors, where e.g.qs_seq_pi[p][t][f]
stores the marginal belief about factorf
at timepointt
under policyp
.A (
numpy.ndarray
of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each elementA[m]
of stores annumpy.ndarray
multidimensional array for observation modalitym
, whose entriesA[m][i, j, k, ...]
store the probability of observation leveli
given hidden state levelsj, k, ...
B (numpy.ndarray of dtype object) – Dynamics likelihood mapping or ‘transition model’, mapping from hidden states at t to hidden states at t+1, given some control state u. Each element B[f] of this object array stores a 3-D tensor for hidden state factor f, whose entries B[f][s, v, u] store the probability of hidden state level s at the current time, given hidden state level v and action u at the previous time.
C (numpy.ndarray of dtype object) – Prior over observations or ‘prior preferences’, storing the “value” of each outcome in terms of relative log probabilities. This is softmaxed to form a proper probability distribution before being used to compute the expected utility term of the expected free energy.
policies (list of 2D numpy.ndarray) – list that stores each policy in policies[p_idx]. Shape of policies[p_idx] is (num_timesteps, num_factors), where num_timesteps is the temporal depth of the policy and num_factors is the number of control factors.
use_utility (bool, default True) – Boolean flag that determines whether expected utility should be incorporated into computation of EFE.
use_states_info_gain (bool, default True) – Boolean flag that determines whether state epistemic value (info gain about hidden states) should be incorporated into computation of EFE.
use_param_info_gain (bool, default False) – Boolean flag that determines whether parameter epistemic value (info gain about generative model parameters) should be incorporated into computation of EFE.
prior (numpy.ndarray of dtype object, default None) – If provided, this is a numpy object array with one sub-array per hidden state factor, that stores the prior beliefs about initial states. If None, this defaults to a flat (uninformative) prior over hidden states.
pA (numpy.ndarray of dtype object, default None) – Dirichlet parameters over observation model (same shape as A)
pB (numpy.ndarray of dtype object, default None) – Dirichlet parameters over transition model (same shape as B)
F (1D numpy.ndarray, default None) – Vector of variational free energies for each policy
E (1D numpy.ndarray, default None) – Vector of prior probabilities of each policy (what’s referred to in the active inference literature as “habits”). If None, this defaults to a flat (uninformative) prior over policies.
I (numpy.ndarray of dtype object) – For each state factor, contains a 2D numpy.ndarray whose element i,j yields the probability of reaching the goal state backwards from state j after i steps.
gamma (float, default 16.0) – Prior precision over policies, scales the contribution of the expected free energy to the posterior over policies
- Returns
q_pi (1D numpy.ndarray) – Posterior beliefs over policies, i.e. a vector containing one posterior probability per policy.
G (1D numpy.ndarray) – Negative expected free energies of each policy, i.e. a vector containing one negative expected free energy per policy.
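The nesting of qs_seq_pi and the shape of each policy are easy to get wrong, so the following NumPy sketch builds placeholder inputs with the structure described above. It does not call pymdp. The dimensions, the number of timepoints, the values of G and F, and the final combination softmax(gamma * G - F + ln E) are all assumptions made for illustration.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1D array."""
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()

num_states = [3, 2]    # two hidden state factors (illustrative sizes)
num_controls = [3, 1]  # one control factor per hidden state factor
policy_len = 2         # temporal depth of each policy

# policies[p_idx] has shape (num_timesteps, num_factors)
policies = [
    np.array([[0, 0], [0, 0]]),
    np.array([[1, 0], [2, 0]]),
]

# qs_seq_pi[p][t][f]: marginal belief about factor f at timepoint t under policy p
num_timepoints = policy_len + 1  # assumed horizon for this illustration
qs_seq_pi = np.empty(len(policies), dtype=object)
for p in range(len(policies)):
    qs_seq_pi[p] = np.empty(num_timepoints, dtype=object)
    for t in range(num_timepoints):
        qs_t = np.empty(len(num_states), dtype=object)
        for f, ns in enumerate(num_states):
            qs_t[f] = np.ones(ns) / ns  # flat beliefs as placeholders
        qs_seq_pi[p][t] = qs_t

# Assumed combination of the three terms: softmax(gamma * G - F + ln E)
G = np.array([-2.0, -1.0])  # made-up negative expected free energies
F = np.array([0.5, 0.1])    # made-up variational free energies
E = np.array([0.5, 0.5])    # flat prior over policies ("habits")
gamma = 16.0
q_pi = softmax(gamma * G - F + np.log(E))
```

In this sketch the second policy has both the higher negative expected free energy and the lower variational free energy, so it receives nearly all of the posterior mass.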
- pymdp.control.update_posterior_policies_full_factorized(qs_seq_pi, A, B, C, A_factor_list, B_factor_list, policies, use_utility=True, use_states_info_gain=True, use_param_info_gain=False, prior=None, pA=None, pB=None, F=None, E=None, I=None, gamma=16.0)
Update posterior beliefs about policies by computing the expected free energy of each policy and integrating it with the variational free energy of policies F and the prior over policies E. This is intended to be used in conjunction with the update_posterior_states_full method of inference.py, since the full posterior over future timesteps, under all policies, is assumed to be provided in the input array qs_seq_pi.
- Parameters
qs_seq_pi (numpy.ndarray of dtype object) – Posterior beliefs over hidden states for each policy. Nesting structure is policies, timepoints, factors, where e.g. qs_seq_pi[p][t][f] stores the marginal belief about factor f at timepoint t under policy p.
A (numpy.ndarray of dtype object) – Sensory likelihood mapping or ‘observation model’, mapping from hidden states to observations. Each element A[m] stores a multidimensional numpy.ndarray for observation modality m, whose entries A[m][i, j, k, ...] store the probability of observation level i given hidden state levels j, k, ...
B (numpy.ndarray of dtype object) – Dynamics likelihood mapping or ‘transition model’, mapping from hidden states at t to hidden states at t+1, given some control state u. Each element B[f] of this object array stores a 3-D tensor for hidden state factor f, whose entries B[f][s, v, u] store the probability of hidden state level s at the current time, given hidden state level v and action u at the previous time.
C (numpy.ndarray of dtype object) – Prior over observations or ‘prior preferences’, storing the “value” of each outcome in terms of relative log probabilities. This is softmaxed to form a proper probability distribution before being used to compute the expected utility term of the expected free energy.
A_factor_list (list of lists of int) – list that stores the indices of the hidden state factors that each observation modality depends on. For example, if A_factor_list[m] = [0, 1], then observation modality m depends on hidden state factors 0 and 1.
B_factor_list (list of lists of int) – list that stores the indices of the hidden state factors that each hidden state factor depends on. For example, if B_factor_list[f] = [0, 1], then the transitions in hidden state factor f depend on hidden state factors 0 and 1.
policies (list of 2D numpy.ndarray) – list that stores each policy in policies[p_idx]. Shape of policies[p_idx] is (num_timesteps, num_factors), where num_timesteps is the temporal depth of the policy and num_factors is the number of control factors.
use_utility (bool, default True) – Boolean flag that determines whether expected utility should be incorporated into computation of EFE.
use_states_info_gain (bool, default True) – Boolean flag that determines whether state epistemic value (info gain about hidden states) should be incorporated into computation of EFE.
use_param_info_gain (bool, default False) – Boolean flag that determines whether parameter epistemic value (info gain about generative model parameters) should be incorporated into computation of EFE.
prior (numpy.ndarray of dtype object, default None) – If provided, this is a numpy object array with one sub-array per hidden state factor, that stores the prior beliefs about initial states. If None, this defaults to a flat (uninformative) prior over hidden states.
pA (numpy.ndarray of dtype object, default None) – Dirichlet parameters over observation model (same shape as A)
pB (numpy.ndarray of dtype object, default None) – Dirichlet parameters over transition model (same shape as B)
F (1D numpy.ndarray, default None) – Vector of variational free energies for each policy
E (1D numpy.ndarray, default None) – Vector of prior probabilities of each policy (what’s referred to in the active inference literature as “habits”). If None, this defaults to a flat (uninformative) prior over policies.
I (numpy.ndarray of dtype object) – For each state factor, contains a 2D numpy.ndarray whose element i,j yields the probability of reaching the goal state backwards from state j after i steps.
gamma (float, default 16.0) – Prior precision over policies, scales the contribution of the expected free energy to the posterior over policies
- Returns
q_pi (1D numpy.ndarray) – Posterior beliefs over policies, i.e. a vector containing one posterior probability per policy.
G (1D numpy.ndarray) – Negative expected free energies of each policy, i.e. a vector containing one negative expected free energy per policy.
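To make the role of A_factor_list and B_factor_list concrete, the sketch below builds randomly initialised A and B object arrays whose shapes follow the dependency lists. It uses only NumPy. The dimensions, the helper `normalized_random`, and the axis ordering for B (next state first, then the states of the factors it depends on, then the control state last) are assumptions for illustration rather than a statement of pymdp's internals.

```python
import numpy as np

num_states = [3, 2, 4]    # three hidden state factors (illustrative sizes)
num_obs = [5, 2]          # two observation modalities
num_controls = [2, 1, 1]  # one control factor per hidden state factor

# Modality 0 depends on factors 0 and 1; modality 1 depends on factor 2 only
A_factor_list = [[0, 1], [2]]
# Factor 1's transitions depend on factors 0 and 1; the others only on themselves
B_factor_list = [[0], [0, 1], [2]]

def normalized_random(shape, rng):
    """Hypothetical helper: random array whose leading axis sums to 1."""
    x = rng.random(tuple(shape))
    return x / x.sum(axis=0, keepdims=True)

rng = np.random.default_rng(0)

# A[m] has one leading axis for observations, then one axis per factor in A_factor_list[m]
A = np.empty(len(num_obs), dtype=object)
for m, no in enumerate(num_obs):
    shape = [no] + [num_states[f] for f in A_factor_list[m]]
    A[m] = normalized_random(shape, rng)

# B[f]: next-state axis, one axis per factor in B_factor_list[f], control axis last (assumed)
B = np.empty(len(num_states), dtype=object)
for f, ns in enumerate(num_states):
    shape = [ns] + [num_states[g] for g in B_factor_list[f]] + [num_controls[f]]
    B[f] = normalized_random(shape, rng)
```

With these lists, A[0] only needs 5 x 3 x 2 entries instead of 5 x 3 x 2 x 4, which is the saving that a factorized likelihood is meant to provide.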