pymdp.envs.rollout
Utilities for running active-inference loops against environment dynamics.
The two primary public entry points are:
- `infer_and_plan` for one-step inference, planning, and action selection
- `rollout` for multi-step scanned execution with optional online learning
infer_and_plan(agent: Agent, qs_prev: list[Array], observation: list[Array] | list[int], action_prev: Array | None = None, rng_key: Array | None = None, policy_search: Callable[[Agent, list[Array], Array], tuple[Array, dict[str, Array]]] | None = None, past_actions: Array | None = None, empirical_prior: list[Array] | None = None, learning_observations: list[Array] | Array | None = None, learning_actions: Array | None = None, learning_beliefs: list[Array] | None = None, valid_steps: int | Array | None = None) -> tuple[Agent, Array, list[Array], dict[str, Any]]
Run one active-inference step (state update, policy inference, action sample).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| agent | Agent | Active inference agent instance. | required |
| qs_prev | list[Array] | Previous posterior beliefs over hidden states. | required |
| observation | list[Array] \| list[int] | Current environment observation. | required |
| action_prev | Array \| None | Previous action. If | None |
| rng_key | Array \| None | PRNG key used by policy search and action sampling. | None |
| policy_search | callable \| None | Optional custom policy-search function. Defaults to expected-free-energy policy inference. | None |
| past_actions | Array \| None | Optional action history for sequence inference methods. | None |
| empirical_prior | list[Array] \| None | Optional override for the empirical prior. | None |
| learning_observations | list[Array] \| Array \| None | Optional learning observation buffer; defaults to the current observation. | None |
| learning_actions | Array \| None | Optional learning action buffer. | None |
| learning_beliefs | list[Array] \| None | Optional learning belief buffer for smoothing-based updates. | None |
| valid_steps | int \| Array \| None | Number of valid timesteps in padded fixed windows. | None |
Returns:
| Type | Description |
|---|---|
| tuple[Agent, Array, list[Array], dict[str, Any]] | |
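The three phases named above (state update, policy inference, action sample) can be illustrated with a small pure-Python sketch. This is not pymdp's implementation and does not call pymdp: `toy_step`, `predict`, and the matrices `A`, `B`, and `C` are hypothetical names for a single-factor, two-state model, and the expected-free-energy score is assumed here to be the common risk-plus-ambiguity form evaluated one step ahead.

```python
import math
import random


def normalize(v):
    total = sum(v)
    return [x / total for x in v]


def softmax(v):
    m = max(v)
    return normalize([math.exp(x - m) for x in v])


def predict(B_a, qs):
    # B_a[i][j] = p(next state i | current state j) under one action.
    return [sum(B_a[i][j] * qs[j] for j in range(len(qs))) for i in range(len(B_a))]


def toy_step(A, B, C, qs_prev, obs, action_prev, rng):
    n_obs = len(A)
    # 1. State update: propagate the old belief through the previous action
    #    (if any), then weight it by the likelihood of the new observation.
    prior = qs_prev if action_prev is None else predict(B[action_prev], qs_prev)
    qs = normalize([A[obs][s] * prior[s] for s in range(len(prior))])

    # 2. Policy inference over one-step policies (one per action).
    G = []
    for B_a in B:
        qs_next = predict(B_a, qs)
        qo = [sum(A[o][s] * qs_next[s] for s in range(len(qs_next))) for o in range(n_obs)]
        # Risk: KL divergence from predicted to preferred observations.
        risk = sum(q * math.log(q / c) for q, c in zip(qo, C) if q > 0)
        # Ambiguity: expected entropy of the likelihood under predicted states.
        ambiguity = 0.0
        for s, p_s in enumerate(qs_next):
            entropy = -sum(A[o][s] * math.log(A[o][s]) for o in range(n_obs) if A[o][s] > 0)
            ambiguity += p_s * entropy
        G.append(risk + ambiguity)
    q_pi = softmax([-g for g in G])

    # 3. Action sampling from the policy posterior.
    action = rng.choices(range(len(B)), weights=q_pi)[0]
    return action, qs, {"q_pi": q_pi, "G": G}


A = [[0.9, 0.1], [0.1, 0.9]]      # A[o][s] = p(o | s)
B = [[[1.0, 0.0], [0.0, 1.0]],    # action 0: stay
     [[0.0, 1.0], [1.0, 0.0]]]    # action 1: swap the two states
C = [0.8, 0.2]                    # preferred observation distribution
action, qs, info = toy_step(A, B, C, [0.5, 0.5], obs=1, action_prev=None,
                            rng=random.Random(0))
```

In this toy setup, observing outcome 1 shifts the belief toward state 1, and the swap action receives more posterior mass because it is predicted to lead to the preferred observation 0.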
rollout(agent: Agent, env: Env, num_timesteps: int, rng_key: Array, initial_carry: dict[str, Any] | None = None, policy_search: Callable[[Agent, list[Array], Array], tuple[Array, dict[str, Array]]] | None = None, env_params: Any = None) -> tuple[dict[str, Any], dict[str, Any]]
Roll out an active-inference agent/environment loop for num_timesteps.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| agent | Agent | Active inference agent. | required |
| env | Env | Environment implementing | required |
| num_timesteps | int | Number of timesteps to simulate. | required |
| rng_key | Array | Root PRNG key; internally split per-step and per-batch. | required |
| initial_carry | dict \| None | Optional carry overrides for warm-starting from existing state. | None |
| policy_search | callable \| None | Optional custom policy-search routine. | None |
| env_params | pytree \| None | Optional batched environment parameters. | None |
Returns:
| Name | Type | Description |
|---|---|---|
| last | dict | Final carry state after the final timestep. |
| info | dict | Time-indexed rollout traces (actions, observations, beliefs, etc.). |
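
The shape of the return value, a final carry plus time-indexed traces, can be sketched in plain Python. This is a stand-in under stated assumptions, not pymdp's code: `toy_rollout`, `ring_env_step`, `random_policy`, and the carry keys are hypothetical, the standard-library `random` module replaces JAX PRNG keys, and an ordinary loop replaces the scanned execution described above.

```python
import random


def ring_env_step(state, action):
    # Hypothetical environment: four positions on a ring, fully observed.
    next_state = (state + action) % 4
    observation = next_state
    return observation, next_state


def random_policy(carry, rng):
    # Hypothetical stand-in for inference and planning: 0 = stay, 1 = move.
    return rng.randrange(2)


def toy_rollout(select_action, env_step, num_timesteps, seed, initial_carry=None):
    # Default carry, with optional overrides for warm-starting.
    carry = {"state": 0, "observation": 0, "t": 0}
    carry.update(initial_carry or {})
    root = random.Random(seed)
    info = {"action": [], "observation": []}
    for _ in range(num_timesteps):
        # Derive a fresh per-step generator from the root generator.
        step_rng = random.Random(root.getrandbits(64))
        action = select_action(carry, step_rng)
        observation, state = env_step(carry["state"], action)
        carry = {"state": state, "observation": observation, "t": carry["t"] + 1}
        # Append this step's outputs to the time-indexed traces.
        info["action"].append(action)
        info["observation"].append(observation)
    return carry, info


last, info = toy_rollout(random_policy, ring_env_step, num_timesteps=5, seed=0)
# Warm-start a second rollout from the carry returned by the first.
resumed, more = toy_rollout(random_policy, ring_env_step, 3, seed=1, initial_carry=last)
```

Passing the returned carry back in as `initial_carry` mirrors the warm-start use case in the parameter table: the second call continues from the state where the first one stopped.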