pymdp.envs.rollout

Utilities for running active-inference loops against environment dynamics.

The two primary public entry points are:

- :func:infer_and_plan for one-step inference, planning, and action selection
- :func:rollout for multi-step scanned execution with optional online learning

infer_and_plan(agent: Agent, qs_prev: list[Array], observation: list[Array] | list[int], action_prev: Array | None = None, rng_key: Array | None = None, policy_search: Callable[[Agent, list[Array], Array], tuple[Array, dict[str, Array]]] | None = None, past_actions: Array | None = None, empirical_prior: list[Array] | None = None, learning_observations: list[Array] | Array | None = None, learning_actions: Array | None = None, learning_beliefs: list[Array] | None = None, valid_steps: int | Array | None = None) -> tuple[Agent, Array, list[Array], dict[str, Any]]

Run one active-inference step (state update, policy inference, action sample).

Parameters:

- agent (Agent, required): Active inference agent instance.
- qs_prev (list[Array], required): Previous posterior beliefs over hidden states.
- observation (list[Array] | list[int], required): Current environment observation.
- action_prev (Array | None, default None): Previous action. If None, agent.D is used as the empirical prior.
- rng_key (Array | None, default None): PRNG key used by policy search and action sampling.
- policy_search (callable | None, default None): Optional custom policy-search function. Defaults to expected-free-energy policy inference.
- past_actions (Array | None, default None): Optional action history for sequence inference methods.
- empirical_prior (list[Array] | None, default None): Optional override for the empirical prior.
- learning_observations (list[Array] | Array | None, default None): Optional learning observation buffer; defaults to the current observation.
- learning_actions (Array | None, default None): Optional learning action buffer.
- learning_beliefs (list[Array] | None, default None): Optional learning belief buffer for smoothing-based updates.
- valid_steps (int | Array | None, default None): Number of valid timesteps in padded fixed windows.
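
The valid_steps parameter refers to fixed-length windows that are padded when fewer real timesteps are available. The following is a minimal, library-independent sketch of that idea, not pymdp's implementation: the helper name pad_to_window, the choice to pad on the right, and the pad value are all assumptions made for illustration.

```python
def pad_to_window(history, window, pad_value=0):
    """Keep the most recent `window` entries and pad to a fixed length.

    Hypothetical helper: right-padding and the pad value are assumptions,
    not behavior documented for pymdp.
    """
    recent = list(history[-window:])
    valid_steps = len(recent)  # number of real (unpadded) entries
    padded = recent + [pad_value] * (window - valid_steps)
    # Boolean mask marking which slots hold real data.
    mask = [i < valid_steps for i in range(window)]
    return padded, valid_steps, mask


# Two real steps in a window of four: two slots are padding.
padded, valid_steps, mask = pad_to_window([3, 1], window=4)
```

A consumer of such a buffer would use valid_steps (or the equivalent mask) to ignore the padded slots, which keeps array shapes constant across calls.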

Returns:

- tuple: (updated_agent, action, qs, info), where info contains the policy posterior and additional policy-search diagnostics.
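
As a rough sketch of how the four return values might feed the next call, the loop below passes qs back in as qs_prev and action back in as action_prev. The helper run_steps and its step_fn argument are hypothetical and exist only so the loop can be exercised without pymdp; whether qs can be reused unchanged as qs_prev should be checked against the installed version.

```python
def run_steps(agent, qs_init, observations, rng_keys, step_fn=None):
    """Chain single steps, carrying beliefs and the last action forward.

    Hypothetical helper. `rng_keys` holds one PRNG key per observation.
    """
    if step_fn is None:
        # Imported lazily so the sketch loads without pymdp installed.
        from pymdp.envs.rollout import infer_and_plan as step_fn

    qs, action, actions = qs_init, None, []
    for observation, key in zip(observations, rng_keys):
        # On the first call action_prev is None, so per the parameter
        # description agent.D serves as the empirical prior.
        agent, action, qs, info = step_fn(
            agent, qs, observation, action_prev=action, rng_key=key
        )
        actions.append(action)
    return agent, qs, actions
```

For longer horizons, rollout covers the same loop as a scanned execution and is likely the better fit.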

rollout(agent: Agent, env: Env, num_timesteps: int, rng_key: Array, initial_carry: dict[str, Any] | None = None, policy_search: Callable[[Agent, list[Array], Array], tuple[Array, dict[str, Array]]] | None = None, env_params: Any = None) -> tuple[dict[str, Any], dict[str, Any]]

Roll out an active-inference agent/environment loop for num_timesteps.

Parameters:

- agent (Agent, required): Active inference agent.
- env (Env, required): Environment implementing reset and step.
- num_timesteps (int, required): Number of timesteps to simulate.
- rng_key (Array, required): Root PRNG key; internally split per-step and per-batch.
- initial_carry (dict | None, default None): Optional carry overrides for warm-starting from existing state.
- policy_search (callable | None, default None): Optional custom policy-search routine.
- env_params (pytree | None, default None): Optional batched environment parameters.
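
Both signatures type policy_search as a callable taking (agent, qs, rng_key) and returning (policy posterior, info dict). The sketch below shows a deliberately trivial callable of that shape, returning a uniform posterior. The factory name, the leading batch axis, and the info key are assumptions; NumPy is used to keep the example light, and jax.numpy offers the same full and asarray calls.

```python
import numpy as np


def make_uniform_policy_search(num_policies, batch_size=1):
    """Build a policy-search callable that returns a flat policy posterior."""

    def uniform_policy_search(agent, qs, rng_key):
        # agent, qs, and rng_key are accepted to match the documented
        # signature but are unused by this trivial search.
        q_pi = np.full((batch_size, num_policies), 1.0 / num_policies)
        # Hypothetical diagnostic: the keys pymdp expects in `info`
        # are not specified in this section.
        info = {"num_policies": np.asarray(num_policies)}
        return q_pi, info

    return uniform_policy_search


search = make_uniform_policy_search(num_policies=4, batch_size=2)
q_pi, info = search(None, [], None)
```

A callable like this would be passed as policy_search=search; a useful custom search would instead score policies from the agent and the beliefs qs.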

Returns:

- last (dict): Final carry state after the final timestep.
- info (dict): Time-indexed rollout traces (actions, observations, beliefs, etc.).
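
The (last, info) pair follows the usual scan pattern: a carry is threaded through every step, and per-step outputs are stacked along time. The pure-Python sketch below illustrates that pattern with toy dynamics. It is not pymdp's implementation; scan_loop, toy_step, and the carry keys are hypothetical.

```python
def scan_loop(step_fn, carry, num_timesteps):
    """Thread `carry` through `step_fn` and collect per-step outputs."""
    outputs = []
    for t in range(num_timesteps):
        carry, out = step_fn(carry, t)
        outputs.append(out)
    # Turn the list of per-step dicts into one dict of time-indexed lists.
    info = {k: [o[k] for o in outputs] for k in outputs[0]} if outputs else {}
    return carry, info


def toy_step(carry, t):
    # Toy stand-in for one agent/environment step: the action flips the
    # state, and the observation simply echoes the action.
    action = 1 - carry["state"]
    observation = action
    new_carry = {"state": observation, "t": t + 1}
    return new_carry, {"action": action, "observation": observation}


last, info = scan_loop(toy_step, {"state": 0, "t": 0}, num_timesteps=4)
```

Here last holds only the state after the final step, while each entry of info has one element per timestep.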