en.wikipedia.org/wiki/AIXI
2 corrections found
The AIXI agent is associated with a stochastic policy π : (A × E)* → A
This mixes up deterministic and stochastic policies. A map from histories directly to actions, π:(A×E)*→A, is deterministic, not stochastic.
Full reasoning
In the AIXI literature, a deterministic policy maps a history directly to an action. That is exactly the type signature shown here: π : (A × E)* → A.
Primary sources describe it this way:
- Veness et al. state: "Formally, a policy is a function that maps a history to an action" and then write π : (A × X)* → A.
- An official AIXI overview on Hutter's site distinguishes the two cases explicitly: in general a stochastic policy is a distribution over actions, π : (A × E)* → ΔA, while the Bayes-optimal/AIXI-style policy is deterministic.
So the article's sentence is internally inconsistent: it calls π a stochastic policy while simultaneously giving the type of a deterministic policy.
2 sources
- A Monte-Carlo AIXI Approximation
Formally, a policy is a function that maps a history to an action ... the m-horizon expected future reward of an agent acting under policy π:(A×X)*→A ...
- AIXIjs: General reinforcement learning in the browser
We identify an agent with its policy, which in general is a distribution over actions π(a_t|ae_{<t}), π:(A×E)*→ΔA ... AIξ ... it is a deterministic policy.
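To make the type-signature distinction concrete, here is a minimal Python sketch. The alphabets, aliases, and toy policies are illustrative inventions, not part of any AIXI source; the point is only that a deterministic policy returns a single action while a stochastic one returns a distribution over actions.

```python
from typing import Callable, Dict, Tuple

# Hypothetical finite action/percept alphabets, for illustration only.
Action = str
Percept = str
History = Tuple[Tuple[Action, Percept], ...]  # an element of (A x E)*

# Deterministic policy, pi : (A x E)* -> A
DeterministicPolicy = Callable[[History], Action]

# Stochastic policy, pi : (A x E)* -> Delta(A)
StochasticPolicy = Callable[[History], Dict[Action, float]]

def greedy(history: History) -> Action:
    # Trivial deterministic policy: ignores the history, always acts "a0".
    return "a0"

def uniform(history: History) -> Dict[Action, float]:
    # Trivial stochastic policy: uniform distribution over two actions.
    return {"a0": 0.5, "a1": 0.5}
```

The article's sentence pairs the second label ("stochastic") with the first signature (history → action), which is the mismatch being flagged.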
μ : (A × E)* × A → E
This gives the wrong mathematical type for the environment. In AIXI, the environment maps histories and actions to a distribution over percepts, not directly to a percept.
Full reasoning
This type signature is incorrect for a stochastic environment.
Earlier in the article, the environment is correctly described as a probability distribution over percepts conditioned on the history and next action. For that reason, its codomain cannot be E itself; it must be a space of probability measures over E.
Primary sources define the environment that way:
- The official AIXI overview on Hutter's site writes the environment as a distribution over percepts with type ν : (A × E)* × A → ΔE.
- Veness et al. define an environment as a sequence of conditional probability functions and call it a probability distribution over possible observation-reward sequences conditioned on actions.
So writing μ : (A × E)* × A → E is mathematically wrong for the stochastic AIXI setting. That type would describe a deterministic environment, not the probability distribution the text is discussing.
2 sources
- AIXIjs: General reinforcement learning in the browser
An environment is a distribution over percepts ν(e_t|ae_{<t}a_t) with ν:(A×E)*×A→ΔE.
- A Monte-Carlo AIXI Approximation
The following definition states that the environment takes the form of a probability distribution over possible observation-reward sequences conditioned on actions taken by the agent. Definition 2. An environment is a sequence of conditional probability functions ...
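The same point can be sketched in Python. The toy environment below is a made-up example, not drawn from any AIXI code: a stochastic environment maps a history and an action to a distribution over percepts (codomain ΔE), whereas the article's signature would have it return a single percept (codomain E).

```python
from typing import Callable, Dict, Tuple

# Hypothetical alphabets, for illustration only.
Action = str
Percept = str
History = Tuple[Tuple[Action, Percept], ...]  # an element of (A x E)*

# Stochastic environment, mu : (A x E)* x A -> Delta(E)
Environment = Callable[[History, Action], Dict[Percept, float]]

# The incorrect signature from the article, mu : (A x E)* x A -> E,
# would instead be Callable[[History, Action], Percept]: a single
# percept, i.e. a deterministic environment.

def coin_env(history: History, action: Action) -> Dict[Percept, float]:
    # Toy environment: the percept is a fair coin flip, regardless of
    # the history or the action taken.
    return {"heads": 0.5, "tails": 0.5}
```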