All corrections
Wikipedia April 24, 2026 at 06:43 AM

en.wikipedia.org/wiki/AIXI

2 corrections found

1
Claim
The AIXI agent is associated with a stochastic policy π : (A × E)* → A
Correction

This mixes up deterministic and stochastic policies. A map from histories directly to actions, π:(A×E)*→A, is deterministic, not stochastic.

Full reasoning

In the AIXI literature, a deterministic policy maps a history directly to an action. That is exactly the type signature shown here: π : (A × E)* → A.

Primary sources describe it this way:

  • Veness et al. state: "Formally, a policy is a function that maps a history to an action" and then write π:(A×X)*→A.
  • An official AIXI overview on Hutter's site distinguishes the two cases explicitly: in general a stochastic policy is a distribution over actions, π:(A×E)*→ΔA, while the Bayes-optimal/AIXI-style policy is deterministic.

So the article's sentence is internally inconsistent: it calls π a stochastic policy while simultaneously giving the type of a deterministic policy.
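To make the type distinction concrete, here is a minimal Python sketch of the two policy signatures, a deterministic policy (A × E)* → A versus a stochastic policy (A × E)* → ΔA. The action and percept sets, and the example policies `always_left` and `uniform`, are illustrative inventions, not from the sources.

```python
from typing import Callable, Dict, Tuple

Action = str
Percept = str
# An element of (A × E)*: a finite sequence of (action, percept) pairs.
History = Tuple[Tuple[Action, Percept], ...]

# Deterministic policy: maps a history directly to one action, π : (A × E)* → A.
DeterministicPolicy = Callable[[History], Action]

# Stochastic policy: maps a history to a distribution over actions, π : (A × E)* → ΔA.
StochasticPolicy = Callable[[History], Dict[Action, float]]

def always_left(h: History) -> Action:
    # Deterministic: exactly one action per history.
    return "left"

def uniform(h: History) -> Dict[Action, float]:
    # Stochastic: probabilities over actions, summing to 1.
    return {"left": 0.5, "right": 0.5}
```

The signature quoted in the article matches `DeterministicPolicy`, not `StochasticPolicy`, which is the inconsistency this correction identifies.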

2 sources
2
Claim
μ : (A × E)* × A → E
Correction

This gives the wrong mathematical type for the environment. In AIXI, the environment maps histories and actions to a distribution over percepts, not directly to a percept.

Full reasoning

This type signature is incorrect for a stochastic environment.

Earlier in the article, the environment is correctly described as a probability distribution over percepts conditioned on the history and next action. For that reason, its codomain cannot be E itself; it must be a space of probability measures over E.

Primary sources define the environment that way:

  • The official AIXI overview on Hutter's site writes the environment as a distribution over percepts with type ν:(A×E)*×A→ΔE.
  • Veness et al. define an environment as a sequence of conditional probability functions and call it a probability distribution over possible observation-reward sequences conditioned on actions.

So writing μ : (A × E)* × A → E is mathematically wrong for the stochastic AIXI setting. That type would describe a deterministic environment, not the probability distribution the text is discussing.
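The same contrast can be sketched in Python for the environment's type: the article's signature would mean a function returning a single percept, while the correct stochastic signature returns a distribution over percepts. The percept names and the example environment `noisy_env` are illustrative assumptions, not taken from the sources.

```python
from typing import Callable, Dict, Tuple

Action = str
Percept = str
History = Tuple[Tuple[Action, Percept], ...]

# The article's (incorrect for AIXI) type: μ : (A × E)* × A → E,
# i.e. a deterministic environment returning one percept.
DeterministicEnv = Callable[[History, Action], Percept]

# The correct stochastic type: ν : (A × E)* × A → ΔE,
# i.e. a distribution over percepts conditioned on history and action.
StochasticEnv = Callable[[History, Action], Dict[Percept, float]]

def noisy_env(h: History, a: Action) -> Dict[Percept, float]:
    # Probability of each percept given the history and the chosen action.
    if a == "left":
        return {"obs0": 0.9, "obs1": 0.1}
    return {"obs0": 0.1, "obs1": 0.9}
```

Only the `StochasticEnv` signature matches the "probability distribution over percepts" described earlier in the article.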

2 sources
  • AIXIjs: General reinforcement learning in the browser

    An environment is a distribution over percepts ν(e_t|ae_{<t}a_t) with ν:(A×E)*×A→ΔE.

  • A Monte-Carlo AIXI Approximation

    The following definition states that the environment takes the form of a probability distribution over possible observation-reward sequences conditioned on actions taken by the agent. Definition 2. An environment is a sequence of conditional probability functions ...

Model: OPENAI_GPT_5 Prompt: v1.16.0