en.wikipedia.org/wiki/AIXI
2 corrections found
The AIXI agent is associated with a stochastic policy π : (A × E)* → A
This mixes up deterministic and stochastic policies. A map from histories directly to actions, π:(A×E)*→A, is deterministic, not stochastic.
Full reasoning
In the AIXI literature, a deterministic policy maps a history directly to an action. That is exactly the type signature shown here: π : (A × E)* → A.
Primary sources describe it this way:
- Veness et al. state: "Formally, a policy is a function that maps a history to an action" and then write π : (A × X)* → A.
- An official AIXI overview on Hutter's site distinguishes the two cases explicitly: in general a stochastic policy is a distribution over actions, π : (A × E)* → ΔA, while the Bayes-optimal/AIXI-style policy is deterministic.
So the article's sentence is internally inconsistent: it calls π a stochastic policy while simultaneously giving the type of a deterministic policy.
2 sources
- A Monte-Carlo AIXI Approximation
Formally, a policy is a function that maps a history to an action ... the m-horizon expected future reward of an agent acting under policy π:(A×X)*→A ...
- AIXIjs: General reinforcement learning in the browser
We identify an agent with its policy, which in general is a distribution over actions π(a_t|ae_{<t}), π:(A×E)*→ΔA ... AIξ ... it is a deterministic policy.
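To make the type-signature distinction concrete, here is a minimal Python sketch. The alphabets, aliases, and toy policies are illustrative inventions, not part of any AIXI source; the point is only that a deterministic policy returns a single action while a stochastic one returns a distribution over actions.

```python
from typing import Callable, Dict, Tuple

# Hypothetical finite action/percept alphabets, for illustration only.
Action = str
Percept = str
History = Tuple[Tuple[Action, Percept], ...]  # an element of (A x E)*

# Deterministic policy, pi : (A x E)* -> A
DeterministicPolicy = Callable[[History], Action]

# Stochastic policy, pi : (A x E)* -> Delta(A)
StochasticPolicy = Callable[[History], Dict[Action, float]]

def greedy(history: History) -> Action:
    # Trivial deterministic policy: ignores the history, always acts "a0".
    return "a0"

def uniform(history: History) -> Dict[Action, float]:
    # Trivial stochastic policy: uniform distribution over two actions.
    return {"a0": 0.5, "a1": 0.5}
```

The article's sentence pairs the second label ("stochastic") with the first signature (history → action), which is the mismatch being flagged.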
μ : (A × E)* × A → E
This gives the wrong mathematical type for the environment. In AIXI, the environment maps histories and actions to a distribution over percepts, not directly to a percept.
Full reasoning
This type signature is incorrect for a stochastic environment.
Earlier in the article, the environment is correctly described as a probability distribution over percepts conditioned on the history and next action. For that reason, its codomain cannot be E itself; it must be a space of probability measures over E.
Primary sources define the environment that way:
- The official AIXI overview on Hutter's site writes the environment as a distribution over percepts with type ν : (A × E)* × A → ΔE.
- Veness et al. define an environment as a sequence of conditional probability functions and call it a probability distribution over possible observation-reward sequences conditioned on actions.
So writing μ : (A × E)* × A → E is mathematically wrong for the stochastic AIXI setting. That type would describe a deterministic environment, not the probability distribution the text is discussing.
2 sources
- AIXIjs: General reinforcement learning in the browser
An environment is a distribution over percepts ν(e_t|ae_{<t}a_t) with ν:(A×E)*×A→ΔE.
- A Monte-Carlo AIXI Approximation
The following definition states that the environment takes the form of a probability distribution over possible observation-reward sequences conditioned on actions taken by the agent. Definition 2. An environment is a sequence of conditional probability functions ...
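The same point can be sketched in Python. The toy environment below is a made-up example, not drawn from any AIXI code: a stochastic environment maps a history and an action to a distribution over percepts (codomain ΔE), whereas the article's signature would have it return a single percept (codomain E).

```python
from typing import Callable, Dict, Tuple

# Hypothetical alphabets, for illustration only.
Action = str
Percept = str
History = Tuple[Tuple[Action, Percept], ...]  # an element of (A x E)*

# Stochastic environment, mu : (A x E)* x A -> Delta(E)
Environment = Callable[[History, Action], Dict[Percept, float]]

# The incorrect signature from the article, mu : (A x E)* x A -> E,
# would instead be Callable[[History, Action], Percept]: a single
# percept, i.e. a deterministic environment.

def coin_env(history: History, action: Action) -> Dict[Percept, float]:
    # Toy environment: the percept is a fair coin flip, regardless of
    # the history or the action taken.
    return {"heads": 0.5, "tails": 0.5}
```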