www.lesswrong.com/posts/LXQBcztrWKhtcgQfJ/current-activation-oracles-are-hard-to...
1 correction found
Activations were taken at the 50% layer (layer 32/64 for 32B, layer 16/32 for 8B) based on a layer sweep.
Qwen3-8B is not a 32-layer model. Its official config lists 36 hidden layers, so describing the 8B model’s halfway point as “layer 16/32” is incorrect.
Full reasoning
The post says the 8B model's 50% layer is "layer 16/32," but the official Hugging Face config for Qwen/Qwen3-8B lists num_hidden_layers: 36, not 32.
So while the 32B model's "32/64" description matches the official config, the 8B model's denominator is wrong. If the experiments really used the 50%-depth layer, it would be the midpoint of a 36-layer model, i.e. layer 18 of 36 rather than layer 16 of 32 (a quick way to check this is sketched below).
This matters because the post presents these exact layer counts as implementation details tied to the layer sweep and the evaluation setup.
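For anyone who wants to verify, here is a minimal sketch that pulls both configs from the Hugging Face Hub and computes the halfway layer. It assumes "the 50% layer" means `num_hidden_layers // 2`, matching the post's "layer 32/64" convention for the 32B model; the post does not spell out its exact rounding rule.

```python
# Minimal sketch: read the official configs and compute the 50%-depth layer index.
# Assumes "50% layer" = num_hidden_layers // 2, as implied by "layer 32/64 for 32B".
from transformers import AutoConfig

for model_id in ["Qwen/Qwen3-8B", "Qwen/Qwen3-32B"]:
    cfg = AutoConfig.from_pretrained(model_id)
    n_layers = cfg.num_hidden_layers      # 36 for Qwen3-8B, 64 for Qwen3-32B
    mid_layer = n_layers // 2             # 18 for the 8B model, 32 for the 32B model
    print(f"{model_id}: {n_layers} layers -> 50%-depth layer {mid_layer}")
```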
2 sources
- Qwen/Qwen3-8B config.json
"model_type": "qwen3", "num_hidden_layers": 36
- Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers
For evaluation, we use activations from 50% depth (see Appendix C.5 for ablations).