www.lesswrong.com/posts/skKYznZyRtN87tHbB/neuronpedia
1 correction found
Claim: Neuronpedia is an AI safety game that documents and explains each neuron in modern AI models.
Neuronpedia no longer describes itself this way. In 2024 the project officially pivoted from a crowdsourced neuron-explanation game to a mechanistic-interpretability research platform focused on sparse autoencoders (SAEs) and related tools.
Full reasoning
This statement is outdated and is contradicted by Neuronpedia's own later official materials.
- In Neuronpedia’s March 25, 2024 announcement post, the authors explicitly say: “Neuronpedia is a platform for mechanistic interpretability research. It was previously focused on crowdsourcing explanations of neurons, but we've pivoted to accelerating researchers for Sparse Autoencoders (SAEs) by hosting models, feature dashboards, data visualizations, tooling, and more.”
- Neuronpedia’s current homepage likewise describes it as “an open source interpretability platform” where users can “Explore, steer, and experiment on AI models,” not as an AI-safety game.
- The homepage also says Neuronpedia now supports "probes, latents/features, custom vectors, concepts, and more," which directly conflicts with the narrower claim that it documents and explains "each neuron."
So while this may have accurately described an early 2023 version of the project, it does not describe what Neuronpedia is now.
2 sources
- Announcing Neuronpedia: Platform for accelerating research into Sparse Autoencoders - LessWrong
TL;DR Neuronpedia is a platform for mechanistic interpretability research. It was previously focused on crowdsourcing explanations of neurons, but we've pivoted to accelerating researchers for Sparse Autoencoders (SAEs) by hosting models, feature dashboards, data visualizations, tooling, and more.
- Neuronpedia
Neuronpedia is an open source interpretability platform. Explore, steer, and experiment on AI models. ... Neuronpedia supports probes, latents/features, custom vectors, concepts, and more.