www.lesswrong.com/posts/ioZxrP7BhS5ArK59w/did-claude-3-opus-align-itself-via-gra...
2 corrections found
Under ReLU (and, very nearly, the more widely used GeLU), neurons with negative pre-activations have their final activations clipped to zero. This means that, for an inactive neuron, the error signal at that neuron is also clipped to zero. This prevents gradient from flowing through it, both to update the weights feeding into it and to propagate further back to earlier layers
This is true for ReLU, but not for GELU. GELU is defined as x·Φ(x), so negative inputs are generally not clipped to zero and their gradients are not zeroed out in the ReLU sense.
Full reasoning
The post treats GeLU as if it behaved like ReLU on negative inputs, but that is not how GeLU works.
ReLU is a hard gating function: ReLU(x) = max(0, x), so negative inputs become exactly 0 and the backward signal is zero on that side.
By contrast, the original GELU paper defines GELU as x·Φ(x), where Φ(x) is the standard normal CDF. PyTorch's documentation gives the same formula. Because Φ(x) is positive for all finite x, GELU does not clip all negative pre-activations to zero. For example, when x < 0, x·Φ(x) is typically a small negative number, not 0. So the claim that negative pre-activations are "clipped to zero" under GeLU is mathematically incorrect.
That also undermines the next two sentences: if the activation is not zeroed out in the ReLU way, then the neuron is not generally "inactive" in the same sense, and the gradient is not generally clipped to zero either. GELU is smooth and differentiable, so gradient can still flow for negative inputs.
A narrower claim limited to ReLU would be fine; extending it to GeLU is the factual error.
2 sources
- Gaussian Error Linear Units (GELUs)
The GELU activation function is xΦ(x), where Φ(x) the standard Gaussian cumulative distribution function. The GELU nonlinearity weights inputs by their value, rather than gates inputs by their sign as in ReLUs (x·1_{x>0}).
- torch.nn.functional.gelu — PyTorch documentation
When the approximate argument is 'none', it applies element-wise the function GELU(x) = x * Φ(x).
Daniella Amodei
Anthropic’s cofounder and president is Daniela Amodei, not “Daniella” Amodei.
Full reasoning
This sentence misspells Anthropic president and cofounder Daniela Amodei's first name as Daniella. Anthropic's own site identifies the company's leadership as "Dario Amodei (CEO) and Daniela Amodei (President)."
1 source
- Anthropic raises $124 million to build more reliable, general AI systems
The company is led by siblings Dario Amodei (CEO) and Daniela Amodei (President).