www.lesswrong.com/posts/XRADGH4BpRKaoyqcs/the-first-confirmed-instance-of-an-llm...
1 correction found
noticed odd behaviors from their resource usage metrics
The paper says the first warning came from security telemetry and firewall alerts, not from resource-usage metrics.
Full reasoning
This summary reverses the paper’s stated sequence of discovery. In §3.1.4, the authors explicitly say their first signal was security telemetry: Alibaba Cloud’s managed firewall flagged security-policy violations from the training servers. They then correlated those firewall timestamps with system telemetry and RL traces. So the initial anomaly was not discovered via “resource usage metrics”; it was discovered via security alerts, and only later investigated alongside other telemetry.
2 sources
- Let It Flow: Agentic Crafting on Rock and Roll
"Our first signal came not from training curves but from production-grade security telemetry." The paper then says Alibaba Cloud’s managed firewall flagged the violations.
- [2512.24873] Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem
The arXiv entry identifies the paper being summarized; the quoted §3.1.4 passage in the paper describes the first signal as security telemetry from Alibaba Cloud’s managed firewall.