x.com/MaxRovensky/status/2041910644375478397
1 correction found
it has 100% impenetrable safeguard around it by default
This is incorrect because security controls are not “100% impenetrable” by default, and OpenAI’s own safety documentation says untrusted text can still be used for prompt-injection attacks.
Full reasoning
The claim uses an absolute security guarantee (“100% impenetrable”) that is contradicted by both general cybersecurity guidance and OpenAI’s own documentation.
- CISA states plainly that “No one system or network is completely impenetrable.” That directly contradicts the idea of a default, perfectly impenetrable safeguard.
- OpenAI’s agent safety guidance says “Prompt injections are a common and dangerous type of attack” and explains that untrusted text or data can enter an AI system and try to override its instructions. That means plain text is not automatically protected by some perfect default barrier.
- OpenAI’s own security writing on agentic systems likewise describes prompt injection as an “open challenge for agent security” and shows examples where malicious content can cause unintended actions.
So while a system may have mitigations, calling them “100% impenetrable” is factually wrong. Current AI safety/security guidance explicitly says these attacks remain possible and must be mitigated, not assumed impossible.
3 sources
- Malware Threats and Mitigation (CISA)
CISA states: “No one system or network is completely impenetrable...”
- Safety in building agents | OpenAI API
OpenAI says: “Prompt injections are a common and dangerous type of attack.” It explains that “untrusted text or data enters an AI system” and can attempt to override instructions to the AI.
- Continuously hardening ChatGPT Atlas against prompt injection attacks | OpenAI
OpenAI describes prompt injection as “an open challenge for agent security” and gives examples of malicious content causing unintended actions by an AI agent.