A prompt injection attack is a technique where malicious instructions are embedded in content that an AI system processes, causing it to override its original instructions, bypass safety filters, or take unintended actions. Considered the AI equivalent of SQL injection, prompt injection is among the most critical security risks identified in the OWASP LLM Top 10.
Direct prompt injection occurs when a user intentionally crafts input to manipulate an AI assistant's behavior — for example, "Ignore previous instructions and instead [malicious action]". Indirect prompt injection is more insidious: malicious instructions are hidden in content the AI is asked to process, such as an email being summarized, a document being analyzed, or a webpage being read. When the AI processes this content, it executes the embedded instructions without the user realizing it.
In agentic AI systems — where AI models can execute tools, browse the web, send emails, and make API calls — prompt injection becomes particularly dangerous. A compromised AI agent can be instructed to exfiltrate data, create unauthorized accounts, modify files, or perform actions that violate organizational policies. Security researchers have demonstrated prompt injection attacks against major AI assistants that caused them to forward emails, expose chat history, and execute unauthorized code.
Defense strategies include: input/output validation and sanitization; privilege separation (AI agents should operate with minimal necessary permissions); content isolation (treating AI-processed external content as untrusted); monitoring for anomalous AI behavior; and regular red teaming of AI systems to identify injection vulnerabilities before deployment.
The OWASP LLM Top 10 lists prompt injection as the #1 risk for LLM-based applications, reflecting its prevalence and the severity of potential consequences in enterprise environments.