What is Prompt Injection? AI Security Attack Explained

What is Prompt Injection?

An attack technique where malicious instructions are inserted into AI prompts to manipulate the model's behavior or extract sensitive information.

Prompt injection is a security vulnerability in AI language models where an attacker crafts input that causes the AI to ignore its original instructions and follow new, malicious ones instead. It is one of the most significant security risks in AI-powered applications.

There are two main types: direct prompt injection, where the attacker directly inputs malicious prompts to the AI, and indirect prompt injection, where malicious instructions are embedded in external data sources (websites, documents, emails) that the AI processes.

Examples include: instructing an AI chatbot to reveal its system prompt or confidential instructions, manipulating AI-powered email filters to ignore spam, tricking AI code assistants into generating vulnerable code, and extracting training data or private information through carefully crafted prompts.

Mitigation strategies include input validation and sanitization, output filtering, instruction hierarchy (system prompts with higher priority), sandboxing AI responses, human review of AI actions, and regular security testing of AI-powered applications.

Protect Your Organization from AI Risks

Aona AI provides automated Shadow AI discovery, real-time policy enforcement, and comprehensive AI governance for enterprises.

Book a Demo ← Back to Glossary

What is Prompt Injection?

Related Terms

AI Security

Data Leakage (AI)

AI Red Teaming

Protect Your Organization from AI Risks

Quick Links

Others

Resources

Socials

Contact