Prompt Injection Attacks Explained

Prompt injection attacks work by smuggling malicious instructions into user inputs or external content that an LLM reads. Because the model treats instructions and data similarly, it can be tricked into ignoring your system prompt or policies.

Typical examples include:

  • Users telling the model to “ignore previous instructions” and reveal secrets.
  • Malicious content embedded in web pages or documents you ask the model to summarize.
  • Subtle instruction patterns that bypass naive filters.

The AI Security pillar page explains how prompt‑injection testing fits into broader red teaming and defense‑in‑depth strategies.