Tag

#red-team

13 posts tagged red-team.

technique-analysis

How LLM Jailbreaks Work: Techniques, Success Rates, and Defender Responses

A practitioner's breakdown of how LLM jailbreaks work — from roleplay conditioning and encoding tricks to multi-turn manipulation — with attack success rates from peer-reviewed research.
June 20, 2026
technique-analysis

DAN Prompt Jailbreak Explained: How 'Do Anything Now' Attacks Work

DAN (Do Anything Now) is the most replicated persona-injection jailbreak in LLM history. Here's the mechanism, why it worked, what version evolution...
June 12, 2026
red-team

ArtPrompt Post-Mortem: Why ASCII-Art Bypasses Worked

A defender-vs-attacker walkthrough of the ArtPrompt ASCII-art jailbreak. Where it slipped past safety training, which model families patched and how, and...
May 10, 2026
tooling

Garak in 2026: what it's actually good for, what it isn't

An honest practitioner review of NVIDIA's Garak LLM vulnerability scanner: what its probes catch, where the noise is, and where it slots into a real...
May 10, 2026
red-team

Indirect Prompt Injection in LLM Agents: Shipped Failures

Tool-using LLM agents amplify every indirect prompt injection vector. A red-team walkthrough of the exploit classes that have landed against production...
May 10, 2026
red-team

Model Behavior Fingerprinting: Identifying a Wrapped LLM

Before you can attack an LLM app effectively, you need to know what model is under the hood. A practitioner walkthrough of behavioral fingerprinting...
May 10, 2026
red-team

Multi-Turn Role-Play Attacks: Why One Safe Turn Gets Unsafe

Crescendo, Many-Shot, and gradual context manipulation. How multi-turn jailbreaks evade single-turn classifiers, what's still landing in 2026, and where...
May 10, 2026
red-team

Multimodal jailbreaks: image and audio attack surfaces in 2026

Vision and audio inputs are a separate attack channel from text. A practitioner survey of multimodal jailbreaks that still land in 2026: typographic...
May 10, 2026
red-team

Prompt Injection in IDE Coding Agents: Copilot and Cursor

Coding assistants read everything in your repo and increasingly act on it. A red-team walkthrough of the prompt-injection variants that have shipped...
May 10, 2026
red-team

Prompt Injection via Retrieved Documents: The RAG Attack Surface

How attacker-controlled content reaches the model through retrieval pipelines, the variants that still land against production RAG stacks, and the...
May 10, 2026
red-team

Scoping an AI Red-Team Engagement: The Questions That Matter

A working methodology for scoping LLM red-team engagements: the threat-model conversation, surface inventory, success criteria, and the four scoping...
May 10, 2026
red-team

System prompt extraction: the techniques that still leak in 2026

A red-team walkthrough of how system prompts get exfiltrated from production LLM apps: direct extraction, indirect inference, behavioral fingerprinting...
May 10, 2026
red-team

Jailbreak Technique Catalog: Working as of 2026 Q2

Which jailbreak technique classes still work against current production LLMs, what's been hardened, and the cost-of-attack trend. Indexed for practitioners.
May 6, 2026