The Speed Problem
An AI-powered adversary can go from initial access to full domain control in under 8 minutes. CrowdStrike's fastest observed breakout time: 7 minutes. Horizon3's autonomous agent: 60 seconds to full privilege escalation. MIT's 2024 research: AI-driven attack workflows running 47 times faster than human operators.
Meanwhile, a human incident responder is still pulling up their toolkit.
That gap is the most dangerous problem in cybersecurity. Find Evil! challenges you to close it.
The Mission
You'll build autonomous AI agents on the SANS SIFT Workstation --- 200+ incident response tools on a single platform, 18 years of community development, 60K+ annual downloads --- using Protocol SIFT, the proof-of-concept framework that connects AI agents to those tools through the Model Context Protocol (MCP).
Protocol SIFT works. It also hallucinates more than we'd like. (That's exactly why this hackathon exists.) Unlike offensive teams that operate with three or four people in secret, we're putting the entire practitioner community on this problem simultaneously. Your job: teach an AI agent to think like a senior analyst --- how to sequence its approach, recognize when something doesn't add up, and self-correct when it gets it wrong.
Who Should Join
You don't need to be an incident response expert. The SIFT Workstation handles the domain tooling. You need curiosity and building skills.
- IR/Security professionals: You've been finding evil manually for years. Build the AI partner you wish you had at 3 AM during an active incident.
- AI/ML engineers: Apply your skills to a domain where speed determines whether attackers win. Real case data, real tools, no toy datasets.
- Students and early-career builders: No IR background required. The SIFT Workstation is your on-ramp to the most in-demand intersection in tech.
- Open-source contributors: Every submission lives on as a community tool. Build something thousands of responders will use.
Four supported architectural approaches: Direct Agent Extension (Claude Code or OpenClaw), Custom MCP Server, Multi-Agent Frameworks (AutoGen, CrewAI, LangGraph), or Alternative Agentic IDEs (Cursor, Cline, Aider). Teams up to 5. Solo permitted. April 15 -- June 15, 2026. $22,000+ in prizes.
About the Challenge
Why this exists
In November 2025, Anthropic's security team published findings on GTG-1002 --- a Chinese state-sponsored operation where attackers used Claude Code to run autonomous reconnaissance, exploitation, and lateral movement at 80-90% autonomy. The AI handled everything at request rates Anthropic described as "physically impossible" for human operators.
That was the offensive side. The SIFT Workstation is the defensive platform. Protocol SIFT demonstrated what's possible when you connect AI agents to that platform through MCP. This hackathon is how the community makes it real.
The DFIR community built the SIFT Workstation 18 years ago to give every practitioner access to professional-grade tools. Find Evil! extends that mission: give every responder an AI co-pilot that can triage incidents at the speed adversaries now operate.
The gap we're closing
Manual command-line incident response cannot compete with autonomous agents executing thousands of requests. Adversaries move at machine speed. Defenders still look up command-line flags during active incidents. Your goal: build AI systems on the SIFT Workstation that match that velocity --- triaging, correlating, and reporting at the pace the threat demands.
This hackathon is how.
Get Started
- Register on Devpost (you're here)
- Join the Protocol SIFT Slack --- this is where questions get answered, teams form, and mentors hang out
- Download the SIFT Workstation from sans.org/tools/sift-workstation
- Install the Protocol SIFT package to demonstrate automated analysis. After you download the SIFT OVA and log in, run this command from your terminal: $ curl -fsSL https://raw.githubusercontent.com/teamdfir/protocol-sift/main/install.sh | bash
- Review the starter resources: sample case data (hard drives, memory images), example submission.
- Pick a problem and start building. See "What to Build" for project ideas and supported architectural approaches to get past the blank-screen problem.
Requirements
What to Build
One goal: Make Protocol SIFT a fully autonomous incident response agent.
Your submission must improve how Protocol SIFT processes case data --- any case data. Disk images, memory captures, remote endpoints via MCP, log files, network captures. The data type doesn't define the track. The quality of autonomous execution does.
Teach the agent how a senior analyst thinks. How they sequence their approach. How they recognize when something doesn't add up. How they adjust.
Supported Architectural Approaches
You can build on any of these patterns. The platform matters less than how your architecture enforces evidence integrity and enables genuine self-correction.
1. Direct Agent Extension (Claude Code / OpenClaw) --- Extend Protocol SIFT's existing agent loop. Better prompt engineering, smarter tool sequencing, self-correction routines, accuracy validation. This is the on-ramp for most participants and the fastest path to a working submission. OpenClaw's extensible architecture also makes it a natural fit for building custom MCP tool wrappers directly into the tool chain.
2. Custom MCP Server --- Build a purpose-built MCP server that exposes structured functions instead of generic shell commands. Instead of giving the AI execute_shell_cmd, expose typed functions like get_amcache(), extract_mft_timeline(), analyze_prefetch(). The agent physically cannot run destructive commands because the server doesn't have those tools. The MCP server handles raw tool output natively and can parse it before returning to the LLM, preventing context window overload from massive text dumps. (This is the most sound architecture in the evaluation. It's also the most work.)
3. Multi-Agent Frameworks (AutoGen, CrewAI, LangGraph) --- Decompose the analysis into specialized, communicating agents. One agent reviews memory artifacts, another parses disk timelines, a third synthesizes findings. No single model holds all raw data in its context window, which prevents context degradation on complex cases. Agent-to-agent communication is logged programmatically with timestamps and token usage, creating structured execution records. Warning: agent loops can get stuck in infinite conversational spirals without careful termination conditions. Build in max-iteration caps and graceful degradation.
4. Alternative Agentic IDEs (Cursor, Cline, Aider) --- AI-native development environments with their own rule systems. Excellent UI/UX and built-in diff viewing, but designed for software development, not incident response. These tools rely on prompt adherence for evidence protection, not architectural enforcement. If your submission uses an alternative IDE, your accuracy report must document what happens when the model ignores read-only rules.
(If another agentic framework can do the job, we won't disqualify it. But Claude Code, OpenClaw, and the four approaches above are the primary targets. Build for those.)
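The typed-function pattern behind approach 2 can be sketched in a few lines of plain Python. This is a minimal illustration of the principle, not the real Protocol SIFT or MCP SDK code: the tool names (get_amcache, extract_mft_timeline) come from the description above, and the registry/dispatch logic is a hypothetical stand-in for a real MCP server's tool listing.

```python
# Sketch of the "expose typed functions, not a shell" pattern.
# A real implementation would register these with an MCP server;
# here a plain dict registry illustrates the security property.

ALLOWED_TOOLS = {}

def tool(fn):
    """Register a function as an exposed, typed tool."""
    ALLOWED_TOOLS[fn.__name__] = fn
    return fn

@tool
def get_amcache(image_path: str) -> dict:
    # A real server would invoke a SIFT parser here and return
    # structured entries; this placeholder returns an empty set.
    return {"tool": "get_amcache", "image": image_path, "entries": []}

@tool
def extract_mft_timeline(image_path: str) -> dict:
    return {"tool": "extract_mft_timeline", "image": image_path, "events": []}

def dispatch(name: str, **kwargs):
    # The agent can only reach registered, typed functions.
    # execute_shell_cmd simply does not exist in this namespace,
    # so destructive commands are architecturally impossible.
    if name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {name!r} is not exposed")
    return ALLOWED_TOOLS[name](**kwargs)
```

The guardrail lives in the dispatch layer, not in the prompt: there is no string the model can emit that turns a missing tool into a shell.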
Starter Ideas (Not Prescriptions)
Two months is enough time to build something real, but the hardest part is always the first hour. These are starting points. The best submissions will go beyond these in directions we haven't considered.
1. The Self-Correcting Triage Agent --- Build an agent that runs initial triage on a disk image, evaluates its own output for logical consistency, identifies gaps in its analysis, and autonomously re-runs with adjusted parameters. Success metric: fewer hallucinated findings than Protocol SIFT's current baseline.
2. Multi-Source Correlation Engine --- Given a disk image and a memory capture from the same system, build an agent that cross-references findings between the two sources and flags discrepancies. If the disk timeline says one thing and memory says another, the agent should catch it.
3. MCP-Connected Live Triage --- Build an MCP server that connects Protocol SIFT to a remote endpoint or SIEM, then create an agent workflow that pulls live data, analyzes it against SIFT's tool library, and produces a real-time triage report.
4. The Analyst Training Loop --- Build an agent that not only analyzes case data but explains its reasoning at each step --- which tool it chose, why, what it expected to find, and what it actually found. Designed to train junior analysts by making the agent's decision-making process transparent.
5. Accuracy Benchmarking Framework --- Create a test harness that runs Protocol SIFT against known-good data with documented ground truth, then scores accuracy, false positive rates, and hallucination frequency. The community needs this benchmark to measure progress.
6. The Purpose-Built MCP Server --- Wrap SIFT's 200+ tools as structured, type-safe functions exposed through a custom MCP server. The agent physically cannot run destructive commands because the server doesn't expose them. Success metric: zero evidence spoliation risk, with the same or better analytical output as the baseline Protocol SIFT agent. (This is the architecture that would make a practitioner comfortable standing behind the results.)
7. The Persistent Learning Loop --- Build a self-correcting execution loop that iterates on a task until verifiable success criteria are met. The agent logs failures to a progress file, learns from its own execution traces across iterations, and course-corrects without human intervention. Must include a hard --max-iterations cap to prevent runaway execution. Success metric: demonstrable improvement in accuracy between first iteration and final iteration on the same data, with full execution traces preserved.
(These are meant to get past the blank-screen problem. The winning submission will almost certainly be something none of us predicted.)
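The skeleton of starter idea 7 --- iterate until verifiable success, log every attempt, enforce a hard cap --- fits in one function. This is a hedged sketch under assumed interfaces: run_step and verify are placeholders you would replace with your agent invocation and your ground-truth check, and the JSON-lines progress file format is illustrative.

```python
import json
from pathlib import Path

def run_until_verified(run_step, verify, max_iterations=5,
                       progress_file="progress.jsonl"):
    """Persistent learning loop with a hard iteration cap.

    run_step(history) executes one attempt (e.g. one agent run) and may
    consult prior records; verify(result) returns True on verified success.
    Every iteration is appended to a JSON-lines progress file so the full
    execution trace survives for the required Agent Execution Logs.
    """
    history = []
    log = Path(progress_file)
    for i in range(max_iterations):
        result = run_step(history)
        ok = verify(result)
        record = {"iteration": i, "ok": ok, "result": result}
        history.append(record)
        with log.open("a") as f:
            f.write(json.dumps(record) + "\n")
        if ok:
            return result
    # Graceful failure instead of runaway execution.
    raise RuntimeError(f"no verified result after {max_iterations} iterations")
```

The cap and the append-only trace are the two non-negotiables: the first prevents the infinite conversational spirals warned about above, the second gives judges iteration-over-iteration evidence of self-correction.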
What to Submit
All eight components required. Missing any one means elimination.
1. Code Repository --- GitHub (public). Open-source license (MIT or Apache 2.0).
2. Demo Video (5 min max) --- Screencast of live terminal execution with audio narration. Show the agent working against real case data, including at least one self-correction sequence.
3. Architecture Diagram --- How components connect: the agent, SIFT tools, MCP servers, data sources, output pipeline. Your diagram must identify which architectural pattern you're using and document where security boundaries are enforced. Prompt-based guardrails and architectural guardrails must be clearly distinguished. Judges need to understand your system and its trust boundaries at a glance.
4. Written Project Description --- Devpost project story format: What it does, How you built it, Challenges, What you learned, What's next. Be specific about design decisions, tradeoffs, and which qualities of autonomous execution your submission addresses.
5. Dataset Documentation --- What the agent was tested against, source of data, and what it found. Reproducibility starts here.
6. Accuracy Report --- Self-assessment of findings accuracy. False positives, missed artifacts, hallucinated claims. Include a section documenting your evidence integrity approach: how does your architecture prevent original data from being modified? If you're using prompt-based restrictions rather than architectural enforcement, document what happens when the model ignores the restriction. Did you test for spoliation? (If you found failure modes, document them. That's signal, not weakness.)
7. Try-It-Out Instructions --- Live deployment URL or step-by-step instructions for judges to run your agent locally on the downloadable SIFT workstation. If local setup requires specific tools or dependencies, document them clearly in the README.
8. Agent Execution Logs --- Structured logs showing the full agent communication and tool execution sequence. For multi-agent submissions: agent-to-agent message logs with timestamps. For single-agent submissions: tool execution logs with timestamps and token usage. For persistent loop submissions: iteration-over-iteration traces showing how the agent's approach changed. Judges must be able to trace any finding back to the specific tool execution that produced it.
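One record shape that satisfies the traceability requirement above is a JSON-lines entry per tool execution. The field names below are illustrative, not a mandated schema; the point is that each finding ID links back to the timestamped tool call that produced it.

```python
import json
import time

def log_tool_execution(tool, args, finding_ids, tokens_used):
    """Return one JSON-lines record for a single tool execution.

    finding_ids lists the findings this call produced, so a judge can
    trace any claim in the report back to this exact execution.
    Field names are illustrative, not a required schema.
    """
    return json.dumps({
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "tool": tool,
        "args": args,
        "finding_ids": finding_ids,
        "token_usage": tokens_used,
    })
```

Append one such line per tool call (or per agent-to-agent message, for multi-agent builds) and the audit-trail criterion largely takes care of itself.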
Prizes
1st Place - SLAYED EVIL
SANS Summit pass + hotel (each member) + SANS OnDemand course (each member)
Presentation on SANS Webcast/Livestream broadcast to the SANS Community
2nd Place - HUNTED EVIL
SANS Summit pass + hotel (each member) + SANS OnDemand course (each member)
Presentation on SANS Webcast/Livestream broadcast to the SANS Community
3rd Place - FOUND EVIL
SANS OnDemand course (each member)
Judges
Rob T. Lee
CAIO, SANS INSTITUTE
Judging Criteria
1. Autonomous Execution Quality (tiebreaker)
Does the agent reason about next steps, handle failures, and self-correct in real time?
2. IR Accuracy
Are findings correct? Hallucinations caught and flagged? Confirmed findings distinguished from inferences?
3. Breadth and Depth of Analysis
How much case data can the agent handle? Depth on fewer types beats shallow coverage of many.
4. Constraint Implementation
Are guardrails architectural or prompt-based? Judges evaluate where security boundaries are enforced and whether they were tested for bypass.
5. Audit Trail Quality
Can judges trace any finding back to the specific tool execution that produced it?
6. Usability and Documentation
Can another practitioner deploy and build on this?
Questions? Email the hackathon manager

