Riftcheck simulates adversarial interactions between your AI agents, catching deadlocks, data leaks, and cascading failures in a test suite — not in production.
From zero to an interactive trace viewer in under a minute. Riftcheck scans your existing agent code, generates test scenarios, and runs them with statistical confidence.
Point riftcheck init at your project directory. It AST-scans your Python files, detects CrewAI / LangGraph / AutoGen agents, and generates a scenario.yaml + agents.py with smart defaults.
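The detection internals aren't shown here, but the core idea of AST-based agent discovery is straightforward. A minimal sketch using Python's ast module; the class-name table is an assumption for illustration, not riftcheck's actual detection list.

```python
import ast

# Hypothetical class names a detector might look for; riftcheck's real
# detection table is not shown in this document.
KNOWN_AGENT_CLASSES = {"Agent", "ConversableAgent", "AssistantAgent", "StateGraph"}

def detect_agents(source: str) -> list[str]:
    """Return the names of known agent classes instantiated in `source`."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            # Handle both `Agent(...)` and `crewai.Agent(...)` call forms.
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
            if name in KNOWN_AGENT_CLASSES:
                found.append(name)
    return found

sample = "import crewai\nwriter = crewai.Agent(role='writer')"
print(detect_agents(sample))  # ['Agent']
```

Because this walks the syntax tree rather than importing your code, nothing in your project executes during the scan.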
Riftcheck runs your scenario N times (default 5) and reports pass rates with Wilson score confidence intervals — not a single pass/fail. A result like 80% [36%–97%] tells you both what happened and how much to trust it. Wide interval? Run more iterations.
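The Wilson score interval is a standard formula, so the math behind a report like the one above can be sketched directly. Exact bounds may differ slightly from riftcheck's output depending on the z value and any continuity correction it applies.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a pass rate of successes/n."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - margin, center + margin

lo, hi = wilson_interval(4, 5)   # 4 of 5 runs passed
print(f"80% [{lo:.0%}-{hi:.0%}]")
```

Note how wide the interval is at n=5: a single extra iteration can move the point estimate by 20 points, which is exactly why the interval matters more than the headline rate.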
The --view flag opens an interactive HTML viewer in your browser. Three viewing modes — Spotlight (cinematic replay), Constellation (agent graph), and Timeline (swimlane) — let you inspect every message, fault, and property check. Try the live demo →
Agents never call each other directly. All communication routes through the engine, which records every message, applies faults, and runs property checks.
Every checker is stateless — it receives a complete trace and returns a pass/fail with evidence. Aligned with the MAST failure taxonomy (NeurIPS 2025).
| Checker | What It Catches | Category |
|---|---|---|
| no_information_leak | Private setup data (API keys, budgets, deadlines) appearing in other agents' messages | Safety |
| role_boundary | Agent performing actions outside its assigned role (e.g., executor approving its own work) | Safety |
| reasoning_action_consistency | Agent states one intent ("I will NOT share the data") then does the opposite | Safety |
| Checker | What It Catches | Category |
|---|---|---|
| converges_within | Agents don't reach agreement or completion within the turn limit | Workflow |
| no_premature_termination | Agent declares "done" before required milestones are hit | Workflow |
| ensures_information_flow | Required data doesn't transfer between agents along expected paths | Workflow |
| stays_on_task | Agent semantically drifts off-topic over consecutive messages | Workflow |
| task_specification_compliance | Agent output doesn't contain required terms or contains forbidden terms | Workflow |
| Checker | What It Catches | Category |
|---|---|---|
| communication_quality | Messages too short, too long, or lacking acknowledgements (composite score) | Quality |
| asks_for_clarification | Agent proceeds without asking when given ambiguous instructions | Quality |
| respects_peer_input | Agent ignores corrections or feedback from other agents | Quality |
| state_continuity | Agent forgets prior context or contradicts its earlier statements | Quality |
| no_conversation_reset | Agent state fingerprint reverts to near-initial (amnesia) | Quality |
| Checker | What It Catches | Category |
|---|---|---|
| no_deadlock | All-same loop (last N messages identical) or ping-pong alternating pair | Structure |
| step_repetition | Structurally identical messages from the same agent (detected via fingerprinting) | Structure |
| output_schema | Final message isn't valid JSON or doesn't match a provided JSON Schema | Structure |
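The stateless contract described above (full trace in, pass/fail with evidence out) can be illustrated with a toy version of the all-same-loop half of no_deadlock. The dict shapes and the `window` parameter are assumptions for illustration, not riftcheck's actual API.

```python
# Hypothetical checker sketch: trace in, verdict with evidence out.
def no_deadlock(trace: list[dict], window: int = 3) -> dict:
    """Fail if the last `window` message contents are all identical (all-same loop)."""
    contents = [msg["content"] for msg in trace]
    tail = contents[-window:]
    if len(tail) == window and len(set(tail)) == 1:
        return {"passed": False, "evidence": f"last {window} messages identical: {tail[0]!r}"}
    return {"passed": True, "evidence": None}

looping = [{"content": "waiting on you"}] * 4
print(no_deadlock(looping)["passed"])  # False
```

Because the checker sees the whole trace and holds no state of its own, the same function can run live during a simulation or later over a recorded trace.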
Faults intercept messages between send and receive — they modify what an agent sees without touching agent internals. Five fault types cover the real-world failure space.
Replaces a message's content with garbled text, or appends garbled text to it. Simulates network errors, encoding failures, or truncated transmissions.
Drops the message entirely and delivers a fallback. Simulates message queue failures, timeouts, and lost packets.
Adds a real wall-clock delay before delivery. Tests timeout logic, retry behavior, and patience under slow responses.
Injects fabricated facts or swaps identifiers. Tests whether agents cross-check their internal state against incoming data.
Injects conflicting instructions opposing the agent's current task. Critical for safety and alignment testing.
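The intercept-between-send-and-receive model is simple to picture as a function that rewrites what the recipient sees. A toy corruption fault under that model; the message shape and garbling strategy are assumptions for illustration.

```python
import random

# Hypothetical fault sketch: rewrite the message in transit, never
# touching the sending or receiving agent's internals.
def corrupt(message: dict, rate: float = 0.3, seed: int = 0) -> dict:
    """Garble a fraction of characters to simulate a corrupted transmission."""
    rng = random.Random(seed)
    garbled = "".join(
        "#" if rng.random() < rate else ch for ch in message["content"]
    )
    return {**message, "content": garbled}

msg = {"sender": "planner", "content": "ship the release on Friday"}
print(corrupt(msg)["content"])
```

Returning a new dict rather than mutating the original keeps the recorded trace honest: the engine can log both what was sent and what was delivered.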
Every command is designed to fit into the test-iterate-integrate loop. Click any command below to see its full reference.
The primary command. Loads a scenario YAML, resolves the agents file, runs N simulations, applies property checks, and prints aggregate statistics. Optionally opens the interactive HTML viewer.
AST-scans your project directory, detects CrewAI / LangGraph / AutoGen agents, extracts topology, and generates a starter scenario.yaml + agents.py with smart assertion selection.
Converts a saved JSONL trace into a self-contained HTML file and opens it in your browser. No server required — the entire React app is embedded.
Compares a baseline trace with a fault-injected or modified trace, showing differences turn-by-turn with the viewer in diff mode.
Reads all *.jsonl files in a directory and builds a single-page dashboard showing pass rates, topology, and failure patterns across all your scenarios.
Applies new or modified property checkers to previously recorded traces without re-running the simulation. Perfect for iterating on assertion parameters without burning API credits.
Exports a JSONL trace to either the interactive HTML viewer or MAST-compatible annotation format (NeurIPS 2025). MAST output integrates with external annotation pipelines for failure mode categorization.
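The replay idea in particular is worth making concrete: because checkers are stateless functions over a trace, re-checking a recorded JSONL file costs nothing. A minimal sketch; the record shape and checker signature here are assumptions for illustration.

```python
import json

# Hypothetical replay sketch: load a recorded trace from JSONL text and
# re-run checker functions over it, with zero LLM calls.
def replay(jsonl_text: str, checkers: list) -> dict:
    trace = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    return {checker.__name__: checker(trace) for checker in checkers}

def converges_within(trace, limit=10):
    """Toy checker: did the conversation finish within the turn limit?"""
    return len(trace) <= limit

recorded = '{"sender": "a", "content": "hi"}\n{"sender": "b", "content": "done"}'
print(replay(recorded, [converges_within]))  # {'converges_within': True}
```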
Riftcheck wraps your existing agents — it doesn't replace them. Four adapters cover the most popular multi-agent frameworks plus a universal raw Python option.
The fastest way to prototype. No framework dependencies — just a Python function that takes a message dict and mutable state, and returns a string.
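A minimal agent under that contract might look like this; the field names inside the message dict are assumptions for illustration.

```python
# Raw-Python contract as described above: message dict and mutable
# state in, string reply out. The "content" key is an assumed field name.
def echo_agent(message: dict, state: dict) -> str:
    state["turns"] = state.get("turns", 0) + 1
    return f"(turn {state['turns']}) acknowledged: {message['content']}"

state = {}
print(echo_agent({"sender": "planner", "content": "draft the plan"}, state))
# (turn 1) acknowledged: draft the plan
```

Because state is an ordinary mutable dict, checkers like state_continuity and no_conversation_reset have something concrete to fingerprint between turns.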
Wraps a crewai.Agent. Riftcheck creates a one-shot Task + Crew per message and captures the kickoff result. Install: pip install crewai
Two adapters: LangGraphAgent wraps a compiled graph; LangGraphNodeAgent wraps individual node functions (recommended — avoids graph construction errors). Install: pip install langgraph
Wraps AG2/AutoGen ConversableAgent or AssistantAgent. Uses the agent's built-in reply mechanism. Install: pip install "ag2[anthropic]"
Riftcheck ships with a pytest plugin that discovers scenario YAML files and runs them as test cases. Drop scenarios into your test directory and they run with pytest.
1. riftcheck init ./my_project — Scan, detect, generate.
2. Edit scenario.yaml — adjust assertions, add setup data, inject faults, set milestones.
3. riftcheck run scenario.yaml --view — Execute N times, open the trace viewer.
4. riftcheck replay trace.jsonl --scenario scenario.yaml — Re-check properties at zero LLM cost.
5. riftcheck diff baseline.jsonl modified.jsonl — Side-by-side before/after.
6. Copy scenario files to tests/riftcheck/ and run with pytest in CI.
7. riftcheck dashboard .riftcheck/traces/ — Aggregate health across all scenarios.