Investigating Trajectories
Once you've captured trajectories, you can investigate them to understand agent behavior and identify issues.
The Investigation Flow
- Browse trajectories at lunette.dev
- Filter by score, task, model, or metadata to find interesting cases
- Launch an investigation on a trajectory or set of trajectories
- Review findings — investigators create issues with evidence and confidence scores
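The filter-then-investigate steps above can be sketched in plain Python. This is a minimal, self-contained sketch: the trajectory records and the `launch_investigation` call are hypothetical stand-ins, not Lunette's actual data model or SDK.

```python
# Hypothetical trajectory records; real trajectories carry more fields.
trajectories = [
    {"id": "t1", "task": "fix-bug", "score": 1.0, "metadata": {"model": "gpt-4o"}},
    {"id": "t2", "task": "fix-bug", "score": 0.0, "metadata": {"model": "gpt-4o"}},
    {"id": "t3", "task": "add-feature", "score": 0.0, "metadata": {"model": "other"}},
]

# Step 2: filter by score and task to find interesting cases.
candidates = [t for t in trajectories if t["score"] == 0.0 and t["task"] == "fix-bug"]

# Step 3: launch an investigation on the filtered set.
# (Hypothetical call -- consult the actual client library for the real API.)
# investigation = client.launch_investigation([t["id"] for t in candidates])

print([t["id"] for t in candidates])
```

The point of filtering first is that investigations are cheapest and most informative when aimed at a small set of suspicious trajectories rather than the whole run.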
What Investigators Do
When you launch an investigation, an AI agent analyzes your trajectories. The investigator:
- Reads the trajectory — Understands what the agent was trying to do
- Analyzes the execution — Identifies where things went wrong (or right)
- Accesses the environment (if available) — Runs commands, inspects files, reproduces errors
- Creates issues — Documents findings with evidence and confidence scores
Types of Issues
Investigators look for several categories of problems:
Test Mis-specification
- Under-specification: Tests are too permissive, accepting incorrect solutions
- Over-specification: Tests are too restrictive, rejecting valid solutions
- Mis-alignment: Tests contradict the problem description
Environment Problems
- Missing files or dependencies
- Broken tools or inconsistent behavior
- Configuration issues
Agent Behavior
- Reward hacking (gaming the scoring function instead of genuinely solving the task)
- Unusual failure patterns
- Systematic errors
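The categories above can be summarized as a small taxonomy. This enum is illustrative only: the category names are taken from the lists above, and the exact labels investigators use internally may differ.

```python
from enum import Enum

class IssueCategory(Enum):
    """Illustrative taxonomy of investigation issues (names are assumptions)."""
    UNDER_SPECIFICATION = "under_specification"  # tests too permissive
    OVER_SPECIFICATION = "over_specification"    # tests too restrictive
    MIS_ALIGNMENT = "mis_alignment"              # tests contradict the problem
    ENVIRONMENT = "environment"                  # missing files, broken tools, config
    AGENT_BEHAVIOR = "agent_behavior"            # reward hacking, systematic errors

# Example: tagging a finding with a category.
finding_category = IssueCategory.MIS_ALIGNMENT
print(finding_category.value)
```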
Investigation Output
Each issue includes:
- Name: Brief description of the problem
- Description: Detailed explanation with message references
- Proof: Evidence demonstrating the issue is real
- Confidence: Score from 0.0 to 1.0 reflecting evidence strength
- Trajectory references: Links to specific messages
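A rough sketch of that issue shape as a Python dataclass. The field names follow the list above; the actual schema returned by investigations may differ, so treat this as an assumption, not the real API.

```python
from dataclasses import dataclass, field

@dataclass
class Issue:
    """Illustrative shape of an investigation issue (schema is an assumption)."""
    name: str          # brief description of the problem
    description: str   # detailed explanation with message references
    proof: str         # evidence demonstrating the issue is real
    confidence: float  # 0.0 to 1.0, based on evidence strength
    trajectory_refs: list[str] = field(default_factory=list)  # links to messages

    def __post_init__(self) -> None:
        # Enforce the documented confidence range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")

issue = Issue(
    name="Test accepts empty output",
    description="The grader passes when the agent writes nothing (msg 14).",
    proof="Running the test with an empty file exits 0.",
    confidence=0.9,
    trajectory_refs=["msg-14"],
)
```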
Best Practices
- Use sandboxes — Investigators are much more effective when they can access the environment
- Filter before investigating — Focus on failed or interesting trajectories
- Review confidence scores — Higher confidence means stronger evidence
- Check the proof — Investigators include reproduction steps when possible
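The confidence-review practice can be sketched as a simple triage pass: sort issues by confidence and flag which ones to review first. The issue data and the 0.7 threshold here are hypothetical choices for illustration.

```python
# Hypothetical issues returned by an investigation.
issues = [
    {"name": "test accepts empty output", "confidence": 0.95},
    {"name": "possible reward hacking", "confidence": 0.40},
    {"name": "missing dependency in image", "confidence": 0.80},
]

# Review high-confidence issues first; they carry the strongest evidence.
ranked = sorted(issues, key=lambda i: i["confidence"], reverse=True)
for issue in ranked:
    # 0.7 is an arbitrary example threshold, not a recommended cutoff.
    tier = "review now" if issue["confidence"] >= 0.7 else "verify first"
    print(f'{issue["confidence"]:.2f}  {tier:12s}{issue["name"]}')
```

Low-confidence issues are not necessarily wrong, but their proof deserves a closer manual check before acting on them.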