Investigating Trajectories
Once you've captured trajectories, you can investigate them to understand agent behavior and identify issues.
The Investigation Flow
- Browse trajectories at lunette.dev
- Filter by score, task, model, or metadata to find interesting cases
- Launch an investigation on a trajectory or set of trajectories
- Review findings — investigators create issues with evidence and confidence scores
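The filter-then-investigate steps above can be sketched in plain Python. This is a minimal, self-contained sketch: the trajectory records and the `launch_investigation` call are hypothetical stand-ins, not Lunette's actual data model or SDK.

```python
# Hypothetical trajectory records; real trajectories carry more fields.
trajectories = [
    {"id": "t1", "task": "fix-bug", "score": 1.0, "metadata": {"model": "gpt-4o"}},
    {"id": "t2", "task": "fix-bug", "score": 0.0, "metadata": {"model": "gpt-4o"}},
    {"id": "t3", "task": "add-feature", "score": 0.0, "metadata": {"model": "other"}},
]

# Step 2: filter by score and task to find interesting cases.
candidates = [t for t in trajectories if t["score"] == 0.0 and t["task"] == "fix-bug"]

# Step 3: launch an investigation on the filtered set.
# (Hypothetical call -- consult the actual client library for the real API.)
# investigation = client.launch_investigation([t["id"] for t in candidates])

print([t["id"] for t in candidates])
```

The point of filtering first is that investigations are cheapest and most informative when aimed at a small set of suspicious trajectories rather than the whole run.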
What Investigators Do
When you launch an investigation, an AI agent analyzes your trajectories. The investigator:
- Reads the trajectory — Understands what the agent was trying to do
- Analyzes the execution — Identifies where things went wrong (or right)
- Accesses the environment (if available) — Runs commands, inspects files, reproduces errors
- Creates issues — Documents findings with evidence and confidence scores
Types of Issues
Investigators look for several categories of problems:
Test Mis-specification
- Under-specification: Tests are too permissive, accepting incorrect solutions
- Over-specification: Tests are too restrictive, rejecting valid solutions
- Mis-alignment: Tests contradict the problem description
Environment Problems
- Missing files or dependencies
- Broken tools or inconsistent behavior
- Configuration issues
Agent Behavior
- Reward hacking (gaming the scoring function instead of genuinely solving the task)
- Unusual failure patterns
- Systematic errors
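The categories above can be summarized as a small taxonomy. This enum is illustrative only: the category names are taken from the lists above, and the exact labels investigators use internally may differ.

```python
from enum import Enum

class IssueCategory(Enum):
    """Illustrative taxonomy of investigation issues (names are assumptions)."""
    UNDER_SPECIFICATION = "under_specification"  # tests too permissive
    OVER_SPECIFICATION = "over_specification"    # tests too restrictive
    MIS_ALIGNMENT = "mis_alignment"              # tests contradict the problem
    ENVIRONMENT = "environment"                  # missing files, broken tools, config
    AGENT_BEHAVIOR = "agent_behavior"            # reward hacking, systematic errors

# Example: tagging a finding with a category.
finding_category = IssueCategory.MIS_ALIGNMENT
print(finding_category.value)
```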
Investigation Output
Each issue includes:
- Name: Brief description of the problem
- Description: Detailed explanation with message references
- Proof: Evidence demonstrating the issue is real
- Confidence: Score from 0.0 to 1.0 reflecting evidence strength
- Trajectory references: Links to specific messages
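A rough sketch of that issue shape as a Python dataclass. The field names follow the list above; the actual schema returned by investigations may differ, so treat this as an assumption, not the real API.

```python
from dataclasses import dataclass, field

@dataclass
class Issue:
    """Illustrative shape of an investigation issue (schema is an assumption)."""
    name: str          # brief description of the problem
    description: str   # detailed explanation with message references
    proof: str         # evidence demonstrating the issue is real
    confidence: float  # 0.0 to 1.0, based on evidence strength
    trajectory_refs: list[str] = field(default_factory=list)  # links to messages

    def __post_init__(self) -> None:
        # Enforce the documented confidence range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")

issue = Issue(
    name="Test accepts empty output",
    description="The grader passes when the agent writes nothing (msg 14).",
    proof="Running the test with an empty file exits 0.",
    confidence=0.9,
    trajectory_refs=["msg-14"],
)
```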
Best Practices
- Use sandboxes — Investigators are much more effective when they can access the environment
- Filter before investigating — Focus on failed or interesting trajectories
- Review confidence scores — Higher confidence means stronger evidence
- Check the proof — Investigators include reproduction steps when possible
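The confidence-review practice can be sketched as a simple triage pass: sort issues by confidence and flag which ones to review first. The issue data and the 0.7 threshold here are hypothetical choices for illustration.

```python
# Hypothetical issues returned by an investigation.
issues = [
    {"name": "test accepts empty output", "confidence": 0.95},
    {"name": "possible reward hacking", "confidence": 0.40},
    {"name": "missing dependency in image", "confidence": 0.80},
]

# Review high-confidence issues first; they carry the strongest evidence.
ranked = sorted(issues, key=lambda i: i["confidence"], reverse=True)
for issue in ranked:
    # 0.7 is an arbitrary example threshold, not a recommended cutoff.
    tier = "review now" if issue["confidence"] >= 0.7 else "verify first"
    print(f'{issue["confidence"]:.2f}  {tier:12s}{issue["name"]}')
```

Low-confidence issues are not necessarily wrong, but their proof deserves a closer manual check before acting on them.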