
Runtime Observability

SquireX can detect runtime capability drift by correlating OpenTelemetry (OTel) session traces against your static metadata. This bridges the gap between what your agent should be able to do (metadata) and what it actually does at runtime.

How It Works

Static Metadata (scan-request.json)
↓
Semantic Graph → Declared tool capabilities
↕
Correlation Engine → Drift detection
↑
Runtime Traces (OTel JSON)
↓
Drift Report (findings + score)

Drift Types

| Type | Severity | Description |
| --- | --- | --- |
| PHANTOM_TOOL | Critical | Tool executed at runtime but not declared in static metadata |
| SCOPE_CREEP | High | Agent accessed tools owned by another agent |
| FREQUENCY_ANOMALY | Medium | Abnormal invocation frequency (>50 calls/session) |
| POLICY_BYPASS | High | Runtime invocation bypassed expected gateway policy |
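
The two drift types that depend only on a trace and the declared tool list can be illustrated with a short Python sketch. This is not SquireX's implementation: the function name `find_drift`, the shape of the finding dictionaries, and the choice to apply the 50-call threshold per tool rather than per session total are assumptions made for illustration. The threshold itself and the severities come from the table above.

```python
from collections import Counter

FREQUENCY_THRESHOLD = 50  # from the table: more than 50 calls per session


def find_drift(declared_tools, trace):
    """Return illustrative findings for one session trace (hypothetical shape)."""
    # Count invocations per span name, skipping LLM requests, which the
    # documentation says are ignored for drift detection.
    calls = Counter(
        span["name"] for span in trace["spans"] if span["kind"] != "llm_request"
    )
    findings = []
    for name, count in sorted(calls.items()):
        if name not in declared_tools:
            # Executed at runtime but absent from static metadata.
            findings.append(
                {"type": "PHANTOM_TOOL", "severity": "critical", "tool": name}
            )
        if count > FREQUENCY_THRESHOLD:
            # ASSUMPTION: the threshold is applied per tool within a session.
            findings.append(
                {
                    "type": "FREQUENCY_ANOMALY",
                    "severity": "medium",
                    "tool": name,
                    "count": count,
                }
            )
    return findings
```

SCOPE_CREEP and POLICY_BYPASS are left out because they need ownership and gateway policy information that a trace alone does not carry.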

CLI Usage

Basic Drift Detection

# Correlate static metadata against runtime traces
squireinterp observe scan-request.json --traces session.json

# Save drift report to file
squireinterp observe scan-request.json --traces session.json --output drift-report.json

The observe command exits with code 2 if drift findings are detected, enabling CI gate integration.
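
A CI step can branch on that exit code. The following is a minimal sketch of such a wrapper in Python; the helper names `classify_exit` and `run_gate` are hypothetical, and treating exit code 0 as "clean" and any other code as a tool or usage error is an assumption, since only code 2 is documented here.

```python
import subprocess

DRIFT_EXIT_CODE = 2  # documented: observe exits with 2 when drift is found


def classify_exit(returncode):
    """Map a process exit code to a gate outcome."""
    if returncode == 0:
        return "clean"  # ASSUMPTION: 0 means no drift findings
    if returncode == DRIFT_EXIT_CODE:
        return "drift"
    return "error"  # ASSUMPTION: any other code is a tool or usage failure


def run_gate(command):
    """Run a command (list of arguments) and return its gate outcome."""
    completed = subprocess.run(command, check=False)
    return classify_exit(completed.returncode)
```

In a pipeline this might be called as `run_gate(["squireinterp", "observe", "scan-request.json", "--traces", "session.json"])`, failing the build on `"drift"` and surfacing `"error"` separately so that a broken invocation is not mistaken for a security finding.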

A/B Security Regression

# Generate regression eval spec from before/after violation sets
squireinterp eval-diff --before violations-main.json --after violations-pr.json

# Save to file
squireinterp eval-diff --before violations-main.json --after violations-pr.json --output regression.yaml

The eval-diff command:

  • Computes introduced, resolved, and persisted violations
  • Generates Testing Center YAML specs for regression tests
  • Assesses regression risk (none, low, medium, high)
  • Exits with code 2 on high-risk regressions
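
The introduced/resolved/persisted split amounts to three set operations. The sketch below shows the idea in Python; it assumes each violation is a dictionary with a stable `id` field, which is a guess about the file format, not something documented here. The function name `diff_violations` is hypothetical, and the risk assessment is omitted because its rules are not described.

```python
def diff_violations(before, after, key="id"):
    """Split violations into introduced, resolved, and persisted sets.

    ASSUMPTION: every violation dict carries a stable identifier under `key`.
    """
    before_ids = {violation[key] for violation in before}
    after_ids = {violation[key] for violation in after}
    return {
        "introduced": sorted(after_ids - before_ids),  # only in the PR
        "resolved": sorted(before_ids - after_ids),    # only on main
        "persisted": sorted(before_ids & after_ids),   # present in both
    }
```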

Drift Report Format

{
  "generatedAt": "2026-04-24T10:00:00Z",
  "agentName": "ServiceBot",
  "tracesAnalyzed": 5,
  "summary": {
    "totalFindings": 2,
    "bySeverity": { "critical": 1, "medium": 1 },
    "byType": { "PHANTOM_TOOL": 1, "FREQUENCY_ANOMALY": 1 },
    "phantomToolCount": 1,
    "scopeCreepCount": 0,
    "driftScore": 0.65
  },
  "staticCoverage": {
    "totalRuntimeTools": 10,
    "staticallyCovered": 8,
    "coveragePercentage": 80.0,
    "uncoveredTools": ["Delete_All_Records", "Custom_API_Call"]
  },
  "findings": [...]
}
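
The fields in the sample are related arithmetically: the severity and type counts each sum to `totalFindings`, the number of uncovered tools equals `totalRuntimeTools - staticallyCovered` (10 - 8 = 2), and `coveragePercentage` is 8 / 10 × 100 = 80.0. The Python sketch below checks those relationships on a parsed report. That they hold for every report is inferred from the sample, not a documented guarantee, and the function name `check_report` is hypothetical.

```python
def check_report(report):
    """Return a list of inconsistencies found in a drift report (empty if none).

    ASSUMPTION: the relationships checked here are inferred from the sample
    report and may not be guaranteed by the tool.
    """
    problems = []
    summary = report["summary"]
    coverage = report["staticCoverage"]

    if sum(summary["bySeverity"].values()) != summary["totalFindings"]:
        problems.append("bySeverity does not sum to totalFindings")
    if sum(summary["byType"].values()) != summary["totalFindings"]:
        problems.append("byType does not sum to totalFindings")

    total = coverage["totalRuntimeTools"]
    covered = coverage["staticallyCovered"]
    if len(coverage["uncoveredTools"]) != total - covered:
        problems.append("uncoveredTools length does not match the coverage gap")
    # Skip the percentage check when no tools were observed at runtime.
    if total and abs(coverage["coveragePercentage"] - 100.0 * covered / total) > 0.05:
        problems.append("coveragePercentage does not match covered/total")
    return problems
```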

Drift Score

The drift score ranges from 0.0 (clean) to 1.0 (severe drift):

| Score | Meaning |
| --- | --- |
| 0.0 | Perfect alignment between static and runtime |
| 0.0–0.3 | Minor drift, likely benign |
| 0.3–0.6 | Moderate drift, investigate |
| 0.6–1.0 | Severe drift, likely security issue |
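
When consuming the score in scripts, the bands above can be mapped to labels. The table's ranges share their endpoints, so the Python sketch below has to pick a side: it assumes each band includes its upper bound (0.3 counts as minor, 0.6 as moderate). That choice and the label strings are assumptions for illustration.

```python
def drift_band(score):
    """Map a drift score in [0.0, 1.0] to a band label.

    ASSUMPTION: each band includes its upper bound; the documentation does
    not say which band a boundary value such as 0.3 belongs to.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"drift score out of range: {score}")
    if score == 0.0:
        return "aligned"
    if score <= 0.3:
        return "minor"
    if score <= 0.6:
        return "moderate"
    return "severe"
```

Under this reading, the sample report's `driftScore` of 0.65 falls in the severe band.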

OTel Trace Format

SquireX expects traces in this JSON format:

[{
  "traceId": "abc123",
  "sessionId": "session-1",
  "agentName": "ServiceBot",
  "startTime": "2026-04-24T10:00:00Z",
  "endTime": "2026-04-24T10:05:00Z",
  "spans": [
    {
      "spanId": "span-1",
      "name": "Submit_Case",
      "kind": "tool_call",
      "startTime": "2026-04-24T10:00:01Z",
      "duration": 150000000,
      "status": "ok",
      "attributes": { "object": "Case" }
    }
  ]
}]
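
The timestamps in this format are ISO 8601 strings in UTC, so session length can be derived directly. The Python sketch below does that and also converts a span's `duration` to milliseconds. The unit of `duration` is not stated here; the sketch assumes nanoseconds (which would make the sample span 150 ms), and the helper names are hypothetical.

```python
from datetime import datetime


def parse_utc(timestamp):
    """Parse an ISO 8601 timestamp with a trailing 'Z' into an aware datetime."""
    # Replace 'Z' explicitly so this also works on Python versions before 3.11.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def session_seconds(trace):
    """Length of a session trace in seconds, from startTime to endTime."""
    delta = parse_utc(trace["endTime"]) - parse_utc(trace["startTime"])
    return delta.total_seconds()


def span_millis(span):
    """Span duration in milliseconds.

    ASSUMPTION: `duration` is expressed in nanoseconds; the format
    description does not state the unit.
    """
    return span["duration"] / 1_000_000
```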

Span Kinds

| Kind | What It Represents |
| --- | --- |
| tool_call | Tool invocation (function, plugin) |
| action_invoke | GenAiFunction action execution |
| llm_request | LLM API call (ignored for drift detection) |
| http_callout | External HTTP request |
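
Before handing traces to the CLI, it can be useful to see which span kinds they contain. The Python sketch below tallies spans by kind and reports any kind not listed in the table. The function name `summarize_kinds` and the returned dictionary shape are hypothetical, and counting all three non-LLM kinds as "considered" is an assumption: the table only states that `llm_request` is ignored.

```python
from collections import Counter

KNOWN_KINDS = {"tool_call", "action_invoke", "llm_request", "http_callout"}
IGNORED_FOR_DRIFT = {"llm_request"}  # per the table above


def summarize_kinds(traces):
    """Tally spans by kind across a list of session traces."""
    counts = Counter(
        span["kind"] for trace in traces for span in trace["spans"]
    )
    considered_kinds = KNOWN_KINDS - IGNORED_FOR_DRIFT
    return {
        "counts": dict(counts),
        # Kinds that do not appear in the documented table.
        "unknown": sorted(set(counts) - KNOWN_KINDS),
        # ASSUMPTION: every known kind except llm_request feeds drift detection.
        "considered": sum(
            n for kind, n in counts.items() if kind in considered_kinds
        ),
    }
```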