Runtime Observability
SquireX can detect runtime capability drift by correlating OpenTelemetry (OTel) session traces against your static metadata. This bridges the gap between what your agent should be able to do (metadata) and what it actually does at runtime.
How It Works
Static Metadata (scan-request.json)
↓
Semantic Graph → Declared tool capabilities
↓
Correlation Engine → Drift detection
↑
Runtime Traces (OTel JSON)
↓
Drift Report (findings + score)
Drift Types
| Type | Severity | Description |
|---|---|---|
| PHANTOM_TOOL | Critical | Tool executed at runtime but not declared in static metadata |
| SCOPE_CREEP | High | Agent accessed tools owned by another agent |
| FREQUENCY_ANOMALY | Medium | Abnormal invocation frequency (>50 calls/session) |
| POLICY_BYPASS | High | Runtime invocation bypassed expected gateway policy |
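To make the first and third drift types concrete, here is a minimal sketch that flags undeclared tools and high call counts for a single session. It is not SquireX's implementation: the function name `find_drift`, the input shapes, and the choice to count calls per tool are assumptions. Only the type names, severities, and the 50-call threshold come from the table above.

```python
from collections import Counter

# Threshold taken from the table above; counting per tool is an assumption.
MAX_CALLS_PER_SESSION = 50


def find_drift(declared_tools, invoked_tools):
    """Return (type, severity, tool) findings for one session.

    declared_tools: set of tool names from static metadata (hypothetical shape).
    invoked_tools: list of tool names seen at runtime, one entry per call.
    """
    findings = []
    counts = Counter(invoked_tools)
    for tool, calls in sorted(counts.items()):
        if tool not in declared_tools:
            findings.append(("PHANTOM_TOOL", "Critical", tool))
        if calls > MAX_CALLS_PER_SESSION:
            findings.append(("FREQUENCY_ANOMALY", "Medium", tool))
    return findings


# Hypothetical session: one declared tool called 51 times, one undeclared tool.
session_calls = ["Submit_Case"] * 51 + ["Delete_All_Records"]
for finding in find_drift({"Submit_Case"}, session_calls):
    print(finding)
```

SCOPE_CREEP and POLICY_BYPASS are left out because they depend on agent ownership and gateway policy data that this page does not describe.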
CLI Usage
Basic Drift Detection
# Correlate static metadata against runtime traces
squireinterp observe scan-request.json --traces session.json
# Save drift report to file
squireinterp observe scan-request.json --traces session.json --output drift-report.json
The observe command exits with code 2 if drift findings are detected, enabling CI gate integration.
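For pipelines that run a Python step instead of calling the CLI directly, a gate could be sketched roughly as follows. The helper name `drift_detected` is hypothetical. The only documented behavior relied on is exit code 2 for drift findings; how other exit codes should be handled is not specified here, so the sketch treats them as "no drift reported".

```python
import subprocess
import sys

# Documented above: observe exits with code 2 when drift findings are detected.
DRIFT_EXIT_CODE = 2


def drift_detected(argv):
    """Run a command and return True if it exited with the drift exit code."""
    completed = subprocess.run(argv, check=False)
    return completed.returncode == DRIFT_EXIT_CODE


def main():
    """Fail the build (exit status 1) when the observe command reports drift."""
    command = ["squireinterp", "observe", "scan-request.json",
               "--traces", "session.json"]
    if drift_detected(command):
        print("Drift detected; failing the build.")
        return 1
    return 0
```

A CI script would end with `sys.exit(main())`. The call is omitted from the sketch so that importing it has no side effects.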
A/B Security Regression
# Generate regression eval spec from before/after violation sets
squireinterp eval-diff --before violations-main.json --after violations-pr.json
# Save to file
squireinterp eval-diff --before violations-main.json --after violations-pr.json --output regression.yaml
The eval-diff command:
- Computes introduced, resolved, and persisted violations
- Generates Testing Center YAML specs for regression tests
- Assesses regression risk (none, low, medium, high)
- Exits with code 2 on high-risk regressions
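The first bullet amounts to three set operations. The sketch below shows the idea only: it assumes each violation can be reduced to a hashable identifier (the `V-00x` IDs are made up), and it says nothing about how eval-diff actually matches violations or scores risk.

```python
def diff_violations(before, after):
    """Split violations into introduced, resolved, and persisted sets.

    before/after: iterables of hashable violation identifiers
    (the identifier scheme is an assumption for this sketch).
    """
    before, after = set(before), set(after)
    return {
        "introduced": after - before,  # present only after the change
        "resolved": before - after,    # present only before the change
        "persisted": before & after,   # present in both
    }


# Hypothetical violation IDs for the main branch and a pull request.
main_branch = {"V-001", "V-002"}
pull_request = {"V-002", "V-003"}
print(diff_violations(main_branch, pull_request))
```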
Drift Report Format
{
  "generatedAt": "2026-04-24T10:00:00Z",
  "agentName": "ServiceBot",
  "tracesAnalyzed": 5,
  "summary": {
    "totalFindings": 2,
    "bySeverity": { "critical": 1, "medium": 1 },
    "byType": { "PHANTOM_TOOL": 1, "FREQUENCY_ANOMALY": 1 },
    "phantomToolCount": 1,
    "scopeCreepCount": 0,
    "driftScore": 0.65
  },
  "staticCoverage": {
    "totalRuntimeTools": 10,
    "staticallyCovered": 8,
    "coveragePercentage": 80.0,
    "uncoveredTools": ["Delete_All_Records", "Custom_API_Call"]
  },
  "findings": [...]
}
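The example values in staticCoverage are consistent with coveragePercentage being staticallyCovered divided by totalRuntimeTools (8 / 10 = 80.0), with uncoveredTools listing the remaining two. The sketch below recomputes those figures as a sanity check on a report. The formula is inferred from the example rather than documented, and the helper name and the handling of an empty tool list are assumptions.

```python
import json

# Subset of the example report above, with the elided findings array left out.
report = json.loads("""
{
  "staticCoverage": {
    "totalRuntimeTools": 10,
    "staticallyCovered": 8,
    "coveragePercentage": 80.0,
    "uncoveredTools": ["Delete_All_Records", "Custom_API_Call"]
  }
}
""")


def coverage_percentage(coverage):
    """Recompute coverage as covered / total * 100 (inferred formula)."""
    total = coverage["totalRuntimeTools"]
    if total == 0:
        return 100.0  # assumption: no runtime tools means nothing is uncovered
    return 100.0 * coverage["staticallyCovered"] / total


coverage = report["staticCoverage"]
uncovered = coverage["totalRuntimeTools"] - coverage["staticallyCovered"]
print(coverage_percentage(coverage) == coverage["coveragePercentage"])
print(uncovered == len(coverage["uncoveredTools"]))
```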
Drift Score
The drift score ranges from 0.0 (clean) to 1.0 (severe drift):
| Score | Meaning |
|---|---|
| 0.0 | Perfect alignment between static and runtime |
| 0.0–0.3 | Minor drift, likely benign |
| 0.3–0.6 | Moderate drift, investigate |
| 0.6–1.0 | Severe drift, likely security issue |
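A report consumer might map the score to these bands as sketched below. The table does not say which band owns the shared boundaries 0.3 and 0.6, so treating each upper bound as inclusive is an assumption, as is the function name `drift_band`.

```python
def drift_band(score):
    """Map a drift score in [0.0, 1.0] to the meaning given in the table."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"drift score out of range: {score}")
    if score == 0.0:
        return "Perfect alignment between static and runtime"
    if score <= 0.3:  # assumption: upper bounds are inclusive
        return "Minor drift, likely benign"
    if score <= 0.6:
        return "Moderate drift, investigate"
    return "Severe drift, likely security issue"


# The driftScore from the example report falls in the top band.
print(drift_band(0.65))
```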
OTel Trace Format
SquireX expects traces in this JSON format:
[{
  "traceId": "abc123",
  "sessionId": "session-1",
  "agentName": "ServiceBot",
  "startTime": "2026-04-24T10:00:00Z",
  "endTime": "2026-04-24T10:05:00Z",
  "spans": [
    {
      "spanId": "span-1",
      "name": "Submit_Case",
      "kind": "tool_call",
      "startTime": "2026-04-24T10:00:01Z",
      "duration": 150000000,
      "status": "ok",
      "attributes": { "object": "Case" }
    }
  ]
}]
Span Kinds
| Kind | What It Represents |
|---|---|
| tool_call | Tool invocation (function, plugin) |
| action_invoke | GenAiFunction action execution |
| llm_request | LLM API call (ignored for drift detection) |
| http_callout | External HTTP request |
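As an illustration of how a trace file in this format could be reduced to the invocations relevant for drift detection, the sketch below groups span names by session and skips llm_request spans. The function name, the second sample span (`chat_completion`), and the decision to treat every other span kind as relevant are assumptions for this example, not documented SquireX behavior.

```python
import json

# Per the table above, llm_request spans are ignored for drift detection.
IGNORED_KINDS = {"llm_request"}


def invoked_names(traces):
    """Collect span names per session, skipping ignored span kinds."""
    by_session = {}
    for trace in traces:
        names = by_session.setdefault(trace["sessionId"], [])
        for span in trace.get("spans", []):
            if span["kind"] not in IGNORED_KINDS:
                names.append(span["name"])
    return by_session


# Trimmed sample trace; the llm_request span is a made-up addition.
sample = json.loads("""
[{
  "traceId": "abc123",
  "sessionId": "session-1",
  "agentName": "ServiceBot",
  "spans": [
    {"spanId": "span-1", "name": "Submit_Case", "kind": "tool_call"},
    {"spanId": "span-2", "name": "chat_completion", "kind": "llm_request"}
  ]
}]
""")
print(invoked_names(sample))
```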