Runtime Observability

SquireX can detect runtime capability drift by correlating OpenTelemetry (OTel) session traces against your static metadata. This bridges the gap between what your agent is declared to be able to do (static metadata) and what it actually does at runtime.

How It Works

Static Metadata (scan-request.json)
        ↓
   Semantic Graph → Declared tool capabilities
        ↕
   Correlation Engine → Drift detection
        ↑
Runtime Traces (OTel JSON)
        ↓
   Drift Report (findings + score)

Drift Types

Type	Severity	Description
`PHANTOM_TOOL`	Critical	Tool executed at runtime but not declared in static metadata
`SCOPE_CREEP`	High	Agent accessed tools owned by another agent
`FREQUENCY_ANOMALY`	Medium	Abnormal invocation frequency (>50 calls/session)
`POLICY_BYPASS`	High	Runtime invocation bypassed expected gateway policy
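
The first three drift types can be illustrated with a small Python sketch. This is not SquireX's implementation: the function name, the `owned_by` mapping, and the choice to count the 50-call threshold per tool (rather than per session total) are assumptions made for illustration. `POLICY_BYPASS` is left out because it depends on gateway policy data that is not described here.

```python
from collections import Counter

# Threshold taken from the FREQUENCY_ANOMALY row; applying it per tool is an assumption.
MAX_CALLS_PER_SESSION = 50


def detect_drift(declared_tools, owned_by, agent_name, invoked_tools):
    """Return (type, severity, tool) findings for one session.

    declared_tools: set of tool names declared in static metadata
    owned_by:       hypothetical mapping of tool name -> owning agent name
    invoked_tools:  list of tool names observed at runtime, one per call
    """
    findings = []
    counts = Counter(invoked_tools)
    for tool, calls in sorted(counts.items()):
        if tool not in declared_tools:
            # Executed at runtime but never declared
            findings.append(("PHANTOM_TOOL", "critical", tool))
        elif owned_by.get(tool, agent_name) != agent_name:
            # Declared, but owned by a different agent
            findings.append(("SCOPE_CREEP", "high", tool))
        if calls > MAX_CALLS_PER_SESSION:
            findings.append(("FREQUENCY_ANOMALY", "medium", tool))
    return findings
```

A tool can produce more than one finding in this sketch, for example an undeclared tool that is also called more than 50 times.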

CLI Usage

Basic Drift Detection

# Correlate static metadata against runtime traces
squireinterp observe scan-request.json --traces session.json

# Save drift report to file
squireinterp observe scan-request.json --traces session.json --output drift-report.json

The observe command exits with code 2 if drift findings are detected, enabling CI gate integration.
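
A CI step can wrap the command and branch on its exit code. The sketch below is a hedged example: only exit code 2 (drift detected) is documented above, so treating 0 as a clean run and any other code as a tooling error are assumptions, and the helper names are hypothetical.

```python
import subprocess


def run_observe(metadata_path, traces_path):
    """Run `squireinterp observe` and return its exit code."""
    result = subprocess.run(
        ["squireinterp", "observe", metadata_path, "--traces", traces_path],
        check=False,  # inspect the exit code ourselves instead of raising
    )
    return result.returncode


def gate_decision(exit_code):
    """Map an exit code to a CI outcome.

    Only code 2 is documented; 0 == clean and other codes == error are assumptions.
    """
    if exit_code == 0:
        return "pass"
    if exit_code == 2:
        return "drift"
    return "error"


# Example use in a pipeline script:
#   outcome = gate_decision(run_observe("scan-request.json", "session.json"))
#   if outcome != "pass":
#       raise SystemExit(1)
```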

A/B Security Regression

# Generate regression eval spec from before/after violation sets
squireinterp eval-diff --before violations-main.json --after violations-pr.json

# Save to file
squireinterp eval-diff --before violations-main.json --after violations-pr.json --output regression.yaml

The eval-diff command:

Computes introduced, resolved, and persisted violations
Generates Testing Center YAML specs for regression tests
Assesses regression risk (none, low, medium, high)
Exits with code 2 on high-risk regressions
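
The introduced/resolved/persisted split is a plain set comparison. The following sketch assumes each violation can be reduced to a stable string key (a hypothetical simplification; the actual violation schema is not shown here) and does not attempt to reproduce the risk assessment, whose rules are not documented in this section.

```python
def diff_violations(before, after):
    """Classify violations by comparing two sets of violation keys."""
    before_set, after_set = set(before), set(after)
    return {
        "introduced": sorted(after_set - before_set),  # new in the PR
        "resolved": sorted(before_set - after_set),    # fixed by the PR
        "persisted": sorted(before_set & after_set),   # present in both
    }
```

For example, `diff_violations(["V1", "V2"], ["V2", "V3"])` reports `V3` as introduced, `V1` as resolved, and `V2` as persisted.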

Drift Report Format

{
  "generatedAt": "2026-04-24T10:00:00Z",
  "agentName": "ServiceBot",
  "tracesAnalyzed": 5,
  "summary": {
    "totalFindings": 2,
    "bySeverity": { "critical": 1, "medium": 1 },
    "byType": { "PHANTOM_TOOL": 1, "FREQUENCY_ANOMALY": 1 },
    "phantomToolCount": 1,
    "scopeCreepCount": 0,
    "driftScore": 0.65
  },
  "staticCoverage": {
    "totalRuntimeTools": 10,
    "staticallyCovered": 8,
    "coveragePercentage": 80.0,
    "uncoveredTools": ["Delete_All_Records", "Custom_API_Call"]
  },
  "findings": [...]
}
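
A downstream script might read the report and pull out the fields it needs. The sketch below uses only the field names from the sample above. The cross-check that `coveragePercentage` equals `staticallyCovered / totalRuntimeTools * 100` is inferred from the sample values (8 of 10 gives 80.0) and is an assumption, not a documented guarantee.

```python
def summarize_report(report):
    """Extract headline figures from a parsed drift report (a dict)."""
    coverage = report["staticCoverage"]
    total = coverage["totalRuntimeTools"]
    # Recompute coverage from the raw counts; None when no tools were observed.
    computed = round(100.0 * coverage["staticallyCovered"] / total, 1) if total else None
    return {
        "agent": report["agentName"],
        "findings": report["summary"]["totalFindings"],
        "drift_score": report["summary"]["driftScore"],
        "coverage": computed,
        "coverage_matches": computed == coverage["coveragePercentage"],
        "uncovered": list(coverage["uncoveredTools"]),
    }
```

Load the file with `json.load` and pass the resulting dict to `summarize_report`.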

Drift Score

The drift score ranges from 0.0 (clean) to 1.0 (severe drift):

Score	Meaning
0.0	Perfect alignment between static and runtime
0.0–0.3	Minor drift, likely benign
0.3–0.6	Moderate drift, investigate
0.6–1.0	Severe drift, likely security issue
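
The bands in the table share their boundary values (0.3 and 0.6 each appear in two rows), so any code that maps a score to a band has to pick a side. The sketch below assigns each boundary to the lower band; that choice and the band labels are assumptions, not documented SquireX behavior.

```python
def drift_band(score):
    """Map a drift score in [0.0, 1.0] to a band from the table.

    Boundary handling (0.3 -> minor, 0.6 -> moderate) is an assumption.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"drift score out of range: {score}")
    if score == 0.0:
        return "aligned"
    if score <= 0.3:
        return "minor"
    if score <= 0.6:
        return "moderate"
    return "severe"
```

With this mapping, the `driftScore` of 0.65 in the sample report falls in the severe band.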

OTel Trace Format

SquireX expects traces in this JSON format:

[{
  "traceId": "abc123",
  "sessionId": "session-1",
  "agentName": "ServiceBot",
  "startTime": "2026-04-24T10:00:00Z",
  "endTime": "2026-04-24T10:05:00Z",
  "spans": [
    {
      "spanId": "span-1",
      "name": "Submit_Case",
      "kind": "tool_call",
      "startTime": "2026-04-24T10:00:01Z",
      "duration": 150000000,
      "status": "ok",
      "attributes": { "object": "Case" }
    }
  ]
}]

Span Kinds

Kind	What It Represents
`tool_call`	Tool invocation (function, plugin)
`action_invoke`	GenAiFunction action execution
`llm_request`	LLM API call (ignored for drift detection)
`http_callout`	External HTTP request
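
To pre-check a trace file before passing it to SquireX, you can count invocations per span name while skipping `llm_request` spans. This is a sketch under assumptions: the table only states that `llm_request` is ignored, so treating the other three kinds as drift-relevant is inferred, and the function name is hypothetical.

```python
from collections import Counter

# Assumption: every kind except llm_request participates in drift detection.
DRIFT_RELEVANT_KINDS = {"tool_call", "action_invoke", "http_callout"}


def count_invocations(traces):
    """Count drift-relevant spans by name across a list of trace dicts."""
    counts = Counter()
    for trace in traces:
        for span in trace.get("spans", []):
            if span.get("kind") in DRIFT_RELEVANT_KINDS:
                counts[span["name"]] += 1
    return counts
```

The input is the parsed JSON array in the trace format shown earlier, for example the result of `json.load` on `session.json`.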
