Agent Trust Architecture

How Recursiv governs autonomous AI agents for regulated and enterprise environments.

Last updated: 2026-04-10

Overview

Recursiv provides agent orchestration infrastructure for enterprises deploying autonomous AI systems. This document describes the trust architecture that governs how agents access tools, act on data, and make decisions within the platform.

Our approach is informed by emerging enterprise trust frameworks, including zero trust principles applied to data quality and AI governance, as described in recent work by researchers at the SPARK AI Consortium at UC San Diego (Short, 2025; Massa & Short, 2025). We apply these principles specifically to the agent execution layer: every agent tool call is verified against policy, every action is audited, and the system defaults to denial when trust cannot be established.

Core Principles

1. Never Trust, Always Verify

No agent tool execution is trusted by default. Every tool call passes through a policy verification layer before execution. Tools that have not been explicitly classified and permitted are blocked. There is no implicit trust path.

This aligns with zero trust architectures, in which trust is not an assumed attribute but a continuously verified condition. In the context of agent orchestration, this means:

Every tool is classified before an agent can access it
Every execution is checked against the active policy at call time
Unrecognized tools are denied (fail-closed)
Policy changes take effect immediately for all subsequent tool calls

2. Classify, Then Govern

Every tool available to an agent is classified across multiple dimensions before it enters the system. These classifications drive policy enforcement automatically.

Dimension	Values	Purpose
Trust Tier	native, self_hosted, third_party_processor, public_protocol	Where the tool executes and who controls it
Data Sensitivity	public, internal, confidential, regulated	What kind of data the tool accesses or produces
Export Boundary	none, org_internal, third_party_processor, public_network	Whether data leaves the organization
Consent Requirement	none, org_admin, end_user, case_by_case	Who must approve before the tool can be used
Retention Class	ephemeral, operational, audit, user_content	How long execution artifacts are retained
Scope	project, organization, network, platform	Blast radius of the tool’s effects
Audience	customer_safe, approval_required, internal_only	Who should have access to this tool

These classifications are defined in code, version-controlled, and auditable. They are not configurable by the agent.
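As a rough illustration of what "defined in code" could look like, the sketch below encodes the seven dimensions from the table as TypeScript union types. The value sets are copied from the table; the type names, the `ToolClassification` interface, the tool names, and the example classification values are all hypothetical and are not taken from the Recursiv codebase.

```typescript
// Value sets mirror the classification table; all identifiers are assumed names.
type TrustTier = "native" | "self_hosted" | "third_party_processor" | "public_protocol";
type DataSensitivity = "public" | "internal" | "confidential" | "regulated";
type ExportBoundary = "none" | "org_internal" | "third_party_processor" | "public_network";
type ConsentRequirement = "none" | "org_admin" | "end_user" | "case_by_case";
type RetentionClass = "ephemeral" | "operational" | "audit" | "user_content";
type Scope = "project" | "organization" | "network" | "platform";
type Audience = "customer_safe" | "approval_required" | "internal_only";

interface ToolClassification {
  readonly trustTier: TrustTier;
  readonly dataSensitivity: DataSensitivity;
  readonly exportBoundary: ExportBoundary;
  readonly consentRequirement: ConsentRequirement;
  readonly retentionClass: RetentionClass;
  readonly scope: Scope;
  readonly audience: Audience;
}

// Hypothetical tools with illustrative (invented) classification values.
const classifications: Record<string, ToolClassification> = {
  example_read_tool: {
    trustTier: "native",
    dataSensitivity: "internal",
    exportBoundary: "none",
    consentRequirement: "none",
    retentionClass: "operational",
    scope: "project",
    audience: "customer_safe",
  },
  example_export_tool: {
    trustTier: "third_party_processor",
    dataSensitivity: "confidential",
    exportBoundary: "third_party_processor",
    consentRequirement: "org_admin",
    retentionClass: "audit",
    scope: "organization",
    audience: "approval_required",
  },
};

// Freezing the map reflects the idea that agents cannot change it at runtime.
const TOOL_CLASSIFICATIONS = Object.freeze(classifications);
```

With union types, a misspelled or unknown value fails at compile time, which is one way a version-controlled classification file can be kept consistent with the table.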

3. Fail Closed

If a tool is not mapped to a known classification, it is blocked. The system does not default to permissive access. This is enforced at the tool registration layer: unclassified tools are filtered out before the agent model ever sees them.

This means an agent cannot discover and execute an arbitrary tool. The set of available tools is deterministic, policy-governed, and auditable.
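A minimal sketch of fail-closed filtering at registration time, assuming a hypothetical `ToolDefinition` shape and a set of classified tool names. The function and tool names are illustrative only.

```typescript
// Hypothetical tool shape; only the name matters for this sketch.
interface ToolDefinition {
  name: string;
  description: string;
}

// Keep only tools whose names appear in the classified set.
// Anything not explicitly classified is dropped (fail-closed).
function filterClassifiedTools(
  registered: readonly ToolDefinition[],
  classifiedNames: ReadonlySet<string>,
): ToolDefinition[] {
  return registered.filter((tool) => classifiedNames.has(tool.name));
}

const registeredTools: ToolDefinition[] = [
  { name: "example_read_tool", description: "Classified tool" },
  { name: "example_unmapped_tool", description: "No classification on file" },
];

// Only example_read_tool survives; the unmapped tool never reaches the model.
const visibleTools = filterClassifiedTools(registeredTools, new Set(["example_read_tool"]));
console.log(visibleTools.map((tool) => tool.name));
```

The allowlist direction matters here: the filter names what is permitted rather than what is forbidden, so a newly added tool stays invisible until someone classifies it.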

4. Audit Everything

Every agent action produces audit records across multiple dimensions:

Tool Execution Log: Every tool invocation records the agent, tool name, success/failure status, error details, conversation context, and timestamp.

Prompt Guard Log: Every user message is scanned for prompt injection. Results are logged with threat level (clean/suspicious/hostile), detected patterns, and action taken (passed/sanitized/blocked). To preserve privacy, raw message content is never logged.

Agent Configuration Change Log: Every change to an agent’s configuration (model, system prompt, tool mode, guardrails, budget) is recorded with the old value, new value, who made the change, and why.

AI Usage Tracking: Every LLM inference call records provider, model, token counts, and cost for billing reconciliation and usage analysis.

Task Activity Log: For orchestrated multi-step workflows, every task state transition (created, claimed, completed, audited) is logged with the acting agent and detail.

Tool Policy Enforcement

Three-Level Permission Model

Every tool operates under one of three permission levels:

Permission	Behavior	Use Case
`auto`	Tool executes immediately when called by the agent	Low-risk read operations, web search, memory recall
`approval`	Tool execution is queued for human review and approval	Data mutations, external communications, deployments
`off`	Tool is removed from the agent’s available toolset entirely	Disabled capabilities, restricted environments

Policy Resolution

Policies are resolved at three levels of specificity:

Per-tool overrides (most specific): A specific tool can be set to any permission level
Per-bundle overrides: A category of tools (e.g., all database tools) can be set to a permission level as a group
Default bundle policy (least specific): The built-in default for each tool category

The most specific applicable policy wins. This allows organizations to customize agent capabilities while maintaining safe defaults.
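The resolution order can be sketched as a chain of lookups, from most to least specific. The `PolicyConfig` shape and all names below are assumptions made for illustration. The final fallback to `off` when no default exists is also an assumption, chosen to match the fail-closed principle described earlier.

```typescript
type Permission = "auto" | "approval" | "off";

// Hypothetical configuration shape holding the three levels of specificity.
interface PolicyConfig {
  toolOverrides: Partial<Record<string, Permission>>;
  bundleOverrides: Partial<Record<string, Permission>>;
  bundleDefaults: Partial<Record<string, Permission>>;
}

// Most specific applicable policy wins: tool override, then bundle override,
// then the built-in bundle default. Falling back to "off" is an assumption.
function resolvePermission(toolName: string, bundle: string, config: PolicyConfig): Permission {
  return (
    config.toolOverrides[toolName] ??
    config.bundleOverrides[bundle] ??
    config.bundleDefaults[bundle] ??
    "off"
  );
}

const exampleConfig: PolicyConfig = {
  toolOverrides: { example_db_read: "auto" },
  bundleOverrides: { database: "approval" },
  bundleDefaults: { database: "off", search: "auto" },
};

console.log(resolvePermission("example_db_read", "database", exampleConfig)); // tool override applies
console.log(resolvePermission("example_db_write", "database", exampleConfig)); // bundle override applies
console.log(resolvePermission("example_web_search", "search", exampleConfig)); // bundle default applies
```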

Enforcement Architecture

Policy enforcement is implemented as a wrapper around the tool definition itself. Before the agent model receives its available tools, each tool passes through wrapToolWithProjectPolicy:

If the policy is off, the tool is removed from the toolset. The agent never sees it.
If the policy is approval, the tool’s execute function is replaced with an approval gate that queues the request and notifies the user.
If the policy is auto, the tool passes through unchanged.

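This architecture ensures that policy enforcement cannot be bypassed by the agent. Because wrapping happens before the tool list reaches the model, the agent only ever receives policy-wrapped tool definitions. The policy layer sits between the tool registry and the model, rather than between the model and tool execution.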
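The three branches can be illustrated with a small wrapper. This is a sketch only: the actual signature of wrapToolWithProjectPolicy is not given in this document, so the function below uses a different name (`wrapToolSketch`), and the `Tool` and `ApprovalQueue` types and the shape of the pending response are hypothetical.

```typescript
type Permission = "auto" | "approval" | "off";

// Hypothetical tool shape with an async execute function.
interface Tool {
  name: string;
  execute: (args: Record<string, unknown>) => Promise<unknown>;
}

// Hypothetical queue function: records the request and returns its id.
type ApprovalQueue = (toolName: string, args: Record<string, unknown>) => Promise<string>;

// Returns null for "off" (caller drops the tool), a gated copy for "approval",
// and the original tool for "auto".
function wrapToolSketch(tool: Tool, permission: Permission, enqueue: ApprovalQueue): Tool | null {
  switch (permission) {
    case "off":
      return null;
    case "approval":
      return {
        ...tool,
        // The original execute is never called here; the request is only queued.
        execute: async (args) => {
          const requestId = await enqueue(tool.name, args);
          return { status: "pending_approval", requestId };
        },
      };
    case "auto":
      return tool;
  }
}

const exampleTool: Tool = {
  name: "example_write_tool",
  execute: async () => "executed",
};
const exampleQueue: ApprovalQueue = async (toolName) => `req-${toolName}`;

const gatedTool = wrapToolSketch(exampleTool, "approval", exampleQueue);
const removedTool = wrapToolSketch(exampleTool, "off", exampleQueue);
const passthroughTool = wrapToolSketch(exampleTool, "auto", exampleQueue);
```

In this sketch the gated copy closes over the queue rather than the original `execute`, so a model holding `gatedTool` cannot trigger the real side effect directly.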

Hardened Tool Categories

Certain tool categories carry safeguards that apply regardless of the agent’s tool mode setting:

Email sending (send_email_to_human): Always requires human approval. The agent cannot send email autonomously under any configuration.
Code execution: Gated by sandbox provisioning checks and budget tier verification.
Database mutations: Default to approval-required for write operations.
Deployments: Require explicit approval before triggering production deployments.

Human-in-the-Loop Controls

Approval Gate System

When a tool requires approval, the system:

Creates a pending execution record with the tool name, parameters, agent identity, and conversation context
Sends a real-time WebSocket notification to the supervising user
Returns a structured response to the agent instructing it to wait
Blocks the agent from proceeding until the human approves or denies the request

Approval records include:

Argument hash (SHA-256) for deduplication without exposing sensitive parameters
Organization scope ensuring approvals are isolated between tenants
Expiry enforcement (pending requests expire after 10-60 minutes if not acted on)
Denial reason capture when a request is rejected
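The record fields above can be sketched as follows, using Node's built-in `crypto` module for the SHA-256 hash. The `PendingApproval` shape, the function names, and the key-sorting step are assumptions. The sketch sorts only top-level keys before hashing, so nested objects would need a full canonicalization step in practice.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record shape based on the fields listed above.
interface PendingApproval {
  toolName: string;
  agentId: string;
  organizationId: string;
  argsHash: string;
  createdAt: number; // epoch milliseconds
  expiresAt: number; // epoch milliseconds
  status: "pending" | "approved" | "denied";
  denialReason?: string;
}

// Hash the arguments so duplicates can be detected without storing raw values.
// Top-level keys are sorted so that key order does not change the hash.
function hashArgs(args: Record<string, unknown>): string {
  const entries = Object.entries(args).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return createHash("sha256").update(JSON.stringify(entries)).digest("hex");
}

function createPendingApproval(
  toolName: string,
  agentId: string,
  organizationId: string,
  args: Record<string, unknown>,
  now: number,
  ttlMs: number, // the document states a 10 to 60 minute window
): PendingApproval {
  return {
    toolName,
    agentId,
    organizationId,
    argsHash: hashArgs(args),
    createdAt: now,
    expiresAt: now + ttlMs,
    status: "pending",
  };
}

// A pending request counts as expired once its deadline has passed.
function isExpired(record: PendingApproval, now: number): boolean {
  return record.status === "pending" && now >= record.expiresAt;
}
```

Storing only the hash means two identical requests can be matched by comparing `argsHash` values, while the approval table itself never holds the sensitive parameters.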

Agent Tool Modes

Each agent operates in one of three modes that interact with the tool policy system:

Mode	Behavior
`chat_only`	Agent can only converse. All tools are blocked. Tool descriptions are provided so the agent can explain what it would do, but execution is prevented.
`permission`	All write tools require human approval. Read tools execute immediately.
`autonomous`	Tools execute according to their bundle policy. Hardened categories (email, deployment) still require approval.
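One way to read the table is as a function from mode and tool traits to an outcome. The sketch below is an interpretation under stated assumptions, not the platform's implementation: the `ToolTraits` fields and outcome names are hypothetical, tools already set to `off` are assumed to have been removed earlier, and hardened tools are assumed to require approval in `permission` mode as well as in `autonomous` mode.

```typescript
type ToolMode = "chat_only" | "permission" | "autonomous";
type BundlePermission = "auto" | "approval"; // "off" tools are assumed to be removed already
type Outcome = "execute" | "await_approval" | "describe_only";

// Hypothetical per-tool traits used to decide the outcome.
interface ToolTraits {
  isWrite: boolean;
  isHardened: boolean;
  bundlePermission: BundlePermission;
}

function outcomeForMode(mode: ToolMode, tool: ToolTraits): Outcome {
  // chat_only: the agent may describe the tool but never run it.
  if (mode === "chat_only") return "describe_only";
  // Hardened categories require approval in every executing mode (assumed for permission mode).
  if (tool.isHardened) return "await_approval";
  // permission: writes need approval, reads run immediately.
  if (mode === "permission") return tool.isWrite ? "await_approval" : "execute";
  // autonomous: defer to the bundle policy.
  return tool.bundlePermission === "approval" ? "await_approval" : "execute";
}

const exampleReadTool: ToolTraits = { isWrite: false, isHardened: false, bundlePermission: "auto" };
const exampleEmailTool: ToolTraits = { isWrite: true, isHardened: true, bundlePermission: "auto" };

console.log(outcomeForMode("autonomous", exampleEmailTool)); // hardened tools still wait for approval
```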

Isolation and Containment

Multi-Tenant Isolation

All agent operations are scoped to a network and optionally an organization. The following controls keep data from crossing tenant boundaries:

API keys are scoped to organizations
Agents are bound to owners and organizations
Database queries are filtered by network/organization context
Approval decisions are org-scoped

Sandbox Isolation

Code execution runs in VM-level sandboxes (not containers) with:

1-hour maximum lifetime
Per-user and per-org concurrency caps
No access to platform credentials, database connections, or API keys
A dedicated sandbox per agent, keyed by project and agent ID

Rate Limiting and Budget Controls

Agent activity is bounded by multiple overlapping controls:

Per-agent daily request limits (configurable, default 100)
Per-user, per-org, and per-key API rate limits
Budget-based throttling that degrades agent capabilities as spending approaches limits
Per-request cost caps ($10 max) as runaway protection
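A minimal admission check combining three of these controls might look like the sketch below. The default daily limit (100) and the per-request cap ($10) come from the list above. The `AgentUsage` shape, the 80% degradation threshold, and the hard stop when a request would exceed the budget are assumptions introduced for illustration.

```typescript
const DEFAULT_DAILY_REQUEST_LIMIT = 100; // from the document
const MAX_REQUEST_COST_USD = 10; // from the document

// Hypothetical usage snapshot for one agent.
interface AgentUsage {
  requestsToday: number;
  spentUsd: number;
  budgetUsd: number;
  dailyRequestLimit?: number;
}

type Admission =
  | { allowed: true; degraded: boolean }
  | { allowed: false; reason: "daily_limit" | "cost_cap" | "budget_exhausted" };

function admitRequest(
  usage: AgentUsage,
  estimatedCostUsd: number,
  degradeAtFraction = 0.8, // assumed threshold, not stated in the document
): Admission {
  const limit = usage.dailyRequestLimit ?? DEFAULT_DAILY_REQUEST_LIMIT;
  if (usage.requestsToday >= limit) return { allowed: false, reason: "daily_limit" };
  if (estimatedCostUsd > MAX_REQUEST_COST_USD) return { allowed: false, reason: "cost_cap" };
  // Assumed behavior: refuse requests that would push spending past the budget.
  if (usage.spentUsd + estimatedCostUsd > usage.budgetUsd) {
    return { allowed: false, reason: "budget_exhausted" };
  }
  // Near the budget limit the request is allowed but flagged for reduced capability.
  const degraded = usage.spentUsd >= usage.budgetUsd * degradeAtFraction;
  return { allowed: true, degraded };
}
```

In a real deployment the counter updates would need to be atomic with the check; this sketch only shows the decision logic on a usage snapshot.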

Prompt Injection Defense

Every user message processed by an agent is scanned by the Prompt Guard service before the agent model receives it. The system:

Detects known injection patterns using pattern matching
Classifies threats as clean, suspicious, or hostile
Takes action: pass, sanitize, or block
Logs the result without storing raw message content (privacy preservation)

This provides a defense layer between untrusted user input and agent tool execution.
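The scan, classify, act, and log steps can be sketched with a small rule table. The two regular expressions are invented examples and are not the Prompt Guard service's actual patterns. The mapping from threat level to action (clean to passed, suspicious to sanitized, hostile to blocked) and all type and function names are likewise assumptions.

```typescript
type ThreatLevel = "clean" | "suspicious" | "hostile";
type GuardAction = "passed" | "sanitized" | "blocked";

interface GuardRule {
  id: string;
  pattern: RegExp;
  level: "suspicious" | "hostile";
}

// Invented example patterns, for illustration only.
const RULES: GuardRule[] = [
  { id: "ignore_previous", pattern: /ignore (all )?(previous|prior) instructions/i, level: "hostile" },
  { id: "role_override", pattern: /you are now\b/i, level: "suspicious" },
];

// The log entry deliberately has no field for the message text.
interface GuardLogEntry {
  threatLevel: ThreatLevel;
  detectedPatterns: string[];
  action: GuardAction;
  timestamp: string;
}

interface GuardResult {
  forwarded: string | null; // what the agent model receives, or null if blocked
  logEntry: GuardLogEntry;
}

function scanMessage(message: string, now: Date = new Date()): GuardResult {
  const hits = RULES.filter((rule) => rule.pattern.test(message));
  const threatLevel: ThreatLevel = hits.some((hit) => hit.level === "hostile")
    ? "hostile"
    : hits.length > 0
      ? "suspicious"
      : "clean";

  let action: GuardAction = "passed";
  let forwarded: string | null = message;
  if (threatLevel === "hostile") {
    action = "blocked";
    forwarded = null;
  } else if (threatLevel === "suspicious") {
    action = "sanitized";
    // Replace every occurrence of each matched pattern with a placeholder.
    forwarded = hits.reduce(
      (text, rule) => text.replace(new RegExp(rule.pattern.source, "gi"), "[removed]"),
      message,
    );
  }

  return {
    forwarded,
    logEntry: {
      threatLevel,
      detectedPatterns: hits.map((hit) => hit.id),
      action,
      timestamp: now.toISOString(),
    },
  };
}
```

Logging rule identifiers instead of the matched text keeps the audit trail useful for tuning patterns while leaving the user's message out of the log.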

Alignment with Enterprise Trust Frameworks

Zero Trust Principles

Our agent trust architecture applies zero trust principles as described in NIST SP 800-207 and as extended to data quality contexts by researchers at the SPARK AI Consortium (Massa & Short, “Meeting Agentic AI’s Data Quality Needs with Zero Trust Data Quality,” 2025):

Zero Trust Principle	Recursiv Implementation
Never trust, always verify	Every tool call verified against policy at execution time
Assume breach	Prompt injection scanning on every message; fail-closed tool defaults
Least privilege access	Tools scoped by bundle; agents see only permitted tools
Continuous verification	Policy checked at every tool call, not just at agent creation
Explicit authorization	Approval gates require affirmative human action for sensitive operations

Governance Automation

Consistent with research on automating data quality governance decisions (Massa & Short, “Scientifically Automating Data Quality Decisions with AI Explainability Weights,” 2025), our platform automates governance decisions that would otherwise require manual intervention:

Tool permission decisions are automated based on pre-defined policy classifications
Budget-based capability degradation happens automatically without human intervention
Rate limit enforcement is continuous and atomic
Approval expiry is automatic (no stale pending requests)

Areas of Active Development

We are actively developing enhancements informed by enterprise trust research:

Threshold-based policies: Extending the current three-level permission model (auto/approval/off) to support confidence-weighted decisions where agent autonomy scales with input data quality
Data freshness verification: Systematic validation of input data recency before agent tool execution
Decision justification capture: Structured reasoning attached to both automated and human-approved decisions for regulatory audit support

Compliance Posture

Recursiv maintains a comprehensive compliance program aligned with SOC 2 Type II Trust Services Criteria:

15 formal security and compliance policies mapped to SOC 2 controls (CC1-CC9, A1, C1, P4-P6)
Quarterly automated evidence collection
Formal risk register with 10 tracked risks, mitigations, and residual risk assessments
Semi-annual policy review cycle with automated deadline enforcement
Internal security audits with prioritized remediation tracking

For detailed compliance documentation, see our Security Policies and Operations Documentation.

References

NIST Special Publication 800-207, “Zero Trust Architecture” (2020)
Massa, J. & Short, J.E., “Meeting Agentic AI’s Data Quality Needs with Zero Trust Data Quality,” SPARK AI Consortium Executive Briefing Vol. 1 No. 2 (2025)
Massa, J. & Short, J.E., “Scientifically Automating Data Quality Decisions with AI Explainability Weights,” SPARK AI Consortium Executive Briefing Vol. 1 No. 4 (2025)
Short, J., “Is AI Governable? Industry Perspectives on the Adoption, Effectiveness and Accountability of Frontier AI,” SPARK AI Working Paper (2025)

Contact

For enterprise security inquiries: security@recursiv.io
For compliance documentation requests: compliance@recursiv.io