Overview

The problem

When an AI agent runs DELETE FROM users or reads /etc/passwd, the model's text output can still look completely clean. It says "processing your request" while the damage is being done.

Text-level guardrails — Guardrails AI, NeMo, Constitutional AI — evaluate what the model outputs. None of them see the tool call arguments. None of them know what the agent is actually executing.

AgentGate intercepts at the Python execution layer, one step before any side effect occurs.

Text-level guardrails never inspect the arguments passed to tool functions. They cannot block what they cannot see. AgentGate was built to close that gap.
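
To make the interception point concrete, here is a minimal sketch of wrapping a tool function so that its arguments are checked before the function body runs. The names `guarded`, `ToolCallBlocked`, and `no_deletes` are hypothetical and chosen for illustration; they are not AgentGate's actual API, and the toy policy stands in for the real evaluation.

```python
import functools
from typing import Any, Callable


class ToolCallBlocked(Exception):
    """Raised when a tool call is rejected before it executes (hypothetical name)."""


def guarded(policy: Callable[[str, tuple, dict], bool]) -> Callable:
    """Wrap a tool so `policy` sees its name and arguments before any side effect."""
    def decorator(tool: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(tool)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # The check runs first; the tool body is never entered on a block.
            if not policy(tool.__name__, args, kwargs):
                raise ToolCallBlocked(f"{tool.__name__} blocked: args={args} kwargs={kwargs}")
            return tool(*args, **kwargs)
        return wrapper
    return decorator


def no_deletes(name: str, args: tuple, kwargs: dict) -> bool:
    """Toy policy (assumption): reject any string argument containing DELETE."""
    values = list(args) + list(kwargs.values())
    return not any(isinstance(v, str) and "DELETE" in v.upper() for v in values)


executed = []  # records which queries actually reached the tool body


@guarded(no_deletes)
def run_sql(query: str) -> str:
    executed.append(query)  # stands in for the real side effect
    return "ok"
```

Calling `run_sql("delete from users")` raises `ToolCallBlocked` and leaves `executed` untouched, which is the property a text-level filter cannot provide: the decision is made on the arguments themselves, not on what the model says about them.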

Architecture: Two-Tier Evaluation

Building a firewall at the execution layer means navigating a hard tradeoff: you cannot add seconds of latency to every tool call, but you also cannot rely on pure pattern matching for a problem as semantically rich as "is this tool call consistent with the declared task?"

AgentGate resolves this with a two-tier pipeline.

Agent decides to call a tool
         │
         ▼
┌─────────────────────────┐
│   AgentGate intercepts  │  ← before any side effect
└─────────────────────────┘
         │
         ▼
┌─────────────────────────┐
│  Tier 1 — static check  │  0.3ms avg · no API calls
│  SQL / filesystem / HTTP│  handles ~64% of decisions
│  analyzer + scope check │
└─────────────────────────┘
         │ ambiguous
         ▼
┌─────────────────────────┐
│  Tier 2 — LLM judge     │  separate GPT-4o-mini instance
│  5-dimension scoring    │  task consistency + scope auth
│  trajectory analysis    │  fail-closed: uncertainty = BLOCK
└─────────────────────────┘
         │
    ALLOW / BLOCK
         │
         ▼
   AuditLogger → Supabase (async · fire-and-forget · zero latency impact)
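
The control flow in the diagram can be sketched roughly as follows. This is an illustration under stated assumptions, not AgentGate's code: `evaluate`, `static_check`, `broken_judge`, and `audit_worker` are hypothetical names, the LLM judge is stubbed as a plain callable, and the audit sink is any function that accepts a dict rather than a real Supabase client.

```python
import queue
import threading
from enum import Enum
from typing import Callable, Optional


class Verdict(Enum):
    ALLOW = "ALLOW"
    BLOCK = "BLOCK"


audit_queue = queue.Queue()


def audit_worker(sink: Callable[[dict], None]) -> None:
    """Drain audit records in the background so logging never delays a decision."""
    while True:
        record = audit_queue.get()
        try:
            sink(record)
        except Exception:
            pass  # fire-and-forget: a logging failure must not affect tool calls
        finally:
            audit_queue.task_done()


def evaluate(tool: str, args: dict,
             tier1: Callable[[str, dict], Optional[Verdict]],
             tier2: Callable[[str, dict], Verdict]) -> Verdict:
    """Tier 1 decides clear cases; None means ambiguous and escalates to tier 2."""
    tier = 1
    verdict = tier1(tool, args)
    if verdict is None:
        tier = 2
        try:
            verdict = tier2(tool, args)
        except Exception:
            verdict = Verdict.BLOCK  # fail-closed: a judge error counts as BLOCK
    # Enqueue and return immediately; the worker thread does the slow write.
    audit_queue.put_nowait({"tool": tool, "tier": tier, "verdict": verdict.value})
    return verdict


def static_check(tool: str, args: dict) -> Optional[Verdict]:
    """Toy tier-1 rule set (assumption): two clear cases, everything else ambiguous."""
    query = str(args.get("query", "")).upper()
    if "DROP TABLE" in query:
        return Verdict.BLOCK
    if query.startswith("SELECT 1"):
        return Verdict.ALLOW
    return None


def broken_judge(tool: str, args: dict) -> Verdict:
    """Simulates the tier-2 judge being unreachable."""
    raise TimeoutError("judge unavailable")
```

To run it, start the worker once with something like `threading.Thread(target=audit_worker, args=(print,), daemon=True).start()`. With `broken_judge` as tier 2, an ambiguous call such as `evaluate("sql", {"query": "update t set x=1"}, static_check, broken_judge)` returns `Verdict.BLOCK`, which illustrates the fail-closed rule in the diagram.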

Tier 1: Static Analysis

Tier 1 handles the clear cases fast. It runs entirely in-process, makes no network calls, and averages about 0.3 ms per decision. Three domain-specific analyzers (SQL, filesystem, and HTTP) do the heavy lifting.

SQL Analyzer. A three-pass approach catches both obvious and obfuscated attacks. Pass 1 is a fast regex scan for destructive keywords and dangerous patterns; string literals are stripped first to prevent false positives on queries like SELECT 'DROP TABLE' AS column_name. Pass 2 parses the query with sqlparse (a non-validating parser that produces a statement and token tree) to catch multi-statement SQL hidden behind comments and mixed casing. Pass 3 is a post-parse scan for patterns that the parse step naively classifies as safe SELECT statements: UNION SELECT exfiltration, stored procedure execution via xp_cmdshell and sp_executesql, and variable injection via DECLARE @.
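
The regex passes described above (Pass 1 and Pass 3) can be sketched with the standard library alone. This is a simplified illustration, not the analyzer itself: the keyword lists are a small assumed subset, `scan_sql` is a hypothetical name, and the sqlparse-based Pass 2 is left out because it needs a third-party dependency.

```python
import re
from typing import List

# Match literals and comments in one left-to-right pass, so a "--" inside a
# literal or a quote inside a comment cannot hide the rest of the query.
LITERAL_OR_COMMENT = re.compile(r"'(?:[^']|'')*'|--[^\n]*|/\*.*?\*/", re.DOTALL)

# Assumed subset of destructive keywords for illustration.
DESTRUCTIVE = re.compile(r"\b(DROP|DELETE|TRUNCATE|ALTER)\b", re.IGNORECASE)

# Patterns that can sit inside a statement that otherwise looks like a SELECT.
POST_PARSE = re.compile(
    r"\bUNION\s+(?:ALL\s+)?SELECT\b|\bxp_cmdshell\b|\bsp_executesql\b|\bDECLARE\s+@",
    re.IGNORECASE,
)


def _neutralize(match: "re.Match[str]") -> str:
    # Literals become an empty literal; comments become a single space.
    return "''" if match.group(0).startswith("'") else " "


def scan_sql(query: str) -> List[str]:
    """Return the reasons a query looks dangerous; an empty list means no match."""
    stripped = LITERAL_OR_COMMENT.sub(_neutralize, query)
    reasons = []
    if DESTRUCTIVE.search(stripped):
        reasons.append("destructive keyword")
    if POST_PARSE.search(stripped):
        reasons.append("exfiltration or procedure pattern")
    return reasons
```

With this sketch, `scan_sql("SELECT 'DROP TABLE' AS column_name")` returns an empty list because the literal is neutralized before the keyword scan, while `scan_sql("SELECT '--'; DROP TABLE users")` still reports the destructive keyword. An empty result here would mean "no static match", not "safe"; in the pipeline above such a query would still be subject to the remaining checks.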

Filesystem Analyzer. The most technically involved component. Real path traversal attacks don't arrive as clean ../../etc/passwd strings; they arrive URL-encoded, double-encoded, or using overlong UTF-8 sequences, where / is represented as the multi-byte sequence %c0%af. Python's standard urllib.parse.unquote decodes the percent-decoded bytes as UTF-8 with errors="replace" by default, which silently turns an overlong sequence into replacement characters, so the hidden slash never reaches the traversal check and the attack gets through. The fix is a custom _raw_percent_decode function that operates directly on byte values without a UTF-8 round trip, followed by an explicit overlong sequence replacement step before normalization. After decoding, path normalization runs, and the result is checked against system directories (/etc, /proc, /sys, /root) and sensitive file patterns (.env, .ssh, API key files, credential-named files).
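
The decode, replace, normalize, and check sequence described above might look something like the following. This is a guess at the shape of such code, not the actual `_raw_percent_decode` implementation: the function names, the overlong table, the three-round decoding limit, and the sensitive-name pattern are all assumptions made for illustration.

```python
import posixpath
import re

PERCENT_BYTE = re.compile(rb"%([0-9A-Fa-f]{2})")

# Overlong (invalid) UTF-8 encodings of "/" and ".", mapped back to ASCII.
OVERLONG = {
    b"\xc0\xaf": b"/", b"\xe0\x80\xaf": b"/",
    b"\xc0\xae": b".", b"\xe0\x80\xae": b".",
}

SYSTEM_DIRS = ("/etc", "/proc", "/sys", "/root")
SENSITIVE_NAMES = re.compile(r"(^|/)(\.env|\.ssh)(/|$)")  # assumed subset


def raw_percent_decode(data: bytes) -> bytes:
    """Turn each %XX into its raw byte; no UTF-8 decoding happens here."""
    return PERCENT_BYTE.sub(lambda m: bytes([int(m.group(1), 16)]), data)


def canonicalize(path: str, max_rounds: int = 3) -> str:
    """Decode repeatedly (for double encoding), undo overlong bytes, normalize."""
    data = path.encode("utf-8")
    for _ in range(max_rounds):
        decoded = raw_percent_decode(data)
        if decoded == data:
            break
        data = decoded
    for overlong, plain in OVERLONG.items():
        data = data.replace(overlong, plain)
    # latin-1 maps every byte to one character, so nothing is dropped or replaced.
    return posixpath.normpath(data.decode("latin-1"))


def is_sensitive(path: str) -> bool:
    """Check the canonical path against system directories and sensitive names."""
    canonical = canonicalize(path)
    in_system_dir = any(
        canonical == d or canonical.startswith(d + "/") for d in SYSTEM_DIRS
    )
    return in_system_dir or bool(SENSITIVE_NAMES.search(canonical))
```

For example, `canonicalize("/var/www/..%c0%af..%c0%afetc/passwd")` yields `/etc/passwd`, so `is_sensitive` flags it. The sketch only handles absolute paths; a fuller version would presumably resolve relative paths against the agent's working directory before the prefix check.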