Problem Statement
OpenShell enforces network policy at L4 (allow/deny by host:port) and L7 (method/path/query for REST, operation-type/fields for GraphQL). Neither layer inspects the content of request or response bodies for security-relevant signals like prompt injection, PII leakage, sensitive data exfiltration, or adversarial payloads.
Agents operating inside sandboxes receive LLM responses and make outbound API calls. An attacker who controls upstream content (e.g., a poisoned web page fetched by the agent, a malicious tool response, or a compromised API) can embed prompt injection payloads in responses. Conversely, a compromised or misguided agent can exfiltrate sensitive data in outbound request bodies. Today, neither vector is visible to the policy layer.
The inference proxy already buffers request/response bodies for GraphQL inspection (#1022) and credential injection (#689). This proposal adds a general-purpose content inspection hook system that lets operators run external scripts/classifiers against L7 traffic inline, per-route.
Relationship to Privacy Router
The Privacy Router (#1043) and content inspection hooks are complementary, not overlapping:
Privacy Router answers: where should this traffic go? It routes inference requests to local or external providers based on data sensitivity, PII classification, and operator policy. It controls the destination.
Content inspection hooks answer: should this traffic flow at all? They inspect request/response bodies for adversarial content (prompt injection), data exfiltration (secrets, PII in outbound calls), and policy violations. They gate the traffic.
A deployment might use both: the Privacy Router ensures sensitive prompts stay on a local NIM deployment, while content filters block prompt injection payloads in responses regardless of which provider served them. The router can't catch prompt injection (it classifies sensitivity, not adversarial intent), and content filters don't decide routing (they allow or deny, not redirect).
They share an interest in body content but serve different security objectives — routing policy vs. content policy.
Prior Art: fullsend Security Pipeline
The fullsend project has a production-grade, multi-layered security pipeline that validates this approach and should inform the design:
Input pipeline (InputPipeline): UnicodeNormalizer → ContextInjectionScanner — runs before untrusted text enters agent processing.
Output pipeline (OutputPipeline): SecretRedactor — runs before agent-generated text is posted to external APIs.
Additionally, fullsend ships runtime hooks (Python scripts as PreToolUse/PostToolUse hooks):
tirith_check.py — Tirith CLI for static command injection + unicode trick detection
canary_pretool.py — detects canary token exfiltration in tool inputs
secret_redact_posttool.py — redacts secrets from tool output before the LLM sees them
unicode_posttool.py — normalizes unicode in tool output
Experimental validation (experiments/guardrails-eval/): evaluated LLM Guard (DeBERTa-v3), NeMo Guardrails (YARA), and Model Armor (GCP). Key finding: ML sentence-level scanning + regex pattern matching is complementary — ML catches social engineering, regex catches structural attacks. Neither alone is sufficient.
Proposed Design
Core concept
Add a content_filters field to L7 endpoint policy. Each filter references an executable script that the supervisor (not the sandbox) runs against request and/or response bodies. Scripts receive body content on stdin and signal allow (exit 0) or deny (exit 1 + reason on stdout). The supervisor short-circuits with a 403 and the denial reason if any filter denies.
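Under that contract, a filter can be a very small program. The following sketch shows the shape — the pattern list, script name, and denial message are illustrative, not the proposed ruleset:

```python
#!/usr/bin/env python3
"""Minimal deny-on-pattern filter sketch (illustrative patterns only)."""
import re
import sys

# Hypothetical instruction-override patterns; a real deployment would use a
# maintained ruleset such as fullsend's ContextInjectionScanner patterns.
PATTERNS = [
    re.compile(rb"ignore (all )?previous instructions", re.I),
    re.compile(rb"disregard your system prompt", re.I),
]

def scan(body: bytes):
    """Return a denial reason string, or None to allow."""
    for pat in PATTERNS:
        if pat.search(body):
            return f"Prompt injection: matched {pat.pattern.decode()}"
    return None

if __name__ == "__main__":
    body = sys.stdin.buffer.read()   # body content arrives on stdin
    reason = scan(body)
    if reason is not None:
        print(reason)                # deny: exit 1 with reason on stdout
        sys.exit(1)
    sys.exit(0)                      # allow: exit 0, stdout ignored
```

Because the interface is just stdin plus an exit code, operators can write filters in any language and swap rulesets without touching the supervisor.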
Scripts live outside the sandbox
Filter scripts run in the supervisor process context, not inside the agent container. This is a hard requirement — if the agent can modify the scripts that inspect it, the entire mechanism is bypassable. Scripts are mounted from the host or baked into the supervisor image, never from the sandbox filesystem.
Two inspection modes
Synchronous (outbound requests): The supervisor buffers the request body, pipes it to each filter script sequentially, and only forwards upstream if all filters pass. This catches sensitive data exfiltration and prompt injection in outbound calls before they leave the sandbox boundary.
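The synchronous path can be sketched as follows, assuming a hypothetical run_filters helper (shown in Python for brevity; the supervisor itself is Rust). It fails closed on timeout, mirroring the proposed on_timeout: deny default:

```python
import subprocess

def run_filters(body: bytes, scripts, timeout_s: float = 2.0):
    """Pipe the buffered request body to each filter script in order.

    Short-circuits on the first deny. Returns (allowed, reason); a denying
    script's stdout becomes the reason surfaced in the 403.
    """
    for script in scripts:
        try:
            proc = subprocess.run(
                [script],
                input=body,            # body content on stdin
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False, f"{script}: filter timed out"   # fail closed
        if proc.returncode != 0:       # nonzero exit = deny
            reason = proc.stdout.decode(errors="replace").strip()
            return False, reason or f"{script}: denied"
    return True, ""                    # all filters passed; forward upstream
```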
Async streaming (inbound responses): For SSE/streaming inference responses, buffering the full response before returning it to the agent kills latency. Instead:
Proxy chunks through to the agent in real-time.
Simultaneously accumulate chunks and pipe to the filter script(s) asynchronously.
If a filter flags content mid-stream, sever the connection — inject an SSE error frame and close the stream.
Optionally: accumulate to a temp file outside the sandbox, run the full scan on completion, and only then decide whether to persist/allow the result.
The tradeoff: the agent may see partial content before denial. For prompt injection this is acceptable — the dangerous part is the agent acting on injected instructions, not reading partial tokens. Severing the stream causes most agent frameworks to treat the response as failed and not act on it.
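The streaming mode can be sketched like this; chunks, forward, scan, and sever are illustrative stand-ins for the proxy's actual plumbing, and the final full-body scan corresponds to the optional completion step above:

```python
import asyncio

async def stream_with_async_scan(chunks, forward, scan, sever):
    """Sketch of the async streaming mode. `chunks` is an async iterator of
    response bytes; `forward(chunk)` relays to the agent; `scan(body)`
    returns a denial reason or None; `sever(reason)` stands in for
    injecting the SSE error frame and closing the stream. All four names
    are illustrative, not OpenShell APIs."""
    buf = bytearray()
    scan_task = None

    async for chunk in chunks:
        await forward(chunk)                      # real-time passthrough
        buf.extend(chunk)
        if scan_task is None or scan_task.done():
            # a previous background scan finished: act on its verdict
            if scan_task is not None and scan_task.result():
                await sever(scan_task.result())   # flagged mid-stream
                return False
            # scan the accumulated body without blocking the stream
            scan_task = asyncio.create_task(
                asyncio.to_thread(scan, bytes(buf)))

    # stream complete: wait out any in-flight scan, then scan the full body
    if scan_task is not None:
        await scan_task
    reason = scan(bytes(buf))
    if reason:
        await sever(reason)
        return False
    return True
```

Note that scans lag the stream rather than gate it: a chunk is always forwarded before the verdict on the body that contains it, which is exactly the partial-exposure tradeoff described above.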
Policy surface

Each entry in content_filters carries four fields:
script: Absolute path on the supervisor filesystem. Must be executable. Not accessible from inside the sandbox.
direction: Which body to inspect — request (outbound), response (inbound), or both.
timeout_ms: Per-script execution timeout. Prevents slow classifiers from blocking the proxy indefinitely.
on_timeout: Fail-closed (deny, default) or fail-open (allow) when the script exceeds its timeout.

Script interface

The script receives the body content on stdin. Exit 0 allows; exit 1 denies, with a human-readable reason on stdout (e.g. "Prompt injection: instruction override pattern detected"). On allow (exit 0), stdout is ignored. A script that exceeds timeout_ms is handled per its on_timeout setting (default on_timeout: deny). Request metadata is provided via environment variables: OPENSHELL_FILTER_HOST, OPENSHELL_FILTER_PORT, OPENSHELL_FILTER_METHOD, OPENSHELL_FILTER_PATH, OPENSHELL_FILTER_DIRECTION (request/response).

Recommended Filter Stack

Based on fullsend's production pipeline and experimental results, the recommended default filter stack for OpenShell would be:
UnicodeNormalizer (both directions, fast) — strip invisible characters, bidi overrides, tag chars before any other scanner sees the content. Pre-processing stage, not a deny gate.
ContextInjectionScanner (response direction, regex) — 27 patterns covering instruction override, credential exfiltration, hidden content, execution-via-translation. Fast, deterministic, zero false positives on known patterns.
ONNXGuardScanner (response direction, ML) — DeBERTa-v3 sentence-level classification for social engineering and indirect prompt injection that regex won't catch. Configurable threshold (default 0.92).
SecretRedactor (request direction, regex) — prevent exfiltration of API keys, tokens, private keys, DB strings in outbound requests. 20+ prefix patterns + structural matching.
The ML + regex combination is critical: fullsend's evaluation showed ML alone misses structural attacks (unicode tricks, encoded exfiltration) while regex alone misses social engineering and indirect injection.
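As an illustration of the request-direction stage, a secret check along the lines of SecretRedactor might start from prefix patterns like these (an illustrative subset, not fullsend's actual ruleset):

```python
import re

# Illustrative prefix patterns only — the proposed SecretRedactor stage ships
# 20+ prefixes plus structural matching; do not treat this subset as complete.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),                 # OpenAI-style API keys
    re.compile(r"ghp_[A-Za-z0-9]{36}"),                 # GitHub personal tokens
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key IDs
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private keys
]

def find_secrets(body: str):
    """Return secret-looking substrings found in an outbound request body."""
    hits = []
    for pattern in SECRET_PATTERNS:
        hits.extend(pattern.findall(body))
    return hits
```

This is the structural half of the pairing: it is deterministic and cheap, but it will never catch an agent paraphrasing a secret or being socially engineered — which is what the ML stage is for.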
Observability
Every filter execution must be fully auditable. Operators, security teams, and compliance workflows need to see what was inspected, what was flagged, and what was allowed through.
Every filter execution emits an OCSF event, regardless of outcome:
Allow emits HttpActivityBuilder at Informational severity, logging the filter name, script path, direction, host:port, execution time, and body size. Deny and timeout outcomes dual-emit HttpActivityBuilder + DetectionFindingBuilder; timeouts additionally record the on_timeout action taken.
Never log body content in OCSF events. Body bytes may contain secrets, PII, or credentials. Log a SHA-256 hash of the body for correlation, not the content itself. The OCSF JSONL file may be shipped to external systems.
Always log execution time. Filter latency is critical for debugging proxy performance. Emit filter_duration_ms on every event.
Correlation ID. Each request/response pair gets a unique ID so allow/deny decisions on the same HTTP transaction can be correlated across request-side and response-side filter events.
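Putting these constraints together, one audit record might be shaped like the sketch below (filter.* field names as proposed in this issue; the real events are emitted through the existing OCSF builders):

```python
import hashlib
import json

def filter_event(name, direction, host, port, correlation_id,
                 body: bytes, outcome, duration_ms, reason=None):
    """Illustrative shape of one filter audit record. The helper name and
    envelope are hypothetical; only the filter.* attributes follow the
    proposal's field list."""
    ev = {
        "correlation_id": correlation_id,   # ties request/response events
        "filter.name": name,
        "filter.direction": direction,
        "host": f"{host}:{port}",
        "outcome": outcome,
        "filter.duration_ms": duration_ms,
        "filter.body_size_bytes": len(body),
        # hash for correlation, never the bytes: the JSONL may leave the host
        "filter.body_hash": hashlib.sha256(body).hexdigest(),
    }
    if reason is not None:
        ev["filter.denial_reason"] = reason  # deny outcomes only
    return json.dumps(ev)
```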
OCSF events carry filter.name (script basename), filter.script_path, filter.direction, filter.timeout_ms, filter.exit_code, filter.duration_ms, filter.body_size_bytes, filter.body_hash (SHA-256), and filter.denial_reason (on deny only). The shorthand log layer renders these as human-readable lines, e.g. CONTENT_FILTER DENY injection-scan.sh response api.openai.com:443 "instruction override pattern" 12ms or CONTENT_FILTER ALLOW onnx-guard.sh response api.openai.com:443 45ms. Denials also surface in openshell policy denials and the TUI log viewer.
Filter metrics (execution count, deny rate, p99 latency per script) should be exposed as Prometheus-compatible metrics if the sandbox metrics endpoint is enabled, so operators can alert on filter degradation.
Alternatives Considered
In-sandbox filters (readonly mount): Simpler deployment but weaker security boundary. A sandbox escape or container breakout could tamper with the scripts. Rejected in favor of supervisor-side execution.
Built-in classifier (compiled into supervisor): Lower latency but rigid. Operators can't customize detection rules, add domain-specific patterns, or swap classifiers without rebuilding the supervisor. The script interface lets operators iterate without image rebuilds. However, a compiled ONNX runtime (as fullsend does with hugot) could be offered as a built-in fast-path option alongside the script interface.
Gateway-side inspection: The gateway doesn't have body bytes — it receives gRPC metadata from the sandbox. Moving inspection to the gateway would require streaming body content over gRPC, adding significant complexity. The supervisor already has the bytes in flight.
Buffered-only (no streaming mode): Simpler but kills inference latency. Agents routinely use streaming for LLM calls — buffering a 30-second generation to scan it would break interactive workflows. The async streaming mode preserves responsiveness at the cost of partial exposure.
OPA-only (Rego rules on body content): OPA is not designed for arbitrary text classification. Pattern matching in Rego is limited to regex.match — no subprocess execution, no ML model calls. OPA remains the policy decision point; content filters are a pre-processing stage.
Merge with Privacy Router: The Privacy Router (#1043) classifies data sensitivity for routing decisions (local vs. external provider). Content filters classify adversarial intent and data exfiltration for allow/deny decisions. They share interest in body content but serve different security objectives. Keeping them separate avoids coupling routing logic to content scanning logic.
Fire-and-forget audit-only mode: Log but don't block. Useful for gradual rollout — could be added as an enforcement: audit option on individual filters. But it is insufficient standalone for prompt injection and exfiltration, which require active blocking.
Agent Investigation
Codebase surveyed prior to filing:
L7 body buffering exists: crates/openshell-sandbox/src/l7/rest.rs implements parse_body_length for Content-Length and chunked bodies. GraphQL inspection (#1022) buffers up to 64 KiB. The bytes are already available at the point where filters would run.
Streaming inference proxy exists: The inference proxy in crates/openshell-sandbox/src/proxy.rs handles SSE chunk forwarding. Async filter mode would hook into this path.
OCSF dual-emit pattern exists: DetectionFindingBuilder is used for security findings (nonce replay, bypass detection). Content filter denials follow the same pattern.
OCSF shorthand layer exists: Shorthand log format is auto-derived from builder fields — filter events get human-readable log lines automatically.
No content inspection today: grep -rn "content_filter\|body_scan\|body_inspect\|prompt_inject" crates/openshell-sandbox/ returns zero hits.
Supervisor runs outside sandbox: The supervisor process runs in the host/pod context, not inside the agent container. Scripts executed by the supervisor are inaccessible to the agent.
fullsend prior art: The fullsend project ships a production input/output security pipeline with regex-based injection scanning, ONNX ML classification, unicode normalization, secret redaction, and SSRF validation. Experimental evaluation confirmed ML + regex complementarity.
Privacy Router is complementary: #1043 handles sensitivity-based routing decisions (where to send). Content filters handle adversarial content and exfiltration decisions (whether to send). Both inspect body content but for different security objectives.