Policy & Threats

CyberCage protects your AI development environment by detecting and blocking security threats in MCP (Model Context Protocol) traffic. You decide what is protected through policies, which define the threats to monitor and how to respond to them.

How It Works

CyberCage inspects all MCP communication between your AI assistants and MCP servers. Suspicious activity is handled in three stages:

  1. Threat Detection - The system identifies potential security issues using both pattern-based detection and AI analysis
  2. Policy Check - Your configured policies determine whether to allow or block the activity
  3. Response - Actions are blocked or allowed based on policy, with full logging for audit and investigation

All activity is logged in the dashboard for security review, whether blocked or allowed.
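
The three stages above can be pictured as a small pipeline. The sketch below is illustrative only: `detect`, `handle`, `POLICIES`, `PATTERNS`, and the log record shape are hypothetical names rather than CyberCage's API, and two toy regexes stand in for the product's pattern-based and AI detection. Treating a category with no configured policy as allowed is also an assumption.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical policy table: threat category -> action ("DENY" or "ALLOW").
POLICIES = {"credential_exfiltration": "DENY", "defense_evasion": "ALLOW"}

# Toy stand-in for detection; these are not CyberCage's real rules.
PATTERNS = {
    "credential_exfiltration": re.compile(r"\.ssh/id_(rsa|ed25519)"),
    "defense_evasion": re.compile(r"unset\s+HISTFILE"),
}

@dataclass
class Decision:
    allowed: bool
    category: Optional[str]  # None when nothing was detected

audit_log = []  # every message is recorded, blocked or allowed

def detect(message: str) -> Optional[str]:
    """Stage 1: return the first matching threat category, if any."""
    for category, pattern in PATTERNS.items():
        if pattern.search(message):
            return category
    return None

def handle(message: str) -> Decision:
    category = detect(message)
    # Stage 2: policy check (assumption: no policy for a category means ALLOW).
    action = "ALLOW" if category is None else POLICIES.get(category, "ALLOW")
    # Stage 3: respond and log the outcome either way.
    audit_log.append({"category": category, "action": action, "request": message})
    return Decision(allowed=(action != "DENY"), category=category)
```

Calling `handle("cat ~/.ssh/id_rsa")` in this sketch returns a blocked decision for the `credential_exfiltration` category and appends one record to `audit_log`.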

What CyberCage Protects Against

CyberCage detects and blocks threats across 11 categories:

AI-Specific Threats

Prompt Injection - Detects attempts to manipulate your AI assistant's behavior, including jailbreak attempts, instruction overrides, and context poisoning.

Tool Poisoning - Identifies malicious MCP tools or attempts to modify legitimate tool behavior.

Credential & Data Theft

Credential Exfiltration - Catches attempts to steal SSH keys, cloud credentials (AWS, GCP, Azure), API tokens, browser credentials, and other secrets.

Data Exfiltration - Detects unauthorized data extraction, including file theft, database dumps, and encoded data transfers.

Execution & System Attacks

Code Execution - Blocks unauthorized code execution attempts, including shell injection, eval/exec patterns, and reverse shells.

System Tampering - Identifies modifications to system files, security settings, logs, and package managers.

Persistence - Detects backdoor installation, cron jobs, SSH key manipulation, and service installations designed to maintain access.

Privilege Escalation - Catches attempts to gain elevated privileges through SUID exploitation, sudo abuse, or container escapes.

Process Injection - Identifies code injection into running processes through various techniques.

Kernel Modification - Detects rootkits, kernel module loading, and other kernel-level attacks.

Defense Evasion - Catches obfuscation, proxy tunneling, history disabling, and other anti-detection techniques.
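
To make the pattern-based side of detection more concrete, here is a minimal sketch that maps a few of the categories above to example signatures. The signatures, the `SIGNATURES` table, and the `classify` function are invented for illustration and are not CyberCage's actual rules; real detection also uses AI analysis, which this sketch does not attempt.

```python
import re
from typing import List

# Illustrative signatures only; NOT CyberCage's real detection rules.
SIGNATURES = {
    "credential_exfiltration": [
        re.compile(r"\.ssh/id_(rsa|ed25519)\b"),  # private SSH key paths
        re.compile(r"\bAKIA[0-9A-Z]{16}\b"),       # AWS access key ID format
    ],
    "code_execution": [
        # bash-style reverse shell via /dev/tcp/<ip>/<port>
        re.compile(r"/dev/tcp/\d{1,3}(\.\d{1,3}){3}/\d+"),
    ],
    "persistence": [
        re.compile(r"\bcrontab\s+-"),              # editing or replacing cron jobs
    ],
    "defense_evasion": [
        re.compile(r"\bunset\s+HISTFILE\b"),       # disabling shell history
    ],
}

def classify(text: str) -> List[str]:
    """Return every category whose signatures match the text, sorted by name."""
    return sorted(
        category
        for category, patterns in SIGNATURES.items()
        if any(p.search(text) for p in patterns)
    )
```

A single request can match several categories at once, which is why `classify` returns a list rather than one label.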

Policies: Control Your Protection

Policies determine which threats CyberCage actively monitors and how it responds when threats are detected.

How Policies Work

CyberCage provides a catalog of threat detection policies. Your organization chooses which policies to enable based on your security needs. Each policy:

  • Targets a specific threat category (e.g., credential theft, prompt injection)
  • Has a severity level (Critical, High, Medium, Low, Info)
  • Can be configured to DENY (block the request) or ALLOW (permit the request); both outcomes are logged

Policy Presets

Choose a preset configuration when setting up your organization:

  • Essential Only - Critical threats only (minimal protection, fewer alerts)
  • Recommended - Critical + High threats (balanced security)
  • Maximum Protection - All threat categories (strongest security)
  • Custom - Select individual policies for your specific needs

By default, new organizations start with Critical and High severity policies enabled, the same coverage as the Recommended preset. This provides protection against credential theft, data exfiltration, code execution, privilege escalation, prompt injection, and tool poisoning.
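
One way to think about presets is as a minimum severity applied to the policy catalog. The sketch below models that idea; the `Policy` class, `PRESET_FLOOR`, `enabled_policies`, and the severity assigned to each sample category are assumptions made for the example, not CyberCage's data model or its real severity ratings. The Custom preset is omitted because it is a hand-picked selection rather than a threshold.

```python
from dataclasses import dataclass
from enum import IntEnum

class Severity(IntEnum):
    INFO = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4

@dataclass(frozen=True)
class Policy:
    category: str          # threat category the policy targets
    severity: Severity
    action: str = "DENY"   # "DENY" or "ALLOW"

# Lowest severity each preset enables (hypothetical encoding of the presets).
PRESET_FLOOR = {
    "Essential Only": Severity.CRITICAL,
    "Recommended": Severity.HIGH,
    "Maximum Protection": Severity.INFO,
}

# Sample catalog; the severity values here are invented for illustration.
SAMPLE_CATALOG = [
    Policy("credential_exfiltration", Severity.CRITICAL),
    Policy("prompt_injection", Severity.HIGH),
    Policy("defense_evasion", Severity.MEDIUM),
]

def enabled_policies(catalog, preset):
    """Return the policies a preset would switch on."""
    floor = PRESET_FLOOR[preset]
    return [p for p in catalog if p.severity >= floor]
```

With this sample catalog, "Essential Only" enables one policy, "Recommended" enables two, and "Maximum Protection" enables all three.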

Managing Policies

From the dashboard, you can:

  • Enable or disable policies at any time
  • View how often each policy has triggered
  • See which threats each policy has caught
  • Add policies from the catalog as your needs evolve

When Threats Are Detected

CyberCage responds immediately based on your policy configuration:

| Policy Action | What Happens |
|---------------|--------------|
| DENY | Blocks the request and logs details to the dashboard |
| ALLOW | Permits the request and logs the event for audit |

Every threat detection is logged with full context including what was detected, the threat category, and the complete request details. Use the dashboard to investigate threats, understand what triggered detection, and adjust policies as needed.

Investigating Threats

The dashboard provides detailed threat reports showing:

  • What threat was detected and why
  • The threat category and severity
  • Full request and response details
  • When it occurred and which user/application was involved
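
If report data is handled outside the dashboard, the fields listed above could be modeled roughly as follows. This is a sketch under assumptions: the `ThreatReport` field names, the `SEVERITY_RANK` table, and `triage_order` are hypothetical, and no export format is described here. The ordering shown (most severe first, newest first within a severity) is one possible triage convention, not a documented CyberCage behavior.

```python
from dataclasses import dataclass
from datetime import datetime

# Lower rank sorts first; names follow the severity levels used by policies.
SEVERITY_RANK = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3, "Info": 4}

@dataclass
class ThreatReport:
    summary: str           # what was detected and why
    category: str          # threat category
    severity: str          # one of the SEVERITY_RANK keys
    request: str           # full request details
    response: str          # full response details
    occurred_at: datetime  # when it occurred
    actor: str             # user or application involved

def triage_order(reports):
    """Sort reports most severe first, then newest first within a severity."""
    return sorted(
        reports,
        key=lambda r: (SEVERITY_RANK[r.severity], -r.occurred_at.timestamp()),
    )
```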

See the Threat Investigation Guide for detailed workflows on analyzing and responding to threats.

Reducing False Positives

CyberCage is designed to minimize false positives while catching real threats. If a policy blocks legitimate activity:

  1. Review the threat report in the dashboard to understand what triggered detection
  2. Temporarily disable the policy if needed
  3. Adjust policy settings or work with your security team to tune detection
  4. Consider whether the activity should be an exception
