Policy & Threats

CyberCage protects your AI development environment by detecting and blocking security threats in MCP (Model Context Protocol) traffic. You control what to protect through policies that define which threats to monitor and how to respond.

How It Works

CyberCage inspects all MCP communication between your AI assistants and MCP servers. Inspected traffic passes through three stages:

Threat Detection - The system identifies potential security issues using both pattern-based detection and AI analysis
Policy Check - Your configured policies determine whether to allow or block the activity
Response - Actions are blocked or allowed based on policy, with full logging for audit and investigation

All activity is logged in the dashboard for security review, whether blocked or allowed.
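The three stages above can be sketched in a few lines of Python. This is only an illustration of the flow, not CyberCage's implementation: the pattern table, the policy table, and the names `PATTERNS`, `POLICIES`, `Decision`, and `inspect_message` are all hypothetical, and the sketch covers only the pattern-based part of detection, not the AI analysis.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern table (category -> regex). Illustrative only.
PATTERNS = {
    "credential_exfiltration": re.compile(r"\.ssh/id_[a-z0-9]+|\.aws/credentials"),
    "code_execution": re.compile(r"\beval\s*\(|\bexec\s*\("),
}

# Hypothetical policy table (category -> configured action).
POLICIES = {"credential_exfiltration": "DENY", "code_execution": "ALLOW"}


@dataclass
class Decision:
    category: Optional[str]  # None when nothing suspicious matched
    action: str              # "DENY" or "ALLOW"


audit_log = []  # every decision is recorded, blocked or not


def inspect_message(message: str) -> Decision:
    # 1. Threat detection: first category whose pattern matches, if any.
    category = next(
        (name for name, rx in PATTERNS.items() if rx.search(message)), None
    )
    # 2. Policy check: assume traffic with no matching policy is allowed.
    action = POLICIES.get(category, "ALLOW")
    # 3. Response: record the decision, then return it to the caller.
    decision = Decision(category, action)
    audit_log.append((message, decision))
    return decision
```

Calling `inspect_message("cat ~/.ssh/id_rsa")` in this sketch yields a DENY decision for the credential category, while a message that matches no pattern is allowed. Both end up in `audit_log`.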

What CyberCage Protects Against

CyberCage detects and blocks threats across 11 categories:

AI-Specific Threats

Prompt Injection - Detects attempts to manipulate your AI assistant's behavior, including jailbreak attempts, instruction overrides, and context poisoning.

Tool Poisoning - Identifies malicious MCP tools or attempts to modify legitimate tool behavior.

Credential & Data Theft

Credential Exfiltration - Catches attempts to steal SSH keys, cloud credentials (AWS, GCP, Azure), API tokens, browser credentials, and other secrets.

Data Exfiltration - Detects unauthorized data extraction including file theft, database dumps, and encoded data transfers.

Execution & System Attacks

Code Execution - Blocks unauthorized code execution attempts including shell injection, eval/exec patterns, and reverse shells.

System Tampering - Identifies modifications to system files, security settings, logs, and package managers.

Persistence - Detects backdoor installation, cron jobs, SSH key manipulation, and service installations designed to maintain access.

Privilege Escalation - Catches attempts to gain elevated privileges through SUID exploitation, sudo abuse, or container escapes.

Process Injection - Identifies code injection into running processes through various techniques.

Kernel Modification - Detects rootkits, kernel module loading, and other kernel-level attacks.

Defense Evasion - Catches obfuscation, proxy tunneling, history disabling, and other anti-detection techniques.
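For scripts that sort or filter findings by category, the taxonomy above can be written down as a small lookup table. The sketch below is a plain transcription of the three groups and eleven categories listed on this page. The names `THREAT_CATEGORIES` and `group_of` are hypothetical and are not part of any CyberCage API.

```python
# Groups and categories as listed on this page.
THREAT_CATEGORIES = {
    "AI-Specific Threats": ["Prompt Injection", "Tool Poisoning"],
    "Credential & Data Theft": ["Credential Exfiltration", "Data Exfiltration"],
    "Execution & System Attacks": [
        "Code Execution",
        "System Tampering",
        "Persistence",
        "Privilege Escalation",
        "Process Injection",
        "Kernel Modification",
        "Defense Evasion",
    ],
}


def group_of(category: str) -> str:
    """Return the group a category belongs to, or raise KeyError."""
    for group, names in THREAT_CATEGORIES.items():
        if category in names:
            return group
    raise KeyError(category)


# Sanity check: the page states there are 11 categories in total.
total_categories = sum(len(names) for names in THREAT_CATEGORIES.values())
```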

Policies: Control Your Protection

Policies determine which threats CyberCage actively monitors and how it responds when threats are detected.

How Policies Work

CyberCage provides a catalog of threat detection policies. Your organization chooses which policies to enable based on your security needs. Each policy:

Targets a specific threat category (e.g., credential theft, prompt injection)
Has a severity level (Critical, High, Medium, Low, Info)
Can be configured to DENY (block the request) or ALLOW (permit the request and log it)
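One way to picture a policy is as a small record with those three attributes. The sketch below is a hypothetical model, not CyberCage's actual schema: the field names, the `enabled` flag, the numeric ordering of severities, and the example policy name are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    # Numeric values are an assumption, used only to make levels comparable.
    INFO = 1
    LOW = 2
    MEDIUM = 3
    HIGH = 4
    CRITICAL = 5


class Action(Enum):
    DENY = "DENY"    # block the request
    ALLOW = "ALLOW"  # permit the request


@dataclass(frozen=True)
class Policy:
    name: str           # hypothetical identifier
    category: str       # one of the threat categories on this page
    severity: Severity
    action: Action
    enabled: bool = True


# Example record with an illustrative name and severity.
example = Policy(
    name="block-credential-theft",
    category="Credential Exfiltration",
    severity=Severity.CRITICAL,
    action=Action.DENY,
)
```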

Policy Presets

Choose a preset configuration when setting up your organization:

Essential Only - Critical threats only (minimal protection, fewer alerts)
Recommended - Critical + High threats (balanced security)
Maximum Protection - All threat categories (strongest security)
Custom - Select individual policies for your specific needs

By default, new organizations start with Critical and High severity policies enabled, which matches the Recommended preset. This covers credential theft, data exfiltration, code execution, privilege escalation, prompt injection, and tool poisoning.
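The presets can be read as filters over the policy catalog. The sketch below illustrates that reading under several assumptions: the catalog entries and their severities are invented, Maximum Protection is modeled as "every severity level", and the names `PRESET_SEVERITIES`, `CATALOG`, and `enabled_policies` are hypothetical.

```python
# Assumed mapping from preset name to the severity levels it enables.
PRESET_SEVERITIES = {
    "Essential Only": {"Critical"},
    "Recommended": {"Critical", "High"},
    "Maximum Protection": {"Critical", "High", "Medium", "Low", "Info"},
}

# Invented catalog entries; real policy names and severities may differ.
CATALOG = [
    {"name": "credential-exfiltration", "severity": "Critical"},
    {"name": "prompt-injection", "severity": "High"},
    {"name": "defense-evasion", "severity": "Medium"},
]


def enabled_policies(preset, custom_names=()):
    """Return the names of catalog policies a preset would enable."""
    if preset == "Custom":
        # Custom: only the individually selected policies.
        return [p["name"] for p in CATALOG if p["name"] in custom_names]
    severities = PRESET_SEVERITIES[preset]
    return [p["name"] for p in CATALOG if p["severity"] in severities]
```

With this toy catalog, `enabled_policies("Recommended")` selects the Critical and High entries, and `enabled_policies("Custom", ["defense-evasion"])` selects only the named policy.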

Managing Policies

From the dashboard, you can:

Enable or disable policies at any time
View how often each policy has triggered
See which threats each policy has caught
Add policies from the catalog as your needs evolve

When Threats Are Detected

CyberCage responds immediately based on your policy configuration:

Policy Action	What Happens
DENY	Blocks the request and logs details to the dashboard
ALLOW	Permits the request and logs the event for audit

Every threat detection is logged with full context including what was detected, the threat category, and the complete request details. Use the dashboard to investigate threats, understand what triggered detection, and adjust policies as needed.
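As a rough picture of what a logged detection might contain, the sketch below builds one record per detection and marks whether the request was blocked. The record layout, the field names, and the function `build_detection_record` are assumptions for illustration and do not describe CyberCage's real log format.

```python
from datetime import datetime, timezone


def build_detection_record(action, category, detail, request):
    """Build a hypothetical audit record for one detection."""
    if action not in ("DENY", "ALLOW"):
        raise ValueError(f"unknown policy action: {action!r}")
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "blocked": action == "DENY",  # DENY blocks, ALLOW permits
        "category": category,         # threat category
        "detail": detail,             # what was detected
        "request": request,           # complete request details
    }


# Example: a denied request and an allowed one are both recorded.
denied = build_detection_record(
    "DENY", "Credential Exfiltration", "read of SSH private key",
    {"tool": "read_file", "path": "~/.ssh/id_rsa"},
)
allowed = build_detection_record(
    "ALLOW", "Defense Evasion", "shell history disabled",
    {"tool": "run_command", "command": "unset HISTFILE"},
)
```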

Investigating Threats

The dashboard provides detailed threat reports showing:

What threat was detected and why
The threat category and severity
Full request and response details
When it occurred and which user/application was involved

See the Threat Investigation Guide for detailed workflows on analyzing and responding to threats.

Reducing False Positives

CyberCage is designed to minimize false positives while catching real threats. If a policy blocks legitimate activity:

Review the threat report in the dashboard to understand what triggered detection
Temporarily disable the policy if needed
Adjust policy settings or work with your security team to tune detection
Consider whether the activity should be an exception

Next Steps

Threat Investigation Guide - Investigate and respond to detected threats
Applications - Configure protected applications
MCP Servers - Manage server catalog and approvals
Dashboard Overview - Navigate the security dashboard
