Advanced Security Testing for Large Language Models
OWASP LLM Top 10 2025 • OWASP Agentic Top 10 2026 • ML-Powered Optimization
- Full LLM Top 10 2025 and Agentic Top 10 2026 coverage with 70 test cases, MITRE ATLAS cross-references, and MAESTRO layer mappings.
- Crescendo, Skeleton Key, Many-Shot Jailbreaking, BoN Sampling, MetaBreak, Content Concretization, and more from the latest published research.
- Multi-armed bandit algorithms (Thompson Sampling, UCB1, Contextual) for intelligent attack selection and strategy adaptation.
- Identifies content filters, prompt guards, safety alignments, rate limiting, and output filtering across any LLM provider.
- Redis-backed job queues, distributed execution, connection pooling, load balancing, and a real-time monitoring dashboard.
- Test models from OpenAI, Anthropic, Google, Ollama, and custom endpoints. Binaries for Linux, macOS, and Windows.
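The bandit-driven attack selection above can be sketched as Beta-Bernoulli Thompson Sampling: each attack strategy keeps a Beta posterior over its success rate, the next attack is drawn from whichever strategy samples highest, and the posterior is updated with the observed outcome. This is an illustrative sketch only, not LLMrecon's actual implementation; the strategy names and success probabilities below are hypothetical.

```python
import random

class ThompsonSampler:
    """Beta-Bernoulli Thompson Sampling over a set of attack strategies."""

    def __init__(self, strategies):
        # One Beta(alpha, beta) posterior per strategy, starting uniform.
        self.posteriors = {s: [1, 1] for s in strategies}

    def select(self):
        # Sample a success rate from each posterior; attack with the max.
        draws = {s: random.betavariate(a, b)
                 for s, (a, b) in self.posteriors.items()}
        return max(draws, key=draws.get)

    def update(self, strategy, success):
        # Bayesian update: increment alpha on success, beta on failure.
        a, b = self.posteriors[strategy]
        self.posteriors[strategy] = [a + 1, b] if success else [a, b + 1]

sampler = ThompsonSampler(["crescendo", "skeleton_key", "many_shot"])
true_rates = {"crescendo": 0.4, "skeleton_key": 0.1, "many_shot": 0.2}
for _ in range(100):
    strategy = sampler.select()                      # pick an attack
    success = random.random() < true_rates[strategy] # simulated outcome
    sampler.update(strategy, success)                # learn from it
```

Over many rounds the sampler concentrates its pulls on the strategy with the highest observed success rate while still occasionally exploring the others.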
- Document injection, vector embedding manipulation, knowledge graph poisoning, cross-encoder reranking attacks (4 modules)
- Tool poisoning via description injection, schema manipulation, filesystem boundary escape, supply chain exploitation (4 modules)
- DOM injection targeting agent perception, navigation hijack, screenshot exfiltration from AI browser agents (3 modules)
- Audio jailbreaks, speech model exploits, multilingual audio attacks, Best-of-N audio sampling (4 modules)
- Autonomous multi-turn jailbreaks, Chain-of-Thought exploitation, reasoning loop resource exhaustion (3 modules)
- Gradient-based optimization, reinforcement learning optimization, diffusion-based adversarial attacks (3 modules)
- Delegation trust chain exploitation, toxic output cascade, recursive task bomb across agent orchestrations (3 modules)
- Marketplace skill poisoning, typosquatting attacks targeting plugin/skill ecosystems (2 modules)
- Config/prompt rewrite, credential harvesting, RCE tool chain escalation for persistent agent compromise (3 modules)

| Framework | Key Attack Vectors | Risk Level |
|---|---|---|
| OpenClaw | 4 tracked CVEs, malicious skill marketplace, queue lane bypass | Critical |
| CrewAI | No per-agent RBAC, raw output passing between agents | High |
| LangGraph | State manipulation, recursive sub-agent spawning ($38K incident) | High |
| AutoGen | Auto-execute code blocks, Docker sandbox escape | Critical |
```bash
# Clone the repository
git clone https://github.com/perplext/LLMrecon.git
cd LLMrecon

# Install dependencies
pip install -r requirements.txt

# Test your local models
python3 llmrecon_2025.py --models llama3:latest

# Show OWASP categories
python3 llmrecon_2025.py --owasp
```

```bash
# Download the latest release
curl -LO https://github.com/perplext/LLMrecon/releases/latest/download/llmrecon-linux-amd64
chmod +x llmrecon-linux-amd64

# Or build from source
go build -o llmrecon ./src/main.go

# Run OWASP compliance scan
./llmrecon scan --provider openai --model gpt-4 --owasp
```

```bash
# Pull from GitHub Container Registry
docker pull ghcr.io/perplext/llmrecon:latest

# Run a scan
docker run --rm ghcr.io/perplext/llmrecon:latest scan \
  --provider openai --model gpt-4 --owasp

# Or build locally
docker build -t llmrecon .
docker run --rm llmrecon --help
```
A comprehensive open-source LLM security testing framework.