OpenClaw with Local LLM: The Complete Guide
A startup founder messaged me last week:
"I love OpenClaw but I can't send proprietary code to Claude's servers. Legal will kill me."
Fair. Most enterprise policies prohibit sending source code to third-party AI providers. Healthcare can't send patient data. Finance can't send trading algorithms. Defense can't send anything.
But here's the thing: OpenClaw doesn't care where your LLM lives.
You can run Llama 3.1, Mistral, Qwen, or any OpenAI-compatible model on your own hardware—and connect it to OpenClaw in 5 minutes.
No API costs. No data leaving your network. Full agent capabilities.
This guide covers every method that works.
Why Local LLMs + OpenClaw?
| Concern | Cloud API | Local LLM |
|---|---|---|
| Data privacy | Data leaves your network | Stays on your hardware |
| API costs | $0.015–0.06 per 1K tokens | $0 after hardware |
| Rate limits | Yes | None |
| Latency | 500ms–2s | 50–200ms |
| Offline capability | No | Yes |
| Compliance | Depends on vendor | You control everything |
For agents that touch sensitive data, local is often the only option.
Method 1: Ollama (Easiest)
Ollama is the Docker of LLMs. One command to install, one command to run.
Install Ollama:
curl -fsSL https://ollama.ai/install.sh | sh
Pull a model:
# Fast and capable (8GB VRAM)
ollama pull llama3.1:8b
# Better quality (40GB+ VRAM even when quantized)
ollama pull llama3.1:70b-q4_K_M
# Best for coding
ollama pull codellama:34b
Start the server:
ollama serve
Ollama exposes an OpenAI-compatible API at http://localhost:11434/v1.
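Before pointing OpenClaw at it, it's worth a quick sanity check that the endpoint answers (the prompt here is arbitrary):
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama3.1:8b",
        "messages": [{"role": "user", "content": "Reply with one short sentence."}]
      }'
A JSON response with a choices array means the OpenAI-compatible layer is working.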
Configure OpenClaw:
llm:
  name: local-ollama
  type: openai-compatible
  base_url: http://localhost:11434/v1
  model: llama3.1:8b
  timeout_ms: 60000
That's it. Your agent now uses a local model.
Method 2: vLLM (Best Performance)
vLLM is built for production. In the project's own benchmarks it delivers up to 24x the throughput of Hugging Face Transformers, and its continuous batching handles many concurrent requests against a single model.
Install vLLM:
pip install vllm
Start the server:
vllm serve meta-llama/Llama-3.1-70B-Instruct \
  --enable-auto-tool-choice \
  --tool-call-parser llama3_json \
  --tensor-parallel-size 2   # for multi-GPU setups
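With the server up, you can confirm the tool-call parser is active by sending a request with a tools array; the get_weather schema below is just an illustrative placeholder:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.1-70B-Instruct",
        "messages": [{"role": "user", "content": "What is the weather in Berlin?"}],
        "tools": [{
          "type": "function",
          "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
              "type": "object",
              "properties": {"city": {"type": "string"}},
              "required": ["city"]
            }
          }
        }]
      }'
If the response contains a tool_calls entry rather than plain text, structured tool calling is working.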
Configure OpenClaw:
llm:
  name: local-vllm
  type: openai-compatible
  base_url: http://localhost:8000/v1
  model: meta-llama/Llama-3.1-70B-Instruct
  timeout_ms: 30000
vLLM shines when you need:
- Multiple agents hitting the same model
- High throughput (hundreds of requests/minute; see the quick check below)
- Multi-GPU setups
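A rough way to see continuous batching at work is to fire several requests at once and watch them complete in parallel rather than queueing. This is only a sketch, not a benchmark:
for i in $(seq 1 8); do
  curl -s http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "meta-llama/Llama-3.1-70B-Instruct", "messages": [{"role": "user", "content": "Explain continuous batching in one sentence."}]}' \
    -o /dev/null -w "request finished in %{time_total}s\n" &
done
wait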
Method 3: LM Studio (GUI-based)
LM Studio is Ollama with a UI. Great for experimenting with models before committing.
- Download from lmstudio.ai
- Search and download a model
- Click "Start Server" in the Local Server tab
- Configure OpenClaw to use http://localhost:1234/v1
Configure OpenClaw:
llm:
  name: local-lmstudio
  type: openai-compatible
  base_url: http://localhost:1234/v1
  model: local-model
  timeout_ms: 60000
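The model value needs to match the identifier LM Studio reports for the model you loaded; you can list it from the same endpoint:
curl http://localhost:1234/v1/models
Copy the id field from the response into the model setting above.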
Method 4: Text Generation WebUI (Most Models)
If you need exotic models or fine-tunes, oobabooga's text-generation-webui supports almost everything.
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
./start_linux.sh --api --extensions openai
API available at http://localhost:5000/v1.
Which Local LLM Should You Use?
| Model | VRAM | Strength | Best For |
|---|---|---|---|
| Llama 3.1 8B | 8GB | Fast, capable | Quick tasks, triage |
| Llama 3.1 70B | 48GB | Near-GPT-4 quality | Complex reasoning |
| Codellama 34B | 20GB | Code understanding | Code review, generation |
| Mistral 7B | 8GB | Fast | Simple tasks |
| Qwen 2.5 72B | 48GB | Multilingual | International teams |
| DeepSeek Coder | 16GB | Code | Development tasks |
Hardware reality check:
| GPU | VRAM | Max Model |
|---|---|---|
| RTX 3060 | 12GB | 8B models |
| RTX 3090 | 24GB | 34B models (quantized) |
| RTX 4090 | 24GB | 34B models (quantized) |
| A100 40GB | 40GB | 70B models (quantized) |
| 2x A100 | 80GB | 70B models (full) |
No GPU? Use CPU inference with llama.cpp—just expect 10-20x slower responses.
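As a minimal sketch, llama.cpp's bundled server also speaks the OpenAI API, so the same OpenClaw config pattern applies. The binary name and GGUF path below depend on how you built llama.cpp and which model file you downloaded:
# CPU-only, OpenAI-compatible server on port 8080
llama-server -m ./models/llama-3.1-8b-instruct-q4_k_m.gguf -c 4096 --port 8080
Point base_url at http://localhost:8080/v1 and expect responses in seconds, not milliseconds.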
The Security Gap You're Not Thinking About
Running a local LLM solves the data privacy problem.
But you still have the agent security problem.
Your local LLM is private. Great. But the agent connected to it can still:
- Execute arbitrary shell commands
- Read/write any file on the system
- Make HTTP requests to any domain
- Access your API keys and credentials
42,665 exposed OpenClaw instances were found in January 2026. 93.4% were vulnerable. The LLM location didn't matter—the deployment security did.
This is where Clawctl's managed deployment comes in.
Without Clawctl (Raw OpenClaw):
- Local LLM ✓
- Data stays on network ✓
- Agent can run arbitrary code ⚠️
- No audit trail ⚠️
- No kill switch ⚠️
- Credentials in plaintext ⚠️
- No approval workflow ⚠️
With Clawctl Managed Deployment:
- Local LLM ✓
- Data stays on network ✓
- Sandbox isolation — Agent can't escape its container
- Full audit trail — Every action searchable, exportable
- One-click kill switch — Stop everything instantly
- Encrypted secrets vault — API keys encrypted at rest
- Human-in-the-loop — 70+ risky actions blocked until you approve
- Egress control — Only approved domains reachable
- Prompt injection defense — Attack patterns detected and blocked
Example: Local LLM + Clawctl
# Start Ollama on the host
ollama serve &
Then deploy OpenClaw with Clawctl: sign up at clawctl.com/checkout, pick a plan, and your agent is provisioned automatically.
Configure your agent to use the local model:
llm:
  name: local
  type: openai-compatible
  base_url: http://host.docker.internal:11434/v1
  model: llama3.1:70b
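Before relying on it, you can check that a container can actually reach the host's Ollama port. The throwaway curl container below is just for testing and assumes Docker Desktop, where host.docker.internal resolves automatically:
docker run --rm curlimages/curl -s http://host.docker.internal:11434/v1/models
If this returns a model list, the agent's base_url will resolve too; on Linux, see the note under Common Issues.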
Now you have:
- Zero API costs
- Data on your network
- Agent security from Clawctl
- Full audit trail
- Human approval for risky actions
Common Issues
"Connection refused to localhost"
Inside a container, localhost refers to the container itself, not your host machine. Use one of:
- host.docker.internal (Docker Desktop)
- Your machine's LAN IP
- The --network=host flag
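On plain Docker Engine (Linux), host.docker.internal isn't defined by default; since Docker 20.10 you can map it to the host gateway when starting the container. The image name here is a placeholder for however you run OpenClaw:
docker run --add-host=host.docker.internal:host-gateway openclaw/openclaw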
"Model too slow"
- Quantize: Use Q4_K_M instead of full precision
- Batch: Enable continuous batching in vLLM
- Upgrade: More VRAM = bigger context = better results
"Tool calling doesn't work"
Not all models support structured tool calls. Use:
- Llama 3.1 (native tool calling)
- Mistral (function calling)
- Models trained with tool-use datasets
Cost Comparison
Cloud API (1M tokens/month):
| Provider | Cost |
|---|---|
| Claude 3.5 Sonnet | $18 |
| GPT-4o | $15 |
| Gemini Pro | $7 |
Local LLM (1M tokens/month):
| Setup | Cost |
|---|---|
| RTX 3090 (used) | $600 one-time + electricity |
| Cloud GPU (A100) | $1-3/hour |
| Your MacBook M2 | $0 (slow) |
Using the Claude pricing above ($18 per 1M tokens), a used $600 RTX 3090 takes roughly three years to pay for itself at 1M tokens/month, but only three to four months at 10M tokens/month. At low volume, the case for local is privacy and control, not cost.
Deploy Your Local LLM Agent Securely
Running a local LLM is step one. Running it safely in production is step two.
Clawctl gives you a managed, secure OpenClaw deployment in 60 seconds. Sign up at clawctl.com/checkout, pick a plan, and your agent is provisioned automatically.
What you get:
- Gateway authentication (256-bit, formally verified)
- Container sandbox isolation
- Network egress control (domain allowlist)
- Human-in-the-loop approvals for 70+ risky actions
- Full audit logging (searchable, exportable)
- One-click kill switch
- Prompt injection defense
- Automatic security updates
Your model. Your data. Our guardrails. $49/month — cheaper than one incident.
Deploy securely with Clawctl →
More resources: