
OpenClaw with Local LLM: The Complete Guide (Ollama, vLLM, LM Studio)

Keep your code on your network. Pay $0 in API fees. Run Llama, Mistral, or Qwen locally and connect it to OpenClaw. Here's every method that works.

Clawctl Team

Product & Engineering


A startup founder messaged me last week:

"I love OpenClaw but I can't send proprietary code to Claude's servers. Legal will kill me."

Fair. Most enterprise policies prohibit sending source code to third-party AI providers. Healthcare can't send patient data. Finance can't send trading algorithms. Defense can't send anything.

But here's the thing: OpenClaw doesn't care where your LLM lives.

You can run Llama 3.1, Mistral, Qwen, or any OpenAI-compatible model on your own hardware—and connect it to OpenClaw in 5 minutes.

No API costs. No data leaving your network. Full agent capabilities.

This guide covers every method that works.

Why Local LLMs + OpenClaw?

Concern | Cloud API | Local LLM
Data privacy | Data leaves your network | Stays on your hardware
API costs | $0.015–0.06 per 1K tokens | $0 after hardware
Rate limits | Yes | None
Latency | 500ms–2s | 50–200ms
Offline capability | No | Yes
Compliance | Depends on vendor | You control everything

For agents that touch sensitive data, local is often the only option.

Method 1: Ollama (Easiest)

Ollama is the Docker of LLMs. One command to install, one command to run.

Install Ollama:

curl -fsSL https://ollama.ai/install.sh | sh

Pull a model:

# Fast and capable (8GB VRAM)
ollama pull llama3.1:8b

# Better quality (40GB+ VRAM, quantized)
ollama pull llama3.1:70b-q4_K_M

# Best for coding
ollama pull codellama:34b

Start the server:

ollama serve

Ollama exposes an OpenAI-compatible API at http://localhost:11434/v1.
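
Before wiring OpenClaw to it, it's worth a quick sanity check against that endpoint. A minimal request, assuming llama3.1:8b is already pulled and the server is running on the default port:

# Test Ollama's OpenAI-compatible chat endpoint with the model pulled above
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [{"role": "user", "content": "Reply with one short sentence."}]
  }'

If a JSON response with a message comes back, the server side is done.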

Configure OpenClaw:

llm:
  name: local-ollama
  type: openai-compatible
  base_url: http://localhost:11434/v1
  model: llama3.1:8b
  timeout_ms: 60000

That's it. Your agent now uses a local model.

Method 2: vLLM (Best Performance)

vLLM is built for production. It delivers up to 24x the throughput of Hugging Face Transformers and supports continuous batching, so multiple concurrent requests share the GPU efficiently.

Install vLLM:

pip install vllm

Start the server:

vllm serve meta-llama/Llama-3.1-70B-Instruct \
  --enable-auto-tool-choice \
  --tool-call-parser llama3_json \
  --tensor-parallel-size 2  # For multi-GPU

Configure OpenClaw:

llm:
  name: local-vllm
  type: openai-compatible
  base_url: http://localhost:8000/v1
  model: meta-llama/Llama-3.1-70B-Instruct
  timeout_ms: 30000

vLLM shines when you need:

  • Multiple agents hitting the same model
  • High throughput (hundreds of requests/minute)
  • Multi-GPU setups
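
Before you point a fleet of agents at it, a quick check that the server is up and serving the model you expect (assuming the default port of 8000):

# List the models the vLLM OpenAI-compatible server is exposing
curl http://localhost:8000/v1/models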

Method 3: LM Studio (GUI-based)

LM Studio is Ollama with a UI. Great for experimenting with models before committing.

  1. Download from lmstudio.ai
  2. Search and download a model
  3. Click "Start Server" in the Local Server tab
  4. Configure OpenClaw to use http://localhost:1234/v1

Configure OpenClaw:

llm:
  name: local-lmstudio
  type: openai-compatible
  base_url: http://localhost:1234/v1
  model: local-model
  timeout_ms: 60000
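
Depending on your LM Studio version, the model field may need to match the identifier the server reports rather than the local-model placeholder above. You can list what's loaded with a request like this (assuming the default port):

# Ask LM Studio's local server which model identifiers it is serving
curl http://localhost:1234/v1/models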

Method 4: Text Generation WebUI (Most Models)

If you need exotic models or fine-tunes, oobabooga's text-generation-webui supports almost everything.

git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
./start_linux.sh --api --extensions openai

API available at http://localhost:5000/v1.
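
Same drill as the other servers: confirm the endpoint answers before pointing OpenClaw at it (this assumes the openai extension loaded cleanly):

# Check that text-generation-webui's OpenAI-compatible API is responding
curl http://localhost:5000/v1/models

From there, the OpenClaw config is the same openai-compatible block as the other methods, with base_url set to http://localhost:5000/v1.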

Which Local LLM Should You Use?

Model | VRAM | Strength | Best For
Llama 3.1 8B | 8GB | Fast, capable | Quick tasks, triage
Llama 3.1 70B | 48GB | GPT-4 quality | Complex reasoning
Codellama 34B | 20GB | Code understanding | Code review, generation
Mistral 7B | 8GB | Fast | Simple tasks
Qwen 2.5 72B | 48GB | Multilingual | International teams
DeepSeek Coder | 16GB | Code | Development tasks

Hardware reality check:

GPU | VRAM | Max Model
RTX 3060 | 12GB | 8B models
RTX 3090 | 24GB | 34B models (quantized)
RTX 4090 | 24GB | 34B models (quantized)
A100 40GB | 40GB | 70B models (quantized)
2x A100 | 80GB | 70B models (full)
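
Not sure where your card lands? nvidia-smi will tell you how much VRAM you actually have free:

# Show each GPU's name, total VRAM, and currently free VRAM
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv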

No GPU? Use CPU inference with llama.cpp—just expect 10-20x slower responses.
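
As a rough sketch, llama.cpp's bundled server (named llama-server in recent builds) also speaks the OpenAI-compatible protocol, so the same OpenClaw config pattern applies. The GGUF path below is a placeholder for whatever model file you've downloaded:

# Serve a GGUF model on CPU with llama.cpp's OpenAI-compatible server
./llama-server -m ./models/llama-3.1-8b-instruct-q4_k_m.gguf --port 8080
# Then point OpenClaw's base_url at http://localhost:8080/v1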

The Security Gap You're Not Thinking About

Running a local LLM solves the data privacy problem.

But you still have the agent security problem.

Your local LLM is private. Great. But the agent connected to it can still:

  • Execute arbitrary shell commands
  • Read/write any file on the system
  • Make HTTP requests to any domain
  • Access your API keys and credentials

42,665 exposed OpenClaw instances were found in January 2026. 93.4% were vulnerable. The LLM location didn't matter—the deployment security did.

This is where Clawctl's managed deployment comes in.

Without Clawctl (Raw OpenClaw):

  • Local LLM ✓
  • Data stays on network ✓
  • Agent can run arbitrary code ⚠️
  • No audit trail ⚠️
  • No kill switch ⚠️
  • Credentials in plaintext ⚠️
  • No approval workflow ⚠️

With Clawctl Managed Deployment:

  • Local LLM ✓
  • Data stays on network ✓
  • Sandbox isolation — Agent can't escape its container
  • Full audit trail — Every action searchable, exportable
  • One-click kill switch — Stop everything instantly
  • Encrypted secrets vault — API keys encrypted at rest
  • Human-in-the-loop — 70+ risky actions blocked until you approve
  • Egress control — Only approved domains reachable
  • Prompt injection defense — Attack patterns detected and blocked

Example: Local LLM + Clawctl

# Start Ollama
ollama serve &

# Deploy OpenClaw with Clawctl
# Sign up at clawctl.com/checkout, pick a plan, and your agent is provisioned automatically

Configure your agent to use the local model:

llm:
  name: local
  type: openai-compatible
  base_url: http://host.docker.internal:11434/v1
  model: llama3.1:70b

Now you have:

  • Zero API costs
  • Data on your network
  • Agent security from Clawctl
  • Full audit trail
  • Human approval for risky actions

Common Issues

"Connection refused to localhost"

Inside a Docker container, localhost points at the container itself, not your host machine, so the agent can't reach a model server running on the host that way. Use one of these instead:

  • host.docker.internal (Docker Desktop)
  • Your machine's LAN IP
  • --network=host flag
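
On plain Linux Docker, host.docker.internal isn't defined by default, but you can map it to the host gateway yourself when starting the container (the image name here is just a placeholder):

# Make host.docker.internal resolve to the host so the container can reach Ollama
docker run --add-host=host.docker.internal:host-gateway your-openclaw-image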

"Model too slow"

  • Quantize: Use Q4_K_M instead of full precision
  • Batch: Enable continuous batching in vLLM
  • Upgrade: more VRAM keeps the whole model and its KV cache on the GPU instead of spilling into system RAM

"Tool calling doesn't work"

Not all models support structured tool calls. Use:

  • Llama 3.1 (native tool calling)
  • Mistral (function calling)
  • Models trained with tool-use datasets
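
An easy way to check is to send the endpoint a request with a tools definition and see whether the response comes back with tool_calls instead of plain prose. A sketch against the Ollama endpoint from Method 1 (the get_weather function is just a probe, not a real tool):

# Probe the local model for structured tool-call support
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [{"role": "user", "content": "What is the weather in Paris right now?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'

If the response contains a tool_calls array, you're set. If the model just answers in prose, switch to one with native tool calling.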

Cost Comparison

Cloud API (1M tokens/month):

Provider | Cost
Claude 3.5 Sonnet | $18
GPT-4o | $15
Gemini Pro | $7

Local LLM (1M tokens/month):

Setup | Cost
RTX 3090 (used) | $600 one-time + electricity
Cloud GPU (A100) | $1-3/hour
Your MacBook M2 | $0 (slow)

At 1M tokens/month (roughly $15–18 in API fees), a $600 GPU takes a few years to pay for itself.

At 10M tokens/month, it pays for itself in about four months.

Deploy Your Local LLM Agent Securely

Running a local LLM is step one. Running it safely in production is step two.

Clawctl gives you a managed, secure OpenClaw deployment in 60 seconds. Sign up at clawctl.com/checkout, pick a plan, and your agent is provisioned automatically.

What you get:

  • Gateway authentication (256-bit, formally verified)
  • Container sandbox isolation
  • Network egress control (domain allowlist)
  • Human-in-the-loop approvals for 70+ risky actions
  • Full audit logging (searchable, exportable)
  • One-click kill switch
  • Prompt injection defense
  • Automatic security updates

Your model. Your data. Our guardrails. $49/month — cheaper than one incident.

Deploy securely with Clawctl →

