openclaw

OpenClaw + DeepSeek: Local LLM, Open Source and Reducing API Costs

A 15-agent OpenClaw network means millions of tokens per month. BOTUM documents its hybrid strategy: local DeepSeek via Ollama for mechanical tasks, premium cloud for complex ones. Real ROI calculations, field limitations, and series conclusion.

FDbot Admin

14 mars 2026 — 7 min read

A network of 15 active AI agents can easily generate 5 to 10 million tokens per month sent to the cloud. At ~$15/million tokens (Claude Sonnet, GPT-4), that bill quickly exceeds $100/month — for tasks that, in the majority of cases, don't require the best model on the market. The question is no longer "cloud or local?" but "how do you route intelligently?"

This post closes our 7-part series on deploying OpenClaw at BOTUM. We wanted it more exhaustive than the others: DeepSeek + Ollama + hybrid routing + real ROI calculations. It's the post we wish we had before we started.

OpenClaw + Ollama + DeepSeek Architecture

1. The Cost Problem at Scale

When you deploy one or two agents with OpenClaw, cloud API costs remain marginal. But as the network grows — system agents, email, calendar, writing, billing, monitoring, infrastructure — token volume explodes.

At BOTUM, here's a real estimate from our 15-agent network:

High-frequency agents (system, email, monitoring): ~300,000 tokens/day each
Medium-frequency agents (writing, calendar, billing): ~100,000 tokens/day
Occasional agents (monitoring, infra, recruitment): ~50,000 tokens/session

Estimated total: 6 to 8 million tokens/month. At ~$15/M tokens (Claude Sonnet 4), that's $90–120/month — over $1,000/year just in API tokens for mostly mechanical tasks.

The solution: don't send all these tasks to the most expensive model. That's where DeepSeek comes in.

2. Why DeepSeek?

DeepSeek surprised the market in 2024–2025 with performance comparable to GPT-4 on many benchmarks — for an open-source model, deployable locally, and for free. This isn't marketing: results on MMLU, HumanEval, and GSM8K place DeepSeek-V3 and DeepSeek-R1 among the best available models.

Key advantages for OpenClaw usage:

Open-source and free — no licensing, deployable on your infra with no recurring costs
Solid performance on structured tasks — summaries, classification, extraction, utility code generation
Native Ollama compatibility — one config line to activate in OpenClaw
Long context available — DeepSeek-V3 supports up to 128K tokens of context
Multiple model sizes — from 7B (CPU-friendly) to 67B (GPU-optimized)

Where it excels: well-defined repetitive tasks, email summaries, ticket classification, structured report generation, monitoring and alerts, utility scripts.

Where it's weaker: complex multi-step reasoning, fine cultural nuances (literary translation, subtle copywriting), very long instructions with nested constraints.

3. Architecture: OpenClaw + DeepSeek via Ollama

The connection between OpenClaw and a local model is made via Ollama, which exposes an OpenAI-compatible API. The configuration is straightforward.

Step 1 — Install Ollama

curl -fsSL https://ollama.ai/install.sh | sh
ollama pull deepseek-v3
# For the compact version (less RAM required):
ollama pull deepseek-r1:7b

Step 2 — Configure OpenClaw

In ~/.openclaw/openclaw.json, add an Ollama provider:

{
  "providers": {
    "ollama-local": {
      "type": "openai-compatible",
      "baseUrl": "http://localhost:11434/v1",
      "model": "deepseek-v3",
      "apiKey": "ollama"
    }
  }
}

Step 3 — Assign the Model to Specific Agents

In the agent config (e.g., system agent):

{
  "agent": "jarvis",
  "model": "ollama-local",
  "fallback_model": "anthropic/claude-sonnet-4-6"
}

Latency and Resources

Configuration	RAM Required	Latency (~500 token response)	Recommended Use
DeepSeek 7B — CPU only	8 GB RAM	15–30 seconds	Non-urgent tasks, overnight batch
DeepSeek 7B — GPU (RTX 3060)	8 GB VRAM	2–4 seconds	Routine tasks, frequent agents
DeepSeek 67B — GPU (A100)	40+ GB VRAM	5–10 seconds	Complex tasks, near-cloud quality
Claude Sonnet (cloud)	—	1–3 seconds	Complex tasks, critical production

Field observation: Without a dedicated GPU, DeepSeek on CPU is too slow for high-frequency agents. With a modest GPU (RTX 3060 or equivalent), latency becomes acceptable for most mechanical tasks.

Hybrid routing local vs cloud — decision tree

4. Hybrid Routing Strategy

The real optimization lever isn't replacing cloud with local — it's routing each task to the most appropriate model.

The Decision Logic in 5 Questions

Is it a structured, well-defined task? (classification, short summary, field extraction) → Local (DeepSeek)
Is it an interactive task with the user? (direct chat reply, client email) → Lightweight cloud (Haiku, GPT-3.5)
Does it require complex reasoning? (strategic analysis, debugging, architecture) → Premium cloud (Claude Sonnet/Opus, GPT-4)
Is response speed critical? (<2 seconds expected) → Cloud (more predictable latency)
Are there sensitive data involved? → Local mandatory

Implementation in OpenClaw

OpenClaw lets you define the model per agent and per task type. Here's the pattern we use:

# High-frequency / mechanical task agents → local DeepSeek
agents_local = ["jarvis", "aegis", "chronos-digest", "argus-monitoring"]

# Interactive / quality-important agents → Claude Haiku
agents_haiku = ["hermes", "nexus", "forge"]

# Complex / writing / analysis tasks → Claude Sonnet
agents_premium = ["cyrano", "career", "knox-audit"]

In practice, we configure an automatic fallback: if the local model isn't available (restart, GPU saturation), the agent automatically switches to cloud. Continuity guaranteed.

5. Real Savings — ROI Calculation

API cloud vs local model cost comparison

Baseline Scenario — 15-agent network, no optimization

Model	Tokens/month	Cost/M tokens	Cost/month
Claude Sonnet 4 (all cloud)	7,000,000	$15	$105

Optimized Scenario — Hybrid routing (60% local, 30% lightweight cloud, 10% premium cloud)

Model	Tokens/month	Cost/M tokens	Cost/month
DeepSeek local (Ollama)	4,200,000	~$0*	~$0
Claude Haiku (lightweight cloud)	2,100,000	$1	$2.10
Claude Sonnet (premium cloud)	700,000	$15	$10.50
OPTIMIZED TOTAL	7,000,000	—	~$12.60

* Electricity cost estimated at ~$3–5/month for an RTX 3060 GPU in partial use.

Monthly savings: ~$92 → ~88% reduction. GPU investment break-even (RTX 3060 ≈ $450 used): approximately 5 months.

6. Known Limitations

An honest field report includes the limits. Here's what we discovered in production:

GPU Latency and Availability

The GPU is a shared resource. If multiple agents trigger simultaneous calls, requests queue up. For a 15-agent active network, a single GPU can create bottlenecks at peak hours (typically 8–10am and 2–4pm).

Mitigation: automatic fallback to cloud on saturation, or a second dedicated GPU for high-frequency agents.

Variable Quality by Task

DeepSeek 7B is noticeably inferior to Claude Sonnet on tasks requiring nuanced judgment: fine copywriting, high-value commercial emails, complex strategic analyses. We learned (sometimes the hard way) not to delegate these tasks to the local model.

Practical rule: if the task output will be read by someone outside the team, use premium cloud by default.

Long Context: Watch the Windows

DeepSeek-V3 supports 128K tokens in theory. In practice, response quality degrades significantly beyond 32K tokens on mid-size local models. For large document analysis, cloud remains more reliable.

Local Infra Maintenance

A local model is infrastructure to maintain: Ollama updates, GPU driver management, disk space (models weigh 4–40 GB), health monitoring. This maintenance cost is real — factor it into total ROI calculations.

Cases Where Cloud Remains Indispensable

Real-time tasks (<1 second latency required)
Very complex reasoning (reflexive agents, multi-step planning)
Very long contexts (>50K effective tokens)
Access to web search, advanced tools, vision (multimodal)
During local infrastructure maintenance windows

7. BOTUM Field Report

After several months of hybrid deployment, here's what we actually use at BOTUM:

What We Do Today

DeepSeek 7B (Ollama, RTX GPU): JARVIS (system), AEGIS (monitoring), ARGUS (RSS monitoring), automatic heartbeats
Claude Haiku: HERMÈS (email digests), CHRONOS (calendar reminders), NEXUS (simple LinkedIn)
Claude Sonnet 4: CYRANO (writing), KNOX (security), complex analyses, interactive sessions with Faiçal

What We'd Do Differently

Start with routing from day one — we ran everything through cloud for 2 months before optimizing. Unnecessary cost.
Size the GPU before deploying — CPU-only is too slow for an active network. In hindsight, a GPU is a prerequisite, not an option.
Test DeepSeek on each task type before assigning it — we discovered its copywriting limitations in an embarrassing way. A quick benchmark avoids surprises.
Set up automatic fallback from day one — not after the first availability incident at 2am.

OpenClaw series recap — 7 posts, next steps

8. Series Conclusion — What We Built in 7 Posts

We started this series with a simple question: is a self-hosted AI agent network genuinely useful in business — or is it complexity for complexity's sake?

After 7 posts and several months in production, the answer is clear: yes, it's useful — but only if you approach it methodically.

Here's what this series covered:

Post 1: The concept — OpenClaw as an agent runtime, not a chatbot
Post 2: The installation — workspace, skills, first operational agent
Post 3: Security — SSL, reverse proxy, vault, robust authentication
Post 4: Secrets — credential management and AI context
Post 5: The agents — JARVIS, HERMÈS, CHRONOS and the specialization logic
Post 6: The comparison — OpenClaw vs ChatGPT vs Claude API, honestly
Post 7 (this post): Cost optimization — DeepSeek, Ollama, hybrid routing

What we didn't cover (out of respect for your attention): implementation details that depend on your specific environment, tradeoffs with no universal right answer, configurations that took weeks to stabilize. That's where field expertise makes the difference.

If you're reading this series considering deploying OpenClaw: do it. Start small (one agent, one concrete use case), validate the value, then expand. The learning curve is real but manageable. And the operational gains, once the network is running smoothly, more than justify the investment.

🚀 Ready to Deploy OpenClaw in Your Organization?

This 7-part series covers the fundamentals. But going from theory to a production agent network in your environment is a different challenge.

BOTUM teams help organizations deploy enterprise AI architectures — from auditing your needs to production deployment. Every project is different. Yours too.

Talk to a BOTUM Expert →

📥 Full PDF Guide

Download this guide as a PDF to read offline.

Download the guide (PDF)

The Complete OpenClaw Series

Post 1 — OpenClaw: From AI Assistant to Agent Network
Post 2 — Installation and First Operational Agent
Post 3 — Securing OpenClaw: SSL, Reverse Proxy, Vault
Post 4 — Secrets, Credentials and AI Context
Post 5 — Configuring First Agents: JARVIS, HERMÈS, CHRONOS
Post 6 — OpenClaw vs ChatGPT vs Claude API: Honest Comparison
Post 7 — OpenClaw + DeepSeek: Local LLM and Cost Reduction (this post)

🚀 Go Further with BOTUM

This guide covers the essentials. In production, every environment has its own specifics. BOTUM teams accompany organizations through deployment, advanced configuration, and infrastructure hardening. If you have a project, let's talk.

Discuss your project →

OpenClaw Series

← B6 — OpenClaw vs ChatGPT vs Claude B8 — Automating Operations →