The $20/Month AI Operations Stack: Self-Host Ollama + n8n + Coolify on a VPS
Cloud API costs scale linearly with usage. Self-hosting Ollama + n8n + Coolify on a $20/month VPS removes per-token fees for local Ollama inference. You still pay for the VPS, bandwidth, and your time, and throughput is limited by CPU and RAM. When workflows call Apify, email, or other SaaS APIs, data also transits those providers, not only your server.
This guide covers the complete setup from bare VPS to production-ready AI pipeline, with official documentation links for every step.
TL;DR:
| Component | Role | Official docs |
|---|---|---|
| Coolify | Infrastructure manager (replaces Heroku/Vercel for self-hosting) | coolify.io/docs |
| Ollama | Local LLM inference (Llama 3.1, Mistral, DeepSeek) | ollama.com |
| n8n | Workflow orchestration (visual automation) | docs.n8n.io |
| Qdrant | Vector database for AI memory/RAG (Retrieval-Augmented Generation) | qdrant.tech/documentation |
| PostgreSQL | Structured data storage | postgresql.org |
Prerequisites:
- A VPS with 4+ CPU cores, 8+ GB RAM, 80+ GB SSD ($16–$25/month range). 8 GB is the minimum for running Llama 3.1 8B comfortably alongside n8n.
- A domain name pointed to the VPS IP
- Basic terminal/SSH knowledge
- 30–60 minutes of setup time
Why self-host in 2026
Three reasons stand out:
| Reason | Impact |
|---|---|
| Zero per-token cost | Run thousands of AI inference requests daily without metering. Process 10,000 documents through Llama 3.1 for the same cost as processing 1. |
| Data sovereignty | LLM prompts to Ollama stay on your VPS if you do not forward them to cloud APIs. GDPR/HIPAA still require legal basis, access controls, logging, and often DPAs — self-hosting alone is not automatic compliance. Any step that calls Apify, webhooks, or email sends data to those vendors. |
| No vendor lock-in | Switch models (Llama → Mistral → DeepSeek) by changing only the model name in the request; the endpoint and payload shape stay the same. Ollama also provides an OpenAI-compatible API endpoint. |
When self-hosting is NOT the right choice: You need frontier model quality (Claude Opus, GPT-4o), your workload is intermittent (paying per-token is cheaper than 24/7 VPS), or you lack operations capacity to maintain servers. For managed alternatives at scale, see Apify pricing.
Architecture overview
┌──────────────────────────────────────────────────────────────┐
│ Coolify (Manager) │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Ollama │ │ n8n │ │ Qdrant │ │ Postgres │ │
│ │ (LLM) │ │ (Flows) │ │ (Vector) │ │ (SQL) │ │
│ │ │ │ │ │ │ │ │ │
│ │ :11434 │ │ :5678 │ │ :6333 │ │ :5432 │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Traefik (Reverse Proxy) │ │
│ │ SSL termination, domain routing │ │
│ │ n8n.yourdomain.com → :5678 │ │
│ │ ollama.yourdomain.com → :11434 │ │
│ └─────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘
All services run as Docker containers managed by Coolify. Traefik (included with Coolify) handles SSL certificates and domain routing automatically.
VPS selection guide
For this stack, you need a CPU-optimized VPS with sufficient RAM for LLM inference. GPU is optional: Ollama runs well on CPU for 7B/8B models.
| Provider | Plan | vCPUs | RAM | Storage | Monthly cost | Best for |
|---|---|---|---|---|---|---|
| Hetzner | CPX31 | 4 | 8 GB | 160 GB | €10–21/mo (varies by region — check hetzner.com/pricing for current rates) | Budget-optimized, EU data center |
| DigitalOcean | Premium 4vCPU | 4 | 8 GB | 100 GB | $48 | Beginner-friendly UI |
| Contabo | VPS M | 6 | 16 GB | 400 GB | €10.49 (~$12) | Most RAM per dollar |
Prices approximate as of April 2026 — verify on vendor sites before ordering as prices change frequently.
Recommended: Hetzner CPX31 or Contabo VPS M for the best value. Use a data center close to your users for lower latency.
Step 1: Install Coolify
Coolify replaces the complexity of managing Docker, Traefik, SSL, and deployments. One command installs everything.
SSH into your VPS and run:
curl -fsSL https://cdn.coollabs.io/coolify/install.sh | bash
After installation (~2 minutes), Coolify's UI is available at http://YOUR_VPS_IP:8000.
Initial setup:
- Open http://YOUR_VPS_IP:8000 in your browser
- Create an admin account
- Add your domain under Settings → Domains
- Coolify auto-provisions SSL via Let's Encrypt
Official docs: coolify.io/docs/installation
For a deep dive into Coolify configuration, see Self-Host Coolify on a VPS.
Step 2: Deploy Ollama
Ollama provides an OpenAI-compatible API for running open-source LLMs locally.
Deploy via Coolify
In Coolify's dashboard:
- Go to Projects → New → Docker Compose
- Paste this docker-compose.yml:
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    restart: unless-stopped
    # mem_limit caps the container's memory so the model cannot starve the other services
    mem_limit: 6g
volumes:
  ollama_data:
- Click Deploy
Pull your first model
SSH into the VPS and run:
docker exec -it ollama ollama pull llama3.1:8b
Expected output:
pulling manifest
pulling 8eeb52dfb3bb... 100% ▕████████████████▏ 4.7 GB
pulling 73b313b5552d... 100% ▕████████████████▏ 1.4 KB
pulling 0ba8f0e314b4... 100% ▕████████████████▏ 12 KB
pulling fa304d675061... 100% ▕████████████████▏ 487 B
verifying sha256 digest
writing manifest
success
Model selection guide
| Model | Parameters | RAM needed | Best for |
|---|---|---|---|
| Llama 3.1 8B | 8B | ~5 GB | General tasks, classification, summarization |
| Mistral 7B | 7B | ~4.5 GB | Fast inference, code generation |
| DeepSeek Coder V2 | 16B | ~10 GB | Code-specific tasks |
| Phi-3 Medium | 14B | ~8–9+ GB (quantization-dependent) | Reasoning, math, structured output — verify RAM against your exact Ollama tag |
For 8 GB RAM VPS: stick with Llama 3.1 8B or Mistral 7B. For 16 GB: you can run DeepSeek Coder V2 or load two 7B models.
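The sizing rule above can be sketched as a small lookup. This is only an illustration: the RAM figures are copied from the model table, and the 2 GB reserved for the OS, n8n, and the other containers is an assumption chosen to match the 6 GB memory cap in the compose file, not a measured value.

```python
# Approximate RAM needs from the model table above (quantization-dependent).
MODEL_RAM_GB = {
    "Llama 3.1 8B": 5.0,
    "Mistral 7B": 4.5,
    "DeepSeek Coder V2": 10.0,
    "Phi-3 Medium": 9.0,  # upper end of the ~8-9+ GB range
}


def models_that_fit(total_ram_gb: float, reserved_gb: float = 2.0) -> list:
    """Return models whose RAM need fits after reserving memory for other services.

    reserved_gb is a hypothetical headroom figure; adjust it to your own stack.
    """
    budget = total_ram_gb - reserved_gb
    return sorted(name for name, need in MODEL_RAM_GB.items() if need <= budget)


if __name__ == "__main__":
    for ram in (8, 16):
        print(f"{ram} GB VPS: {models_that_fit(ram)}")
```

Treat the result as a first filter only. Actual memory use depends on the exact Ollama tag and context length, so confirm with a test load before committing to a plan.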
Test the model:
curl http://localhost:11434/api/generate -d '{
"model": "llama3.1:8b",
"prompt": "Summarize the key benefits of web scraping for business intelligence in 3 bullet points.",
"stream": false
}'
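The same test can be run from a script. The sketch below uses only the Python standard library and assumes Ollama is listening on localhost:11434 as in the compose file; the helper names (build_request, extract_text, generate) are illustrative, not part of any Ollama client library.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # same endpoint as the curl test


def build_request(prompt: str, model: str = "llama3.1:8b") -> urllib.request.Request:
    """Build a non-streaming POST request for /api/generate."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(raw: bytes) -> str:
    """Pull the generated text out of a non-streaming reply."""
    return json.loads(raw)["response"].strip()


def generate(prompt: str, model: str = "llama3.1:8b", timeout: float = 120.0) -> str:
    """Send the prompt to the local Ollama instance and return its answer."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return extract_text(resp.read())
```

Calling generate("Summarize the key benefits of web scraping in 3 bullet points.") should return the model's answer as a string. The generous timeout is deliberate, since the first request after a restart also has to load the model into memory.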
Official docs: github.com/ollama/ollama
Step 3: Deploy n8n
n8n is a visual workflow automation tool — the self-hosted alternative to Make.com and Zapier.
Deploy via Coolify
Add a second Docker Compose service in Coolify:
services:
  n8n:
    image: n8nio/n8n:latest
    container_name: n8n
    ports:
      - "5678:5678"
    environment:
      # n8n v1+ uses built-in user management; N8N_BASIC_AUTH_* was removed.
      # On first visit to https://n8n.yourdomain.com (or http://YOUR_VPS_IP:5678), complete the owner setup wizard.
      - N8N_HOST=n8n.yourdomain.com
      - N8N_PROTOCOL=https
      - WEBHOOK_URL=https://n8n.yourdomain.com/
      - GENERIC_TIMEZONE=UTC
    volumes:
      - n8n_data:/home/node/.n8n
    restart: unless-stopped
volumes:
  n8n_data:
Connect n8n to Ollama
In n8n, add a new HTTP Request node:
| Field | Value |
|---|---|
| Method | POST |
| URL | http://ollama:11434/api/generate |
| Body Content Type | JSON |
| JSON Body | {"model": "llama3.1:8b", "prompt": "{{$json.text}}", "stream": false} |
The Docker network lets n8n reach Ollama via the service name ollama — no public exposure needed.
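One caveat with the JSON body in the table: pasting raw text between quotes produces invalid JSON as soon as the text contains a double quote or a newline. The sketch below illustrates the problem and the usual fix, which is to serialize the body instead of templating it. In n8n the equivalent would likely be an expression that wraps the value in JSON.stringify; treat that as an assumption to verify against your n8n version.

```python
import json


def naive_body(text: str) -> str:
    """Mimic pasting raw text between quotes in a JSON template."""
    return '{"model": "llama3.1:8b", "prompt": "' + text + '", "stream": false}'


def safe_body(text: str) -> str:
    """Serialize the body so quotes, backslashes, and newlines are escaped."""
    return json.dumps({"model": "llama3.1:8b", "prompt": text, "stream": False})


def is_valid_json(candidate: str) -> bool:
    """Return True if the string parses as JSON."""
    try:
        json.loads(candidate)
        return True
    except json.JSONDecodeError:
        return False


if __name__ == "__main__":
    scraped = 'The CEO said "we are hiring"\nand then left.'
    print("naive template valid:", is_valid_json(naive_body(scraped)))
    print("serialized body valid:", is_valid_json(safe_body(scraped)))
```

Scraped web content almost always contains quotes and line breaks, so this matters most for the lead-scoring style of workflow.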
⚠️ Docker Networking Note: The hostname ollama resolves only if n8n and Ollama share the same Docker Compose network. If you deploy them as separate Coolify projects, use one of these approaches instead:
- Single compose file: Add both services to one Coolify compose stack so they share a network automatically.
- External Docker network: Create a shared network (docker network create ai-stack), add networks: [ai-stack] to both compose services, and declare ai-stack with external: true under the top-level networks key of each compose file.
- Host IP: Use http://<your-server-private-ip>:11434 with a firewall rule allowing port 11434 only from localhost/internal.
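Whichever approach you pick, it helps to confirm that the chosen base URL actually reaches Ollama before wiring up a workflow. The sketch below queries Ollama's /api/tags endpoint, which lists installed models. The function names are illustrative, and the script is meant to run from wherever the caller lives, for example inside the n8n container for the service-name approach.

```python
import json
import urllib.request
from typing import List, Optional


def parse_model_names(raw: bytes) -> List[str]:
    """Extract model names from an /api/tags reply."""
    return [model["name"] for model in json.loads(raw).get("models", [])]


def list_models(base_url: str, timeout: float = 5.0) -> Optional[List[str]]:
    """Return installed model names, or None if the base URL is unreachable."""
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_model_names(resp.read())
    except OSError:  # covers URLError, refused connections, and timeouts
        return None
```

For example, list_models("http://ollama:11434") returning None from inside the n8n container suggests the two services do not share a network, while a list that includes llama3.1:8b means the HTTP Request node should work with the same base URL.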
Official docs: docs.n8n.io/hosting/installation/docker
For advanced n8n patterns, see n8n Advanced Workflows and n8n + Apify Integration.
Step 4: Add vector memory (Qdrant)
Qdrant stores embeddings for RAG. Your AI workflows can "remember" documents, conversations, and scraped data.
services:
  qdrant:
    image: qdrant/qdrant:latest
    container_name: qdrant
    ports:
      - "6333:6333"
    volumes:
      - qdrant_data:/qdrant/storage
    restart: unless-stopped
volumes:
  qdrant_data:
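Once the container is running, storing and searching embeddings comes down to three REST calls. The sketch below builds the request bodies for Qdrant's HTTP API using the standard library. The collection name documents is hypothetical, and the vector size of 768 assumes embeddings from an embedding model such as nomic-embed-text pulled into Ollama, which this guide does not otherwise set up.

```python
import json
import urllib.request

QDRANT_URL = "http://localhost:6333"  # port from the compose file above
COLLECTION = "documents"              # hypothetical collection name
VECTOR_SIZE = 768                     # assumption: nomic-embed-text embeddings


def collection_config(size: int = VECTOR_SIZE) -> dict:
    """Body for PUT /collections/{name}: one vector per point, cosine distance."""
    return {"vectors": {"size": size, "distance": "Cosine"}}


def upsert_body(point_id: int, vector: list, text: str) -> dict:
    """Body for PUT /collections/{name}/points: store a vector with its source text."""
    return {"points": [{"id": point_id, "vector": vector, "payload": {"text": text}}]}


def search_body(vector: list, limit: int = 3) -> dict:
    """Body for POST /collections/{name}/points/search: nearest neighbours with payloads."""
    return {"vector": vector, "limit": limit, "with_payload": True}


def call(method: str, path: str, body: dict) -> dict:
    """Send a JSON request to Qdrant and return the decoded reply."""
    req = urllib.request.Request(
        QDRANT_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())
```

A typical sequence would be call("PUT", "/collections/documents", collection_config()) once, then an upsert per document, then a search with the embedding of the user's question. The vector size must match whatever embedding model you actually use.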
Alternative: pgvector — if you prefer PostgreSQL for everything, add the pgvector extension instead of running a separate Qdrant instance. Good for simpler setups; Qdrant wins on vector search performance at scale.
Official docs: qdrant.tech/documentation/quick-start
For RAG pipeline architecture, see RAG in Production and Local RAG Chatbot with Ollama + ChromaDB.
Step 5: Build your first AI workflow
A complete workflow: Webhook trigger → Apify scrape → Ollama classify → store → alert.
[Webhook]──▶[Apify API]──▶[Ollama Classify]──▶[PostgreSQL]──▶[Slack Alert]
│ │ │ │ │
│ "New lead" │ Scrape the │ "Is this lead │ Store with │ "New high-fit
│ detected │ company │ a good fit? │ score and │ lead: Acme
│ │ website │ Score 1-10" │ metadata │ Corp (8/10)"
n8n workflow steps:
- Webhook trigger — receives a JSON payload with a company URL
- HTTP Request — calls Apify API to start an Actor run. In n8n, set URL, Method, and Headers on the node — do not nest them inside the JSON body.
| n8n HTTP Request field | Value |
|---|---|
| Method | POST |
| URL | https://api.apify.com/v2/acts/apify~website-content-crawler/runs |
| Authentication | Header Authorization: Bearer YOUR_APIFY_TOKEN |
| Body (JSON) | See below |
{
"startUrls": [{ "url": "{{$json.companyUrl}}" }],
"maxCrawlPages": 5
}
- Wait — poll for Apify run completion (or use webhook callback)
- Ollama classification — send scraped text for lead scoring:
The JSON body below scores each lead against your ICP for B2B SaaS fit.
{
"model": "llama3.1:8b",
"prompt": "Based on this company website content, score this lead 1-10 for fit with our ICP (B2B SaaS, 10-200 employees). Respond with only the score and one sentence explanation.\n\nContent: {{$json.scrapedText}}",
"stream": false
}
- PostgreSQL — insert the lead with score and metadata
- IF node — if score ≥ 7, send Slack alert
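The Wait step in the list above can be sketched as a polling loop. This is a minimal illustration, assuming a caller-supplied get_status function that fetches the run (for example from the Apify actor-runs endpoint) and returns its status string. The function name, interval, and poll limit are hypothetical choices.

```python
import time
from typing import Callable

# Statuses after which an Apify run will not change any more.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"}


def wait_for_run(
    get_status: Callable[[], str],
    interval_s: float = 5.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the run reaches a terminal status, then return that status."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval_s)
    raise TimeoutError(f"run still not finished after {max_polls} polls")
```

Only continue to the classification step when the returned status is SUCCEEDED. Passing sleep as a parameter keeps the loop easy to test, and a webhook callback remains the better option when runs take many minutes.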
This single workflow replaces hours of manual lead research. For the full lead generation pipeline, see Automated Lead Generation with AI Agents.
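The IF step depends on turning the model's free-text reply into a number. The sketch below shows one way to do that, assuming the model follows the prompt and leads with the score. The function names are illustrative, and the threshold of 7 comes from the workflow steps above.

```python
import re
from typing import Optional

# First standalone integer from 1 to 10 in the reply.
SCORE_PATTERN = re.compile(r"\b(10|[1-9])\b")


def parse_score(reply: str) -> Optional[int]:
    """Return the first 1-10 score found in the reply, or None if there is none."""
    match = SCORE_PATTERN.search(reply)
    return int(match.group(1)) if match else None


def should_alert(reply: str, threshold: int = 7) -> bool:
    """True when a score was found and it meets the alert threshold."""
    score = parse_score(reply)
    return score is not None and score >= threshold


if __name__ == "__main__":
    for reply in ("8 - Strong B2B SaaS fit.", "3. Consumer app, not B2B.", "Unclear."):
        print(repr(reply), "->", parse_score(reply), should_alert(reply))
```

Because the parser takes the first standalone number, a reply that restates the scale before the score would be misread. Small models drift from instructions often enough that replies with no parsable score are worth logging rather than silently dropping.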
Security hardening
A production AI stack needs proper hardening. Key steps:
| Layer | Action | Guide |
|---|---|---|
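| Firewall | Allow only ports 80, 443, 22. Block 11434, 5678 externally. | Before ufw enable, ensure SSH (port 22 or your SSH port) is allowed or you can lock yourself out. Typical order: ufw default deny incoming, ufw allow 22/tcp, ufw allow 80,443/tcp, then ufw enable. Ports published by Docker bypass ufw rules, so also bind internal services to 127.0.0.1 in the compose file (for example 127.0.0.1:11434:11434) or block them at your provider's network firewall. |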
| SSH | Key-only auth, disable password login, change default port | VPS Security Hardening |
| Reverse proxy | Let Traefik (via Coolify) handle SSL and routing | Included in Coolify setup |
| Docker isolation | Non-root containers, read-only filesystems where possible | Docker Compose Production Guide |
| API auth | n8n owner account + strong password; Ollama not exposed publicly | Use n8n's built-in user management (first-run setup). Put Ollama behind Docker network or reverse proxy with auth — Ollama has no rich app-level auth by itself. |
| Backups | Automated daily volume snapshots | Backup Guide |
For the complete hardening checklist, see VPS Security Hardening Checklist.
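To confirm that the firewall row in the table actually took effect, a quick check from a machine outside the VPS is to attempt a TCP connection to each internal port. The sketch below does that with the standard library. The port list comes from the architecture diagram, and the function names are illustrative.

```python
import socket

# Internal service ports from the architecture diagram.
INTERNAL_PORTS = (11434, 5678, 6333, 5432)


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def exposed_ports(host: str, ports=INTERNAL_PORTS) -> list:
    """List the internal ports that are reachable on the given host."""
    return [port for port in ports if is_port_open(host, port)]
```

Running exposed_ports("YOUR_VPS_IP") from a laptop should return an empty list on a hardened server. Any port that shows up is reachable from the internet and needs attention, regardless of what the firewall configuration says.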
Cost breakdown
Monthly costs at various VPS tiers, compared to cloud API pricing. Prices approximate as of April 2026 — verify on vendor sites before ordering as prices change frequently. API pricing from OpenAI and Anthropic.
| | Self-hosted (Hetzner) | Self-hosted (Contabo) | OpenAI API | Anthropic API |
|---|---|---|---|---|
| Monthly base | €10–21/mo (varies by region — check hetzner.com/pricing) | €10.49 ($12) | $0 (pay-per-use) | $0 (pay-per-use) |
| 1,000 requests/day | $0 extra | $0 extra | ~$30–90/mo | ~$45–135/mo |
| 10,000 requests/day | $0 extra | $0 extra | ~$300–900/mo | ~$450–1,350/mo |
| Data sovereignty | ✅ Full control | ✅ Full control | ✗ Data to USA | ✗ Data to USA |
| Model flexibility | ✅ Any open model | ✅ Any open model | ✗ GPT only | ✗ Claude only |
| Frontier quality | ✗ 7B–14B tier | ✗ 7B–14B tier | ✅ GPT-4o | ✅ Claude Opus |
Break-even analysis: Self-hosting becomes cheaper than cloud APIs at roughly 200–500 requests per day, depending on prompt length and model. Below that threshold, cloud APIs are more cost-effective because you don't pay for idle VPS time.
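The break-even estimate can be reproduced from the table. The sketch below derives a per-request API cost from the 1,000 requests/day row and divides a flat VPS price by it. The $20 VPS figure and the 30-day month are assumptions for illustration, and operator time is not counted.

```python
def per_request_cost(monthly_cost: float, requests_per_day: int, days: int = 30) -> float:
    """Average API cost per request implied by a monthly bill."""
    return monthly_cost / (requests_per_day * days)


def break_even_requests_per_day(vps_monthly: float, cost_per_request: float, days: int = 30) -> float:
    """Daily request volume at which API spend equals the flat VPS price."""
    return vps_monthly / (cost_per_request * days)


if __name__ == "__main__":
    vps = 20.0  # assumption: the $20/month VPS from the title
    # OpenAI column at 1,000 requests/day: roughly $30 to $90 per month.
    for monthly_bill in (30.0, 90.0):
        cost = per_request_cost(monthly_bill, 1000)
        volume = break_even_requests_per_day(vps, cost)
        print(f"${cost:.3f}/request -> break-even near {volume:.0f} requests/day")
```

Under these assumptions the break-even lands between roughly 220 and 670 requests per day, which overlaps the 200–500 range quoted above. Cheaper VPS plans or longer prompts pull it lower.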
When to upgrade to cloud
Self-hosting has limits. Consider moving to managed services when:
| Signal | Action |
|---|---|
| You need frontier model quality (Claude Opus, GPT-4o) | Use cloud APIs for quality-critical tasks, Ollama for bulk processing |
| Response time under 1 second matters | Cloud GPU inference (A100/H100) is 5–10x faster than VPS CPU |
| You're processing > 50K requests/day | Scaling VPS horizontally requires DevOps expertise |
| Compliance requires SOC 2 / ISO 27001 | Choose managed providers with certifications |
For managed web scraping at scale, Apify handles infrastructure, proxies, and scheduling. Create a free account at console.apify.com.
Yes. A Hetzner CPX31 (4 vCPUs, 8 GB RAM) is typically €10–21/mo (varies by region; check hetzner.com/pricing for current rates) and comfortably runs Ollama with Llama 3.1 8B, n8n, Qdrant, and PostgreSQL simultaneously. Add a domain (~$1/month) and the total stays within about a dollar of the VPS price. With a 16 GB RAM VPS (Contabo), you can run larger models for a similar monthly cost.
On CPU, expect roughly 10–30 tokens/second with 7B–8B models under good conditions; shared-vCPU plans can land lower, so benchmark your own instance. That is fast enough for batch processing, classification, and summarization. For real-time chat, CPU inference is noticeably slower than cloud APIs. If you need faster inference, consider a GPU VPS (Hetzner GPU, Lambda Labs) at $50–200/month.
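Throughput is easy to measure on your own VPS, because a non-streaming /api/generate reply reports eval_count (tokens generated) and eval_duration (in nanoseconds). The sketch below turns those two fields into tokens per second and a rough batch estimate. The function names and the sample workload are illustrative.

```python
def tokens_per_second(reply: dict) -> float:
    """Generation speed from an Ollama reply; eval_duration is in nanoseconds."""
    return reply["eval_count"] / reply["eval_duration"] * 1e9


def batch_hours(num_docs: int, tokens_per_doc: int, tps: float) -> float:
    """Rough generation time in hours for a batch, ignoring prompt processing."""
    return num_docs * tokens_per_doc / tps / 3600


if __name__ == "__main__":
    # Hypothetical reply: 150 tokens generated in 10 seconds.
    sample = {"eval_count": 150, "eval_duration": 10_000_000_000}
    tps = tokens_per_second(sample)
    print(f"{tps:.1f} tokens/second")
    print(f"{batch_hours(10_000, 200, tps):.1f} hours for 10,000 docs at 200 tokens each")
```

The estimate covers generation only. Long prompts add prompt-processing time on top, so real batch runs will take longer than this figure suggests.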
This stack uses Coolify (infrastructure) + n8n (orchestration). The alternative uses Dify (AI app builder) + n8n. Coolify is a better fit if you want general-purpose self-hosting beyond AI. Dify is a better fit if you're building AI-specific applications with RAG and prompt management. See our comparison in n8n + Dify + Ollama Automation Stack.
Yes. Two approaches: (1) Call the Apify API from n8n to run cloud-hosted scrapers (best for anti-bot targets). (2) Self-host Crawlee on the same VPS for simpler scraping tasks. See Self-Hosting Web Scrapers Guide for the Crawlee approach.
Pin model versions in Ollama (e.g., llama3.1:8b instead of llama3.1:latest). Test new versions in a staging workflow before updating production. n8n's version pinning and Coolify's rollback features make this manageable.
With proper hardening (firewall, SSH keys, SSL, auth), local Ollama inference avoids sending those prompts to OpenAI/Anthropic. Data still leaves your server if workflows call Apify, email, or other SaaS APIs. This is often more private than cloud LLMs for the inference step alone — not a blanket 'no third parties.' Follow our VPS Security Hardening Checklist. For regulated industries, add encryption at rest, contracts (DPAs/BAA), and network isolation.
Yes. n8n supports multi-user with role-based access. Ollama handles concurrent requests. For 10 users with moderate AI usage, upgrade to an 8 vCPU / 16 GB RAM VPS ($25-40/month). Beyond 20 concurrent users, consider horizontal scaling with a load balancer.
Coolify adds SSL management, git-based deployments, environment variable management, and a web dashboard — replacing 2-3 hours of manual DevOps per month. Use bare Docker Compose only if you already have infrastructure automation (Terraform, Ansible) or prefer CLI-only management.
Self-hosting your AI stack in 2026 is not a hobby project; it is a legitimate cost optimization for businesses running AI workloads at volume. The Ollama + n8n + Coolify stack costs $12–25/month and eliminates per-token fees for local LLM inference, though Apify and other cloud tools in your workflows still carry their own usage costs. It keeps inference data on your server and lets you swap models by changing only the model name.
Start with Coolify installation, pull Llama 3.1 8B, and build your first n8n workflow this afternoon. For web scraping integration, sign up on Apify and call Actors from n8n. For the managed automation alternative, create a Make.com account.
