Skip to main content

The $20/Month AI Operations Stack: Self-Host Ollama + n8n + Coolify on a VPS

· 13 min read
Yassine El Haddad
Software Developer & Automation Specialist

I build production AI agents, web scrapers, and automation pipelines. Most of what I publish here comes from the actual problems they run into: proxies that get banned, anti-bot stacks that fingerprint your client, RAG that drifts when the underlying data moves. Stack: Python, TypeScript, Go, FastAPI, LangChain, Crawlee, Playwright, deployed on AWS, GCP, and Cloudflare.

Cloud API costs scale linearly with usage. Self-hosting Ollama + n8n + Coolify on a $20/month VPS removes per-token fees for local Ollama inference — you still pay for the VPS, bandwidth, and your time; throughput is limited by CPU/RAM. When workflows call Apify, email, or other SaaS APIs, data transits those providers — not only your server.

This guide covers the complete setup from bare VPS to production-ready AI pipeline, with official documentation links for every step.

TL;DR:

ComponentRoleOfficial docs
CoolifyInfrastructure manager (replaces Heroku/Vercel for self-hosting)coolify.io/docs
OllamaLocal LLM inference (Llama 3.1, Mistral, DeepSeek)ollama.com
n8nWorkflow orchestration (visual automation)docs.n8n.io
QdrantVector database for AI memory/RAG (Retrieval-Augmented Generation)qdrant.tech/documentation
PostgreSQLStructured data storagepostgresql.org

Prerequisites:

  • A VPS with 4+ CPU cores, 8+ GB RAM, 80+ GB SSD ($16–$25/month range). Minimum 8 GB RAM required for running Llama 3.1 8B comfortably alongside n8n.
  • A domain name pointed to the VPS IP
  • Basic terminal/SSH knowledge
  • 30–60 minutes of setup time

Why self-host in 2026

Three compelling reasons beyond cost savings:

ReasonImpact
Zero per-token costRun thousands of AI inference requests daily without metering. Process 10,000 documents through Llama 3.1 for the same cost as processing 1.
Data sovereigntyLLM prompts to Ollama stay on your VPS if you do not forward them to cloud APIs. GDPR/HIPAA still require legal basis, access controls, logging, and often DPAs — self-hosting alone is not automatic compliance. Any step that calls Apify, webhooks, or email sends data to those vendors.
No vendor lock-inSwitch models instantly (Llama → Mistral → DeepSeek) without changing a single API call. Ollama provides an OpenAI-compatible API endpoint.

When self-hosting is NOT the right choice: You need frontier model quality (Claude Opus, GPT-4o), your workload is intermittent (paying per-token is cheaper than 24/7 VPS), or you lack operations capacity to maintain servers. For managed alternatives at scale, see Apify pricing.


Architecture overview

┌──────────────────────────────────────────────────────────────┐
│ Coolify (Manager) │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Ollama │ │ n8n │ │ Qdrant │ │ Postgres │ │
│ │ (LLM) │ │ (Flows) │ │ (Vector) │ │ (SQL) │ │
│ │ │ │ │ │ │ │ │ │
│ │ :11434 │ │ :5678 │ │ :6333 │ │ :5432 │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Traefik (Reverse Proxy) │ │
│ │ SSL termination, domain routing │ │
│ │ n8n.yourdomain.com → :5678 │ │
│ │ ollama.yourdomain.com → :11434 │ │
│ └─────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘

All services run as Docker containers managed by Coolify. Traefik (included with Coolify) handles SSL certificates and domain routing automatically.


VPS selection guide

For this stack, you need CPU-optimized VPS with sufficient RAM for LLM inference. GPU is optional — Ollama runs well on CPU for 7B/8B models.

ProviderPlanvCPUsRAMStorageMonthly costBest for
HetznerCPX3148 GB160 GB€10–21/mo (varies by region — check hetzner.com/pricing for current rates)Budget-optimized, EU data center
DigitalOceanPremium 4vCPU48 GB100 GB$48Beginner-friendly UI
ContaboVPS M616 GB400 GB€10.49 (~$12)Most RAM per dollar

Prices approximate as of April 2026 — verify on vendor sites before ordering as prices change frequently.

Recommended: Hetzner CPX31 or Contabo VPS M for the best value. Use a data center close to your users for lower latency.


Step 1: Install Coolify

Coolify replaces the complexity of managing Docker, Traefik, SSL, and deployments. One command installs everything.

SSH into your VPS and run:

curl -fsSL https://cdn.coollabs.io/coolify/install.sh | bash

After installation (~2 minutes), Coolify's UI is available at http://YOUR_VPS_IP:8000.

Initial setup:

  1. Open http://YOUR_VPS_IP:8000 in your browser
  2. Create an admin account
  3. Add your domain under Settings → Domains
  4. Coolify auto-provisions SSL via Let's Encrypt

Official docs: coolify.io/docs/installation

For a deep dive into Coolify configuration, see Self-Host Coolify on a VPS.


Step 2: Deploy Ollama

Ollama provides an OpenAI-compatible API for running open-source LLMs locally.

Deploy via Coolify

In Coolify's dashboard:

  1. Go to Projects → New → Docker Compose
  2. Paste this docker-compose.yml:
services:
ollama:
image: ollama/ollama:latest
container_name: ollama
ports:
- "11434:11434"
volumes:
- ollama_data:/root/.ollama
restart: unless-stopped
# Docker Compose v2: use mem_limit (deploy.resources only applies under Docker Swarm)
mem_limit: 6g

volumes:
ollama_data:
  1. Click Deploy

Pull your first model

SSH into the VPS and run:

docker exec -it ollama ollama pull llama3.1:8b

Expected output:

pulling manifest
pulling 8eeb52dfb3bb... 100% ▕████████████████▏ 4.7 GB
pulling 73b313b5552d... 100% ▕████████████████▏ 1.4 KB
pulling 0ba8f0e314b4... 100% ▕████████████████▏ 12 KB
pulling fa304d675061... 100% ▕████████████████▏ 487 B
verifying sha256 digest
writing manifest
success

Model selection guide

ModelParametersRAM neededBest for
Llama 3.1 8B8B~5 GBGeneral tasks, classification, summarization
Mistral 7B7B~4.5 GBFast inference, code generation
DeepSeek Coder V216B~10 GBCode-specific tasks
Phi-3 Medium14B~8–9+ GB (quantization-dependent)Reasoning, math, structured output — verify RAM against your exact Ollama tag

For 8 GB RAM VPS: stick with Llama 3.1 8B or Mistral 7B. For 16 GB: you can run DeepSeek Coder V2 or load two 7B models.

Test the model:

curl http://localhost:11434/api/generate -d '{
"model": "llama3.1:8b",
"prompt": "Summarize the key benefits of web scraping for business intelligence in 3 bullet points.",
"stream": false
}'

Official docs: github.com/ollama/ollama


Step 3: Deploy n8n

n8n is a visual workflow automation tool — the self-hosted alternative to Make.com and Zapier.

Deploy via Coolify

Add a second Docker Compose service in Coolify:

services:
n8n:
image: n8nio/n8n:latest
container_name: n8n
ports:
- "5678:5678"
environment:
# n8n v1+ uses built-in user management — N8N_BASIC_AUTH_* was removed.
# On first visit to https://n8n.yourdomain.com (or http://YOUR_VPS_IP:5678), complete the owner setup wizard.
- N8N_HOST=n8n.yourdomain.com
- N8N_PROTOCOL=https
- WEBHOOK_URL=https://n8n.yourdomain.com/
- GENERIC_TIMEZONE=UTC
volumes:
- n8n_data:/home/node/.n8n
restart: unless-stopped

volumes:
n8n_data:

Connect n8n to Ollama

In n8n, add a new HTTP Request node:

FieldValue
MethodPOST
URLhttp://ollama:11434/api/generate
Body Content TypeJSON
JSON Body{"model": "llama3.1:8b", "prompt": "{{$json.text}}", "stream": false}

The Docker network lets n8n reach Ollama via the service name ollama — no public exposure needed.

⚠️ Docker Networking Note: The hostname ollama resolves only if n8n and Ollama share the same Docker Compose network. If you deploy them as separate Coolify projects, use one of these approaches instead:

  1. Single compose file — Add both services to one Coolify compose stack so they share a network automatically.
  2. External Docker network — Create a shared network (docker network create ai-stack) and add networks: [ai-stack] to both compose services.
  3. Host IP — Use http://<your-server-private-ip>:11434 with a firewall rule allowing port 11434 only from localhost/internal.

Official docs: docs.n8n.io/hosting/installation/docker

For advanced n8n patterns, see n8n Advanced Workflows and n8n + Apify Integration.


Step 4: Add vector memory (Qdrant)

Qdrant stores embeddings for RAG. Your AI workflows can "remember" documents, conversations, and scraped data.

services:
qdrant:
image: qdrant/qdrant:latest
container_name: qdrant
ports:
- "6333:6333"
volumes:
- qdrant_data:/qdrant/storage
restart: unless-stopped

volumes:
qdrant_data:

Alternative: pgvector — if you prefer PostgreSQL for everything, add the pgvector extension instead of running a separate Qdrant instance. Good for simpler setups; Qdrant wins on vector search performance at scale.

Official docs: qdrant.tech/documentation/quick-start

For RAG pipeline architecture, see RAG in Production and Local RAG Chatbot with Ollama + ChromaDB.


Step 5: Build your first AI workflow

A complete workflow: Webhook trigger → Apify scrape → Ollama classify → store → alert.

[Webhook]──▶[Apify API]──▶[Ollama Classify]──▶[PostgreSQL]──▶[Slack Alert]
│ │ │ │ │
│ "New lead" │ Scrape the │ "Is this lead │ Store with │ "New high-fit
│ detected │ company │ a good fit? │ score and │ lead: Acme
│ │ website │ Score 1-10" │ metadata │ Corp (8/10)"

n8n workflow steps:

  1. Webhook trigger — receives a JSON payload with a company URL
  2. HTTP Request — calls Apify API to start an Actor run. In n8n, set URL, Method, and Headers on the node — do not nest them inside the JSON body.
n8n HTTP Request fieldValue
MethodPOST
URLhttps://api.apify.com/v2/acts/apify~website-content-crawler/runs
AuthenticationHeader Authorization: Bearer YOUR_APIFY_TOKEN
Body (JSON)See below
{
"startUrls": [{ "url": "{{$json.companyUrl}}" }],
"maxCrawlPages": 5
}
  1. Wait — poll for Apify run completion (or use webhook callback)
  2. Ollama classification — send scraped text for lead scoring:

The JSON body below scores each lead against your ICP for B2B SaaS fit.

{
"model": "llama3.1:8b",
"prompt": "Based on this company website content, score this lead 1-10 for fit with our ICP (B2B SaaS, 10-200 employees). Respond with only the score and one sentence explanation.\n\nContent: {{$json.scrapedText}}",
"stream": false
}
  1. PostgreSQL — insert the lead with score and metadata
  2. IF node — if score ≥ 7, send Slack alert

This single workflow replaces hours of manual lead research. For the full lead generation pipeline, see Automated Lead Generation with AI Agents.


Security hardening

A production AI stack needs proper hardening. Key steps:

LayerActionGuide
FirewallAllow only ports 80, 443, 22. Block 11434, 5678 externally.Before ufw enable, ensure SSH (port 22 or your SSH port) is allowed or you can lock yourself out. Typical order: ufw default deny incoming, ufw allow 22/tcp, ufw allow 80,443/tcp, then ufw enable.
SSHKey-only auth, disable password login, change default portVPS Security Hardening
Reverse proxyLet Traefik (via Coolify) handle SSL and routingIncluded in Coolify setup
Docker isolationNon-root containers, read-only filesystems where possibleDocker Compose Production Guide
API authn8n owner account + strong password; Ollama not exposed publiclyUse n8n's built-in user management (first-run setup). Put Ollama behind Docker network or reverse proxy with auth — Ollama has no rich app-level auth by itself.
BackupsAutomated daily volume snapshotsBackup Guide

For the complete hardening checklist, see VPS Security Hardening Checklist.


Cost breakdown

Monthly costs at various VPS tiers, compared to cloud API pricing. Prices approximate as of April 2026 — verify on vendor sites before ordering as prices change frequently. API pricing from OpenAI and Anthropic.

Self-hosted (Hetzner)Self-hosted (Contabo)OpenAI APIAnthropic API
Monthly base€10–21/mo (varies by region — check hetzner.com/pricing)€10.49 ($12)$0 (pay-per-use)$0 (pay-per-use)
1,000 requests/day$0 extra$0 extra~$30–90/mo~$45–135/mo
10,000 requests/day$0 extra$0 extra~$300–900/mo~$450–1,350/mo
Data sovereignty✅ Full control✅ Full control✗ Data to USA✗ Data to USA
Model flexibility✅ Any open model✅ Any open model✗ GPT only✗ Claude only
Frontier quality✗ 7B–14B tier✗ 7B–14B tier✅ GPT-4o✅ Claude Opus

Break-even analysis: Self-hosting becomes cheaper than cloud APIs at roughly 200–500 requests per day, depending on prompt length and model. Below that threshold, cloud APIs are more cost-effective because you don't pay for idle VPS time.


When to upgrade to cloud

Self-hosting has limits. Consider moving to managed services when:

SignalAction
You need frontier model quality (Claude Opus, GPT-4o)Use cloud APIs for quality-critical tasks, Ollama for bulk processing
Response time under 1 second mattersCloud GPU inference (A100/H100) is 5–10x faster than VPS CPU
You're processing > 50K requests/dayScaling VPS horizontally requires DevOps expertise
Compliance requires SOC 2 / ISO 27001Choose managed providers with certifications

For managed web scraping at scale, Apify handles infrastructure, proxies, and scheduling. Create a free account at console.apify.com.


Frequently Asked Questions

Yes. A Hetzner CPX31 (4 vCPUs, 8 GB RAM) is typically €10–21/mo (varies by region — check hetzner.com/pricing for current rates) and comfortably runs Ollama with Llama 3.1 8B, n8n, Qdrant, and PostgreSQL simultaneously. Add a domain (~$1/month) and total cost usually stays in a modest VPS range. For 16 GB RAM VPS (Contabo), you can run larger models for similar monthly cost.

On CPU, expect 10-30 tokens/second with 7B-8B models — fast enough for batch processing, classification, and summarization. For real-time chat, CPU inference is noticeably slower than cloud APIs. If you need faster inference, consider a GPU VPS (Hetzner GPU, Lambda Labs) at $50-200/month.

This stack uses Coolify (infrastructure) + n8n (orchestration). The alternative uses Dify (AI app builder) + n8n. Coolify is a better fit if you want general-purpose self-hosting beyond AI. Dify is a better fit if you're building AI-specific applications with RAG and prompt management. See our comparison in n8n + Dify + Ollama Automation Stack.

Yes. Two approaches: (1) Call the Apify API from n8n to run cloud-hosted scrapers (best for anti-bot targets). (2) Self-host Crawlee on the same VPS for simpler scraping tasks. See Self-Hosting Web Scrapers Guide for the Crawlee approach.

Pin model versions in Ollama (e.g., llama3.1:8b instead of llama3.1:latest). Test new versions in a staging workflow before updating production. n8n's version pinning and Coolify's rollback features make this manageable.

With proper hardening (firewall, SSH keys, SSL, auth), local Ollama inference avoids sending those prompts to OpenAI/Anthropic. Data still leaves your server if workflows call Apify, email, or other SaaS APIs. This is often more private than cloud LLMs for the inference step alone — not a blanket 'no third parties.' Follow our VPS Security Hardening Checklist. For regulated industries, add encryption at rest, contracts (DPAs/BAA), and network isolation.

Yes. n8n supports multi-user with role-based access. Ollama handles concurrent requests. For 10 users with moderate AI usage, upgrade to an 8 vCPU / 16 GB RAM VPS ($25-40/month). Beyond 20 concurrent users, consider horizontal scaling with a load balancer.

Coolify adds SSL management, git-based deployments, environment variable management, and a web dashboard — replacing 2-3 hours of manual DevOps per month. Use bare Docker Compose only if you already have infrastructure automation (Terraform, Ansible) or prefer CLI-only management.


Self-hosting your AI stack in 2026 is not a hobby project — it is a legitimate cost optimization for businesses running AI workloads at volume. The Ollama + n8n + Coolify stack costs $12–25/month, eliminates per-token LLM inference fees (local models via Ollama run on your VPS — note that Apify and other cloud tools in your workflows still have their own usage costs), keeps data sovereign, and lets you swap models without changing code.

Start with Coolify installation, pull Llama 3.1 8B, and build your first n8n workflow this afternoon. For web scraping integration, sign up on Apify and call Actors from n8n. For the managed automation alternative, create a Make.com account.

Common mistakes and fixes

Ollama runs out of memory on 4GB VPS.

Use quantized models: `llama3.1:8b` (default ~4.7 GB pull) or `llama3.1:8b-instruct-q4_K_M` fit typical 8 GB VPS setups. There is no valid tag `llama3.1:8b-q4` on Ollama — use the library tags from ollama.com/library/llama3.1. Avoid running multiple large models simultaneously.

n8n can't connect to Ollama on the Docker network.

Use the Docker service name (ollama) instead of localhost. In n8n HTTP Request node, set URL to http://ollama:11434/api/generate.

Coolify SSL certificates fail to provision.

Ensure your domain's DNS A record points to the VPS IP. Port 80 must be open for ACME challenge. Wait 2-5 minutes for DNS propagation.