The $20/Month AI Operations Stack: Self-Host Ollama + n8n + Coolify on a VPS

April 7, 2026 · 13 min read

Software Developer & Automation Specialist

I build production AI agents, web scrapers, and automation pipelines. Most of what I publish here comes from the actual problems those systems run into: proxies that get banned, anti-bot stacks that fingerprint your client, RAG that drifts when the underlying data moves. Stack: Python, TypeScript, Go, FastAPI, LangChain, Crawlee, Playwright, deployed on AWS, GCP, and Cloudflare.

Cloud API costs scale linearly with usage. Self-hosting Ollama + n8n + Coolify on a $20/month VPS removes per-token fees for local Ollama inference — you still pay for the VPS, bandwidth, and your time; throughput is limited by CPU/RAM. When workflows call Apify, email, or other SaaS APIs, data transits those providers — not only your server.

This guide covers the complete setup from bare VPS to production-ready AI pipeline, with official documentation links for every step.

TL;DR:

Component	Role	Official docs
Coolify	Infrastructure manager (replaces Heroku/Vercel for self-hosting)	coolify.io/docs
Ollama	Local LLM inference (Llama 3.1, Mistral, DeepSeek)	ollama.com
n8n	Workflow orchestration (visual automation)	docs.n8n.io
Qdrant	Vector database for AI memory/RAG (Retrieval-Augmented Generation)	qdrant.tech/documentation
PostgreSQL	Structured data storage	postgresql.org

Prerequisites:

A VPS with 4+ CPU cores, 8+ GB RAM, 80+ GB SSD ($16–$25/month range). Minimum 8 GB RAM required for running Llama 3.1 8B comfortably alongside n8n.
A domain name pointed to the VPS IP
Basic terminal/SSH knowledge
30–60 minutes of setup time

Why self-host in 2026

Three reasons to self-host:

Reason	Impact
Zero per-token cost	Run thousands of AI inference requests daily without metering. Process 10,000 documents through Llama 3.1 for the same cost as processing 1.
Data sovereignty	LLM prompts to Ollama stay on your VPS if you do not forward them to cloud APIs. GDPR/HIPAA still require legal basis, access controls, logging, and often DPAs — self-hosting alone is not automatic compliance. Any step that calls Apify, webhooks, or email sends data to those vendors.
No vendor lock-in	Switch models (Llama → Mistral → DeepSeek) by changing only the model name in the request; the rest of the API call stays the same. Ollama provides an OpenAI-compatible API endpoint.

When self-hosting is NOT the right choice: You need frontier model quality (Claude Opus, GPT-4o), your workload is intermittent (paying per-token is cheaper than 24/7 VPS), or you lack operations capacity to maintain servers. For managed alternatives at scale, see Apify pricing.

Architecture overview

┌──────────────────────────────────────────────────────────────┐
│                        Coolify (Manager)                     │
│    ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐   │
│    │  Ollama   │  │   n8n    │  │  Qdrant  │  │ Postgres │   │
│    │  (LLM)   │  │ (Flows)  │  │ (Vector) │  │  (SQL)   │   │
│    │          │  │          │  │          │  │          │   │
│    │ :11434   │  │  :5678   │  │  :6333   │  │  :5432   │   │
│    └──────────┘  └──────────┘  └──────────┘  └──────────┘   │
│                                                              │
│    ┌─────────────────────────────────────────────────────┐   │
│    │              Traefik (Reverse Proxy)                 │   │
│    │         SSL termination, domain routing             │   │
│    │         n8n.yourdomain.com → :5678                  │   │
│    │         ollama.yourdomain.com → :11434              │   │
│    └─────────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────────┘

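All services run as Docker containers managed by Coolify. Traefik (included with Coolify) handles SSL certificates and domain routing automatically. Treat the ollama.yourdomain.com route as optional: Ollama has no app-level authentication of its own, so expose it through Traefik only if you put authentication in front of it.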

VPS selection guide

For this stack, you need a CPU-optimized VPS with enough RAM for LLM inference. A GPU is optional: Ollama runs well on CPU for 7B/8B models.

Provider	Plan	vCPUs	RAM	Storage	Monthly cost	Best for
Hetzner	CPX31	4	8 GB	160 GB	€10–21/mo (varies by region — check hetzner.com/pricing for current rates)	Budget-optimized, EU data center
DigitalOcean	Premium 4vCPU	4	8 GB	100 GB	$48	Beginner-friendly UI
Contabo	VPS M	6	16 GB	400 GB	€10.49 (~$12)	Most RAM per dollar

Prices approximate as of April 2026 — verify on vendor sites before ordering as prices change frequently.

Recommended: Hetzner CPX31 or Contabo VPS M for the best value. Use a data center close to your users for lower latency.

Step 1: Install Coolify

Coolify replaces the complexity of managing Docker, Traefik, SSL, and deployments. One command installs everything.

SSH into your VPS and run:

curl -fsSL https://cdn.coollabs.io/coolify/install.sh | bash

After installation (~2 minutes), Coolify's UI is available at http://YOUR_VPS_IP:8000.

Initial setup:

Open http://YOUR_VPS_IP:8000 in your browser
Create an admin account
Add your domain under Settings → Domains
Coolify auto-provisions SSL via Let's Encrypt

Official docs: coolify.io/docs/installation

For a deep dive into Coolify configuration, see Self-Host Coolify on a VPS.

Step 2: Deploy Ollama

Ollama provides an OpenAI-compatible API for running open-source LLMs locally.

Deploy via Coolify

In Coolify's dashboard:

Go to Projects → New → Docker Compose
Paste this docker-compose.yml:

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    restart: unless-stopped
    # mem_limit caps the container's memory; 6g leaves about 2 GB on an 8 GB VPS for the other services and the OS
    mem_limit: 6g

volumes:
  ollama_data:

Click Deploy

Pull your first model

SSH into the VPS and run:

docker exec -it ollama ollama pull llama3.1:8b

Expected output:

pulling manifest
pulling 8eeb52dfb3bb... 100% ▕████████████████▏ 4.7 GB
pulling 73b313b5552d... 100% ▕████████████████▏ 1.4 KB
pulling 0ba8f0e314b4... 100% ▕████████████████▏  12 KB
pulling fa304d675061... 100% ▕████████████████▏  487 B
verifying sha256 digest
writing manifest
success

Model selection guide

Model	Parameters	RAM needed	Best for
Llama 3.1 8B	8B	~5 GB	General tasks, classification, summarization
Mistral 7B	7B	~4.5 GB	Fast inference, code generation
DeepSeek Coder V2	16B	~10 GB	Code-specific tasks
Phi-3 Medium	14B	~8–9+ GB (quantization-dependent)	Reasoning, math, structured output — verify RAM against your exact Ollama tag

For 8 GB RAM VPS: stick with Llama 3.1 8B or Mistral 7B. For 16 GB: you can run DeepSeek Coder V2 or load two 7B models.
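The sizing rule above can be sketched as a quick check. This is a minimal sketch, not a benchmark: the RAM figures are copied from the table (Phi-3 Medium uses the upper end of its range), and the 2 GB reserved for the OS and the other containers is an assumption chosen to match the `mem_limit: 6g` setting on an 8 GB VPS.

```python
# RAM estimates in GB, copied from the model selection table above.
MODEL_RAM_GB = {
    "Llama 3.1 8B": 5.0,
    "Mistral 7B": 4.5,
    "DeepSeek Coder V2": 10.0,
    "Phi-3 Medium": 9.0,  # upper end of the ~8-9+ GB range
}

# Assumption: headroom for the OS, n8n, Qdrant, and PostgreSQL.
RESERVED_GB = 2.0


def models_that_fit(vps_ram_gb, reserved_gb=RESERVED_GB):
    """Return the models whose estimated RAM fits in the remaining budget."""
    budget = vps_ram_gb - reserved_gb
    return sorted(name for name, need in MODEL_RAM_GB.items() if need <= budget)


print(models_that_fit(8))
print(models_that_fit(16))
```

Adjust `RESERVED_GB` to what `docker stats` reports for your other containers; the estimates also shift with the quantization of the exact model tag you pull.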

Test the model:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Summarize the key benefits of web scraping for business intelligence in 3 bullet points.",
  "stream": false
}'
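The same test can be scripted so you can measure throughput on your own VPS. The sketch below assumes Ollama is reachable at `localhost:11434` as in the curl call above; the helper names are hypothetical. It relies on the `eval_count` and `eval_duration` fields that Ollama includes in a non-streaming `/api/generate` response, with the duration reported in nanoseconds.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="llama3.1:8b"):
    """Build the same POST request as the curl example, without sending it."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="llama3.1:8b", timeout=300):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.load(resp)


def tokens_per_second(result):
    """Generation speed: eval_count tokens over eval_duration nanoseconds."""
    return result["eval_count"] / (result["eval_duration"] / 1e9)
```

On the VPS, `tokens_per_second(generate("Say hello in five words."))` gives a rough generation speed for that model and machine. The first call includes model load time in the total latency, so run it twice before drawing conclusions.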

Official docs: github.com/ollama/ollama

Step 3: Deploy n8n

n8n is a visual workflow automation tool — the self-hosted alternative to Make.com and Zapier.

Deploy via Coolify

Add a second Docker Compose service in Coolify:

services:
  n8n:
    image: n8nio/n8n:latest
    container_name: n8n
    ports:
      - "5678:5678"
    environment:
      # n8n v1+ uses built-in user management — N8N_BASIC_AUTH_* was removed.
      # On first visit to https://n8n.yourdomain.com (or http://YOUR_VPS_IP:5678), complete the owner setup wizard.
      - N8N_HOST=n8n.yourdomain.com
      - N8N_PROTOCOL=https
      - WEBHOOK_URL=https://n8n.yourdomain.com/
      - GENERIC_TIMEZONE=UTC
    volumes:
      - n8n_data:/home/node/.n8n
    restart: unless-stopped

volumes:
  n8n_data:

Connect n8n to Ollama

In n8n, add a new HTTP Request node:

Field	Value
Method	POST
URL	`http://ollama:11434/api/generate`
Body Content Type	JSON
JSON Body	`{"model": "llama3.1:8b", "prompt": "{{$json.text}}", "stream": false}`

The Docker network lets n8n reach Ollama via the service name ollama — no public exposure needed.

⚠️ Docker Networking Note: The hostname ollama resolves only if n8n and Ollama share the same Docker Compose network. If you deploy them as separate Coolify projects, use one of these approaches instead:

Single compose file — Add both services to one Coolify compose stack so they share a network automatically.

External Docker network — Create a shared network (docker network create ai-stack) and add networks: [ai-stack] to both compose services.

Host IP — Use http://<your-server-private-ip>:11434 with a firewall rule allowing port 11434 only from localhost/internal.
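One pitfall with the JSON Body in the HTTP Request node above: interpolating raw text into a JSON template produces invalid JSON as soon as the text contains double quotes or line breaks, which scraped content usually does. The Python sketch below illustrates the problem and the fix (serialize the whole object instead of pasting text into a template). The function names are hypothetical. In n8n the equivalent is typically to build the body from an object or wrap the value with `JSON.stringify` in an expression; treat that as an assumption to verify against your n8n version.

```python
import json


def naive_body(text):
    """Mimics pasting raw text into a JSON template string."""
    return '{"model": "llama3.1:8b", "prompt": "' + text + '", "stream": false}'


def safe_body(text):
    """Serializes the whole object, so quotes and newlines are escaped."""
    return json.dumps({"model": "llama3.1:8b", "prompt": text, "stream": False})


def is_valid_json(candidate):
    try:
        json.loads(candidate)
        return True
    except json.JSONDecodeError:
        return False


scraped = 'The CEO said "we are hiring"\nacross all teams.'
print(is_valid_json(naive_body(scraped)))  # template breaks on quotes/newlines
print(is_valid_json(safe_body(scraped)))   # serialized body stays valid
```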

Official docs: docs.n8n.io/hosting/installation/docker

For advanced n8n patterns, see n8n Advanced Workflows and n8n + Apify Integration.

Step 4: Add vector memory (Qdrant)

Qdrant stores embeddings for RAG. Your AI workflows can "remember" documents, conversations, and scraped data.

services:
  qdrant:
    image: qdrant/qdrant:latest
    container_name: qdrant
    ports:
      - "6333:6333"
    volumes:
      - qdrant_data:/qdrant/storage
    restart: unless-stopped

volumes:
  qdrant_data:
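Once Qdrant is running, a workflow talks to it over its REST API on port 6333. The sketch below only builds the three requests a basic RAG flow needs (create a collection, upsert a point, search). It assumes the hostname `qdrant` resolves on the shared Docker network the same way `ollama` does, and the collection name, payload fields, and helper names are hypothetical. The vectors would come from whatever embedding model you choose; the collection's vector size must match that model's output length.

```python
QDRANT_URL = "http://qdrant:6333"  # assumption: reachable via Docker service name


def create_collection_request(name, vector_size):
    """PUT /collections/{name}: define vector size and distance metric."""
    body = {"vectors": {"size": vector_size, "distance": "Cosine"}}
    return "PUT", f"{QDRANT_URL}/collections/{name}", body


def upsert_request(name, point_id, vector, payload):
    """PUT /collections/{name}/points: store one embedding with metadata."""
    body = {"points": [{"id": point_id, "vector": vector, "payload": payload}]}
    return "PUT", f"{QDRANT_URL}/collections/{name}/points", body


def search_request(name, query_vector, limit=3):
    """POST /collections/{name}/points/search: nearest-neighbour lookup."""
    body = {"vector": query_vector, "limit": limit, "with_payload": True}
    return "POST", f"{QDRANT_URL}/collections/{name}/points/search", body


embedding = [0.12, -0.03, 0.88, 0.41]  # placeholder; real embeddings are much longer
print(create_collection_request("docs", len(embedding)))
print(upsert_request("docs", 1, embedding, {"source": "example.com"}))
print(search_request("docs", embedding))
```

Each tuple maps directly onto an n8n HTTP Request node: method, URL, and JSON body.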

Alternative: pgvector — if you prefer PostgreSQL for everything, add the pgvector extension instead of running a separate Qdrant instance. Good for simpler setups; Qdrant wins on vector search performance at scale.

Official docs: qdrant.tech/documentation/quick-start

For RAG pipeline architecture, see RAG in Production and Local RAG Chatbot with Ollama + ChromaDB.

Step 5: Build your first AI workflow

A complete workflow: Webhook trigger → Apify scrape → Ollama classify → store → alert.

[Webhook]──▶[Apify API]──▶[Ollama Classify]──▶[PostgreSQL]──▶[Slack Alert]
   │              │               │                  │              │
   │  "New lead"  │  Scrape the   │  "Is this lead   │  Store with  │  "New high-fit
   │  detected    │  company      │   a good fit?    │  score and   │   lead: Acme
   │              │  website      │   Score 1-10"    │  metadata    │   Corp (8/10)"

n8n workflow steps:

Webhook trigger — receives a JSON payload with a company URL
HTTP Request — calls Apify API to start an Actor run. In n8n, set URL, Method, and Headers on the node — do not nest them inside the JSON body.

n8n HTTP Request field	Value
Method	`POST`
URL	`https://api.apify.com/v2/acts/apify~website-content-crawler/runs`
Authentication	Header `Authorization: Bearer YOUR_APIFY_TOKEN`
Body (JSON)	See below

{
  "startUrls": [{ "url": "{{$json.companyUrl}}" }],
  "maxCrawlPages": 5
}

Wait — poll for Apify run completion (or use webhook callback)
Ollama classification — send scraped text for lead scoring:

The JSON body below scores each lead against your ICP for B2B SaaS fit.

{
  "model": "llama3.1:8b",
  "prompt": "Based on this company website content, score this lead 1-10 for fit with our ICP (B2B SaaS, 10-200 employees). Respond with only the score and one sentence explanation.\n\nContent: {{$json.scrapedText}}",
  "stream": false
}

PostgreSQL — insert the lead with score and metadata
IF node — if score ≥ 7, send Slack alert
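The IF node needs a number, but the model returns free text such as "8. Strong B2B SaaS fit." A minimal sketch of the parsing step follows, written in Python to show the logic (in n8n this would live in a Code node or an expression). The function names are hypothetical, and the regex simply takes the first standalone integer from 1 to 10, so it can misfire if the model restates the scale ("on a 1-10 scale") before the score.

```python
import re

# First standalone integer between 1 and 10; "10" is tried before single digits.
SCORE_PATTERN = re.compile(r"\b(10|[1-9])\b")


def parse_score(response_text):
    """Return the lead score as an int, or None if no score is found."""
    match = SCORE_PATTERN.search(response_text)
    return int(match.group(1)) if match else None


def is_high_fit(response_text, threshold=7):
    """Mirror of the IF node: alert only when a score exists and is >= threshold."""
    score = parse_score(response_text)
    return score is not None and score >= threshold


print(parse_score("8. Strong B2B SaaS fit with about 50 employees."))
print(is_high_fit("Score: 6/10, too small for our ICP."))
```

Returning `None` instead of a default score keeps unparseable responses out of both branches, so they can be routed to a manual review path.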

This single workflow replaces hours of manual lead research. For the full lead generation pipeline, see Automated Lead Generation with AI Agents.

Security hardening

A production AI stack needs proper hardening. Key steps:

Layer	Action	Guide
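Firewall	Allow only ports 80, 443, 22. Block 11434, 5678 externally.	Before `ufw enable`, ensure SSH (port 22 or your SSH port) is allowed or you can lock yourself out. Typical order: `ufw default deny incoming`, `ufw allow 22/tcp`, `ufw allow 80,443/tcp`, then `ufw enable`. Note that ports published through a Compose `ports:` mapping bypass ufw, because Docker manages its own iptables rules. To keep 11434 and 5678 private, remove those `ports:` entries where Traefik or the Docker network already reaches the service, or bind them to loopback (for example `127.0.0.1:11434:11434`).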
SSH	Key-only auth, disable password login, change default port	VPS Security Hardening
Reverse proxy	Let Traefik (via Coolify) handle SSL and routing	Included in Coolify setup
Docker isolation	Non-root containers, read-only filesystems where possible	Docker Compose Production Guide
API auth	n8n owner account + strong password; Ollama not exposed publicly	Use n8n's built-in user management (first-run setup). Put Ollama behind Docker network or reverse proxy with auth — Ollama has no rich app-level auth by itself.
Backups	Automated daily volume snapshots	Backup Guide
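Because Docker publishes ports through its own iptables rules, a `ports:` entry without a host IP is reachable from outside even when ufw denies that port. The sketch below is a small audit helper under stated assumptions: it takes the port strings from your compose files as a plain list (no YAML parsing), the function name is hypothetical, and it does not handle IPv6 bindings.

```python
def publicly_published(port_mappings):
    """Return the mappings Docker would publish on all interfaces."""
    exposed = []
    for mapping in port_mappings:
        parts = mapping.split(":")
        # "HOST_IP:HOST_PORT:CONTAINER_PORT" has three parts;
        # fewer parts means no host IP, so Docker binds to 0.0.0.0.
        host_ip = parts[0] if len(parts) == 3 else "0.0.0.0"
        if host_ip != "127.0.0.1":
            exposed.append(mapping)
    return exposed


stack_ports = ["11434:11434", "127.0.0.1:5678:5678", "6333:6333"]
print(publicly_published(stack_ports))
```

Anything the function returns should either be removed, rebound to `127.0.0.1`, or intentionally fronted by Traefik with authentication.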

For the complete hardening checklist, see VPS Security Hardening Checklist.

Cost breakdown

Monthly costs at various VPS tiers, compared to cloud API pricing. Prices approximate as of April 2026 — verify on vendor sites before ordering as prices change frequently. API pricing from OpenAI and Anthropic.

	Self-hosted (Hetzner)	Self-hosted (Contabo)	OpenAI API	Anthropic API
Monthly base	€10–21/mo (varies by region; check hetzner.com/pricing)	€10.49 (~$12)	$0 (pay-per-use)	$0 (pay-per-use)
1,000 requests/day	$0 extra	$0 extra	~$30–90/mo	~$45–135/mo
10,000 requests/day	$0 extra	$0 extra	~$300–900/mo	~$450–1,350/mo
Data sovereignty	✅ Full control	✅ Full control	✗ Data to USA	✗ Data to USA
Model flexibility	✅ Any open model	✅ Any open model	✗ GPT only	✗ Claude only
Frontier quality	✗ 7B–14B tier	✗ 7B–14B tier	✅ GPT-4o	✅ Claude Opus

Break-even analysis: Self-hosting becomes cheaper than cloud APIs at roughly 200–500 requests per day, depending on prompt length and model. Below that threshold, cloud APIs are more cost-effective because you don't pay for idle VPS time.
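The break-even point follows from two numbers: the fixed VPS cost and the API cost per request. The sketch below derives the per-request cost from the table's OpenAI row (~$30–90/month at 1,000 requests/day) and assumes a $20/month VPS and a 30-day month; both function names are hypothetical.

```python
def cost_per_request(monthly_api_cost, requests_per_day, days=30):
    """Average API cost of one request, from a monthly bill."""
    return monthly_api_cost / (requests_per_day * days)


def break_even_requests_per_day(vps_monthly_cost, per_request_cost, days=30):
    """Daily request volume at which the API bill equals the VPS bill."""
    return vps_monthly_cost / (per_request_cost * days)


cheap = cost_per_request(30, 1000)   # low end of the table's range
pricey = cost_per_request(90, 1000)  # high end of the table's range

print(round(break_even_requests_per_day(20, pricey)))  # expensive prompts
print(round(break_even_requests_per_day(20, cheap)))   # cheap prompts
```

Under these assumptions the threshold lands between roughly 220 and 670 requests per day, in the same range as the estimate above; a cheaper VPS or longer prompts pull it lower.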

When to upgrade to cloud

Self-hosting has limits. Consider moving to managed services when:

Signal	Action
You need frontier model quality (Claude Opus, GPT-4o)	Use cloud APIs for quality-critical tasks, Ollama for bulk processing
Response time under 1 second matters	Cloud GPU inference (A100/H100) is 5–10x faster than VPS CPU
You're processing > 50K requests/day	Scaling VPS horizontally requires DevOps expertise
Compliance requires SOC 2 / ISO 27001	Choose managed providers with certifications

For managed web scraping at scale, Apify handles infrastructure, proxies, and scheduling. Create a free account at console.apify.com.

Frequently Asked Questions

Can I really run this stack for about $20/month?

Yes. A Hetzner CPX31 (4 vCPUs, 8 GB RAM) is typically €10–21/mo (it varies by region; check hetzner.com/pricing for current rates) and runs Ollama with Llama 3.1 8B, n8n, Qdrant, and PostgreSQL simultaneously. Add a domain (~$1/month) and the total usually stays within the $12–25/month range. On a 16 GB RAM VPS (Contabo), you can run larger models for a similar monthly cost.

How fast is Ollama on CPU?

On CPU, expect roughly 10-30 tokens/second with 7B-8B models under favorable conditions; shared vCPUs and slower memory can push that lower, so benchmark on your own VPS. That is fast enough for batch processing, classification, and summarization. For real-time chat, CPU inference is noticeably slower than cloud APIs. If you need faster inference, consider a GPU VPS (Hetzner GPU, Lambda Labs) at $50-200/month.

How does this compare to the n8n + Dify + Ollama stack?

This stack uses Coolify (infrastructure) + n8n (orchestration). The alternative uses Dify (AI app builder) + n8n. Coolify is a better fit if you want general-purpose self-hosting beyond AI. Dify is a better fit if you're building AI-specific applications with RAG and prompt management. See our comparison in n8n + Dify + Ollama Automation Stack.

Can I add web scraping to this stack?

Yes. Two approaches: (1) Call the Apify API from n8n to run cloud-hosted scrapers (best for anti-bot targets). (2) Self-host Crawlee on the same VPS for simpler scraping tasks. See Self-Hosting Web Scrapers Guide for the Crawlee approach.

How do I handle model updates?

Pin model versions in Ollama (e.g., llama3.1:8b instead of llama3.1:latest). Test new versions in a staging workflow before updating production. n8n's version pinning and Coolify's rollback features make this manageable.

Is a self-hosted AI stack private and secure?

With proper hardening (firewall, SSH keys, SSL, auth), local Ollama inference avoids sending those prompts to OpenAI/Anthropic. Data still leaves your server if workflows call Apify, email, or other SaaS APIs. This is often more private than cloud LLMs for the inference step alone, but it is not a blanket "no third parties." Follow our VPS Security Hardening Checklist. For regulated industries, add encryption at rest, contracts (DPAs/BAA), and network isolation.

Can multiple users share the stack?

Yes. n8n supports multi-user with role-based access. Ollama handles concurrent requests. For 10 users with moderate AI usage, upgrade to an 8 vCPU / 16 GB RAM VPS ($25-40/month). Beyond 20 concurrent users, consider horizontal scaling with a load balancer.

Why use Coolify instead of plain Docker Compose?

Coolify adds SSL management, git-based deployments, environment variable management, and a web dashboard, replacing 2-3 hours of manual DevOps per month. Use bare Docker Compose only if you already have infrastructure automation (Terraform, Ansible) or prefer CLI-only management.

Self-hosting your AI stack in 2026 is not a hobby project — it is a legitimate cost optimization for businesses running AI workloads at volume. The Ollama + n8n + Coolify stack costs $12–25/month, eliminates per-token LLM inference fees (local models via Ollama run on your VPS — note that Apify and other cloud tools in your workflows still have their own usage costs), keeps data sovereign, and lets you swap models without changing code.

Start with Coolify installation, pull Llama 3.1 8B, and build your first n8n workflow this afternoon. For web scraping integration, sign up on Apify and call Actors from n8n. For the managed automation alternative, create a Make.com account.
