use-apify.com

Crawlee: guides & tutorials

Apify's Crawlee unifies HTTP and Playwright crawling in Node or Python: queues, storage, and anti-blocking helpers for custom scraper code.

11 articles · Page 1 of 2


Crawlee is Apify's open-source crawling library for Node.js and Python that unifies HTTP and headless browser scraping behind one API. It handles request queues, automatic retries, proxy rotation, and dataset storage so you write extraction logic instead of plumbing. These guides cover building real crawlers with Crawlee from scratch.

Its strength is graduating smoothly from fast HTTP crawls to Playwright or Puppeteer when a site needs a real browser, without rewriting your code. Deploy the same crawler locally or as an Apify Actor in the cloud. Below you will find step-by-step tutorials, configuration patterns, and examples for anti-blocking, concurrency, and exporting clean data.
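To make the request queue and automatic retries mentioned above concrete, here is a minimal sketch in plain TypeScript. It is not Crawlee's API: `MiniRequestQueue`, `add`, and `run` are hypothetical names, and stripping the URL fragment before deduplication is an assumption chosen for illustration.

```typescript
// Illustrative only: hypothetical names, not Crawlee's implementation.
type Handler = (url: string) => void;

interface QueueEntry {
  url: string;
  retries: number;
}

class MiniRequestQueue {
  private seen = new Set<string>();
  private pending: QueueEntry[] = [];

  // Assumption: two URLs that differ only by "#fragment" count as the same page.
  private static key(url: string): string {
    return url.split("#")[0];
  }

  // Returns true if the URL was new and got enqueued, false if it was a duplicate.
  add(url: string): boolean {
    const key = MiniRequestQueue.key(url);
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    this.pending.push({ url: key, retries: 0 });
    return true;
  }

  // Drains the queue. A request whose handler throws is re-queued
  // until it has been retried maxRetries times, then reported as failed.
  run(handler: Handler, maxRetries = 2): { handled: string[]; failed: string[] } {
    const handled: string[] = [];
    const failed: string[] = [];
    while (this.pending.length > 0) {
      const entry = this.pending.shift()!;
      try {
        handler(entry.url);
        handled.push(entry.url);
      } catch {
        if (entry.retries < maxRetries) {
          this.pending.push({ url: entry.url, retries: entry.retries + 1 });
        } else {
          failed.push(entry.url);
        }
      }
    }
    return { handled, failed };
  }
}
```

A real crawler would also persist the queue between runs and process requests concurrently; the sketch leaves both out to keep the deduplication and retry logic visible.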


Apify

Crawlee Node.js Tutorial: Production Web Scraping Without the Boilerplate (2026)

5 min read
Yassine El Haddad
Software Developer & Automation Specialist

Crawlee is an open-source Node.js framework from Apify that bundles everything a production scraper needs: request deduplication, auto-retry, proxy rotation, session management, persistent storage, and Playwright/Puppeteer/HTTP crawlers under one API.

Where raw Playwright requires wiring all those pieces manually, Crawlee provides them out of the box — letting you focus on extraction logic.

Freshness note: Examples verified against Crawlee 3.x (March 2026). Install crawlee@latest to get the current release.

Crawlee

Complete Guide to Web Scraping with JavaScript and Node.js in 2026

8 min read
Yassine El Haddad
Software Developer & Automation Specialist

JavaScript and Node.js power some of the most capable web scrapers in 2026. The ecosystem spans Axios and node-fetch for HTTP, Cheerio for HTML parsing, Playwright and Puppeteer for browser automation, and Crawlee as the full framework that powers Apify Actors. This guide covers the JS scraping stack, comparison tables, TypeScript patterns, data output options, and a complete Crawlee TypeScript Actor example. Try Apify to run Crawlee Actors in the cloud.

Apify

Firecrawl vs Crawlee: API Abstraction vs Orchestration Frameworks

5 min read
Yassine El Haddad
Software Developer & Automation Specialist

Data engineering in 2026 is split between two extraction paradigms: using a managed API for rapid data normalization, or deploying an orchestration framework for deterministic, high-volume control.

The two dominant solutions representing these philosophies are Firecrawl (an API-first pipeline optimized for LLM ingestion) and Crawlee (the industry-standard open-source scraping framework maintained by Apify).

This guide compares the two architectures to help you decide which tool fits your extraction requirements.

Beautiful Soup

Python Extraction Architectures: httpx vs Playwright vs Crawlee

4 min read
Yassine El Haddad
Software Developer & Automation Specialist

Python is a common choice when your stack already lives there—PyTorch training loops, Polars pipelines, or internal services. Keeping extraction in Python avoids extra RPC glue between languages.

This guide walks from simple static fetches (httpx + BeautifulSoup) to browser automation and Crawlee for heavier jobs.

Crawlee

JavaScript Extraction Architectures: Cheerio vs Playwright vs Crawlee

4 min read
Yassine El Haddad
Software Developer & Automation Specialist

Node.js has real advantages for data extraction pipelines: its single-threaded, non-blocking event loop suits high-concurrency network I/O, and its DOM-manipulation syntax mirrors what runs in the browser.

This guide provides a formal architectural breakdown of the three primary abstraction layers available to JavaScript data engineers in 2026.

Architecture

Production Data Extraction: CI/CD, Queues, and Telemetry (2026)

5 min read
Yassine El Haddad
Software Developer & Automation Specialist

A linear Python script with requests and a for loop over 500 URLs is not a production system. In real deployments, markup changes, socket timeouts, and bad proxy exits eventually break naive runs.

To move from a side project to production, your pipeline needs fault tolerance, state, and observability.

This guide covers four practical building blocks for running high-volume extraction reliably.

Crawlee

Best Free Web Scraping Tools in 2026 (Honest Limits & Comparison)

10 min read
Yassine El Haddad
Software Developer & Automation Specialist

“Free” web scraping usually means free software or a free tier—not free infrastructure. Bandwidth, headless browsers, CAPTCHAs, and IP reputation still cost money somewhere. This guide lists practical free and freemium options, what each is good for, where free breaks, and how to choose before you pay for proxies or platforms.

Quick answer

The best free web scraping tools in 2026 are Apify (free plan with $5 monthly credits), Crawlee (open-source), Beautiful Soup and Scrapy (Python open-source), Playwright (browser automation), and ParseHub (no-code, limited free tier). For managed crawling APIs with starter free credits, Firecrawl and ScraperAPI are also common starting points—each caps free usage tightly.

Automation

Crawlee vs. Scrapy vs. BeautifulSoup: Which Framework in 2026?

6 min read
Yassine El Haddad
Software Developer & Automation Specialist

These three tools are frequently compared but rarely do the same job. BeautifulSoup is not a crawler; it is an HTML parser. Scrapy is a Python crawling framework. Crawlee is a Node.js (and Python) crawling library with first-class browser support.

Picking the wrong one means building your codebase on a tool that does not fit your target. This guide makes the differences concrete.


Frequently asked questions

Crawlee is an open-source Node.js and Python library from Apify for building reliable crawlers. It handles request queuing, automatic retries, browser pools, session management, and storage. Crawlee runs locally or on Apify Cloud with minimal config changes, making it easy to prototype locally before deploying at scale.

Scrapy is Python-native with a mature ecosystem; Crawlee started in JavaScript, now also has a Python version, and ships headless browser support out of the box. Choose Scrapy for pure Python shops or Crawlee if your team codes in Node. Both support horizontal scaling, but Crawlee integrates more tightly with Apify storage and proxy rotation.

Plain HTTP with Cheerio, headless Chrome or Firefox with Playwright, or Puppeteer-based crawlers, all behind the same interface. Route different URL patterns to different handlers. Crawlee's autoscaled pool adjusts concurrency automatically to the resources available, so you do not have to size pools by hand.
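Routing by URL pattern, as described above, can be sketched in plain TypeScript. This is an illustration of the idea, not Crawlee's router: `PatternRouter`, `add`, and `dispatch` are hypothetical names, and matching with regular expressions is an assumption made for this example.

```typescript
// Illustrative only: hypothetical names, not Crawlee's router API.
type RouteHandler = (url: string) => string;

interface Route {
  pattern: RegExp;
  handler: RouteHandler;
}

class PatternRouter {
  private routes: Route[] = [];
  private fallback: RouteHandler;

  constructor(fallback: RouteHandler) {
    this.fallback = fallback;
  }

  // Registers a handler for URLs matching the pattern.
  // Avoid the /g flag here: RegExp.test keeps state between calls when it is set.
  add(pattern: RegExp, handler: RouteHandler): this {
    this.routes.push({ pattern, handler });
    return this;
  }

  // The first matching route wins; unmatched URLs go to the fallback handler.
  dispatch(url: string): string {
    for (const route of this.routes) {
      if (route.pattern.test(url)) return route.handler(url);
    }
    return this.fallback(url);
  }
}
```

Registering the most specific patterns first keeps the first-match rule predictable, since a broad pattern added early would shadow narrower ones added later.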

Yes. Crawlee is written in TypeScript and ships first-class types. Type-checking input schemas, dataset models, and router handlers catches integration bugs before deployment. Apify Actors built on Crawlee benefit from end-to-end types, from the platform input JSON through to the final dataset rows.