use-apify.com
Crawlee: guides & tutorials
Apify's Crawlee unifies HTTP and Playwright crawling in Node or Python: queues, storage, and anti-blocking helpers for custom scraper code.
Crawlee is Apify's open-source crawling library for Node.js and Python that unifies HTTP and headless browser scraping behind one API. It handles request queues, automatic retries, proxy rotation, and dataset storage so you write extraction logic instead of plumbing. These guides cover building real crawlers with Crawlee from scratch.
Its strength is graduating smoothly from fast HTTP crawls to Playwright or Puppeteer when a site needs a real browser, without rewriting your code. Deploy the same crawler locally or as an Apify Actor in the cloud. Below you will find step-by-step tutorials, configuration patterns, and examples for anti-blocking, concurrency, and exporting clean data.

Crawlee is an open-source Node.js framework from Apify that bundles everything a production scraper needs: request deduplication, auto-retry, proxy rotation, session management, persistent storage, and Playwright/Puppeteer/HTTP crawlers under one API.
Raw Playwright requires you to wire all those pieces together manually; Crawlee provides them out of the box, so you can focus on extraction logic.
Freshness note: Examples verified against Crawlee 3.x (March 2026). Install crawlee@latest to get the current release.
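To make two of the pieces listed above concrete, here is a minimal sketch of request deduplication and auto-retry written with only the Python standard library. It is not Crawlee's API or implementation; the names `normalize`, `crawl`, `fetch`, and `max_retries` are hypothetical, and `fetch` stands in for whatever HTTP or browser call you would otherwise wire up by hand.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Set, Tuple


def normalize(url: str) -> str:
    """Crude uniqueness key (an assumption): drop the fragment and trailing slash."""
    return url.split("#", 1)[0].rstrip("/")


def crawl(
    start_urls: List[str],
    fetch: Callable[[str], str],  # hypothetical fetcher supplied by the caller
    max_retries: int = 2,
) -> Dict[str, str]:
    """Fetch each unique URL once, re-queueing failures up to max_retries times."""
    queue: Deque[Tuple[str, int]] = deque()
    seen: Set[str] = set()
    results: Dict[str, str] = {}

    # Deduplication: only URLs with an unseen key enter the queue.
    for url in start_urls:
        key = normalize(url)
        if key not in seen:
            seen.add(key)
            queue.append((key, 0))

    while queue:
        url, attempts = queue.popleft()
        try:
            results[url] = fetch(url)
        except Exception:
            # Auto-retry: put the request back at the end of the queue
            # until its retry budget is spent, then drop it.
            if attempts < max_retries:
                queue.append((url, attempts + 1))
    return results
```

Even this toy version leaves out proxy rotation, sessions, and persistent storage, which is the wiring a framework is meant to take off your hands.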

Data engineering in 2026 is sharply divided between two extraction paradigms: using a managed API for rapid data normalization, or deploying an orchestration framework for deterministic, high-volume control.
The two dominant solutions representing these philosophies are Firecrawl (an API-first pipeline optimized for LLM ingestion) and Crawlee (the industry-standard open-source scraping framework maintained by Apify).
This guide provides a strict architectural comparison to determine which tool fits your extraction parameters.

A linear Python script with requests and a for loop over 500 URLs is not a production system. In real deployments, markup changes, socket timeouts, and bad proxy exits eventually break naive runs.
To move from a side project to production, your pipeline needs fault tolerance, state, and observability.
This guide covers four practical building blocks for running high-volume extraction reliably.
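As a rough illustration of the three requirements named above (fault tolerance, state, and observability), the sketch below wraps a plain loop with retries and exponential backoff, a JSON checkpoint file, and log output with run counters. It is a sketch under stated assumptions, not the guide's own four building blocks: `run`, `fetch`, `state_path`, and the default delays are all hypothetical names and values.

```python
import json
import logging
import time
from pathlib import Path
from typing import Callable, Dict, Iterable

log = logging.getLogger("pipeline")


def run(
    urls: Iterable[str],
    fetch: Callable[[str], str],  # hypothetical fetcher, e.g. a wrapper around requests.get
    state_path: Path,             # hypothetical checkpoint file location
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, int]:
    """Process URLs with retries, a resumable checkpoint, and run statistics."""
    # State: reload finished URLs so a restarted run does not repeat work.
    done = json.loads(state_path.read_text()) if state_path.exists() else {}
    stats = {"ok": 0, "failed": 0, "skipped": 0}

    for url in urls:
        if url in done:
            stats["skipped"] += 1
            continue
        for attempt in range(1, max_attempts + 1):
            try:
                body = fetch(url)
            except Exception as exc:
                # Observability: record every failure with its context.
                log.warning("attempt %d/%d failed for %s: %s", attempt, max_attempts, url, exc)
                # Fault tolerance: back off exponentially before the next try.
                if attempt < max_attempts:
                    sleep(base_delay * 2 ** (attempt - 1))
                continue
            done[url] = body
            state_path.write_text(json.dumps(done))  # checkpoint after each success
            stats["ok"] += 1
            break
        else:
            # The inner loop ended without a success.
            stats["failed"] += 1

    log.info("run finished: %s", stats)
    return stats
```

Running it a second time with the same `state_path` skips every URL that already succeeded, so a crash partway through 500 URLs costs only the requests that had not finished.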