use-apify.com

Scrapy: guides & tutorials

Crawl at scale with Scrapy's Twisted-based spiders, pipelines, and middleware for structured data. Production crawlers often pair exports and scheduling with Apify.

7 articles


Scrapy is a mature Python framework for crawling at scale, with built-in spiders, pipelines, and middleware for structured data. These guides cover defining spiders, processing items, and handling concurrency so large crawls stay organized and maintainable.

Scrapy excels at big, repeatable crawls but needs extra setup for JavaScript-heavy pages, where a browser or scraping API helps. Production teams often pair Scrapy exports with Apify scheduling and proxies. Below you will find tutorials, comparisons, and patterns for shipping Scrapy crawlers reliably.


Apify · 4 min read

Apify vs Scrapy 2026: Which Web Scraping Tool Should You Use?
Yassine El Haddad
Software Developer & Automation Specialist

Scrapy is the mature Python web crawling framework. Apify is a cloud platform (with Crawlee as its open-source framework, available for Node.js and Python) that handles infrastructure, scaling, and storage.

They're not direct competitors (Scrapy is a code framework, Apify is a full platform), but teams frequently choose between them. This comparison covers where each excels.

Guide · 9 min read

Complete Guide to Web Scraping with Python in 2026: Tools, Code, and Best Practices
Yassine El Haddad
Software Developer & Automation Specialist

Python remains the dominant language for web scraping in 2026. Whether you need static HTML parsing, JavaScript-rendered pages, or production-grade crawlers, the Python ecosystem delivers: requests, BeautifulSoup, httpx, Playwright, Scrapy, and Crawlee for Python. This guide covers the full stack—libraries, comparison tables, code examples, and data storage—so you can choose and build with confidence. Try Apify for managed Python Actors or run Crawlee Python locally.

Scrapy · 3 min read

Learn Scrapy on Udemy: Best Courses and Learning Path 2026
Yassine El Haddad
Software Developer & Automation Specialist

The best Udemy Scrapy courses in 2026 are Scrapy: Powerful Web Scraping & Crawling with Python (GoTrained, Lazar Telebak, 4.2★, 16K+ students), Modern Web Scraping with Python using Scrapy Splash Selenium (Ahmed Rafik, 4.6★, 24K+), and Web Scraping in Python Selenium, Scrapy + ChatGPT (4.4★). Scrapy excels at large-scale crawling; use BeautifulSoup for quick one-off parsers.


Beautiful Soup · 7 min read

Web Scraping with Python for Beginners (2026 Guide)
Yassine El Haddad
Software Developer & Automation Specialist

Python is the most popular language for collecting web data because it pairs readable syntax with strong libraries for HTTP, HTML parsing, browsers, and large-scale crawling. This guide explains when to use each approach, gives copy-paste examples, and shows where Apify fits if you want no-code or fully managed scraping.

Crawlee · 10 min read

Best Free Web Scraping Tools in 2026 (Honest Limits & Comparison)
Yassine El Haddad
Software Developer & Automation Specialist

“Free” web scraping usually means free software or a free tier—not free infrastructure. Bandwidth, headless browsers, CAPTCHAs, and IP reputation still cost money somewhere. This guide lists practical free and freemium options, what each is good for, where free breaks, and how to choose before you pay for proxies or platforms.

Quick answer

The best free web scraping tools in 2026 are Apify (free plan with $5 monthly credits), Crawlee (open-source), Beautiful Soup and Scrapy (Python open-source), Playwright (browser automation), and ParseHub (no-code, limited free tier). For managed crawling APIs with starter free credits, Firecrawl and ScraperAPI are also common starting points—each caps free usage tightly.

Automation · 6 min read

Crawlee vs. Scrapy vs. BeautifulSoup: Which Framework in 2026?
Yassine El Haddad
Software Developer & Automation Specialist

These three tools are frequently compared but rarely do the same job. BeautifulSoup is not a crawler; it's an HTML parser. Scrapy is a Python crawling framework. Crawlee is a Node.js (and Python) crawling library with first-class browser support.

Picking the wrong one means building your codebase around a tool that doesn't fit your actual target. This guide makes the differences concrete.

Crawlee · 6 min read

Python vs Node.js for Web Scraping (2026): When Each Wins
Yassine El Haddad
Software Developer & Automation Specialist

Quick answer

Python is better for data science and ML workflows (BeautifulSoup, Scrapy, Pandas). Node.js is better for JavaScript-heavy sites (Puppeteer, Playwright, Crawlee) and real-time processing.

That is a rule of thumb, not a law: both ecosystems run Playwright, both can scale in the cloud, and platforms like Apify run Python and Node Actors so you can mix languages with hosted infra.

Choosing a language for scraping is less about “which is faster in theory” and more about what you already ship, what the target site needs (static HTML vs heavy JavaScript), and where the data goes next (notebooks, warehouses, real-time APIs).

Frequently asked questions

Scrapy is a mature Python web scraping framework — fast, batteries-included, and well-tested for large structured crawls. It handles scheduling, request queuing, retry logic, data pipelines, and output feeds out of the box. It is ideal for Python shops that need to crawl thousands of pages reliably and want a proven framework rather than assembling libraries manually. For JavaScript-heavy sites, Scrapy needs a Playwright or Splash integration.

For one-off extractions or simple sites, requests + BeautifulSoup is faster to set up than Scrapy. Scrapy pays off when you need built-in concurrency, middleware, item pipelines, structured exports, and production-grade retry handling, typically when crawling hundreds or thousands of pages on a schedule. If you are a Python developer building a recurring data pipeline, Scrapy is worth the initial setup time.
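For comparison, a one-off extraction with requests and BeautifulSoup can be as short as the sketch below. The choice of `<h2>` headings as the target and both function names are assumptions made for illustration. Note what is absent: no queue, no retries, no concurrency, which is exactly the gap Scrapy fills.

```python
import requests
from bs4 import BeautifulSoup


def extract_headings(html):
    """Return the stripped text of every <h2> element in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    return [h2.get_text(strip=True) for h2 in soup.find_all("h2")]


def fetch_headings(url):
    """Download a single page and extract its <h2> headings."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return extract_headings(response.text)
```

Keeping the parsing step separate from the download makes it easy to test against a saved HTML file before hitting a live site.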

Pairing Scrapy with Redis (via scrapy-redis) distributes the request queue across multiple workers. For managed infrastructure, deploy Scrapy Actors to Apify or use Zyte's Scrapy Cloud. Tune CONCURRENT_REQUESTS, DOWNLOAD_DELAY, and the AutoThrottle settings for each target domain. For hundreds of thousands of pages, plan for distributed crawling from the start, because a single Scrapy process will eventually hit memory and CPU limits.
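The tuning knobs named above are ordinary module-level variables in a project's settings.py. The sketch below shows where they go; every numeric value is an illustrative assumption rather than a recommendation, and CONCURRENT_REQUESTS_PER_DOMAIN plus the individual AUTOTHROTTLE_* names are standard Scrapy settings that the text does not spell out.

```python
# settings.py (sketch): illustrative values, to be tuned per target domain.

# Global cap on requests in flight across all domains.
CONCURRENT_REQUESTS = 16
# Tighter cap for any single domain, so one site is not hammered.
CONCURRENT_REQUESTS_PER_DOMAIN = 4
# Base delay, in seconds, between requests to the same site.
DOWNLOAD_DELAY = 0.5

# AutoThrottle adjusts the delay dynamically from observed latencies.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0   # initial delay in seconds
AUTOTHROTTLE_MAX_DELAY = 30.0    # upper bound when the server is slow
AUTOTHROTTLE_TARGET_CONCURRENCY = 2.0  # average parallel requests per site
```

When targets need different limits, the same keys can be set per spider through its `custom_settings` class attribute instead of project-wide.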

Scrapy is Python-only. If your team primarily writes JavaScript or TypeScript, Crawlee is the closer equivalent — it provides the same framework-level structure (request queuing, retry logic, datasets) but in Node.js, with native Playwright and Puppeteer support. Both run on Apify. Choose the language your team is most productive in.