use-apify.com
Playwright: guides & tutorials
Automate Chromium, Firefox, and WebKit with one API: resilient waits, contexts, and network hooks for modern scraping—ideal for Apify browser Actors.
Playwright automates Chromium, Firefox, and WebKit with one API, making it a top choice for scraping JavaScript-heavy sites that plain HTTP requests cannot reach. It offers resilient auto-waiting, browser contexts for isolation, and network interception for blocking or capturing requests. These guides show how to build scrapers that survive modern dynamic pages.
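As a rough illustration of contexts, auto-waiting, and network interception working together, here is a minimal sketch using Playwright's Python sync API. The names fetch_title, should_block, and BLOCKED_RESOURCE_TYPES are hypothetical, and the set of blocked resource types is an assumption chosen for a text-only scraper, not something these guides prescribe.

```python
from typing import Any

# Assumption: a text-only scraper can skip these resource types.
BLOCKED_RESOURCE_TYPES = frozenset({"image", "font", "media"})


def should_block(resource_type: str) -> bool:
    """Return True when a request of this type should be aborted."""
    return resource_type in BLOCKED_RESOURCE_TYPES


def fetch_title(url: str) -> str:
    """Open url in an isolated context, block heavy resources, return the title."""
    # Imported lazily so should_block stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    def handle_route(route: Any) -> None:
        # Network interception: abort blocked types, let everything else through.
        if should_block(route.request.resource_type):
            route.abort()
        else:
            route.continue_()

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context()  # fresh cookies and storage per context
        page = context.new_page()
        page.route("**/*", handle_route)
        page.goto(url)  # waits for the load event by default
        title = page.title()
        context.close()
        browser.close()
        return title
```

Calling fetch_title("https://example.com") requires Playwright and a Chromium build to be installed; swapping p.chromium for p.firefox or p.webkit should leave the rest of the sketch unchanged.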
Playwright shines on infinite scroll, logins, and content that loads after the initial response, and it pairs cleanly with Crawlee for queuing and Apify for cloud runs. Below you will find tutorials for browser automation, anti-detection tactics, and patterns for adding proxies and retries so your Playwright scrapers stay stable at scale.
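One common retry pattern is exponential backoff. The sketch below uses only the standard library; with_retries and its parameters are hypothetical names, and the doubling delay is an assumed policy rather than one taken from these tutorials.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    task: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run task, retrying on any exception with a delay that doubles each time."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return task()
        except Exception:
            # Out of attempts: re-raise the last error to the caller.
            if attempt == attempts - 1:
                raise
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")  # keeps type checkers satisfied
```

In a scraper, the task would typically be a function that loads and parses one page, for example with_retries(lambda: scrape_page(url)), where scrape_page is a hypothetical function of your own. The sleep parameter exists so the delay can be replaced in tests.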

Crawlee is an open-source Node.js framework from Apify that bundles everything a production scraper needs: request deduplication, auto-retry, proxy rotation, session management, persistent storage, and Playwright/Puppeteer/HTTP crawlers under one API.
Where raw Playwright requires wiring all those pieces manually, Crawlee provides them out of the box — letting you focus on extraction logic.
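To give a sense of what "wiring those pieces manually" involves, here is a standard-library sketch of just one of them, request deduplication. This is not Crawlee's API or implementation; DedupQueue is a hypothetical class, and treating URLs that differ only by fragment as duplicates is an assumption made for the example.

```python
from collections import deque
from typing import Deque, Optional, Set
from urllib.parse import urldefrag


class DedupQueue:
    """FIFO queue of URLs that silently drops ones it has already seen."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()
        self._pending: Deque[str] = deque()

    def add(self, url: str) -> bool:
        """Enqueue url; return False if an equivalent URL was added before."""
        key = urldefrag(url).url  # strip "#fragment" before comparing
        if key in self._seen:
            return False
        self._seen.add(key)
        self._pending.append(key)
        return True

    def pop(self) -> Optional[str]:
        """Return the next URL to crawl, or None when the queue is empty."""
        return self._pending.popleft() if self._pending else None
```

A real crawler would also need retries, proxy rotation, sessions, and persistent storage around this queue, which is the bookkeeping a framework takes over.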
Freshness note: Examples verified against Crawlee 3.x (March 2026). Install crawlee@latest to get the current release.

Cloudflare Bot Management (including Turnstile, Bot Score, and Managed Rules) is the most common blocker scrapers hit in 2026. It combines TLS fingerprinting, JavaScript challenges, behavioral analysis, and IP reputation scoring — none of which raw requests or fetch can handle.
This guide ranks every bypass method by effectiveness, complexity, and cost.
Legal note: Only scrape data you have a legitimate reason to access. Cloudflare protection is the site's choice; bypassing it may violate the site's terms of service and, in some jurisdictions, computer-misuse laws such as the US CFAA. Always check robots.txt and review the terms before scraping.
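The robots.txt check can be automated with Python's standard urllib.robotparser module. In this minimal sketch, is_allowed is a hypothetical helper and the user agent string "my-bot" is a placeholder; passing the check does not replace reading the site's terms.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so the text can come from any source.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Example rules: everything is allowed except paths under /private/.
EXAMPLE_ROBOTS = "User-agent: *\nDisallow: /private/\n"
```

To check a live site instead of a string, RobotFileParser also offers set_url() followed by read(), which downloads the file before can_fetch() is called.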

Python remains the dominant language for web scraping in 2026. Whether you need static HTML parsing, JavaScript-rendered pages, or production-grade crawlers, the Python ecosystem delivers: requests, BeautifulSoup, httpx, Playwright, Scrapy, and Crawlee for Python. This guide covers the full stack—libraries, comparison tables, code examples, and data storage—so you can choose and build with confidence. Try Apify for managed Python Actors or run Crawlee Python locally.
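For a taste of the static HTML parsing case, the sketch below extracts links using only the standard library's html.parser. It stands in for what BeautifulSoup would usually handle more conveniently; LinkExtractor and extract_links are hypothetical names, and the HTML is assumed to be already downloaded.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(
        self, tag: str, attrs: List[Tuple[str, Optional[str]]]
    ) -> None:
        if tag != "a":
            return
        for name, value in attrs:
            # Skip anchors without an href or with an empty one.
            if name == "href" and value:
                self.links.append(value)


def extract_links(html: str) -> List[str]:
    """Return all link targets in document order."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links
```

This only sees links present in the raw HTML; pages that build their links with JavaScript need a rendering tool such as Playwright first.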

A headless browser is a full web browser (Chromium, Firefox, or WebKit) that runs without a graphical interface. It executes JavaScript, renders HTML/CSS, handles cookies, and behaves much like a visible browser, but it is controlled programmatically and can run on servers without a display.
For web scraping, headless browsers are the solution for sites that don't work with simple HTTP requests.