Selenium vs Playwright vs Puppeteer 2026: 35-55 pages/min winner
For new scraping projects in 2026, Playwright wins: it runs Chromium, Firefox, and WebKit from one install with built-in auto-wait and trace viewer, hitting ~35–55 pages/min sequentially on static URLs. Puppeteer 25 is a tighter Chrome/Firefox CDP wrapper with WebDriver BiDi support and lower idle RAM. Selenium 4 still leads when WebDriver Grid, Java/C#, or BiDi network logging are non-negotiable.
If you are choosing a driver for web scraping and automation in 2026, the decision is mostly about protocol, waiting model, and browser coverage—not brand loyalty. This guide compares Selenium, Playwright, and Puppeteer feature by feature, sketches realistic performance expectations, shows minimal starter code for each, and ends with Playwright on Apify Crawlee as the default production path.
Quick verdict
Playwright is the best choice for web scraping in 2026 — faster than Selenium, better supported than Puppeteer, with built-in auto-waiting and multi-browser support. Selenium 4 is best for legacy test suites or BiDi-mandated environments.
Use Puppeteer when you are Chrome-only (or Chrome + Firefox via BiDi), Node-only, and want a minimal CDP wrapper. Use Selenium when you must integrate with existing WebDriver-based QA, non-Node stacks, or Selenium Grid that already standardised on WebDriver.
Comparison at a glance
| Dimension | Selenium | Puppeteer | Playwright |
|---|---|---|---|
| Performance | Slowest for typical DOM automation (WebDriver round-trips; manual waits add latency) | Fast on Chromium (direct CDP over WebSocket) | Fast on Chromium; comparable to Puppeteer for same work; less polling than Selenium |
| API quality | Verbose; waits are mostly explicit (WebDriverWait, expected conditions) | Lean, low-level CDP-centric API | Strong auto-waiting, locators, tracing, codegen |
| Browser support | Chrome, Firefox, Safari/WebKit, Edge (via drivers) | Chrome/Chromium family and Firefox (Firefox via WebDriver BiDi); no WebKit | Chromium, Firefox, WebKit out of the box |
| Headless support | Yes (driver + browser flags) | Yes (headless by default in v25; old "shell" mode opt-in) | Yes (consistent headless across engines) |
| Community & docs | Huge (oldest ecosystem); tons of Stack Overflow answers | Large (Chrome automation); Node-centric | Very large; active scraping/automation content |
| Best use case | Legacy WebDriver tests, orgs standardised on Selenium grids | Chrome-only bots, PDF/screenshot microservices, Node CDP scripts | Default for new scraping: multi-browser, reliable waits, Crawlee integration |
Feature-by-feature comparison
| Feature | Selenium | Puppeteer | Playwright |
|---|---|---|---|
| Wire protocol | W3C WebDriver + BiDi (Selenium 4) | CDP (WebSocket) or WebDriver BiDi (Puppeteer 25) | Playwright protocol (WebSocket; CDP where needed) |
| Auto-waiting | Manual (expected conditions, sleeps) | Partial; you often wait for selectors/navigation yourself | Built-in actionability checks before clicks/fills |
| Contexts / isolation | New driver/session per profile (heavy) | BrowserContexts exist but ergonomics weaker than Playwright | First-class BrowserContext (cookies, storage, proxy per context) |
| Parallelism | Scale via grid/workers; more moving parts | Parallel pages; watch memory and shared state | Parallel contexts/pages; designed for concurrency |
| Language bindings | Java, Python, C#, Ruby, JS, … | Node.js | Node, Python, .NET, Java |
| Stealth / fingerprinting | Community patches (fragile across versions) | Same cat-and-mouse as any Chromium driver | Same; pair with proxies and sensible crawl policy |
| Screenshots / PDF | Yes | Yes | Yes |
| Tracing / debugging | Varies by binding | CDP tools | Built-in trace viewer, codegen, UI mode |
Scraping takeaway: Selenium pays a per-command HTTP tax and pushes wait complexity to you. Puppeteer and Playwright talk WebSocket to the browser and can react to events instead of polling. Playwright goes further with unified cross-browser binaries and actionability before interactions—fewer flakes on React/Vue SPAs.
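To make the waiting-model difference concrete, here is a minimal sketch of the explicit polling loop that WebDriver-style scripts tend to rely on. The names `wait_until` and `FakePage` are hypothetical and belong to none of the three libraries. The clock and sleep functions are injectable, so the loop can be exercised without a browser.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(
    condition: Callable[[], Optional[T]],
    timeout: float = 15.0,
    interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Poll `condition` until it returns a truthy value or `timeout` elapses."""
    deadline = clock() + timeout
    while True:
        result = condition()
        if result:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        sleep(interval)  # every missed poll costs one full interval


class FakePage:
    """Hypothetical stand-in for a page whose heading appears after a few polls."""

    def __init__(self, ready_after: int) -> None:
        self.polls = 0
        self.ready_after = ready_after

    def find_heading(self) -> Optional[str]:
        self.polls += 1
        return "Example Domain" if self.polls >= self.ready_after else None
```

Calling `wait_until(FakePage(ready_after=3).find_heading)` returns the heading on the third poll. An event-driven driver can react as soon as the element is actionable instead of waiting out the remainder of a polling interval, and that is where the latency difference comes from.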
Performance and benchmarks
Micro-benchmarks differ by site, hardware, and whether you measure raw navigation, DOM queries, or end-to-end “scrape 1k listings”. Published comparisons and production experience generally align on:
- WebDriver (Selenium): Higher latency per operation than CDP/WebSocket drivers because commands are serialized HTTP requests to chromedriver (or other drivers). Throughput drops when you add explicit sleeps or aggressive polling for dynamic UI.
- Puppeteer vs Playwright (Chromium): Often similar raw speed for equivalent CDP work. Differences show up in ergonomics (auto-wait, contexts, fixtures) and multi-browser needs, not a universal 2× gap either way.
- Flake rate: Playwright's auto-waiting usually wins on SPA-heavy targets because fewer race conditions mean fewer retries and lower total wall time in production.
Treat vendor numbers as order-of-magnitude hints. Your own benchmark should use representative URLs, realistic concurrency, and the same proxy and headless settings you run in production.
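A benchmark of this kind can be sketched in a few lines. The helper below is a hypothetical harness, not part of any driver: `fetch` stands for whatever your script does per page (open the URL, read a field), and the clock is injectable so the arithmetic can be checked without network access.

```python
import time
from typing import Callable, Iterable


def pages_per_minute(
    urls: Iterable[str],
    fetch: Callable[[str], object],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Fetch each URL sequentially and return throughput in pages per minute."""
    url_list = list(urls)
    if not url_list:
        raise ValueError("need at least one URL to benchmark")
    start = clock()
    for url in url_list:
        fetch(url)  # assumption: your own driver call goes here
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("elapsed time too small to measure")
    return len(url_list) / elapsed * 60.0
```

For example, four pages that take 1.5 seconds each finish in 6 seconds, which works out to 40 pages/min. Run the same harness with each driver, against the same URLs and proxy settings, before comparing numbers.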
When to use each
- Choose Playwright for new scrapers, multi-browser checks, SPA sites, and Crawlee-based jobs on Apify. It is the best general-purpose scraping + automation stack in 2026.
- Choose Puppeteer if you are all-in on Chromium (or Chrome plus Firefox via BiDi), want a smaller API surface, or maintain existing Puppeteer code with no WebKit requirement.
- Choose Selenium if you already run Selenium Grid, corporate QA mandates WebDriver, or you need a specific legacy binding. For greenfield data extraction, it is rarely the best default.
Getting started: minimal examples
Examples are intentionally tiny—replace selectors and URLs with your targets.
Selenium (Python)
```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()  # starts a local Chrome session
try:
    driver.get("https://example.com")
    # Explicit wait: poll for up to 15 s until the <h1> is visible.
    wait = WebDriverWait(driver, 15)
    el = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "h1")))
    print(el.text)
finally:
    driver.quit()  # always release the browser and driver process
```
Playwright (Node.js)
```javascript
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto('https://example.com', { waitUntil: 'networkidle' });
    // Locators auto-wait for the element before reading its text.
    const text = await page.locator('h1').innerText();
    console.log(text);
  } finally {
    await browser.close(); // close the browser even if the scrape throws
  }
})();
```
Puppeteer (Node.js)
```javascript
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch(); // headless by default in Puppeteer 25
  try {
    const page = await browser.newPage();
    await page.goto('https://example.com', { waitUntil: 'networkidle2' });
    // $eval runs the callback in the page against the first matching element.
    const text = await page.$eval('h1', (el) => el.innerText);
    console.log(text);
  } finally {
    await browser.close(); // close the browser even if the scrape throws
  }
})();
```
For Python Playwright, run pip install playwright, then playwright install to download the browser binaries; the API mirrors the Node version closely, with snake_case method names.
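As a rough illustration of how closely the Python binding tracks the Node one, here is a sketch of the same h1 scrape using Playwright's sync API. The helper names `extract_heading` and `main` are illustrative. The import sits inside `main` so the extraction helper can be reused, or tested against a stub page, on a machine without browsers installed.

```python
def extract_heading(page, url: str, selector: str = "h1") -> str:
    """Navigate `page` to `url` and return the text of the first match for `selector`."""
    page.goto(url, wait_until="networkidle")
    # Locators auto-wait for the element before reading its text.
    return page.locator(selector).inner_text()


def main() -> None:
    # Imported lazily: requires `pip install playwright` and `playwright install`.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            print(extract_heading(page, "https://example.com"))
        finally:
            browser.close()  # close the browser even if the scrape throws
```

Call `main()` to run it. Swapping `p.chromium` for `p.firefox` or `p.webkit` is the only change needed to target another engine.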
Using Playwright with Apify Crawlee
Crawlee is an open-source crawling and browser-automation library that wraps Playwright (and Puppeteer) with queues, retries, session rotation, storage, and scaling hooks—the pieces raw scripts usually bolt on by hand.
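To show what "bolting on by hand" tends to look like, here is a deliberately small sketch of a request queue with retries in plain Python. This is not Crawlee's API. `crawl_with_retries` and its `handler` callback are hypothetical names, and a real crawler would add persistence, backoff, and session handling on top.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


def crawl_with_retries(
    start_urls: List[str],
    handler: Callable[[str], dict],
    max_retries: int = 2,
) -> Tuple[List[dict], List[str]]:
    """Process URLs from a FIFO queue, re-enqueueing failures up to `max_retries` times."""
    queue: Deque[str] = deque(dict.fromkeys(start_urls))  # de-duplicate, keep order
    attempts: Dict[str, int] = {}
    results: List[dict] = []
    failed: List[str] = []
    while queue:
        url = queue.popleft()
        attempts[url] = attempts.get(url, 0) + 1
        try:
            results.append(handler(url))  # assumption: handler scrapes one page
        except Exception:
            if attempts[url] <= max_retries:
                queue.append(url)  # retry later, behind the remaining URLs
            else:
                failed.append(url)  # give up after 1 + max_retries attempts
    return results, failed
```

Even this toy version has to decide on de-duplication, retry ordering, and when to give up. A crawling library makes those decisions once so each scraper does not have to.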
On Apify you run Crawlee-based Actors on managed browsers and proxies, with scheduling, webhooks, and datasets for JSON/CSV export. That is the practical path from “works on my laptop” to scheduled production scraping without operating your own grid.
Concrete next step: create a free Apify account, start from a Playwright + Crawlee Actor template in the store, and route output to a dataset for downstream pipelines.
FAQ
Is Selenium slower than Playwright and Puppeteer?
For typical DOM automation, yes: WebDriver's HTTP round-trip model and manual waiting usually lose wall-clock time versus WebSocket drivers. Absolute numbers depend on the site, concurrency, and proxies, so benchmark your own URLs.
Should I use Puppeteer or Playwright for Chrome-only scraping?
Either works. Playwright adds stronger auto-waiting, richer fixtures, and an easier path to Firefox/WebKit if requirements change. Puppeteer stays valid for small Chromium-only services.
Is Selenium obsolete in 2026?
No. It remains relevant for WebDriver-centric test orgs and multi-language grids. For new data-extraction projects, Playwright (often via Crawlee) is usually the better default.
Should I use Scrapy or Playwright?
They solve different layers. Scrapy excels at large-scale static HTML crawling. Use Playwright when JavaScript rendering, login flows, or complex UI interactions are required, often as a downstream fetcher in a hybrid design.
How do I run Playwright scraping at scale?
Use Crawlee for queueing, retries, and sessions; run on Apify for infrastructure, proxies, and datasets. Avoid unbounded parallel browser instances on a single VM: cap concurrency and memory.
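Bounding concurrency can be sketched with a semaphore. The helper below is a generic asyncio pattern rather than anything specific to Crawlee or Apify. `run_bounded` and `worker` are hypothetical names, with `worker` standing for a coroutine that opens a page and returns one scraped record.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_bounded(
    urls: List[str],
    worker: Callable[[str], Awaitable[dict]],
    max_concurrency: int = 4,
) -> List[dict]:
    """Run `worker` over all URLs with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(url: str) -> dict:
        async with semaphore:  # blocks here once the cap is reached
            return await worker(url)

    # gather preserves input order in its result list
    return await asyncio.gather(*(guarded(u) for u in urls))
```

With `max_concurrency=3` and ten URLs, no more than three workers (and therefore three pages or contexts) are ever open at once, which keeps memory use predictable on a single VM.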
Can any of these drivers avoid bot detection?
No driver is invisible. Combine realistic concurrency, a residential/datacenter proxy strategy, and site-appropriate throttling. For hard targets, evaluate dedicated unblocking products such as Bright Data: https://get.brightdata.com/8xa6yqyp2zxn
Start building on the Apify free plan ($5 in monthly platform credits), or browse the Playwright/Crawlee templates in the Store.




