
Selenium vs Playwright vs Puppeteer 2026: 35-55 pages/min winner

· 7 min read
Yassine El Haddad
Software Developer & Automation Specialist

I build production AI agents, web scrapers, and automation pipelines. Most of what I publish here comes from the actual problems they run into: proxies that get banned, anti-bot stacks that fingerprint your client, RAG that drifts when the underlying data moves. Stack: Python, TypeScript, Go, FastAPI, LangChain, Crawlee, Playwright, deployed on AWS, GCP, and Cloudflare.

Quick Answer

For new scraping projects in 2026, Playwright wins: it runs Chromium, Firefox, and WebKit from one install with built-in auto-wait and trace viewer, hitting ~35–55 pages/min sequentially on static URLs. Puppeteer 25 is a tighter Chrome/Firefox CDP wrapper with WebDriver BiDi support and lower idle RAM. Selenium 4 still leads when WebDriver Grid, Java/C#, or BiDi network logging are non-negotiable.

If you are choosing a driver for web scraping and automation in 2026, the decision is mostly about protocol, waiting model, and browser coverage—not brand loyalty. This guide compares Selenium, Playwright, and Puppeteer feature by feature, sketches realistic performance expectations, shows minimal starter code for each, and ends with Playwright on Apify Crawlee as the default production path.

Quick verdict

Playwright is the best choice for web scraping in 2026: it is generally faster than Selenium, covers more browsers and language bindings than Puppeteer, and ships with built-in auto-waiting. Selenium 4 is best for legacy test suites or BiDi-mandated environments.

Use Puppeteer when you are Chrome-only (or Chrome + Firefox via BiDi), Node-only, and want a minimal CDP wrapper. Use Selenium when you must integrate with existing WebDriver-based QA, non-Node stacks, or Selenium Grid that already standardised on WebDriver.

Comparison at a glance

| Dimension | Selenium | Puppeteer | Playwright |
| --- | --- | --- | --- |
| Performance | Slowest for typical DOM automation (WebDriver round-trips; manual waits add latency) | Fast on Chromium (direct CDP over WebSocket) | Fast on Chromium; comparable to Puppeteer for the same work; less polling than Selenium |
| API quality | Verbose; waits are mostly explicit (WebDriverWait, expected conditions) | Lean, low-level, CDP-centric API | Strong auto-waiting, locators, tracing, codegen |
| Browser support | Chrome, Firefox, Safari/WebKit, Edge (via drivers) | Chrome/Chromium family (including Edge); Firefox via WebDriver BiDi | Chromium, Firefox, WebKit out of the box |
| Headless support | Yes (driver + browser flags) | Yes (headless by default in v25; old "shell" mode opt-in) | Yes (consistent headless across engines) |
| Community & docs | Huge (oldest ecosystem); tons of Stack Overflow answers | Large (Chrome automation); Node-centric | Very large; active scraping/automation content |
| Best use case | Legacy WebDriver tests, orgs standardised on Selenium grids | Chrome-only bots, PDF/screenshot microservices, Node CDP scripts | Default for new scraping: multi-browser, reliable waits, Crawlee integration |

Feature-by-feature comparison

| Feature | Selenium | Puppeteer | Playwright |
| --- | --- | --- | --- |
| Wire protocol | W3C WebDriver + BiDi (Selenium 4) | CDP (WebSocket) or WebDriver BiDi (Puppeteer 25) | Playwright protocol (WebSocket; CDP where needed) |
| Auto-waiting | Manual (expected conditions, sleeps) | Partial; you often wait for selectors/navigation yourself | Built-in actionability checks before clicks/fills |
| Contexts / isolation | New driver/session per profile (heavy) | BrowserContexts exist but ergonomics weaker than Playwright | First-class BrowserContext (cookies, storage, proxy per context) |
| Parallelism | Scale via grid/workers; more moving parts | Parallel pages; watch memory and shared state | Parallel contexts/pages; designed for concurrency |
| Language bindings | Java, Python, C#, Ruby, JS, … | Node.js | Node, Python, .NET, Java |
| Stealth / fingerprinting | Community patches (fragile across versions) | Same cat-and-mouse as any Chromium driver | Same; pair with proxies and sensible crawl policy |
| Screenshots / PDF | Yes | Yes | Yes |
| Tracing / debugging | Varies by binding | CDP tools | Built-in trace viewer, codegen, UI mode |
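
The "Contexts / isolation" row is easiest to see in code. Here is a minimal sketch using Playwright's Python sync API, assuming one proxy per context. The helper names `context_options` and `titles_via_isolated_contexts` and any proxy URL you pass in are hypothetical, and Playwright is imported inside the function so the option builder can be used without a browser installed.

```python
from typing import Optional


def context_options(proxy_server: Optional[str] = None, locale: str = "en-US") -> dict:
    """Build keyword arguments for browser.new_context() (hypothetical helper)."""
    options: dict = {"locale": locale}
    if proxy_server:
        # Playwright accepts a per-context proxy as {"server": "<scheme>://host:port"}.
        options["proxy"] = {"server": proxy_server}
    return options


def titles_via_isolated_contexts(jobs: list[tuple[str, Optional[str]]]) -> list[str]:
    """Visit each (url, proxy_server) pair in its own BrowserContext.

    Each context has separate cookies and storage, so sessions do not leak
    between jobs even though they share one browser process.
    """
    # Imported lazily: requires `pip install playwright` and `playwright install`.
    from playwright.sync_api import sync_playwright

    titles = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            for url, proxy_server in jobs:
                context = browser.new_context(**context_options(proxy_server))
                try:
                    page = context.new_page()
                    page.goto(url)
                    titles.append(page.title())
                finally:
                    context.close()
        finally:
            browser.close()
    return titles
```

With Selenium, the comparable isolation usually means starting a separate driver session per profile, which is the "heavy" cost noted in the table.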

Scraping takeaway: Selenium pays a per-command HTTP tax and pushes wait complexity to you. Puppeteer and Playwright talk WebSocket to the browser and can react to events instead of polling. Playwright goes further with unified cross-browser binaries and actionability before interactions—fewer flakes on React/Vue SPAs.
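
To make the "wait complexity" point concrete, here is a simplified sketch of the polling loop that explicit waits are built on. It illustrates the model only and is not Selenium's actual implementation; `poll_until` and its parameters are hypothetical names.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(condition: Callable[[], Optional[T]], timeout: float = 15.0,
               interval: float = 0.5) -> T:
    """Call `condition` repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        # With a remote driver every check costs one round-trip, and the sleep
        # can add up to `interval` seconds of latency after the page is ready.
        time.sleep(interval)
```

Every call site has to pick a condition, a timeout, and an interval. Playwright runs its actionability checks for you before each click or fill, so that tuning mostly disappears from scraper code.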

Performance and benchmarks

Micro-benchmarks differ by site, hardware, and whether you measure raw navigation, DOM queries, or end-to-end “scrape 1k listings”. Published comparisons and production experience generally align on:

  1. WebDriver (Selenium) — Higher latency per operation than CDP/WebSocket drivers because commands are serialized HTTP requests to chromedriver (or other drivers). Throughput drops when you add explicit sleeps or aggressive polling for dynamic UI.
  2. Puppeteer vs Playwright (Chromium) — Often similar raw speed for equivalent CDP work. Differences show up in ergonomics (auto-wait, contexts, fixtures) and multi-browser needs, not a universal 2× gap either way.
  3. Flake rate — Playwright’s auto-waiting usually wins on SPA-heavy targets because fewer race conditions mean fewer retries and lower total wall time in production.
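
Point 3 can be put into rough numbers. The sketch below assumes every failed page is retried until it succeeds and that failures are independent, so the expected number of attempts per page is 1 / (1 - flake rate). The function name and the figures in the note that follows are hypothetical, not measurements.

```python
def expected_wall_time(pages: int, seconds_per_attempt: float, flake_rate: float) -> float:
    """Expected total seconds when each page is retried until it succeeds.

    Assumes independent attempts that each fail with probability `flake_rate`,
    so attempts per page follow a geometric distribution with mean 1 / (1 - p).
    """
    if not 0 <= flake_rate < 1:
        raise ValueError("flake_rate must be in [0, 1)")
    return pages * seconds_per_attempt / (1 - flake_rate)
```

For 1,000 pages at 1.5 s per attempt, a 2% flake rate gives about 1,531 s and a 20% flake rate gives 1,875 s. Under these assumptions, a driver that is slightly slower per attempt but flakes far less can still finish first.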

Treat vendor numbers as order-of-magnitude hints. Your own benchmark should use representative URLs, realistic concurrency, and the same proxy and headless settings you run in production.
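
A benchmark of your own can start as small as the sketch below. It is driver-agnostic: `fetch` is any callable you supply (a Selenium, Puppeteer-bridge, or Playwright wrapper), and the names `benchmark` and `pages_per_minute` are hypothetical.

```python
import time
from typing import Callable, Iterable


def pages_per_minute(pages: int, elapsed_seconds: float) -> float:
    """Convert a page count and wall-clock duration into pages per minute."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return pages * 60.0 / elapsed_seconds


def benchmark(fetch: Callable[[str], object], urls: Iterable[str]) -> dict:
    """Time `fetch` sequentially over `urls`, counting failures instead of aborting."""
    ok = failed = 0
    start = time.perf_counter()
    for url in urls:
        try:
            fetch(url)
            ok += 1
        except Exception:
            failed += 1
    elapsed = time.perf_counter() - start
    return {
        "ok": ok,
        "failed": failed,
        "elapsed_s": elapsed,
        # Only successful pages count toward throughput.
        "pages_per_min": pages_per_minute(ok, elapsed) if elapsed > 0 else 0.0,
    }
```

For reference, 35–55 pages/min works out to roughly 1.1–1.7 s per page. Run the same URL list through each driver with identical proxy and headless settings before comparing results.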

When to use each

  • Choose Playwright for new scrapers, multi-browser checks, SPA sites, and Crawlee-based jobs on Apify. It is the best general-purpose scraping + automation stack in 2026.
  • Choose Puppeteer if you are all-in on Chromium (optionally Firefox via WebDriver BiDi), want a smaller API surface, or maintain existing Puppeteer code with no WebKit requirement.
  • Choose Selenium if you already run Selenium Grid, corporate QA mandates WebDriver, or you need a specific legacy binding. For greenfield data extraction, it is rarely the best default.

Getting started: minimal examples

Examples are intentionally tiny—replace selectors and URLs with your targets.

Selenium (Python)

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
try:
    driver.get("https://example.com")
    # Explicit wait: poll for up to 15 s until the <h1> is visible.
    wait = WebDriverWait(driver, 15)
    el = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "h1")))
    print(el.text)
finally:
    # Always release the browser, even if the wait times out.
    driver.quit()

Playwright (Node.js)

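const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // Wait until network activity settles before reading the DOM.
    await page.goto('https://example.com', { waitUntil: 'networkidle' });
    // Locators auto-wait for the element before reading its text.
    const text = await page.locator('h1').innerText();
    console.log(text);
  } finally {
    // Close the browser even if navigation or the locator fails.
    await browser.close();
  }
})();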

Puppeteer (Node.js)

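const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch(); // headless by default in Puppeteer 25
  try {
    const page = await browser.newPage();
    // networkidle2: resolve once at most 2 network connections remain for 500 ms.
    await page.goto('https://example.com', { waitUntil: 'networkidle2' });
    // $eval throws if no <h1> exists, so the selector must already be present.
    const text = await page.$eval('h1', (el) => el.innerText);
    console.log(text);
  } finally {
    // Close the browser even if navigation or the selector fails.
    await browser.close();
  }
})();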

For Python Playwright, run pip install playwright and then playwright install to download the browser binaries; the API mirrors the Node version closely.
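
As a rough Python counterpart to the Node example, the sketch below uses Playwright's sync API. The function name `first_heading` is hypothetical, and the import sits inside the function only so the file loads before Playwright is installed.

```python
def first_heading(url: str = "https://example.com") -> str:
    """Return the text of the first <h1> on `url` using headless Chromium."""
    # Imported inside the function so this file loads even before
    # `pip install playwright` and `playwright install` have been run.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url)
            # Locators auto-wait for the element; .first avoids a strict-mode
            # error when a page has more than one <h1>.
            return page.locator("h1").first.inner_text()
        finally:
            browser.close()
```

Calling `print(first_heading())` should print the heading of example.com once the browsers are installed. Method names follow Python's snake_case (`new_page`, `inner_text`) where Node uses camelCase.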

Using Playwright with Apify Crawlee

Crawlee is an open-source crawling and browser-automation library that wraps Playwright (and Puppeteer) with queues, retries, session rotation, storage, and scaling hooks—the pieces raw scripts usually bolt on by hand.
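
Here is a minimal sketch of what that wrapping looks like, using Crawlee for Python. It assumes `pip install "crawlee[playwright]"` and an import path from recent Crawlee for Python releases, so check it against the version you install. The `heading_record` schema and the start URL are hypothetical.

```python
import asyncio


def heading_record(url: str, heading: str) -> dict:
    """Shape one dataset row (hypothetical schema for this sketch)."""
    return {"url": url, "heading": heading.strip()}


async def main() -> None:
    # Assumption: this import path matches recent Crawlee for Python releases;
    # older versions exposed the crawler under a different module.
    from crawlee.crawlers import PlaywrightCrawler

    # Cap the crawl so an experiment cannot run away.
    crawler = PlaywrightCrawler(max_requests_per_crawl=20, headless=True)

    @crawler.router.default_handler
    async def handle_page(context) -> None:
        # context.page is a regular (async) Playwright page object.
        heading = await context.page.locator("h1").first.inner_text()
        # Store one row in the default dataset.
        await context.push_data(heading_record(context.request.url, heading))
        # Queue links found on the page; Crawlee deduplicates and retries requests.
        await context.enqueue_links()

    await crawler.run(["https://example.com"])
```

Run it with `asyncio.run(main())`. The handler contains only extraction logic, while the queue, retries, and storage come from the crawler.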

On Apify you run Crawlee-based Actors on managed browsers and proxies, with scheduling, webhooks, and datasets for JSON/CSV export. That is the practical path from “works on my laptop” to scheduled production scraping without operating your own grid.

Concrete next step: create a free Apify account, start from a Playwright + Crawlee Actor template in the store, and route output to a dataset for downstream pipelines.

Run browser scraping on Apify’s free plan →

Frequently Asked Questions

Selenium is usually slower than Playwright and Puppeteer for typical DOM automation: WebDriver’s HTTP round-trip model and manual waiting cost wall-clock time compared with WebSocket drivers. Absolute numbers depend on the site, concurrency, and proxies, so benchmark your own URLs.

For Chromium-only scraping, either Puppeteer or Playwright works. Playwright adds stronger auto-waiting, richer fixtures, and an easier path to Firefox/WebKit if requirements change. Puppeteer stays valid for small Chromium-only services.

Selenium is not obsolete. It remains relevant for WebDriver-centric test orgs and multi-language grids. For new data-extraction projects, Playwright (often via Crawlee) is usually the better default.

Scrapy and Playwright solve different layers. Scrapy excels at large-scale static HTML crawling. Use Playwright when JavaScript rendering, login flows, or complex UI interactions are required, often as a downstream fetcher in a hybrid design.
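
One way to sketch such a hybrid design is to try the cheap static fetch first and fall back to a browser only when the expected content is missing. The function names and the marker heuristic below are hypothetical, and both fetchers are passed in as plain callables so the routing logic stays independent of any particular library.

```python
from typing import Callable


def needs_browser(html: str, required_marker: str) -> bool:
    """Heuristic: the static HTML lacks the content we expect to extract."""
    return required_marker not in html


def fetch_hybrid(url: str, required_marker: str,
                 fetch_static: Callable[[str], str],
                 fetch_rendered: Callable[[str], str]) -> tuple[str, str]:
    """Try the cheap HTTP fetch first; fall back to a browser only when needed.

    Returns (html, source) where source is "static" or "rendered".
    """
    html = fetch_static(url)
    if needs_browser(html, required_marker):
        return fetch_rendered(url), "rendered"
    return html, "static"
```

In practice `fetch_static` would wrap an HTTP client or a Scrapy download and `fetch_rendered` would wrap a Playwright page load. The marker can be any string that only appears once the data has rendered.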

To scale Playwright scrapers, use Crawlee for queueing, retries, and sessions, and run on Apify for infrastructure, proxies, and datasets. Avoid unbounded parallel browser instances on a single VM: cap both concurrency and memory.
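
If you are not using Crawlee, bounding concurrency by hand can look like the sketch below. It uses only asyncio; `run_bounded` and `worker` are hypothetical names, and the worker stands in for whatever coroutine opens a page and scrapes it.

```python
import asyncio
from typing import Awaitable, Callable, Iterable


async def run_bounded(urls: Iterable[str],
                      worker: Callable[[str], Awaitable[object]],
                      max_concurrency: int = 4) -> list:
    """Run `worker(url)` for every URL with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(url: str):
        # Only `max_concurrency` workers can hold the semaphore at once.
        async with semaphore:
            return await worker(url)

    # return_exceptions=True keeps one failed page from cancelling the batch.
    return await asyncio.gather(*(guarded(u) for u in urls), return_exceptions=True)
```

Results come back in input order, with exceptions returned in place rather than raised. Pick `max_concurrency` from measured memory per browser page rather than from CPU count alone.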

No driver is invisible to anti-bot systems. Combine realistic concurrency, a residential/datacenter proxy strategy, and site-appropriate throttling. For hard targets, evaluate dedicated unblocking products such as Bright Data: https://get.brightdata.com/8xa6yqyp2zxn

Start building: Apify free plan ($5 monthly platform credits) → · Browse Playwright/Crawlee templates in the Store →