use-apify.com
Web scraping: guides & tutorials
Extract structured data from websites with code: crawling, parsing, and anti-bot handling for engineers building datasets and automations on Apify.
Web scraping turns public web pages into structured datasets you can analyze, monitor, or feed into AI. These guides cover the full workflow: choosing between HTTP requests and headless browsers, parsing HTML with CSS or XPath selectors, handling pagination and infinite scroll, and getting past rate limits and bot detection without breaking sites or laws.
Whether you write your own scraper in Python or JavaScript or run a ready-made actor from the Apify Store, the patterns are the same. Start with a small, well-behaved crawl, add proxies and retries as targets get stricter, and export clean JSON or CSV your pipeline can trust. The tutorials below take you from a first script to production crawls running on a schedule.
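As a rough illustration of the "retries, then clean export" pattern described above, here is a minimal Python sketch using only the standard library. The helper names (`fetch_with_retries`, `to_json`, `to_csv`) are hypothetical and chosen for this example. The `fetch` argument stands in for whatever HTTP client or browser call your scraper actually uses, and the backoff schedule is an assumption, not a recommendation tuned for any particular site.

```python
import csv
import io
import json
import time
from typing import Callable


def fetch_with_retries(
    fetch: Callable[[str], str],
    url: str,
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch(url), retrying with exponential backoff on any exception."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait 1x, 2x, 4x ... the base delay before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


def to_json(rows: list[dict]) -> str:
    """Serialize scraped rows as pretty-printed JSON."""
    return json.dumps(rows, ensure_ascii=False, indent=2)


def to_csv(rows: list[dict], fields: list[str]) -> str:
    """Serialize scraped rows as CSV with a fixed column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Missing keys become empty cells; keys not in `fields` are dropped.
        writer.writerow({name: row.get(name, "") for name in fields})
    return buf.getvalue()
```

Passing `sleep` as a parameter keeps the backoff testable without real waiting. In a real crawl you would likely retry only on specific errors (timeouts, HTTP 429 or 5xx) rather than on every exception.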

Bright Data and ScraperAPI solve the same core problem: getting your scraper past anti-bot systems. They go about it very differently.
ScraperAPI is a lightweight proxy pass-through API. Send a URL, get HTML back. Simple, cheap at small scale, and you own the scraper logic.
Bright Data is an enterprise proxy network plus managed datasets and a cloud browser. More powerful unblocking, more features, higher price tag.
This is a split-decision comparison. Neither is universally "better" — it depends on your volume, target sites, and budget.
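To make the "send a URL, get HTML back" model concrete, here is a small Python sketch of how a pass-through scraping API is typically called. The endpoint and the `api_key`, `url`, and `render` parameter names reflect our understanding of ScraperAPI's public API and should be treated as assumptions. Confirm them against the vendor's current documentation before relying on them. The function names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import urlopen

# Assumed endpoint; verify against the provider's current docs.
API_ENDPOINT = "https://api.scraperapi.com/"


def build_proxy_api_url(api_key: str, target_url: str, render_js: bool = False) -> str:
    """Build the request URL that asks the API to fetch target_url for us."""
    params = {"api_key": api_key, "url": target_url}
    if render_js:
        # Assumed flag for JavaScript rendering on the provider side.
        params["render"] = "true"
    # urlencode percent-escapes the target URL so its own query string survives.
    return f"{API_ENDPOINT}?{urlencode(params)}"


def fetch_html(api_key: str, target_url: str, timeout: float = 60.0) -> str:
    """Fetch the target page's HTML through the pass-through API."""
    with urlopen(build_proxy_api_url(api_key, target_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

The point of the sketch is the division of labor: the API handles proxies and unblocking, while parsing, retries, and storage stay in your own code.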

Most proxy vendors make you choose: raw IPs at good prices from a scrappy provider, or a polished dashboard from an enterprise vendor charging three times more. Proxy-Seller has spent a decade building a third option: all five proxy classes (datacenter IPv4/IPv6, ISP, residential, mobile) under one account, at prices that stay competitive even against single-category specialists, with compliance certs that survive legal review.
They've been running since 2014. 500,000+ clients. Every IP exclusively yours, never shared. The residential pool spans 20M+ IPs across 220+ countries.
Is it the right pick for your stack? That depends on whether you want raw IPs you control, or a managed extraction layer someone else runs. This review walks through verified pricing for all five proxy types, the three discount mechanisms that stack, where Proxy-Seller is genuinely strong, and two scenarios where a different provider will serve you better.

OpenClaw is a self-hosted AI assistant gateway: it connects chat channels (Telegram, Discord, web, and more) and tools to an LLM you choose—often Ollama or vLLM on your own hardware, or a cloud API when you accept that tradeoff. It is not a foundation model; it is orchestration you run yourself.
In March 2026 the project drew unusual attention—including a milestone our editors cited in the weekly roundup (Top 10 AI and tech stories this week). This is time-stamped commentary, not a substitute for upstream docs: channel lists, defaults, and feature names change; confirm behavior, licensing, and security advisories in the official project before production. The piece separates what that attention reflects from what still depends on your own ops discipline, and shows where OpenClaw sits next to local inference, workflow automation, and data collection layers.

Clay (now Mesh) does a lot of the heavy lifting when you connect email, calendar, LinkedIn, and Twitter. What it won’t do on its own is keep polling the open web forever: enrichment tends to reflect what was true when the contact landed in your book, not every headline or title change afterward.
Apify is where scheduled scraping helps — job moves, company news, fresh posts, GitHub activity — then you fold those findings back into Mesh as notes or updates.
Here are three workflows that combine the two without pretending there’s a single “native” button for it.

Amazon is a primary source for product pricing, review sentiment, and competitive research. Scraping it with a hand-rolled script is notoriously difficult: Amazon deploys heavy bot protection, renders much of its content with JavaScript, and varies prices by region.
Apify's Amazon scrapers handle all of this with residential proxies, CAPTCHA solving, and structured output. No code required.
Legal note: Amazon's Terms of Service prohibit unauthorized scraping. Only scrape publicly displayed pricing data for research, price comparison, and competitive intelligence. Never create accounts programmatically or access private data.

An Apify Actor is a serverless scraper or automation packaged for cloud execution. You write standard Node.js code, push it to Apify, and it runs on demand — with built-in proxies, storage, scheduling, and API access included.
This tutorial takes you from an empty folder to a deployed, runnable Actor in about 20 minutes.
Freshness note: Steps verified with Apify CLI 3.x and Apify SDK 3.x (March 2026).