
Rotating Proxies and Sessions with Apify and Crawlee

Proxy rotation is one of the most effective defenses against IP-based blocking. Crawlee, Apify's open-source crawling library, wires rotation into every crawler by default: useSessionPool is true, so each request draws a session from the pool, and each session is bound to its own proxy URL. Requests therefore spread across proxies unless you deliberately reuse one session for stickiness.

This page is the practical reference for configuring ProxyConfiguration and the session pool in Crawlee, using Apify Proxy as the backend.


Why rotate proxies?

Without rotation, every request from a run shares one IP. Sites track per-IP request rates and block IPs that cross a threshold, often within minutes of aggressive traffic.

Rotation spreads requests across many IPs so no single address burns through its quota.

Two rotation modes:

  • Per-request rotation: a different IP per request. Best for independent URLs (product pages, search results).
  • Sticky sessions: the same IP for a sequence of requests tied to a session.id. Required for login flows, carts, and paginated state.
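The two modes can be pictured with a small stand-alone selector. This is a simplified model for illustration only, not Crawlee's implementation; `createProxySelector`, `selectProxy`, and the example URLs are hypothetical.

```javascript
// Simplified model of the two rotation modes (not Crawlee's actual code).
function createProxySelector(proxyUrls) {
    let nextIndex = 0;
    const pinned = new Map(); // sessionId -> proxy URL

    // Round-robin: each call returns the next URL in the list.
    const rotate = () => proxyUrls[nextIndex++ % proxyUrls.length];

    return function selectProxy(sessionId) {
        // No session ID: per-request rotation.
        if (sessionId === undefined) return rotate();
        // Known session ID: sticky, so reuse the URL pinned to it.
        if (!pinned.has(sessionId)) pinned.set(sessionId, rotate());
        return pinned.get(sessionId);
    };
}

// Hypothetical proxy URLs, for illustration.
const selectProxy = createProxySelector([
    'http://proxy1.example.com:8080',
    'http://proxy2.example.com:8080',
]);
```

Calling `selectProxy()` with no argument walks through the list, while `selectProxy('login-1')` keeps returning whichever URL was first assigned to that ID.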

Setting Up Proxy Rotation in Crawlee

Basic ProxyConfiguration

import { CheerioCrawler, ProxyConfiguration } from 'crawlee';

const crawler = new CheerioCrawler({
    proxyConfiguration: new ProxyConfiguration({
        proxyUrls: [
            'http://user:pass@proxy1.example.com:8080',
            'http://user:pass@proxy2.example.com:8080',
        ],
    }),
    async requestHandler({ $ }) {
        // Your scraping logic
    },
});

Using Apify Proxy

Inside an Actor, call Actor.createProxyConfiguration(). It reads credentials from the environment and returns a ProxyConfiguration that plugs into any Crawlee crawler:

import { Actor } from 'apify';
import { CheerioCrawler } from 'crawlee';

await Actor.init();

const proxyConfiguration = await Actor.createProxyConfiguration({
    groups: ['RESIDENTIAL'], // 'RESIDENTIAL' or a datacenter group name
    countryCode: 'US', // Optional: geo-target requests
});

const crawler = new CheerioCrawler({
    proxyConfiguration,
    async requestHandler({ $, request }) {
        // Your scraping logic
    },
});

await crawler.run(['https://example.com']);
await Actor.exit();

Proxy product options:

Product | groups value | Speed | Detectability | Best for
Datacenter (shared) | omit, or your datacenter group | Fast | Higher | Low-protection sites, high-volume scraping
Residential | ['RESIDENTIAL'] | Slower | Lower | Protected sites, login flows, social media
Google SERP | separate apify_proxy_groups=GOOGLE_SERP endpoint | Moderate | Low | Google Search scraping

Google SERP proxy is a distinct product with its own hostname and billing (see the Apify Proxy docs), not a group value you pass to datacenter/residential configurations.


Sticky sessions

Per-request rotation breaks login sessions and multi-step workflows. For those, enable the session pool (on by default) and let Crawlee bind each Session to a specific proxy URL. Crawlee calls proxyConfiguration.newUrl(session.id) under the hood, so every request that reuses a session.id gets the same upstream IP until the session is retired.

import { Actor } from 'apify';
import { PlaywrightCrawler } from 'crawlee';

await Actor.init();

const proxyConfiguration = await Actor.createProxyConfiguration({
    groups: ['RESIDENTIAL'],
});

const crawler = new PlaywrightCrawler({
    proxyConfiguration,
    useSessionPool: true, // default: true
    persistCookiesPerSession: true, // default: true
    sessionPoolOptions: {
        maxPoolSize: 20,
        sessionOptions: {
            maxUsageCount: 50, // retire a session after 50 requests
            maxErrorScore: 3, // retire once the error score reaches 3
        },
    },
    async requestHandler({ page, session }) {
        // session.id is stable for the lifetime of this session,
        // which pins the proxy IP for every request that reuses it.
    },
});

await crawler.run(['https://example.com/login']);
await Actor.exit();

Do not call SessionPool.open() yourself and then hand it to a crawler. Crawlers construct their own pool from sessionPoolOptions. Use the standalone API only when driving HTTP calls outside a crawler (got-scraping, fetch, etc.).
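To make the maxUsageCount and maxErrorScore limits concrete, here is a minimal stand-alone model of when a session stops being usable. It is a sketch under simplifying assumptions: `SketchSession` is a hypothetical class, and Crawlee's real Session tracks more state (expiry and cookies, for example) that is left out here.

```javascript
// Simplified model of session retirement; not Crawlee's Session class.
class SketchSession {
    constructor(id, { maxUsageCount = 50, maxErrorScore = 3 } = {}) {
        this.id = id;
        this.maxUsageCount = maxUsageCount;
        this.maxErrorScore = maxErrorScore;
        this.usageCount = 0;
        this.errorScore = 0;
        this.retired = false;
    }

    // Usable until explicitly retired or until either limit is reached.
    isUsable() {
        return !this.retired
            && this.usageCount < this.maxUsageCount
            && this.errorScore < this.maxErrorScore;
    }

    // A successful request counts toward the usage limit.
    markGood() {
        this.usageCount += 1;
    }

    // A failed request counts toward both limits.
    markBad() {
        this.usageCount += 1;
        this.errorScore += 1;
    }

    // Explicit retirement, as a content-based ban check would trigger.
    retire() {
        this.retired = true;
    }
}
```

With the limits used above, a session in this model drops out after 50 requests or after three failures, whichever comes first.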


Ban detection and recovery

Crawlee auto-retires sessions whose response status matches blockedStatusCodes (default [401, 403, 429]) and retries the request with a fresh session, which (combined with sticky-session routing) means a fresh proxy IP.

Extend the blocklist and add content-based checks:

const crawler = new CheerioCrawler({
    proxyConfiguration,
    sessionPoolOptions: {
        blockedStatusCodes: [401, 403, 429, 503],
    },
    async requestHandler({ $, session }) {
        // Content-based ban check for soft blocks (200 OK + interstitial).
        if ($('title').text().includes('Access Denied')) {
            session.retire();
            throw new Error('Access denied page: retiring session');
        }
    },
});

Throwing after session.retire() is what forces Crawlee to re-queue the request; retiring alone just flags the session for disposal.
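Soft blocks rarely use a single fixed title, so the content check is easier to maintain as a small helper. The sketch below is one possible shape; `looksBlocked` and the marker strings are hypothetical and need tuning for each target site.

```javascript
// Hypothetical marker phrases; adjust them for the site being scraped.
const BLOCK_MARKERS = ['access denied', 'captcha', 'unusual traffic'];

// Returns true when the page title or body contains a known block marker.
function looksBlocked(title, bodyText = '') {
    const haystack = `${title} ${bodyText}`.toLowerCase();
    return BLOCK_MARKERS.some((marker) => haystack.includes(marker));
}
```

Inside a requestHandler this would replace the inline title check: call `looksBlocked($('title').text(), $('body').text())`, and if it returns true, retire the session and throw as shown above.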

Retry budget:

const crawler = new CheerioCrawler({
    // proxyConfiguration and requestHandler omitted for brevity
    maxRequestRetries: 3, // retries per request across sessions
    requestHandlerTimeoutSecs: 60,
    // Each retry draws a new session (and a new proxy URL) from the pool.
});
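The retry budget is easiest to reason about as one initial attempt plus up to maxRequestRetries retries. The following stand-alone sketch models that loop; it is synchronous for brevity (real handlers are async), and `runWithRetryBudget` and `drawSession` are hypothetical names rather than Crawlee APIs.

```javascript
// Simplified model: 1 initial attempt + maxRequestRetries retries,
// drawing a session from the caller-supplied pool function on each attempt.
function runWithRetryBudget(handler, drawSession, maxRequestRetries = 3) {
    const sessionIds = [];
    for (let attempt = 0; attempt <= maxRequestRetries; attempt++) {
        const session = drawSession();
        sessionIds.push(session.id);
        try {
            return { ok: true, result: handler(session), sessionIds };
        } catch (error) {
            // A real crawler would mark or retire the session here,
            // then re-queue the request for the next attempt.
        }
    }
    // Budget exhausted: the request is given up as failed.
    return { ok: false, sessionIds };
}
```

With maxRequestRetries set to 3, a request that keeps failing is attempted four times in total before it is given up.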

Geo-Targeting

Route requests through a specific country's proxy pool to get localized content:

const proxyConfiguration = await Actor.createProxyConfiguration({
    groups: ['RESIDENTIAL'],
    countryCode: 'DE', // Germany: see localized pricing, language, and content
});

Country targeting use cases:

  • E-commerce price monitoring across regions
  • Checking localized search results
  • Verifying geo-restricted content availability
  • Price discrimination detection (same product, different prices by country)
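For cross-region price checks, one approach is to build one proxy option object per country and then compare the collected prices. The helpers below are a hypothetical sketch: `proxyOptionsByCountry` and `priceSpread` are illustrative names, and the prices are assumed to be already converted to a single currency.

```javascript
// Hypothetical helper: one createProxyConfiguration() options object per country.
function proxyOptionsByCountry(countryCodes, groups = ['RESIDENTIAL']) {
    return countryCodes.map((countryCode) => ({ groups, countryCode }));
}

// Compares prices that were already normalized to one currency
// (currency conversion is out of scope for this sketch).
function priceSpread(pricesByCountry) {
    const sorted = Object.entries(pricesByCountry).sort(([, a], [, b]) => a - b);
    const [cheapest, lowest] = sorted[0];
    const [priciest, highest] = sorted[sorted.length - 1];
    return { cheapest, priciest, ratio: highest / lowest };
}
```

Each options object can be passed to Actor.createProxyConfiguration() for a separate crawl of the same product URL; a ratio well above 1 suggests the price varies by country.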

Rotation with Crawlee for Python

Crawlee for Python mirrors the JS API: construct a ProxyConfiguration, hand it to the crawler, and use the router to register handlers.

import asyncio
from crawlee.crawlers import HttpCrawler, HttpCrawlingContext
from crawlee.proxy_configuration import ProxyConfiguration

async def main():
    proxy_config = ProxyConfiguration(
        proxy_urls=[
            'http://user:pass@proxy1.example.com:8080',
            'http://user:pass@proxy2.example.com:8080',
        ],
    )

    crawler = HttpCrawler(proxy_configuration=proxy_config)

    @crawler.router.default_handler
    async def handler(context: HttpCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url}')

    await crawler.run(['https://example.com'])

asyncio.run(main())

Inside an Actor, skip the URL list and use the Apify-backed factory. It injects the Actor's proxy password and respects groups and country_code:

import asyncio
from apify import Actor
from crawlee.crawlers import HttpCrawler, HttpCrawlingContext

async def main() -> None:
    async with Actor:
        proxy_configuration = await Actor.create_proxy_configuration(
            groups=['RESIDENTIAL'],
            country_code='US',
        )
        crawler = HttpCrawler(proxy_configuration=proxy_configuration)

        @crawler.router.default_handler
        async def handler(context: HttpCrawlingContext) -> None:
            context.log.info(f'Processing {context.request.url}')

        await crawler.run(['https://example.com'])

asyncio.run(main())

Best Practices

  1. Start with datacenter proxies for testing. They're fast and cheap. Switch to residential only when the site blocks datacenter ASNs.
  2. Set maxUsageCount on sessions. A session used for 500 requests starts accumulating behavioral signals. Retire it earlier.
  3. Don't retry immediately on ban. Add a random delay (2–10 seconds) before the retry. Immediate retries look like bot behavior.
  4. Monitor ban rates per proxy group. If your datacenter ban rate exceeds 20%, switch to residential. If residential is still failing, use a CAPTCHA solver or try Apify's managed Actors for the target site.
  5. Use geo-targeting strategically. Routing US requests through a US proxy reduces geographic suspicion signals.
  6. Log session retirement rates. High retirement rates signal the site upgraded its protection, so it's time to adjust your strategy.
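Points 3 and 4 above translate into two small utilities. This is a minimal sketch, not part of Crawlee or the Apify SDK: `jitteredDelayMs` and `BanRateMonitor` are hypothetical names, and the 2–10 second window and 20% threshold are simply the values suggested in the list.

```javascript
// Random delay in [minMs, maxMs]; defaults follow the 2-10 second suggestion.
// The random source is injectable so the function can be tested.
function jitteredDelayMs(minMs = 2000, maxMs = 10000, random = Math.random) {
    return Math.round(minMs + random() * (maxMs - minMs));
}

// Tracks the ban rate per proxy group and flags groups above the threshold.
class BanRateMonitor {
    constructor(threshold = 0.2) {
        this.threshold = threshold;
        this.stats = new Map(); // group -> { total, banned }
    }

    record(group, wasBanned) {
        const entry = this.stats.get(group) ?? { total: 0, banned: 0 };
        entry.total += 1;
        if (wasBanned) entry.banned += 1;
        this.stats.set(group, entry);
    }

    banRate(group) {
        const entry = this.stats.get(group);
        return entry ? entry.banned / entry.total : 0;
    }

    // True when the group's ban rate exceeds the threshold.
    shouldEscalate(group) {
        return this.banRate(group) > this.threshold;
    }
}
```

Before a retry, the delay can be awaited with `await new Promise((resolve) => setTimeout(resolve, jitteredDelayMs()))`. Calling `record()` after each response gives a running ban rate, and `shouldEscalate('datacenter')` signals when it may be time to switch groups.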



Frequently Asked Questions

What is proxy rotation?

Proxy rotation is the practice of sending each HTTP request through a different IP address to avoid per-IP rate limits and bans. In Crawlee, ProxyConfiguration handles selection and rotation automatically: you provide the proxy pool and the SDK picks a different IP for each request (or maintains sticky sessions for multi-step flows).

When should I use per-request rotation, and when sticky sessions?

Use per-request rotation (default) for scraping independent URLs where each request stands alone: product listings, search results, profile pages. Use sticky sessions when you need to maintain state: login flows, shopping carts, paginated lists that require consistent session cookies, or any interaction where the server tracks session continuity.

How does Crawlee detect and recover from bans?

Crawlee checks HTTP response status codes against a configurable blocklist (default: 401, 403, 429). When a blocked status is detected, the current session is retired and the request is retried with a new session and a fresh proxy. You can extend this with custom logic that checks page content for CAPTCHA markers or access-denied messages.

Which proxy types does Apify provide?

Apify provides three proxy products: datacenter (fast, economical, suitable for non-hardened targets), residential (real user IPs, harder to detect, for protected sites), and Google SERP (dedicated to Google Search scraping). Datacenter and residential pools are selected through Actor.createProxyConfiguration() with the groups parameter; Google SERP is a separate product, described in the Apify Proxy docs.

Do I need residential proxies?

Datacenter proxies work for sites whose only defense is IP rate limiting. You need residential proxies when the site uses ASN-level blocking (flagging all traffic from data center IP ranges), or when fingerprinting systems detect that the IP has no normal browsing history. Start with datacenter and switch to residential only when necessary, since the cost difference is significant.
