Best e-commerce scrapers on Apify

Q: Which Apify scraper works across the most e-commerce sites?

apify/e-commerce-scraping-tool covers Amazon, Walmart, eBay, Alibaba, Etsy, and many regional/local stores with one input schema. For a single marketplace, the dedicated Actor returns richer fields: ASIN and Buy Box for Amazon, usItemId and variantList for Walmart.

Q: Do I need residential proxies for Amazon?

Yes, effectively. Datacenter IPs increasingly return listings with missing prices or wrong currency. The junglee Actor auto-selects proxy country from the domain. Residential proxy is billed separately from Actor compute, so include it in your $/1k estimate.

Q: How much does it cost to track 500 SKUs hourly?

On Amazon at $3/1k results: 500 SKUs × 24 runs/day × 30 days = 360k results/month, or $1,080 in Actor fees plus residential proxy. On Shopify storefronts the same volume costs under $10 because the data is public JSON. Drop fields you don't need (reviews, variants) to keep long-running price schedules cheap.

Q: Can I track prices over time?

Schedule the Actor and append each run to a table keyed by (stable_id, timestamp), where stable_id is asin for Amazon, usItemId for Walmart, or the variant SKU for Shopify. Graph the deltas, or fire a webhook alert when a price crosses a threshold. Avoid running maximal-field Actors on a price schedule; you're paying for data you discard.
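A minimal in-memory sketch of that table, assuming each run has already been flattened to records carrying the retailer's stable ID plus a plain numeric `price` (the real Amazon output nests it under `price.value`). The `PriceHistory` class, the retailer labels, and the percentage threshold are illustrative; in production the dict would be a BigQuery or Postgres table with the same (stable_id, timestamp) key.

```python
from dataclasses import dataclass, field

# Stable-ID field per retailer, as described above. The retailer labels and the
# flat {"<id field>": ..., "price": ...} record shape are assumptions for this sketch.
STABLE_ID_FIELD = {"amazon": "asin", "walmart": "usItemId", "shopify": "sku"}


@dataclass
class PriceHistory:
    # (stable_id, timestamp) -> price; an in-memory dict stands in for the real table
    rows: dict = field(default_factory=dict)

    def append_run(self, retailer, items, timestamp):
        """Append one scheduled run's results under the retailer's stable ID."""
        id_field = STABLE_ID_FIELD[retailer]
        for item in items:
            self.rows[(item[id_field], timestamp)] = item["price"]

    def _series(self, stable_id):
        # ISO-8601 timestamps in one consistent format sort chronologically as strings
        return sorted((ts, p) for (sid, ts), p in self.rows.items() if sid == stable_id)

    def deltas(self, stable_id):
        """(timestamp, change vs previous run) for one product."""
        pts = self._series(stable_id)
        return [(ts, round(p - prev, 2)) for (_, prev), (ts, p) in zip(pts, pts[1:])]

    def breaches(self, threshold_pct):
        """(stable_id, timestamp, % change) wherever a run moved price by >= threshold_pct."""
        out = []
        for sid in sorted({sid for sid, _ in self.rows}):
            pts = self._series(sid)
            for (_, prev), (ts, p) in zip(pts, pts[1:]):
                if prev == 0:
                    continue  # skip bad rows instead of dividing by zero
                change = (p - prev) / prev * 100
                if abs(change) >= threshold_pct:
                    out.append((sid, ts, round(change, 1)))
        return out
```

Two hourly runs where an ASIN drops from 20.00 to 17.00 give `deltas(...) == [(ts, -3.0)]` and a `-15.0` entry from `breaches(10)`; that list is what you'd forward to the alert webhook.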

Q: Do these Actors handle CAPTCHAs?

Maintained Actors bundle browser automation, retries, and proxy rotation that absorb most soft challenges. Hard CAPTCHA rates depend on proxy quality and concurrency more than the Actor code. Lower maxConcurrency and switch to residential before adding a solver.
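That escalation order can be written down as a small decision function. This is a sketch under assumptions: `blocked_ratio` is a hypothetical metric you'd compute yourself (say, the share of items that came back with empty prices or a challenge page), `RunSettings` is not an Apify type, and the 5% tolerance is arbitrary. It reads the advice as lowering concurrency first, then switching to residential, and only then reaching for a solver.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RunSettings:
    max_concurrency: int      # what you'd pass as the Actor's maxConcurrency input
    proxy_type: str           # "datacenter" or "residential" (labels for this sketch only)
    use_solver: bool = False


def escalate(s, blocked_ratio, tolerated=0.05, min_concurrency=2):
    """Next run's settings: lower concurrency first, then residential, solver last."""
    if blocked_ratio <= tolerated:
        return s  # soft challenges are being absorbed; leave the settings alone
    if s.max_concurrency > min_concurrency:
        return replace(s, max_concurrency=max(min_concurrency, s.max_concurrency // 2))
    if s.proxy_type != "residential":
        return replace(s, proxy_type="residential")
    return replace(s, use_solver=True)
```

Starting from 20 concurrent requests on datacenter IPs, repeated bad runs step through 10, 5, 2, then residential, and only after that set `use_solver`.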

Q: Where do I browse all e-commerce Actors?

Apify Store, ecommerce category: https://apify.com/store/categories?search=ecommerce&fpr=use-apify

Pick the Actor by retailer, not by hype. Amazon needs Buy Box logic and residential proxies; Shopify is a JSON endpoint away; Walmart punishes datacenter IPs; eBay exposes auction mechanics you can't get from a generic scraper. The Actors below are the ones I actually schedule on Apify for weekly price and catalog pulls.

Quick Answer

For a single marketplace, use the dedicated Actor: you get richer fields (ASIN, Buy Box, usItemId, variantList). For a mixed SKU map spanning 3+ retailers, the E-commerce Scraping Tool trades some field depth for one input schema.

Browse the full category: E-commerce Actors →

Comparison table

Actor	Platform	Pricing	Typical fields	Best for
Amazon Product Scraper	Amazon (all locales)	from $3 / 1k results	ASIN, price, listPrice, variants, Buy Box, starsBreakdown, shipping	Catalog + dynamic pricing
Amazon Reviews Scraper	Amazon reviews	from $3 / 1k reviews	Stars, text, verified flag, date, images, reactions	VOC, defect detection
E-commerce Scraping Tool	Amazon, Walmart, eBay, Alibaba, Etsy, regional	Pay-per-event	Name, price+currency, SKU/MPN/GTIN/EAN/UPC, stock, rating	Cross-retailer basket
eBay Items Scraper	eBay	$50 / month + usage	Price, wasPrice, seller, condition, auction/BIN	Resale, arbitrage
Walmart Product Detail Scraper	walmart.com/.ca/.com.mx	Pay-per-usage (~1,250 PDPs / $5 credit)	usItemId, priceInfo, variantList, sellerId, availability	Walmart PDP ingestion
Shopify Scraper	Any Shopify storefront	$5 / month + usage	Title, description, price, SKU, variants, stock, currency-normalized	DTC competitor tracking
Etsy Scraper	Etsy	$30 / month + usage	Listing, variation prices, shop, rating	Handmade / long-tail

Pricing, user counts, and ratings on Store listings drift, so open each listing before committing to a schedule.

Cost per 1,000 products (rule of thumb)

Stack	Pricing model	~$/1k products
Amazon (junglee)	per-result	~$3
Amazon Reviews (junglee)	per-review	~$3 / 1k reviews (≠ 1k products)
Multi-site (apify/e-commerce)	pay-per-event	varies by retailer; Amazon/Walmart are pricier than regional stores
Shopify (autofacts)	subscription + compute	cents per 1k once the $5 base is covered; Shopify's JSON endpoint is cheap
Walmart (e-commerce)	pay-per-usage	~$4 / 1k PDPs on Starter tier

Shopify is the outlier on the cheap side because the data comes straight from a public JSON endpoint. Amazon and Walmart sit at the expensive end because you're paying for proxies and browser time, not just parsing.

Which Actor for which job

Amazon: use the dedicated stack, not the generic one

junglee/amazon-crawler returns the fields an Amazon operator actually cares about: asin, price.value, listPrice, priceVariants, starsBreakdown, seller, shipping. The multi-site tool will give you price and rating, but it won't give you the Buy Box seller or variant-level pricing that drives repricing logic. Pair with junglee/amazon-reviews-scraper when you need review text; the product Actor omits review bodies to keep runs fast.

Residential proxy is not optional on Amazon. Datacenter IPs increasingly return listings without prices or with the wrong marketplace currency. The junglee Actor auto-picks proxy country from the domain; override apifyProxyCountry only if you're deliberately geo-testing.

eBay: auction state matters

Generic scrapers flatten eBay listings into "price." That's wrong for auctions: you need price, wasPrice, bid count, and BIN flag separately. dtrungtin/ebay-items-scraper surfaces those. It's a subscription Actor ($50/mo flat + usage), so it only pencils out above a few thousand listings/month; for smaller jobs, the multi-site tool is cheaper even with shallower fields.
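To keep auction state separate downstream, summarize the two listing types independently. The field names here (`price`, `wasPrice`, `bidCount`, `isBuyItNow`) are hypothetical stand-ins for whatever dtrungtin/ebay-items-scraper actually emits; check a sample item and rename accordingly.

```python
from statistics import median


# Hypothetical field names: price, wasPrice, bidCount, isBuyItNow.
def summarize_listings(items):
    """Keep auctions and Buy It Now listings apart instead of flattening them into one price."""
    bin_items = [it for it in items if it["isBuyItNow"]]
    auctions = [it for it in items if not it["isBuyItNow"]]
    bin_prices = [it["price"] for it in bin_items]
    return {
        # asking prices are comparable across BIN listings
        "bin_median": median(bin_prices) if bin_prices else None,
        # a BIN listing priced under its wasPrice is on promotion
        "bin_discounted": sum(1 for it in bin_items if (it.get("wasPrice") or 0) > it["price"]),
        # an auction's current price is not a sale price until it closes
        "open_auctions": len(auctions),
        # with zero bids, the current price is only the starting bid
        "auctions_without_bids": sum(1 for it in auctions if it.get("bidCount", 0) == 0),
    }
```

The point of the design is that an auction's current bid never leaks into `bin_median`, which is the number you'd actually use for resale or arbitrage comps.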

Walmart: proxy quality is the whole game

Walmart's bot defense is more aggressive than Amazon's on datacenter ranges. e-commerce/walmart-product-detail-scraper is maintained by Apify's internal team and bundles the right proxy config. The output has priceInfo.price, priceInfo.wasPrice, variantList[] with per-variant availabilityStatus, sellerId, and usItemId (Walmart's ASIN equivalent). For search-listing ingestion, pair with e-commerce/walmart-fast-product-scraper.

Shopify: it's a JSON endpoint, so stop overpaying

Shopify storefronts expose /products.json publicly by default. autofacts/shopify wraps that with pagination, currency normalization, and a monitoring mode. You don't need a headless browser here; runs are fast and cheap. The ceiling is ~5,000 products per store via the public endpoint; beyond that you'll need explicit collection URLs.
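If you pull `/products.json` yourself (or want to reshape the Actor's output), one row per variant is the shape that joins cleanly across runs. The keys below (`handle`, `vendor`, `variants`, `sku`, `price`, `compare_at_price`, `available`) are the ones Shopify's public storefront JSON typically carries; autofacts/shopify may rename them, and it also normalizes currency, which this sketch does not.

```python
import json


def variant_rows(products_json, store):
    """Flatten a products.json payload into one row per variant."""
    rows = []
    for product in json.loads(products_json)["products"]:
        for v in product["variants"]:
            compare_at = v.get("compare_at_price")
            rows.append({
                "store": store,
                "vendor": product.get("vendor"),
                "handle": product["handle"],
                # fall back to the variant id when a store leaves sku blank
                "sku": v.get("sku") or str(v["id"]),
                # prices arrive as strings such as "19.00"
                "price": float(v["price"]),
                "compare_at_price": float(compare_at) if compare_at else None,
                "available": v.get("available"),
            })
    return rows
```

Rows in this shape drop straight into a daily append keyed on (vendor, handle, date) or (store, sku, date).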

Etsy: handmade = variation prices

Etsy listings routinely have 10+ variants with different prices. epctex/etsy-scraper exposes includeVariationPrices: turn it on if you're benchmarking or reselling, and leave it off if you only need headline listing prices, since runtime drops sharply without it.

Mixed basket: the multi-site Actor

If you're tracking 200 SKUs spread across Amazon, Walmart, and three Shopify DTC brands, maintaining a separate schema and proxy config for each retailer burns engineering time. apify/e-commerce-scraping-tool accepts product URLs from any supported domain and returns a normalized schema (name, price, priceCurrency, SKU/MPN/GTIN/EAN/UPC, stock, rating). Trade-off: you lose Amazon's Buy Box field and Walmart's variantList structure.
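The normalized schema makes a cross-retailer join on GTIN possible. A sketch, assuming lowercase `gtin`, `price`, `priceCurrency`, and a boolean `inStock` key (the Actor's actual key names may differ). GTIN-8, UPC-12, and EAN-13 codes are zero-padded to 14 digits, the GS1 storage convention, so the same product listed with a UPC on one site and an EAN on another still matches.

```python
def normalize_gtin(code):
    """Zero-pad GTIN-8/12/13 codes to 14 digits so a UPC and its EAN form compare equal."""
    digits = "".join(ch for ch in str(code) if ch.isdigit())
    return digits.zfill(14) if digits else None


def cheapest_by_gtin(items):
    """Cheapest in-stock offer per (GTIN, currency) across retailers."""
    best = {}
    for it in items:
        gtin = normalize_gtin(it.get("gtin") or "")
        if gtin is None or not it.get("inStock", True):
            continue  # no joinable ID, or out of stock
        key = (gtin, it["priceCurrency"])  # never compare USD against EUR directly
        if key not in best or it["price"] < best[key]["price"]:
            best[key] = it
    return best
```

Items without any GTIN drop out of the join, which is the practical cost of the multi-site schema: coverage of that field varies by retailer.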

Price tracking vs one-shot extraction

These are different jobs with different Actor choices.

One-shot catalog build (competitor teardown, PIM seed, MAP baseline): run once, capture everything, export to Sheets/BigQuery. Use the richest Actor you can afford; field coverage matters more than run cost.

Price tracking over time (dynamic pricing, MAP enforcement, availability alerts): run on a schedule, append to a table keyed by (asin_or_sku, timestamp), and compute deltas. Run cost matters more than field depth: you only need price, availability, and maybe sellerId. Budget math: 500 SKUs × 24 runs/day × 30 days = 360k results/month. At $3/1k that's $1,080/month in Actor fees on Amazon before residential proxy; the same volume on a Shopify store is under $10.
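The budget math as a function, so you can try other SKU counts and schedules. `proxy_usd_per_1k` is a hypothetical knob: residential proxy is billed separately (often by traffic), so convert your own proxy bill into a per-1k figure before plugging it in.

```python
def monthly_cost(skus, runs_per_day, days, usd_per_1k, proxy_usd_per_1k=0.0):
    """Results per month and estimated USD (Actor fee plus optional proxy cost per 1k results)."""
    results = skus * runs_per_day * days
    cost = results / 1000 * (usd_per_1k + proxy_usd_per_1k)
    return results, round(cost, 2)
```

`monthly_cost(500, 24, 30, 3.0)` returns `(360000, 1080.0)`, matching the figure above; dropping to one run a day gives `(15000, 45.0)`, which is usually the first lever to pull.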

A common mistake: running the maximal-field Actor on a daily price schedule. You're paying for review scraping and variant expansion every day when you only need price. For long-running price jobs, pick the minimal Actor.

Proxy requirements by retailer

Retailer	Minimum proxy	Notes
Amazon	Residential, country-matched	Datacenter IPs return listings without prices
Walmart	Residential	Triggers interstitial on datacenter
eBay	Datacenter often works	Auction/search pages are relatively open
Shopify (public JSON)	None required	`/products.json` is meant to be public
Etsy	Datacenter usually works	Bump to residential if you hit rate limits

Apify Proxy with residential IPs is billed separately from Actor compute, so factor it into the $/1k math; it's often larger than the Actor fee itself.

Field coverage cheatsheet

What you should expect from a well-built retailer Actor vs a generic one.

Field	Why it matters	Amazon (junglee)	Walmart (e-commerce)	Multi-site
Stable product ID	Joining across runs	`asin`	`usItemId`	Mixed (SKU/MPN)
Buy Box seller	Repricing, counterfeit detection	✅	✅ (`sellerId`)	❌
Variant-level prices	Size/color pricing	✅ (`priceVariants`)	✅ (`variantList[]`)	❌
Was-price / list price	Promo detection	✅ (`listPrice`)	✅ (`wasPrice`)	Partial
Stars breakdown	Defect analysis	✅ (`starsBreakdown`)	Avg only	Avg only
Availability	Stock-out alerts	✅	✅	✅

If you need joinable IDs for a longitudinal dataset, the retailer-specific Actors are worth the premium.

Workflows I actually run

Weekly Amazon catalog sweep. junglee/amazon-crawler on a category or ASIN list → Apify dataset → Google Sheets integration. Pull asin, title, price.value, listPrice.value, seller, inStock. Keyed on ASIN, one row per ASIN per run.
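Flattening each dataset item into those columns before it reaches Sheets keeps the sheet schema stable even when a field is missing. A sketch that assumes the nested `price.value` / `listPrice.value` layout named above; treating `seller` as possibly an object with a `name` is a guess, so check a real item first.

```python
COLUMNS = ["asin", "title", "price.value", "listPrice.value", "seller", "inStock"]


def get_path(item, dotted):
    """Follow a dotted path like 'price.value'; None if any level is missing."""
    cur = item
    for key in dotted.split("."):
        if not isinstance(cur, dict):
            return None
        cur = cur.get(key)
    return cur


def to_row(item, run_date):
    """One sheet row per ASIN per run."""
    row = {"run_date": run_date}
    for col in COLUMNS:
        val = get_path(item, col)
        if isinstance(val, dict):
            val = val.get("name")  # assumption: seller may come back as an object
        row[col] = val
    return row
```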

Hourly MAP check on 200 hot SKUs. Schedule the same Actor with a tight SKU list every hour during business hours. Webhook to Slack when price.value < map_floor. Residential proxy, maxConcurrency: 10, scrapes finish in under 90 seconds.
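The comparison and the Slack message can live in a small post-run step (triggered, for example, by an Apify webhook on run success). This sketch assumes `map_floor` is your own dict of ASIN to MAP price; Slack incoming webhooks accept a JSON body with a `text` field, which is all `post_to_slack` sends. The webhook URL is yours to supply.

```python
import json
import urllib.request


def map_violations(items, map_floor):
    """(asin, price, floor) for every item priced below its MAP floor."""
    out = []
    for it in items:
        floor = map_floor.get(it.get("asin"))
        price = (it.get("price") or {}).get("value")  # nested price.value
        if floor is not None and price is not None and price < floor:
            out.append((it["asin"], price, floor))
    return out


def slack_payload(violations):
    lines = [f"{asin}: {price:.2f} < MAP {floor:.2f}" for asin, price, floor in violations]
    return {"text": "MAP violations:\n" + "\n".join(lines)}


def post_to_slack(webhook_url, payload):
    """POST the payload to a Slack incoming-webhook URL; returns the HTTP status."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Only call `post_to_slack` when `map_violations` returns something, so quiet hours stay quiet.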

DTC competitor price feed. autofacts/shopify pointed at 12 competitor storefronts, daily. Costs a few dollars a month total. Append to BigQuery, build a Metabase dashboard keyed on (vendor, handle, date).

Review sentiment on a launch. junglee/amazon-reviews-scraper for a week post-launch, filtered to 1–3 stars, piped through an LLM summarizer. filterByRatings: ["oneStar", "twoStar", "threeStar"] cuts volume by ~40%.

Compliance

Retailer ToS routinely prohibit automated access. Scraping public listings is generally defensible in the US (hiQ v. LinkedIn lineage), but legal outcomes depend on jurisdiction, authentication, and data type. Keep it to public pages, avoid PII from reviews unless you have a lawful basis, and don't scrape behind logins. See "Is web scraping legal?" before standing up a production program.

Run your first commerce scrape

Start with Amazon Product Scraper on 10 ASINs. Confirm price.value, asin, and seller land in your dataset before scaling the schedule. Sign up for Apify; the monthly free credit covers roughly the first 1,250 products, depending on the Actor's rate and proxy usage.
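A pilot run with the official Python client (`pip install apify-client`) might look like this. `ApifyClient`, `.actor(...).call(run_input=...)`, and `.dataset(...).iterate_items()` are real client calls; the input key `categoryOrProductUrls` is an assumption about this Actor's input schema, so confirm it in the Actor's Input tab before running. `missing_fields` does the confirm step described above.

```python
REQUIRED = ("asin", "price.value", "seller")


def build_input(asins, domain="www.amazon.com"):
    # Assumption: product URLs go under "categoryOrProductUrls" as [{"url": ...}].
    return {"categoryOrProductUrls": [{"url": f"https://{domain}/dp/{a}"} for a in asins]}


def missing_fields(items):
    """asin -> required fields that came back empty; {} means the pilot looks healthy."""
    report = {}
    for it in items:
        gaps = []
        for path in REQUIRED:
            cur = it
            for key in path.split("."):
                cur = cur.get(key) if isinstance(cur, dict) else None
            if cur in (None, ""):
                gaps.append(path)
        if gaps:
            report[it.get("asin")] = gaps
    return report


def run_pilot(token, asins):
    from apify_client import ApifyClient  # imported here so the helpers work without it
    client = ApifyClient(token)
    run = client.actor("junglee/amazon-crawler").call(run_input=build_input(asins))
    items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
    return missing_fields(items)
```

If `run_pilot` returns empty price fields, look at proxy settings before scaling; that is the first failure mode in the fixes below.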

Common mistakes and fixes

Amazon returns empty price or Buy Box fields.

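Make sure the proxy country matches the marketplace (US for amazon.com, GB for amazon.co.uk). The junglee Actor picks it from the domain, so check for a leftover `apifyProxyCountry` override pointing elsewhere. Use residential proxies; datacenter IPs get listings with the price stripped. Then retry off-peak with lower concurrency.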

Different HTML across Amazon domains.

Pin one marketplace per run (don't mix amazon.com and amazon.de in the same input). ASINs are marketplace-scoped; the same ASIN can resolve to different listings per region.
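Splitting a mixed URL list into one run per marketplace takes a few lines of standard library. The country map is only for sanity-checking proxy settings, since the junglee Actor derives the country from the domain; US and GB match the examples above, and DE is an illustrative extra.

```python
from urllib.parse import urlparse

# Marketplace -> proxy country. US and GB match the examples above; DE is illustrative.
COUNTRY_BY_MARKETPLACE = {"amazon.com": "US", "amazon.co.uk": "GB", "amazon.de": "DE"}


def plan_runs(urls):
    """Split a mixed URL list into one batch (one Actor run) per Amazon marketplace."""
    batches = {}
    for url in urls:
        host = (urlparse(url).hostname or "").removeprefix("www.")
        batches.setdefault(host, []).append(url)
    return [
        {"marketplace": host, "country": COUNTRY_BY_MARKETPLACE.get(host), "urls": batch}
        for host, batch in sorted(batches.items())
    ]
```

Because ASINs are marketplace-scoped, keep the marketplace next to the ASIN in your output table too, not just in the run config.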

Shopify store returns partial catalogs.

Shopify's public `/products.json` endpoint paginates at 250 products per page and caps at roughly 5k products per store. For larger catalogs, pass collection URLs explicitly or chunk by vendor/tag filters.
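A sketch of the pagination loop, assuming the `limit`/`page` query parameters that the public endpoint accepts and the `/collections/<handle>/products.json` variant for collection-scoped pulls. The fetcher is injectable so you can add retries or throttling; `max_pages` is a safety stop of my own, not a Shopify limit.

```python
import json
import urllib.request
from typing import Callable, Iterator, Optional

PAGE_SIZE = 250  # per-page maximum on the public products.json endpoint


def products_url(store: str, page: int, collection: Optional[str] = None) -> str:
    base = f"https://{store}/collections/{collection}" if collection else f"https://{store}"
    return f"{base}/products.json?limit={PAGE_SIZE}&page={page}"


def http_fetch(url: str) -> list:
    """Default fetcher: GET the URL and return its "products" array."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["products"]


def iter_products(store: str, collection: Optional[str] = None,
                  fetch_page: Callable[[str], list] = http_fetch,
                  max_pages: int = 40) -> Iterator[dict]:
    """Walk pages until one comes back short or empty."""
    for page in range(1, max_pages + 1):
        batch = fetch_page(products_url(store, page, collection))
        yield from batch
        if len(batch) < PAGE_SIZE:
            return
```

For a catalog past the cap, call `iter_products` once per collection handle and dedupe on product id, since a product can sit in several collections.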

Walmart returns 'Robot or human?' interstitial.

Walmart is aggressive on datacenter IPs; switch to residential proxy and cap concurrency at 5–10. Walmart Product Detail Scraper handles this automatically when proxy is enabled.
