Best e-commerce scrapers on Apify
Pick the Actor by retailer, not by hype. Amazon needs Buy Box logic and residential proxies; Shopify is a JSON endpoint away; Walmart punishes datacenter IPs; eBay exposes auction mechanics you can't get from a generic scraper. The Actors below are the ones I actually schedule on Apify for weekly price and catalog pulls.
For a single marketplace, use the dedicated Actor: you get richer fields (ASIN, Buy Box, usItemId, variantList). For a mixed SKU map spanning 3+ retailers, the E-commerce Scraping Tool trades some field depth for one input schema.
Browse the full category: E-commerce Actors →
Comparison table
| Actor | Platform | Pricing | Typical fields | Best for |
|---|---|---|---|---|
| Amazon Product Scraper | Amazon (all locales) | from $3 / 1k results | ASIN, price, listPrice, variants, Buy Box, starsBreakdown, shipping | Catalog + dynamic pricing |
| Amazon Reviews Scraper | Amazon reviews | from $3 / 1k reviews | Stars, text, verified flag, date, images, reactions | VOC, defect detection |
| E-commerce Scraping Tool | Amazon, Walmart, eBay, Alibaba, Etsy, regional | Pay-per-event | Name, price+currency, SKU/MPN/GTIN/EAN/UPC, stock, rating | Cross-retailer basket |
| eBay Items Scraper | eBay | $50 / month + usage | Price, wasPrice, seller, condition, auction/BIN | Resale, arbitrage |
| Walmart Product Detail Scraper | walmart.com/.ca/.com.mx | Pay-per-usage (~1,250 PDPs / $5 credit) | usItemId, priceInfo, variantList, sellerId, availability | Walmart PDP ingestion |
| Shopify Scraper | Any Shopify storefront | $5 / month + usage | Title, description, price, SKU, variants, stock, currency-normalized | DTC competitor tracking |
| Etsy Scraper | Etsy | $30 / month + usage | Listing, variation prices, shop, rating | Handmade / long-tail |
User counts and ratings drift, so open each listing before committing to a schedule.
Cost per 1,000 products (rule of thumb)
| Stack | Pricing model | ~$/1k products |
|---|---|---|
| Amazon (junglee) | per-result | ~$3 |
| Amazon Reviews (junglee) | per-review | ~$3 / 1k reviews (≠ 1k products) |
| Multi-site (apify/e-commerce) | pay-per-event | varies by retailer; Amazon/Walmart are pricier than regional stores |
| Shopify (autofacts) | subscription + compute | cents per 1k once the $5 base is covered; Shopify's JSON endpoint is cheap |
| Walmart (e-commerce) | pay-per-usage | ~$4 / 1k PDPs on Starter tier |
Shopify is the outlier on the cheap side because the data is literally a public JSON endpoint. Amazon and Walmart are the expensive end because you're paying for proxies and browser time, not just parsing.
Which Actor for which job
Amazon: use the dedicated stack, not the generic one
junglee/amazon-crawler returns the fields an Amazon operator actually cares about: asin, price.value, listPrice, priceVariants, starsBreakdown, seller, shipping. The multi-site tool will give you price and rating, but it won't give you the Buy Box seller or variant-level pricing that drives repricing logic. Pair with junglee/amazon-reviews-scraper when you need review text; the product Actor omits review bodies to keep runs fast.
Residential proxy is not optional on Amazon. Datacenter IPs increasingly return listings without prices or with the wrong marketplace currency. The junglee Actor auto-picks proxy country from the domain; override apifyProxyCountry only if you're deliberately geo-testing.
eBay: auction state matters
Generic scrapers flatten eBay listings into "price." That's wrong for auctions: you need price, wasPrice, bid count, and BIN flag separately. dtrungtin/ebay-items-scraper surfaces those. It's a subscription Actor ($50/mo flat + usage), so it only pencils out above a few thousand listings/month; for smaller jobs, the multi-site tool is cheaper even with shallower fields.
Walmart: proxy quality is the whole game
Walmart's bot defense is more aggressive than Amazon's on datacenter ranges. e-commerce/walmart-product-detail-scraper is maintained by Apify's internal team and bundles the right proxy config. The output has priceInfo.price, priceInfo.wasPrice, variantList[] with per-variant availabilityStatus, sellerId, and usItemId (Walmart's ASIN equivalent). For search-listing ingestion, pair with e-commerce/walmart-fast-product-scraper.
Shopify: it's a JSON endpoint, so stop overpaying
Every Shopify store exposes /products.json. autofacts/shopify wraps that with pagination, currency normalization, and a monitoring mode. You don't need a headless browser here; runs are fast and cheap. The ceiling is ~5,000 products per store via the public endpoint; beyond that you'll need explicit collection URLs.
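To see why no headless browser is needed, here is a minimal sketch of paging through the public endpoint directly. It assumes the commonly observed `limit` and `page` query parameters and a top-level `products` array in the response; the function names and the user-agent string are mine, not part of autofacts/shopify.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import Request, urlopen

PAGE_SIZE = 250  # per-page maximum on /products.json


def fetch_json(url: str) -> dict:
    """Default fetcher: a plain HTTPS GET, no proxy, no browser."""
    req = Request(url, headers={"User-Agent": "catalog-sketch/0.1"})
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)


def iter_products(store_url: str,
                  fetch: Callable[[str], dict] = fetch_json,
                  max_pages: int = 20) -> Iterator[dict]:
    """Yield products page by page until a short or empty page appears.

    max_pages=20 at 250 per page matches the ~5,000 product ceiling.
    """
    base = store_url.rstrip("/") + "/products.json"
    for page in range(1, max_pages + 1):
        query = urlencode({"limit": PAGE_SIZE, "page": page})
        products = fetch(f"{base}?{query}").get("products", [])
        yield from products
        if len(products) < PAGE_SIZE:
            break  # last page reached
```

The `fetch` argument is injectable so you can test the pagination logic offline or swap in a rate-limited client.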
Etsy: handmade = variation prices
Etsy listings routinely have 10+ variants with different prices. epctex/etsy-scraper exposes includeVariationPrices: turn it on if you're benchmarking or reselling, off if you only need headline listing prices (runtime drops sharply).
Mixed basket: the multi-site Actor
If you're tracking 200 SKUs spread across Amazon, Walmart, and three Shopify DTC brands, maintaining six different schemas and six different proxy configs burns engineering time. apify/e-commerce-scraping-tool accepts product URLs from any supported domain and returns a normalized schema (name, price, priceCurrency, SKU/MPN/GTIN/EAN/UPC, stock, rating). Trade-off: you lose Amazon's Buy Box field and Walmart's variantList structure.
Price tracking vs one-shot extraction
These are different jobs with different Actor choices.
One-shot catalog build (competitor teardown, PIM seed, MAP baseline): run once, capture everything, export to Sheets/BigQuery. Use the richest Actor you can afford; field coverage matters more than run cost.
Price tracking over time (dynamic pricing, MAP enforcement, availability alerts): run on a schedule, append to a table keyed by (asin_or_sku, timestamp), compute deltas. Run cost matters more than field depth: you only need price, availability, maybe sellerId. Budget math: 500 SKUs × 24 runs/day × 30 days = 360k calls/month. At $3/1k that's $1,080/month on Amazon; the same volume on a Shopify store is under $10.
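The budget math above is worth keeping as a two-line function so you can test schedules before committing. This is a rough sketch: it covers the per-result Actor fee only, and the function names are mine.

```python
def monthly_results(skus: int, runs_per_day: int, days: int = 30) -> int:
    """Number of product results a schedule produces in a month."""
    return skus * runs_per_day * days


def monthly_actor_cost(skus: int, runs_per_day: int,
                       usd_per_1k: float, days: int = 30) -> float:
    """Actor fee only; residential proxy traffic is billed on top."""
    return monthly_results(skus, runs_per_day, days) / 1000 * usd_per_1k


hourly = monthly_actor_cost(skus=500, runs_per_day=24, usd_per_1k=3.0)
daily = monthly_actor_cost(skus=500, runs_per_day=1, usd_per_1k=3.0)
```

With these inputs `hourly` is 1080.0 and `daily` is 45.0, which is the cheapest argument for asking whether the job really needs hourly resolution.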
A common mistake: running the maximal-field Actor on a daily price schedule. You're paying for review scraping and variant expansion every day when you only need price. For long-running price jobs, pick the minimal Actor.
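The append-and-diff pattern for price tracking can be sketched with nothing but SQLite. The table and column names here are illustrative assumptions, and the rows are whatever minimal fields you pulled from the Actor's dataset.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS price_history (
    stable_id TEXT NOT NULL,   -- asin, usItemId, or variant SKU
    ts        TEXT NOT NULL,   -- ISO-8601 UTC timestamp of the run
    price     REAL,
    in_stock  INTEGER,
    PRIMARY KEY (stable_id, ts)
)
"""


def append_run(conn, rows, ts=None):
    """Append one run. rows: iterable of (stable_id, price, in_stock)."""
    ts = ts or datetime.now(timezone.utc).isoformat()
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO price_history VALUES (?, ?, ?, ?)",
        [(sid, ts, price, None if stock is None else int(stock))
         for sid, price, stock in rows],
    )
    conn.commit()


def price_changes(conn):
    """Return (stable_id, ts, previous_price, price) for each price change."""
    last, changes = {}, []
    cur = conn.execute(
        "SELECT stable_id, ts, price FROM price_history ORDER BY ts, stable_id"
    )
    for sid, ts, price in cur:
        prev = last.get(sid)
        if prev is not None and price is not None and price != prev:
            changes.append((sid, ts, prev, price))
        if price is not None:
            last[sid] = price  # skip runs where the price was missing
    return changes
```

Runs with a missing price are ignored when computing deltas, so a blocked scrape doesn't register as a price change.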
Proxy requirements by retailer
| Retailer | Minimum proxy | Notes |
|---|---|---|
| Amazon | Residential, country-matched | Datacenter IPs return listings without prices |
| Walmart | Residential | Triggers interstitial on datacenter |
| eBay | Datacenter often works | Auction/search pages are relatively open |
| Shopify (public JSON) | None required | /products.json is meant to be public |
| Etsy | Datacenter usually works | Bump to residential if you hit rate limits |
Apify Proxy with residential IPs is billed separately from Actor compute, so factor it into the $/1k math; it's often larger than the Actor fee itself.
Field coverage cheatsheet
What you should expect from a well-built retailer Actor vs a generic one.
| Field | Why it matters | Amazon (junglee) | Walmart (e-commerce) | Multi-site |
|---|---|---|---|---|
| Stable product ID | Joining across runs | asin | usItemId | Mixed (SKU/MPN) |
| Buy Box seller | Repricing, counterfeit detection | ✅ | ✅ (sellerId) | ❌ |
| Variant-level prices | Size/color pricing | ✅ (priceVariants) | ✅ (variantList[]) | ❌ |
| Was-price / list price | Promo detection | ✅ (listPrice) | ✅ (wasPrice) | Partial |
| Stars breakdown | Defect analysis | ✅ (starsBreakdown) | Avg only | Avg only |
| Availability | Stock-out alerts | ✅ | ✅ | ✅ |
If you need joinable IDs for a longitudinal dataset, the retailer-specific Actors are worth the premium.
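One way to make those IDs joinable is to normalize every record to a (marketplace, id) pair before it hits the warehouse. A minimal sketch follows; `asin` and `usItemId` come from the table above, while the `sku` fallback field name is an assumption about the generic output.

```python
# Checked in order: retailer-native IDs first, generic SKU last.
ID_FIELDS = ("asin", "usItemId", "sku")


def stable_key(record: dict, marketplace: str) -> tuple:
    """Return (marketplace, id) so IDs from different sites never collide."""
    for field in ID_FIELDS:
        value = record.get(field)
        if value:
            return (marketplace, str(value))
    raise KeyError(f"no stable ID field found in record for {marketplace}")
```

Including the marketplace in the key matters because the same ASIN can point at different listings on different Amazon domains.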
Workflows I actually run
Weekly Amazon catalog sweep. junglee/amazon-crawler on a category or ASIN list → Apify dataset → Google Sheets integration. Pull asin, title, price.value, listPrice.value, seller, inStock. Keyed on ASIN, one row per ASIN per run.
Hourly MAP check on 200 hot SKUs. Schedule the same Actor with a tight SKU list every hour during business hours. Webhook to Slack when price.value < map_floor. With residential proxy and maxConcurrency: 10, scrapes finish in under 90 seconds.
DTC competitor price feed. autofacts/shopify pointed at 12 competitor storefronts, daily. Costs a few dollars a month total. Append to BigQuery, build a Metabase dashboard keyed on (vendor, handle, date).
Review sentiment on a launch. junglee/amazon-reviews-scraper for a week post-launch, filtered to 1–3 stars, piped through an LLM summarizer. filterByRatings: ["oneStar", "twoStar", "threeStar"] cuts volume by ~40%.
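For the hourly MAP check above, the comparison step is small enough to sketch. This assumes dataset items shaped like `{"asin": ..., "price": {"value": ...}}` and a `map_floors` dict you maintain yourself; both function names are hypothetical, and the payload is the simple `{"text": ...}` body that Slack incoming webhooks accept.

```python
def map_violations(items, map_floors):
    """Return items priced below their MAP floor.

    items: dataset items with asin and price.value
    map_floors: {asin: minimum advertised price}
    """
    violations = []
    for item in items:
        asin = item.get("asin")
        price = (item.get("price") or {}).get("value")
        floor = map_floors.get(asin)
        if price is None or floor is None:
            continue  # a missing price is a scrape problem, not a breach
        if price < floor:
            violations.append({"asin": asin, "price": price, "map_floor": floor})
    return violations


def slack_payload(violations):
    """Build the JSON body to POST to a Slack incoming webhook."""
    lines = [f"{v['asin']}: {v['price']:.2f} < MAP {v['map_floor']:.2f}"
             for v in violations]
    return {"text": "MAP violations\n" + "\n".join(lines)}
```

Skipping items with no price keeps a blocked run from paging the channel with false alarms.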
Compliance
Retailer ToS routinely prohibit automated access. Scraping public listings is generally defensible in the US (hiQ v. LinkedIn lineage), but legal outcomes depend on jurisdiction, authentication, and data type. Keep it to public pages, avoid PII from reviews unless you have a lawful basis, and don't scrape behind logins. See "Is web scraping legal?" before standing up a production program.
Start with Amazon Product Scraper on 10 ASINs. Confirm price.value, asin, and seller land in your dataset before scaling the schedule. Sign up for Apify and the first 1,250-ish products run on the monthly free credit.
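A quick way to do that confirmation is to measure how many items in the test run actually carry each field. The sketch below works on dataset items already loaded as dicts (for example from a JSON export); the helper names are mine, and the dotted paths mirror the field names used in this article.

```python
def get_path(item, path):
    """Follow a dotted path like 'price.value'; None if any step is missing."""
    cur = item
    for part in path.split("."):
        if not isinstance(cur, dict):
            return None
        cur = cur.get(part)
    return cur


def field_coverage(items, paths=("asin", "price.value", "seller")):
    """Fraction of items with a non-null value at each path."""
    total = len(items)
    return {
        path: (sum(get_path(i, path) is not None for i in items) / total
               if total else 0.0)
        for path in paths
    }
```

If `price.value` coverage comes back well under 1.0 on ten ASINs, fix the proxy setup before scaling the schedule.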
FAQ
Which Actor covers multiple retailers with one schema?
apify/e-commerce-scraping-tool covers Amazon, Walmart, eBay, Alibaba, Etsy, and many regional/local stores with one input schema. For a single marketplace, the dedicated Actor returns richer fields: ASIN and Buy Box for Amazon, usItemId and variantList for Walmart.
Do I need residential proxies for Amazon?
Yes, effectively. Datacenter IPs increasingly return listings with missing prices or wrong currency. The junglee Actor auto-selects proxy country from the domain. Residential proxy is billed separately from Actor compute, so include it in your $/1k estimate.
How much does hourly price tracking on 500 SKUs cost?
On Amazon at $3/1k results: 500 × 24 × 30 = 360k results/month ≈ $1,080 Actor cost plus residential proxy. On Shopify storefronts the same volume is under $10 because the data is public JSON. Drop fields you don't need (reviews, variants) to keep long-running price schedules cheap.
How do I track prices over time?
Schedule the Actor, append each run to a table keyed by (stable_id, timestamp) where stable_id is asin for Amazon, usItemId for Walmart, or variant SKU for Shopify. Graph deltas or webhook-alert on threshold breaks. Avoid running maximal-field Actors on a price schedule; you're paying for data you discard.
How do these Actors handle CAPTCHAs?
Maintained Actors bundle browser automation, retries, and proxy rotation that absorb most soft challenges. Hard CAPTCHA rates depend on proxy quality and concurrency more than the Actor code. Lower maxConcurrency and switch to residential before adding a solver.
Apify Store, ecommerce category: https://apify.com/store/categories?search=ecommerce&fpr=use-apify
Common mistakes and fixes
Amazon returns empty price or Buy Box fields.
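Confirm residential proxy is enabled first; datacenter IPs get listings with the prices stripped. The Actor normally picks the proxy country from the domain, so if fields are still empty, set `apifyProxyCountry` explicitly to match the marketplace (US for amazon.com, GB for amazon.co.uk), then retry off-peak with lower concurrency.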
Different HTML across Amazon domains.
Pin one marketplace per run (don't mix amazon.com and amazon.de in the same input). ASINs are marketplace-scoped; the same ASIN can resolve to different listings per region.
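Splitting a mixed URL list into one run per marketplace is easy to automate. This is a sketch under assumptions: the domain-to-country map is an illustrative subset you would extend yourself, and the returned dicts are a planning structure of my own, not the Actor's actual input schema.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Illustrative subset; extend with the marketplaces you actually track.
PROXY_COUNTRY = {"amazon.com": "US", "amazon.co.uk": "GB", "amazon.de": "DE"}


def marketplace(url: str) -> str:
    """Extract the marketplace domain, dropping a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def plan_runs(urls):
    """Group URLs so each planned run targets exactly one marketplace."""
    groups = defaultdict(list)
    for url in urls:
        groups[marketplace(url)].append(url)
    plans = []
    for domain, domain_urls in groups.items():
        if domain not in PROXY_COUNTRY:
            raise ValueError(f"no proxy country configured for {domain}")
        plans.append({
            "marketplace": domain,
            "apifyProxyCountry": PROXY_COUNTRY[domain],
            "urls": domain_urls,
        })
    return plans
```

Failing loudly on an unmapped domain is deliberate: a silent default country is how wrong-currency prices end up in the dataset.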
Shopify store returns partial catalogs.
Shopify's public `/products.json` endpoint paginates at 250 products per page and caps at roughly 5k products per store. For larger catalogs, pass collection URLs explicitly or chunk by vendor/tag filters.
Walmart returns 'Robot or human?' interstitial.
Walmart is aggressive on datacenter IPs; switch to residential proxy and cap concurrency at 5–10. Walmart Product Detail Scraper handles this automatically when proxy is enabled.



