
Best e-commerce scrapers on Apify

Pick the Actor by retailer, not by hype. Amazon needs Buy Box logic and residential proxies; Shopify is a JSON endpoint away; Walmart punishes datacenter IPs; eBay exposes auction mechanics you can't get from a generic scraper. The Actors below are the ones I actually schedule on Apify for weekly price and catalog pulls.

Quick Answer

For a single marketplace, use the dedicated Actor: you get richer fields (ASIN, Buy Box, usItemId, variantList). For a mixed SKU map spanning 3+ retailers, the E-commerce Scraping Tool trades some field depth for one input schema.

Browse the full category: E-commerce Actors →

Comparison table

| Actor | Platform | Pricing | Typical fields | Best for |
| --- | --- | --- | --- | --- |
| Amazon Product Scraper | Amazon (all locales) | from $3 / 1k results | ASIN, price, listPrice, variants, Buy Box, starsBreakdown, shipping | Catalog + dynamic pricing |
| Amazon Reviews Scraper | Amazon reviews | from $3 / 1k reviews | Stars, text, verified flag, date, images, reactions | VOC, defect detection |
| E-commerce Scraping Tool | Amazon, Walmart, eBay, Alibaba, Etsy, regional | Pay-per-event | Name, price+currency, SKU/MPN/GTIN/EAN/UPC, stock, rating | Cross-retailer basket |
| eBay Items Scraper | eBay | $50 / month + usage | Price, wasPrice, seller, condition, auction/BIN | Resale, arbitrage |
| Walmart Product Detail Scraper | walmart.com/.ca/.com.mx | Pay-per-usage (~1,250 PDPs / $5 credit) | usItemId, priceInfo, variantList, sellerId, availability | Walmart PDP ingestion |
| Shopify Scraper | Any Shopify storefront | $5 / month + usage | Title, description, price, SKU, variants, stock, currency-normalized | DTC competitor tracking |
| Etsy Scraper | Etsy | $30 / month + usage | Listing, variation prices, shop, rating | Handmade / long-tail |

Pricing, user counts, and ratings drift, so open each listing before committing to a schedule.

Cost per 1,000 products (rule of thumb)

| Stack | Pricing model | ~$/1k products |
| --- | --- | --- |
| Amazon (junglee) | per-result | ~$3 |
| Amazon Reviews (junglee) | per-review | ~$3 / 1k reviews (≠ 1k products) |
| Multi-site (apify/e-commerce) | pay-per-event | varies by retailer; Amazon/Walmart are pricier than regional stores |
| Shopify (autofacts) | subscription + compute | cents per 1k once the $5 base is covered; Shopify's JSON endpoint is cheap |
| Walmart (e-commerce) | pay-per-usage | ~$4 / 1k PDPs on Starter tier |

Shopify is the outlier on the cheap side because the data is literally a public JSON endpoint. Amazon and Walmart are the expensive end because you're paying for proxies and browser time, not just parsing.

Which Actor for which job

Amazon: use the dedicated stack, not the generic one

junglee/amazon-crawler returns the fields an Amazon operator actually cares about: asin, price.value, listPrice, priceVariants, starsBreakdown, seller, shipping. The multi-site tool will give you price and rating, but it won't give you the Buy Box seller or variant-level pricing that drives repricing logic. Pair with junglee/amazon-reviews-scraper when you need review text; the product Actor omits review bodies to keep runs fast.

Residential proxy is not optional on Amazon. Datacenter IPs increasingly return listings without prices or with the wrong marketplace currency. The junglee Actor auto-picks proxy country from the domain; override apifyProxyCountry only if you're deliberately geo-testing.

eBay: auction state matters

Generic scrapers flatten eBay listings into a single "price" field. That's wrong for auctions: you need price, wasPrice, bid count, and a Buy It Now (BIN) flag as separate fields. dtrungtin/ebay-items-scraper surfaces those. It's a subscription Actor ($50/mo flat + usage), so it only pencils out above a few thousand listings/month; for smaller jobs, the multi-site tool is cheaper even with shallower fields.

Walmart: proxy quality is the whole game

Walmart's bot defense is more aggressive than Amazon's on datacenter ranges. e-commerce/walmart-product-detail-scraper is maintained by Apify's internal team and bundles the right proxy config. The output has priceInfo.price, priceInfo.wasPrice, variantList[] with per-variant availabilityStatus, sellerId, and usItemId (Walmart's ASIN equivalent). For search-listing ingestion, pair with e-commerce/walmart-fast-product-scraper.

Shopify: it's a JSON endpoint, so stop overpaying

Every Shopify store exposes /products.json. autofacts/shopify wraps that with pagination, currency normalization, and a monitoring mode. You don't need a headless browser here; runs are fast and cheap. The ceiling is ~5,000 products per store via the public endpoint; beyond that you'll need explicit collection URLs.
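To make the "it's just JSON" point concrete, here is a minimal sketch of paging through the public endpoint yourself. It assumes the endpoint accepts `limit` and `page` query parameters and returns a `{"products": [...]}` body; the function names are hypothetical, and this is not a description of how autofacts/shopify works internally.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

PAGE_SIZE = 250  # assumed maximum page size for /products.json


def http_get_json(url: str) -> dict:
    """Default fetcher: a plain HTTPS GET, no browser and no proxy."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_products(store_url: str,
                  fetch: Callable[[str], dict] = http_get_json,
                  max_pages: int = 20) -> Iterator[dict]:
    """Yield products from <store>/products.json one page at a time.

    max_pages=20 at 250 per page mirrors the ~5,000-product ceiling.
    """
    base = store_url.rstrip("/") + "/products.json"
    for page in range(1, max_pages + 1):
        query = urlencode({"limit": PAGE_SIZE, "page": page})
        products = fetch(f"{base}?{query}").get("products", [])
        yield from products
        if len(products) < PAGE_SIZE:
            # A short (or empty) page means the catalog is exhausted.
            break
```

The fetcher is injectable so the paging logic can be exercised without network access; in practice you would still want the Actor for currency normalization and monitoring.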

Etsy: handmade = variation prices

Etsy listings routinely have 10+ variants with different prices. epctex/etsy-scraper exposes includeVariationPrices: turn it on if you're benchmarking or reselling, off if you only need headline listing prices (runtime drops sharply).

Mixed basket: the multi-site Actor

If you're tracking 200 SKUs spread across Amazon, Walmart, and three Shopify DTC brands, maintaining a separate schema and proxy config for each source burns engineering time. apify/e-commerce-scraping-tool accepts product URLs from any supported domain and returns a normalized schema (name, price, priceCurrency, SKU/MPN/GTIN/EAN/UPC, stock, rating). Trade-off: you lose Amazon's Buy Box field and Walmart's variantList structure.

Price tracking vs one-shot extraction

These are different jobs with different Actor choices.

One-shot catalog build (competitor teardown, PIM seed, MAP baseline): run once, capture everything, export to Sheets/BigQuery. Use the richest Actor you can afford; field coverage matters more than run cost.

Price tracking over time (dynamic pricing, MAP enforcement, availability alerts): run on a schedule, append to a table keyed by (asin_or_sku, timestamp), compute deltas. Run cost matters more than field depth: you only need price, availability, maybe sellerId. Budget math: 500 SKUs × 24 runs/day × 30 days = 360k calls/month. At $3/1k that's $1,080/month on Amazon; the same volume on a Shopify store is under $10.
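The append-and-diff pattern described above can be sketched with nothing but SQLite. This is one possible layout under stated assumptions, not a prescribed schema: the table name, column names, and helper functions are all hypothetical, and timestamps are assumed to be ISO-8601 UTC strings so that they sort correctly as text.

```python
import sqlite3


def open_history(path: str = ":memory:") -> sqlite3.Connection:
    """Create the history table, keyed by (stable_id, ts), if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS price_history (
               stable_id TEXT NOT NULL,
               ts        TEXT NOT NULL,
               price     REAL,
               in_stock  INTEGER,
               PRIMARY KEY (stable_id, ts)
           )"""
    )
    return conn


def append_run(conn, ts, rows):
    """Append one scheduled run; rows are (stable_id, price, in_stock) tuples."""
    conn.executemany(
        "INSERT OR REPLACE INTO price_history VALUES (?, ?, ?, ?)",
        [(sid, ts, price, int(in_stock)) for sid, price, in_stock in rows],
    )
    conn.commit()


def price_deltas(conn):
    """Return (stable_id, ts, price, delta) for every price change between runs."""
    last_price = {}
    changes = []
    cursor = conn.execute(
        "SELECT stable_id, ts, price FROM price_history ORDER BY stable_id, ts"
    )
    for sid, ts, price in cursor:
        if price is None:
            continue  # a run that returned no price is not treated as a change
        prev = last_price.get(sid)
        if prev is not None and price != prev:
            changes.append((sid, ts, price, round(price - prev, 2)))
        last_price[sid] = price
    return changes
```

Only price and availability are stored, which matches the point that tracking jobs need few fields; the same table shape carries over to BigQuery or Postgres.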

A common mistake: running the maximal-field Actor on a daily price schedule. You're paying for review scraping and variant expansion every day when you only need price. For long-running price jobs, pick the minimal Actor.

Proxy requirements by retailer

| Retailer | Minimum proxy | Notes |
| --- | --- | --- |
| Amazon | Residential, country-matched | Datacenter IPs return listings without prices |
| Walmart | Residential | Triggers interstitial on datacenter |
| eBay | Datacenter often works | Auction/search pages are relatively open |
| Shopify (public JSON) | None required | /products.json is meant to be public |
| Etsy | Datacenter usually works | Bump to residential if you hit rate limits |

Apify Proxy with residential IPs is billed separately from Actor compute, so factor it into the $/1k math; it's often larger than the Actor fee itself.
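A rough estimator helps keep the Actor fee and the proxy bill in the same number. The sketch below reuses the 500 SKUs × 24 runs × 30 days arithmetic from the price-tracking section; `proxy_gb_per_1k` and `proxy_usd_per_gb` are hypothetical placeholders that you would have to measure from your own runs and current proxy pricing, since this article does not supply them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schedule:
    skus: int
    runs_per_day: int
    days: int = 30

    @property
    def results(self) -> int:
        """Total results billed over the period."""
        return self.skus * self.runs_per_day * self.days


def monthly_cost(schedule: Schedule,
                 actor_usd_per_1k: float,
                 proxy_gb_per_1k: float = 0.0,   # placeholder: measure on a test run
                 proxy_usd_per_gb: float = 0.0   # placeholder: check current pricing
                 ) -> dict:
    """Split the estimate into Actor fee and proxy traffic."""
    thousands = schedule.results / 1000
    actor = thousands * actor_usd_per_1k
    proxy = thousands * proxy_gb_per_1k * proxy_usd_per_gb
    return {
        "results": schedule.results,
        "actor_usd": round(actor, 2),
        "proxy_usd": round(proxy, 2),
        "total_usd": round(actor + proxy, 2),
    }


# Actor fee only, using the article's Amazon figure of $3 per 1k results.
estimate = monthly_cost(Schedule(skus=500, runs_per_day=24), actor_usd_per_1k=3.0)
```

With the proxy parameters left at zero, `estimate["actor_usd"]` reproduces the $1,080 figure; filling them in shows how quickly traffic can dominate the total.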

Field coverage cheatsheet

What you should expect from a well-built retailer Actor vs a generic one.

| Field | Why it matters | Amazon (junglee) | Walmart (e-commerce) | Multi-site |
| --- | --- | --- | --- | --- |
| Stable product ID | Joining across runs | asin | usItemId | Mixed (SKU/MPN) |
| Buy Box seller | Repricing, counterfeit detection | ✅ | ✅ (sellerId) | ❌ |
| Variant-level prices | Size/color pricing | ✅ (priceVariants) | ✅ (variantList[]) | ❌ |
| Was-price / list price | Promo detection | ✅ (listPrice) | ✅ (wasPrice) | Partial |
| Stars breakdown | Defect analysis | ✅ (starsBreakdown) | Avg only | Avg only |
| Availability | Stock-out alerts | ✅ | ✅ | ✅ |

If you need joinable IDs for a longitudinal dataset, the retailer-specific Actors are worth the premium.
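To see why the ID column matters for joins, consider a small key-building helper. This is an illustrative sketch: the Amazon and Walmart field names come from the cheatsheet above, while the lowercase `sku`, `mpn`, and `gtin` keys for the multi-site output are assumptions about how that schema is serialized.

```python
from typing import Optional

# Candidate ID fields per source, in priority order.
ID_FIELDS = {
    "amazon": ("asin",),
    "walmart": ("usItemId",),
    "multi": ("sku", "mpn", "gtin"),  # assumed key names for the normalized schema
}


def stable_key(source: str, item: dict) -> Optional[str]:
    """Build a join key, recording which field it came from."""
    for field in ID_FIELDS[source]:
        value = item.get(field)
        if value:
            return f"{source}:{field}:{value}"
    return None  # no usable ID: the row cannot be joined across runs
```

Embedding the field name in the key makes the multi-site weakness visible: if a product reports a SKU in one run and only an MPN in the next, the keys differ and the history splits, whereas `asin` and `usItemId` stay fixed.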

Workflows I actually run

Weekly Amazon catalog sweep. junglee/amazon-crawler on a category or ASIN list → Apify dataset → Google Sheets integration. Pull asin, title, price.value, listPrice.value, seller, inStock. Keyed on ASIN, with one row per ASIN per run.

Hourly MAP check on 200 hot SKUs. Schedule the same Actor with a tight SKU list every hour during business hours. Webhook to Slack when price.value < map_floor. With residential proxy and maxConcurrency: 10, runs finish in under 90 seconds.

DTC competitor price feed. autofacts/shopify pointed at 12 competitor storefronts, daily. Costs a few dollars a month total. Append to BigQuery, build a Metabase dashboard keyed on (vendor, handle, date).

Review sentiment on a launch. junglee/amazon-reviews-scraper for a week post-launch, filtered to 1–3 stars, piped through an LLM summarizer. filterByRatings: ["oneStar", "twoStar", "threeStar"] cuts volume by ~40%.
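The hourly MAP check above boils down to a filter plus a message. Here is a minimal sketch, assuming dataset rows carry `asin` and a nested `price.value` as in the weekly sweep; `map_floors` is a hypothetical dict you maintain yourself, and the payload uses the plain `text` field that Slack incoming webhooks accept.

```python
def find_map_violations(items, map_floors):
    """Return rows priced below their MAP floor.

    items: dataset rows shaped like {"asin": ..., "price": {"value": ...}}
    map_floors: {asin: minimum advertised price}
    """
    violations = []
    for item in items:
        asin = item.get("asin")
        price = (item.get("price") or {}).get("value")
        floor = map_floors.get(asin)
        if price is None or floor is None:
            continue  # missing price or untracked ASIN: nothing to compare
        if price < floor:
            violations.append({"asin": asin, "price": price, "map_floor": floor})
    return violations


def slack_payload(violations):
    """Format violations as a JSON body for a Slack incoming webhook."""
    lines = [
        f"{v['asin']}: {v['price']:.2f} < MAP {v['map_floor']:.2f}"
        for v in violations
    ]
    return {"text": "MAP violations:\n" + "\n".join(lines)}
```

Skipping rows with a missing price is deliberate: a stripped price usually signals a proxy problem rather than a real violation, so it should not page anyone.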

Compliance

Retailer ToS routinely prohibit automated access. Scraping public listings is generally defensible in the US (hiQ v. LinkedIn lineage), but legal outcomes depend on jurisdiction, authentication, and data type. Keep it to public pages, avoid PII from reviews unless you have a lawful basis, and don't scrape behind logins. See "Is web scraping legal?" before standing up a production program.

Run your first commerce scrape

Start with Amazon Product Scraper on 10 ASINs. Confirm price.value, asin, and seller land in your dataset before scaling the schedule. Sign up for Apify and the first 1,250-ish products run on the monthly free credit.

Frequently Asked Questions

Which Apify Actor scrapes multiple retailers at once?

apify/e-commerce-scraping-tool covers Amazon, Walmart, eBay, Alibaba, Etsy, and many regional/local stores with one input schema. For a single marketplace, the dedicated Actor returns richer fields: ASIN and Buy Box for Amazon, usItemId and variantList for Walmart.

Do I need residential proxies to scrape Amazon?

Yes, effectively. Datacenter IPs increasingly return listings with missing prices or wrong currency. The junglee Actor auto-selects proxy country from the domain. Residential proxy is billed separately from Actor compute, so include it in your $/1k estimate.

How much does hourly price tracking on 500 SKUs cost?

On Amazon at $3/1k results: 500 × 24 × 30 ≈ 360k results/month ≈ $1,080 Actor cost plus residential proxy. On Shopify storefronts the same volume is under $10 because the data is public JSON. Drop fields you don't need (reviews, variants) to keep long-running price schedules cheap.

How do I track prices over time?

Schedule the Actor, append each run to a table keyed by (stable_id, timestamp) where stable_id is asin for Amazon, usItemId for Walmart, or variant SKU for Shopify. Graph deltas or webhook-alert on threshold breaks. Avoid running maximal-field Actors on a price schedule; you're paying for data you discard.

How do these Actors deal with CAPTCHAs?

Maintained Actors bundle browser automation, retries, and proxy rotation that absorb most soft challenges. Hard CAPTCHA rates depend on proxy quality and concurrency more than the Actor code. Lower maxConcurrency and switch to residential before adding a solver.

Apify Store, ecommerce category: https://apify.com/store/categories?search=ecommerce&fpr=use-apify

Common mistakes and fixes

Amazon returns empty price or Buy Box fields.

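Check that `apifyProxyCountry` matches the marketplace (US for amazon.com, GB for amazon.co.uk). The Actor picks it from the domain by default, so a mismatch most likely comes from a manual override. Use residential proxies: datacenter IPs get listings with the prices stripped. If fields are still empty, retry off-peak with lower concurrency.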

Different HTML across Amazon domains.

Pin one marketplace per run (don't mix amazon.com and amazon.de in the same input). ASINs are marketplace-scoped; the same ASIN can resolve to different listings per region.
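Both Amazon fixes above can be enforced before a run starts. This sketch is a pre-flight check under stated assumptions, not part of any Actor: the function names are hypothetical and the domain map is an illustrative subset that you would extend yourself.

```python
from urllib.parse import urlparse

# Illustrative subset; extend with the marketplaces you actually scrape.
COUNTRY_BY_DOMAIN = {
    "amazon.com": "US",
    "amazon.co.uk": "GB",
    "amazon.de": "DE",
}


def marketplace_of(url: str) -> str:
    """Return the marketplace domain without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def pinned_proxy_country(urls) -> str:
    """Require a single marketplace per run and return its proxy country."""
    domains = {marketplace_of(u) for u in urls}
    if len(domains) != 1:
        raise ValueError(f"expected one marketplace per run, got {sorted(domains)}")
    domain = domains.pop()
    if domain not in COUNTRY_BY_DOMAIN:
        raise ValueError(f"no proxy country mapped for {domain}")
    return COUNTRY_BY_DOMAIN[domain]
```

Failing loudly on a mixed input is cheaper than discovering a week later that half the rows carry the wrong currency.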

Shopify store returns partial catalogs.

Shopify's public `/products.json` endpoint paginates at 250/page and caps at roughly 5k products per store. For larger catalogs, pass collection URLs explicitly or chunk by vendor/tag filters.

Walmart returns 'Robot or human?' interstitial.

Walmart is aggressive on datacenter IPs; switch to residential proxy and cap concurrency at 5–10. Walmart Product Detail Scraper handles this automatically when proxy is enabled.
