Competitor Analysis with Web Scraping: The Complete Guide
Quick Answer
Competitor analysis with web scraping tracks competitor pricing, product catalogs, social presence, hiring signals, and content, automatically and at scale. On Apify, you run Actors on schedules, store structured rows in datasets, and feed Sheets, BI tools, or Slack via API and webhooks.
This guide maps each signal to a specific Actor, cron, and downstream destination, plus a working change-detection workflow for competitor pricing and messaging pages.
Why scrape instead of checking manually
Manual competitor research (opening tabs, copying prices into a sheet, skimming reviews) is fine for a one-off snapshot. It breaks down the moment you need it weekly across several rivals.
| Approach | Manual checking | Scraping with Apify |
|---|---|---|
| Coverage | A handful of pages you remember to revisit | Hundreds of URLs across every competitor |
| Frequency | Whenever someone has time | Scheduled cron, daily or hourly |
| History | Lost unless you screenshot | Every row keyed and timestamped for diffs |
| Alerts | None | Webhook to Slack, email, or BI on threshold breach |
| Consistency | Subjective, easy to miss | Same fields captured every run |
The payoff is not just speed. Structured, timestamped rows let you see trends (discount cadence, review-sentiment drift, hiring spikes) that a manual glance never surfaces. For a wider view across a whole category rather than named rivals, pair this with market intelligence.
Types of data to track
| Signal | What to collect | What it tells you |
|---|---|---|
| Pricing | Product and PLP prices, promos | Margin pressure, discount cadence |
| Inventory | In stock, quantity hints | Velocity, supply stress |
| Reviews & Q&A | Stars, text, volume | Quality gaps, unmet needs |
| Catalog | New SKUs, categories | Expansion and seasonality |
| Content & SEO | Titles, H1s, meta, blog cadence | Messaging and keyword pivots |
| Jobs | Open roles, locations, seniority | Roadmap hints 6–12 months early |
| SERP | Rankings for target keywords | Organic share of voice |
| Social | Followers, posts, engagement (public) | Brand momentum and campaigns |
Prioritize 3–5 competitors and specific URLs (bestsellers, pricing page, careers, review profiles) so runs stay cheap and interpretable.
Recommended Actors by data type
| Data type | Starting point | What you learn | Guide |
|---|---|---|---|
| Amazon listings | Amazon Product Scraper | ASIN-level price, Buy Box, rating shifts | Scrape Amazon products |
| General e-commerce | E-commerce Scraping Tool | Cross-storefront price and stock | Scrape e-commerce prices · Best e-commerce scrapers |
| Site copy / blogs | Website Content Crawler | Messaging and content cadence over time | Scrape website content |
| Google SERP | Google Search Results Scraper | Organic share of voice for your keywords | Scrape Google SERP |
| Local presence | Google Maps Scraper | Locations, ratings, review velocity | Scrape Google Maps · Best Google Maps scrapers |
| Reviews (platform-specific) | Apify Store search (Trustpilot, G2, etc.) | Quality gaps and unmet needs | Browse Apify Store |
| Instagram / social | Instagram Scraper | Posting cadence, campaigns, engagement | Scrape Instagram · Best social media scrapers |
Verify input schema, pricing model, and rate limits on each Actor page before production schedules. Not sure which Actor fits a niche site? The most popular Actors list is a reliable starting shortlist.
Workflow: from scrape to decision
Step 1: Scope and URL list
Per competitor, capture:
- Hero product or category URLs you must not lose sight of
- Public pricing or plans page (SaaS)
- Careers landing page
- Third-party review profile URLs where buyers compare you
Store these in a sheet or JSON config that your runs read via task input. A single source of truth prevents drift between schedules.
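A minimal sketch of such a config, assuming a hypothetical shape with one entry per competitor and URLs grouped by signal. The names `competitors`, `buildStartUrls`, and `competitorByUrl` are illustrative and not part of any Apify API; only the `startUrls` array of `{ url }` objects mirrors the input shown later in this guide.

```javascript
// Hypothetical config: one entry per competitor, URLs grouped by signal.
const competitors = [
  {
    name: 'competitor-a',
    urls: {
      pricing: ['https://competitor-a.com/pricing'],
      careers: ['https://competitor-a.com/careers'],
    },
  },
  {
    name: 'competitor-b',
    urls: { pricing: ['https://competitor-b.com/pricing'] },
  },
];

// Build the startUrls array for one signal (e.g. 'pricing').
function buildStartUrls(config, signal) {
  return config.flatMap((c) => (c.urls[signal] || []).map((url) => ({ url })));
}

// Reverse lookup so scraped rows can be tagged with their competitor later.
function competitorByUrl(config) {
  const lookup = new Map();
  for (const c of config) {
    for (const urls of Object.values(c.urls)) {
      for (const url of urls) lookup.set(url, c.name);
    }
  }
  return lookup;
}

const pricingInput = { startUrls: buildStartUrls(competitors, 'pricing') };
console.log(JSON.stringify(pricingInput, null, 2));
```

Keeping the signal as a parameter lets each schedule (pricing daily, careers weekly) read the same file and pull only the URLs it needs.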
Step 2: Run once, then schedule
Run manually, inspect the dataset schema, then attach a Schedule (Console → Schedules):
| Signal | Suggested cadence | Example cron |
|---|---|---|
| Pricing | Daily | 0 6 * * * |
| Reviews | Daily | 0 7 * * * |
| Content / SEO snapshot | Weekly | 0 8 * * 1 |
| Job postings | Weekly | 0 8 * * 3 |
| SERP | Weekly | 0 9 * * 1 |
Step 3: Normalize and key data
Use a stable key per entity: product URL, ASIN, job ID, or (competitor, keyword) for SERP. Add a scrapedAt timestamp to every row so BI tools can build time series and diff one run against the next.
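As a rough illustration of keying, the sketch below derives one key per row and stamps `scrapedAt`. The field names (`asin`, `jobId`, `competitor`, `keyword`, `url`) are assumptions; real Actor output will differ, so map them to whatever your dataset actually contains.

```javascript
// Derive a stable key for one scraped row. Field names are assumed, not
// taken from any specific Actor's output schema.
function entityKey(row) {
  if (row.asin) return `asin:${row.asin}`;
  if (row.jobId) return `job:${row.jobId}`;
  if (row.competitor && row.keyword) {
    return `serp:${row.competitor}|${row.keyword.toLowerCase()}`;
  }
  if (row.url) {
    const u = new URL(row.url);
    // Drop query string and fragment so tracking parameters do not create
    // new keys. Caveat: this also drops meaningful parameters (e.g. variants).
    return `url:${u.origin}${u.pathname.replace(/\/+$/, '')}`;
  }
  throw new Error('Row has no usable identifier');
}

// Attach key and a shared scrapedAt timestamp to every row of a run.
function normalizeRows(rows, scrapedAt = new Date().toISOString()) {
  return rows.map((row) => ({ ...row, key: entityKey(row), scrapedAt }));
}
```

Using one timestamp per run, rather than per row, makes it easier to group rows into snapshots when diffing.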
Step 4: Alerts
Attach a webhook to the run-succeeded event, route it to Make, n8n, or Zapier, and apply rules such as:
- Price drops beyond X%
- Review average below threshold
- Spike in new roles in a new department
- Homepage H1 or pricing table text changed (diff against prior crawl)
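The first two rules above could be sketched as plain functions that a workflow step calls after each run. This assumes rows carry hypothetical `key`, `price`, and `ratingAverage` fields; the thresholds are parameters you choose.

```javascript
// Flag rows whose price fell by at least maxDropPct percent since the
// previous snapshot. Rows without a prior positive price are skipped.
function priceDropAlerts(previous, current, maxDropPct) {
  const prevByKey = new Map(previous.map((r) => [r.key, r]));
  const alerts = [];
  for (const row of current) {
    const prev = prevByKey.get(row.key);
    if (!prev || !(prev.price > 0)) continue;
    const changePct = ((row.price - prev.price) / prev.price) * 100;
    if (changePct <= -maxDropPct) {
      alerts.push({
        key: row.key,
        from: prev.price,
        to: row.price,
        changePct: Math.round(changePct * 10) / 10, // one decimal place
      });
    }
  }
  return alerts;
}

// Flag entities whose average rating is below a threshold.
function lowRatingAlerts(rows, minAverage) {
  return rows
    .filter((r) => typeof r.ratingAverage === 'number' && r.ratingAverage < minAverage)
    .map((r) => ({ key: r.key, ratingAverage: r.ratingAverage }));
}
```

For example, `priceDropAlerts(lastRun, thisRun, 10)` returns only products that dropped 10% or more, which keeps routine one-cent changes out of Slack.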
Integration with analytics tools
Apify datasets are tabular JSON, so they map cleanly to the tools teams already use:
| Destination | Pattern |
|---|---|
| Google Sheets / Airtable | Scheduled append or upsert via integration or low-code workflow; good for shared scorecards. |
| Spreadsheet + Looker Studio | Sheet as a lightweight semantic layer for exec views. |
| Warehouse (BigQuery, Snowflake, Redshift) | API or orchestrator loads JSONL batches; join with CRM and product data. |
| BI (Power BI, Tableau, Looker) | Point at warehouse tables or synced CSV drops in object storage. |
| Slack / email | Webhook → workflow → formatted message with deep links to product or job URLs. |
For programmatic pulls, use the Apify API to list a run's datasets and page through their items. This pattern suits a nightly ETL job owned by data engineering.
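A hedged sketch of the paging loop, assuming Node 18+ (global `fetch`) and the dataset items endpoint with `offset` and `limit` query parameters. The function name `fetchAllItems` and the injectable `fetchImpl` option are illustrative; check the current API reference for limits and authentication details before relying on this.

```javascript
// Page through all items of one dataset. fetchImpl is injectable so the
// loop can be exercised without network access.
async function fetchAllItems(datasetId, token, { pageSize = 1000, fetchImpl = fetch } = {}) {
  const items = [];
  for (let offset = 0; ; offset += pageSize) {
    const url = new URL(`https://api.apify.com/v2/datasets/${datasetId}/items`);
    url.searchParams.set('format', 'json');
    url.searchParams.set('offset', String(offset));
    url.searchParams.set('limit', String(pageSize));
    const res = await fetchImpl(url, {
      headers: { Authorization: `Bearer ${token}` },
    });
    if (!res.ok) throw new Error(`Dataset request failed: ${res.status}`);
    const page = await res.json();
    items.push(...page);
    // A short page means the end of the dataset has been reached.
    if (page.length < pageSize) return items;
  }
}

// Usage (placeholders, not real identifiers):
// const rows = await fetchAllItems('<DATASET_ID>', process.env.APIFY_TOKEN);
```

Collecting everything in memory is fine for hundreds of URLs; for larger datasets, write each page to the warehouse as it arrives instead.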
Working playbook: pricing + messaging change detection
No custom Actor required. Run apify/website-content-crawler in Markdown mode against a short list of high-signal competitor URLs (pricing, features, homepage hero, changelog) on a weekly schedule. After each run, hash each page's Markdown and diff it against the hash from the prior run.
Schedule input (sized for 5 competitors × ~3 pages each; abbreviated here to three URLs):
{
  "startUrls": [
    { "url": "https://competitor-a.com/pricing" },
    { "url": "https://competitor-a.com/changelog" },
    { "url": "https://competitor-b.com/pricing" }
  ],
  "maxCrawlDepth": 0,
  "saveMarkdown": true,
  "crawlerType": "playwright:adaptive"
}
n8n change detector (triggered on Apify run succeeded webhook):
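// n8n Code node. Code nodes typically do not accept ES module imports, so
// crypto is loaded with require; self-hosted n8n may need
// NODE_FUNCTION_ALLOW_BUILTIN=crypto to allow it.
const crypto = require('crypto');

const today = $input.all().map(i => i.json);
const previous = $('Airtable - previous snapshots').all().map(i => i.json);
const prevByUrl = new Map(previous.map(p => [p.url, p]));
const changes = [];

for (const page of today) {
  // Guard against pages that returned no Markdown.
  const markdown = page.markdown || '';
  const hash = crypto.createHash('sha256').update(markdown).digest('hex');
  const prev = prevByUrl.get(page.url);
  // URLs without a prior snapshot are skipped; they become next run's baseline.
  if (prev && prev.hash !== hash) {
    changes.push({
      url: page.url,
      title: page.metadata?.title,
      prevHash: prev.hash.slice(0, 8),
      newHash: hash.slice(0, 8),
      // Assumes the snapshot table stores wordCount alongside hash.
      wordDelta: markdown.split(/\s+/).filter(Boolean).length - (prev.wordCount || 0),
    });
  }
}

return changes.map(c => ({ json: c }));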
Route changes to an LLM summarization node (Claude or GPT) with a prompt such as: "Here is the old and new Markdown of {{url}}. In 3 bullets, describe what materially changed. Ignore nav, cookie banners, and dates." Post the summary to Slack #competitive with a link to the live page.
Then overwrite the Airtable snapshot table with today's rows (URL, hash, word count, and the Markdown itself, since the summarization prompt needs the old text) so next Monday's run diffs against this week.
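One possible shape for the snapshot rows, sketched under the assumption that each crawled page exposes `url` and `markdown` fields as in the detector above. The function name `toSnapshotRow` and the column names are hypothetical; adapt them to your Airtable schema.

```javascript
const crypto = require('crypto');

// Turn one crawled page into the row stored for next week's comparison.
function toSnapshotRow(page, scrapedAt = new Date().toISOString()) {
  const markdown = page.markdown || '';
  return {
    url: page.url,
    hash: crypto.createHash('sha256').update(markdown).digest('hex'),
    wordCount: markdown.split(/\s+/).filter(Boolean).length,
    markdown, // kept so the old text can be sent to the summarization prompt
    scrapedAt,
  };
}
```

Storing the hash and word count alongside the raw Markdown keeps the cheap comparison (hash equality) separate from the expensive one (LLM summary), which only runs on pages that actually changed.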
Job postings as leading indicators
| Hiring signal | Often implies |
|---|---|
| Many ML / data roles | New intelligent features or data products |
| Enterprise sales in a new geo | Expansion into your territory |
| Infra / SRE cluster | Scale event or reliability push |
How: crawl careers weekly, count roles by department/location, alert on new clusters compared to the prior snapshot.
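The counting step might look like the sketch below. It assumes each scraped role has hypothetical `department` and `location` fields, and treats a "cluster" as any department/location pair that crosses a minimum role count it did not reach in the prior snapshot; the threshold of 3 is an arbitrary default.

```javascript
// Count roles per department/location pair.
function countRoles(roles) {
  const counts = new Map();
  for (const r of roles) {
    const k = `${r.department}|${r.location}`;
    counts.set(k, (counts.get(k) || 0) + 1);
  }
  return counts;
}

// Return pairs that reached minRoles this week but were below it last week.
function newClusters(previousRoles, currentRoles, minRoles = 3) {
  const before = countRoles(previousRoles);
  const after = countRoles(currentRoles);
  const clusters = [];
  for (const [k, count] of after) {
    const prior = before.get(k) || 0;
    if (count >= minRoles && prior < minRoles) {
      const [department, location] = k.split('|');
      clusters.push({ department, location, count, prior });
    }
  }
  return clusters;
}
```

Alerting on threshold crossings rather than on every new posting keeps a single backfill hire from triggering a notification.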
Content and SEO change detection
Track title tags, H1s, pricing copy, and blog cadence. Workflow: weekly crawl → store text or hash → diff vs previous week → notify on meaningful changes (not trivial nav tweaks). Pair the on-page crawl with a weekly Google SERP scrape to see whether messaging changes move their rankings on the keywords you both target.
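To separate meaningful changes from trivial ones, one option is to compare a few tracked fields after normalizing whitespace and case. The sketch below assumes each snapshot is an object with hypothetical `title`, `h1`, and `metaDescription` fields extracted from the crawl.

```javascript
// Fields to watch; an assumption, adjust to what your crawl extracts.
const TRACKED_FIELDS = ['title', 'h1', 'metaDescription'];

// Collapse whitespace and case so cosmetic edits do not count as changes.
function normalizeText(value) {
  return String(value ?? '').replace(/\s+/g, ' ').trim().toLowerCase();
}

// List the tracked fields whose normalized text differs between snapshots.
function fieldChanges(prev, curr, fields = TRACKED_FIELDS) {
  return fields
    .filter((f) => normalizeText(prev[f]) !== normalizeText(curr[f]))
    .map((f) => ({ field: f, before: prev[f] ?? null, after: curr[f] ?? null }));
}
```

An empty result means nothing worth notifying about, even if the page hash changed because of navigation or footer edits.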
Using AI on top of scraped data
Summarize large review exports or price-change CSVs with an LLM. Keep PII out of third-party prompts unless you have a clear lawful basis.
Review batch prompt (template)
You are a competitive intelligence analyst. Summarize these reviews for {Competitor}:
1) Top complaints (with rough frequency)
2) Top praised themes
3) Sentiment trend vs older reviews if timestamps are present
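A small sketch of filling this template from a batch of reviews, assuming rows with hypothetical `date`, `stars`, and `text` fields. The email regex is a crude illustration of scrubbing obvious identifiers before sending text to a third party; it is not a complete PII filter.

```javascript
// Replace anything that looks like an email address. Illustrative only:
// names, phone numbers, and order IDs would need their own handling.
function redactEmails(text) {
  return text.replace(/[\w.+-]+@[\w-]+(\.[\w-]+)+/g, '[email]');
}

// Build the review-summary prompt for one competitor.
function buildReviewPrompt(competitor, reviews, maxReviews = 200) {
  const lines = reviews
    .slice(0, maxReviews) // cap the batch to keep the prompt within limits
    .map((r) => `- [${r.date || 'undated'}] ${r.stars} stars: ${redactEmails(r.text || '')}`);
  return [
    `You are a competitive intelligence analyst. Summarize these reviews for ${competitor}:`,
    '1) Top complaints (with rough frequency)',
    '2) Top praised themes',
    '3) Sentiment trend vs older reviews if timestamps are present',
    '',
    ...lines,
  ].join('\n');
}
```

Including the review date in each line gives the model what it needs for the third item of the template.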
Legal and ethical boundaries
Scrape public pages; do not bypass authentication you are not entitled to automate. Respect terms, robots.txt, and reasonable request rates. Do not use data to misrepresent a competitor. See Is web scraping legal?.
Cost ballpark
Focused competitor stacks (hundreds of URLs, mixed Actors) often land around $10–30/month on starter-style usage; always validate against each Actor’s Pricing tab and a smoke run.
Related: E-commerce price monitoring · Market intelligence · Social media analytics
FAQ
What you can scrape: common datasets include pricing and availability, product catalog changes, reviews and ratings, marketing and SEO surface text, job postings, SERP positions for your keywords, and public social metrics. Apify’s Store provides Actors for many sites; custom sites can use Crawlee in your own Actor.
Why track job postings: hiring is a leading indicator. New engineering stacks, go-to-market motions, and geographies show up in job requisitions before press releases. Weekly snapshots of career pages make spikes and new role families easy to detect.
Legality: publicly available factual information is often scrapable under prevailing interpretations in major jurisdictions, but site terms and computer abuse laws still apply. Collect only what you need, avoid improperly circumventing technical barriers, and read our legality overview.
Scrape frequency: match cadence to volatility. Pricing daily (hourly during promos), reviews daily to weekly, jobs weekly, SERP and marketing copy weekly to monthly. Apify schedules let you set different crons per Actor or task.
Automated alerts: use run webhooks plus Make, n8n, or Zapier to compare new datasets to baselines and post to Slack or email. You can also load datasets into a warehouse and let BI subscriptions fire when metrics cross thresholds.
JavaScript-heavy or hard-to-scrape sites: prefer browser-based Actors (Playwright or Puppeteer templates) or a site-specific Store Actor. Add proxies where the README recommends them. See web scraping challenges for mitigation patterns.
Scraping versus packaged competitive intelligence tools: off-the-shelf tools are quick to start but lock you into their schema, sources, and pricing. Scraping with Apify gives you raw, structured data for exactly the URLs and fields you choose, plus easy export to your own warehouse or BI. The trade-off is setup time and maintenance when target sites change layout. Many teams run both: a packaged tool for breadth, scraping for the few signals they need precise control over.
How many competitors to start with: begin with 3 to 5 direct rivals and the specific URLs that matter (bestsellers, pricing page, careers, review profiles). This keeps runs cheap and the output interpretable. Add competitors once your alerts and dashboards are working, rather than scraping everything from day one.



