Competitor Analysis with Web Scraping: The Complete Guide

Quick Answer

Competitor analysis with web scraping tracks competitor pricing, product catalogs, social presence, hiring signals, and content, automatically and at scale. On Apify, you run Actors on schedules, store structured rows in datasets, and feed Sheets, BI tools, or Slack via API and webhooks.

This guide maps each signal to a specific Actor, cron, and downstream destination, plus a working change-detection workflow for competitor pricing and messaging pages.

Why scrape instead of checking manually

Manual competitor research (opening tabs, copying prices into a sheet, skimming reviews) is fine for a one-off snapshot. It breaks down the moment you need it weekly across several rivals.

Approach	Manual checking	Scraping with Apify
Coverage	A handful of pages you remember to revisit	Hundreds of URLs across every competitor
Frequency	Whenever someone has time	Scheduled cron, daily or hourly
History	Lost unless you screenshot	Every row keyed and timestamped for diffs
Alerts	None	Webhook to Slack, email, or BI on threshold breach
Consistency	Subjective, easy to miss	Same fields captured every run

The payoff is not just speed. Structured, timestamped rows let you see trends (discount cadence, review-sentiment drift, hiring spikes) that a manual glance never surfaces. For a wider view across a whole category rather than named rivals, pair this with market intelligence.

Types of data to track

Signal	What to collect	What it tells you
Pricing	Product and PLP prices, promos	Margin pressure, discount cadence
Inventory	In stock, quantity hints	Velocity, supply stress
Reviews & Q&A	Stars, text, volume	Quality gaps, unmet needs
Catalog	New SKUs, categories	Expansion and seasonality
Content & SEO	Titles, H1s, meta, blog cadence	Messaging and keyword pivots
Jobs	Open roles, locations, seniority	Roadmap hints 6–12 months early
SERP	Rankings for target keywords	Organic share of voice
Social	Followers, posts, engagement (public)	Brand momentum and campaigns

Prioritize 3–5 competitors and specific URLs (bestsellers, pricing page, careers, review profiles) so runs stay cheap and interpretable.

Recommended Actors by data type

Data type	Starting point	What you learn	Guide
Amazon listings	Amazon Product Scraper	ASIN-level price, Buy Box, rating shifts	Scrape Amazon products
General e-commerce	E-commerce Scraping Tool	Cross-storefront price and stock	Scrape e-commerce prices · Best e-commerce scrapers
Site copy / blogs	Website Content Crawler	Messaging and content cadence over time	Scrape website content
Google SERP	Google Search Results Scraper	Organic share of voice for your keywords	Scrape Google SERP
Local presence	Google Maps Scraper	Locations, ratings, review velocity	Scrape Google Maps · Best Google Maps scrapers
Reviews (platform-specific)	Apify Store search (Trustpilot, G2, etc.)	Quality gaps and unmet needs	Browse Apify Store
Instagram / social	Instagram Scraper	Posting cadence, campaigns, engagement	Scrape Instagram · Best social media scrapers

Verify input schema, pricing model, and rate limits on each Actor page before production schedules. Not sure which Actor fits a niche site? The most popular Actors list is a reliable starting shortlist.

Workflow: from scrape to decision

Step 1: Scope and URL list

Per competitor, capture:

Hero product or category URLs you must not lose sight of
Public pricing or plans page (SaaS)
Careers landing page
Third-party review profile URLs where buyers compare you

Store these in a sheet or JSON config that your runs read via task input. One source of truth prevents drift.
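As a rough sketch of that single source of truth, the config below keeps one entry per competitor and flattens it into the `startUrls` shape used by the crawler input later in this guide. The competitor names, URLs, signal labels, and the `toStartUrls` helper are hypothetical placeholders, not part of any Apify SDK.

```javascript
// Hypothetical config: one entry per competitor, one URL per signal.
const competitors = [
  {
    name: 'competitor-a',
    urls: {
      pricing: 'https://competitor-a.com/pricing',
      careers: 'https://competitor-a.com/careers',
    },
  },
  {
    name: 'competitor-b',
    urls: { pricing: 'https://competitor-b.com/pricing' },
  },
];

// Flatten the config into [{ url }] entries, optionally for one signal only.
function toStartUrls(config, signal) {
  return config.flatMap((c) =>
    Object.entries(c.urls)
      .filter(([kind]) => !signal || kind === signal)
      .map(([, url]) => ({ url })),
  );
}

console.log(toStartUrls(competitors, 'pricing'));
```

Keeping the signal name next to each URL lets one file feed several tasks: the pricing task reads only pricing URLs, the careers task only careers URLs.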

Step 2: Run once, then schedule

Run manually, inspect the dataset schema, then attach a Schedule (Console → Schedules):

Signal	Suggested cadence	Example cron
Pricing	Daily	`0 6 * * *`
Reviews	Daily	`0 7 * * *`
Content / SEO snapshot	Weekly	`0 8 * * 1`
Job postings	Weekly	`0 8 * * 3`
SERP	Weekly	`0 9 * * 1`
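For readers less used to cron syntax, the sketch below splits a five-field expression into named parts. `parseCron` is a hypothetical helper for illustration only; it checks the field count but does not validate ranges.

```javascript
// Split a standard five-field cron expression into named fields.
function parseCron(expr) {
  const parts = expr.trim().split(/\s+/);
  if (parts.length !== 5) {
    throw new Error(`Expected 5 cron fields, got ${parts.length}`);
  }
  const [minute, hour, dayOfMonth, month, dayOfWeek] = parts;
  return { minute, hour, dayOfMonth, month, dayOfWeek };
}

// Pricing: minute 0, hour 6, every day.
console.log(parseCron('0 6 * * *'));
// Content / SEO snapshot: minute 0, hour 8, day-of-week 1 (Monday).
console.log(parseCron('0 8 * * 1'));
```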

Step 3: Normalize and key data

Use a stable key per entity: product URL, ASIN, job ID, or (competitor, keyword) for SERP. Add scrapedAt on every row so BI can time-series and diff.
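A minimal sketch of that keying step follows. The field names (`asin`, `url`, `jobId`, `competitor`, `keyword`) are assumptions about what your chosen Actors return, so check them against a real dataset before relying on this.

```javascript
// Pick a stable key per entity type; field names are assumptions.
function entityKey(kind, row) {
  if (kind === 'product') return row.asin || row.url;
  if (kind === 'job') return row.jobId;
  if (kind === 'serp' && row.competitor && row.keyword) {
    return `${row.competitor}|${row.keyword}`;
  }
  return undefined;
}

// Attach key and scrapedAt without mutating the scraped row.
function normalize(kind, row, now = new Date()) {
  const key = entityKey(kind, row);
  if (key === undefined || key === null || key === '') {
    throw new Error(`Cannot derive a key for kind "${kind}"`);
  }
  return { ...row, key: String(key), scrapedAt: now.toISOString() };
}

console.log(normalize('serp', { competitor: 'competitor-a', keyword: 'crm pricing', position: 4 }));
```

Failing loudly on a missing key is a deliberate choice here: a row without a stable key cannot be diffed against the prior run, so it is better caught at load time than in a dashboard.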

Step 4: Alerts

Webhook on run succeeded → Make, n8n, or Zapier → rules such as:

Price drops beyond X%
Review average below threshold
Spike in new roles in a new department
Homepage H1 or pricing table text changed (diff against prior crawl)
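The first rule in that list could be sketched as below, assuming both snapshots are arrays of rows carrying a `key` and a numeric `price`. Those field names and the `priceDrops` helper are illustrative assumptions; in practice the same logic would sit in a Make, n8n, or Zapier code step.

```javascript
// Flag rows whose price fell by more than thresholdPct versus the prior snapshot.
function priceDrops(previous, current, thresholdPct) {
  const prevByKey = new Map(previous.map((r) => [r.key, r.price]));
  const alerts = [];
  for (const row of current) {
    const before = prevByKey.get(row.key);
    // Skip new entities and unusable prices rather than guessing.
    if (typeof before !== 'number' || before <= 0 || typeof row.price !== 'number') continue;
    const dropPct = ((before - row.price) / before) * 100;
    if (dropPct > thresholdPct) {
      alerts.push({
        key: row.key,
        before,
        after: row.price,
        dropPct: Math.round(dropPct * 10) / 10, // one decimal place
      });
    }
  }
  return alerts;
}

const lastRun = [{ key: 'sku-1', price: 100 }, { key: 'sku-2', price: 50 }];
const thisRun = [{ key: 'sku-1', price: 80 }, { key: 'sku-2', price: 49 }];
console.log(priceDrops(lastRun, thisRun, 10));
```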

Integration with analytics tools

Apify datasets are tabular JSON, so they map cleanly to the tools teams already use:

Destination	Pattern
Google Sheets / Airtable	Scheduled append or upsert via integration or low-code workflow; good for shared scorecards.
Spreadsheet + Looker Studio	Sheet as a lightweight semantic layer for exec views.
Warehouse (BigQuery, Snowflake, Redshift)	API or orchestrator loads JSONL batches; join with CRM and product data.
BI (Power BI, Tableau, Looker)	Point at warehouse tables or synced CSV drops in object storage.
Slack / email	Webhook → workflow → formatted message with deep links to product or job URLs.

For programmatic pulls, use the Apify API to list run datasets and page through items, ideal for nightly ETL owned by data engineering.
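A hedged sketch of such a pull is shown below. It assumes the Apify API v2 dataset items endpoint with `offset` and `limit` query parameters and a Bearer token header, which matches my understanding of the API but should be checked against the current API reference. `fetchAllItems` is a hypothetical helper, and the default relies on the global `fetch` available in Node 18 or later.

```javascript
// Page through a dataset with offset/limit until an empty page comes back.
// fetchImpl is injectable so the loop can be exercised without network access.
async function fetchAllItems(datasetId, token, { limit = 1000, fetchImpl = fetch } = {}) {
  const items = [];
  for (let offset = 0; ; offset += limit) {
    const url = new URL(`https://api.apify.com/v2/datasets/${datasetId}/items`);
    url.searchParams.set('format', 'json');
    url.searchParams.set('offset', String(offset));
    url.searchParams.set('limit', String(limit));
    const res = await fetchImpl(url, { headers: { Authorization: `Bearer ${token}` } });
    if (!res.ok) throw new Error(`Dataset request failed: ${res.status}`);
    const page = await res.json();
    if (page.length === 0) return items; // no more rows
    items.push(...page);
  }
}

// Usage (placeholders, not real values):
// fetchAllItems('<DATASET_ID>', process.env.APIFY_TOKEN).then((rows) => console.log(rows.length));
```

Stopping on an empty page costs one extra request, but it stays correct even if a page comes back shorter than `limit` for reasons other than reaching the end.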

Working playbook: pricing + messaging change detection

No custom Actor required. Run apify/website-content-crawler in Markdown mode against a short list of high-signal competitor URLs (pricing, features, homepage hero, changelog) on a weekly schedule, hash each page's Markdown, and diff the hashes against the prior run.

Schedule input (excerpt; a full list would cover 5 competitors × ~3 pages each):

{
  "startUrls": [
    { "url": "https://competitor-a.com/pricing" },
    { "url": "https://competitor-a.com/changelog" },
    { "url": "https://competitor-b.com/pricing" }
  ],
  "maxCrawlDepth": 0,
  "saveMarkdown": true,
  "crawlerType": "playwright:adaptive"
}

n8n change detector (triggered on Apify run succeeded webhook):

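// n8n Code node, run once for all items. Code nodes use require(), not ES imports;
// self-hosted n8n may need NODE_FUNCTION_ALLOW_BUILTIN=crypto for this to load.
const crypto = require('crypto');

// Today's crawl items and the stored snapshots from the prior run (url, hash, wordCount).
const today = $input.all().map(i => i.json);
const previous = $('Airtable - previous snapshots').all().map(i => i.json);

const changes = [];
for (const page of today) {
  const markdown = page.markdown || '';
  const hash = crypto.createHash('sha256').update(markdown).digest('hex');
  const prev = previous.find(p => p.url === page.url);
  // Pages without a prior snapshot are skipped; they become the baseline.
  if (prev && prev.hash !== hash) {
    changes.push({
      url: page.url,
      title: page.metadata?.title,
      prevHash: prev.hash.slice(0, 8),
      newHash: hash.slice(0, 8),
      // Count non-empty tokens; safe when the page has no Markdown.
      wordDelta: markdown.split(/\s+/).filter(Boolean).length - (prev.wordCount || 0),
    });
  }
}
return changes.map(c => ({ json: c }));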

Route changes to an LLM summarization node (Claude or GPT) with prompt: "Here is the old and new Markdown of {{url}}. In 3 bullets, describe what materially changed. Ignore nav, cookie banners, and dates." Post the summary to Slack #competitive with a link to the live page.

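Overwrite the Airtable snapshot table with today's rows (URL, hash, word count, and the Markdown itself, since the summarization prompt needs the old text) so next Monday's run diffs against this week.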
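The snapshot row itself might look like the sketch below. `toSnapshot` and its field names are assumptions chosen to line up with the `hash` and `wordCount` fields the change detector reads; the Airtable write is left to whichever node or client you already use.

```javascript
const { createHash } = require('node:crypto');

// Build the row stored per page so the next run can diff against it.
function toSnapshot(page, now = new Date()) {
  const markdown = page.markdown || '';
  return {
    url: page.url,
    hash: createHash('sha256').update(markdown).digest('hex'),
    wordCount: markdown.split(/\s+/).filter(Boolean).length,
    markdown, // kept so a later summarization step has the old text
    scrapedAt: now.toISOString(),
  };
}

console.log(toSnapshot({ url: 'https://competitor-a.com/pricing', markdown: '# Pricing\nPro: $49' }));
```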

Job postings as leading indicators

Hiring signal	Often implies
Many ML / data roles	New intelligent features or data products
Enterprise sales in a new geo	Expansion into your territory
Infra / SRE cluster	Scale event or reliability push

How: crawl careers weekly, count roles by department/location, alert on new clusters compared to the prior snapshot.
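The counting and comparison step could be sketched as follows, assuming each scraped job row carries `department` and `location` fields. Both field names, the helper names, and the `minIncrease` default are illustrative, not taken from any specific Actor's output.

```javascript
// Count open roles per "department|location" cluster.
function countClusters(jobs) {
  const counts = new Map();
  for (const job of jobs) {
    const cluster = `${job.department}|${job.location}`;
    counts.set(cluster, (counts.get(cluster) || 0) + 1);
  }
  return counts;
}

// Report clusters that are new, or that grew by at least minIncrease roles.
function hiringSpikes(previousJobs, currentJobs, minIncrease = 3) {
  const before = countClusters(previousJobs);
  const spikes = [];
  for (const [cluster, count] of countClusters(currentJobs)) {
    const prior = before.get(cluster) || 0;
    if (prior === 0 || count - prior >= minIncrease) {
      spikes.push({ cluster, prior, count, isNew: prior === 0 });
    }
  }
  return spikes;
}

const lastWeek = [{ department: 'Engineering', location: 'Berlin' }];
const thisWeek = [
  { department: 'Engineering', location: 'Berlin' },
  { department: 'Sales', location: 'Tokyo' },
];
console.log(hiringSpikes(lastWeek, thisWeek));
```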

Content and SEO change detection

Track title tags, H1s, pricing copy, and blog cadence. Workflow: weekly crawl → store text or hash → diff vs previous week → notify on meaningful changes (not trivial nav tweaks). Pair the on-page crawl with a weekly Google SERP scrape to see whether messaging changes move their rankings on the keywords you both target.
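One way to separate meaningful changes from cosmetic ones is a normalized line-level diff, sketched below. The normalization rules (collapse whitespace, lowercase, drop empty lines) and the `minLines` threshold are assumptions to tune per site; this is not a full diff algorithm and it ignores line order.

```javascript
// Normalize text so cosmetic noise does not register as a change.
function normalizeLines(text) {
  return text
    .split('\n')
    .map((line) => line.replace(/\s+/g, ' ').trim().toLowerCase())
    .filter((line) => line.length > 0);
}

// Return normalized lines added and removed between two snapshots of a page.
function lineDiff(previousText, currentText) {
  const before = new Set(normalizeLines(previousText));
  const after = new Set(normalizeLines(currentText));
  return {
    added: [...after].filter((line) => !before.has(line)),
    removed: [...before].filter((line) => !after.has(line)),
  };
}

// Treat a change as meaningful only when enough lines differ.
function isMeaningful(diff, minLines = 1) {
  return diff.added.length + diff.removed.length >= minLines;
}

const diff = lineDiff('# Pricing\nPro: $49', '#  pricing\nPro: $59');
console.log(diff, isMeaningful(diff));
```

Storing the text as well as a hash is what makes this possible: a hash says that something changed, while the diff says what.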

Using AI on top of scraped data

Summarize large review exports or price-change CSVs with an LLM. Keep PII out of third-party prompts unless you have a clear lawful basis.

Review batch prompt (template)

You are a competitive intelligence analyst. Summarize these reviews for {Competitor}:
1) Top complaints (with rough frequency)
2) Top praised themes
3) Sentiment trend vs older reviews if timestamps are present
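Filling that template from a dataset could look like the sketch below. The review fields (`stars`, `text`, `date`), the `maxReviews` cap, and both helpers are assumptions. The redaction is a simple heuristic for emails and phone-like digit runs, so it may both miss PII and over-redact (for example, dates written inside review text); it is no substitute for a proper PII review.

```javascript
// Redact obvious emails and phone-like numbers; a heuristic, not a guarantee.
function redact(text) {
  return text
    .replace(/[\w.+-]+@[\w-]+(\.[\w-]+)+/g, '[email]')
    .replace(/\+?\d[\d\s().-]{7,}\d/g, '[phone]');
}

// Fill the review prompt template with a capped, redacted batch of reviews.
function buildReviewPrompt(competitor, reviews, maxReviews = 200) {
  const lines = reviews.slice(0, maxReviews).map((r) => {
    const date = r.date ? ` (${r.date})` : '';
    return `- ${r.stars}/5${date}: ${redact(r.text)}`;
  });
  return [
    `You are a competitive intelligence analyst. Summarize these reviews for ${competitor}:`,
    '1) Top complaints (with rough frequency)',
    '2) Top praised themes',
    '3) Sentiment trend vs older reviews if timestamps are present',
    '',
    ...lines,
  ].join('\n');
}

console.log(buildReviewPrompt('Competitor A', [
  { stars: 2, date: '2024-03-01', text: 'Support never replied to jo@example.com.' },
]));
```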

Legal and ethical boundaries

Scrape public pages; do not bypass authentication you are not entitled to automate. Respect terms, robots.txt, and reasonable request rates. Do not use data to misrepresent a competitor. See "Is web scraping legal?" for more detail.

Cost ballpark

Focused competitor stacks (hundreds of URLs, mixed Actors) often land around $10–30/month on starter-style usage; always validate against each Actor’s Pricing tab and a smoke run.

Try Apify: free credits to test competitor scrapes →

Frequently Asked Questions

Common datasets include pricing and availability, product catalog changes, reviews and ratings, marketing and SEO surface text, job postings, SERP positions for your keywords, and public social metrics. Apify’s Store provides Actors for many sites; custom sites can use Crawlee in your own Actor.

Hiring is a leading indicator: new engineering stacks, go-to-market motion, and geographies show up in reqs before press releases. Weekly snapshots of career pages make spikes and new role families easy to detect.

Publicly available factual information is often scrapable under prevailing interpretations in major jurisdictions, but site terms and computer abuse laws still apply. Collect only what you need, avoid circumventing technical barriers improperly, and read our legality overview.

Match cadence to volatility: pricing and reviews daily (pricing hourly during promos), jobs weekly, SERP and marketing copy weekly to monthly. Apify schedules let you set different crons per Actor or task.

Yes. Use run webhooks plus Make, n8n, or Zapier to compare new datasets to baselines and post to Slack or email. You can also load datasets into a warehouse and let BI subscriptions fire when metrics cross thresholds.

Prefer browser-based Actors (Playwright or Puppeteer templates) or a site-specific Store Actor. Add proxies where the README recommends them. See web scraping challenges for mitigation patterns.

Off-the-shelf tools are quick to start but lock you into their schema, sources, and pricing. Scraping with Apify gives you raw, structured ownership of exactly the URLs and fields you choose, plus easy export to your own warehouse or BI. The trade-off is setup time and maintenance when target sites change layout. Many teams run both: a packaged tool for breadth, scraping for the few signals they need precise control over.

Start with 3 to 5 direct rivals and the specific URLs that matter (bestsellers, pricing page, careers, review profiles). This keeps runs cheap and the output interpretable. Add competitors once your alerts and dashboards are working, rather than scraping everything from day one.
