
Competitor Analysis with Web Scraping: The Complete Guide

Quick Answer

Competitor analysis with web scraping tracks competitor pricing, product catalogs, social presence, hiring signals, and content, automatically and at scale. On Apify, you run Actors on schedules, store structured rows in datasets, and feed Sheets, BI tools, or Slack via API and webhooks.

This guide maps each signal to a specific Actor, cron, and downstream destination, plus a working change-detection workflow for competitor pricing and messaging pages.


Why scrape instead of checking manually

Manual competitor research (opening tabs, copying prices into a sheet, skimming reviews) is fine for a one-off snapshot. It breaks down the moment you need it weekly across several rivals.

| Approach | Manual checking | Scraping with Apify |
| --- | --- | --- |
| Coverage | A handful of pages you remember to revisit | Hundreds of URLs across every competitor |
| Frequency | Whenever someone has time | Scheduled cron, daily or hourly |
| History | Lost unless you screenshot | Every row keyed and timestamped for diffs |
| Alerts | None | Webhook to Slack, email, or BI on threshold breach |
| Consistency | Subjective, easy to miss | Same fields captured every run |

The payoff is not just speed. Structured, timestamped rows let you see trends (discount cadence, review-sentiment drift, hiring spikes) that a manual glance never surfaces. For a wider view across a whole category rather than named rivals, pair this with market intelligence.


Types of data to track

| Signal | What to collect | What it tells you |
| --- | --- | --- |
| Pricing | Product and PLP prices, promos | Margin pressure, discount cadence |
| Inventory | In stock, quantity hints | Velocity, supply stress |
| Reviews & Q&A | Stars, text, volume | Quality gaps, unmet needs |
| Catalog | New SKUs, categories | Expansion and seasonality |
| Content & SEO | Titles, H1s, meta, blog cadence | Messaging and keyword pivots |
| Jobs | Open roles, locations, seniority | Roadmap hints 6–12 months early |
| SERP | Rankings for target keywords | Organic share of voice |
| Social | Followers, posts, engagement (public) | Brand momentum and campaigns |

Prioritize 3–5 competitors and specific URLs (bestsellers, pricing page, careers, review profiles) so runs stay cheap and interpretable.


| Data type | Starting point | What you learn | Guide |
| --- | --- | --- | --- |
| Amazon listings | Amazon Product Scraper | ASIN-level price, Buy Box, rating shifts | Scrape Amazon products |
| General e-commerce | E-commerce Scraping Tool | Cross-storefront price and stock | Scrape e-commerce prices · Best e-commerce scrapers |
| Site copy / blogs | Website Content Crawler | Messaging and content cadence over time | Scrape website content |
| Google SERP | Google Search Results Scraper | Organic share of voice for your keywords | Scrape Google SERP |
| Local presence | Google Maps Scraper | Locations, ratings, review velocity | Scrape Google Maps · Best Google Maps scrapers |
| Reviews (platform-specific) | Apify Store search (Trustpilot, G2, etc.) | Quality gaps and unmet needs | Browse Apify Store |
| Instagram / social | Instagram Scraper | Posting cadence, campaigns, engagement | Scrape Instagram · Best social media scrapers |

Verify input schema, pricing model, and rate limits on each Actor page before production schedules. Not sure which Actor fits a niche site? The most popular Actors list is a reliable starting shortlist.


Workflow: from scrape to decision

Step 1: Scope and URL list

Per competitor, capture:

  • Hero product or category URLs you must not lose sight of
  • Public pricing or plans page (SaaS)
  • Careers landing page
  • Third-party review profile URLs where buyers compare you

Store these in a sheet or JSON config that your runs read via task input. A single source of truth prevents drift between schedules.
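
As a rough illustration of the single-config idea, the sketch below flattens a per-competitor config into a start URL list. The config shape, the competitor names, and the `toStartUrls` helper are all hypothetical, and whether a given Actor preserves `userData` on its output rows depends on that Actor, so check its README before relying on it.

```javascript
// Hypothetical config shape: one entry per competitor, keyed by page type.
const competitors = [
  {
    name: 'competitor-a',
    pages: {
      pricing: 'https://competitor-a.com/pricing',
      careers: 'https://competitor-a.com/careers',
    },
  },
  {
    name: 'competitor-b',
    pages: { pricing: 'https://competitor-b.com/pricing' },
  },
];

// Flatten the config into start URL objects. Competitor and page type travel
// along as userData so rows can be grouped later without re-parsing URLs.
// Passing no pageTypes returns every page in the config.
function toStartUrls(config, pageTypes) {
  return config.flatMap((c) =>
    Object.entries(c.pages)
      .filter(([type]) => !pageTypes || pageTypes.includes(type))
      .map(([type, url]) => ({ url, userData: { competitor: c.name, pageType: type } })),
  );
}

const pricingUrls = toStartUrls(competitors, ['pricing']);
console.log(pricingUrls.map((u) => u.url));
```

Each schedule (pricing daily, careers weekly) can then request only its own page types from the same config.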

Step 2: Run once, then schedule

Run manually, inspect the dataset schema, then attach a Schedule (Console → Schedules):

| Signal | Suggested cadence | Example cron |
| --- | --- | --- |
| Pricing | Daily | 0 6 * * * |
| Reviews | Daily | 0 7 * * * |
| Content / SEO snapshot | Weekly | 0 8 * * 1 |
| Job postings | Weekly | 0 8 * * 3 |
| SERP | Weekly | 0 9 * * 1 |

Step 3: Normalize and key data

Use a stable key per entity: product URL, ASIN, job ID, or (competitor, keyword) for SERP. Add scrapedAt on every row so BI can time-series and diff.
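
A minimal sketch of the keying step, assuming raw rows carry some of the fields `asin`, `jobId`, `competitor`, `keyword`, and `url`. Those field names and the `entityKey` / `normalizeRow` helpers are illustrative, not a schema any particular Actor guarantees.

```javascript
// Derive a stable key from whichever identifier the row carries.
function entityKey(row) {
  if (row.asin) return `asin:${row.asin}`;
  if (row.jobId) return `job:${row.jobId}`;
  if (row.competitor && row.keyword) {
    return `serp:${row.competitor}|${row.keyword.toLowerCase()}`;
  }
  if (row.url) {
    const u = new URL(row.url);
    // Drop query string, fragment, and trailing slash so tracking
    // parameters do not create a new key for the same page.
    return `url:${u.origin}${u.pathname.replace(/\/+$/, '')}`;
  }
  throw new Error('Row has no usable identifier');
}

// Attach the key and a scrape timestamp (ISO 8601, UTC) to a row.
function normalizeRow(row, scrapedAt = new Date().toISOString()) {
  return { ...row, key: entityKey(row), scrapedAt };
}

console.log(normalizeRow({ url: 'https://shop.example.com/p/widget/?utm_source=x', price: 19.99 }));
```

Stripping the query string is a judgment call: on sites that put the product ID in a query parameter it would merge distinct products, so keep the relevant parameter in the key there.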

Step 4: Alerts

Webhook on run succeeded → Make, n8n, or Zapier → rules such as:

  • Price drops beyond X%
  • Review average below threshold
  • Spike in new roles in a new department
  • Homepage H1 or pricing table text changed (diff against prior crawl)
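
One of the rules above, the price-drop check, might look like the following. This is a sketch under assumptions: rows are expected to carry a `key` and a numeric `price`, and the 10% default threshold is an arbitrary placeholder for X.

```javascript
// Compare current rows to the previous snapshot and return one alert
// per entity whose price fell by at least `thresholdPct` percent.
function priceDropAlerts(previousRows, currentRows, thresholdPct = 10) {
  const prevByKey = new Map(previousRows.map((r) => [r.key, r]));
  const alerts = [];
  for (const row of currentRows) {
    const prev = prevByKey.get(row.key);
    // Skip entities that are new, or that lack a usable price on either side.
    if (!prev || !(prev.price > 0) || typeof row.price !== 'number') continue;
    const changePct = ((row.price - prev.price) / prev.price) * 100;
    if (changePct <= -thresholdPct) {
      alerts.push({
        key: row.key,
        from: prev.price,
        to: row.price,
        changePct: Math.round(changePct * 10) / 10, // one decimal place
      });
    }
  }
  return alerts;
}

const alerts = priceDropAlerts(
  [{ key: 'a', price: 100 }, { key: 'b', price: 50 }],
  [{ key: 'a', price: 85 }, { key: 'b', price: 48 }, { key: 'c', price: 10 }],
);
console.log(alerts);
```

The same snapshot-versus-current pattern extends to the other rules by swapping the compared field (review average, role counts, page text hash).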

Integration with analytics tools

Apify datasets are tabular JSON, so they map cleanly to the tools teams already use:

| Destination | Pattern |
| --- | --- |
| Google Sheets / Airtable | Scheduled append or upsert via integration or low-code workflow; good for shared scorecards. |
| Spreadsheet + Looker Studio | Sheet as a lightweight semantic layer for exec views. |
| Warehouse (BigQuery, Snowflake, Redshift) | API or orchestrator loads JSONL batches; join with CRM and product data. |
| BI (Power BI, Tableau, Looker) | Point at warehouse tables or synced CSV drops in object storage. |
| Slack / email | Webhook → workflow → formatted message with deep links to product or job URLs. |

For programmatic pulls, use the Apify API to look up a run's dataset and page through its items. This pattern suits nightly ETL owned by data engineering.
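
A hedged sketch of the paging loop against the dataset items endpoint, using `offset` and `limit`. The helper names, the page size of 1000, and the injectable `fetchImpl` parameter are assumptions for illustration. It also assumes Node 18+ for the global `fetch` and a token in the `APIFY_TOKEN` environment variable.

```javascript
const API_BASE = 'https://api.apify.com/v2';

// Build the URL for one page of dataset items.
function datasetItemsUrl(datasetId, offset, limit) {
  const url = new URL(`${API_BASE}/datasets/${encodeURIComponent(datasetId)}/items`);
  url.searchParams.set('format', 'json');
  url.searchParams.set('offset', String(offset));
  url.searchParams.set('limit', String(limit));
  return url.toString();
}

// Page through a dataset until a short page signals the end.
// `fetchImpl` is injectable so the loop can be exercised without network access.
async function fetchAllItems(datasetId, token, { pageSize = 1000, fetchImpl = fetch } = {}) {
  const items = [];
  for (let offset = 0; ; offset += pageSize) {
    const res = await fetchImpl(datasetItemsUrl(datasetId, offset, pageSize), {
      headers: { Authorization: `Bearer ${token}` },
    });
    if (!res.ok) throw new Error(`Apify API returned ${res.status}`);
    const page = await res.json();
    items.push(...page);
    if (page.length < pageSize) return items;
  }
}

// Usage (not executed here):
// const rows = await fetchAllItems('<datasetId>', process.env.APIFY_TOKEN);
```

The short-page stop condition is only reliable when no item-filtering parameters are set, because filtered pages can come back shorter than `limit` before the dataset is exhausted.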


Working playbook: pricing + messaging change detection

No custom Actor required. Use apify/website-content-crawler in Markdown mode against a short list of high-signal competitor URLs (pricing, features, homepage hero, changelog). Schedule it weekly, hash each page's Markdown, and diff the hashes against the prior run.

Schedule input (5 competitors × ~3 pages each):

{
  "startUrls": [
    { "url": "https://competitor-a.com/pricing" },
    { "url": "https://competitor-a.com/changelog" },
    { "url": "https://competitor-b.com/pricing" }
  ],
  "maxCrawlDepth": 0,
  "saveMarkdown": true,
  "crawlerType": "playwright:adaptive"
}

n8n change detector (triggered on Apify run succeeded webhook):

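// n8n Code node, run once for all items. ES `import` is not available here,
// so use require(). Self-hosted n8n may need NODE_FUNCTION_ALLOW_BUILTIN=crypto.
const crypto = require('crypto');

// Today's crawl items and the snapshot rows saved by the previous run.
const today = $input.all().map(i => i.json);
const previous = $('Airtable - previous snapshots').all().map(i => i.json);

const changes = [];
for (const page of today) {
  // Guard against pages that returned no Markdown.
  const markdown = page.markdown || '';
  const hash = crypto.createHash('sha256').update(markdown).digest('hex');
  const wordCount = markdown.split(/\s+/).filter(Boolean).length;
  const prev = previous.find(p => p.url === page.url);
  // Report only pages that existed last run and whose content hash changed.
  if (prev && prev.hash !== hash) {
    changes.push({
      url: page.url,
      title: page.metadata?.title,
      prevHash: String(prev.hash || '').slice(0, 8),
      newHash: hash.slice(0, 8),
      wordDelta: wordCount - (prev.wordCount || 0),
    });
  }
}
return changes.map(c => ({ json: c }));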

Route changes to an LLM summarization node (Claude or GPT) with prompt: "Here is the old and new Markdown of {{url}}. In 3 bullets, describe what materially changed. Ignore nav, cookie banners, and dates." Post the summary to Slack #competitive with a link to the live page.

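Overwrite the Airtable snapshot table with today's rows (url, hash, wordCount, and the Markdown itself, since the summarization step needs the old text) so next Monday's run diffs against this week.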


Job postings as leading indicators

| Hiring signal | Often implies |
| --- | --- |
| Many ML / data roles | New intelligent features or data products |
| Enterprise sales in a new geo | Expansion into your territory |
| Infra / SRE cluster | Scale event or reliability push |

How: crawl careers weekly, count roles by department/location, alert on new clusters compared to the prior snapshot.
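
The count-and-compare step can be sketched as follows. The `department` and `location` field names and the default of three added roles as the alert threshold are assumptions; real careers pages will need their own field mapping.

```javascript
// Count open roles per (department, location) pair.
function countClusters(jobs) {
  const counts = new Map();
  for (const job of jobs) {
    const key = `${job.department || 'Unknown'} | ${job.location || 'Unknown'}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Flag clusters that are new or grew by at least `minIncrease` roles
// since the previous snapshot.
function newClusters(previousJobs, currentJobs, minIncrease = 3) {
  const before = countClusters(previousJobs);
  const flagged = [];
  for (const [cluster, count] of countClusters(currentJobs)) {
    const increase = count - (before.get(cluster) || 0);
    if (increase >= minIncrease) flagged.push({ cluster, count, increase });
  }
  return flagged;
}

const lastWeek = [{ department: 'Sales', location: 'Berlin' }];
const thisWeek = [
  { department: 'Sales', location: 'Berlin' },
  { department: 'ML', location: 'Remote' },
  { department: 'ML', location: 'Remote' },
  { department: 'ML', location: 'Remote' },
  { department: 'ML', location: 'Remote' },
];
console.log(newClusters(lastWeek, thisWeek));
```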


Content and SEO change detection

Track title tags, H1s, pricing copy, and blog cadence. Workflow: weekly crawl → store text or hash → diff vs previous week → notify on meaningful changes (not trivial nav tweaks). Pair the on-page crawl with a weekly Google SERP scrape to see whether messaging changes move their rankings on the keywords you both target.
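
To keep trivial edits from triggering notifications, one option is to normalize the text before hashing it. The sketch below assumes Node's built-in `crypto` module. The noise patterns (ISO dates, copyright years, whitespace, letter case) are illustrative guesses that will need tuning per site.

```javascript
const crypto = require('crypto');

// Strip noise that changes between crawls without changing the message.
function normalizeForDiff(markdown) {
  return (markdown || '')
    .replace(/\b\d{4}-\d{2}-\d{2}\b/g, '') // ISO dates such as 2024-01-05
    .replace(/©\s*\d{4}/g, '')             // copyright year in footers
    .replace(/\s+/g, ' ')                  // collapse runs of whitespace
    .trim()
    .toLowerCase();
}

// Hash the normalized text so only material edits change the hash.
function contentHash(markdown) {
  return crypto.createHash('sha256').update(normalizeForDiff(markdown)).digest('hex');
}

const weekOne = 'Pricing\n\nPro plan: $49   \nUpdated 2024-01-05';
const weekTwo = 'Pricing\nPro plan: $49\nUpdated 2024-02-09';
const weekThree = 'Pricing\nPro plan: $59\nUpdated 2024-02-16';

console.log(contentHash(weekOne) === contentHash(weekTwo));   // date and spacing only
console.log(contentHash(weekTwo) === contentHash(weekThree)); // price changed
```

Aggressive normalization trades false alarms for missed changes, so it is worth reviewing a few weeks of diffs before trusting the filter.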


Using AI on top of scraped data

Summarize large review exports or price-change CSVs with an LLM. Keep PII out of third-party prompts unless you have a clear lawful basis.

Review batch prompt (template)

You are a competitive intelligence analyst. Summarize these reviews for {Competitor}:
1) Top complaints (with rough frequency)
2) Top praised themes
3) Sentiment trend vs older reviews if timestamps are present
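
Filling the template from a dataset could look like this. It is a sketch: the `rating`, `date`, and `text` field names, the cap of 200 reviews per batch, and the `buildReviewPrompt` helper are assumptions, not part of any Actor's output schema.

```javascript
// Build one prompt for a batch of reviews from a single competitor.
function buildReviewPrompt(competitor, reviews, maxReviews = 200) {
  const header = [
    `You are a competitive intelligence analyst. Summarize these reviews for ${competitor}:`,
    '1) Top complaints (with rough frequency)',
    '2) Top praised themes',
    '3) Sentiment trend vs older reviews if timestamps are present',
  ].join('\n');
  const body = reviews
    .slice(0, maxReviews)
    // Only rating, date, and text are sent; reviewer names and other PII are left out.
    .map((r, i) => `[${i + 1}] ${r.rating ?? '?'}/5 ${r.date ?? 'undated'}: ${String(r.text || '').trim()}`)
    .join('\n');
  return `${header}\n\nReviews:\n${body}`;
}

const prompt = buildReviewPrompt('Acme', [
  { author: 'Jane D.', rating: 2, date: '2024-03-01', text: 'Slow support ' },
  { author: 'Sam K.', rating: 5, text: 'Great onboarding' },
]);
console.log(prompt);
```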

Scrape public pages only; do not bypass authentication you are not entitled to automate. Respect site terms, robots.txt, and reasonable request rates. Do not use data to misrepresent a competitor. See "Is web scraping legal?" for details.


Cost ballpark

Focused competitor stacks (hundreds of URLs, mixed Actors) often land around $10–30/month on starter-style usage; always validate against each Actor’s Pricing tab and a smoke run.


Related: E-commerce price monitoring · Market intelligence · Social media analytics


Frequently Asked Questions

What competitor data can you collect with web scraping?

Common datasets include pricing and availability, product catalog changes, reviews and ratings, marketing and SEO surface text, job postings, SERP positions for your keywords, and public social metrics. Apify’s Store provides Actors for many sites; custom sites can use Crawlee in your own Actor.

Why track competitor job postings?

Hiring is a leading indicator: new engineering stacks, go-to-market motion, and geographies show up in reqs before press releases. Weekly snapshots of career pages make spikes and new role families easy to detect.

Is it legal to scrape competitor websites?

Publicly available factual information is often scrapable under prevailing interpretations in major jurisdictions, but site terms and computer abuse laws still apply. Collect only what you need, avoid circumventing technical barriers improperly, and read our legality overview.

How often should you scrape competitors?

Match cadence to volatility: pricing daily (hourly during promos), reviews daily to weekly, jobs weekly, SERP and marketing copy weekly to monthly. Apify schedules let you set different crons per Actor or task.

Can you get automatic alerts when a competitor changes something?

Yes. Use run webhooks plus Make, n8n, or Zapier to compare new datasets to baselines and post to Slack or email. You can also load datasets into a warehouse and let BI subscriptions fire when metrics cross thresholds.

What if a competitor site is JavaScript-heavy or blocks scrapers?

Prefer browser-based Actors (Playwright or Puppeteer templates) or a site-specific Store Actor. Add proxies where the README recommends them. See web scraping challenges for mitigation patterns.

How does scraping compare with off-the-shelf competitive intelligence tools?

Off-the-shelf tools are quick to start but lock you into their schema, sources, and pricing. Scraping with Apify gives you raw, structured ownership of exactly the URLs and fields you choose, plus easy export to your own warehouse or BI. The trade-off is setup time and maintenance when target sites change layout. Many teams run both: a packaged tool for breadth, scraping for the few signals they need precise control over.

How many competitors should you track?

Start with 3 to 5 direct rivals and the specific URLs that matter (bestsellers, pricing page, careers, review profiles). This keeps runs cheap and the output interpretable. Add competitors once your alerts and dashboards are working, rather than scraping everything from day one.
