
Web Scraping Legal Compliance Framework: GDPR, CCPA, and Global Regulations (2026)

Yassine El Haddad
Software Developer & Automation Specialist

I build production AI agents, web scrapers, and automation pipelines. Most of what I publish here comes from the actual problems they run into: proxies that get banned, anti-bot stacks that fingerprint your client, RAG that drifts when the underlying data moves. Stack: Python, TypeScript, Go, FastAPI, LangChain, Crawlee, Playwright, deployed on AWS, GCP, and Cloudflare.

Web scraping operates in a legal gray zone: no single global law governs it. Instead, several intersecting frameworks apply: the CFAA (US), GDPR (EU), CCPA (California), copyright law, and contract law (Terms of Service). This guide maps what's clearly legal, what's gray, and what's clearly illegal, and includes a practical compliance checklist. This is not legal advice; consult counsel for your specific situation.

Because no law targets scraping specifically, operators must navigate overlapping rules:

  • United States: Computer Fraud and Abuse Act (CFAA), state privacy laws (CCPA, VCDPA, etc.)
  • European Union: GDPR, ePrivacy Directive, national implementations
  • Copyright: Database rights, factual compilation, fair use
  • Contract: Terms of Service (ToS) as binding agreements

Jurisdictions differ. Scraping that is acceptable in the US may violate EU rules if personal data is involved. Always assess by target audience, data type, and where you and the target operate.

CFAA (United States)

The Computer Fraud and Abuse Act (CFAA) criminalizes unauthorized access to "protected computers." The Ninth Circuit decided hiQ v. LinkedIn in 2022, reconsidering the case after the Supreme Court's decision in Van Buren v. United States. That ruling narrowed the CFAA's practical reach: scraping publicly available data without bypassing authentication is generally not access "without authorization." Bypassing login walls or other technical barriers can still trigger CFAA liability. Examples include authentication, and rate limits or blocks engineered to stop bots.
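A crawler can act on this by treating authentication barriers as a hard stop rather than something to work around. The sketch below is minimal and assumes you have the response status code and the final URL after redirects. `should_stop` and `LOGIN_SEGMENTS` are hypothetical names, and the segment list is a guess you would tune per target. A 403 can also signal a bot block rather than a login wall; stopping there too is the conservative choice.

```python
from urllib.parse import urlparse

# Hypothetical first path segments that usually indicate a login wall.
# This list is an assumption; tune it per target.
LOGIN_SEGMENTS = {"login", "signin", "sign-in", "auth"}


def should_stop(status: int, final_url: str) -> bool:
    """Return True if a response looks like an authentication barrier.

    Policy sketched here: stop and skip the URL. Never retry with
    credentials you were not given or try to route around the wall.
    """
    if status in (401, 403):  # unauthorized / forbidden
        return True
    # Compare the first path segment exactly, so "/authors" is not mistaken for "/auth".
    first_segment = urlparse(final_url).path.strip("/").split("/")[0].lower()
    return first_segment in LOGIN_SEGMENTS


if __name__ == "__main__":
    # A redirect to /login?next=... is treated the same as a 401.
    print(should_stop(200, "https://shop.example/login?next=/orders"))  # True
    print(should_stop(200, "https://shop.example/products/42"))         # False
```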

GDPR (European Union)

GDPR applies when you process personal data of individuals in the EU. "Personal data" means any information relating to an identified or identifiable natural person, so names, email addresses, and IP addresses can all qualify. Scraping public profiles, reviews, or directories often involves personal data. You need a legal basis (consent, legitimate interest, etc.) and must honor data subject rights (access, erasure, portability). Fines can reach €20 million or 4% of worldwide annual turnover, whichever is higher.

CCPA (California)

The California Consumer Privacy Act (CCPA) gives California consumers rights over their personal information. Scraping California residents' data may trigger disclosure, opt-out, and deletion obligations. The statute's definitions of "sale" and "sharing" determine how scraped data can be used commercially.

Copyright

Copyright protects creative expression, not facts. Scraping factual data (prices, titles, dates) typically does not infringe copyright. Copying large blocks of text, images, or proprietary compilations can. Database rights in the EU add another layer for structured datasets.

Contract Law (Terms of Service)

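ToS can be binding contracts, and violating a "no scraping" clause can lead to breach-of-contract claims. Enforceability varies. Courts are more willing to enforce terms a user actively accepted (for example, by clicking "I agree" or creating an account) than terms merely linked in a page footer. In Ryanair v. PR Aviation (2015), the CJEU considered a database protected by neither copyright nor the sui generis database right. It held that the Database Directive does not stop the owner of such a database from restricting its use through contract terms. For unprotected data, then, ToS can carry more weight, not less. Meta v. BrandTotal involved scraping despite ToS; outcomes vary by jurisdiction and facts.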
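Reading the ToS is a human job, but on long terms pages a quick filter can point you to the clauses worth reading closely. This is a minimal sketch that assumes you already have the ToS as plain text. The keyword pattern is an illustrative assumption, not a legal test, and a clause it misses can still bind you.

```python
import re

# Illustrative keywords only: an assumption, not a legal standard.
AUTOMATION_TERMS = re.compile(
    r"\b(scrap\w*|crawl\w*|spider\w*|robots?|bots?|automated (?:means|access|queries)|data mining)\b",
    re.IGNORECASE,
)


def flag_tos_clauses(tos_text: str) -> list:
    """Return the sentences that mention automated access, for human review."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", tos_text.strip())
    return [s for s in sentences if AUTOMATION_TERMS.search(s)]


if __name__ == "__main__":
    tos = (
        "You may browse the site freely. "
        "You agree not to use robots, spiders, or other automated means to access the Service. "
        "Prices may change without notice."
    )
    for clause in flag_tos_clauses(tos):
        print(clause)
```

Record each flagged clause, and your reading of it, in your compliance notes for that target.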

What's Clearly Legal

  • Scraping publicly available data not covered by an explicit ToS prohibition and without bypassing authentication
  • No bypassing auth: Accessing pages that require no login
  • Respecting robots.txt as a voluntary standard (not legally mandatory in most places, but a best practice)
  • Factual data extraction where copyright does not protect the content
  • Rate-limited, polite crawling that does not overload servers

Example: Scraping product names and prices from a public e-commerce category page, with delays between requests, typically falls in the "clearly legal" zone in many jurisdictions—assuming no ToS violation and no personal data mishandling.
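The "delays between requests" part can be enforced per host. This is a minimal single-threaded sketch. `HostThrottle` is a hypothetical helper, and the two-second interval in the usage is an arbitrary example, not a legal threshold. The clock and sleep functions are injectable only so the logic can be tested without real waiting.

```python
import time
from urllib.parse import urlparse


class HostThrottle:
    """Enforce a minimum interval between requests to the same host (single-threaded)."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_request = {}  # host -> timestamp of the most recent request

    def wait(self, url: str) -> float:
        """Block until this host may be requested again; return the seconds slept."""
        host = (urlparse(url).hostname or "").lower()
        slept = 0.0
        last = self._last_request.get(host)
        if last is not None:
            remaining = self.min_interval - (self._clock() - last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last_request[host] = self._clock()
        return slept


if __name__ == "__main__":
    throttle = HostThrottle(min_interval=2.0)
    for page in range(1, 4):
        url = f"https://shop.example/category?page={page}"
        throttle.wait(url)  # first call returns at once; later calls wait about 2 s
        print("fetching", url)  # replace with your HTTP client call
```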

What's Legally Gray

  • Scraping personal data (GDPR implications: need legal basis, transparency, rights)
  • Data behind login: Accessing content after authenticating—CFAA and ToS issues
  • Commercial competition: Scraping a competitor's site for pricing or content—competition law and ToS disputes
  • Aggressive request rates: Crawling hard enough to strain a server may be framed as "unauthorized" or DoS-like
  • Reselling scraped data: May trigger CCPA "sale" rules, GDPR restrictions, or contractual claims

When in the gray zone, document your rationale, minimize personal data, and consider legal review before scaling.
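Documenting your rationale works best as a structured record per target rather than scattered notes. This minimal sketch assumes a hypothetical `TargetAssessment` dataclass whose fields mirror the questions in this guide: ToS, robots.txt, personal data, legal basis, and retention. The checks in `open_issues` are simplified prompts for review, not legal conclusions.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class TargetAssessment:
    """Hypothetical record of the compliance decisions made for one target."""
    target: str
    purpose: str
    tos_reviewed: bool
    tos_prohibits_scraping: bool
    robots_txt_checked: bool
    personal_data: bool
    legal_basis: Optional[str] = None  # e.g. "legitimate interest" when personal data is involved
    retention_days: Optional[int] = None
    notes: List[str] = field(default_factory=list)
    assessed_on: str = field(default_factory=lambda: date.today().isoformat())

    def open_issues(self) -> List[str]:
        """List the gaps that should block scaling until someone resolves them."""
        issues = []
        if not self.tos_reviewed:
            issues.append("ToS not reviewed")
        if self.tos_prohibits_scraping:
            issues.append("ToS prohibits scraping: get legal review")
        if not self.robots_txt_checked:
            issues.append("robots.txt not checked")
        if self.personal_data and not self.legal_basis:
            issues.append("personal data without a documented legal basis")
        if self.personal_data and self.retention_days is None:
            issues.append("personal data without a retention period")
        return issues

    def to_json(self) -> str:
        """Serialize the record so it can be stored alongside the crawl config."""
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    record = TargetAssessment(
        target="shop.example",
        purpose="daily price monitoring",
        tos_reviewed=True,
        tos_prohibits_scraping=False,
        robots_txt_checked=True,
        personal_data=False,
    )
    print(record.open_issues())  # []
    print(record.to_json())
```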

What's Clearly Illegal

  • Bypassing authentication (fake credentials, session hijacking, credential stuffing)
  • Ignoring an explicit cease-and-desist after being put on notice
  • Scraping and reselling personal data without consent, where consent or another legal basis is required
  • Circumventing technical measures (e.g., CAPTCHA-solving farms)
  • Causing harm (DoS, server overload, data breaches)

If you receive a cease-and-desist, stop scraping that target and preserve records for potential defense or negotiation.

Practical Compliance Checklist

| Check | Action |
| --- | --- |
| robots.txt | Review and honor where feasible (Crawl-delay, Disallow) |
| ToS review | Read target's ToS; document if scraping is prohibited |
| Personal data | Identify PII; implement consent, anonymization, or deletion |
| Rate limits | Use delays, respect Retry-After headers, avoid overload |
| Attribution | When republishing, credit sources where required |
| Data retention | Define retention periods; implement deletion workflows |
| Documentation | Log scope, legal basis, and data flows |

Tools like Apify let you configure concurrency limits (alongside proxy rotation); keep concurrency low enough that crawling stays polite. Bright Data offers compliant pre-collected datasets for some use cases.
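Respecting Retry-After means handling both forms the header can take under HTTP: a number of seconds or an HTTP-date. This minimal sketch uses only the standard library. The 60-second default (for missing or unparseable values) and the one-hour cap are assumptions to tune, not values from any specification.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str], now: Optional[datetime] = None,
                        default: float = 60.0, cap: float = 3600.0) -> float:
    """Convert a Retry-After header value into a number of seconds to wait.

    Handles delay-seconds ("120") and HTTP-dates ("Wed, 21 Oct 2015 07:28:00 GMT").
    Missing or unparseable values fall back to `default`; results are capped at `cap`.
    """
    if not value:
        return default
    value = value.strip()
    if value.isdigit():
        return min(float(value), cap)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError, IndexError):
        return default
    if when.tzinfo is None:  # treat naive dates as UTC
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry now"; never return a negative delay.
    return min(max(0.0, (when - now).total_seconds()), cap)
```

Call it when a response comes back as 429 or 503, and pause that host for the returned number of seconds before retrying.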

Case Law Summary

| Case | Court | Outcome |
| --- | --- | --- |
| hiQ v. LinkedIn | 9th Circuit (US), 2022 | Scraping publicly available data without bypassing auth is generally not CFAA "unauthorized access" |
| Ryanair v. PR Aviation | CJEU, 2015 | Where a database has neither copyright nor sui generis protection, its owner may restrict use through contract terms (ToS) |
| Meta v. BrandTotal | US District | Dispute over scraping despite ToS; case-specific |
| SerpApi | Various | Commercial SERP APIs; legal posture depends on ToS and jurisdiction |

Case law evolves. Rely on current legal advice, not summaries alone.

Jurisdiction vs. Scraping Rules

| Jurisdiction | Key Rules | Scraping Guidance |
| --- | --- | --- |
| US (federal) | CFAA | Public data, no auth bypass: generally OK per hiQ |
| EU | GDPR, ePrivacy | Personal data requires legal basis; minimize and document |
| UK | UK GDPR | Similar to EU; post-Brexit adjustments |
| California | CCPA | Consumer rights; disclosure and opt-out if selling/sharing |
| Germany | GDPR, UWG | Stricter on competition; consider local counsel |

Data Minimization and Retention

Even when scraping is legally permissible, minimize and limit personal data:

  • Collect only what you need: Avoid scraping full profiles if you only need names and titles.
  • Anonymize or pseudonymize where possible: Aggregate data when individual identification is unnecessary. Hashing identifiers only pseudonymizes them, so the hashes remain personal data under GDPR.
  • Set retention periods: Define how long you keep scraped data; implement automatic deletion.
  • Document flows: Map where data goes (storage, processors, third parties). GDPR generally requires records of processing activities (Art. 30), and a DPIA, where one is required, relies on the same map.
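In code, minimization usually means applying an allowlist before anything is stored. This minimal sketch assumes scraped records arrive as dicts and that the pipeline only needs name, job title, and company; the field names and `minimize_record` are hypothetical. It swaps the profile URL for an HMAC-SHA256 keyed hash. That is pseudonymization, not anonymization: whoever holds the key can re-identify people by hashing candidate URLs, so the output is still personal data under GDPR.

```python
import hashlib
import hmac

# Hypothetical allowlist: the only fields this pipeline actually needs.
KEEP_FIELDS = ("name", "job_title", "company")


def pseudonymize(identifier: str, key: bytes) -> str:
    """HMAC-SHA256 of an identifier, as hex.

    Pseudonymization, not anonymization: the key holder can re-identify
    people by hashing candidate identifiers and comparing.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def minimize_record(record: dict, key: bytes) -> dict:
    """Keep allowlisted fields and replace the profile URL with a stable pseudonymous ID."""
    kept = {f: record[f] for f in KEEP_FIELDS if f in record}
    if "profile_url" in record:
        kept["profile_id"] = pseudonymize(record["profile_url"], key)
    return kept  # email, phone and any other field are dropped before storage


if __name__ == "__main__":
    scraped = {
        "name": "A. Example",
        "job_title": "Engineer",
        "company": "Acme",
        "email": "a@example.com",
        "profile_url": "https://example.com/in/a",
    }
    print(minimize_record(scraped, key=b"load-this-from-a-secret-store"))
```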

Tools like Apify let you define retention policies on datasets. For self-hosted setups, implement cron jobs or TTL on storage.
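For a self-hosted store, the cron-job approach can be as small as one DELETE statement. This minimal sketch assumes SQLite and a hypothetical `scraped_items` table with an ISO-8601 `scraped_at` column. `purge_expired` is a hypothetical name, and the 30-day retention in the usage is an example, not a recommendation.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Optional


def purge_expired(conn: sqlite3.Connection, retention_days: int,
                  now: Optional[datetime] = None) -> int:
    """Delete scraped rows older than the retention period; return the count removed.

    Assumes a hypothetical table scraped_items(url TEXT, scraped_at TEXT) whose
    timestamps are UTC strings from datetime.isoformat(timespec="seconds"), so
    lexicographic comparison matches chronological order.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=retention_days)).isoformat(timespec="seconds")
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("DELETE FROM scraped_items WHERE scraped_at < ?", (cutoff,))
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect("scraped.db")
    conn.execute("CREATE TABLE IF NOT EXISTS scraped_items (url TEXT, scraped_at TEXT)")
    removed = purge_expired(conn, retention_days=30)
    print(f"purged {removed} expired rows")
    conn.close()
```

Schedule it daily from cron or your job runner, and log the returned count so the deletion itself is documented.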

Respecting robots.txt

robots.txt is not legally binding in most jurisdictions. However:

  • Honoring it reduces risk: Courts and plaintiffs may cite disregard as evidence of bad faith.
  • Industry standard: Many targets assume compliance; violating it can trigger ToS enforcement.
  • Crawl-delay: Some sites use Crawl-delay (non-standard). Respecting it shows goodwill.

Parse robots.txt before crawling. Use libraries like robotexclusionrulesparser or Crawlee's built-in handling. When in doubt, err on the side of compliance.
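Python's standard library also ships `urllib.robotparser`, which covers the basics. This minimal sketch parses robots.txt text you have already fetched, so it runs offline, and returns a per-URL decision plus a delay. `crawl_policy` and the one-second floor are assumptions. This parser applies rules in file order and does not implement RFC 9309 longest-match precedence or `*`/`$` path wildcards, so check complex files by hand.

```python
from typing import Tuple
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that was fetched from https://<host>/robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def crawl_policy(parser: RobotFileParser, user_agent: str, url: str,
                 min_delay: float = 1.0) -> Tuple[bool, float]:
    """Return (allowed, delay_seconds) for one URL.

    The delay is the site's Crawl-delay if it declares one, but never
    less than our own floor `min_delay`.
    """
    allowed = parser.can_fetch(user_agent, url)
    declared = parser.crawl_delay(user_agent)  # None when no Crawl-delay applies
    delay = max(float(declared), min_delay) if declared is not None else min_delay
    return allowed, delay


if __name__ == "__main__":
    robots = """User-agent: *
Crawl-delay: 5
Disallow: /account/
Disallow: /search
"""
    parser = build_parser(robots)
    for path in ("/products/lamp", "/account/orders", "/search?q=lamp"):
        print(path, crawl_policy(parser, "PriceBot/1.0", "https://shop.example" + path))
```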

Cease-and-Desist Response

If you receive a cease-and-desist:

  1. Stop immediately — Do not continue scraping that target.
  2. Preserve records — Log what was scraped, when, and for what purpose. Useful for defense or negotiation.
  3. Do not destroy data — Spoliation can hurt you in litigation. Retain until counsel advises.
  4. Consult counsel — Get jurisdiction-specific advice before responding or resuming.
  5. Document communications — Keep copies of all correspondence.
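Steps 1 and 2 are easier to honor if the crawler itself enforces them. This minimal sketch assumes two things: a hand-maintained halt list of domains covered by a cease-and-desist, and an append-only JSON Lines log of what was fetched and why. `HALTED_DOMAINS`, `is_halted`, and `format_log_entry` are hypothetical names.

```python
import json
from datetime import datetime, timezone
from typing import List, Optional
from urllib.parse import urlparse

# Hypothetical halt list: domains covered by a cease-and-desist, updated by hand.
HALTED_DOMAINS = {"example.com"}


def is_halted(url: str) -> bool:
    """True if the URL's host is a halted domain or any of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in HALTED_DOMAINS)


def format_log_entry(url: str, purpose: str, fields: List[str],
                     now: Optional[datetime] = None) -> str:
    """Build one JSON Lines record describing a fetch: what, when, and why."""
    now = now or datetime.now(timezone.utc)
    entry = {"ts": now.isoformat(), "url": url, "purpose": purpose, "fields": fields}
    return json.dumps(entry) + "\n"


if __name__ == "__main__":
    url = "https://www.example.com/listings/1"
    if is_halted(url):
        print("skipping halted target:", url)
    else:
        with open("scrape_audit.jsonl", "a", encoding="utf-8") as log:
            log.write(format_log_entry(url, "price monitoring", ["title", "price"]))
```

This log is the kind of record step 3 says to keep, so exclude it from any retention purge once a letter arrives.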

Some operators negotiate limited access or licensing. Others switch to official APIs or alternative sources. Each situation is fact-specific.

When to Consult Counsel

  • Scraping personal data at scale
  • Targets with known litigious postures
  • Plans to resell or monetize scraped data
  • Operations in multiple jurisdictions
  • Before launching after receiving a cease-and-desist
Start with the checklist

Run through the compliance checklist for every new target. Document decisions. When personal data or commercial use is involved, get legal advice before scaling.




Frequently Asked Questions

Is web scraping illegal?

Not per se. In the US, scraping publicly available data without bypassing authentication is generally not a CFAA violation under hiQ v. LinkedIn, although ToS, copyright, and privacy claims can still apply. EU and other jurisdictions add GDPR and similar rules when personal data is involved.

Does GDPR apply to web scraping?

Yes, when scraped data includes personal data of EU residents. You need a legal basis (e.g., legitimate interest), transparency, and must honor data subject rights (access, deletion, portability).

Is robots.txt legally binding?

Legally, robots.txt is not binding in most jurisdictions, but respecting it is a best practice and reduces legal and reputational risk. Some sites rely on it; ignoring it can trigger ToS or CFAA arguments.

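Can I scrape a site whose Terms of Service prohibit scraping?

ToS can be binding contracts, and violating them can lead to breach-of-contract claims. Courts differ on enforceability. When in doubt, avoid scraping or seek legal advice. The Ninth Circuit's hiQ ruling addressed the CFAA, not ToS; the district court later ruled that hiQ had breached LinkedIn's User Agreement.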

Do I need consent to scrape publicly available personal data?

Under GDPR, public data may be processed under legitimate interest, but you must still balance interests, provide transparency, and honor rights. Consent is one legal basis; it is not always required if another basis applies.

What should I do if I receive a cease-and-desist?

Stop scraping that target immediately. Preserve records of what you scraped and how. Consult legal counsel before resuming or responding.

Common mistakes and fixes

Received cease-and-desist letter

Stop scraping that target immediately. Consult legal counsel. Preserve records of scraping scope and data handling.

Uncertain whether data is personal under GDPR

When in doubt, treat it as personal. Establish a legal basis (such as consent), and apply anonymization or deletion workflows. Document your legal basis.
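"When in doubt, treat it as personal" can be partly automated as a check before storage. This minimal sketch assumes records are flat dicts. The regular expressions are deliberately crude assumptions that catch only obvious identifiers: email addresses, IPv4 addresses, and phone-like digit runs. They will over-flag things like version numbers, which is the safe direction here. A field with no flag can still be personal data, because names and free-text bios slip straight through.

```python
import re

# Deliberately crude patterns (an assumption for illustration only).
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def flag_personal_data(record: dict) -> dict:
    """Map each string field to the pattern kinds found in its value; unflagged fields are omitted."""
    hits = {}
    for field_name, value in record.items():
        if not isinstance(value, str):
            continue  # only string values are scanned in this sketch
        kinds = [kind for kind, rx in PATTERNS.items() if rx.search(value)]
        if kinds:
            hits[field_name] = kinds
    return hits


if __name__ == "__main__":
    row = {"title": "Desk lamp", "seller_contact": "jane.doe@example.com"}
    print(flag_personal_data(row))
```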