Web Scraping Legal Compliance Framework: GDPR, CCPA, and Global Regulations (2026)
Web scraping operates in a legal gray zone: no single global law governs it. Instead, multiple intersecting frameworks apply: CFAA (US), GDPR (EU), CCPA (California), copyright law, and contract law (Terms of Service). This guide maps what's clearly legal, what's gray, and what's clearly illegal, with a practical compliance checklist. This is not legal advice; consult counsel for your specific situation.
The Legal Landscape
There is no global law specifically for web scraping. Operators must navigate overlapping rules:
- United States: Computer Fraud and Abuse Act (CFAA), state privacy laws (CCPA, VCDPA, etc.)
- European Union: GDPR, ePrivacy Directive, national implementations
- Copyright: Database rights, factual compilation, fair use
- Contract: Terms of Service (ToS) as binding agreements
Jurisdictions differ. Scraping that is acceptable in the US may violate EU rules if personal data is involved. Always assess by target audience, data type, and where you and the target operate.
Key Legal Frameworks
CFAA (United States)
The Computer Fraud and Abuse Act (CFAA) criminalizes unauthorized access to "protected computers." The hiQ v. LinkedIn (9th Circuit, 2022) ruling narrowed CFAA's scope: scraping publicly available data without bypassing authentication is generally not "unauthorized access." Bypassing login walls or technical barriers (authentication, rate limits engineered to block bots) can still trigger CFAA liability.
GDPR (European Union)
GDPR applies when you process personal data of individuals in the EU. "Personal data" includes names, email addresses, IP addresses, and anything else that identifies or can identify a natural person. Scraping public profiles, reviews, or directories often involves personal data. You need a legal basis (consent, legitimate interest, etc.) and must honor data subject rights (access, deletion, portability). Fines can reach EUR 20 million or 4% of global annual turnover, whichever is higher.
CCPA (California)
California Consumer Privacy Act (CCPA) gives consumers rights over personal information. Scraping California residents' data may trigger disclosure, opt-out, and deletion obligations. The definition of "sale" and "share" affects how scraped data can be used commercially.
Copyright Law
Copyright protects creative expression, not facts. Scraping factual data (prices, titles, dates) typically does not infringe copyright. Copying large blocks of text, images, or proprietary compilations can. Database rights in the EU add another layer for structured datasets.
Contract Law (Terms of Service)
ToS can be binding contracts. Violating them (e.g., "no scraping" clauses) can lead to breach-of-contract claims. Courts differ on enforceability: some enforce ToS as contracts; others require clearer assent. In Ryanair v. PR Aviation, the CJEU held that when a database is not protected by copyright or the sui generis database right, the Database Directive's limits on contractual restrictions do not apply, so the site owner remains free to restrict use through its terms. Meta v. BrandTotal involved scraping despite ToS; outcomes vary by jurisdiction and facts.
What's Clearly Legal
- Public data: Scraping publicly available data that is not covered by an explicit ToS prohibition
- No bypassing auth: Accessing only pages that require no login
- Respecting robots.txt as a voluntary standard (not legally mandatory in most places, but a best practice)
- Factual data extraction where copyright does not protect the content
- Rate-limited, polite crawling that does not overload servers
Example: Scraping product names and prices from a public e-commerce category page, with delays between requests, typically falls in the "clearly legal" zone in many jurisdictions—assuming no ToS violation and no personal data mishandling.
What's Legally Gray
- Scraping personal data (GDPR implications: need legal basis, transparency, rights)
- Data behind login: Accessing content after authenticating—CFAA and ToS issues
- Commercial competition: Scraping a competitor's site for pricing or content—competition law and ToS disputes
- Aggressive request rates: Pushing past a site's rate limits may be seen as "unauthorized" or DoS-like
- Reselling scraped data: May trigger CCPA "sale" rules, GDPR restrictions, or contractual claims
When in the gray zone, document your rationale, minimize personal data, and consider legal review before scaling.
What's Clearly Illegal
- Bypassing authentication (fake credentials, session hijacking, credential stuffing)
- Ignoring explicit cease-and-desist after being put on notice
- Scraping + reselling personal data without consent where consent or another legal basis is required
- Circumventing technical access controls (e.g., CAPTCHA-solving farms)
- Causing harm (DoS, server overload, data breaches)
If you receive a cease-and-desist, stop scraping that target and preserve records for potential defense or negotiation.
Practical Compliance Checklist
| Check | Action |
|---|---|
| robots.txt | Review and honor where feasible (crawl-delay, disallow) |
| ToS review | Read target's ToS; document if scraping is prohibited |
| Personal data | Identify PII; implement consent, anonymization, or deletion |
| Rate limits | Use delays, respect Retry-After headers, avoid overload |
| Attribution | When republishing, credit sources where required |
| Data retention | Define retention periods; implement deletion workflows |
| Documentation | Log scope, legal basis, and data flows |
Tools like Apify let you configure proxy rotation and concurrency to stay within polite crawling. Bright Data offers compliant pre-collected datasets for some use cases.
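The "Rate limits" row of the checklist can be sketched in a few lines of Python. This is a minimal illustration, not a recommended configuration: the function names, the 2-second minimum delay, the 5-second default, and the 300-second cap are all assumptions chosen for the example. It shows how a Retry-After header (which may carry either a number of seconds or an HTTP date) could be turned into a wait time, and how a minimum gap between requests could be enforced.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value, now=None, default=5.0, cap=300.0):
    """Convert a Retry-After header value into a wait time in seconds.

    The header may hold an integer number of seconds or an HTTP date.
    `default` and `cap` are illustrative values, not recommendations.
    """
    if header_value is None:
        return default
    value = header_value.strip()
    if value.isdigit():
        return min(float(value), cap)
    try:
        target = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Unparseable header: fall back to the default wait.
        return default
    if target.tzinfo is None:
        target = target.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return min(max((target - now).total_seconds(), 0.0), cap)


def polite_wait(last_request_at, min_delay=2.0, clock=time.monotonic, sleep=time.sleep):
    """Sleep until at least `min_delay` seconds have passed since the last request."""
    remaining = min_delay - (clock() - last_request_at)
    if remaining > 0:
        sleep(remaining)
    return clock()
```

In a crawler loop, `polite_wait` would run before each request, and `retry_after_seconds` would be applied to the header of any 429 or 503 response before retrying.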
Case Law Summary
| Case | Court | Outcome |
|---|---|---|
| hiQ v. LinkedIn | 9th Circuit (US) | Scraping publicly available data without bypassing auth is not CFAA "unauthorized access" |
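| Ryanair v. PR Aviation | CJEU | For databases not protected by copyright or the sui generis right, the Database Directive does not limit contractual (ToS) restrictions on use |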
| Meta v. BrandTotal | US District | Dispute over scraping despite ToS; case-specific |
| SerpApi | Various | Commercial SERP APIs; legal posture depends on ToS and jurisdiction |
Case law evolves. Rely on current legal advice, not summaries alone.
Jurisdiction vs. Scraping Rules
| Jurisdiction | Key Rules | Scraping Guidance |
|---|---|---|
| US (federal) | CFAA | Public data, no auth bypass: generally OK per hiQ |
| EU | GDPR, ePrivacy | Personal data requires legal basis; minimize and document |
| UK | UK GDPR | Similar to EU; post-Brexit adjustments |
| California | CCPA | Consumer rights; disclosure and opt-out if selling/sharing |
| Germany | GDPR, UWG | Stricter on competition; consider local counsel |
Data Minimization and Retention
Even when scraping is legally permissible, minimize and limit personal data:
- Collect only what you need: Avoid scraping full profiles if you only need names and titles.
- Anonymize where possible: Aggregate or hash identifiers when individual identification is unnecessary.
- Set retention periods: Define how long you keep scraped data; implement automatic deletion.
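- Document flows: Map where data goes: storage, processors, third parties. GDPR requires records of processing activities (Article 30), and this mapping also feeds a DPIA (Article 35) where processing is likely to be high-risk.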
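The first two points above can be sketched as follows. This is a minimal illustration under stated assumptions: the field allowlist, the function names, and the key handling are hypothetical. Note that a keyed hash is pseudonymization rather than anonymization, so the output may still count as personal data if re-identification remains possible.

```python
import hashlib
import hmac

# Hypothetical allowlist: keep only the fields the use case actually needs.
ALLOWED_FIELDS = {"name", "title"}


def minimize(record, allowed=ALLOWED_FIELDS):
    """Drop every field that is not explicitly allowed."""
    return {key: value for key, value in record.items() if key in allowed}


def pseudonymize(identifier, secret_key):
    """Replace an identifier with a keyed HMAC-SHA256 hex digest.

    The identifier is trimmed and lowercased first so that the same
    person maps to the same token across records.
    """
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()
```

Applying `minimize` at ingestion time means unwanted fields never reach storage. The secret key passed to `pseudonymize` should be kept separately from the dataset.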
Tools like Apify let you define retention policies on datasets. For self-hosted setups, implement cron jobs or TTL on storage.
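For a self-hosted setup, a retention purge could look like the sketch below. The 30-day period, the `scraped_at` field name, and the in-memory list are assumptions for illustration; a real deployment would run the equivalent delete against its own storage from a scheduled job.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention period; set this per dataset and document it.
RETENTION = timedelta(days=30)


def purge_expired(records, now=None, retention=RETENTION):
    """Return only records whose `scraped_at` timestamp is inside the retention window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - retention
    return [record for record in records if record["scraped_at"] >= cutoff]
```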
robots.txt: Voluntary but Recommended
robots.txt is not legally binding in most jurisdictions. However:
- Honoring it reduces risk: Courts and plaintiffs may cite disregard as evidence of bad faith.
- Industry standard: Many targets assume compliance; violating it can trigger ToS enforcement.
- Crawl-delay: Some sites use the Crawl-delay directive (non-standard). Respecting it shows goodwill.
Parse robots.txt before crawling. Use libraries like robotexclusionrulesparser or Crawlee's built-in handling. When in doubt, err on the side of compliance.
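As one way to do this without a third-party dependency, Python's standard library ships `urllib.robotparser`. The sketch below parses a robots.txt body that has already been downloaded; the sample file, the user agent string, and the `check_url` helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, as it might be served by a target site.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def check_url(robots_txt, user_agent, url):
    """Return (allowed, crawl_delay) for a URL under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url), parser.crawl_delay(user_agent)
```

A crawler would call `check_url` before queueing each URL, skip anything disallowed, and use the returned delay (when one is set) as its minimum wait between requests.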
Cease-and-Desist Response
If you receive a cease-and-desist:
- Stop immediately — Do not continue scraping that target.
- Preserve records — Log what was scraped, when, and for what purpose. Useful for defense or negotiation.
- Do not destroy data — Spoliation can hurt you in litigation. Retain until counsel advises.
- Consult counsel — Get jurisdiction-specific advice before responding or resuming.
- Document communications — Keep copies of all correspondence.
Some operators negotiate limited access or licensing. Others switch to official APIs or alternative sources. Each situation is fact-specific.
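The "stop immediately" step is easier to guarantee when the crawler enforces it in code rather than relying on operators to remember. A minimal sketch, assuming a hypothetical blocklist of hosts under a cease-and-desist:

```python
from urllib.parse import urlsplit

# Hypothetical blocklist: hosts that must not be scraped until counsel clears them.
BLOCKED_HOSTS = {"example.com"}


def is_blocked(url, blocked=BLOCKED_HOSTS):
    """Return True if the URL's host is a blocked host or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == entry or host.endswith("." + entry) for entry in blocked)
```

Checking `is_blocked` at the point where requests are queued keeps a forgotten scheduled job from resuming the crawl.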
When to Consult Counsel
- Scraping personal data at scale
- Targets with known litigious postures
- Plans to resell or monetize scraped data
- Operations in multiple jurisdictions
- Before resuming or relaunching after receiving a cease-and-desist
Run through the compliance checklist for every new target. Document decisions. When personal data or commercial use is involved, get legal advice before scaling.
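One way to keep these decisions documented is a small structured record per target, stored alongside the crawler configuration. The sketch below is illustrative only: the class name, the field names, and the rule in `needs_legal_review` are assumptions, not a legal test.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TargetAssessment:
    """Hypothetical per-target record of the compliance checklist."""

    target: str
    robots_txt_reviewed: bool
    tos_prohibits_scraping: bool
    personal_data_fields: list = field(default_factory=list)
    legal_basis: str = ""
    retention_days: int = 0
    notes: str = ""

    def needs_legal_review(self):
        # Illustrative rule of thumb: flag ToS prohibitions and any personal data.
        return self.tos_prohibits_scraping or bool(self.personal_data_fields)

    def to_json(self):
        """Serialize the record so it can be logged or version-controlled."""
        return json.dumps(asdict(self), sort_keys=True)
```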
Frequently Asked Questions
Is web scraping illegal?
Not per se. Scraping publicly available data without bypassing authentication is generally legal in the US per hiQ v. LinkedIn. EU and other jurisdictions add GDPR and similar rules when personal data is involved.
Does GDPR apply to web scraping?
Yes, when scraped data includes personal data of individuals in the EU. You need a legal basis (e.g., legitimate interest), transparency, and must honor data subject rights (access, deletion, portability).
Do I have to follow robots.txt?
Legally, robots.txt is not binding in most jurisdictions, but respecting it is a best practice and reduces legal and reputational risk. Some sites rely on it; ignoring it can trigger ToS or CFAA arguments.
Can I scrape a site whose ToS prohibit scraping?
ToS are contracts. Violating them can lead to breach-of-contract claims. Courts differ on enforceability. When in doubt, avoid scraping or seek legal advice. The hiQ ruling on the CFAA did not settle the ToS question.
Do I need consent to scrape publicly available personal data?
Under GDPR, public data may be processed under legitimate interest, but you must still balance interests, provide transparency, and honor rights. Consent is one legal basis; it is not always required if another basis applies.
What should I do if I receive a cease-and-desist?
Stop scraping that target immediately. Preserve records of what you scraped and how. Consult legal counsel before resuming or responding.




