Apify Storage Guide: Datasets, Key-Value Stores & Request Queues

Quick answer

Apify has three storage primitives: Datasets for append-only rows, Key-Value Stores for arbitrary files and run state (including INPUT), and Request Queues for deduplicated URL frontiers in crawls. The retention rule that actually matters: a named storage is kept until you delete it; everything else is an unnamed (default) storage that expires after 7 days by default. If production data lives in unnamed storage, it will disappear.

Storage types at a glance

Storage	Purpose	Format	Typical use
Dataset	Tabular scrape output	Append-only list of JSON objects	Products, leads, posts, SERP rows
Key-Value Store	Files & blobs	Key → record (JSON, PNG, HTML, …)	`INPUT`, screenshots, checkpoints
Request Queue	Crawl frontier	Deduplicated request objects	BFS/DFS crawls, resume after crash

Official overview: Storage usage.

Datasets

A dataset is an append-only log of items (like rows in a table). Each Actor run gets a default dataset; you can open named datasets for cross-run aggregation.

When to use datasets

Exporting CSV / JSON / Excel for analysts.
Feeding Google Sheets, warehouses, or BI tools.
Passing scrape output to LangChain or other AI pipelines as JSONL-style rows.
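To make "JSONL-style rows" concrete, here is a minimal sketch that serializes dataset-shaped items to one JSON object per line using only the standard library. The helper name to_jsonl and the sample rows are illustrative assumptions, not part of the Apify SDK; in practice the rows would come from a dataset export or API read.

```python
import json


def to_jsonl(items):
    """Serialize an iterable of dict rows to JSONL: one JSON object per line."""
    return "\n".join(
        json.dumps(item, ensure_ascii=False, sort_keys=True) for item in items
    )


# Sample rows shaped like the dataset items used in this guide (illustrative data).
rows = [
    {"title": "Product A", "price": 29.99, "url": "https://example.com/a"},
    {"title": "Product B", "price": 49.99, "url": "https://example.com/b"},
]

jsonl = to_jsonl(rows)
print(jsonl)
```

Each line can be parsed independently with json.loads, which is what makes the format convenient for streaming rows into a downstream pipeline.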

JavaScript (Actor SDK)

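import { Actor } from "apify";
await Actor.init();
// Append two rows to the run's default dataset.
await Actor.pushData([
  { title: "Product A", price: 29.99, url: "https://example.com/a" },
  { title: "Product B", price: 49.99, url: "https://example.com/b" },
]);
await Actor.exit();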

Python (Actor SDK)

from apify import Actor

async def main():
    async with Actor:
        await Actor.push_data([
            {'title': 'Product A', 'price': 29.99, 'url': 'https://example.com/a'},
            {'title': 'Product B', 'price': 49.99, 'url': 'https://example.com/b'},
        ])

Good to know

Items are not updated in place. Append new versions if the source changes.
Paginate reads with API limit / offset on large datasets.

Key-Value Stores

A key-value store holds arbitrary records addressed by key. Every run has a default store; the INPUT key holds run configuration.

When to use key-value stores

Persist crawler state for resumable scraping.
Save screenshots, PDFs, or HTML captures.
Cache intermediate blobs between steps.

JavaScript

await Actor.setValue("screenshot-homepage", screenshotBuffer, {
  contentType: "image/png",
});

await Actor.setValue("crawl-state", { lastPage: 42, totalItems: 1500 });
const state = await Actor.getValue("crawl-state");

Python

await Actor.set_value('screenshot-homepage', screenshot_buffer, content_type='image/png')

await Actor.set_value('crawl-state', {'last_page': 42, 'total_items': 1500})
state = await Actor.get_value('crawl-state')
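The crawl-state record above is what makes a scraper resumable. The sketch below shows the checkpoint-and-resume pattern in isolation. InMemoryStore is a hypothetical stand-in for a key-value store (not the Apify SDK), and crawl_pages and its fail_at parameter exist only to simulate a crash; in an Actor, the reads and writes would go through the SDK calls shown earlier.

```python
import asyncio


class InMemoryStore:
    """Hypothetical stand-in for a key-value store, for illustration only."""

    def __init__(self):
        self._records = {}

    async def get_value(self, key, default_value=None):
        return self._records.get(key, default_value)

    async def set_value(self, key, value):
        self._records[key] = value


async def crawl_pages(store, total_pages, fail_at=None):
    """Process pages in order, checkpointing after each one so a rerun can resume."""
    state = await store.get_value("crawl-state", {"last_page": 0})
    processed = []
    for page in range(state["last_page"] + 1, total_pages + 1):
        if page == fail_at:
            raise RuntimeError(f"simulated crash on page {page}")
        processed.append(page)  # real scraping work would happen here
        await store.set_value("crawl-state", {"last_page": page})
    return processed


async def demo():
    store = InMemoryStore()
    try:
        await crawl_pages(store, total_pages=5, fail_at=4)
    except RuntimeError:
        pass  # the first run dies after checkpointing page 3
    # The second run reads the checkpoint and starts at page 4.
    return await crawl_pages(store, total_pages=5)


resumed = asyncio.run(demo())
print(resumed)
```

Writing the checkpoint after each unit of work, rather than once at the end, is the design choice that bounds how much is redone after a crash.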

Request Queues

A request queue tracks which URLs to process, with automatic deduplication and status tracking for each request.

When to use request queues

Site-wide crawls where links are discovered incrementally.
Ensuring each normalized URL is processed once.
Pairing with Crawlee / SDK crawlers that enqueue as they go.

JavaScript

const queue = await Actor.openRequestQueue();
await queue.addRequest({ url: "https://example.com/page1" });
await queue.addRequest({ url: "https://example.com/page2" });

Python

queue = await Actor.open_request_queue()
await queue.add_request('https://example.com/page1')
await queue.add_request('https://example.com/page2')

Only one run should process a given queue at a time; multiple writers can enqueue in some setups. See request queue docs.
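To illustrate what deduplication means for a crawl frontier, here is a small in-memory sketch. It is not Apify's implementation: DedupQueue and unique_key are hypothetical names, and the normalization rule (lowercase scheme and host, sorted query parameters, dropped fragment) is a simplifying assumption chosen for the example.

```python
from collections import deque
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def unique_key(url: str) -> str:
    """Simplified normalization: lowercase scheme/host, sort query, drop fragment."""
    parts = urlsplit(url.strip())
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, query, "")
    )


class DedupQueue:
    """Toy crawl frontier that accepts each normalized URL only once."""

    def __init__(self):
        self._pending = deque()
        self._seen = set()
        self.handled = set()

    def add_request(self, url: str) -> bool:
        """Enqueue the URL; return False if an equivalent URL was already added."""
        key = unique_key(url)
        if key in self._seen:
            return False
        self._seen.add(key)
        self._pending.append(url)
        return True

    def fetch_next(self):
        """Return the next pending URL, or None when the frontier is empty."""
        return self._pending.popleft() if self._pending else None

    def mark_handled(self, url: str) -> None:
        """Record that the URL was processed (simple status tracking)."""
        self.handled.add(unique_key(url))


queue = DedupQueue()
added = [
    queue.add_request(u)
    for u in [
        "https://example.com/page1?b=2&a=1",
        "https://EXAMPLE.com/page1?a=1&b=2#top",  # same page after normalization
        "https://example.com/page2",
    ]
]
print(added)
```

Because the seen set outlives the pending list, a URL that has already been fetched and handled is still rejected if a later page links to it again.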

Data retention

Per the official retention docs:

Named storages are kept indefinitely on every plan, including Free.
Unnamed (default) storages expire after 7 days by default, on every plan. The dataset, key-value store, and request queue a run creates are all unnamed unless you name them.
Paid plans (Starter / Scale / Business): you can configure a longer data-retention window in your Billing settings. Defaults vary and change; check the Console.
Apify also keeps your 10 most recent run records in the Runs list, but that does not extend the 7-day life of the unnamed storages those runs created.
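The rules above reduce to a small decision: named storages have no expiry, unnamed ones expire after the retention window. The sketch below encodes that for planning purposes. The function names, the warn_days threshold, and the choice of reference timestamp are assumptions for illustration; this guide does not specify which timestamp the platform measures from, so the sketch takes it as an argument.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def expires_at(
    name: Optional[str], reference_time: datetime, retention_days: int = 7
) -> Optional[datetime]:
    """Return the expiry time of a storage, or None if it is named (kept indefinitely)."""
    if name:
        return None
    return reference_time + timedelta(days=retention_days)


def is_at_risk(
    name: Optional[str],
    reference_time: datetime,
    now: datetime,
    retention_days: int = 7,
    warn_days: int = 2,
) -> bool:
    """True if an unnamed storage is within `warn_days` of expiring (or already past it)."""
    deadline = expires_at(name, reference_time, retention_days)
    if deadline is None:
        return False
    return now >= deadline - timedelta(days=warn_days)


# Illustrative timestamps only.
created = datetime(2024, 1, 1, tzinfo=timezone.utc)
unnamed_deadline = expires_at(None, created)
named_deadline = expires_at("weekly-leads", created)
print(unnamed_deadline, named_deadline)
```

A check like is_at_risk could run in a scheduled job that lists recent storages and alerts before anything unnamed reaches its deadline.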

What this means operationally

Situation	Outcome
Default unnamed dataset from a run	Auto-deleted 7 days after the run
Same dataset renamed `weekly-leads`	Kept until you delete it, on any plan
6-month compliance archive	Must be named, or exported to S3 / Google Drive / your warehouse
Run succeeded but you forgot to rename	Export or rename within 7 days, before the unnamed storage expires

Rename or export within 7 days

The single most common storage mistake is a scheduled scraper writing to its default unnamed dataset: 7 days later, that run's data is gone. If the data matters, either write to a named storage from inside the Actor code (Actor.openDataset('my-leads')) or export the data from a webhook that fires when the run finishes.

Create a free Apify account

Sign up at Apify and explore Storage in the Console with monthly free credits.

Frequently Asked Questions

What is the difference between datasets, key-value stores, and request queues?
Datasets store structured scrape results as JSON rows. Key-value stores hold arbitrary files and blobs (INPUT, screenshots, state). Request queues manage deduplicated URL frontiers for crawlers.

How long does Apify keep stored data?
Named storages persist indefinitely on every plan. Unnamed (default) storages expire after 7 days by default. On paid plans (Starter, Scale, Business) the retention window is configurable in Billing settings. See docs.apify.com/platform/storage/usage for the official statement.

What is the difference between named and unnamed storages?
Unnamed storages are tied to default run IDs and auto-expire. Named storages are explicitly labeled and kept indefinitely, which makes them suitable for production datasets you must not lose.

How do I keep data for longer than 7 days?
Give the storage a name: rename it in the Console (Actions → Rename), or have the Actor write to a named dataset from the start with Actor.openDataset('my-name'). Alternatively, export to CSV/JSON, Google Sheets, Google Drive, S3, or your database before the 7-day window closes.

Can I export dataset contents?
Yes. JSON, CSV, Excel, XML, RSS, and HTML exports are supported from the Console or API. Integrations can push rows to Sheets, Drive, or warehouses.

Where are the run input and the scrape results stored?
The default key-value store for each run holds the run input under the INPUT key. Your scrape results usually go to the default dataset via pushData/push_data.
