Apify Storage Guide: Datasets, Key-Value Stores & Request Queues
Quick answer
Apify has three storage primitives: Datasets for append-only rows, Key-Value Stores for arbitrary files and run state (including INPUT), and Request Queues for deduplicated URL frontiers in crawls. The retention rule that actually matters: a named storage is kept until you delete it, while an unnamed (default) storage expires after 7 days. If production data lives in an unnamed storage, it will disappear.
Storage types at a glance
| Storage | Purpose | Format | Typical use |
|---|---|---|---|
| Dataset | Tabular scrape output | Append-only list of JSON objects | Products, leads, posts, SERP rows |
| Key-Value Store | Files & blobs | Key → record (JSON, PNG, HTML, …) | INPUT, screenshots, checkpoints |
| Request Queue | Crawl frontier | Deduplicated request objects | BFS/DFS crawls, resume after crash |
Official overview: Storage usage.
Datasets
A dataset is an append-only log of items (like rows in a table). Each Actor run gets a default dataset; you can open named datasets for cross-run aggregation.
When to use datasets
- Exporting CSV / JSON / Excel for analysts.
- Feeding Google Sheets, warehouses, or BI tools.
- Passing scrape output to LangChain or other AI pipelines as JSONL-style rows.
JavaScript (Actor SDK)
await Actor.pushData([
  { title: "Product A", price: 29.99, url: "https://example.com/a" },
  { title: "Product B", price: 49.99, url: "https://example.com/b" },
]);
Python (Actor SDK)
await Actor.push_data([
    {'title': 'Product A', 'price': 29.99, 'url': 'https://example.com/a'},
    {'title': 'Product B', 'price': 49.99, 'url': 'https://example.com/b'},
])
Good to know
- Items are not updated in place. Append new versions if the source changes.
- Paginate reads with the API's limit/offset parameters on large datasets.
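The limit/offset pagination mentioned above can be sketched as a small generator. This is a minimal illustration, not Apify's client code: `fetch_page` is a hypothetical callable that takes `(offset, limit)` and returns a list of items, and `fake_fetch` is an in-memory stand-in used only to make the example runnable. In a real Actor you would wrap whichever dataset read call you use.

```python
from typing import Callable, Iterator, List

# Hypothetical page fetcher: takes (offset, limit) and returns a list of items.
FetchPage = Callable[[int, int], List[dict]]


def iter_items(fetch_page: FetchPage, page_size: int = 1000) -> Iterator[dict]:
    """Yield every item by walking the dataset in offset/limit pages."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            break  # an empty page means we have read past the last item
        yield from page
        offset += len(page)  # advance by what was actually returned


# In-memory stand-in for a dataset (assumption, for demonstration only).
_rows = [{"id": i} for i in range(2500)]


def fake_fetch(offset: int, limit: int) -> List[dict]:
    return _rows[offset:offset + limit]


all_items = list(iter_items(fake_fetch, page_size=1000))
```

Advancing the offset by the number of items actually returned, rather than by the page size, keeps the loop correct if a page comes back shorter than requested.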
Key-Value Stores
A key-value store holds arbitrary records addressed by key. Every run has a default store; the INPUT key holds run configuration.
When to use key-value stores
- Persist crawler state for resumable scraping.
- Save screenshots, PDFs, or HTML captures.
- Cache intermediate blobs between steps.
JavaScript
await Actor.setValue("screenshot-homepage", screenshotBuffer, {
  contentType: "image/png",
});
await Actor.setValue("crawl-state", { lastPage: 42, totalItems: 1500 });
const state = await Actor.getValue("crawl-state");
Python
await Actor.set_value('screenshot-homepage', screenshot_buffer, content_type='image/png')
await Actor.set_value('crawl-state', {'last_page': 42, 'total_items': 1500})
state = await Actor.get_value('crawl-state')
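To show how a stored `crawl-state` record makes a scrape resumable, here is a minimal sketch of the checkpoint-and-resume pattern. It assumes a plain dict standing in for the key-value store (where the snippets above would call `get_value` / `set_value`); `crawl_pages` and its `stop_after` parameter are hypothetical names used only to simulate an interrupted run.

```python
from typing import List, MutableMapping, Optional

STATE_KEY = "crawl-state"


def crawl_pages(
    store: MutableMapping,
    total_pages: int,
    stop_after: Optional[int] = None,
) -> List[int]:
    """Process pages in order, writing a checkpoint after each page."""
    # Resume from the last checkpoint, or start from page 1 if none exists.
    state = store.get(STATE_KEY) or {"last_page": 0}
    processed: List[int] = []
    for page in range(state["last_page"] + 1, total_pages + 1):
        if stop_after is not None and len(processed) >= stop_after:
            break  # simulate the run being interrupted
        processed.append(page)  # real scraping work would happen here
        store[STATE_KEY] = {"last_page": page}  # checkpoint after each page
    return processed


store: dict = {}  # in-memory stand-in for the key-value store (assumption)
first_run = crawl_pages(store, total_pages=5, stop_after=3)
second_run = crawl_pages(store, total_pages=5)
```

The second call picks up at page 4 because the checkpoint survived the first call. Writing the checkpoint after each unit of work means an interruption costs at most one page of repeated effort.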
Request Queues
A request queue tracks which URLs to process, with automatic deduplication and status tracking for each request.
When to use request queues
- Site-wide crawls where links are discovered incrementally.
- Ensuring each normalized URL is processed once.
- Pairing with Crawlee / SDK crawlers that enqueue as they go.
JavaScript
const queue = await Actor.openRequestQueue();
await queue.addRequest({ url: "https://example.com/page1" });
await queue.addRequest({ url: "https://example.com/page2" });
Python
queue = await Actor.open_request_queue()
await queue.add_request('https://example.com/page1')
await queue.add_request('https://example.com/page2')
Only one run should process a given queue at a time; multiple writers can enqueue in some setups. See request queue docs.
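The deduplication idea behind a request queue can be illustrated with a small in-memory frontier. This is a sketch under stated assumptions, not how Apify implements it: the `Frontier` class and the normalization rules in `unique_key` (lowercase scheme and host, drop the fragment and trailing slash) are chosen here for illustration, and the platform's own rules may differ.

```python
from collections import deque
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def unique_key(url: str) -> str:
    """Normalize a URL so trivially different spellings map to one key."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    # Lowercase scheme and host, keep the query, drop the fragment.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


class Frontier:
    """Hypothetical in-memory queue that accepts each normalized URL once."""

    def __init__(self) -> None:
        self._seen = set()
        self._pending = deque()

    def add(self, url: str) -> bool:
        key = unique_key(url)
        if key in self._seen:
            return False  # duplicate: already queued or processed
        self._seen.add(key)
        self._pending.append(url)
        return True

    def next(self) -> Optional[str]:
        return self._pending.popleft() if self._pending else None


frontier = Frontier()
added = [
    frontier.add(u)
    for u in (
        "https://example.com/page1",
        "https://EXAMPLE.com/page1/#top",  # same page after normalization
        "https://example.com/page2",
    )
]
```

Keeping the seen-set separate from the pending queue is what prevents a URL from being re-enqueued after it has already been handed out for processing.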
Data retention
Per the official retention docs:
- Named storages are kept indefinitely on every plan, including Free.
- Unnamed (default) storages expire after 7 days by default, on every plan. The dataset, key-value store, and request queue a run creates are all unnamed unless you name them.
- Paid plans (Starter / Scale / Business): you can configure a longer data-retention window in your Billing settings. Defaults vary and change; check the Console.
- Apify also keeps your 10 most recent run records in the Runs list, but that does not extend the 7-day life of the unnamed storages those runs created.
What this means operationally
| Situation | Outcome |
|---|---|
| Default unnamed dataset from a run | Auto-deleted 7 days after the run |
| Same dataset renamed weekly-leads | Kept until you delete it, on any plan |
| 6-month compliance archive | Must be named, or exported to S3 / Google Drive / your warehouse |
| Run succeeded but you forgot to rename | Export or rename within 7 days, before the unnamed storage expires |
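The rule in the table can be written as a small helper, which is handy for sanity-checking an export schedule. This is a sketch, not platform code: `expires_at` is a hypothetical function, the 7-day default is the figure quoted in this guide, and it assumes the clock starts at the timestamp you pass in.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Default from this guide; paid plans may configure a longer window.
UNNAMED_RETENTION = timedelta(days=7)


def expires_at(
    created_at: datetime,
    name: Optional[str],
    retention: timedelta = UNNAMED_RETENTION,
) -> Optional[datetime]:
    """Return when a storage expires, or None if it is kept indefinitely."""
    if name:
        return None  # named storages are kept until you delete them
    return created_at + retention


run_time = datetime(2024, 1, 1, tzinfo=timezone.utc)
unnamed_expiry = expires_at(run_time, name=None)
named_expiry = expires_at(run_time, name="weekly-leads")
```

Here `unnamed_expiry` falls on 8 January 2024 and `named_expiry` is `None`, which matches the first two rows of the table.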
The single most common storage mistake is a scheduled scraper writing to its default unnamed dataset: 7 days later, that run's data is gone. If the data matters, either name the storage inside the Actor code (Actor.openDataset('my-leads')) or export it from a webhook that fires when the run finishes.
Related
Sign up at Apify and explore Storage in the Console with monthly free credits.
Datasets store structured scrape results as JSON rows. Key-value stores hold arbitrary files and blobs (INPUT, screenshots, state). Request queues manage deduplicated URL frontiers for crawlers.
Named storages persist indefinitely on every plan. Unnamed (default) storages expire after 7 days by default. On paid plans (Starter, Scale, Business) the retention window is configurable in Billing settings. See docs.apify.com/platform/storage/usage for the official statement.
Unnamed storages are tied to default run IDs and auto-expire. Named storages are explicitly labeled and kept indefinitely, which makes them suitable for production datasets you must not lose.
Rename the storage (Console Actions → Rename, or open it by name in code with Actor.openDataset('my-name')) to keep it indefinitely. Alternatively, export to CSV/JSON, Google Sheets, Google Drive, S3, or your database before the 7-day window closes.
Datasets can be exported as JSON, CSV, Excel, XML, RSS, and HTML from the Console or API. Integrations can push rows to Sheets, Drive, or warehouses.
The default key-value store for each run stores INPUT under the INPUT key. Your scrape results usually go to the default dataset via pushData/push_data.



