# Storage & document extraction

> Capture documents, images, videos, and other files from external sources, and extract structured data.

<Note>
  Available as an add-on on paid plans.
</Note>

Tasks can produce files like receipts, invoices, statements, reports, images, videos, and spreadsheets. When storage is enabled on a task, Deck captures the files the agent is instructed to collect and makes them available through the API. If extraction is also enabled, Deck parses supported files and returns structured JSON alongside the raw file.

## Enabling storage on a task

Storage is configured when you create or update a task. Set `storage.enabled` to `true` to capture files. Set `storage.extraction` to `true` to also extract structured data from those files.

```json theme={null}
POST /v2/tasks

{
  "name": "Fetch utility bills",
  "agent_id": "agt_a1b2c3d4...",
  "input_schema": {
    "type": "object",
    "properties": {
      "start_date": { "type": "string" },
      "end_date": { "type": "string" }
    }
  },
  "output_schema": {
    "type": "object",
    "properties": {
      "bill_count": { "type": "integer" }
    }
  },
  "storage": {
    "enabled": true,
    "extraction": true
  }
}
```

| Field                          | Type    | Description                                                                                              |
| ------------------------------ | ------- | -------------------------------------------------------------------------------------------------------- |
| `storage.enabled`              | boolean | Capture files produced during task execution.                                                            |
| `storage.extraction`           | boolean | Parse captured files and extract structured data.                                                        |
| `storage.extraction_schema`    | object  | JSON Schema describing the fields to extract. Required when `extraction` is `true`.                      |
| `storage.extraction_prompt`    | string  | Optional natural-language guidance for the extraction model.                                             |
| `storage.deduplication`        | boolean | Enable deduplication to skip files that match a previous capture. See [Deduplication](#deduplication).   |
| `storage.deduplication_schema` | object  | JSON Schema describing the fields used for duplicate detection. Required when `deduplication` is `true`. |
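
The two "Required when" rules in the table can be checked client-side before you send the request. The following is a minimal Python sketch, not part of any Deck SDK: `check_storage_config` is a hypothetical helper name, and the API remains the source of truth for validation.

```python
def check_storage_config(storage):
    """Return a list of problems found in a task's `storage` object.

    Only the two "Required when" rules from the field table are checked.
    """
    problems = []
    if storage.get("extraction") and "extraction_schema" not in storage:
        problems.append("extraction_schema is required when extraction is true")
    if storage.get("deduplication") and "deduplication_schema" not in storage:
        problems.append("deduplication_schema is required when deduplication is true")
    return problems


if __name__ == "__main__":
    storage = {"enabled": True, "deduplication": True}
    for problem in check_storage_config(storage):
        print(problem)
```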

## Retrieving storage items

Task run responses include a `storage` array of lightweight summaries. To get extracted data and a download URL, list a task run's storage items or fetch a single item by ID.

```bash theme={null}
curl https://api.deck.co/v2/task-runs/trun_a1b2c3d4/storage \
  -H "Authorization: Bearer sk_live_your_key_here"
```

```json theme={null}
{
  "data": [
    {
      "id": "stor_x1y2z3...",
      "object": "storage",
      "file_name": "statement_jan_2025.pdf",
      "file_type": "application/pdf",
      "file_size": 245678,
      "extraction": null,
      "created_at": "2025-01-23T14:30:00Z"
    },
    {
      "id": "stor_a4b5c6...",
      "object": "storage",
      "file_name": "statement_dec_2024.pdf",
      "file_type": "application/pdf",
      "file_size": 198432,
      "extraction": {
        "company_name": "EnergyLink",
        "account_number": "58291-44720",
        "billing_date": "2024-12-22",
        "amount_due": 6925.18,
        "currency": "USD"
      },
      "created_at": "2025-01-22T09:15:00Z"
    }
  ],
  "has_more": false,
  "next_cursor": null,
  "request_id": "req_f5g6h7..."
}
```
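
The `has_more` and `next_cursor` fields suggest cursor-based pagination. Here is a hedged Python sketch of walking every page. How the cursor is sent back to the API isn't shown on this page, so the sketch leaves that to a caller-supplied `fetch_page(cursor)` function; both `iter_storage_items` and `fetch_page` are hypothetical names.

```python
def iter_storage_items(fetch_page):
    """Yield every storage item, following `next_cursor` while `has_more` is true.

    `fetch_page(cursor)` must return one parsed list response: a dict with
    `data`, `has_more`, and `next_cursor`. It receives None for the first page.
    """
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page["data"]
        # Stop when the API reports no further pages or gives no cursor.
        if not page.get("has_more") or not page.get("next_cursor"):
            break
        cursor = page["next_cursor"]


if __name__ == "__main__":
    # Two canned pages stand in for real API responses.
    pages = {
        None: {"data": [{"id": "stor_1"}], "has_more": True, "next_cursor": "c1"},
        "c1": {"data": [{"id": "stor_2"}], "has_more": False, "next_cursor": None},
    }
    for item in iter_storage_items(pages.get):
        print(item["id"])
```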

## Storage item fields

| Field         | Task run summary | List items | Get item |
| ------------- | :--------------: | :--------: | :------: |
| `id`          |         ✓        |      ✓     |     ✓    |
| `file_name`   |         ✓        |      ✓     |     ✓    |
| `file_type`   |         ✓        |      ✓     |     ✓    |
| `file_size`   |         ✓        |      ✓     |     ✓    |
| `created_at`  |         ✓        |      ✓     |     ✓    |
| `extraction`  |                  |      ✓     |     ✓    |
| `url`         |                  |            |     ✓    |
| `task_run_id` |                  |            |     ✓    |

<ResponseField name="id" type="string">
  Unique identifier, prefixed with `stor_`.
</ResponseField>

<ResponseField name="file_name" type="string">
  Original file name as it appeared on the source.
</ResponseField>

<ResponseField name="file_type" type="string">
  MIME type (`application/pdf`, `image/png`, `video/mp4`, `text/csv`, etc.).
</ResponseField>

<ResponseField name="file_size" type="integer">
  Size in bytes.
</ResponseField>

<ResponseField name="created_at" type="datetime">
  When the storage item was created.
</ResponseField>

<ResponseField name="extraction" type="object or null">
  Structured data extracted from the file, if extraction is enabled.
</ResponseField>

<ResponseField name="url" type="string">
  Signed download URL.
</ResponseField>

<ResponseField name="task_run_id" type="string">
  The task run that produced this storage item.
</ResponseField>

## Downloading files

Fetch a single storage item by ID. The response includes a signed `url` you can use to download the raw file. URLs are time-limited; if one expires, re-fetch the item to get a fresh one.

```bash theme={null}
curl https://api.deck.co/v2/storage/stor_x1y2z3 \
  -H "Authorization: Bearer sk_live_your_key_here"
```
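
Because signed URLs expire, a download helper can re-fetch the item and retry once. The sketch below uses only the Python standard library and makes two assumptions that this page doesn't confirm: the storage item is the top-level JSON object of the response, and an expired URL is rejected with HTTP 403. `get_storage_item` and `download_file` are hypothetical names.

```python
import json
import urllib.error
import urllib.request

API_BASE = "https://api.deck.co/v2"


def get_storage_item(item_id, api_key):
    """Fetch one storage item; assumes the item is the top-level JSON object."""
    request = urllib.request.Request(
        f"{API_BASE}/storage/{item_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def download_file(item_id, api_key, get_item=get_storage_item,
                  open_url=urllib.request.urlopen, attempts=2):
    """Download the raw file, re-fetching the item if the signed URL is rejected."""
    for attempt in range(attempts):
        item = get_item(item_id, api_key)  # each fetch returns a fresh signed URL
        try:
            with open_url(item["url"]) as response:
                return response.read()
        except urllib.error.HTTPError as error:
            # Assumption: an expired signed URL is rejected with HTTP 403.
            if error.code != 403 or attempt == attempts - 1:
                raise
```

The `get_item` and `open_url` parameters exist so the retry logic can be exercised with stand-ins instead of live requests.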

## Document extraction

When extraction is enabled, Deck parses the captured files and populates the `extraction` field on each storage item with structured JSON.

Extraction works with common document types, including PDFs, spreadsheets, invoices, receipts, and reports. The extracted data depends on the document: a utility bill produces different fields than a hotel receipt.

### Custom extraction schemas

Use the `extraction_schema` field on the task's storage config to define exactly what fields you want extracted. Deck uses this schema to guide parsing.

```json theme={null}
{
  "storage": {
    "enabled": true,
    "extraction": true,
    "extraction_schema": {
      "type": "object",
      "properties": {
        "vendor_name": { "type": "string" },
        "total_amount": { "type": "number" },
        "invoice_date": { "type": "string", "format": "date" },
        "line_items": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "description": { "type": "string" },
              "amount": { "type": "number" }
            }
          }
        }
      }
    }
  }
}
```

### Extraction example

A utility bill extraction might return:

```json theme={null}
{
  "company_name": "EnergyLink",
  "account_number": "58291-44720",
  "billing_date": "2025-01-22",
  "billing_period": {
    "start_date": "2024-12-18",
    "end_date": "2025-01-20",
    "total_days": 33
  },
  "amount_due": 8247.41,
  "payment_due_date": "2025-02-14",
  "currency": "USD",
  "service_locations": [
    {
      "service_type": "Fuel",
      "service_address": {
        "street": "4421 OAK VIEW LN UNIT 3A",
        "city": "CAMBRIDGE",
        "state": "MA",
        "postal_code": "02140"
      },
      "total_usage": 5348,
      "total_usage_unit": "Therms",
      "total_charges": 8214.67
    }
  ]
}
```

### Extraction errors

If extraction fails on a file, the `extraction` field on that storage item stays `null`. The raw file is still available for download.

If any file in a task run fails extraction, the task run itself completes with a `failure` result and an `extraction_failed` error in the `errors` array indicating how many files were affected. Successfully extracted files in the same run still return their `extraction` data.
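
Since successful and failed files can coexist in one run, it helps to separate them after listing the run's storage items. A minimal Python sketch, assuming extraction is enabled on the task (a `null` value alone doesn't say why data is missing); `split_by_extraction` is a hypothetical helper.

```python
def split_by_extraction(items):
    """Split storage items into (with extraction data, without extraction data)."""
    extracted = [item for item in items if item.get("extraction") is not None]
    missing = [item for item in items if item.get("extraction") is None]
    return extracted, missing


if __name__ == "__main__":
    items = [
        {"id": "stor_x1y2z3", "file_name": "statement_jan_2025.pdf", "extraction": None},
        {"id": "stor_a4b5c6", "file_name": "statement_dec_2024.pdf",
         "extraction": {"company_name": "EnergyLink", "amount_due": 6925.18}},
    ]
    extracted, missing = split_by_extraction(items)
    for item in missing:
        # The raw file is still downloadable even without extraction data.
        print("no extraction data:", item["file_name"])
```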

## Deduplication

Deduplication tells Deck to skip files that match one captured by a previous run for the same task and credential, so recurring tasks only return new documents.

You define a set of fields that uniquely identify a document. Deck reads those fields from each captured file and compares them against prior captures. If every field matches, the new file is dropped: it's not stored, no `storage.created` event fires, and it's not extracted even if extraction is enabled.

### Configuration

Set `deduplication` to `true` and provide a `deduplication_schema` on the task's storage config:

```json theme={null}
{
  "storage": {
    "enabled": true,
    "deduplication": true,
    "deduplication_schema": {
      "type": "object",
      "properties": {
        "account_number": {
          "type": "string",
          "description": "The utility account number"
        },
        "billing_period_start": {
          "type": "string",
          "description": "Start date of the billing period (YYYY-MM-DD)"
        }
      }
    }
  }
}
```

| Field                  | Type    | Description                                                                                                           |
| ---------------------- | ------- | --------------------------------------------------------------------------------------------------------------------- |
| `deduplication`        | boolean | Turn deduplication on for this task.                                                                                  |
| `deduplication_schema` | object  | JSON Schema with a `properties` map of `field_name → { type, description }`. Required when `deduplication` is `true`. |

Each property must declare a `type`, one of `string`, `integer`, `number`, or `boolean`. Nested objects and arrays aren't supported, so pick top-level scalar fields.

Property names are arbitrary: you choose them, and they serve only as keys in the result. The `description` tells Deck where to find the value on each document, so describe each field precisely. For example, `"Account number printed at the top of the bill"` works better than a vague `"account"`.
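
These constraints are easy to lint locally before you save a task. The following Python sketch is illustrative, not Deck's own validation: `validate_deduplication_schema` is a hypothetical helper, and flagging a missing `description` is a choice made here because the description guides field lookup, not a documented API rule.

```python
ALLOWED_TYPES = {"string", "integer", "number", "boolean"}


def validate_deduplication_schema(schema):
    """Return a list of problems found in a `deduplication_schema`."""
    problems = []
    properties = schema.get("properties") or {}
    if not properties:
        problems.append("schema needs at least one property")
    for name, spec in properties.items():
        field_type = spec.get("type")
        # Nested objects and arrays aren't supported, only scalar types.
        if field_type not in ALLOWED_TYPES:
            problems.append(f"{name}: unsupported type {field_type!r}")
        if not spec.get("description"):
            problems.append(f"{name}: missing description")
    return problems


if __name__ == "__main__":
    schema = {
        "type": "object",
        "properties": {
            "account_number": {"type": "string", "description": "The utility account number"},
            "line_items": {"type": "array", "description": "Charges on the bill"},
        },
    }
    for problem in validate_deduplication_schema(schema):
        print(problem)
```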

### Choosing fields

The fields you list together form the dedup key. Two files match only if **every** field is identical. A few rules of thumb:

* **Pick fields that stay stable for the same logical document.** A monthly bill should have the same account number and billing period every time it's fetched.
* **Avoid volatile fields.** File names, fetch dates, and page numbers will produce false negatives, since the same document looks new every time.
* **Pick enough fields to be unique.** A single field like `vendor_name` will collide across unrelated invoices from the same vendor.
* **Two or three fields is usually right.**
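
To make the "every field must match" rule concrete, here is a small Python model of the comparison. It only illustrates the semantics described above and is not how Deck implements matching; `dedup_key` and `is_duplicate` are hypothetical names.

```python
def dedup_key(fields, key_names):
    """Build the composite key from the chosen fields, in a fixed order."""
    return tuple(fields.get(name) for name in key_names)


def is_duplicate(new_fields, previous_captures, key_names):
    """A new file is a duplicate only if every key field equals a prior capture's."""
    new_key = dedup_key(new_fields, key_names)
    return any(dedup_key(prev, key_names) == new_key for prev in previous_captures)


if __name__ == "__main__":
    stable_key = ["account_number", "billing_period_start"]
    previous = [{"account_number": "58291-44720",
                 "billing_period_start": "2024-12-18",
                 "file_name": "bill (1).pdf"}]
    refetched = {"account_number": "58291-44720",
                 "billing_period_start": "2024-12-18",
                 "file_name": "bill (2).pdf"}

    print(is_duplicate(refetched, previous, stable_key))                  # True
    # Adding a volatile field makes the same bill look new.
    print(is_duplicate(refetched, previous, stable_key + ["file_name"]))  # False
```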

### Field combinations by document type

<Tabs>
  <Tab title="Utility bills">
    Account plus billing period:

    ```json theme={null}
    "deduplication_schema": {
      "type": "object",
      "properties": {
        "account_number": { "type": "string", "description": "Utility account number" },
        "billing_period_start": { "type": "string", "description": "Billing period start date (YYYY-MM-DD)" }
      }
    }
    ```
  </Tab>

  <Tab title="Invoices">
    Vendor plus invoice number:

    ```json theme={null}
    "deduplication_schema": {
      "type": "object",
      "properties": {
        "vendor_name": { "type": "string", "description": "Vendor or supplier name" },
        "invoice_number": { "type": "string", "description": "Invoice number printed on the document" }
      }
    }
    ```
  </Tab>

  <Tab title="Receipts">
    Merchant, date, and total, since receipts often lack a unique ID:

    ```json theme={null}
    "deduplication_schema": {
      "type": "object",
      "properties": {
        "merchant_name": { "type": "string", "description": "Merchant or store name" },
        "transaction_date": { "type": "string", "description": "Date of purchase (YYYY-MM-DD)" },
        "total_amount": { "type": "number", "description": "Total amount charged" }
      }
    }
    ```
  </Tab>

  <Tab title="Bank and credit-card statements">
    Account plus statement period:

    ```json theme={null}
    "deduplication_schema": {
      "type": "object",
      "properties": {
        "account_number": { "type": "string", "description": "Account number as printed on the statement" },
        "statement_period_end": { "type": "string", "description": "Statement period end date (YYYY-MM-DD)" }
      }
    }
    ```
  </Tab>
</Tabs>

### Errors

`deduplication_schema` must be present with at least one property whenever `deduplication` is `true`. Starting a task run without it returns:

```
422 Unprocessable Entity
deduplication_schema must be defined before running a task with deduplication enabled.
```

To disable deduplication, set `deduplication` to `false` (or omit it entirely).

## Events

Storage items emit events you can subscribe to through [event destinations](/events/events):

| Event             | When it fires                                          |
| ----------------- | ------------------------------------------------------ |
| `storage.created` | A new file has been captured and is ready for download |

## Retention

Retention period varies by plan, up to a maximum of 90 days. All files are deleted after 90 days.
