Storage & document extraction

Available as an add-on on paid plans.

Tasks can produce files like receipts, invoices, statements, reports, images, videos, and spreadsheets. When storage is enabled on a task, Deck captures the files the agent is instructed to collect and makes them available through the API. If extraction is also enabled, Deck parses supported files and returns structured JSON alongside the raw file.

Enabling storage on a task

Storage is configured when you create or update a task. Set storage.enabled to true to capture files. Set storage.extraction to true to also extract structured data from those files; extraction also requires a storage.extraction_schema describing the fields to extract.

POST /v2/tasks

{
  "name": "Fetch utility bills",
  "agent_id": "agt_a1b2c3d4...",
  "input_schema": {
    "type": "object",
    "properties": {
      "start_date": { "type": "string" },
      "end_date": { "type": "string" }
    }
  },
  "output_schema": {
    "type": "object",
    "properties": {
      "bill_count": { "type": "integer" }
    }
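  },
  "storage": {
    "enabled": true,
    "extraction": true,
    "extraction_schema": {
      "type": "object",
      "properties": {
        "company_name": { "type": "string" },
        "account_number": { "type": "string" },
        "billing_date": { "type": "string", "format": "date" },
        "amount_due": { "type": "number" },
        "currency": { "type": "string" }
      }
    }
  }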
}

Field	Type	Description
`storage.enabled`	boolean	Capture files produced during task execution
`storage.extraction`	boolean	Parse captured files and extract structured data
`storage.extraction_schema`	object	JSON Schema describing the fields to extract. Required when `extraction` is `true`.
`storage.extraction_prompt`	string	Optional natural-language guidance for the extraction model.
`storage.deduplication`	boolean	Enable deduplication to skip files that match a previous capture. See Deduplication.
`storage.deduplication_schema`	object	JSON Schema describing the fields used for duplicate detection. Required when `deduplication` is `true`.
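The "required when" rules in this table can also be checked on the client before you send a create or update request. A minimal sketch; `validate_storage_config` is a hypothetical helper, not part of any Deck SDK, and Deck's own validation remains authoritative.

```python
from typing import Any


def validate_storage_config(storage: dict[str, Any]) -> list[str]:
    """Check the documented "required when" rules; return a list of problems (empty if none)."""
    problems: list[str] = []

    # `extraction_schema` is required when `extraction` is true.
    if storage.get("extraction") is True and not isinstance(storage.get("extraction_schema"), dict):
        problems.append("extraction_schema is required when extraction is true")

    # `deduplication_schema`, with at least one property, is required when `deduplication` is true.
    if storage.get("deduplication") is True:
        schema = storage.get("deduplication_schema")
        properties = schema.get("properties") if isinstance(schema, dict) else None
        if not properties:
            problems.append(
                "deduplication_schema with at least one property is required when deduplication is true"
            )

    return problems
```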

Retrieving storage items

Task run responses include a storage array of lightweight summaries. To get extracted data and a pre-signed download URL, list a task run’s storage items or fetch a single item by ID.

curl https://api.deck.co/v2/task-runs/trun_a1b2c3d4/storage \
  -H "Authorization: Bearer sk_live_your_key_here"

{
  "data": [
    {
      "id": "stor_x1y2z3...",
      "object": "storage",
      "file_name": "statement_jan_2025.pdf",
      "file_type": "application/pdf",
      "file_size": 245678,
      "purpose": "output",
      "url": "https://files.deck.co/stor_x1y2z3...?signature=...",
      "extraction": null,
      "created_at": "2025-01-23T14:30:00Z"
    },
    {
      "id": "stor_a4b5c6...",
      "object": "storage",
      "file_name": "statement_dec_2024.pdf",
      "file_type": "application/pdf",
      "file_size": 198432,
      "purpose": "output",
      "url": "https://files.deck.co/stor_a4b5c6...?signature=...",
      "extraction": {
        "company_name": "EnergyLink",
        "account_number": "58291-44720",
        "billing_date": "2024-12-22",
        "amount_due": 6925.18,
        "currency": "USD"
      },
      "created_at": "2025-01-22T09:15:00Z"
    }
  ],
  "has_more": false,
  "next_cursor": null,
  "request_id": "req_f5g6h7..."
}
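To collect every storage item for a run in code, follow next_cursor until has_more is false. A minimal standard-library sketch: the endpoint and Authorization header come from the curl example above, but sending the cursor back as a `cursor` query parameter is an assumption, and `get_json` is injectable so the paging logic can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Iterator, Optional

API_BASE = "https://api.deck.co/v2"


def http_get_json(url: str, api_key: str) -> dict[str, Any]:
    """GET a Deck API URL with a bearer token and decode the JSON body."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_storage_items(
    task_run_id: str,
    api_key: str,
    get_json: Callable[[str, str], dict[str, Any]] = http_get_json,
) -> Iterator[dict[str, Any]]:
    """Yield every storage item for a task run, following the cursor across pages."""
    base_url = f"{API_BASE}/task-runs/{task_run_id}/storage"
    cursor: Optional[str] = None
    while True:
        # ASSUMPTION: the cursor is passed back as a `cursor` query parameter.
        if cursor is None:
            url = base_url
        else:
            url = f"{base_url}?{urllib.parse.urlencode({'cursor': cursor})}"
        page = get_json(url, api_key)
        yield from page["data"]
        if not page.get("has_more"):
            break
        cursor = page["next_cursor"]
```

For example, `for item in iter_storage_items("trun_a1b2c3d4", "sk_live_your_key_here"): print(item["file_name"], item["extraction"])` prints each file alongside its extracted data.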

Storage item fields

Field	Task run summary	List items	Get item
`id`	✓	✓	✓
`file_name`	✓	✓	✓
`file_type`	✓	✓	✓
`file_size`	✓	✓	✓
`purpose`	✓	✓	✓
`created_at`	✓	✓	✓
`extraction`		✓	✓
`url`		✓	✓
`task_run_id`			✓

id

string

Unique identifier, prefixed with stor_.

file_name

string

Original file name as it appeared on the source.

file_type

string

MIME type (application/pdf, image/png, video/mp4, text/csv, etc.).

file_size

integer

Size in bytes.

purpose

string

One of output (files the agent captures during a run), attachment (files you provide as task input), or extraction (files Deck processes via direct extraction).

created_at

datetime

When the storage item was created.

extraction

object or null

Structured data extracted from the file. null if extraction isn't enabled or failed for this file.

url

string

Pre-signed, time-limited download URL.

task_run_id

string

The task run that produced this storage item.
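In typed client code, the availability table maps onto optional fields. A minimal sketch; `StorageItem` is a hypothetical client-side type in which extraction, url, and task_run_id are None when the endpoint that returned the item doesn't include them (or, for extraction, when the value is null).

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class StorageItem:
    # Present in task run summaries, list responses, and get responses.
    id: str
    file_name: str
    file_type: str
    file_size: int  # bytes
    purpose: str  # "output", "attachment", or "extraction"
    created_at: str  # ISO 8601 timestamp, kept as a string here
    # Present only in list and/or get responses; None otherwise.
    extraction: Optional[dict[str, Any]] = None
    url: Optional[str] = None
    task_run_id: Optional[str] = None

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "StorageItem":
        """Build from an API object, ignoring keys (such as `object`) not modeled here."""
        return cls(
            id=raw["id"],
            file_name=raw["file_name"],
            file_type=raw["file_type"],
            file_size=raw["file_size"],
            purpose=raw["purpose"],
            created_at=raw["created_at"],
            extraction=raw.get("extraction"),
            url=raw.get("url"),
            task_run_id=raw.get("task_run_id"),
        )
```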

Downloading files

Both list and get-by-id responses include a pre-signed url you can use to download the raw file. URLs are time-limited; if one expires, re-fetch the item to get a fresh one.

curl https://api.deck.co/v2/storage/stor_x1y2z3 \
  -H "Authorization: Bearer sk_live_your_key_here"
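A download step can re-fetch the item whenever the signed URL fails, which covers expiry. A minimal sketch under stated assumptions: an expired URL surfaces as an HTTP error, the pre-signed URL needs no Authorization header, and one retry is enough. `fetch_item` and `fetch_bytes` are hypothetical injectable callables (standard-library defaults shown) so the retry logic can be tested offline.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Callable, Optional

API_BASE = "https://api.deck.co/v2"


def make_item_fetcher(api_key: str) -> Callable[[str], dict[str, Any]]:
    """Return a function that GETs /v2/storage/{id}; each response carries a freshly signed URL."""

    def fetch_item(storage_id: str) -> dict[str, Any]:
        request = urllib.request.Request(
            f"{API_BASE}/storage/{storage_id}",
            headers={"Authorization": f"Bearer {api_key}"},
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)

    return fetch_item


def fetch_url_bytes(url: str) -> bytes:
    """Download raw bytes from a pre-signed URL (ASSUMPTION: no API key needed)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_storage_item(
    storage_id: str,
    fetch_item: Callable[[str], dict[str, Any]],
    fetch_bytes: Callable[[str], bytes] = fetch_url_bytes,
    max_attempts: int = 2,
) -> bytes:
    """Download a stored file, re-fetching the item for a new URL if a download attempt fails."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        item = fetch_item(storage_id)  # a fresh pre-signed URL on every attempt
        try:
            return fetch_bytes(item["url"])
        except urllib.error.URLError as error:  # HTTPError (e.g. an expired URL) is a subclass
            last_error = error
    raise RuntimeError(f"could not download {storage_id} after {max_attempts} attempts") from last_error
```

For example, `download_storage_item("stor_x1y2z3", make_item_fetcher("sk_live_your_key_here"))` returns the raw file bytes.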

Providing files as input

Tasks can accept files as input. The file is uploaded to storage and malware-scanned, then either handed to the agent or extracted directly by Deck, depending on the field’s purpose:

`purpose`	What Deck does	Available on
`attachment`	The agent receives the file at run time and uses it on the source.	Enterprise plans
`extraction`	Deck extracts structured JSON from the file directly. The agent is skipped.	Enterprise plans with the extraction and storage add-ons

Both purposes share the same field shape and upload behavior; they differ only in what happens after upload. A single task can declare attachment file inputs or extraction file inputs, but not both: creating a task whose input schema mixes the two is rejected with an input_invalid error.
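The mixing rule can be checked before the create call by reading the purpose constants out of the input schema. A minimal sketch; both helpers are hypothetical, and they assume file fields are declared as top-level properties, as in the examples on this page.

```python
from typing import Any


def file_input_purposes(input_schema: dict[str, Any]) -> set[str]:
    """Collect the `purpose` constants of file fields among the schema's top-level properties."""
    purposes: set[str] = set()
    for field in input_schema.get("properties", {}).values():
        if not isinstance(field, dict):
            continue
        purpose = field.get("properties", {}).get("purpose", {}).get("const")
        if purpose in ("attachment", "extraction"):
            purposes.add(purpose)
    return purposes


def check_no_mixed_file_purposes(input_schema: dict[str, Any]) -> None:
    """Raise before creating the task if attachment and extraction inputs are mixed."""
    if file_input_purposes(input_schema) == {"attachment", "extraction"}:
        raise ValueError("a task can declare attachment or extraction file inputs, not both")
```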

Defining the field

Define a file field in the input schema. The shape is the same for both purposes; only the purpose constant changes.

"resume": {
  "type": "object",
  "properties": {
    "purpose": { "const": "attachment" },
    "file_name": { "type": "string" },
    "content_type": { "type": "string" },
    "data": { "type": "string", "contentEncoding": "base64" }
  }
}

Field	Description
`purpose`	Constant `"attachment"` or `"extraction"`. Marks the field as a file input and selects how Deck handles it.
`file_name`	Original file name, e.g. `resume.pdf`.
`content_type`	MIME type, e.g. `application/pdf`.
`data`	The file contents, base64-encoded.

Sending a file

Provide the file inline as base64 in the task run input:

POST /v2/tasks/task_a1b2c3d4.../run

{
  "credential_id": "cred_a1b2c3d4...",
  "input": {
    "applicant_name": "Jordan Lee",
    "resume": {
      "purpose": "attachment",
      "file_name": "resume.pdf",
      "content_type": "application/pdf",
      "data": "JVBERi0xLjQKJ..."
    }
  }
}

Each file can be up to 20 MB. Larger files, or invalid base64, are rejected synchronously with a validation error and the run isn’t created. After upload, every file is malware-scanned asynchronously. The rest of the run lifecycle depends on the purpose.
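Encoding the file and enforcing the size limit on the client avoids a rejected request. A minimal sketch; `file_input` is a hypothetical helper, and reading "20 MB" as 20,000,000 bytes of decoded file data is a conservative assumption, since this page doesn't say whether the limit is measured before or after base64 encoding.

```python
import base64
from typing import Any

# ASSUMPTION: a conservative reading of "20 MB", applied to the raw (decoded) bytes.
MAX_FILE_BYTES = 20 * 1000 * 1000


def file_input(data: bytes, file_name: str, content_type: str, purpose: str = "attachment") -> dict[str, Any]:
    """Build an inline file input, rejecting oversized files before the run request is sent."""
    if purpose not in ("attachment", "extraction"):
        raise ValueError(f"unsupported purpose: {purpose!r}")
    if len(data) > MAX_FILE_BYTES:
        raise ValueError(f"{file_name} is {len(data)} bytes; the limit is {MAX_FILE_BYTES}")
    return {
        "purpose": purpose,
        "file_name": file_name,
        "content_type": content_type,
        "data": base64.b64encode(data).decode("ascii"),
    }
```

For example, `file_input(Path("resume.pdf").read_bytes(), "resume.pdf", "application/pdf")` (with `from pathlib import Path`) builds the resume object used in the run request above. A file that starts with `%PDF-1.4` and a newline encodes to a value beginning `JVBERi0xLjQK`.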

Attachments

With purpose: "attachment", the agent receives the file at run time and uses it on the source: uploading it to a portal, attaching it to a form, or referencing it while completing the task.

The run stays queued until every attachment passes the malware scan, then transitions to running and is dispatched to the agent. If a file is flagged, the run fails with an attachment_invalid error on the task run object; listen for task_run.failed or fetch the run to handle it.

Deck replaces the base64 data in the stored input with a storage_id reference, so the raw bytes aren't carried through the run. The file becomes a storage item with purpose: "attachment", alongside the output files the agent captures, and appears under the Input tab on the task run in the Console.
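If you fetch the run rather than listening for task_run.failed, a polling loop can wait out the queued and running states and then check for a flagged attachment. A minimal sketch under stated assumptions: the run object exposes its state in a `status` field using the queued and running names above, and entries in its errors array carry the error name in a `code` field. `fetch_run` is a hypothetical callable that GETs the run.

```python
import time
from typing import Any, Callable

# In-progress states named on this page. ASSUMPTION: exposed in a `status` field.
IN_PROGRESS = {"queued", "running"}


def wait_for_run(
    fetch_run: Callable[[str], dict[str, Any]],
    run_id: str,
    poll_seconds: float = 2.0,
    timeout_seconds: float = 600.0,
) -> dict[str, Any]:
    """Poll a task run until it leaves queued/running, then return the final run object."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        run = fetch_run(run_id)
        if run.get("status") not in IN_PROGRESS:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task run {run_id} is still {run.get('status')}")
        time.sleep(poll_seconds)


def attachment_rejected(run: dict[str, Any]) -> bool:
    """True if the run failed because an attachment was flagged by the malware scan."""
    # ASSUMPTION: error entries identify themselves with a `code` field.
    return any(error.get("code") == "attachment_invalid" for error in run.get("errors") or [])
```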

Extraction

With purpose: "extraction", Deck processes the file directly against the task's extraction_schema and returns the structured result on the run. The agent doesn't execute. Use this when you already have a document and want JSON back, with no source interaction. The run transitions to running as soon as the file is uploaded and finalizes when extraction completes; unlike an attachment run, it doesn't wait in queued for the malware scan. The task must have storage and extraction enabled, with an extraction_schema defining the result shape:

POST /v2/tasks

{
  "name": "Extract utility bill",
  "agent_id": "agt_a1b2c3d4...",
  "input_schema": {
    "type": "object",
    "properties": {
      "bill": {
        "type": "object",
        "properties": {
          "purpose": { "const": "extraction" },
          "file_name": { "type": "string" },
          "content_type": { "type": "string" },
          "data": { "type": "string", "contentEncoding": "base64" }
        }
      }
    },
    "required": ["bill"]
  },
  "storage": {
    "enabled": true,
    "extraction": true,
    "extraction_schema": {
      "type": "object",
      "properties": {
        "vendor_name": { "type": "string" },
        "total_amount": { "type": "number" },
        "invoice_date": { "type": "string", "format": "date" }
      }
    }
  }
}

To run it, send the file in the bill field with purpose: "extraction". The run still needs a credential_id or source_id to link the extraction to a user or source, even though the agent doesn't execute.

POST /v2/tasks/task_a1b2c3d4.../run

{
  "credential_id": "cred_a1b2c3d4...",
  "input": {
    "bill": {
      "purpose": "extraction",
      "file_name": "january-bill.pdf",
      "content_type": "application/pdf",
      "data": "JVBERi0xLjQKJ..."
    }
  }
}

A task accepts one extraction input per run. The file becomes a storage item with purpose: "extraction" whose extraction field holds the extracted data, shaped by the fields defined in extraction_schema. See Document extraction for guidance on writing extraction schemas.
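The requirements for a direct-extraction run are spread across the paragraphs above: storage and extraction enabled with an extraction_schema, a credential_id or source_id on the run, and one extraction input per run. A minimal pre-flight sketch that checks them together; `check_extraction_run` is hypothetical, and treating "one extraction input" as exactly one is an assumption.

```python
from typing import Any


def check_extraction_run(task_storage: dict[str, Any], run_body: dict[str, Any]) -> None:
    """Raise ValueError if a direct-extraction run request breaks the documented rules."""
    if not (
        task_storage.get("enabled")
        and task_storage.get("extraction")
        and task_storage.get("extraction_schema")
    ):
        raise ValueError("the task needs storage and extraction enabled, plus an extraction_schema")
    if not (run_body.get("credential_id") or run_body.get("source_id")):
        raise ValueError("a credential_id or source_id is required even though the agent doesn't run")
    extraction_inputs = [
        name
        for name, value in run_body.get("input", {}).items()
        if isinstance(value, dict) and value.get("purpose") == "extraction"
    ]
    # ASSUMPTION: an extraction run carries exactly one extraction file.
    if len(extraction_inputs) != 1:
        raise ValueError(f"expected one extraction input, found {len(extraction_inputs)}")
```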

Reusing a file across runs

Once a file is uploaded, you can reference it on a later run instead of sending the bytes again. Pass the storage_id in place of data, keeping the same purpose:

"resume": {
  "purpose": "attachment",
  "storage_id": "stor_x1y2z3..."
}

The same shape works with purpose: "extraction". Deck verifies the file belongs to your organization, then copies it for the new run so each run keeps its own input files.
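When an earlier run's storage items are at hand (for example from a list or get call), the reference can be built from them, since uploaded inputs are stored with the purpose they were sent with. A minimal sketch; `reuse_input_file` is a hypothetical helper that matches on purpose and file_name, which assumes file names are unique among the run's input files.

```python
from typing import Any, Iterable


def reuse_input_file(
    storage_items: Iterable[dict[str, Any]], file_name: str, purpose: str = "attachment"
) -> dict[str, str]:
    """Build a file input that points at an earlier upload instead of re-sending its bytes."""
    for item in storage_items:
        if item.get("purpose") == purpose and item.get("file_name") == file_name:
            # storage_id takes the place of file_name, content_type, and data.
            return {"purpose": purpose, "storage_id": item["id"]}
    raise LookupError(f"no {purpose} file named {file_name!r} among these storage items")
```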

Document extraction

When extraction is enabled, Deck parses the captured files and populates the extraction field on each storage item with structured JSON. Extraction works with common file formats such as PDFs and spreadsheets, and with common document kinds such as invoices, receipts, and reports. The extracted data depends on the document: a utility bill produces different fields than a hotel receipt.

Custom extraction schemas

Use the extraction_schema field on the task’s storage config to define exactly what fields you want extracted. Deck uses this schema to guide parsing.

{
  "storage": {
    "enabled": true,
    "extraction": true,
    "extraction_schema": {
      "type": "object",
      "properties": {
        "vendor_name": { "type": "string" },
        "total_amount": { "type": "number" },
        "invoice_date": { "type": "string", "format": "date" },
        "line_items": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "description": { "type": "string" },
              "amount": { "type": "number" }
            }
          }
        }
      }
    }
  }
}
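Because the schema declares a type for every field, a client can sanity-check each returned extraction against it before loading it downstream. A minimal sketch covering only the `type`, `properties`, and `items` keywords (it ignores `format`, so "date" strings aren't validated); `schema_mismatches` is hypothetical, and skipping missing or null fields, on the assumption that a document may not contain every field, is a design choice rather than documented Deck behavior.

```python
from typing import Any

# Python types accepted for each JSON Schema type; bools are rejected for numeric types below.
_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def schema_mismatches(schema: dict[str, Any], value: Any, path: str = "$") -> list[str]:
    """List places where `value` doesn't match the types declared in `schema`."""
    if value is None:
        return []  # ASSUMPTION: treat null as "not on the document", not a type error
    expected = schema.get("type")
    if expected in _TYPES:
        wrong_type = not isinstance(value, _TYPES[expected])
        bool_as_number = isinstance(value, bool) and expected != "boolean"
        if wrong_type or bool_as_number:
            return [f"{path}: expected {expected}, got {type(value).__name__}"]
    problems: list[str] = []
    if expected == "object":
        for name, sub_schema in schema.get("properties", {}).items():
            if name in value:
                problems += schema_mismatches(sub_schema, value[name], f"{path}.{name}")
    elif expected == "array" and "items" in schema:
        for index, element in enumerate(value):
            problems += schema_mismatches(schema["items"], element, f"{path}[{index}]")
    return problems
```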

Extraction example

A utility bill extraction might return:

{
  "company_name": "EnergyLink",
  "account_number": "58291-44720",
  "billing_date": "2025-01-22",
  "billing_period": {
    "start_date": "2024-12-18",
    "end_date": "2025-01-20",
    "total_days": 33
  },
  "amount_due": 8247.41,
  "payment_due_date": "2025-02-14",
  "currency": "USD",
  "service_locations": [
    {
      "service_type": "Fuel",
      "service_address": {
        "street": "4421 OAK VIEW LN UNIT 3A",
        "city": "CAMBRIDGE",
        "state": "MA",
        "postal_code": "02140"
      },
      "total_usage": 5348,
      "total_usage_unit": "Therms",
      "total_charges": 8214.67
    }
  ]
}
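Since the result is plain JSON, downstream code can read nested fields directly. A short sketch over the sample above; the helper names are hypothetical, and the day count assumes total_days means end date minus start date, which is what this sample's numbers give (2024-12-18 to 2025-01-20 is 33 days).

```python
from datetime import date
from typing import Any


def billing_days(extraction: dict[str, Any]) -> int:
    """Days from billing_period.start_date to end_date (end minus start)."""
    period = extraction["billing_period"]
    start = date.fromisoformat(period["start_date"])
    end = date.fromisoformat(period["end_date"])
    return (end - start).days


def usage_by_service(extraction: dict[str, Any]) -> dict[str, tuple[float, str]]:
    """Map each service_type to its (total_usage, total_usage_unit) pair."""
    return {
        location["service_type"]: (location["total_usage"], location["total_usage_unit"])
        for location in extraction.get("service_locations", [])
    }
```

For the sample, `billing_days` returns 33, matching its total_days, and `usage_by_service` returns `{"Fuel": (5348, "Therms")}`.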

Extraction errors

If extraction fails on a file, the extraction field on that storage item stays null. The raw file is still available for download. If any file in a task run fails extraction, the task run itself completes with a failure result and an extraction_failed error in the errors array indicating how many files were affected. Successfully extracted files in the same run still return their extraction data.
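When handling an extraction_failed run, the affected files can be separated from the successful ones by checking for a null extraction field. A minimal sketch, assuming extraction is enabled on the task and the items passed in are captured files (with extraction disabled, every item would land in the failed list); `split_by_extraction` is hypothetical.

```python
from typing import Any


def split_by_extraction(
    items: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split storage items into (extracted, failed) for a task with extraction enabled."""
    extracted = [item for item in items if item.get("extraction") is not None]
    # A null extraction means parsing failed; the raw file is still downloadable via `url`.
    failed = [item for item in items if item.get("extraction") is None]
    return extracted, failed
```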

Deduplication

Deduplication tells Deck to skip files that match one captured by a previous run for the same task and credential, so recurring tasks only return new documents. You define a set of fields that uniquely identify a document. Deck reads those fields from each captured file and compares them against prior captures. If every field matches, the new file is dropped: it’s not stored, no storage.created event fires, and it’s not extracted even if extraction is enabled.
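The matching rule can be modeled in a few lines to reason about which files a given schema would drop. This is a hypothetical client-side model, not Deck's implementation: Deck reads the field values from each document itself and scopes matches to the same task and credential, while here the values are passed in directly and one `DedupModel` stands for one task and credential pair.

```python
from typing import Any


class DedupModel:
    """Drop a file only if every deduplication field matches an earlier capture."""

    def __init__(self, deduplication_schema: dict[str, Any]) -> None:
        self.field_names = list(deduplication_schema["properties"])
        self.seen: set[tuple[Any, ...]] = set()

    def is_new(self, fields: dict[str, Any]) -> bool:
        """Record the file's key; return False if an identical key was seen before."""
        key = tuple(fields.get(name) for name in self.field_names)
        if key in self.seen:
            return False
        self.seen.add(key)
        return True
```

With the account_number and billing_period_start schema in the Configuration example, a second fetch of the same month's bill returns False, while the next month's bill for the same account returns True.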

Configuration

Set deduplication to true and provide a deduplication_schema on the task’s storage config:

{
  "storage": {
    "enabled": true,
    "deduplication": true,
    "deduplication_schema": {
      "type": "object",
      "properties": {
        "account_number": {
          "type": "string",
          "description": "The utility account number"
        },
        "billing_period_start": {
          "type": "string",
          "description": "Start date of the billing period (YYYY-MM-DD)"
        }
      }
    }
  }
}

Field	Type	Description
`deduplication`	boolean	Turn deduplication on for this task
`deduplication_schema`	object	JSON Schema with a `properties` map of `field_name → { type, description }`. Required when `deduplication` is `true`.

Each property must declare a type: string, integer, number, or boolean. Nested objects and arrays aren't supported, so pick top-level scalar fields. Property names are arbitrary labels you choose; they serve only as keys. The description is what tells Deck where to find the value on each document, so describe each field precisely. For example, "Account number printed at the top of the bill" works better than a vague "account".
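These rules, plus the at-least-one-property requirement, can be checked before the task runs. A minimal sketch; `deduplication_schema_problems` is hypothetical, and flagging a missing description is a recommendation drawn from the guidance above rather than a documented validation error.

```python
from typing import Any

ALLOWED_TYPES = {"string", "integer", "number", "boolean"}


def deduplication_schema_problems(schema: Any) -> list[str]:
    """Check a deduplication_schema against the rules on this page; return problems found."""
    if not isinstance(schema, dict) or not schema.get("properties"):
        return ["deduplication_schema needs at least one property"]
    problems: list[str] = []
    for name, spec in schema["properties"].items():
        field_type = spec.get("type")
        if field_type not in ALLOWED_TYPES:
            # Catches missing types as well as nested "object" and "array" fields.
            problems.append(f"{name}: type must be one of {sorted(ALLOWED_TYPES)}, got {field_type!r}")
        if not spec.get("description"):
            # Recommended rather than documented as required: the description locates the value.
            problems.append(f"{name}: add a description so Deck knows where to find the value")
    return problems
```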

Choosing fields

The fields you list together form the dedup key. Two files match only if every field is identical. A few rules of thumb:

Pick fields that stay stable for the same logical document. A monthly bill should have the same account number and billing period every time it’s fetched.
Avoid volatile fields. File names, fetch dates, and page numbers can change between fetches, so the same document looks new every time and duplicates slip through (false negatives).
Pick enough fields to be unique. A single field like vendor_name will collide across unrelated invoices from the same vendor, so genuinely new invoices get dropped as duplicates.
Two or three fields is usually right.

Field combinations by document type

Utility bills

Account plus billing period:

"deduplication_schema": {
  "type": "object",
  "properties": {
    "account_number": { "type": "string", "description": "Utility account number" },
    "billing_period_start": { "type": "string", "description": "Billing period start date (YYYY-MM-DD)" }
  }
}

Invoices

Vendor plus invoice number:

"deduplication_schema": {
  "type": "object",
  "properties": {
    "vendor_name": { "type": "string", "description": "Vendor or supplier name" },
    "invoice_number": { "type": "string", "description": "Invoice number printed on the document" }
  }
}

Receipts

Merchant, date, and total, since receipts often lack a unique ID:

"deduplication_schema": {
  "type": "object",
  "properties": {
    "merchant_name": { "type": "string", "description": "Merchant or store name" },
    "transaction_date": { "type": "string", "description": "Date of purchase (YYYY-MM-DD)" },
    "total_amount": { "type": "number", "description": "Total amount charged" }
  }
}

Bank and credit-card statements

Account plus statement period:

"deduplication_schema": {
  "type": "object",
  "properties": {
    "account_number": { "type": "string", "description": "Account number as printed on the statement" },
    "statement_period_end": { "type": "string", "description": "Statement period end date (YYYY-MM-DD)" }
  }
}

Errors

deduplication_schema must be present with at least one property whenever deduplication is true. Starting a task run without it returns:

422 Unprocessable Entity
deduplication_schema must be defined before running a task with deduplication enabled.

To disable deduplication, set deduplication to false (or omit it entirely).

Events

Storage items emit events you can subscribe to through event destinations:

Event	When it fires
`storage.created`	A new file has been captured and is ready for download

Retention

Retention period varies by plan, but no file is kept longer than 90 days: all files are deleted after 90 days.
