Technical · 5 min read

Document processing API: getting started guide

A developer guide to integrating PaperAI's document processing capabilities into your application via API — from authentication to upload, extraction, and webhook notifications.

By PaperAI Team

PaperAI's API lets you integrate document processing directly into your application. Upload documents, trigger AI conversion and extraction, and receive structured results programmatically — without your users needing to interact with the PaperAI interface.

API access is available on Scale and Enterprise plans.

Overview

The PaperAI API follows REST conventions:

  • Base URL: Available in your PaperAI dashboard under Settings → API
  • Authentication: API key passed in the Authorization header
  • Format: JSON request and response bodies
  • File upload: Multipart form data for document uploads

Core workflow

1. Upload a document

Upload a document file (PDF, image, Word, etc.) to create a document record:

POST /api/v1/documents
Content-Type: multipart/form-data
Authorization: Bearer YOUR_API_KEY

file: [binary document file]
folder_id: (optional) target folder
flow_id: (optional) Smart Flow to apply automatically

The response includes a document_id that you use for subsequent operations.
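
The upload step can be sketched as follows. This is a minimal illustration, not official client code: `BASE_URL` is a placeholder for the value in Settings → API, and the function only assembles the request pieces so they can be handed to an HTTP client such as `requests`.

```python
BASE_URL = "https://api.paperai.example"  # placeholder; use the URL from your dashboard

def build_upload_request(api_key, file_name, file_bytes, folder_id=None, flow_id=None):
    """Assemble headers, multipart files, and form fields for POST /api/v1/documents."""
    headers = {"Authorization": f"Bearer {api_key}"}
    files = {"file": (file_name, file_bytes)}
    data = {}
    if folder_id is not None:
        data["folder_id"] = folder_id
    if flow_id is not None:
        data["flow_id"] = flow_id
    return {"url": f"{BASE_URL}/api/v1/documents",
            "headers": headers, "files": files, "data": data}

# With the `requests` library installed, the actual call would look like:
#   req = build_upload_request(api_key, "invoice.pdf", open("invoice.pdf", "rb").read())
#   resp = requests.post(req["url"], headers=req["headers"],
#                        files=req["files"], data=req["data"])
#   document_id = resp.json()["document_id"]
```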

2. Trigger conversion

If you did not specify a flow_id during upload, trigger conversion manually:

POST /api/v1/documents/{document_id}/convert
Authorization: Bearer YOUR_API_KEY

{
  "model": "standard",
  "flow_id": "your-flow-id"
}

Model options: "standard" or "premium". The Flow defines extraction fields and settings.
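
Assembling the conversion call can be sketched the same way. Transport is left to the caller; the model names and `flow_id` come straight from the request shown above.

```python
import json

def build_convert_request(api_key, document_id, model="standard", flow_id=None):
    """Build the pieces of POST /api/v1/documents/{document_id}/convert."""
    body = {"model": model}
    if flow_id is not None:
        body["flow_id"] = flow_id
    return {
        "url": f"/api/v1/documents/{document_id}/convert",
        "headers": {"Authorization": f"Bearer {api_key}",
                    "Content-Type": "application/json"},
        "body": json.dumps(body),
    }
```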

3. Check status

Poll for conversion status or use webhooks (recommended):

GET /api/v1/documents/{document_id}/status
Authorization: Bearer YOUR_API_KEY

Status values: pending, processing, completed, failed.
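
For scripts that do poll, a small helper like the following keeps the loop bounded. It is a sketch: `get_status` stands in for whatever wraps GET /api/v1/documents/{document_id}/status and returns the status string, and the interval and timeout values are arbitrary.

```python
import time

def wait_for_completion(get_status, document_id, interval=2.0,
                        timeout=300.0, sleep=time.sleep):
    """Poll until the document reaches a terminal status or the timeout expires."""
    waited = 0.0
    status = "pending"
    while waited < timeout:
        status = get_status(document_id)
        if status in ("completed", "failed"):
            return status
        sleep(interval)
        waited += interval
    raise TimeoutError(f"document {document_id} still {status!r} after {timeout}s")
```

Injecting `sleep` keeps the helper testable; in production code you would simply omit that argument.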

4. Retrieve results

Once conversion is complete, retrieve the structured output:

GET /api/v1/documents/{document_id}/result
Authorization: Bearer YOUR_API_KEY

The response includes:

  • markdown: Full document conversion as Markdown
  • extracted_data: Structured JSON with your defined extraction fields
  • confidence_score: Overall accuracy confidence (0-100)
  • field_scores: Per-field confidence scores
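
A small helper over that payload shows how the per-field scores can be used, for example to flag fields needing attention. The threshold of 80 is an arbitrary illustration, not a PaperAI recommendation.

```python
def low_confidence_fields(result, threshold=80):
    """List the extraction fields whose per-field confidence is below threshold."""
    scores = result.get("field_scores", {})
    return sorted(name for name, score in scores.items() if score < threshold)
```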

5. Approve or reject

Programmatically approve or reject based on your business rules:

POST /api/v1/documents/{document_id}/approve
Authorization: Bearer YOUR_API_KEY

Webhooks

Instead of polling for status, configure webhooks to receive notifications:

POST /api/v1/webhooks
Authorization: Bearer YOUR_API_KEY

{
  "url": "https://your-app.com/webhooks/paperai",
  "events": ["document.completed", "document.failed"]
}

Webhook payloads include the document ID, status, confidence score, and extracted data — everything you need to process the result in your application.
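
A webhook handler can dispatch on the event name, roughly as below. This is a hypothetical sketch: the payload field names mirror this guide, but confirm them against the API reference before relying on them.

```python
def handle_webhook(payload, on_completed, on_failed):
    """Route a webhook payload to the appropriate callback."""
    event = payload.get("event")
    if event == "document.completed":
        on_completed(payload["document_id"],
                     payload.get("extracted_data"),
                     payload.get("confidence_score"))
        return "completed"
    if event == "document.failed":
        on_failed(payload["document_id"], payload.get("message"))
        return "failed"
    return "ignored"  # acknowledge unknown events so new types don't break the handler
```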

Building a document pipeline

A typical integration pattern:

  1. Your application receives a document (email attachment, form upload, etc.)
  2. Upload it to PaperAI via API with the appropriate Smart Flow
  3. PaperAI processes the document and sends a webhook when complete
  4. Your webhook handler checks the confidence score
  5. High-confidence results: auto-import into your system
  6. Low-confidence results: queue for human review in your application or PaperAI's interface
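
Steps 4 through 6 reduce to a simple routing decision. The threshold of 90 below is an illustrative business rule, and `auto_import` and `queue_for_review` stand in for your own system's functions.

```python
def route_result(result, auto_import, queue_for_review, threshold=90):
    """Auto-import high-confidence results; queue the rest for human review."""
    if result.get("confidence_score", 0) >= threshold:
        auto_import(result["extracted_data"])
        return "imported"
    queue_for_review(result)
    return "review"
```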

Error handling

Robust error handling is essential for production integrations. The PaperAI API uses standard HTTP status codes and returns structured error responses.

Common error scenarios and how to handle them:

HTTP 400 Bad Request
Response:
{
  "error": "invalid_file_type",
  "message": "Unsupported file format. Supported: PDF, PNG, JPG, TIFF, DOCX, XLSX"
}

HTTP 401 Unauthorized
Response:
{
  "error": "invalid_api_key",
  "message": "The provided API key is invalid or has been revoked"
}

HTTP 429 Too Many Requests
Response:
{
  "error": "rate_limit_exceeded",
  "message": "Rate limit exceeded. Retry after 30 seconds",
  "retry_after": 30
}

HTTP 500 Internal Server Error
Response:
{
  "error": "processing_failed",
  "message": "Document processing failed. Please retry."
}
Recommended retry strategy:

  • For 429 errors, respect the retry_after value in the response.
  • For 500 errors, implement exponential backoff: wait 1 second, then 2, then 4, up to a maximum of 60 seconds. Limit total retries to 5 attempts.
  • For 400 errors, do not retry — these indicate a problem with the request that must be fixed in your code.
  • For 401 errors, do not retry — verify your API key is correct and active.

A typical error-handling wrapper, sketched here in Python with an injected `post` callable (whatever wraps POST /api/v1/documents and returns the parsed response), looks like this:

import time

class ClientError(Exception):
    pass

class MaxRetriesExceeded(Exception):
    pass

def process_document(post, file, max_retries=5):
    delay = 1

    for _ in range(max_retries):
        response = post(file)

        if response.status == 200:
            return response.document_id

        if response.status == 429:
            # Respect the server-provided retry hint
            time.sleep(response.retry_after)
            continue

        if response.status >= 500:
            # Exponential backoff: 1 s, 2 s, 4 s, ... capped at 60 s
            time.sleep(delay)
            delay = min(delay * 2, 60)
            continue

        # Other 4xx errors - do not retry
        raise ClientError(response.error)

    raise MaxRetriesExceeded()

Pagination for large result sets

When working with folders containing hundreds or thousands of documents, list endpoints return paginated results. The API uses cursor-based pagination for consistent performance regardless of dataset size.

GET /api/v1/documents?folder_id=abc123&limit=50
Authorization: Bearer YOUR_API_KEY

The response includes pagination metadata:

Response:
{
  "documents": [...],
  "pagination": {
    "next_cursor": "eyJpZCI6MTAwfQ==",
    "has_more": true,
    "total_count": 347
  }
}

To fetch the next page, include the cursor:

GET /api/v1/documents?folder_id=abc123&limit=50&cursor=eyJpZCI6MTAwfQ==
Authorization: Bearer YOUR_API_KEY

Pagination best practices:

  • Use a limit between 20 and 100. Larger pages reduce the number of requests but increase response time and memory usage.
  • Always check has_more before making the next request. Do not assume a fixed number of pages.
  • Store the next_cursor value and use it for the subsequent request. Cursors are opaque strings — do not parse or modify them.
  • For bulk export operations, process each page as it arrives rather than loading all pages into memory first.
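
The paging loop itself is small. In this sketch, `fetch_page` is an injected callable that wraps GET /api/v1/documents and returns the JSON body shown above, which keeps the logic testable without a live API; yielding page by page follows the "process each page as it arrives" advice.

```python
def iter_documents(fetch_page, folder_id, limit=50):
    """Yield documents one page at a time instead of buffering all pages."""
    cursor = None
    while True:
        body = fetch_page(folder_id=folder_id, limit=limit, cursor=cursor)
        yield from body["documents"]
        if not body["pagination"]["has_more"]:
            return
        cursor = body["pagination"]["next_cursor"]  # opaque; pass back unchanged
```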

Security best practices

API keys grant access to your organization's documents and extracted data. Treat them with the same care as database credentials or encryption keys.

Key management:

  • Never embed API keys directly in client-side code, mobile applications, or public repositories. Use environment variables or a secrets manager (AWS Secrets Manager, HashiCorp Vault, Azure Key Vault).
  • Rotate API keys periodically — at minimum every 90 days, and immediately if a key may have been exposed.
  • Use separate API keys for development, staging, and production environments. This limits the blast radius if a development key is accidentally committed to version control.
  • PaperAI supports multiple active API keys per organization. Create dedicated keys for each integration rather than sharing a single key across all systems.

Network security:

  • All API communication must use HTTPS. The API rejects plain HTTP requests.
  • If your infrastructure supports it, restrict outbound API calls to PaperAI's published IP ranges using firewall rules or network security groups.
  • For webhook endpoints, verify the webhook signature included in the X-PaperAI-Signature header to confirm the request originated from PaperAI and was not tampered with in transit.
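
Signature verification often follows the HMAC-over-raw-body pattern sketched below. This is an assumption for illustration: the actual signing scheme for X-PaperAI-Signature is defined in the API reference, so confirm it there before relying on this.

```python
import hashlib
import hmac

def verify_signature(secret: str, raw_body: bytes, signature_header: str) -> bool:
    """Check a hex HMAC-SHA256 of the raw request body against the header value."""
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(expected, signature_header)
```

Note the comparison runs over the raw bytes as received; re-serializing parsed JSON before hashing is a common source of false mismatches.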

Data handling:

  • Minimize the retention of extracted data in your systems. Pull the data you need and avoid storing full document content unless your use case requires it.
  • Apply the principle of least privilege: grant API keys only the scopes they need. Read-only keys should be used for reporting integrations that do not need to upload or modify documents.
  • Log all API interactions on your side for audit purposes, but redact the API key from logs (log only the last four characters for identification).
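
The redaction rule from the last bullet is one line of code, shown here for completeness:

```python
def redact_api_key(api_key: str) -> str:
    """Keep only the final four characters so log entries stay identifiable but unusable."""
    if len(api_key) <= 4:
        return "*" * len(api_key)
    return "*" * (len(api_key) - 4) + api_key[-4:]
```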

Rate limits and best practices

  • Rate limits: Vary by plan. Scale plan includes generous limits for production use.
  • File size: Maximum 50 MB per document
  • Batch uploads: Upload documents in parallel for faster throughput
  • Error handling: Implement exponential backoff for retries on transient errors
  • Idempotency: Use unique client-generated IDs to prevent duplicate processing
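
Client-side idempotency can be sketched as below. The dedup store and the idea of passing the key to `build_request` (for example as a request header) are illustrative patterns, not part of the documented API surface.

```python
import uuid

def make_idempotent_request(store, build_request, client_ref=None):
    """Reuse the earlier result when the same client reference is retried."""
    key = client_ref or str(uuid.uuid4())
    if key in store:
        return store[key]          # duplicate submission: skip the second upload
    result = build_request(key)    # the key could travel with the request itself
    store[key] = result
    return result
```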

Getting started

  1. Sign up for a Scale or Enterprise plan
  2. Generate an API key in Settings → API
  3. Test with a single document upload
  4. Set up webhooks for production use
  5. Build your integration

Full API reference documentation is available at /docs.

Ready to try this yourself?

Start free with 100 credits. No credit card required.

