PaperAI's API lets you integrate document processing directly into your application. Upload documents, trigger AI conversion and extraction, and receive structured results programmatically — without your users needing to interact with the PaperAI interface.
API access is available on Scale and Enterprise plans.
Overview
The PaperAI API follows REST conventions:
- Base URL: Available in your PaperAI dashboard under Settings → API
- Authentication: API key passed in the Authorization header
- Format: JSON request and response bodies
- File upload: Multipart form data for document uploads
Core workflow
1. Upload a document
Upload a document file (PDF, image, Word, etc.) to create a document record:
POST /api/v1/documents
Content-Type: multipart/form-data
Authorization: Bearer YOUR_API_KEY
file: [binary document file]
folder_id: (optional) target folder
flow_id: (optional) Smart Flow to apply automatically
The response includes a document_id that you use for subsequent operations.
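As a minimal sketch of the upload step, the request can be assembled with Python's standard library alone. The function name `build_upload_request` and the `BASE_URL` value are illustrative placeholders (use the base URL shown in your dashboard), and the generic `application/octet-stream` content type for the file part is an assumption.

```python
import uuid
import urllib.request

# Placeholder: copy the real base URL from Settings > API in your dashboard.
BASE_URL = "https://paperai.example.com"


def build_upload_request(api_key, filename, file_bytes, folder_id=None, flow_id=None):
    """Build (but do not send) a multipart POST to /api/v1/documents."""
    boundary = uuid.uuid4().hex
    parts = []

    # Optional text fields from the upload step: folder_id and flow_id.
    for name, value in (("folder_id", folder_id), ("flow_id", flow_id)):
        if value is not None:
            parts.append(
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n".encode("utf-8")
            )

    # The file part. The generic content type is an assumption.
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n".encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        BASE_URL + "/api/v1/documents",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request; the JSON body of the response should then carry the document_id described above.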
2. Trigger conversion
If you did not specify a flow_id during upload, trigger conversion manually:
POST /api/v1/documents/{document_id}/convert
Authorization: Bearer YOUR_API_KEY
{
"model": "standard",
"flow_id": "your-flow-id"
}
Model options: "standard" or "premium". The Flow defines extraction fields and settings.
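The conversion call is a plain JSON POST, so a small helper can validate the model name before anything goes over the wire. This is a sketch only: `build_convert_request` and `BASE_URL` are hypothetical names, and the explicit `Content-Type: application/json` header is an assumption based on the API using JSON bodies.

```python
import json
import urllib.request

# Placeholder: copy the real base URL from Settings > API in your dashboard.
BASE_URL = "https://paperai.example.com"

# The two model options listed in this guide.
VALID_MODELS = {"standard", "premium"}


def build_convert_request(api_key, document_id, flow_id, model="standard"):
    """Build (but do not send) the POST that triggers conversion."""
    if model not in VALID_MODELS:
        raise ValueError(f"model must be one of {sorted(VALID_MODELS)}, got {model!r}")

    body = json.dumps({"model": model, "flow_id": flow_id}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/documents/{document_id}/convert",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed; the API uses JSON bodies
        },
    )
```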
3. Check status
Poll for conversion status or use webhooks (recommended):
GET /api/v1/documents/{document_id}/status
Authorization: Bearer YOUR_API_KEY
Status values: pending, processing, completed, failed.
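If you do poll, a bounded loop avoids waiting forever on a stuck document. The sketch below assumes a caller-supplied `get_status` function (hypothetical) that wraps the status endpoint and returns one of the four status strings; the interval and timeout defaults are arbitrary.

```python
import time

# Statuses after which polling should stop.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_completion(get_status, document_id, interval=2.0, timeout=300.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll until the document is completed or failed, or the timeout passes.

    get_status is a hypothetical callable wrapping
    GET /api/v1/documents/{document_id}/status.
    """
    deadline = clock() + timeout
    while True:
        status = get_status(document_id)
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"document {document_id} still {status!r} after {timeout}s")
        sleep(interval)
```

The `sleep` and `clock` parameters exist so the loop can be tested without real delays.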
4. Retrieve results
Once conversion is complete, retrieve the structured output:
GET /api/v1/documents/{document_id}/result
Authorization: Bearer YOUR_API_KEY
The response includes:
- markdown: Full document conversion as Markdown
- extracted_data: Structured JSON with your defined extraction fields
- confidence_score: Overall accuracy confidence (0-100)
- field_scores: Per-field confidence scores
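Per-field scores make it possible to flag only the uncertain fields instead of rejecting a whole document. The following sketch assumes field_scores is a JSON object mapping field names to numbers on the same 0-100 scale as confidence_score; the exact shape is not specified here, and the threshold of 80 is an arbitrary example.

```python
def fields_needing_review(result, min_field_score=80):
    """Return the names of fields whose confidence is below the threshold.

    Assumes result["field_scores"] maps field name -> score (0-100).
    """
    scores = result.get("field_scores", {})
    return sorted(name for name, score in scores.items() if score < min_field_score)


# Illustrative data only; the field names are made up for this example.
sample_result = {
    "confidence_score": 91,
    "extracted_data": {"invoice_number": "INV-1042", "total": "118.40"},
    "field_scores": {"invoice_number": 99, "total": 72},
}
print(fields_needing_review(sample_result))
```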
5. Approve or reject
Programmatically approve or reject based on your business rules:
POST /api/v1/documents/{document_id}/approve
Authorization: Bearer YOUR_API_KEY
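What counts as a business rule is up to you. One possible rule, sketched below, approves a document only when every required field was extracted and the overall confidence clears a minimum. The function name, the required-field list, and the threshold of 90 are all illustrative assumptions; the sketch only makes the decision and leaves the actual approve call to your HTTP client.

```python
def should_approve(result, required_fields, min_confidence=90):
    """Decide whether a result passes a simple example business rule.

    Approves only if every required field has a non-empty value in
    extracted_data and confidence_score is at least min_confidence.
    """
    extracted = result.get("extracted_data", {})
    has_all_fields = all(
        extracted.get(name) not in (None, "") for name in required_fields
    )
    return has_all_fields and result.get("confidence_score", 0) >= min_confidence
```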
Webhooks
Instead of polling for status, configure webhooks to receive notifications:
POST /api/v1/webhooks
Authorization: Bearer YOUR_API_KEY
{
"url": "https://your-app.com/webhooks/paperai",
"events": ["document.completed", "document.failed"]
}
Webhook payloads include the document ID, status, confidence score, and extracted data — everything you need to process the result in your application.
Building a document pipeline
A typical integration pattern:
- Your application receives a document (email attachment, form upload, etc.)
- Upload it to PaperAI via API with the appropriate Smart Flow
- PaperAI processes the document and sends a webhook when complete
- Your webhook handler checks the confidence score:
  - High-confidence results: auto-import into your system
  - Low-confidence results: queue for human review in your application or PaperAI's interface
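The routing step of this pattern can be sketched as a small function that your webhook endpoint calls with the raw request body. The payload key names `status` and `confidence_score` are assumed to mirror the result endpoint, the threshold of 90 is arbitrary, and `auto_import` and `queue_for_review` are hypothetical callbacks you would supply.

```python
import json


def route_webhook(raw_body, auto_import, queue_for_review, threshold=90):
    """Route one webhook delivery based on its confidence score.

    Returns "imported", "review", or "not_completed" so the caller can log
    what happened. Payload key names are assumptions.
    """
    payload = json.loads(raw_body)

    # Failed or otherwise unfinished documents are left to the caller.
    if payload.get("status") != "completed":
        return "not_completed"

    if payload.get("confidence_score", 0) >= threshold:
        auto_import(payload)
        return "imported"

    queue_for_review(payload)
    return "review"
```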
Error handling
Robust error handling is essential for production integrations. The PaperAI API uses standard HTTP status codes and returns structured error responses.
Common error scenarios and how to handle them:
HTTP 400 Bad Request
Response:
{
"error": "invalid_file_type",
"message": "Unsupported file format. Supported: PDF, PNG, JPG, TIFF, DOCX, XLSX"
}
HTTP 401 Unauthorized
Response:
{
"error": "invalid_api_key",
"message": "The provided API key is invalid or has been revoked"
}
HTTP 429 Too Many Requests
Response:
{
"error": "rate_limit_exceeded",
"message": "Rate limit exceeded. Retry after 30 seconds",
"retry_after": 30
}
HTTP 500 Internal Server Error
Response:
{
"error": "processing_failed",
"message": "Document processing failed. Please retry."
}
Recommended retry strategy:
- For 429 errors, respect the retry_after value in the response.
- For 500 errors, implement exponential backoff: wait 1 second, then 2, then 4, up to a maximum of 60 seconds. Limit total retries to 5 attempts.
- For 400 errors, do not retry — these indicate a problem with the request that must be fixed in your code.
- For 401 errors, do not retry — verify your API key is correct and active.
A typical error-handling wrapper looks like this:
function processDocument(file):
    retries = 0
    max_retries = 5
    delay = 1
    while retries < max_retries:
        response = POST /api/v1/documents with file
        if response.status == 200:
            return response.document_id
        if response.status == 429:
            wait(response.retry_after)
            retries += 1
            continue
        if response.status >= 500:
            wait(delay)
            delay = min(delay * 2, 60)
            retries += 1
            continue
        // 4xx errors - do not retry
        throw ClientError(response.error)
    throw MaxRetriesExceeded()
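The same retry strategy can be written as runnable Python. In this sketch, `send` is a hypothetical callable that performs the upload and returns a `(status, parsed_json)` tuple, so the retry logic stays independent of whichever HTTP client you use. One deliberate difference from the wrapper above: any 2xx status is treated as success, which is an assumption.

```python
import time


class ClientError(Exception):
    """A 4xx response that should not be retried."""


class MaxRetriesExceeded(Exception):
    """All attempts were used without a successful response."""


def upload_with_retries(send, max_retries=5, base_delay=1.0, max_delay=60.0,
                        sleep=time.sleep):
    """Call send() until it succeeds, following the retry rules in this guide."""
    delay = base_delay
    for _ in range(max_retries):
        status, body = send()
        if 200 <= status < 300:
            return body["document_id"]
        if status == 429:
            # Respect the server's hint; fall back to the current delay.
            sleep(body.get("retry_after", delay))
            continue
        if status >= 500:
            sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff with a cap
            continue
        # Remaining 4xx errors: retrying will not help.
        raise ClientError(body.get("error", "client_error"))
    raise MaxRetriesExceeded()
```

Injecting `sleep` keeps the function testable without real waiting.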
Pagination for large result sets
When working with folders containing hundreds or thousands of documents, list endpoints return paginated results. The API uses cursor-based pagination for consistent performance regardless of dataset size.
GET /api/v1/documents?folder_id=abc123&limit=50
Authorization: Bearer YOUR_API_KEY
The response includes pagination metadata:
Response:
{
"documents": [...],
"pagination": {
"next_cursor": "eyJpZCI6MTAwfQ==",
"has_more": true,
"total_count": 347
}
}
To fetch the next page, include the cursor:
GET /api/v1/documents?folder_id=abc123&limit=50&cursor=eyJpZCI6MTAwfQ==
Authorization: Bearer YOUR_API_KEY
Pagination best practices:
- Use a limit between 20 and 100. Larger pages reduce the number of requests but increase response time and memory usage.
- Always check has_more before making the next request. Do not assume a fixed number of pages.
- Store the next_cursor value and use it for the subsequent request. Cursors are opaque strings; do not parse or modify them.
- For bulk export operations, process each page as it arrives rather than loading all pages into memory first.
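These practices map naturally onto a generator, which yields documents page by page and never holds more than one page in memory. In this sketch, `fetch_page` is a hypothetical callable that performs the list request and returns the parsed JSON response in the shape shown above.

```python
def iter_documents(fetch_page, folder_id, limit=50):
    """Yield every document in a folder, following cursors until has_more is false.

    fetch_page(folder_id=..., limit=..., cursor=...) is a hypothetical wrapper
    around GET /api/v1/documents that returns the parsed response.
    """
    cursor = None  # no cursor on the first request
    while True:
        page = fetch_page(folder_id=folder_id, limit=limit, cursor=cursor)
        yield from page["documents"]

        pagination = page["pagination"]
        if not pagination["has_more"]:
            return
        # Pass the cursor back untouched; it is an opaque string.
        cursor = pagination["next_cursor"]
```

A caller would then write `for doc in iter_documents(fetch_page, "abc123"):` and process each document as it arrives.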
Security best practices
API keys grant access to your organization's documents and extracted data. Treat them with the same care as database credentials or encryption keys.
Key management:
- Never embed API keys directly in client-side code, mobile applications, or public repositories. Use environment variables or a secrets manager (AWS Secrets Manager, HashiCorp Vault, Azure Key Vault).
- Rotate API keys periodically — at minimum every 90 days, and immediately if a key may have been exposed.
- Use separate API keys for development, staging, and production environments. This limits the blast radius if a development key is accidentally committed to version control.
- PaperAI supports multiple active API keys per organization. Create dedicated keys for each integration rather than sharing a single key across all systems.
Network security:
- All API communication must use HTTPS. The API rejects plain HTTP requests.
- If your infrastructure supports it, restrict outbound API calls to PaperAI's published IP ranges using firewall rules or network security groups.
- For webhook endpoints, verify the webhook signature included in the X-PaperAI-Signature header to confirm the request originated from PaperAI and was not tampered with in transit.
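The signing scheme is not described in this guide, so the sketch below assumes a common one: a hex-encoded HMAC-SHA256 of the raw request body, keyed with a shared webhook secret. Treat both the algorithm and the encoding as assumptions and confirm them against the full API reference before relying on this check.

```python
import hashlib
import hmac


def is_valid_signature(raw_body, signature_header, secret):
    """Check an X-PaperAI-Signature value against the raw request body.

    ASSUMPTION: the signature is hex(HMAC-SHA256(secret, raw_body)).
    raw_body and secret are bytes; signature_header is a str or None.
    """
    if not signature_header:
        return False
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature_header.encode("utf-8"))
```

Verify against the exact bytes received, before any JSON parsing, because re-serializing the payload can change whitespace and invalidate the signature.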
Data handling:
- Minimize the retention of extracted data in your systems. Pull the data you need and avoid storing full document content unless your use case requires it.
- Apply the principle of least privilege: grant API keys only the scopes they need. Read-only keys should be used for reporting integrations that do not need to upload or modify documents.
- Log all API interactions on your side for audit purposes, but redact the API key from logs (log only the last four characters for identification).
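Redaction is easiest to enforce when every audit log line goes through one helper. The sketch below keeps only the last four characters of the key, as recommended above; the logger name and the function names are illustrative.

```python
import logging

logger = logging.getLogger("paperai.audit")  # hypothetical logger name


def redact_key(api_key, visible=4):
    """Return a log-safe form of the key showing only its last characters."""
    if len(api_key) <= visible:
        return "****"  # too short to reveal anything safely
    return "****" + api_key[-visible:]


def log_api_call(method, path, status, api_key):
    """Write one audit line without ever logging the full key."""
    logger.info("%s %s -> %s (key %s)", method, path, status, redact_key(api_key))
```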
Rate limits and best practices
- Rate limits: Vary by plan. Scale plan includes generous limits for production use.
- File size: Maximum 50 MB per document
- Batch uploads: Upload documents in parallel for faster throughput
- Error handling: Implement exponential backoff for retries on transient errors
- Idempotency: Use unique client-generated IDs to prevent duplicate processing
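Parallel uploads and idempotency work well together: generate one client ID per file up front, then reuse that same ID on any retry of that file. The sketch below assumes a caller-supplied `upload(path, client_id)` function (hypothetical) that returns a document ID; how the client ID is transmitted to the API is not specified in this guide, so that detail is left to the callable.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


def upload_batch(upload, paths, max_workers=4):
    """Upload several files in parallel, one unique client ID per file.

    upload(path, client_id) is a hypothetical callable that performs a single
    upload and returns the resulting document ID.
    """
    # Assign IDs before starting so a retry of the same file reuses its ID.
    jobs = [(path, str(uuid.uuid4())) for path in paths]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with jobs.
        results = pool.map(lambda job: upload(job[0], job[1]), jobs)
        return {path: doc_id for (path, _), doc_id in zip(jobs, results)}
```

Keep `max_workers` modest so parallel uploads stay within your plan's rate limits.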
Getting started
- Sign up for a Scale or Enterprise plan
- Generate an API key in Settings → API
- Test with a single document upload
- Set up webhooks for production use
- Build your integration
Full API reference documentation is available at /docs.