Written with Claude
I started by building an AI sandbox for financial reports.
Upload a PDF, extract text and tables, run analysis, let an agent answer questions. Simple enough — until I looked at where the data was going.
A typical pipeline:
- Upload PDF to blob storage
- Send to a parser service
- Store extracted text somewhere else
- Render page images in another service
- Run inference in yet another
- Save chat results in a sixth place
Five iterations later, I had 5x more Markdown and image artifacts scattered across subprocessors, with no reliable way to trace them back to a single document. For financial PDFs, that's a compliance problem.
So I colocated everything.
One document, one URL
When you upload a PDF to OkraPDF, you get a single base URL. Everything lives under it:
https://api.okrapdf.com/document/{docId}/
/chat/completions ← OpenAI-compatible query endpoint
/status ← Processing state
/pages ← Page images
/nodes ← Extracted entities
/export ← Markdown, Excel, DOCX
Any client that speaks the OpenAI protocol can use this directly — the OpenAI SDK, Vercel AI SDK, LangChain, or curl. Each document is a model endpoint.
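As a sketch, the whole route tree above can be derived from the document ID alone. The helper below is illustrative, not part of any official SDK; it just mirrors the endpoint layout shown:

```typescript
// Illustrative helper: builds the sub-endpoints that hang off one document ID.
// The route shapes mirror the tree above; this function is not an official SDK.
const BASE = "https://api.okrapdf.com";

function documentUrls(docId: string) {
  const root = `${BASE}/document/${docId}`;
  return {
    chat: `${root}/chat/completions`, // OpenAI-compatible query endpoint
    status: `${root}/status`,         // processing state
    pages: `${root}/pages`,           // page images
    nodes: `${root}/nodes`,           // extracted entities
    export: `${root}/export`,         // Markdown, Excel, DOCX
  };
}

console.log(documentUrls("doc-abc123").chat);
// → https://api.okrapdf.com/document/doc-abc123/chat/completions
```

Because everything is a path under one root, any generic HTTP client can address a document's artifacts without a purpose-built library.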
# Upload
DOC_ID=$(curl -s -X POST https://api.okrapdf.com/v1/documents \
-H "Authorization: Bearer $OKRA_API_KEY" \
-F "file=@report.pdf" | jq -r '.documentId')
# Query
curl -X POST https://api.okrapdf.com/document/$DOC_ID/chat/completions \
-H "Authorization: Bearer $OKRA_API_KEY" \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"What was net income?"}]}'
Two HTTP calls. One subprocessor. Your agent doesn't need to know anything about PDFs; it just calls an endpoint.
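In practice you'd wait for the /status endpoint to report completion before querying. A minimal polling sketch, assuming the status response is JSON shaped like `{ "state": "processing" | "ready" | "failed" }` (the field names are my assumption, not documented API):

```typescript
// Sketch: wait for a document to finish processing before querying it.
// ASSUMPTION: /status returns JSON like { "state": "processing" | "ready" | "failed" }.
// The exact response shape is not confirmed by the article.
const API = "https://api.okrapdf.com";

type DocState = "processing" | "ready" | "failed";

function isTerminal(state: DocState): boolean {
  // Only "ready" and "failed" end the polling loop.
  return state === "ready" || state === "failed";
}

async function waitForDocument(docId: string, apiKey: string): Promise<DocState> {
  for (;;) {
    const res = await fetch(`${API}/document/${docId}/status`, {
      headers: { Authorization: `Bearer ${apiKey}` },
    });
    const { state } = (await res.json()) as { state: DocState };
    if (isTerminal(state)) return state;
    await new Promise((r) => setTimeout(r, 2000)); // poll every 2s
  }
}
```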
The surprise: colocation made it faster
I moved the pipeline into an edge runtime and colocated storage, parsing, rendering, and inference coordination into one place. The goal was privacy — fewer services touching sensitive documents.
The surprise was performance. Because core logic runs through bindings instead of network hops between services, everything got noticeably faster. Page image rendering happens in the same runtime as the source PDF. Chat completions read from a colocated database, not a remote Postgres.
Colocate more = better isolation + better performance. That's the architectural bet.
Per-document control
Real-world PDFs are messy. A single global config doesn't work when one file is a clean digital report and the next is a scanned 1990s filing.
With OkraPDF, config is per-document:
- Parsing strategy — choose based on PDF complexity, change it later without re-uploading
- Vendor selection — pick the AI vendor per document. Need a BAA vendor for medical records? Use it for those docs only. That's the only vendor that sees those bytes.
- Per-document chat — query any document directly without app-level routing
- Edge previews — page images rendered in the same binding as the source PDF
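The knobs above suggest a per-document config call. A sketch, assuming a `PATCH /document/{docId}/config` endpoint and these field names (both the path and the fields are my assumptions for illustration, not confirmed API):

```typescript
// Hypothetical per-document config update. The endpoint path and the field
// names ("parsingStrategy", "vendor") are assumptions for illustration only.
interface DocumentConfig {
  parsingStrategy?: "fast" | "ocr" | "layout-aware"; // swap later without re-uploading
  vendor?: string; // e.g. restrict medical records to a BAA-covered vendor
}

function configRequest(docId: string, config: DocumentConfig) {
  return {
    method: "PATCH" as const,
    url: `https://api.okrapdf.com/document/${docId}/config`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(config),
  };
}

// A scanned 1990s filing gets OCR and a specific vendor; other docs are untouched.
const req = configRequest("doc-abc123", { parsingStrategy: "ocr", vendor: "baa-vendor" });
```

The point of the sketch is the scoping: config travels with the document ID, so one messy file can get special treatment without a global setting.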
And because everything is colocated under one document ID, deletion is atomic:
DELETE /document/{docId}
PDF, derived Markdown, preview images, chat history: all gone in one call. No orphaned artifacts across six services.
Real numbers
I ran the full FinanceBench evaluation: 129 questions across 10 SEC filings:
| Metric | Value |
|---|---|
| Pass rate | 86.8% (112/129) |
| Cost per question | $0.009 |
| Total eval cost | $1.15 |
Questions range from metric extraction ("What is Amazon's FY2019 net income?") to multi-step reasoning ("What is AMD's quick ratio and what does it imply about their liquidity?"). Sub-cent per question across dense 200-page filings.
Live demo: okrapdf.com/demo/financebench
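The headline figures follow directly from the raw counts, with the quoted $0.009 per question being a rounding of total cost over questions:

```typescript
// Sanity-check the FinanceBench numbers quoted above.
const questions = 129;
const passed = 112;
const totalCost = 1.15; // USD, total eval cost

const passRate = (passed / questions) * 100;   // ≈ 86.8%
const costPerQuestion = totalCost / questions; // ≈ $0.0089, i.e. sub-cent

console.log(passRate.toFixed(1), costPerQuestion.toFixed(4));
// → 86.8 0.0089
```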
Use your existing SDK
You don't need a new client. Each document exposes a standard OpenAI-compatible endpoint.
Vercel AI SDK:
import { createOkra } from "@okrapdf/ai-sdk";
import { streamText } from "ai";
const okra = createOkra({ apiKey: process.env.OKRA_API_KEY });
const result = streamText({
model: okra("doc-abc123"),
messages: [{ role: "user", content: "What is the revenue?" }],
});
OpenAI SDK:
import OpenAI from "openai";
const client = new OpenAI({
apiKey: process.env.OKRA_API_KEY,
baseURL: "https://api.okrapdf.com/v1/documents/doc-abc123",
});
const completion = await client.chat.completions.create({
model: "default",
messages: [{ role: "user", content: "What was net income?" }],
});
Works with LangChain, CrewAI, or any agent framework that speaks the OpenAI protocol. Your agent calls it like any other model — because it is one.
Try it
npm install okrapdf
npx okra upload report.pdf
npx okra chat "What was net income in FY2023?"