
Steven Tsao


Your AI Agent Shouldn't Parse PDFs. Delegate to a Subagent.

Written with Claude

I started by building an AI sandbox for financial reports.

Upload a PDF, extract text and tables, run analysis, let an agent answer questions. Simple enough — until I looked at where the data was going.

A typical pipeline:

  1. Upload PDF to blob storage
  2. Send to a parser service
  3. Store extracted text somewhere else
  4. Render page images in another service
  5. Run inference in yet another
  6. Save chat results in a sixth place

Five iterations later, I had 5x more Markdown and image artifacts scattered across subprocessors, with no reliable way to trace them back to a single document. For financial PDFs, that's a compliance problem.

So I colocated everything.


One document, one URL

When you upload a PDF to OkraPDF, you get a single base URL. Everything lives under it:

https://api.okrapdf.com/document/{docId}
  /chat/completions   ← OpenAI-compatible query endpoint
  /status             ← Processing state
  /pages              ← Page images
  /nodes              ← Extracted entities
  /export             ← Markdown, Excel, DOCX

Any client that speaks the OpenAI protocol can use this directly — the OpenAI SDK, Vercel AI SDK, LangChain, or curl. Each document is a model endpoint.
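The URL layout above can be captured in a small helper. A sketch only, not part of any official SDK; the function name is mine:

```typescript
// Build the sibling resource URLs for one document ID.
// Illustrative helper — not part of the OkraPDF SDK.
const BASE = "https://api.okrapdf.com";

function documentEndpoints(docId: string) {
  const root = `${BASE}/document/${docId}`;
  return {
    chat: `${root}/chat/completions`, // OpenAI-compatible queries
    status: `${root}/status`,         // processing state
    pages: `${root}/pages`,           // page images
    nodes: `${root}/nodes`,           // extracted entities
    export: `${root}/export`,         // Markdown, Excel, DOCX
  };
}

console.log(documentEndpoints("doc-abc123").chat);
// https://api.okrapdf.com/document/doc-abc123/chat/completions
```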

# Upload
DOC_ID=$(curl -s -X POST https://api.okrapdf.com/v1/documents \
  -H "Authorization: Bearer $OKRA_API_KEY" \
  -F "file=@report.pdf" | jq -r '.documentId')

# Query
curl -X POST https://api.okrapdf.com/document/$DOC_ID/chat/completions \
  -H "Authorization: Bearer $OKRA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"What was net income?"}]}'

Two HTTP calls. One subprocessor. Your agent doesn't need to know anything about PDFs — it just calls an endpoint.
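The same flow in application code, as a hedged fetch sketch: the endpoint paths mirror the curl commands above, but beyond the `documentId` field visible in the jq filter, request and response details are assumptions.

```typescript
// Sketch of the upload + query flow. Paths mirror the curl example
// above; beyond that, request/response details are assumptions.
const API = "https://api.okrapdf.com";

// POST the PDF as multipart form data, like the curl -F upload.
function uploadRequest(apiKey: string, file: Blob, name: string): Request {
  const form = new FormData();
  form.append("file", file, name);
  return new Request(`${API}/v1/documents`, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
}

// POST a chat message to the per-document completions endpoint.
function queryRequest(apiKey: string, docId: string, question: string): Request {
  return new Request(`${API}/document/${docId}/chat/completions`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ messages: [{ role: "user", content: question }] }),
  });
}

// Usage (network calls, not run here):
// const { documentId } = await fetch(uploadRequest(key, pdf, "report.pdf")).then(r => r.json());
// const answer = await fetch(queryRequest(key, documentId, "What was net income?"));
```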


The surprise: colocation made it faster

I moved the pipeline into an edge runtime and colocated storage, parsing, rendering, and inference coordination into one place. The goal was privacy — fewer services touching sensitive documents.

The surprise was performance. Because core logic runs through bindings instead of network hops between services, everything got noticeably faster. Page image rendering happens in the same runtime as the source PDF. Chat completions read from a colocated database, not a remote Postgres.

Colocate more = better isolation + better performance. That's the architectural bet.


Per-document control

Real-world PDFs are messy. A single global config doesn't work when one file is a clean digital report and the next is a scanned 1990s filing.

With OkraPDF, config is per-document:

  • Parsing strategy — choose based on PDF complexity, change it later without re-uploading
  • Vendor selection — pick the AI vendor per document. Need a BAA vendor for medical records? Use it for those docs only. That's the only vendor that sees those bytes.
  • Per-document chat — query any document directly without app-level routing
  • Edge previews — page images rendered in the same binding as the source PDF
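The per-document knobs above suggest a config shape roughly like the following. This is a hypothetical sketch: the field names and strategy values are mine, not the actual OkraPDF schema.

```typescript
// Hypothetical per-document config — field names and values are
// illustrative, not the actual OkraPDF schema.
type DocumentConfig = {
  parsingStrategy: "fast" | "layout" | "ocr"; // chosen per PDF complexity
  vendor: string;                             // AI vendor for this doc only
};

// Example: a scanned medical record routed to a BAA-covered vendor,
// while a clean digital report stays on the default vendor.
const medicalRecord: DocumentConfig = { parsingStrategy: "ocr", vendor: "baa-vendor" };
const cleanReport: DocumentConfig = { parsingStrategy: "fast", vendor: "default" };

console.log(medicalRecord.vendor); // baa-vendor
```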

And because everything is colocated under one document ID, deletion is atomic:

DELETE /document/{docId}

PDF, derived markdown, preview images, chat history — gone in one call. No orphaned artifacts across six services.
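The single-call cleanup, next to what fan-out deletion looks like. A sketch, assuming only the `DELETE /document/{docId}` route shown above; the commented-out multi-service calls are invented for contrast.

```typescript
// One DELETE removes the document and everything derived from it.
function deleteRequest(apiKey: string, docId: string): Request {
  return new Request(`https://api.okrapdf.com/document/${docId}`, {
    method: "DELETE",
    headers: { Authorization: `Bearer ${apiKey}` },
  });
}

// Versus the scattered pipeline, where every subprocessor needs its own
// cleanup call and its own ID mapping (illustrative names):
// await blobStore.delete(blobKey);
// await parserService.delete(parseJobId);
// ...four more, each keyed by a different identifier.
```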


Real numbers

We ran the full FinanceBench evaluation — 129 questions across 10 SEC filings:

Metric              Value
Pass rate           86.8% (112/129)
Cost per question   $0.009
Total eval cost     $1.15

Questions range from metric extraction ("What is Amazon's FY2019 net income?") to multi-step reasoning ("What is AMD's quick ratio and what does it imply about their liquidity?"). Sub-cent per question across dense 200-page filings.
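The headline numbers are internally consistent; a quick arithmetic check using only the values from the table above:

```typescript
// Sanity-check the FinanceBench numbers from the table above.
const passed = 112;
const total = 129;
const totalCost = 1.15; // USD

const passRate = passed / total;           // ≈ 0.868
const costPerQuestion = totalCost / total; // ≈ $0.0089

console.log((passRate * 100).toFixed(1)); // 86.8
console.log(costPerQuestion.toFixed(3));  // 0.009
```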

Live demo: okrapdf.com/demo/financebench


Use your existing SDK

You don't need a new client. Each document exposes a standard OpenAI-compatible endpoint.

Vercel AI SDK:

import { createOkra } from "@okrapdf/ai-sdk";
import { streamText } from "ai";

const okra = createOkra({ apiKey: process.env.OKRA_API_KEY });

const result = streamText({
  model: okra("doc-abc123"),
  messages: [{ role: "user", content: "What is the revenue?" }],
});

OpenAI SDK:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OKRA_API_KEY,
  baseURL: "https://api.okrapdf.com/v1/documents/doc-abc123",
});

const completion = await client.chat.completions.create({
  model: "default",
  messages: [{ role: "user", content: "What was net income?" }],
});

Works with LangChain, CrewAI, or any agent framework that speaks the OpenAI protocol. Your agent calls it like any other model — because it is one.


Try it

npm install okrapdf
npx okra upload report.pdf
npx okra chat "What was net income in FY2023?"
