From unstructured input to validated JSON
Point ParseHawk at a document, describe the shape you want back, and get clean, schema-checked data — without training a model.
Any document in
PDFs, scans, images, plain text, and Markdown all become structured JSON.
Your own schemas
Describe exactly the fields you want back with JSON Schema Draft 2020-12.
Zero-shot or few-shot
Start with instructions and a schema, then add examples when a type needs guidance.
Validated output
Every result is checked against your schema and stored as the canonical job.result.data.
Runs on your hardware
vLLM on Linux with NVIDIA GPUs and vLLM Metal on Apple Silicon, whether that is a server or your MacBook.
Private by default
Files, jobs, extractors, and results stay local. Nothing leaves your machine.
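To make the schema and validation points above concrete, here is a minimal sketch of what a Draft 2020-12 schema for an invoice might look like, along with a toy check of a result against it. The field names (`invoice_number`, `total`, `currency`) and the `check` helper are hypothetical and chosen for illustration; this is not ParseHawk's validator, and it handles only the `type`, `required`, and `properties` keywords.

```python
# Hypothetical invoice schema; the field names are illustrative, not ParseHawk's.
INVOICE_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
        "currency": {"type": "string"},
    },
    "required": ["invoice_number", "total"],
}

# Maps JSON Schema type names to Python types for the small subset handled here.
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "null": type(None),
}


def check(data, schema, path="$"):
    """Return a list of error strings; handles only type/required/properties."""
    errors = []
    expected = schema.get("type")
    if expected is not None:
        py_type = _TYPES[expected]
        # bool is a subclass of int in Python, so reject it for numeric types.
        bool_as_number = isinstance(data, bool) and expected in ("number", "integer")
        if not isinstance(data, py_type) or bool_as_number:
            errors.append(f"{path}: expected {expected}")
            return errors
    if isinstance(data, dict):
        for name in schema.get("required", []):
            if name not in data:
                errors.append(f"{path}: missing required field '{name}'")
        for name, sub in schema.get("properties", {}).items():
            if name in data:
                errors.extend(check(data[name], sub, f"{path}.{name}"))
    return errors


# Stand-ins for extracted results; a real job would produce these values.
good = {"invoice_number": "INV-001", "total": 129.5, "currency": "EUR"}
bad = {"invoice_number": 42}

print(check(good, INVOICE_SCHEMA))
print(check(bad, INVOICE_SCHEMA))
```

A production setup would more likely rely on a full JSON Schema implementation rather than a hand-rolled subset like this one; the sketch only shows the idea of rejecting output that does not match the declared shape.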
For teams working with documents that must stay private
ParseHawk is built for developers and teams handling sensitive files — the kind of data that simply should not leave your own infrastructure.
Developers
Wire extraction into apps, services, and agents through one local REST API.
Teams & ops
Stand up shared local extraction for invoices, receipts, and back-office files.
Regulated work
Keep medical, legal, and financial documents on infrastructure you fully control.
Common inputs
Invoices
Receipts
Contracts
Internal docs
Customer files
Medical records
Financial records
One local API, three ways to drive it
ParseHawk exposes a single local REST API. The CLI and web UI are clients of that same API — reach for whichever fits the job.
01
REST API
Programmatic extraction for apps, services, and agents.
02
CLI
Run the local stack and one-shot extractions from your shell.
03
Web UI
Upload, pick an extractor, run, and inspect the result.
Run document AI on your own hardware.
Clone the repo, install the CLI, and start extracting locally in minutes.
Built with open source