Systems we've shipped.
Client names anonymized. Outcomes are real.
Series B SaaS — Legal tech
Contract review agent: 94% accuracy, 80% faster than human review
Built a multi-agent pipeline that ingested contracts via API, extracted key clauses using structured LLM output, flagged risk areas against a custom rubric, and routed edge cases to human reviewers. Shipped with an eval harness of 500 labeled contracts.
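For illustration, the flag-and-route step could look roughly like the sketch below. This is not the shipped code. The rubric entries, the `ClauseFinding` shape, and the 0.8 confidence floor are all hypothetical placeholders, and the extraction step itself (the structured LLM output) is assumed to have already produced the findings.

```python
from dataclasses import dataclass


@dataclass
class ClauseFinding:
    clause_type: str
    text: str
    confidence: float  # extractor's self-reported confidence in [0, 1]


# Hypothetical rubric: clause type -> phrases that signal risk.
RISK_RUBRIC = {
    "indemnification": ["unlimited liability", "sole expense"],
    "termination": ["without notice", "at any time"],
}

CONFIDENCE_FLOOR = 0.8  # assumed threshold, not a value from the project


def flag_risks(finding):
    """Return the rubric phrases that appear in the clause text."""
    phrases = RISK_RUBRIC.get(finding.clause_type, [])
    lowered = finding.text.lower()
    return [p for p in phrases if p in lowered]


def route(finding):
    """Send low-confidence extractions to a human; flag rubric hits."""
    if finding.confidence < CONFIDENCE_FLOOR:
        return "human_review"
    if flag_risks(finding):
        return "flagged"
    return "auto_approved"
```

In this sketch the confidence check runs first, so an uncertain extraction always reaches a human reviewer even when it also matches the rubric.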
Fortune 500 — Financial services
On-prem LLM deployment for internal analyst tooling — zero data leaves the perimeter
Deployed Llama 3.1 70B on-premises behind the client firewall. Built a RAG pipeline over 10 years of internal research documents. Integrated with existing Bloomberg Terminal workflows. Full FedRAMP documentation package included in handoff.
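As a rough picture of the retrieval half of a RAG pipeline, here is a toy sketch. It is an assumption-heavy stand-in: `embed` uses bag-of-words counts in place of a real embedding model, the documents live in a plain list rather than a vector store, and the prompt wording is invented for the example.

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for a real embedding model: bag-of-words term counts.
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, docs, k=2):
    """Assemble a prompt that grounds the model in retrieved context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only the context below.\n{context}\nQuestion: {query}"
```

The prompt returned by `build_prompt` would then be sent to the locally hosted model, so neither the documents nor the query leave the machine running this code.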
Series A — Healthcare AI
Eval framework that reduced clinical AI false positive rate by 34%
Audited an existing clinical NLP system and found systematic bias in the training eval set. Rebuilt the evaluation pipeline with clinician-annotated ground truth. Implemented continuous monitoring that alerts on demographic performance drift.
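A drift monitor of this kind can be sketched in a few lines. The version below is a simplified illustration, not the client system: the record format, the group labels, and the 0.05 tolerance are assumptions, and it tracks only false positive rate per group against a stored baseline.

```python
from collections import defaultdict


def false_positive_rate_by_group(records):
    """records: iterable of (group, predicted_positive, actually_positive)."""
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, predicted, actual in records:
        if not actual:  # only true negatives can yield false positives
            negatives[group] += 1
            if predicted:
                false_positives[group] += 1
    return {g: false_positives[g] / n for g, n in negatives.items() if n}


def drift_alerts(baseline, current, tolerance=0.05):
    """Return groups whose false positive rate rose past the tolerance."""
    alerts = []
    for group, rate in current.items():
        base = baseline.get(group)
        if base is not None and rate - base > tolerance:
            alerts.append(group)
    return sorted(alerts)
```

Run on a schedule over recent predictions, `drift_alerts` returns the groups that need attention; an empty list means no group has drifted beyond the tolerance.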
Growth-stage — Developer tools
Cut embedding pipeline cost by 70% while improving retrieval quality
Replaced a naive full-doc embedding approach with a hybrid chunking strategy and Cohere reranking. Migrated from a managed vector DB to self-hosted Weaviate. Implemented async batch embedding that reduced API costs from $18K/month to $5K/month.
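The async batching idea can be sketched as follows. This is a minimal illustration under stated assumptions: `fake_embed_batch` is a hypothetical stand-in for a real embedding API call, and the batch size and concurrency limit are placeholder values, not the ones used in the project.

```python
import asyncio


def batched(items, size):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


async def fake_embed_batch(texts):
    # Hypothetical stand-in for a real embedding API call.
    await asyncio.sleep(0)
    return [[float(len(t))] for t in texts]


async def embed_all(texts, batch_size=64, max_concurrency=4):
    """Embed texts in batches, with a cap on in-flight requests."""
    sem = asyncio.Semaphore(max_concurrency)

    async def run(batch):
        async with sem:
            return await fake_embed_batch(batch)

    # gather returns results in submission order, so output order matches input.
    results = await asyncio.gather(*(run(b) for b in batched(texts, batch_size)))
    return [vec for batch in results for vec in batch]
```

Calling `asyncio.run(embed_all(texts))` issues one request per batch rather than one per document, and the semaphore keeps the number of concurrent requests bounded.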