R&D

The Foundry

Where we build what doesn't exist yet.

The Foundry is BespokeWorks' research and development arm, led by Theo Coleman. We build AI systems from first principles, test them against the hardest public benchmarks, and ship what works directly into client products.

The work here isn't theoretical. Every technique we develop in The Foundry becomes part of the systems we deploy for businesses.

Flagship

Benchmark-Breaking Retrieval AI

We built a question-answering system for complex questions that require connecting information across multiple documents. On that task, it finds the right answer more reliably than any other publicly documented approach.

On HotpotQA — a standard academic benchmark for multi-hop reasoning used across research labs and industry — our system achieved an F1 score of 86.8%.

HotpotQA F1 score
BespokeWorks Foundry (ours): 86.8%
StepChain GraphRAG (published SOTA): 79.5%
Standard RAG Pipeline: 72.0%
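
For readers unfamiliar with the metric, here is a minimal sketch of how a token-overlap F1 score is typically computed for question-answering benchmarks. It follows the SQuAD-style convention that HotpotQA's answer metric builds on. The function names `normalize` and `token_f1` are illustrative assumptions, not part of our system, and the official HotpotQA evaluation script includes extra special cases (for example yes/no answers) that are omitted here.

```python
import re
import string
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall for one question."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(token_f1("the Eiffel Tower", "Eiffel Tower"))
    print(token_f1("Paris, France", "Paris"))
    print(token_f1("London", "Paris"))
```

Assuming the figures above refer to answer F1, a benchmark score is the average of this per-question value over the whole evaluation set, expressed as a percentage. Partial credit is why F1 is usually higher than exact-match accuracy on the same predictions.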

Why this matters: Unlike most high-scoring systems, ours doesn't require training on the benchmark dataset. Fine-tuned models tend to score higher on the specific data they're trained on, but often generalise poorly beyond it. Our system is training-free, so it works out of the box on other domains: finance documents, medical records, legal contracts. Same architecture, no retraining. That's the difference between a research result and a production system.

When your AI gets the answer wrong, someone has to catch it. Higher accuracy means fewer errors, less human review, and more trust in the system. The 7.3-point gap between 79.5% and 86.8% isn't academic: it's the difference between a system that needs constant supervision and one that works.

From The Foundry

Beating Every Published Benchmark for Multi-Hop QA
Our training-free RAG system achieved an F1 score of 86.8% on HotpotQA, outperforming StepChain GraphRAG (79.5%) and every other published result.
Read the case study →

Generating Full MRI Scans from Partial Data
Using diffusion models to help clinicians work with complete imaging when only fragments are available.
Case study coming soon

A Blog Generator That Passes for Human
V3 multi-agent pipeline with quality gates, author voice system, and measurable anti-AI detection.
Case study coming soon

How Research Becomes Product

Everything we build in The Foundry eventually ships to clients. The pipeline is straightforward.

1. Research: Build new techniques from first principles
2. Benchmark: Test against public academic datasets
3. Harden: Production-grade reliability and speed
4. Deploy: Integrate into client systems

When we build a chatbot or knowledge system for your business, it's running the same architecture we've stress-tested against the hardest public benchmarks. You get research-grade AI without the research timeline.

Talk to us about what we're building

Whether you need a system that answers complex questions, generates content, or automates reasoning over your data — the technology behind it is already built and tested.