In Development

Quarry

AI Knowledge Infrastructure Platform

Production AI systems platform evolving from retrieval and RAG into agents, memory, evaluation, observability, and inference infrastructure.

Problem

The problem

Most RAG systems are demos. They break under real workloads: irrelevant chunks, hallucinated citations, no observability, no evaluation, and no path to agents. Quarry is built as a production knowledge infrastructure layer, the substrate that teams need to move AI from prototype to product.

System Design

How it's built

Architecture

Retrieval-Augmented Generation Pipeline

Document Ingest: PDF, HTML, Markdown
Chunking: Semantic + overlap
Embeddings: text-embedding-3
Vector Store: pgvector / Postgres
Retrieval: Hybrid + re-rank
LLM Orchestration: LangGraph
Response: Streaming + citations
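
The chunking stage of the pipeline can be sketched roughly as follows. This is a minimal illustration, not Quarry's actual chunker: it assumes paragraph boundaries stand in for "semantic" units, and the function name `chunk_text` and its parameters are hypothetical.

```python
def chunk_text(text: str, max_chars: int = 500, overlap: int = 1) -> list[str]:
    """Pack paragraphs into chunks of roughly max_chars, repeating the last
    `overlap` paragraphs of each chunk at the start of the next one."""
    # Assumption: a blank line separates semantic units (paragraphs).
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    for para in paragraphs:
        candidate = current + [para]
        if current and len("\n\n".join(candidate)) > max_chars:
            # Close the current chunk and carry the overlap forward.
            chunks.append("\n\n".join(current))
            current = current[-overlap:] if overlap > 0 else []
            current.append(para)
        else:
            current = candidate
    if current:
        chunks.append("\n\n".join(current))
    return chunks


text = "A" * 10 + "\n\n" + "B" * 10 + "\n\n" + "C" * 10
chunks = chunk_text(text, max_chars=25, overlap=1)
```

In this sketch a paragraph is never split, so a single oversized paragraph (or an overlap plus a new paragraph) can exceed `max_chars`. A real chunker would likely need format-aware splitting for PDF, HTML, and Markdown.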

Tech Stack

What powers it

FastAPI, PostgreSQL, Redis, Docker, RAG, Embeddings, pgvector, LangGraph

Challenges

What was hard

Chunking strategies that preserve semantic units across document types
Hybrid retrieval (dense + BM25) with a re-ranker to fight relevance drift
Streaming responses with grounded citations and safe fallbacks
Multi-tenant isolation and rate limiting at the API boundary
Evaluation harness that can score both retrieval quality and complete answers
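
To make the hybrid retrieval challenge concrete, here is a small sketch of merging a dense ranking with a BM25 ranking before re-ranking. It uses reciprocal rank fusion (RRF), which is one common fusion method; the page does not say which method Quarry uses, so treat the technique, the function name, and the constant `k = 60` as assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so items ranked well by both retrievers rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["a", "b", "c"]   # hypothetical ids from vector search
bm25_hits = ["b", "d", "a"]    # hypothetical ids from keyword search
fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
```

The fused list would then be passed to a re-ranker, which only has to score a short candidate set rather than the whole corpus.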

Design Decisions

Why it's built this way

Postgres + pgvector over a dedicated vector DB

One system to operate, transactional guarantees, and mature tooling. A dedicated vector DB can be added later if scale demands it.
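
As an illustration of what "one system" buys, a similarity search can live in the same SQL query as an ordinary tenant filter. The sketch below only builds the query and its parameters; the `chunks` table, its `tenant_id`, `content`, and `embedding` columns, and the function name are hypothetical. `<=>` is pgvector's cosine distance operator, and the `%s` placeholders assume a psycopg-style driver.

```python
def build_search_query(
    embedding: list[float], tenant_id: str, top_k: int = 5
) -> tuple[str, tuple]:
    """Return (sql, params) for a tenant-scoped nearest-neighbour search."""
    # pgvector accepts a text literal such as '[0.1,0.2]' cast to vector.
    vector_literal = "[" + ",".join(str(x) for x in embedding) + "]"
    sql = (
        "SELECT id, content "
        "FROM chunks "                        # hypothetical table name
        "WHERE tenant_id = %s "               # ordinary relational filter
        "ORDER BY embedding <=> %s::vector "  # cosine distance, nearest first
        "LIMIT %s"
    )
    return sql, (tenant_id, vector_literal, top_k)


sql, params = build_search_query([0.1, 0.2], "acme", top_k=3)
```

Passing values as parameters rather than formatting them into the string keeps the tenant filter safe from injection.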

FastAPI + async everywhere

Non-blocking IO for embedding and LLM calls; simple to reason about; easy to instrument.
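
A minimal sketch of the non-blocking pattern, using only the standard library: many embedding calls run concurrently, with a semaphore capping how many are in flight. `fake_embed` is a hypothetical stand-in for a real network call, and `max_concurrency` is an illustrative parameter.

```python
import asyncio


async def fake_embed(text: str) -> list[float]:
    # Hypothetical stand-in for an HTTP call to an embedding API.
    await asyncio.sleep(0.01)
    return [float(len(text))]


async def embed_all(texts: list[str], max_concurrency: int = 4) -> list[list[float]]:
    """Embed all texts concurrently, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def embed_one(text: str) -> list[float]:
        async with semaphore:
            return await fake_embed(text)

    # gather preserves input order in its results.
    return await asyncio.gather(*(embed_one(t) for t in texts))


vectors = asyncio.run(embed_all(["a", "bb", "ccc"], max_concurrency=2))
```

Inside a FastAPI handler the same `await embed_all(...)` call would yield the event loop to other requests while the calls are pending.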

LangGraph for agent orchestration

An explicit state machine instead of ad-hoc chains makes reliability, retries, and observability first-class.
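
The idea can be shown without any framework. The sketch below is plain Python and deliberately not LangGraph's API: each node is a function that updates shared state and names the next node, and the runner owns retries and tracing. All names (`run_graph`, `retrieve`, `generate`) are hypothetical.

```python
from typing import Callable

State = dict
Node = Callable[[State], str]  # a node mutates state and returns the next node name


def run_graph(nodes: dict[str, Node], start: str, state: State, max_retries: int = 2) -> State:
    """Run nodes until one returns "END", retrying a failing node in place."""
    current = start
    while current != "END":
        for attempt in range(max_retries + 1):
            try:
                next_node = nodes[current](state)
                break
            except Exception as exc:
                # Failures are recorded centrally, not hidden inside a chain.
                state.setdefault("errors", []).append(f"{current}: {exc}")
                if attempt == max_retries:
                    raise
        state.setdefault("trace", []).append(current)
        current = next_node
    return state


def retrieve(state: State) -> str:
    state["docs"] = ["doc-1"]
    return "generate"


def generate(state: State) -> str:
    state["attempts"] = state.get("attempts", 0) + 1
    if state["attempts"] < 2:
        raise RuntimeError("transient LLM error")  # simulated flaky call
    state["answer"] = f"answer grounded in {state['docs'][0]}"
    return "END"


result = run_graph({"retrieve": retrieve, "generate": generate}, "retrieve", {})
```

Because transitions are explicit, the `trace` and `errors` lists fall out of the runner for free, which is the property the design decision is after.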

Redis for hot paths

Semantic cache, rate limits, and short-term memory sit in Redis to keep latency low.
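
As one example of a hot path, a per-tenant rate limit can be a fixed-window counter. The sketch below is an in-memory stand-in so it runs without a server; with Redis, the same logic is typically an `INCR` on a per-window key plus an `EXPIRE`. The class name and parameters are hypothetical, and the page does not say which limiting algorithm Quarry uses.

```python
import time
from typing import Optional


class FixedWindowLimiter:
    """Allow at most `limit` requests per tenant in each fixed time window."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        # In Redis this would be one key per (tenant, window) with a TTL.
        self._counts: dict[tuple[str, int], int] = {}

    def allow(self, tenant_id: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        window = int(now // self.window_seconds)
        key = (tenant_id, window)
        self._counts[key] = self._counts.get(key, 0) + 1  # INCR equivalent
        return self._counts[key] <= self.limit


limiter = FixedWindowLimiter(limit=2, window_seconds=60)
decisions = [limiter.allow("tenant-1", now=t) for t in (0.0, 1.0, 2.0, 61.0)]
```

This in-memory version never evicts old windows; key expiry in Redis handles that cleanup.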

Docker-first deployment

Reproducible environments across dev, CI, and prod; friction-free onboarding for contributors.

Lessons Learned

What I'd tell my past self

Retrieval quality, not model choice, dominates the quality of complete answers in most cases.
Every RAG system needs an eval harness on day one, not day one hundred.
Observability (traces, spans, token counts, retrieval hits) is worth building before you scale.
The interesting engineering is at the boundaries: chunking, re-ranking, and orchestration.
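
A day-one eval harness does not need to be large. The sketch below scores only the retrieval side, using recall@k and mean reciprocal rank over a handful of labeled cases; scoring complete answers would need an additional judge or reference answers. The case format and function names are assumptions for illustration.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant ids that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(cases: list[dict], k: int = 3) -> dict[str, float]:
    """Average both metrics over a list of {"retrieved", "relevant"} cases."""
    if not cases:
        raise ValueError("need at least one eval case")
    n = len(cases)
    return {
        "recall@k": sum(recall_at_k(c["retrieved"], c["relevant"], k) for c in cases) / n,
        "mrr": sum(reciprocal_rank(c["retrieved"], c["relevant"]) for c in cases) / n,
    }


cases = [
    {"retrieved": ["a", "b", "c"], "relevant": {"b"}},
    {"retrieved": ["x", "y", "z"], "relevant": {"z", "q"}},
]
report = evaluate(cases, k=3)
```

Running a report like this on every change to chunking or retrieval gives a regression signal before any model is swapped.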
