In Development

Quarry

AI Knowledge Infrastructure Platform

Production AI systems platform evolving from retrieval and RAG into agents, memory, evaluation, observability, and inference infrastructure.

Problem

The problem

Most RAG systems never get past the demo stage. They break under real workloads: irrelevant chunks, hallucinated citations, no observability, no evaluation, and no path to agents. Quarry is built as a production knowledge infrastructure layer, the substrate teams need to move AI from prototype to product.

System Design

How it's built

Architecture

Retrieval-Augmented Generation Pipeline

01 Document Ingest: PDF, HTML, Markdown
02 Chunking: Semantic + overlap
03 Embeddings: text-embedding-3
04 Vector Store: pgvector / Postgres
05 Retrieval: Hybrid + re-rank
06 LLM Orchestration: LangGraph
07 Response: Streaming + citations
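
To make the chunking stage of the pipeline above concrete, here is a minimal sketch of sentence-based chunking with overlap. It is an illustration under stated assumptions, not Quarry's actual chunker: the function names, the regex sentence splitter, and the `max_words` / `overlap` parameters are all hypothetical.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter (an assumption): break on whitespace that follows ., ! or ?
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_sentences(sentences: list[str], max_words: int = 50, overlap: int = 1) -> list[str]:
    """Pack whole sentences into chunks of roughly max_words words.

    The last `overlap` sentences of each chunk are repeated at the start of
    the next one, so context is not cut off at a chunk boundary.
    A sentence is never split, so a single long sentence may exceed max_words.
    """
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            # Carry the tail of the finished chunk into the next one.
            current = current[-overlap:] if overlap > 0 else []
            count = sum(len(s.split()) for s in current)
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Splitting on sentence boundaries rather than fixed character offsets is one simple way to keep semantic units intact; a production chunker would likely also need format-aware handling for PDF, HTML, and Markdown.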
Tech Stack

What powers it

FastAPI, PostgreSQL, Redis, Docker, RAG, Embeddings, pgvector, LangGraph
Challenges

What was hard

  • Chunking strategies that preserve semantic units across document types
  • Hybrid retrieval (dense + BM25) with a re-ranker to fight relevance drift
  • Streaming responses with grounded citations and safe fallbacks
  • Multi-tenant isolation and rate limiting at the API boundary
  • An evaluation harness that scores both retrieval quality and end-to-end answers
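
As an illustration of the hybrid retrieval item above, the sketch below merges a dense ranking and a BM25 ranking with reciprocal rank fusion (RRF). RRF is an assumption chosen for the example; the source does not say which fusion method Quarry uses, and the re-ranker step is not shown. The function name and the constant `k=60` are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in;
    documents ranked well by several retrievers rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense_hits = ["a", "b", "c"]   # hypothetical ids from the vector search
bm25_hits = ["b", "d", "a"]    # hypothetical ids from the keyword search
fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
```

Because RRF works on ranks rather than raw scores, it avoids having to normalize cosine distances against BM25 scores, which live on different scales.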
Design Decisions

Why it's built this way

Postgres + pgvector over a dedicated vector DB

One system to operate, transactional guarantees, and mature tooling. Vector DBs can be added later if scale demands it.
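
A rough sketch of what a tenant-scoped similarity query against pgvector could look like. The table name `chunks` and the columns `id`, `content`, `embedding`, and `tenant_id` are hypothetical; the `<=>` operator is pgvector's cosine distance operator. The function only builds the query and parameters, so it runs without a database.

```python
def build_search_query(tenant_id: str, embedding: list[float], k: int = 5):
    """Return (sql, params) for a cosine-distance search scoped to one tenant.

    The schema is assumed for illustration. Values are passed as parameters
    rather than interpolated into the SQL string.
    """
    # pgvector accepts vectors as a text literal such as '[0.1,0.2]'.
    vector_literal = "[" + ",".join(str(float(x)) for x in embedding) + "]"
    sql = (
        "SELECT id, content, embedding <=> %s::vector AS distance "
        "FROM chunks "
        "WHERE tenant_id = %s "
        "ORDER BY distance "
        "LIMIT %s"
    )
    return sql, (vector_literal, tenant_id, k)
```

Keeping vectors next to relational data means the tenant filter and the similarity search are a single SQL statement, which is part of the "one system to operate" argument.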

FastAPI + async everywhere

Non-blocking IO for embedding and LLM calls; simple to reason about; easy to instrument.
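
A minimal sketch of the non-blocking pattern, assuming a stand-in `fake_embed` coroutine in place of a real embedding API call. The names `fake_embed`, `embed_all`, and `max_concurrency` are hypothetical; only the standard library `asyncio` is used.

```python
import asyncio


async def fake_embed(text: str) -> list[float]:
    # Stand-in for a network call to an embedding provider (an assumption).
    await asyncio.sleep(0.01)
    return [float(len(text)), float(text.count(" ") + 1)]


async def embed_all(texts: list[str], embed=fake_embed, max_concurrency: int = 4):
    """Embed many texts concurrently, with a cap on in-flight requests."""
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(text: str):
        async with sem:  # at most max_concurrency calls run at once
            return await embed(text)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(t) for t in texts))


vectors = asyncio.run(embed_all(["hello world", "hi"]))
```

The semaphore keeps a burst of documents from exceeding a provider's rate limit while still overlapping the network waits.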

LangGraph for agent orchestration

An explicit state machine instead of ad-hoc chains, which makes reliability, retries, and observability first-class.
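
To show the idea without depending on LangGraph's API, here is a plain-Python sketch of an explicit state machine with per-node retries and a trace. It is not LangGraph code; `State`, `run_graph`, and the node names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class State:
    question: str
    docs: list = field(default_factory=list)
    answer: str = ""
    trace: list = field(default_factory=list)  # (node, attempt, outcome) tuples


# A node mutates the state and returns the next node's name, or None to stop.
Node = Callable[[State], Optional[str]]


def run_graph(nodes: dict, start: str, state: State, max_retries: int = 2) -> State:
    current: Optional[str] = start
    while current is not None:
        for attempt in range(max_retries + 1):
            try:
                next_node = nodes[current](state)
                state.trace.append((current, attempt, "ok"))
                break
            except Exception as exc:
                state.trace.append((current, attempt, f"error: {exc}"))
                if attempt == max_retries:
                    raise  # retries exhausted: surface the failure
        current = next_node
    return state
```

Because every transition passes through `run_graph`, retries and tracing live in one place instead of being scattered across chained calls.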

Redis for hot paths

Semantic cache, rate limits, and short-term memory sit in Redis to keep latency low.
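
As a sketch of the rate-limit part, the class below implements a fixed-window counter per tenant. An in-memory dict stands in for Redis here (an assumption for the sake of a runnable example); in Redis the same logic is commonly expressed with `INCR` on a per-window key plus `EXPIRE`. The class name and parameters are hypothetical.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per tenant in each time window."""

    def __init__(self, limit: int, window_seconds: int, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so tests can control time
        # (tenant_id, window_index) -> request count; stands in for Redis keys.
        # Old windows are never evicted here; Redis key expiry would handle that.
        self.counts: dict = {}

    def allow(self, tenant_id: str) -> bool:
        window_index = int(self.clock() // self.window)
        key = (tenant_id, window_index)
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key] <= self.limit
```

A fixed window is the simplest option and can admit short bursts across a window boundary; a sliding window or token bucket would smooth that out at the cost of more state.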

Docker-first deployment

Reproducible environments across dev, CI, and prod; friction-free onboarding for contributors.

Lessons Learned

What I'd tell my past self

  • Retrieval quality, not model choice, dominates end-to-end answer quality in most cases.
  • Every RAG system needs an eval harness on day one, not day one hundred.
  • Observability (traces, spans, token counts, retrieval hits) is worth building before you scale.
  • The interesting engineering is at the boundaries: chunking, re-ranking, and orchestration.
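
A day-one eval harness can start very small. The sketch below scores only the retrieval half, using recall@k and mean reciprocal rank over a labeled set of cases. The case format (`retrieved`, `relevant`) and function names are assumptions for illustration, not Quarry's harness; scoring full answers would need additional checks.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(cases: list[dict], k: int = 3) -> dict:
    """Average both metrics over all cases."""
    if not cases:
        return {"recall@k": 0.0, "mrr": 0.0}
    n = len(cases)
    return {
        "recall@k": sum(recall_at_k(c["retrieved"], c["relevant"], k) for c in cases) / n,
        "mrr": sum(reciprocal_rank(c["retrieved"], c["relevant"]) for c in cases) / n,
    }
```

Running this on every change to chunking or retrieval turns "did it get better?" into a number that can be tracked over time.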