RAG Starter Kit — Production-Grade AI Document Chat Platform
A full-stack, production-ready Retrieval-Augmented Generation platform built on Next.js 15 App Router that lets teams ship an AI-powered document chatbot with hybrid retrieval, enterprise authentication, and real-time collaboration — using 100% free-tier AI providers by default. The architecture spans a 5-stage document ingestion pipeline, pgvector hybrid search with Reciprocal Rank Fusion, SAML 2.0 SSO, Inngest background jobs, and full OpenTelemetry tracing, making it equally usable as a weekend side project or a commercial SaaS foundation.
## Overview
Most RAG tutorials produce a toy: a single-file script that embeds a PDF and queries it with `gpt-4`. **RAG Starter Kit** takes the opposite approach — it is the production infrastructure you would build if you were launching a commercial AI document chat SaaS. It solves the bootstrapping gap by providing enterprise auth, real-time collaboration, background processing, observability, and a state-of-the-art hybrid retrieval pipeline, all configured to run on free-tier services by default.
---
## Architecture
```
Presentation → Next.js 15 RSC + React 19 + Tailwind CSS 4 + shadcn/ui
API → Next.js Route Handlers (RESTful, Zod-validated, rate-limited)
AI / RAG → Vercel AI SDK + LangChain.js + custom hybrid retrieval pipeline
Storage → PostgreSQL 16 + pgvector (Prisma 7) + Redis (Upstash)
Files → Cloudinary (upload, transformation, media library)
Background → Inngest (event-driven job queue, DLQ, exponential backoff)
Real-time → Ably (WebSocket / SSE multi-user collaboration)
Auth → NextAuth v5 + SAML 2.0 (samlify) + TOTP (otpauth)
Observability → OpenTelemetry + Pino logging + Plausible + PostHog
```
The `src/lib/` directory spans **44 subdirectories** — ai, rag, auth, billing, cache, collaboration, compliance, db, eval, experiments, export, i18n, inngest, monitoring, multimodal, notifications, offline, performance, plugins, pwa, realtime, resilience, security, tracing, webhooks, white-label, and more — covering the full surface area of a production platform.
---
## Key Technical Achievements
### Hybrid Retrieval with Reciprocal Rank Fusion
The retrieval engine (`lib/rag/retrieval/hybrid.ts`) runs vector cosine similarity search and BM25-style keyword search **in parallel** via `Promise.all`. Results are merged using the standard Reciprocal Rank Fusion formula:
```
score = Σ 1 / (k + rank) where k = 60
```
Post-fusion, a Jaccard similarity check (`|intersection| / |union|` of token sets, threshold 0.9) deduplicates near-identical chunks before they reach the LLM context window. This approaches the retrieval quality of dedicated search systems without running any separate search infrastructure.
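The fusion-and-dedup step can be sketched as follows. This is a minimal illustration of the technique described above, not the project's actual code; the `Ranked` shape and function names are assumptions.

```typescript
type Ranked = { id: string; text: string };

// Reciprocal Rank Fusion: each document's score is the sum, over every
// result list it appears in, of 1 / (k + rank), with k = 60 by default.
function rrfMerge(lists: Ranked[][], k = 60): Ranked[] {
  const scores = new Map<string, { doc: Ranked; score: number }>();
  for (const list of lists) {
    list.forEach((doc, i) => {
      const rank = i + 1; // 1-based rank within its own list
      const entry = scores.get(doc.id) ?? { doc, score: 0 };
      entry.score += 1 / (k + rank);
      scores.set(doc.id, entry);
    });
  }
  return [...scores.values()]
    .sort((a, b) => b.score - a.score)
    .map((e) => e.doc);
}

// Jaccard similarity over whitespace token sets.
function jaccard(a: string, b: string): number {
  const ta = new Set(a.toLowerCase().split(/\s+/));
  const tb = new Set(b.toLowerCase().split(/\s+/));
  const inter = [...ta].filter((t) => tb.has(t)).length;
  return inter / (ta.size + tb.size - inter);
}

// Drop any chunk whose token set is >= 90% identical to one already kept.
function dedupe(docs: Ranked[], threshold = 0.9): Ranked[] {
  const kept: Ranked[] = [];
  for (const doc of docs) {
    if (!kept.some((k) => jaccard(k.text, doc.text) >= threshold)) {
      kept.push(doc);
    }
  }
  return kept;
}
```

A document that appears in both the vector and keyword result lists accumulates score from each, so cross-retriever agreement naturally floats to the top.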
### Document Ingestion Pipeline
The 5-stage pipeline (`lib/rag/ingestion/pipeline.ts`, 1,386 lines) handles:
1. **Validate** — Magic-bytes binary signature verification for 12+ formats (PDF `%PDF-`, DOCX ZIP `PK`, PNG `‰PNG`, JPEG `ÿØÿ`, WebP, TIFF, GIF, BMP). A file claiming to be a PDF but containing an EXE header is rejected before it touches the filesystem.
2. **Scan** — ClamAV virus scan integration for uploaded files.
3. **Parse** — Type-specific parsers with automatic Tesseract.js OCR fallback for scanned PDFs that contain no extractable text layer.
4. **Chunk** — Workspace-selectable chunking strategies (fixed / semantic / hierarchical / late) fetched from the database per-workspace at ingestion time.
5. **Embed & Store** — Google Gemini embeddings in batches of 100, inserted into pgvector via Prisma `$executeRaw` parameterized queries, which prevents SQL injection while working around the ORM's lack of a native vector type.
Non-recoverable failures go to a dead-letter queue; transient failures retry with exponential backoff.
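Stage 1 can be sketched roughly as below. The signature table and function name are illustrative assumptions (the real pipeline covers 12+ formats and further checks), but the byte values are the standard file signatures.

```typescript
// Leading byte signatures for a few of the supported formats.
const SIGNATURES: Record<string, number[]> = {
  pdf: [0x25, 0x50, 0x44, 0x46, 0x2d], // "%PDF-"
  docx: [0x50, 0x4b, 0x03, 0x04],      // ZIP local-file header, "PK\x03\x04"
  png: [0x89, 0x50, 0x4e, 0x47],       // "\x89PNG"
  jpeg: [0xff, 0xd8, 0xff],            // "\xFF\xD8\xFF"
};

// Returns true only if the buffer really begins with the signature for the
// claimed type. A renamed EXE (which starts with the "MZ" header) claiming
// to be a PDF fails here, before any parser or the filesystem sees it.
function matchesSignature(buf: Uint8Array, claimedType: string): boolean {
  const sig = SIGNATURES[claimedType];
  if (!sig) return false;
  return sig.every((byte, i) => buf[i] === byte);
}
```

Because the check reads only the first few bytes, it runs in constant time per upload regardless of file size.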
### Enterprise Authentication
Three auth paths coexist: NextAuth v5 (credential + OAuth), SAML 2.0 via `samlify` for Okta/Azure AD SSO, and TOTP via `otpauth`. Row-level workspace isolation ensures users can only query their own documents regardless of auth method.
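One way the workspace isolation described above could be enforced is to derive every database filter from the session's membership list, never from client input alone. This is a hypothetical sketch; the session shape and model field names are assumptions, not the project's actual Prisma schema.

```typescript
type Session = { userId: string; workspaceIds: string[] };

// Builds the scoping filter for document queries. Membership is checked
// server-side against the session, so a forged workspaceId in the request
// body is rejected regardless of which auth path issued the session.
function scopedDocumentFilter(
  session: Session,
  requestedWorkspaceId: string,
): { workspaceId: string } {
  if (!session.workspaceIds.includes(requestedWorkspaceId)) {
    throw new Error("Forbidden: not a member of this workspace");
  }
  // Shape matches a Prisma `where` clause for a hypothetical Document model.
  return { workspaceId: requestedWorkspaceId };
}
```

Centralizing the filter in one helper means SAML, OAuth, and credential sessions all flow through the same isolation check.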
### Streaming RAG Response
The RAG engine exposes both `generateRAGResponse` (batch) and `streamRAGResponse` (async generator). The streaming path uses Vercel AI SDK's `streamText` to push tokens to the client as they arrive, with source citation markers (`[1]`, `[2]`) injected into the system prompt from retrieved chunk metadata.
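The citation-marker injection can be illustrated with a small prompt builder: retrieved chunks are numbered in retrieval order and inlined into the system prompt so the model can cite them as `[1]`, `[2]`. The `Chunk` shape and function name here are assumptions for illustration, not the engine's real API.

```typescript
type Chunk = { source: string; text: string };

// Numbers each retrieved chunk and embeds it in the system prompt, along
// with an instruction to cite by bracketed index.
function buildSystemPrompt(chunks: Chunk[]): string {
  const context = chunks
    .map((c, i) => `[${i + 1}] (${c.source})\n${c.text}`)
    .join("\n\n");
  return [
    "Answer using only the context below.",
    "Cite sources inline with bracketed numbers like [1].",
    "",
    context,
  ].join("\n");
}
```

In the streaming path, a prompt built this way would be passed as the `system` message to `streamText`, and the async generator simply forwards tokens while the client resolves `[n]` markers back to chunk metadata.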
---
## Tech Stack
| Layer | Technologies |
|---|---|
| Framework | Next.js 15 · React 19 · TypeScript |
| Database | PostgreSQL · pgvector · Prisma 7 · Redis |
| AI | Vercel AI SDK · LangChain.js · OpenAI · Google Gemini · Ollama |
| Auth | NextAuth v5 · SAML 2.0 · TOTP |
| Background | Inngest · Cloudinary · Tesseract.js |
| Real-time | Ably WebSockets |
| Payments | Stripe |
| Observability | OpenTelemetry · Pino · Plausible · PostHog |
| UI | Tailwind CSS 4 · shadcn/ui · Framer Motion |
| Testing | Vitest · Playwright · Artillery · k6 |
---
## What This Demonstrates
This project demonstrates the ability to architect and implement a complete AI product backend: not just LLM API calls, but the retrieval quality engineering (RRF), the security engineering (magic-bytes validation, parameterized SQL), the operational engineering (OTel tracing, DLQ, backoff), and the enterprise go-to-market requirements (SAML SSO, RBAC, billing) that separate production systems from demos.