# RAG Starter Kit: Production-Grade AI Document Chat Platform

A full-stack, production-ready Retrieval-Augmented Generation platform built on Next.js 15 App Router that lets teams ship an AI-powered document chatbot with hybrid retrieval, enterprise authentication, and real-time collaboration — using 100% free-tier AI providers by default. The architecture spans a 5-stage document ingestion pipeline, pgvector hybrid search with Reciprocal Rank Fusion, SAML 2.0 SSO, Inngest background jobs, and full OpenTelemetry tracing, making it equally usable as a weekend side project or a commercial SaaS foundation.

## Overview

Most RAG tutorials produce a toy: a single-file script that embeds a PDF and queries it with `gpt-4`. **RAG Starter Kit** takes the opposite approach: it is the production infrastructure you would build if you were launching a commercial AI document chat SaaS. It solves the bootstrapping gap by providing enterprise auth, real-time collaboration, background processing, observability, and a state-of-the-art hybrid retrieval pipeline, all configured to run on free-tier services by default.

---

## Architecture

```
Presentation   → Next.js 15 RSC + React 19 + Tailwind CSS 4 + shadcn/ui
API            → Next.js Route Handlers (RESTful, Zod-validated, rate-limited)
AI / RAG       → Vercel AI SDK + LangChain.js + custom hybrid retrieval pipeline
Storage        → PostgreSQL 16 + pgvector (Prisma 7) + Redis (Upstash)
Files          → Cloudinary (upload, transformation, media library)
Background     → Inngest (event-driven job queue, DLQ, exponential backoff)
Real-time      → Ably (WebSocket / SSE multi-user collaboration)
Auth           → NextAuth v5 + SAML 2.0 (samlify) + TOTP (otpauth)
Observability  → OpenTelemetry + Pino logging + Plausible + PostHog
```

The `src/lib/` directory spans **44 subdirectories** (ai, rag, auth, billing, cache, collaboration, compliance, db, eval, experiments, export, i18n, inngest, monitoring, multimodal, notifications, offline, performance, plugins, pwa, realtime, resilience, security, tracing, webhooks, white-label, and more), covering the full surface area of a production platform.

---

## Key Technical Achievements

### Hybrid Retrieval with Reciprocal Rank Fusion

The retrieval engine (`lib/rag/retrieval/hybrid.ts`) runs vector cosine similarity search and BM25-style keyword search **in parallel** via `Promise.all`.
Results are merged using the standard Reciprocal Rank Fusion formula:

```
score = Σ 1 / (k + rank)    where k = 60
```

The sum runs over the two ranked lists (vector and keyword), and `rank` is the chunk's position in each list. Post-fusion, a Jaccard similarity check (`|intersection| / |union|` of token sets, threshold 0.9) deduplicates near-identical chunks before they reach the LLM context window. This matches the retrieval quality of production search systems without requiring dedicated search infrastructure.

### Document Ingestion Pipeline

The 5-stage pipeline (`lib/rag/ingestion/pipeline.ts`, 1,386 lines) handles:

1. **Validate**: Magic-bytes binary signature verification for 12+ formats (PDF `%PDF-`, DOCX ZIP `PK`, PNG `\x89PNG`, JPEG `\xFF\xD8\xFF`, WebP, TIFF, GIF, BMP). A file claiming to be a PDF but containing an EXE header is rejected before it touches the filesystem.
2. **Scan**: ClamAV virus scan integration for uploaded files.
3. **Parse**: Type-specific parsers with automatic Tesseract.js OCR fallback for scanned PDFs that contain no extractable text layer.
4. **Chunk**: Workspace-selectable chunking strategies (fixed / semantic / hierarchical / late), fetched from the database per workspace at ingestion time.
5. **Embed & Store**: Google Gemini embeddings in batches of 100, inserted directly into pgvector via Prisma `$executeRaw` parameterized queries, which prevents SQL injection while working around the ORM's lack of native vector type support.

Non-recoverable failures go to a dead-letter queue; transient failures retry with exponential backoff.

### Enterprise Authentication

Three auth paths coexist: NextAuth v5 (credential + OAuth), SAML 2.0 via `samlify` for Okta/Azure AD SSO, and TOTP via `otpauth`. Row-level workspace isolation ensures users can only query their own documents regardless of auth method.

### Streaming RAG Response

The RAG engine exposes both `generateRAGResponse` (batch) and `streamRAGResponse` (async generator).
The streaming path uses Vercel AI SDK's `streamText` to push tokens to the client as they arrive, with source citation markers (`[1]`, `[2]`) injected into the system prompt from retrieved chunk metadata.

---

## Tech Stack

| Layer | Technologies |
|---|---|
| Framework | Next.js 15 · React 19 · TypeScript |
| Database | PostgreSQL · pgvector · Prisma 7 · Redis |
| AI | Vercel AI SDK · LangChain.js · OpenAI · Google Gemini · Ollama |
| Auth | NextAuth v5 · SAML 2.0 · TOTP |
| Background | Inngest · Cloudinary · Tesseract.js |
| Real-time | Ably WebSockets |
| Payments | Stripe |
| Observability | OpenTelemetry · Pino · Plausible · PostHog |
| UI | Tailwind CSS 4 · shadcn/ui · Framer Motion |
| Testing | Vitest · Playwright · Artillery · k6 |

---

## What This Demonstrates

This project demonstrates the ability to architect and implement a complete AI product backend: not just LLM API calls, but the retrieval quality engineering (RRF), the security engineering (magic-bytes validation, parameterized SQL), the operational engineering (OTel tracing, DLQ, backoff), and the enterprise go-to-market requirements (SAML SSO, RBAC, billing) that separate production systems from demos.
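
To make the fusion and deduplication steps from the Hybrid Retrieval section concrete, here is a minimal TypeScript sketch. It is not the project's `hybrid.ts`: the `Chunk` shape, the function names, the simple lowercase tokenizer, the 1-based rank, and the `>=` comparison against the 0.9 threshold are all assumptions made for illustration.

```typescript
// Hypothetical shape for a retrieved chunk; the project's real types may differ.
interface Chunk {
  id: string;
  text: string;
}

const RRF_K = 60; // constant k from the RRF formula

// Fuse ranked lists: a chunk earns 1 / (k + rank) for each list it appears in
// (rank assumed 1-based), and the contributions are summed.
function reciprocalRankFusion(lists: Chunk[][], k: number = RRF_K): Chunk[] {
  const scores = new Map<string, { chunk: Chunk; score: number }>();
  for (const list of lists) {
    list.forEach((chunk, index) => {
      const entry = scores.get(chunk.id) ?? { chunk, score: 0 };
      entry.score += 1 / (k + index + 1);
      scores.set(chunk.id, entry);
    });
  }
  return Array.from(scores.values())
    .sort((a, b) => b.score - a.score)
    .map((entry) => entry.chunk);
}

// Deliberately simple tokenizer (assumption): lowercase alphanumeric runs.
function tokenSet(text: string): Set<string> {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Jaccard similarity: |intersection| / |union| of two token sets.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let intersection = 0;
  a.forEach((token) => {
    if (b.has(token)) intersection++;
  });
  return intersection / (a.size + b.size - intersection);
}

// Walk the fused list in rank order and drop any chunk that is a
// near-duplicate of a higher-ranked chunk that was already kept.
function dedupe(chunks: Chunk[], threshold: number = 0.9): Chunk[] {
  const kept: { chunk: Chunk; tokens: Set<string> }[] = [];
  for (const chunk of chunks) {
    const tokens = tokenSet(chunk.text);
    const isDuplicate = kept.some(
      (entry) => jaccard(entry.tokens, tokens) >= threshold
    );
    if (!isDuplicate) kept.push({ chunk, tokens });
  }
  return kept.map((entry) => entry.chunk);
}
```

Calling `dedupe(reciprocalRankFusion([vectorResults, keywordResults]))` would yield the final context list, where `vectorResults` and `keywordResults` stand in for the outputs of the two parallel searches. Because the score depends only on rank, the two result lists can be fused without normalizing cosine distances against keyword scores.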