ML Architecture
Retrieval-Augmented Generation (RAG) Pipeline
Query embedding, vector retrieval, prompt augmentation and LLM response generation.
Prompt
A Retrieval-Augmented Generation (RAG) pipeline, left-to-right horizontal layout.
Stage 1 - User Query:
- A short text query enters the system.
Stage 2 - Query Embedding:
- The query is encoded by a sentence-embedding model into a dense vector q.
Stage 3 - Vector Retrieval:
- q is matched against a vector database (drawn as a stack of vectors with a "Vector Store" label).
- Top-k nearest neighbors (k=4) are retrieved as context chunks.
Stage 4 - Prompt Construction:
- The original query and the retrieved chunks are concatenated into an augmented prompt template.
Stage 5 - LLM Generation:
- The augmented prompt is fed to an LLM (e.g., GPT-class model) which produces the final grounded response.
Outside the main flow, on top: an offline indexing pipeline showing documents -> chunker -> embedder -> vector store. Connect with a dashed arrow into Stage 3.
Style: clean academic vector, navy and amber palette, white background, sans-serif labels.
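For readers who want to check the figure against the data flow it depicts, the five stages and the offline indexing branch can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the bag-of-words embed function is a toy stand-in for a real sentence-embedding model, the in-memory list stands in for a vector database, the sample documents and the prompt wording are invented for the example, and the LLM call in Stage 5 is left out.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence-embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, store: list[tuple[str, Counter]], k: int = 4) -> list[str]:
    """Stages 2 and 3: embed the query, then return the k most similar chunks."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query: str, chunks: list[str]) -> str:
    """Stage 4: concatenate the retrieved chunks and the query into one prompt."""
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


# Offline indexing branch: documents -> chunker -> embedder -> vector store.
# Each sample document is short enough to serve as a single chunk here.
documents = [
    "A vector store holds embedding vectors for fast nearest neighbor search",
    "The chunker splits documents into short passages",
    "Amber and navy make a readable palette",
    "An LLM generates the final response from the augmented prompt",
    "The embedder encodes each chunk into a dense vector",
]
store = [(chunk, embed(chunk)) for chunk in documents]

# Online flow: Stage 1 (query) through Stage 4 (augmented prompt).
query = "How does the vector store search work?"
chunks = retrieve(query, store, k=4)
prompt = build_prompt(query, chunks)
print(prompt)  # Stage 5 would send this string to an LLM
```

The printed prompt is also a reasonable source for the inline prompt-template annotation in the figure.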
When to use
For RAG / question-answering / knowledge-grounded generation papers and engineering blog posts.
Variations
With re-ranker stage
Insert a re-ranking stage between vector retrieval and prompt construction. The re-ranker (a cross-encoder) scores each retrieved chunk against the query and reorders them, keeping the top-k'.
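The re-ranking step can be sketched as a sort over the retrieved chunks. In this illustration the function names, the k_prime parameter, and the sample chunks are all hypothetical, and overlap_score is only a token-overlap placeholder: a real pipeline would pass in a cross-encoder model's relevance score instead.

```python
import re
from typing import Callable


def overlap_score(query: str, chunk: str) -> float:
    """Placeholder scorer (an assumption): fraction of query tokens found in the chunk."""
    q_tokens = set(re.findall(r"[a-z0-9]+", query.lower()))
    c_tokens = set(re.findall(r"[a-z0-9]+", chunk.lower()))
    if not q_tokens:
        return 0.0
    return len(q_tokens & c_tokens) / len(q_tokens)


def rerank(
    query: str,
    chunks: list[str],
    score_fn: Callable[[str, str], float],
    k_prime: int = 2,
) -> list[str]:
    """Score every (query, chunk) pair, sort by score, and keep the top k'."""
    ordered = sorted(chunks, key=lambda chunk: score_fn(query, chunk), reverse=True)
    return ordered[:k_prime]


retrieved = [
    "Dense retrieval compares vectors",
    "A cross-encoder scores a query and chunk together",
    "The re-ranker reorders retrieved chunks",
    "BM25 is a sparse retriever",
]
reranked = rerank("What does a cross-encoder re-ranker do?", retrieved, overlap_score)
print(reranked)
```

Passing the scorer as a parameter keeps the sketch independent of any particular model library.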
Hybrid (sparse + dense) retrieval
Replace the single vector retrieval with two parallel retrievers: BM25 sparse retrieval and dense embedding retrieval. Their results are merged via reciprocal rank fusion before prompt construction.
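Reciprocal rank fusion gives each document a score equal to the sum of 1 / (c + rank) over every ranking that contains it, then sorts by that score. A minimal sketch follows. The smoothing constant (called rrf_k here to keep it distinct from the retrieval k) is set to 60, a commonly used default and an assumption rather than something the prompt specifies; the document IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], rrf_k: int = 60) -> list[str]:
    """Merge ranked lists of document IDs into one list, best first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rrf_k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


bm25_ranking = ["d1", "d2", "d3"]   # hypothetical sparse retriever output
dense_ranking = ["d3", "d1", "d4"]  # hypothetical dense retriever output
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

Because only ranks enter the formula, the BM25 and embedding-similarity scores never need to be put on a common scale.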
Tips
- Always include the offline indexing branch; without it, readers don't see how the vector store was built.
- Use k=4 or k=5 in the figure. Larger k crowds the layout; smaller k makes the pipeline look like a toy example.
- Annotate the prompt template inline if space allows; it shows readers what the LLM actually sees.
FAQ
Can I show citation generation in the output?
Yes. Add this sentence to the prompt: "The LLM output includes inline citation markers [1], [2] referring back to retrieved chunks."
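To make the marker-to-chunk link concrete, the sketch below maps each [n] marker in a response back to the n-th retrieved chunk. The function name, the 1-based numbering, and the sample response are assumptions made for illustration; markers that point past the retrieved list are ignored.

```python
import re


def resolve_citations(response: str, chunks: list[str]) -> dict[int, str]:
    """Map each valid [n] marker in the response to the n-th retrieved chunk."""
    cited: dict[int, str] = {}
    for match in re.findall(r"\[(\d+)\]", response):
        index = int(match)
        if 1 <= index <= len(chunks):  # skip markers with no matching chunk
            cited[index] = chunks[index - 1]
    return cited


retrieved_chunks = ["chunk A", "chunk B", "chunk C", "chunk D"]
response = "RAG grounds answers in retrieved text [1] and can cite sources [3]. See [7]."
citations = resolve_citations(response, retrieved_chunks)
print(citations)
```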
