RAG / Vector Search
AI/ML · Advanced · 1 year experience

Summary

I build RAG systems from the embedding pipeline: choosing models, tuning chunk strategies, designing retrieval filters, and assembling context windows that produce grounded, source-attributed responses. Both of my recent AI projects use Qdrant as the vector store, with different embedding models and retrieval strategies matched to the problem.

How I Apply This Skill

  • Embedded 1,000+ Obsidian notes using OpenAI text-embedding-3-small (1,536 dimensions), split into 5,000 chunks with a 2,000-character window and 400-character overlap, stored in a self-hosted Qdrant instance
  • Tuned cosine similarity thresholds separately for two use cases: 0.70 for auto-linking related notes (precision-focused), and 0.50 for chatbot retrieval (recall-focused to avoid missing relevant sources)
  • Built a 5-stage schema retrieval pipeline for Text-to-SQL using all-mpnet-base-v2 (768 dimensions): embed query → concept mapping → Qdrant filtered search → schema enrichment → ranked, deduplicated context assembly passed to Claude
  • Indexed a RAG help system for Text-to-SQL: 20 articles chunked, embedded, and stored in Qdrant, with semantic search returning source-attributed responses that cite the specific article and chunk
  • Applied deduplication at retrieval time in both projects to ensure source diversity — preventing a single dominant document from consuming the context window
  • Implemented hybrid vector + BM25 keyword search with Reciprocal Rank Fusion in the RAG Document Assistant, fixing exact-term retrieval failures (e.g., “FIPS 200”) that vector-only search systematically missed
  • Added cross-encoder re-ranking (ms-marco-MiniLM-L-6-v2) to the Document Assistant’s retrieval pipeline, trading ~100ms of latency per query for significantly improved relevance ordering across 51,000+ chunks in 4 file formats
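The fixed-window chunking above (2,000-character window, 400-character overlap) can be sketched as a small helper; the function name and defaults are illustrative, not the project’s actual code:

```python
def chunk_text(text: str, window: int = 2000, overlap: int = 400) -> list[str]:
    """Split text into fixed-size chunks where consecutive chunks
    share `overlap` characters, so sentences cut at a boundary
    still appear whole in at least one chunk."""
    if window <= overlap:
        raise ValueError("window must exceed overlap")
    step = window - overlap  # how far each chunk advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + window])
        if start + window >= len(text):
            break  # last chunk already reached the end of the text
    return chunks
```

Each chunk would then be embedded (e.g. with text-embedding-3-small) and upserted into Qdrant along with its source-note metadata for attribution.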
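Retrieval-time deduplication for source diversity can be as simple as capping how many chunks any one document contributes before context assembly. A minimal sketch, assuming hits arrive best-first as dicts with a `doc` field (both assumptions, not the project’s schema):

```python
from collections import Counter

def diversify(hits: list[dict], max_per_doc: int = 2, limit: int = 8) -> list[dict]:
    """Keep the top hits while capping each source document's share,
    so one dominant document cannot consume the whole context window."""
    per_doc = Counter()
    kept = []
    for hit in hits:  # hits are assumed sorted by relevance, best first
        if per_doc[hit["doc"]] < max_per_doc:
            kept.append(hit)
            per_doc[hit["doc"]] += 1
        if len(kept) == limit:
            break
    return kept
```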
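Reciprocal Rank Fusion, used above to merge vector and BM25 result lists, scores each document by the sum of 1/(k + rank) across every list it appears in. A generic sketch (k=60 is the commonly cited default, not necessarily the project’s setting):

```python
def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse multiple best-first ranked lists (e.g. vector and BM25 results)
    into one ranking. A document ranked highly in several lists outscores
    one that tops only a single list."""
    scores: dict[str, float] = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because BM25 contributes exact-term matches, a query like “FIPS 200” surfaces the keyword hit even when its embedding lands far from the query vector.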
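Cross-encoder re-ranking rescores the query paired with each candidate chunk and reorders by that score. The sketch below abstracts the scorer behind a callable so it stays runnable; in practice `score_fn` would wrap something like sentence-transformers’ `CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2").predict` (an assumption based on the model named above):

```python
from typing import Callable

def rerank(query: str,
           candidates: list[str],
           score_fn: Callable[[str, str], float],
           top_n: int = 5) -> list[str]:
    """Re-order retrieved chunks by a cross-encoder relevance score
    and keep the top_n for context assembly."""
    scored = [(score_fn(query, chunk), chunk) for chunk in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_n]]
```

This is the classic two-stage pattern: cheap vector/BM25 retrieval pulls a broad candidate set, then the slower cross-encoder (the ~100ms overhead noted above) fixes the ordering.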

Key Strengths

  • Embedding pipeline design: model selection, dimensionality, chunking strategy, and overlap tuned per use case
  • Qdrant: filtered scroll, semantic search, collection management, self-hosted deployment
  • Retrieval tuning: similarity thresholds, deduplication, and relevance ranking before context assembly
  • Source attribution: every answer traces to a specific chunk and document, with relevance scores exposed to the user
  • Multi-purpose RAG: applying the same vector store to distinct retrieval tasks (chatbot, auto-linking, schema lookup, help system) within a single project