The biggest limitation of LLMs isn’t their intelligence; it’s their memory. They know a remarkable amount about the public world up to their training cutoff, but they know nothing about your company’s internal error codes, your new product manual, or your project documentation.
Retrieval Augmented Generation (RAG) is the architecture that solves this. It allows you to “inject” relevant knowledge into the AI’s prompt before it answers.
In this post, we’ll use Kernel Memory—an open-source service from Microsoft—to build a Troubleshooting Bot that knows how to fix your specific application errors.
The “Memory” Problem
If you ask ChatGPT “How do I fix Error Code 99 in Project Omega?”, it will likely hallucinate a plausible-sounding but incorrect answer. It has no idea what “Project Omega” is.
With RAG, the flow changes:
- User: “How do I fix Error Code 99?”
- System: Searches your database for “Error Code 99”.
- System: Finds a document: “Error 99: Flux Capacitor misalignment. Fix: Rotate 90 degrees.”
- System: Sends the user’s question plus that document to the AI.
- AI: “To fix Error Code 99, you need to rotate the Flux Capacitor 90 degrees.”
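The five steps above can be sketched in plain C#. This is a conceptual sketch, not Kernel Memory code: `SearchDocs` is a hypothetical stand-in for a real search index, and the final LLM call is left as a comment.

```csharp
using System;

class RagFlowSketch
{
    // Hypothetical stand-in for searching your own knowledge base
    static string SearchDocs(string query) =>
        "Error 99: Flux Capacitor misalignment. Fix: Rotate 90 degrees.";

    static void Main()
    {
        string question = "How do I fix Error Code 99?";

        // 1-3. Retrieve: find the relevant internal document
        string context = SearchDocs(question);

        // 4. Augment: inject the retrieved text into the prompt
        string prompt =
            $"Answer using only these facts: {context}\nQuestion: {question}";

        // 5. Generate: send the augmented prompt to the LLM
        //    (here we just print it so you can see what the model receives)
        Console.WriteLine(prompt);
    }
}
```

The key insight is that the model never needs to “know” your data; it only needs to see the right excerpt at answer time.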
Building the Bot
We’ll use the Microsoft.KernelMemory.Core package. It handles the entire pipeline: reading files, splitting them into chunks, generating embeddings (vectors), storing them, and searching them.
1. Setup
Prerequisites: We’ll use the “batteries included” package which allows running Kernel Memory in a serverless (embedded) mode without setting up a separate web service.
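Installing the package is one command from your project directory (this is the standard NuGet package name for Kernel Memory’s all-in-one build):

```shell
dotnet add package Microsoft.KernelMemory.Core
```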
2. The Code
We’ll simulate a scenario where we have a markdown file containing internal troubleshooting steps. We’ll ingest this text and then ask the AI about it.
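A minimal sketch of that program, assuming an OpenAI key in the `OPENAI_API_KEY` environment variable (the document text and the `doc-001` identifier are illustrative):

```csharp
using Microsoft.KernelMemory;

// Serverless (embedded) mode: the whole pipeline runs inside this process.
var memory = new KernelMemoryBuilder()
    .WithOpenAIDefaults(Environment.GetEnvironmentVariable("OPENAI_API_KEY")!)
    .Build<MemoryServerless>();

// Ingest: the text is chunked, embedded, and indexed.
await memory.ImportTextAsync(
    "Error 99: Flux Capacitor misalignment. Fix: Rotate 90 degrees.",
    documentId: "doc-001");

// Ask: the question is embedded, matching chunks are retrieved,
// and the LLM answers grounded in them.
var answer = await memory.AskAsync("How do I fix Error Code 99?");
Console.WriteLine(answer.Result);
```

In a real app you would call `ImportDocumentAsync` with the path to your markdown file instead of passing the text inline; the rest of the flow is identical.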
Expected Output
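The exact wording varies from run to run (the model paraphrases), but the answer should be grounded in the ingested document, something along these lines:

```text
To fix Error Code 99, you need to rotate the Flux Capacitor 90 degrees.
```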
Under the Hood: Embeddings & Vector DBs
Kernel Memory abstracts away the complexity, but it’s important to know what’s happening:
- Chunking: The text is split into smaller pieces (e.g., paragraphs). If we didn’t do this, the whole document might not fit in the LLM’s context window.
- Embeddings: Each chunk is sent to an Embedding Model (like text-embedding-3-small). This model turns text into a list of numbers (a vector) representing its meaning.
- Vector Search: When you ask a question, your question is also turned into a vector. The system finds chunks that are mathematically “close” to your question’s vector.
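“Close” almost always means cosine similarity: the cosine of the angle between two vectors, which is near 1.0 when they point the same way (similar meaning) and near 0 when they are unrelated. A self-contained sketch with toy 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

```csharp
using System;

class Cosine
{
    // Cosine similarity: ~1.0 = similar meaning, ~0 = unrelated.
    static double Similarity(float[] a, float[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    static void Main()
    {
        // Toy "embeddings" for a question and a document chunk
        var question = new float[] { 0.9f, 0.1f, 0.0f };
        var chunk    = new float[] { 0.8f, 0.2f, 0.1f };
        Console.WriteLine(Similarity(question, chunk)); // close to 1.0
    }
}
```

The vector database’s job is to run this comparison against millions of stored chunks efficiently, using index structures instead of a brute-force loop.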
Going to Production
For a real app, you wouldn’t use the in-memory storage (which is lost when the app restarts). You would configure a persistent Vector Database like Qdrant, Azure AI Search, or Postgres.
You’ll need to install the specific adapter package, e.g., Microsoft.KernelMemory.MemoryDb.Qdrant.
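With the adapter installed, swapping in Qdrant is a single extra builder call. This sketch assumes a Qdrant instance listening on localhost:6333 (e.g. started via Docker):

```csharp
using Microsoft.KernelMemory;

// Same pipeline as before, but vectors are persisted in Qdrant
// instead of being lost when the process exits.
var memory = new KernelMemoryBuilder()
    .WithOpenAIDefaults(Environment.GetEnvironmentVariable("OPENAI_API_KEY")!)
    .WithQdrantMemoryDb("http://localhost:6333")
    .Build<MemoryServerless>();
```

Because storage is behind the builder abstraction, none of your ingestion or `AskAsync` code changes when you switch backends.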
This allows you to ingest gigabytes of documents and search them in milliseconds.
Further Reading
- Kernel Memory on GitHub - Official repository with extensive documentation
- Qdrant Vector Database - Popular vector database documentation
- Azure AI Search - Microsoft’s managed vector search service
- RAG Pattern Overview - Azure Architecture Center guide
