In the previous post, we learned how Kernel Memory, an open-source service by Microsoft, can greatly simplify the implementation of Retrieval Augmented Generation (RAG) experiences, which enable you to use the power of LLMs in combination with private data, like organizational documents. What we saw in the previous post was very powerful, but it was also a bit limited. We were able, in fact, to ask direct questions related to our documents, like What is Contoso Electronics?, but what if you need to perform a more complex task with these documents, like having a continuous chat experience or using the information stored in the documents to perform other activities? That’s the right job for Semantic Kernel which, thanks to plugins, function calling and planners, allows you to build complex AI workflows.
In this post, we’re going to see how we can combine Semantic Kernel and Kernel Memory, thanks to a dedicated plugin. We’re going to build two different scenarios:
- A console application that combines two of the scenarios we’ve already seen: using a prompt function to convert a text into a business mail (which was explained here) and using Kernel Memory to store the employee handbook of a fictitious company as embeddings (which we explained here). In this case, we don’t just want to answer a question about the handbook, we also want to convert the answer into a business mail.
- A continuous chat experience, where we’re going to ask multiple questions about the handbook, retaining the context of the conversation.
To go through the samples, I assume you have already tested the project I made available on GitHub. Specifically, I’m going to assume that you have already used the Blazor application to upload the employee handbook and convert it into embeddings, which have been stored on an Azure AI Search instance. If you haven’t done it yet, please follow the instructions in the previous post to do it.
Regardless of the scenario, the way we set up the plugin is the same. Let’s take a look!
Setting up the plugin
Let’s start by setting up the project. The first step is to initialize Semantic Kernel in the same way we did in all the other posts about this library:
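A minimal sketch of that initialization could look like the following; the deployment name, endpoint and API key are placeholders you’ll need to replace with your own values:

```csharp
using Microsoft.SemanticKernel;

// Initialize Semantic Kernel with an Azure OpenAI chat completion model.
// Replace the placeholders with your own deployment name, endpoint and API key.
var kernelBuilder = Kernel.CreateBuilder();
kernelBuilder.AddAzureOpenAIChatCompletion(
    deploymentName: "<chat-deployment-name>",
    endpoint: "<azure-openai-endpoint>",
    apiKey: "<azure-openai-api-key>");

var kernel = kernelBuilder.Build();
```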
Now we need to import the Kernel Memory plugin so, as a first step, we need to install the NuGet package called Microsoft.KernelMemory.SemanticKernelPlugin. We can then use the MemoryPlugin class offered by this library which, however, requires us to initialize the KernelMemory instance we want to use. In the previous post, we used Kernel Memory in serverless mode (which means that the service is hosted by the application itself), so we’ll continue to use the same approach in this post. However, keep in mind that you are free to use the dedicated service if you need a more scalable solution. As such, we initialize the KernelMemory object in the same way we did in the Blazor application we built in the previous post, by using the KernelMemoryBuilder class:
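Here is a minimal sketch of that setup; the endpoints, API keys and deployment names are placeholders, and the configuration should match what you used in the Blazor application from the previous post:

```csharp
using Microsoft.KernelMemory;

// Configuration for the chat model used to generate answers.
var chatConfig = new AzureOpenAIConfig
{
    APIKey = "<azure-openai-api-key>",
    Endpoint = "<azure-openai-endpoint>",
    Deployment = "<chat-deployment-name>",
    APIType = AzureOpenAIConfig.APITypes.ChatCompletion,
    Auth = AzureOpenAIConfig.AuthTypes.APIKey
};

// Configuration for the embedding model used to vectorize documents and questions.
var embeddingConfig = new AzureOpenAIConfig
{
    APIKey = "<azure-openai-api-key>",
    Endpoint = "<azure-openai-endpoint>",
    Deployment = "<embedding-deployment-name>",
    APIType = AzureOpenAIConfig.APITypes.EmbeddingGeneration,
    Auth = AzureOpenAIConfig.AuthTypes.APIKey
};

// Build Kernel Memory in serverless mode, using the same Azure AI Search
// instance we populated in the previous post as the vector store.
var memory = new KernelMemoryBuilder()
    .WithAzureOpenAITextGeneration(chatConfig)
    .WithAzureOpenAITextEmbeddingGeneration(embeddingConfig)
    .WithAzureAISearchMemoryDb("<azure-ai-search-endpoint>", "<azure-ai-search-api-key>")
    .Build<MemoryServerless>();
```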
Kernel Memory requires a text generation model (to generate answers from the questions) and an embedding model (to convert documents into vectors, so that they can be stored in a vector database). As such, we provide the configuration for both of them using the AzureOpenAIConfig class (you can switch to the OpenAIConfig class in case you’re using the OpenAI APIs). In this sample, we’re also going to use the same Azure AI Search instance we used in the previous post, so we initialize it as well using the WithAzureAISearchMemoryDb() method, providing the same endpoint and API key. Finally, we generate a KernelMemory object by calling the Build<MemoryServerless>() method, which will initialize the service in serverless mode.
Now we can create a MemoryPlugin object and load it into Semantic Kernel:
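Here’s how that could look; the plugin name "memory" is an arbitrary choice, and we’ll reference it from the prompt later:

```csharp
// Wrap the serverless Kernel Memory instance into a plugin and register it
// with Semantic Kernel under the name "memory".
var memoryPlugin = new MemoryPlugin(memory);
kernel.ImportPluginFromObject(memoryPlugin, "memory");
```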
This approach is similar to the one we saw when we talked about the Bing and Microsoft Graph plugins: we create a new instance of the MemoryPlugin class and then we import it into Semantic Kernel using the ImportPluginFromObject() method, also supplying the plugin name.
Since we want to generate business mails out of the answers we get about the employee handbook, let’s not forget to import our MailPlugin, which contains the WriteBusinessMail prompt function (see this post for details on the plugin):
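A possible way to import it, assuming, as in that post, that the plugin is a folder of prompt functions named MailPlugin (the path below is an assumption, adjust it to wherever your prompt functions live):

```csharp
// Import the MailPlugin prompt functions (including WriteBusinessMail) from disk.
// The "Plugins/MailPlugin" folder is an assumption based on the earlier post.
var mailPlugin = kernel.ImportPluginFromPromptDirectory(
    Path.Combine(Directory.GetCurrentDirectory(), "Plugins", "MailPlugin"),
    "MailPlugin");
```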
Now let’s use the function calling feature of Semantic Kernel to define our ask and let the framework automatically pick the two plugins we loaded: the MemoryPlugin, to query Azure AI Search, and the MailPlugin, to convert the answer into a business mail. In order to do that, however, we need to slightly change the way we define our prompt. Let’s take a look at the complete code:
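Here’s a sketch of how this could look. The exact prompt wording is illustrative, but the structure is the important part: the input placeholder, the explicit call to the ask function of the memory plugin, and the execution settings that enable automatic function invocation:

```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.OpenAI;

// Let Semantic Kernel automatically invoke the functions exposed by the plugins.
OpenAIPromptExecutionSettings settings = new()
{
    ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions
};

// The prompt embeds the user question, invokes the "ask" function of the memory
// plugin to get an answer, then asks the LLM to turn it into a business mail.
string prompt = """
    Question to Kernel Memory: {{$input}}

    Kernel Memory Answer: {{memory.ask $input}}

    If the answer is empty say "I don't know",
    otherwise reply with the answer turned into a business mail.
    """;

KernelArguments arguments = new(settings)
{
    ["input"] = "What is Contoso Electronics?"
};

var result = await kernel.InvokePromptAsync(prompt, arguments);
Console.WriteLine(result.GetValue<string>());
```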
First, we set the ToolCallBehavior property of the OpenAIPromptExecutionSettings to AutoInvokeKernelFunctions, which means that Semantic Kernel won’t just identify the plugins to use to satisfy the request, but it will also automatically call the functions defined by those plugins. Then, we define the prompt, which is a bit different from what we have seen so far. The first part should be familiar: we’re using the templating feature of prompt functions to define a placeholder called input, which we’re going to replace later with the real question from the user. Now comes something new: in a template, we can ask Semantic Kernel to explicitly invoke a function. In this case, we’re invoking the ask function provided by the MemoryPlugin, which combines the LLM capabilities with a semantic search on our vector database (in this case, Azure AI Search). The same input placeholder we have provided is also used as the parameter for the function.
To summarize, the prompt will be executed in this way:
- The user asks a question, like What is Contoso Electronics?. The question is embedded into the prompt.
- We invoke the ask function using the question as input. The generated answer is embedded into the prompt as well.
- Finally, we ask the LLM to turn the answer into a business mail. If Kernel Memory isn’t able to find an answer, we instruct the LLM to just say “I don’t know”.
Finally, we invoke the prompt using the InvokePromptAsync() method, providing the prompt and the arguments (in this case, the value of the input placeholder with the question we want to ask). The result will be a business mail that answers the question using the information stored in the employee handbook.
It worked! With a single execution, we’ve been able to generate a response starting from private data (our employee handbook) and convert it into a business mail. Let’s see another example by implementing a chat experience!
Implementing a chat experience
In this example, we’re going to use the same scenario: we want to ask questions about our employee handbook. However, this time we want to implement a chat experience, so that we can ask multiple questions while retaining the context of the conversation, something that isn’t possible to achieve with Kernel Memory alone. The initialization code is the same we have seen before; however, we’re going to change the way we use Semantic Kernel:
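Here’s a sketch of that loop, assuming the kernel, memory and plugins are set up as in the previous example. The prompt wording is illustrative: since templates aren’t rendered here, it simply instructs the model to use the ask function of the memory plugin, and AutoInvokeKernelFunctions takes care of actually calling it:

```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;

// Retrieve the chat completion service registered together with the chat model.
var chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();

OpenAIPromptExecutionSettings settings = new()
{
    ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions
};

ChatHistory chatHistory = new();

while (true)
{
    Console.Write("Question: ");
    string question = Console.ReadLine()!;

    // Inject the question with plain string interpolation; the reference to the
    // memory plugin's "ask" function is handled by automatic function calling.
    string prompt = $"""
        Use the ask function of the memory plugin to answer the following question:
        {question}

        If the answer is empty say "I don't know", otherwise reply with the answer.
        """;

    // Add the user message, get the answer, then store it as an assistant message
    // so that the full conversation is sent on every call.
    chatHistory.AddUserMessage(prompt);

    var answer = await chatCompletionService.GetChatMessageContentAsync(
        chatHistory, settings, kernel);

    Console.WriteLine(answer.Content);
    chatHistory.AddAssistantMessage(answer.Content!);
}
```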
To implement a chat experience with Semantic Kernel we can use the ChatCompletionService object, which we have learned about when we introduced function calling. The service is automatically registered by Semantic Kernel when we register a Chat Completion model in KernelBuilder, so we can retrieve it using dependency injection by calling GetRequiredService<IChatCompletionService>().
The chat implementation happens inside an endless loop, so that we can continuously ask questions until we close the application. We read the user’s message using Console.ReadLine(), then we inject it into the same prompt we saw in the first example of this post. In this case, however, we’re injecting the message using C# string manipulation, rather than the Semantic Kernel templating engine, because when we use the ChatCompletionService we can’t use prompt templates. The prompt works in the same way, though: we use the ask function of the plugin to provide an answer to the question and then we share it with the user. This time, we just return the plain answer, without converting it into a business mail.
Before executing the prompt by calling GetChatMessageContentAsync(), we add it to the ChatHistory collection, setting the role as User, since this is the message written by the user. Once we have a result, we show it to the user and add it to the ChatHistory collection as well, this time setting the role as Assistant, since it was generated by the LLM. Thanks to this approach, every time we invoke the GetChatMessageContentAsync() method we supply the entire chat history to the LLM, which enables us to retain the context of the conversation. Let’s see an example. Run the application and ask What is Contoso Electronics?. The model will answer with a description of the company, taken from the employee handbook.
Now let’s ask a follow-up question, like And which are its values?. This time, the model will correctly reply with the values of Contoso Electronics described in the handbook.
As you can see, we didn’t have to specify the full context again. Instead of asking Which are the Contoso Electronics values?, we just asked And which are its values?, but the LLM was able to understand the context of the conversation and provide the right answer thanks to the usage of the ChatHistory collection.
Wrapping up
Kernel Memory is a powerful service to implement RAG experiences in your applications and to enable LLMs to work with private data. While Kernel Memory alone works fine for simple Q&A scenarios, it really shines when you use it in combination with Semantic Kernel, since you can enable more complex AI workflows. In this post we have seen two examples:
- The ability to combine the private knowledge stored in a vector database with other plugins and functions.
- The implementation of a chat experience, which provides a more natural and powerful Q&A experience to the user.
You can find the complete code of this post on GitHub.
Happy coding!