In the previous post, we learned how Kernel Memory, an open-source service by Microsoft, can greatly simplify the implementation of Retrieval Augmented Generation (RAG) experiences, which enable you to use the power of LLMs in combination with private data, like organizational documents. What we saw in the previous post was very powerful, but it was also a bit limited. We were able, in fact, to ask direct questions related to our documents, like What is Contoso Electronics?, but what if you need to perform a more complex task with these documents, like having a continuous chat experience or using the information stored in the documents to perform other activities? That’s the right job for Semantic Kernel which, thanks to plugins, function calling and planners, allows you to build complex AI workflows.
In this post, we’re going to see how we can combine Semantic Kernel and Kernel Memory, thanks to a dedicated plugin. We’re going to build two different scenarios:
- A console application that combines two of the scenarios we’ve already seen: using a prompt function to convert a text into a business mail (which was explained here) and using Kernel Memory to store the employee handbook of a fictitious company as embeddings (which we explained here). In this case, we don’t just want to answer a question about the handbook, we also want to convert the answer into a business mail.
- A continuous chat experience, where we’re going to ask multiple questions about the handbook, retaining the context of the conversation.
To go through the samples, I assume you have already tested the project I made available on GitHub. Specifically, I’m going to assume that you have already used the Blazor application to upload the employee handbook and convert it into embeddings, which have been stored on an Azure AI Search instance. If you haven’t done it yet, please follow the instructions in the previous post to do it.
Regardless of the scenario, the way we set up the plugin is the same. Let’s take a look!
Setting up the plugin
Let’s start by setting up the project. The first step is to initialize Semantic Kernel in the same way we did in all the other posts about this library:
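A minimal sketch of that initialization could look like the following; the deployment name, endpoint and API key are placeholders you’ll need to replace with your own values:

```csharp
using Microsoft.SemanticKernel;

// Initialize Semantic Kernel with an Azure OpenAI chat completion model.
// Replace the placeholders with your own deployment name, endpoint and API key.
var kernelBuilder = Kernel.CreateBuilder();
kernelBuilder.AddAzureOpenAIChatCompletion(
    deploymentName: "<chat-deployment-name>",
    endpoint: "<azure-openai-endpoint>",
    apiKey: "<azure-openai-api-key>");

var kernel = kernelBuilder.Build();
```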
Now we need to import the Kernel Memory plugin so, as a first step, we need to install the NuGet package called Microsoft.KernelMemory.SemanticKernelPlugin. We can then use the MemoryPlugin class offered by this library which, however, requires us to initialize the KernelMemory instance we want to use. In the previous post, we used Kernel Memory in serverless mode (which means that the service is hosted by the application itself), so we’ll continue to use the same approach in this post. However, keep in mind that you are free to use the dedicated service if you need a more scalable solution. As such, we initialize the KernelMemory object in the same way we did in the Blazor application we built in the previous post, by using the KernelMemoryBuilder class:
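Here is a minimal sketch of that setup; the endpoints, API keys and deployment names are placeholders, and the configuration should match what you used in the Blazor application from the previous post:

```csharp
using Microsoft.KernelMemory;

// Configuration for the chat model used to generate answers.
var chatConfig = new AzureOpenAIConfig
{
    APIKey = "<azure-openai-api-key>",
    Endpoint = "<azure-openai-endpoint>",
    Deployment = "<chat-deployment-name>",
    APIType = AzureOpenAIConfig.APITypes.ChatCompletion,
    Auth = AzureOpenAIConfig.AuthTypes.APIKey
};

// Configuration for the embedding model used to vectorize documents and questions.
var embeddingConfig = new AzureOpenAIConfig
{
    APIKey = "<azure-openai-api-key>",
    Endpoint = "<azure-openai-endpoint>",
    Deployment = "<embedding-deployment-name>",
    APIType = AzureOpenAIConfig.APITypes.EmbeddingGeneration,
    Auth = AzureOpenAIConfig.AuthTypes.APIKey
};

// Build Kernel Memory in serverless mode, using the same Azure AI Search
// instance we populated in the previous post as the vector store.
var memory = new KernelMemoryBuilder()
    .WithAzureOpenAITextGeneration(chatConfig)
    .WithAzureOpenAITextEmbeddingGeneration(embeddingConfig)
    .WithAzureAISearchMemoryDb("<azure-ai-search-endpoint>", "<azure-ai-search-api-key>")
    .Build<MemoryServerless>();
```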
Kernel Memory requires a text generation model (to generate answers from the questions) and an embedding model (to convert documents into vectors, so that they can be stored in a vector database). As such, we provide the configuration for both of them using the AzureOpenAIConfig class (you can switch to the OpenAIConfig class in case you’re using the OpenAI APIs). In this sample, we’re also going to use the same Azure AI Search instance we used in the previous post, so we initialize it as well using the WithAzureAISearchMemoryDb() method, providing the same endpoint and API key. Finally, we generate a KernelMemory object by calling the Build<MemoryServerless>() method, which will initialize the service in serverless mode.
Now we can create a MemoryPlugin object and load it into Semantic Kernel:
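Here’s how that could look; the plugin name "memory" is an arbitrary choice, and we’ll reference it from the prompt later:

```csharp
// Wrap the serverless Kernel Memory instance into a plugin and register it
// with Semantic Kernel under the name "memory".
var memoryPlugin = new MemoryPlugin(memory);
kernel.ImportPluginFromObject(memoryPlugin, "memory");
```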
This approach is similar to the one we saw when we talked about the Bing and Microsoft Graph plugins: we create a new instance of the MemoryPlugin class and then we import it into Semantic Kernel using the ImportPluginFromObject() method, also supplying the plugin name.
Since we want to generate business mails out of the answers we get about the employee handbook, let’s not forget to import our MailPlugin, which contains the WriteBusinessMail prompt function (see this post for details on the plugin):
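A possible way to import it, assuming, as in that post, that the plugin is a folder of prompt functions named MailPlugin (the path below is an assumption, adjust it to wherever your prompt functions live):

```csharp
// Import the MailPlugin prompt functions (including WriteBusinessMail) from disk.
// The "Plugins/MailPlugin" folder is an assumption based on the earlier post.
var mailPlugin = kernel.ImportPluginFromPromptDirectory(
    Path.Combine(Directory.GetCurrentDirectory(), "Plugins", "MailPlugin"),
    "MailPlugin");
```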
Now let’s use the function calling feature of Semantic Kernel to define our ask and let the framework automatically pick the two plugins we loaded: the MemoryPlugin, to query Azure AI Search, and the MailPlugin, to convert the answer into a business mail. In order to do that, however, we need to slightly change the way we define our prompt. Let’s take a look at the complete code:
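Here’s a sketch of how this could look. The exact prompt wording is illustrative, but the structure is the important part: the input placeholder, the explicit call to the ask function of the memory plugin, and the execution settings that enable automatic function invocation:

```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.OpenAI;

// Let Semantic Kernel automatically invoke the functions exposed by the plugins.
OpenAIPromptExecutionSettings settings = new()
{
    ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions
};

// The prompt embeds the user question, invokes the "ask" function of the memory
// plugin to get an answer, then asks the LLM to turn it into a business mail.
string prompt = """
    Question to Kernel Memory: {{$input}}

    Kernel Memory Answer: {{memory.ask $input}}

    If the answer is empty say "I don't know",
    otherwise reply with the answer turned into a business mail.
    """;

KernelArguments arguments = new(settings)
{
    ["input"] = "What is Contoso Electronics?"
};

var result = await kernel.InvokePromptAsync(prompt, arguments);
Console.WriteLine(result.GetValue<string>());
```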
First, we set the ToolCallBehavior property of the OpenAIPromptExecutionSettings to AutoInvokeKernelFunctions, which means that Semantic Kernel won’t just identify the plugins to use to satisfy the request, but it will also automatically call the functions defined by those plugins. Then, we define the prompt, which is a bit different from what we have seen so far. The first part should be familiar: we’re using the templating feature of prompt functions to define a placeholder called input, which we’re going to replace later with the real question from the user. Now comes something new: in a template, we can ask Semantic Kernel to explicitly invoke a function. In this case, we’re invoking the ask function provided by the MemoryPlugin, which combines the LLM capabilities with a semantic search on our vector database (in this case, Azure AI Search). The same input placeholder we have provided is also used as the parameter for the function.
To summarize, the prompt will be executed in this way:
- The user asks a question, like What is Contoso Electronics?. The question is embedded into the prompt.
- We invoke the ask function using the question as input. The generated answer is embedded into the prompt as well.
- Finally, we ask the LLM to turn the answer into a business mail. If Kernel Memory isn’t able to find an answer, we instruct the LLM to just say “I don’t know”.
Finally, we invoke the prompt using the InvokePromptAsync() method, providing the prompt and the arguments (in this case, the value of the input placeholder with the question we want to ask). The result will be a business mail that answers the question using the information stored in the employee handbook.
It worked! With a single execution, we’ve been able to generate a response starting from private data (our employee handbook) and convert it into a business mail. Let’s see another example by implementing a chat experience!
Implementing a chat experience
In this example, we’re going to use the same scenario: we want to ask questions about our employee handbook. However, this time we want to implement a chat experience, so that we can ask multiple questions while retaining the context of the conversation, something that isn’t possible to achieve with Kernel Memory alone. The initialization code is the same we have seen before; however, we’re going to change the way we use Semantic Kernel:
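Here’s a sketch of that loop, assuming the kernel, memory and plugins are set up as in the previous example. The prompt wording is illustrative: since templates aren’t rendered here, it simply instructs the model to use the ask function of the memory plugin, and AutoInvokeKernelFunctions takes care of actually calling it:

```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;

// Retrieve the chat completion service registered together with the chat model.
var chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();

OpenAIPromptExecutionSettings settings = new()
{
    ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions
};

ChatHistory chatHistory = new();

while (true)
{
    Console.Write("Question: ");
    string question = Console.ReadLine()!;

    // Inject the question with plain string interpolation; the reference to the
    // memory plugin's "ask" function is handled by automatic function calling.
    string prompt = $"""
        Use the ask function of the memory plugin to answer the following question:
        {question}

        If the answer is empty say "I don't know", otherwise reply with the answer.
        """;

    // Add the user message, get the answer, then store it as an assistant message
    // so that the full conversation is sent on every call.
    chatHistory.AddUserMessage(prompt);

    var answer = await chatCompletionService.GetChatMessageContentAsync(
        chatHistory, settings, kernel);

    Console.WriteLine(answer.Content);
    chatHistory.AddAssistantMessage(answer.Content!);
}
```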
To implement a chat experience with Semantic Kernel we can use the ChatCompletionService object, which we have learned about when we introduced function calling. The service is automatically registered by Semantic Kernel when we register a Chat Completion model in KernelBuilder, so we can retrieve it using dependency injection by calling GetRequiredService<IChatCompletionService>().
The chat implementation happens inside an endless loop, so that we can continuously ask questions until we close the application. We read the user’s message using Console.ReadLine(), then we inject it into the same prompt we saw in the first example of this post. In this case, however, we’re injecting the message using C# string manipulation, rather than the Semantic Kernel templating engine, because when we use the ChatCompletionService we can’t use prompt templates. The prompt works in the same way, though: we use the ask function of the plugin to provide an answer to the question and then we share it with the user. This time, we just return the plain answer, without converting it into a business mail.
Before executing the prompt by calling GetChatMessageContentAsync(), we add it to the ChatHistory collection, setting the role as User, since this is the message written by the user. Once we have a result, we show it to the user and add it to the ChatHistory collection as well, this time setting the role as Assistant, since it was generated by the LLM. Thanks to this approach, every time we invoke the GetChatMessageContentAsync() method we supply the entire chat history to the LLM, which enables us to retain the context of the conversation. Let’s see an example. Run the application and ask What is Contoso Electronics?. The model will answer with a description of the company, taken from the employee handbook.
Now let’s ask a follow-up question, like And which are its values?. This time, the model will correctly reply with the values of Contoso Electronics described in the handbook.
As you can see, we didn’t have to specify the full context again. Instead of asking Which are the Contoso Electronics values?, we just asked And which are its values?, but the LLM was able to understand the context of the conversation and provide the right answer thanks to the usage of the ChatHistory collection.
Wrapping up
Kernel Memory is a powerful service to implement RAG experiences in your applications and to enable LLMs to work with private data. While Kernel Memory alone works fine for simple Q&A scenarios, it really shines when you use it in combination with Semantic Kernel, since you can enable more complex AI workflows. In this post we have seen two examples:
- The ability to combine the private knowledge stored in a vector database with other plugins and functions.
- The implementation of a chat experience, which provides a more natural and powerful Q&A experience to the user.
You can find the complete code of this post on GitHub.
Happy coding!