If you have followed my posts about Semantic Kernel and you are familiar with AI terminology, you should have realized by now that basically every sample we have built so far could be identified as a single-agent scenario. In the AI ecosystem, in fact, we call agents intelligent actors that are able to perform a task, given a set of instructions and, optionally, tools they can use to perform it. Semantic Kernel makes it really easy to build such an agent: thanks to features like plugins, prompt functions and function calling, we can easily let an LLM figure out the best plan to perform a task and return the result to us.
What if, however, the task is too complex to be performed by a single agent? What if, in order to fulfill the task, you need the LLM to play different roles? Think, for example, of software development. When you build a project, you need a PM who fleshes out the specifications; an architect who designs the architecture; a developer who writes the code; an IT expert who deploys the needed infrastructure. Let's say now we want to use an LLM to implement this scenario. Since LLMs have been trained on a huge amount of knowledge, technically you could try to write a prompt that instructs the LLM to play all these different roles and come up with a solution. In practice, however, this would be a nightmare: the prompt would be too complex, and the LLM would struggle to follow it.
This is why, in the AI ecosystem, many frameworks are introducing the concept of multi-agent scenarios. Instead of trying to accomplish a task with a single agent, you can split the task into multiple sub-tasks and assign each sub-task to a different agent. Each agent will then perform its task and communicate with the other agents to achieve the final goal. This way, the complexity of the task is split into smaller, more manageable pieces, and each agent can focus on its specific role. Additionally, each agent can be more carefully specialized, which simplifies the orchestration for the LLM. For example, if you have a plugin to generate code, you'll assign it only to the agent that plays the role of a developer.
In this post, we'll take a look at how to create multi-agent scenarios in Semantic Kernel. This feature is still experimental and requires the usage of a preview NuGet package, but it works! We're going to use it to create a travel agency scenario: we want to visit a city, and different agents will help us plan the trip. Let's start!
Setting up the project
Setting up the project is no different than what we did in the past. For this sample, we’re going to create a standard .NET Console application and we’re going to install the following NuGet packages:
- Microsoft.SemanticKernel
- Microsoft.SemanticKernel.Agents.Core. This one is marked as prerelease so, if you're installing the package using the NuGet Package Manager, you need to check the "Include prerelease" checkbox.
Then, you’ll need to initialize the kernel using the OpenAI or Azure OpenAI services. In my case, I’m going to use the Chat Completion APIs from Azure OpenAI:
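Here's a minimal sketch of the initialization; the configuration keys are just the ones I picked for my setup, so adapt them to yours:

```csharp
using Microsoft.Extensions.Configuration;
using Microsoft.SemanticKernel;

// Read the Azure OpenAI settings from user secrets
// (requires the Microsoft.Extensions.Configuration.UserSecrets package).
var configuration = new ConfigurationBuilder()
    .AddUserSecrets<Program>()
    .Build();

// Create the kernel and register the Azure OpenAI chat completion service.
var builder = Kernel.CreateBuilder();
builder.AddAzureOpenAIChatCompletion(
    deploymentName: configuration["AzureOpenAI:Deployment"]!,
    endpoint: configuration["AzureOpenAI:Endpoint"]!,
    apiKey: configuration["AzureOpenAI:ApiKey"]!);

Kernel kernel = builder.Build();
```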
The Azure OpenAI configuration, in my case, is stored in a user secret file.
The last step is to suppress all the warnings that the compiler will raise because the Microsoft.SemanticKernel.Agents.Core
package is still in preview. The easiest way to do that is to open the project file and add the following line:
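In my case, the warnings raised by the agent APIs use the SKEXP0110 diagnostic ID; check the IDs reported by your build, since they may vary between package versions:

```xml
<PropertyGroup>
  <!-- Suppress the "experimental feature" warnings raised by the preview agents package -->
  <NoWarn>$(NoWarn);SKEXP0110</NoWarn>
</PropertyGroup>
```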
Now that the project is set up, let’s create the agents!
Creating the agents
Let’s explain in a more detailed way the scenario that we’re going to build. We’re going to envision a travel agency, which has the following employees:
- A travel expert, who has experience in finding the best hotels, restaurants, and attractions in a city.
- A flight expert, who has experience in finding the best travel options via plane.
- A travel manager, who reviews the entire plan and makes sure that everything is in order.
Let’s start by defining these three roles. You’ll realize that the most complex part of defining an agent isn’t really writing the code, but rather writing a good prompt that instructs the LLM to play the role. Let’s start with the travel expert:
The travel expert
Let’s define the instructions for the travel expert:
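Something along these lines works well; the agent name (TravelAgent) and the exact wording are my own choices, so feel free to tweak them:

```csharp
const string TravelExpertName = "TravelAgent";
const string TravelExpertInstructions =
    """
    You are a travel expert working for a travel agency.
    Your goal is to suggest the best hotels, restaurants and attractions in the city the user wants to visit,
    based on the budget and the preferences they share.
    You must not suggest any traveling option, like flights or trains: another colleague takes care of that.
    Stay focused on your task: don't engage in chit-chat and don't go off topic.
    """;
```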
The instructions are pretty detailed and they explain to the LLM what the travel expert can and can't do. The travel expert can suggest hotels, restaurants and places to see, but it can't suggest traveling options. It's also laser-focused on the goal and doesn't waste time with chit-chat. Providing these details is important, because LLMs can be very chatty and can easily go off topic. With these instructions, we make sure that the travel expert will stick to suggesting hotels, restaurants and places to see, leaving the travel organization to another expert. Notice that we have also defined the name of the agent. This is important because we'll use this name to refer to the agent in the orchestration phase.
Now we can use the Semantic Kernel APIs included in the Microsoft.SemanticKernel.Agents namespace to create the agent:
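The creation boils down to a few lines:

```csharp
using Microsoft.SemanticKernel.Agents;

ChatCompletionAgent travelExpert = new()
{
    Name = TravelExpertName,
    Instructions = TravelExpertInstructions,
    Kernel = kernel
};
```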
We use the ChatCompletionAgent
class and we supply the name of the agent, the instructions we have defined and a reference to the kernel.
Now let’s move to the flight expert.
The flight expert
The code is the same as the one we have seen before; we just change the name of the agent and the set of instructions:
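Here's a possible set of instructions; again, the name (FlightExpert) and the wording are just the ones I picked:

```csharp
const string FlightExpertName = "FlightExpert";
const string FlightExpertInstructions =
    """
    You are an expert in flights working for a travel agency.
    Your goal is to suggest the best flight options to reach the city the user wants to visit,
    based on the departure location, the budget and the preferences they share.
    You must not suggest hotels, restaurants or attractions, and you must not suggest other means of
    transportation, like trains: other colleagues take care of that.
    Stay focused on your task: don't engage in chit-chat and don't go off topic.
    """;
```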
Similarly to what we did with the travel expert, we create a detailed set of instructions, so that the agent will stick to creating flight plans and won't try to give suggestions about other related topics, like hotels or restaurants, or other travel options, like trains.
Then we create a new agent out of the instructions:
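As before, creating the agent is straightforward:

```csharp
ChatCompletionAgent flightExpert = new()
{
    Name = FlightExpertName,
    Instructions = FlightExpertInstructions,
    Kernel = kernel
};
```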
Let’s move to the final agent of our agency: the travel manager.
The travel manager
The travel manager is the agent that will validate the work of the other agents. It will receive the plans from the travel expert and the flight expert and it will review them to make sure that everything is in order. Let’s define the instructions:
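A possible version looks like this; TravelManager is the name I'll reference later in the orchestration prompts:

```csharp
const string TravelManagerName = "TravelManager";
const string TravelManagerInstructions =
    """
    You are a travel manager working for a travel agency.
    Your goal is to review the travel plan created by your colleagues and make sure it satisfies
    the user's requirements: it must include hotels, restaurants, attractions and the traveling options.
    If the plan is complete, recap it in a table and state that the plan is approved.
    If the plan is missing something or can be improved, explain what is missing so that a new plan can be created.
    """;
```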
The travel manager can lead the conversation down two different paths: if the plan is good, it's going to recap it with a table and it's going to say "the plan is approved". As you're going to see in a bit, we're going to use this sentence as the signal that the conversation is completed. If, instead, the plan can be improved, it's going to point out what's missing so that a new one can be created.
Let’s create the agent, as usual:
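And the agent itself:

```csharp
ChatCompletionAgent travelManager = new()
{
    Name = TravelManagerName,
    Instructions = TravelManagerInstructions,
    Kernel = kernel
};
```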
Now we can start to enter into the orchestration phase.
Defining the termination logic
When you build a multi-agent scenario, you need to define a termination logic. This logic is used to determine when the conversation is completed and the chat between the agents must end. Semantic Kernel provides a few classes to support different scenarios. One of the most powerful, which we're going to use today, is the KernelFunctionTerminationStrategy, which enables you to define the logic with a prompt, wrapped in a KernelFunction object.
Let’s see how to define the termination logic for our scenario:
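Here's a sketch of it; the exact wording of the prompt is mine, what matters is the "yes" convention and the history placeholder:

```csharp
KernelFunction terminationFunction = KernelFunctionFactory.CreateFromPrompt(
    """
    Determine if the travel plan has been approved. The plan is approved only when the travel manager
    explicitly states that the plan is approved.
    If the plan has been approved, respond with a single word: yes.

    History:
    {{$history}}
    """);
```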
We use the KernelFunctionFactory.CreateFromPrompt() method to create a new KernelFunction from a prompt which describes our logic. Remember that, when we set up the instructions for the travel manager, we said it must use the sentence "the plan is approved" when the plan is good enough? As such, we use the prompt to determine if the plan has been approved and, in that case, to respond with the word "yes".
We also supply a parameter called history, which incorporates all the chat history, so that the agent has the full context.
Defining the selection logic
The next step is to define the selection logic, which determines how the orchestration picks the next agent in the conversation. Semantic Kernel includes a few classes to manage this scenario as well. The simplest one is called SequentialSelectionStrategy, which simply executes the agents one after the other. In our case, however, we need a more complex strategy, since agents might need to go back and forth to complete the task. As such, also in this case we're going to use a strategy based on a prompt, by leveraging the KernelFunctionSelectionStrategy class. Let's take a look at our prompt:
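This is a possible version of it, written as a C# interpolated raw string so that the agent names defined earlier can be injected with the {{{ }}} syntax; the wording is mine, what matters are the list of participants and the ordering rules:

```csharp
KernelFunction selectionFunction = KernelFunctionFactory.CreateFromPrompt(
    $$$"""
    Your job is to determine which participant takes the next turn in a conversation, based on the most recent message.
    State only the name of the participant that takes the next turn.

    Choose only from these participants:
    - {{{TravelExpertName}}}
    - {{{FlightExpertName}}}
    - {{{TravelManagerName}}}

    Always follow these rules when selecting the next participant:
    - After the user shares the details of the trip, it's {{{TravelExpertName}}}'s turn to create a travel plan.
    - After {{{TravelExpertName}}} has created a travel plan, it's {{{FlightExpertName}}}'s turn to add the flight options.
    - After {{{FlightExpertName}}} has replied, it's {{{TravelManagerName}}}'s turn to review the whole plan and approve it.
    - If the plan isn't approved, it's {{{TravelExpertName}}}'s turn to create a new plan.

    History:
    {{$history}}
    """);
```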
The prompt is pretty detailed and it explains to the orchestration how to pick the next agent in the conversation. As you can see, this logic is a bit more complex than the termination one. To avoid hallucinations as much as possible, we provide a set of very strict rules, leveraging the names of the agents that we have previously defined and incorporating them using the {{{ }}} syntax.
First, we list the available participants. Then we define the following order of interaction:
- After the user has provided information about the trip, it’s the travel agent’s turn to create a trip plan.
- Due to the instructions we have provided, the trip plan will lack the travel options. As such, we're going to pass the conversation to the flight expert, who will generate a flight plan.
- Then it's the travel manager's turn: it will review the plan and approve it if it's good enough. If the plan isn't good, the conversation will go back to the travel agent, who will generate a new plan.
Putting it all together
Now we have everything we need to kick off the conversation:
- Our agents
- A termination strategy
- A selection strategy
A conversation in Semantic Kernel is represented by the AgentGroupChat
class. Let’s see how to initialize it:
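Here's the overall setup; variable names follow the previous snippets, and MaximumIterations is just a value I picked:

```csharp
using Microsoft.SemanticKernel.Agents.Chat;

AgentGroupChat chat = new(travelExpert, flightExpert, travelManager)
{
    ExecutionSettings = new()
    {
        // The chat ends when the travel manager's answer to the termination prompt contains "yes".
        TerminationStrategy = new KernelFunctionTerminationStrategy(terminationFunction, kernel)
        {
            Agents = [travelManager],
            ResultParser = result =>
                result.GetValue<string>()?.Contains("yes", StringComparison.OrdinalIgnoreCase) ?? false,
            HistoryVariableName = "history",
            MaximumIterations = 10
        },
        // The selection prompt decides which agent takes the next turn.
        SelectionStrategy = new KernelFunctionSelectionStrategy(selectionFunction, kernel)
        {
            AgentsVariableName = "agents",
            HistoryVariableName = "history"
        }
    }
};
```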
First, in the constructor, we pass the list of agents that are going to participate in the conversation. Then we set up the execution settings, where we define the termination strategy and the selection strategy. As we said, we're going to use the KernelFunctionTerminationStrategy and the KernelFunctionSelectionStrategy classes. However, we need to provide more than just the prompt we have created to describe each strategy:
For the termination strategy, we need to provide:
- In the constructor, the KernelFunction object with the termination prompt we have previously created and a reference to the kernel.
- The agents that are in charge of the strategy. In this case, it's the travel manager, since it's the one that can approve the plan, leading to the conclusion of the conversation.
- The signal that the conversation is completed. In this case, we're going to check if the result contains the word yes (remember that the termination prompt instructs the LLM to say "yes" if the plan has been approved).
- The name of the variable which holds the history of the conversation, which in our case is history.
- The maximum number of iterations. This is a safety measure to avoid infinite loops, in case the termination strategy isn't met before.
For the selection strategy, instead, we must provide:
- Also in this case, the KernelFunction object with the prompt that represents the selection strategy we have previously created, and a reference to the kernel.
- The name of the variable which contains the list of agents, which is agents.
- The name of the variable which contains the history of the conversation, which is history.
Kicking off the conversation
Now that we have everything in place, we can start the conversation:
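Here's roughly what it looks like; the text of the prompt is just an example, so describe your own trip:

```csharp
using Microsoft.SemanticKernel.ChatCompletion;

string userPrompt =
    """
    I live in Italy and I would like to visit Paris for 3 days.
    I have a mid-range budget, I prefer to travel by plane and I love French cuisine.
    """;

// Add the user message to the conversation.
chat.AddChatMessage(new ChatMessageContent(AuthorRole.User, userPrompt));

// Iterate over the messages generated by the agents, as they are produced.
await foreach (ChatMessageContent message in chat.InvokeAsync())
{
    Console.WriteLine($"[{message.Role}] {message.AuthorName}");
    Console.WriteLine(message.Content);
    Console.WriteLine();
}
```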
First, we define the prompt that we want to submit as a user. In this case, we state that we live in Italy and we want to visit Paris. We also provide some information about the budget, the travel preferences and the duration of the trip.
Then we wrap the prompt into a ChatMessageContent
object and we add it to the conversation using the AddChatMessage()
method offered by the AgentGroupChat
class.
The results are returned by calling the InvokeAsync()
method, which returns an asynchronous enumerator, so we can use a foreach
loop prefixed by the await
keyword to iterate over them.
The user prompt will kick off the conversation according to our selection strategy and, at each iteration of the loop, the next agent in the conversation will take its turn. For each generated message, we print on the console:
- The role of the user that generated the message (the user, the assistant, etc.)
- The name of the agent
- The content of the message
Once the conversation is completed because the termination strategy has been met, the InvokeAsync() method won't return any more results and the IsComplete property of the AgentGroupChat object will be set to true.
Now you can run the code! The output will be quite long, so let me split it into multiple sections so that you can more easily see the different parts of the conversation. The first agent to intervene is the travel agent, which generates a 3-day plan that includes hotels, restaurants and attractions in Paris.
|
|
Then, the next actor in the conversation is the flight expert, who generates a flight plan to reach Paris:
|
|
Finally, the travel manager will review the plan, provide a recap with a table and approve it:
|
|
And now we’re ready to fly to Paris :-)
A different conversation
If you run the code we've written multiple times, you'll notice that the travel manager will almost always just approve the plan. Since the travel expert and the flight expert have worked to put together all the details, the plan is going to be good enough most of the time.
What if we want to test what happens in case, instead, the plan is not good? We can do that by changing our orchestration and removing the flight expert from the equation.
First, we must change the selection logic, since the flight expert won’t be involved in the conversation anymore:
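A sketch of the updated prompt; it has the same structure as before, just without the flight expert:

```csharp
KernelFunction selectionFunction = KernelFunctionFactory.CreateFromPrompt(
    $$$"""
    Your job is to determine which participant takes the next turn in a conversation, based on the most recent message.
    State only the name of the participant that takes the next turn.

    Choose only from these participants:
    - {{{TravelExpertName}}}
    - {{{TravelManagerName}}}

    Always follow these rules when selecting the next participant:
    - After the user shares the details of the trip, it's {{{TravelExpertName}}}'s turn to create a travel plan.
    - After {{{TravelExpertName}}} has created a travel plan, it's {{{TravelManagerName}}}'s turn to review and approve it.
    - If the plan isn't approved, it's {{{TravelExpertName}}}'s turn to create a new plan.

    History:
    {{$history}}
    """);
```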
Now we list only the travel agent and the travel manager as participants. Then we tweak the steps to follow to decide who the next participant in the conversation is: after the travel agent has provided the plan, the travel manager will immediately review it.
Finally, when we initialize the AgentGroupChat object, we must keep only these two agents in the collection:
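The initialization is the same as before, with only two agents in the constructor:

```csharp
AgentGroupChat chat = new(travelExpert, travelManager)
{
    ExecutionSettings = new()
    {
        TerminationStrategy = new KernelFunctionTerminationStrategy(terminationFunction, kernel)
        {
            Agents = [travelManager],
            ResultParser = result =>
                result.GetValue<string>()?.Contains("yes", StringComparison.OrdinalIgnoreCase) ?? false,
            HistoryVariableName = "history",
            MaximumIterations = 10
        },
        SelectionStrategy = new KernelFunctionSelectionStrategy(selectionFunction, kernel)
        {
            AgentsVariableName = "agents",
            HistoryVariableName = "history"
        }
    }
};
```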
Now you can run the code, passing as input the same prompt we used before to describe the city we want to visit and our budget and travel preferences. You should see the following conversation happening:
- The travel agent will generate a plan that, given the instructions we provided, will lack the travel options.
- The travel manager will report that the plan isn't good enough because the travel options are missing, so the conversation will go back to the travel agent, which will generate a new plan that includes them.
- The travel manager will then finalize and approve it.
What’s next?
You can use this sample project as a starting point to experiment with other scenarios and add even more complexity. In the final sample published on GitHub, for example, you’ll find that I’ve added another agent that can suggest train options to reach the destination. To support this new agent, you’ll find a slightly tweaked selection logic:
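Here's a sketch of the tweaked prompt; TrainExpert is the name I gave to the new agent:

```csharp
KernelFunction selectionFunction = KernelFunctionFactory.CreateFromPrompt(
    $$$"""
    Your job is to determine which participant takes the next turn in a conversation, based on the most recent message.
    State only the name of the participant that takes the next turn.

    Choose only from these participants:
    - {{{TravelExpertName}}}
    - {{{FlightExpertName}}}
    - {{{TrainExpertName}}}
    - {{{TravelManagerName}}}

    Always follow these rules when selecting the next participant:
    - After the user shares the details of the trip, it's {{{TravelExpertName}}}'s turn to create a travel plan.
    - After {{{TravelExpertName}}} has created a travel plan, choose {{{TrainExpertName}}} if the user prefers to travel by train,
      otherwise choose {{{FlightExpertName}}}.
    - After {{{FlightExpertName}}} or {{{TrainExpertName}}} has replied, it's {{{TravelManagerName}}}'s turn to review and approve the plan.
    - If the plan isn't approved, it's {{{TravelExpertName}}}'s turn to create a new plan.

    History:
    {{$history}}
    """);
```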
As you can see, besides adding the train expert to the list of participants, we have also added a new step to the selection logic. After the travel agent has provided the plan, the selection logic will decide whether the next participant is the flight expert or the train expert, based on the user's preferences. This way, you can create a more complex scenario where the conversation can take different paths based on the user input. To test this, try specifying your preference to travel by train in the user prompt:
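For example, something like this (the wording is just illustrative):

```csharp
string userPrompt =
    """
    I live in Italy and I would like to visit Paris for 3 days.
    I have a mid-range budget, I love French cuisine and I prefer to travel by train rather than by plane.
    """;

chat.AddChatMessage(new ChatMessageContent(AuthorRole.User, userPrompt));
```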
If you run the conversation, you’ll notice that this time the flight expert will be skipped and the conversation will flow from the travel agent to the train expert to the travel manager for approval.
Wrapping up
In this post, we have tasted the potential of AI agents and multi-agent scenarios. It may look a bit like role-playing, but it opens up a new world of possibilities in the AI ecosystem. By splitting a complex task into multiple sub-tasks and assigning each sub-task to a different agent, you are much more likely to complete the task you want to perform in an efficient and reliable way. This is especially true when you're working with LLMs, which can easily get lost in complex prompts. By using multi-agent scenarios, you can provide a more focused and specialized set of instructions to each agent, which makes the orchestration much easier and the results more reliable. Semantic Kernel greatly helps in building these scenarios, by providing all the infrastructure you need to create agents, orchestrate them, define the termination and selection strategies, and much more. Additionally, even though we haven't touched on it in this post, don't forget that all the Semantic Kernel features we have learned in the past posts are still available in multi-agent scenarios. This means that you can still use plugins, prompt functions, and function calling to create more complex and powerful agents.
You can find the sample I’ve created in this post on GitHub.