12 changes: 10 additions & 2 deletions README.md
@@ -1,15 +1,23 @@
# Agentune: The Path to Self-Improving AI Agents

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Twitter Follow](https://img.shields.io/twitter/follow/agentune_sb?style=social)](https://x.com/agentune_sb)
[![Discord](https://img.shields.io/discord/1375004885845807114?color=7289da&label=discord&logo=discord&logoColor=white)](https://discord.gg/Hx5YYAaebz)

---

At SparkBeyond, we believe that operational AI agents in general, and customer-facing AI agents in particular, should improve effectively over time.

Operational agents are designed and built with KPIs in mind. To optimize an agent, whether it's a sales agent optimizing conversion rates and order values, or a support agent increasing the customer satisfaction rate, we systematically apply the cycle:

<div style="text-align: center">

**Evaluate → Analyze → Improve → Simulate**
**Analyze → Improve → Simulate**

</div>

That is, we measure the KPIs for the current behavior of the agent, analyze what is driving each KPI up or down, and then propose decisions and actions to improve each KPI. The improved agent is then tested using a customer simulation. Once tested and deployed, we restart the cycle with the agent's new operational data!
That is, we analyze what is driving each KPI up or down and propose decisions and actions to improve each KPI. The improved agent is then tested using a customer simulation. Once tested and deployed, we restart the cycle with the agent's new operational data!

We plan to release the modules that make up this framework over the coming months. We are starting with the <a href="https://github.com/SparkBeyond/agentune/tree/main/agentune_simulate/">Agentune Simulate</a> customer simulator.

Looking to use Agentune and need help? Please contact us. We are committed to assisting early adopters in making the most of it!
25 changes: 20 additions & 5 deletions agentune_simulate/README.md
@@ -1,24 +1,36 @@
# Agentune Simulate

Developing your customer-facing conversational AI agent? Want to ensure it behaves as expected before going live? Agentune Simulate is here to help!
[![CI](https://github.com/SparkBeyond/agentune/actions/workflows/python-tests.yml/badge.svg?label=CI)](https://github.com/SparkBeyond/agentune/actions)
[![PyPI version](https://badge.fury.io/py/agentune-simulate.svg)](https://pypi.org/project/agentune-simulate/)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Twitter Follow](https://img.shields.io/twitter/follow/agentune_sb?style=social)](https://x.com/agentune_sb)
[![Discord](https://img.shields.io/discord/1375004885845807114?color=7289da&label=discord&logo=discord&logoColor=white)](https://discord.gg/Hx5YYAaebz)

---

**Launching an AI Agent? Stop guessing, start simulating.**

Many developers and data scientists struggle to test and validate AI agents effectively. Some deploy directly to production, testing on real customers! Others perform A/B testing, which also means testing on real customers. Many rely on predefined tests that cover main use cases but fail to capture real user intents.

Agentune Simulate creates a customer simulator (twin) based on a set of real conversations. It captures the essence of your customers' inquiries and the way they converse, allowing you to simulate conversations with your AI agent, ensuring it behaves as expected before deployment.

Ready to deploy your improved AI agent? Use Agentune Simulate to validate it first against real customer interactions!

## How It Works
**Need help?** Please contact us. We are committed to assisting early adopters in making the most of it!

## How Does It Work?

![Agentune Simulate Workflow](https://github.com/SparkBeyond/agentune/blob/main/agentune_simulate/docs/images/agentune-simulate-flow.png)
Running a simulation with Agentune Simulate generates realistic conversations between your AI agent and simulated customers. This lets you evaluate your agent's performance, identify edge cases, and validate behavior before real deployment.

![Agentune Simulate Workflow](https://raw.githubusercontent.com/SparkBeyond/agentune/main/agentune_simulate/docs/images/agentune-simulate-flow.png)

**How do we validate the twin customer simulator?** We create a twin AI-Agent and let the two converse. We then evaluate the conversations to check that the customer simulator behaves like the real customer:

1. **Capture Conversations** - Collect real conversations between customers and your existing AI-agent
2. **Create Simulator** - Create a twin Customer Simulator and a twin AI-Agent from the captured conversations
3. **Simulate & Evaluate** - Simulate interactions to evaluate whether the twin Customer Simulator behaves like your real customers

**Connect a Real Agent** - Now you can integrate your real agent system and run simulations with simulated customers to validate agent behavior
![Agentune Simulate Workflow](https://raw.githubusercontent.com/SparkBeyond/agentune/main/agentune_simulate/docs/images/agentune-simulate-validation-flow.png)

## Quick Start

@@ -49,9 +61,12 @@ Ready to deploy your improved AI agent? Use Agentune Simulate to validate it fir

1. **Quick Start** - [`getting_started.ipynb`](https://github.com/SparkBeyond/agentune/blob/main/agentune_simulate/examples/getting_started.ipynb) for a quick getting started example
2. **Production Setup** - [`persistent_storage_example.ipynb`](https://github.com/SparkBeyond/agentune/blob/main/agentune_simulate/examples/persistent_storage_example.ipynb) for a more realistic, scalable, persistent example
3. **Validate _Your_ Data** - Adapt the 2nd example to load _your_ conversation data and validate the simulation
3. **Validate _Your_ Data** - Adapt the 2nd example to load _your_ conversation data and validate the simulation.
Here is an example of how to load conversations from tabular data: [`load_conversations_from_csv.ipynb`](https://github.com/SparkBeyond/agentune/blob/main/agentune_simulate/examples/load_conversations_from_csv.ipynb)
4. **Connect Real Agent** - [`real_agent_integration.ipynb`](https://github.com/SparkBeyond/agentune/blob/main/agentune_simulate/examples/real_agent_integration.ipynb) for integrating your existing agent systems
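
As a rough illustration of the tabular loading mentioned in step 3, the sketch below groups message rows into per-conversation records with pandas. The `SketchMessage` and `SketchConversation` dataclasses and the `group_conversations` helper are hypothetical stand-ins, not the Agentune API, and the column names (`conversation_id`, `sender`, `content`, `timestamp`, `name`) are assumed to match the sample tables in the example notebook. In real use you would build the library's own `Conversation`, `Message`, and `Outcome` objects as the notebook does.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple

import pandas as pd


# Hypothetical stand-in types for illustration only; the real classes live in
# agentune.simulate.models and their fields may differ.
@dataclass(frozen=True)
class SketchMessage:
    sender: str
    content: str
    timestamp: datetime


@dataclass(frozen=True)
class SketchConversation:
    conversation_id: str
    messages: Tuple[SketchMessage, ...]
    outcome_name: Optional[str]


def group_conversations(
    messages_df: pd.DataFrame, outcomes_df: pd.DataFrame
) -> List[SketchConversation]:
    """Group message rows by conversation_id and attach the outcome name, if any."""
    # Map conversation_id -> outcome name; conversations with no outcome row get None.
    outcome_by_id = dict(zip(outcomes_df["conversation_id"], outcomes_df["name"]))

    # Parse timestamps so that sorting is chronological rather than lexicographic.
    parsed = messages_df.assign(timestamp=pd.to_datetime(messages_df["timestamp"]))

    conversations = []
    # sort=False keeps conversations in order of first appearance.
    for conv_id, group in parsed.groupby("conversation_id", sort=False):
        ordered = group.sort_values("timestamp")
        messages = tuple(
            SketchMessage(
                sender=row.sender,
                content=row.content,
                timestamp=row.timestamp.to_pydatetime(),
            )
            for row in ordered.itertuples(index=False)
        )
        conversations.append(
            SketchConversation(conv_id, messages, outcome_by_id.get(conv_id))
        )
    return conversations


# Small made-up sample; the first two rows are deliberately out of order.
messages_df = pd.DataFrame([
    {"conversation_id": "conv_001", "sender": "agent",
     "content": "We can arrange a replacement.", "timestamp": "2024-05-10T09:17:30"},
    {"conversation_id": "conv_001", "sender": "customer",
     "content": "I received a damaged product.", "timestamp": "2024-05-10T09:15:00"},
    {"conversation_id": "conv_002", "sender": "customer",
     "content": "Is your warranty transferable?", "timestamp": "2024-05-15T14:35:22"},
])
outcomes_df = pd.DataFrame([
    {"conversation_id": "conv_001", "name": "resolved"},
])

conversations = group_conversations(messages_df, outcomes_df)
print(f"Created {len(conversations)} conversations")
```

The sketch sorts messages within each conversation by parsed timestamp and tolerates conversations that have no outcome row; whether the real loader should do the same depends on your data.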

📧 **Need help? Have feedback?** Contact us at [agentune-dev@sparkbeyond.com](mailto:agentune-dev@sparkbeyond.com)

## Contributing

- **Environment Setup**: [Environment Setup Guide](https://github.com/SparkBeyond/agentune/blob/main/agentune_simulate/docs/development/environment-setup.md)
156 changes: 80 additions & 76 deletions agentune_simulate/examples/loading_conversations.ipynb
@@ -7,9 +7,10 @@
"source": [
"## Loading Conversations from DataFrames\n",
"\n",
"In some cases, conversation data might be available in DataFrames rather than JSON. \n",
"For example, you might have a DataFrame for messages and another for outcomes.\n",
"This section demonstrates how to create Conversation objects from these separate DataFrames."
" Often conversation data is available in tabular format.\n",
" For example, you might have a table for messages and another for outcomes.\n",
" This section demonstrates how to create Conversation objects from these separate\n",
" DataFrames."
]
},
{
@@ -20,22 +21,22 @@
},
{
"cell_type": "code",
"execution_count": 1,
"id": "de8b2ef74acf3b90",
"metadata": {
"ExecuteTime": {
"end_time": "2025-07-23T07:20:39.789891Z",
"start_time": "2025-07-23T07:20:37.034378Z"
"end_time": "2025-07-23T08:55:09.792380Z",
"start_time": "2025-07-23T08:55:06.943924Z"
}
},
"outputs": [],
"source": [
"# Import necessary libraries\n",
"import pandas as pd\n",
"\n",
"# Import Agentune simulate components\n",
"from agentune.simulate.models import Conversation, Message, Outcome, ParticipantRole"
]
],
"outputs": [],
"execution_count": 1
},
{
"cell_type": "markdown",
@@ -49,15 +50,13 @@
},
{
"cell_type": "code",
"execution_count": 2,
"id": "e7eb6ab7e40c513c",
"metadata": {
"ExecuteTime": {
"end_time": "2025-07-23T07:20:40.794988Z",
"start_time": "2025-07-23T07:20:40.790312Z"
"end_time": "2025-07-23T08:55:09.836941Z",
"start_time": "2025-07-23T08:55:09.827878Z"
}
},
"outputs": [],
"source": [
"# Create sample DataFrames that might come from a database or CSV files\n",
"# First, let's create a DataFrame for messages\n",
@@ -75,7 +74,9 @@
" {'conversation_id': 'conv_001', 'name': 'resolved', 'description': 'Issue was successfully resolved'},\n",
" {'conversation_id': 'conv_002', 'name': 'unresolved', 'description': 'Issue was not resolved'}\n",
"])"
]
],
"outputs": [],
"execution_count": 2
},
{
"cell_type": "markdown",
@@ -85,14 +86,18 @@
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a09f03f65492f308",
"metadata": {
"ExecuteTime": {
"end_time": "2025-07-23T07:20:42.075055Z",
"start_time": "2025-07-23T07:20:42.063256Z"
"end_time": "2025-07-23T08:55:10.093296Z",
"start_time": "2025-07-23T08:55:10.075027Z"
}
},
"source": [
"# Display the DataFrames\n",
"print(\"Messages DataFrame:\")\n",
"messages_df.head()"
],
"outputs": [
{
"name": "stdout",
@@ -103,6 +108,28 @@
},
{
"data": {
"text/plain": [
" conversation_id sender \\\n",
"0 conv_001 customer \n",
"1 conv_001 agent \n",
"2 conv_001 customer \n",
"3 conv_002 customer \n",
"4 conv_002 agent \n",
"\n",
" content \\\n",
"0 I received a damaged product and need a replac... \n",
"1 I apologize for the inconvenience. We can arra... \n",
"2 Please do, and I expect a refund on the delive... \n",
"3 Is your warranty transferable if I sell the pr... \n",
"4 Yes, our warranty stays with the product for t... \n",
"\n",
" timestamp \n",
"0 2024-05-10T09:15:00.000000 \n",
"1 2024-05-10T09:17:30.000000 \n",
"2 2024-05-10T09:21:05.000000 \n",
"3 2024-05-15T14:35:22.000000 \n",
"4 2024-05-15T14:38:45.000000 "
],
"text/html": [
"<div>\n",
"<style scoped>\n",
@@ -167,51 +194,28 @@
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" conversation_id sender \\\n",
"0 conv_001 customer \n",
"1 conv_001 agent \n",
"2 conv_001 customer \n",
"3 conv_002 customer \n",
"4 conv_002 agent \n",
"\n",
" content \\\n",
"0 I received a damaged product and need a replac... \n",
"1 I apologize for the inconvenience. We can arra... \n",
"2 Please do, and I expect a refund on the delive... \n",
"3 Is your warranty transferable if I sell the pr... \n",
"4 Yes, our warranty stays with the product for t... \n",
"\n",
" timestamp \n",
"0 2024-05-10T09:15:00.000000 \n",
"1 2024-05-10T09:17:30.000000 \n",
"2 2024-05-10T09:21:05.000000 \n",
"3 2024-05-15T14:35:22.000000 \n",
"4 2024-05-15T14:38:45.000000 "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Display the DataFrames\n",
"print(\"Messages DataFrame:\")\n",
"messages_df.head()"
]
"execution_count": 3
},
{
"cell_type": "code",
"execution_count": 4,
"id": "49f887740e4d53f",
"metadata": {
"ExecuteTime": {
"end_time": "2025-07-23T07:20:43.143540Z",
"start_time": "2025-07-23T07:20:43.140615Z"
"end_time": "2025-07-23T08:55:10.255619Z",
"start_time": "2025-07-23T08:55:10.249131Z"
}
},
"source": [
"print(\"Outcomes DataFrame:\")\n",
"outcomes_df.head()"
],
"outputs": [
{
"name": "stdout",
@@ -222,6 +226,11 @@
},
{
"data": {
"text/plain": [
" conversation_id name description\n",
"0 conv_001 resolved Issue was successfully resolved\n",
"1 conv_002 unresolved Issue was not resolved"
],
"text/html": [
"<div>\n",
"<style scoped>\n",
@@ -262,22 +271,14 @@
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" conversation_id name description\n",
"0 conv_001 resolved Issue was successfully resolved\n",
"1 conv_002 unresolved Issue was not resolved"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print(\"Outcomes DataFrame:\")\n",
"outcomes_df.head()"
]
"execution_count": 4
},
{
"cell_type": "markdown",
@@ -287,15 +288,13 @@
},
{
"cell_type": "code",
"execution_count": 5,
"id": "3a1a29ab971003f1",
"metadata": {
"ExecuteTime": {
"end_time": "2025-07-23T07:20:44.221945Z",
"start_time": "2025-07-23T07:20:44.216380Z"
"end_time": "2025-07-23T08:55:10.382812Z",
"start_time": "2025-07-23T08:55:10.377968Z"
}
},
"outputs": [],
"source": [
"def create_conversations_from_dataframes(\n",
" messages_df: pd.DataFrame,\n",
@@ -343,7 +342,9 @@
" conversations.append(conversation)\n",
" \n",
" return conversations"
]
],
"outputs": [],
"execution_count": 5
},
{
"cell_type": "markdown",
@@ -353,14 +354,20 @@
},
{
"cell_type": "code",
"execution_count": 6,
"id": "81c0d9b2dd1918f3",
"metadata": {
"ExecuteTime": {
"end_time": "2025-07-23T07:20:47.074902Z",
"start_time": "2025-07-23T07:20:47.066741Z"
"end_time": "2025-07-23T08:55:10.518464Z",
"start_time": "2025-07-23T08:55:10.497257Z"
}
},
"source": [
"# Convert the DataFrames to Conversation objects\n",
"conversations = create_conversations_from_dataframes(messages_df, outcomes_df)\n",
"\n",
"# Report how many conversations were created\n",
"print(f\"Created {len(conversations)} conversations from DataFrames\")"
],
"outputs": [
{
"name": "stdout",
@@ -370,23 +377,20 @@
]
}
],
"source": [
"# Convert the DataFrames to Conversation objects\n",
"conversations = create_conversations_from_dataframes(messages_df, outcomes_df)\n",
"\n",
"# Report how many conversations were created\n",
"print(f\"Created {len(conversations)} conversations from DataFrames\")"
]
"execution_count": 6
},
{
"cell_type": "code",
"execution_count": null,
"id": "591b89043c0b2c76",
"metadata": {},
"outputs": [],
"cell_type": "markdown",
"source": [
"# Now with the conversations in the right format, we can load them into a vector store and run simulations"
]
"Now with the conversations in the right format, we can load them into a vector store and\n",
"run simulations.\n",
"\n",
"For a complete example of setting up persistent vector stores and running simulations at\n",
"scale, see the [Production Setup notebook](https://github.com/SparkBeyond/agentune/blob/main/agentune_simulate/examples/persistent_storage_example.ipynb).\n",
"\n"
],
"id": "489dd5892836dfbe"
}
],
"metadata": {},