![AutoGen + LangChain + SQLite + Function Schema = Super AI Chatbot](https://cdn-images-1.medium.com/max/1100/1*GbNRDJckfRTps0tpRtraAg.jpeg)
Over the past year, I have built many AI chatbots, most of them on AutoGen and LangChain.
In this post, I am going to teach you everything I know about building a Super AI chatbot using AutoGen, LangChain, SQLite, and a function schema, so that you can have the same sort of success in AI engineering by utilising this technology.
You will be able to copy and paste my code and get started building these Super AI chatbots for your business and your personal use as quickly as possible.
The best part about what I’m about to show you is that this method gives you complete flexibility and customization over how your app works and how the SQLite database and function schema are processed.
For those who may not be as familiar with technical details, let me provide a bit of background to make things easier to understand.
What is LangChain:
LangChain is a robust library designed to simplify interactions with various large language model (LLM) providers, including OpenAI, Cohere, Bloom, Hugging Face, and others. What sets LangChain apart is its signature feature: the ability to create Chains, logical connections that bridge one or more LLM calls into a single pipeline.
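The idea of a Chain can be sketched in plain Python. Note that this is only a conceptual illustration, not LangChain’s actual API: each step’s output becomes the next step’s input.

```python
# Conceptual sketch of "chaining" (NOT LangChain's real API):
# prompt template -> model call -> output parser, composed in order.

def make_prompt(question: str) -> str:
    # Step 1: a prompt template fills user input into a fixed instruction.
    return f"Answer concisely: {question}"

def fake_llm(prompt: str) -> str:
    # Step 2: stand-in for a call to an LLM provider (OpenAI, Cohere, ...).
    return f"[model reply to: {prompt}]"

def parse_output(raw: str) -> str:
    # Step 3: an output parser cleans up the raw completion.
    return raw.strip("[]")

def run_chain(question: str) -> str:
    # A chain is just these steps composed in sequence.
    result = question
    for step in (make_prompt, fake_llm, parse_output):
        result = step(result)
    return result

print(run_chain("What is LangChain?"))
```

LangChain packages exactly this composition pattern, plus provider integrations, so you do not have to hand-wire each step.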
What are AutoGen Agents:
AutoGen is a multi-agent framework launched by Microsoft that allows users to create and manage multiple autonomous agents that collaborate to complete complex tasks. The framework is extremely flexible: you can define different agents and their roles according to your needs, and then let them work together.
This multi-agent collaboration approach not only improves the efficiency of task completion but also the quality of the results, especially in areas such as programming, planning, and creative writing.
AutoGen is a multi-agent framework, while LangChain is primarily a single-agent framework.
AutoGen is more focused on agent orchestration and code generation, while LangChain is more focused on general-purpose LLM tasks.
Also see: TaskWeaver + Planner + Plugin = Super AI Agent
What is SQLite:
- SQLite is one of the most popular databases for embedded software development. Essentially, it is a library written in C that offers a relational database management system.
- SQLite is open-source, serverless, flexible, and cross-platform; it requires no configuration and can handle low to medium traffic levels.
- SQLite is a preferred choice among database administrators (DBAs) and software developers alike, due to its simplicity and efficiency.
What is a Function Schema:
The function-calling schema, in the context of the OpenAI GPT-4 APIs, lets a user describe functions and have the model intelligently choose to output a JSON object containing the arguments needed to call one or more of them.
The schema includes the descriptions and parameters of the available functions, so the model can produce well-formed arguments for them; your own code then performs the actual function call.
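As a concrete illustration, here is what such a schema might look like for a hypothetical SQL-query tool (the name and description below are made up for this example). It is an ordinary JSON-serializable dictionary whose `parameters` field follows JSON Schema:

```python
import json

# A hypothetical function schema in the shape the OpenAI
# function-calling API expects: name, description, JSON-Schema parameters.
run_sql_schema = {
    "name": "sql_db_query",
    "description": "Execute a SQL query against the database and return the result.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "A syntactically correct SQLite query.",
            }
        },
        "required": ["query"],
    },
}

print(json.dumps(run_sql_schema, indent=2))
```

Later in this post, we will generate schemas of exactly this shape automatically from LangChain’s tools.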
Let’s start coding
We need to build a system that can run SQL commands directly from AutoGen. To do this, the process involves several steps: first, we’ll examine the available tables, then create and run an SQL query, and finally, present the results in a format like text, graphs, or tables.
Although this process involves multiple steps and can seem complex, we’re going to use the LangChain mechanism to make handling SQL from AutoGen much easier.
To make things easier, we’ve set up a way to use SQL through something we’re calling ‘Function Calling.’
This is possible using the agent toolkit in LangChain. As for the database, I chose SQLite since it’s simple to set up; however, you’re also free to use other databases like MySQL or PostgreSQL, as long as they’re compatible with LangChain.
Let’s start by installing the requirements. Assuming you have created a new Python project and set up a virtual environment, run the command:
!pip install langchain "pyautogen[retrievechat]"
Now let’s import the required dependencies:
import os
import sqlite3
from typing import Any, Dict  # used by generate_llm_config below
import autogen
from langchain.agents import create_sql_agent
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
from langchain.utilities.sql_database import SQLDatabase
from langchain.llms import OpenAI
Step 1: Create The Database
First, let’s create the database the agent will use. I’ll keep it simple for now: a Books table, an Authors table, and a Publishers table.
def create_table(cursor, create_table_sql):
    """Execute a SQL statement to create a table."""
    cursor.execute(create_table_sql)

def insert_data(cursor, insert_sql, data):
    """Insert data into a table using the provided SQL statement."""
    cursor.executemany(insert_sql, data)

database = 'bookstore.db'

with sqlite3.connect(database) as conn:
    cursor = conn.cursor()

    # Create the Books table
    sql_create_books_table = """
    CREATE TABLE IF NOT EXISTS Books (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        title TEXT NOT NULL,
        author_id INTEGER,
        publisher_id INTEGER,
        price REAL
    );"""
    create_table(cursor, sql_create_books_table)

    # Create the Authors table
    sql_create_authors_table = """
    CREATE TABLE IF NOT EXISTS Authors (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL,
        biography TEXT
    );"""
    create_table(cursor, sql_create_authors_table)

    # Create the Publishers table
    sql_create_publishers_table = """
    CREATE TABLE IF NOT EXISTS Publishers (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL,
        address TEXT
    );"""
    create_table(cursor, sql_create_publishers_table)

    # Insert sample data into the Books table
    insert_books_sql = "INSERT INTO Books (title, author_id, publisher_id, price) VALUES (?, ?, ?, ?)"
    books_data = [
        ('To Kill a Mockingbird', 1, 1, 10.99),
        ('1984', 2, 2, 8.99)
    ]
    insert_data(cursor, insert_books_sql, books_data)

    # Insert sample data into the Authors table
    insert_authors_sql = "INSERT INTO Authors (name, biography) VALUES (?, ?)"
    authors_data = [
        ('Harper Lee', 'Author of To Kill a Mockingbird'),
        ('George Orwell', 'Author of 1984')
    ]
    insert_data(cursor, insert_authors_sql, authors_data)

    # Insert sample data into the Publishers table
    insert_publishers_sql = "INSERT INTO Publishers (name, address) VALUES (?, ?)"
    publishers_data = [
        ('Publisher A', '123 Street, City'),
        ('Publisher B', '456 Avenue, City')
    ]
    insert_data(cursor, insert_publishers_sql, publishers_data)

    conn.commit()
    cursor.close()
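Before wiring the database into an agent, it is worth sanity-checking that the rows actually landed. A minimal check, using an in-memory copy of the Books table so the snippet runs anywhere without touching bookstore.db:

```python
import sqlite3

# Sanity check on an in-memory database with the same Books schema as above.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS Books (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        title TEXT NOT NULL,
        author_id INTEGER,
        publisher_id INTEGER,
        price REAL
    );""")
cur.executemany(
    "INSERT INTO Books (title, author_id, publisher_id, price) VALUES (?, ?, ?, ?)",
    [('To Kill a Mockingbird', 1, 1, 10.99), ('1984', 2, 2, 8.99)],
)

# List user tables (sqlite_sequence is SQLite's internal AUTOINCREMENT table).
tables = [row[0] for row in cur.execute(
    "SELECT name FROM sqlite_master WHERE type='table' AND name NOT LIKE 'sqlite_%'")]
count = cur.execute("SELECT COUNT(*) FROM Books").fetchone()[0]
print(tables, count)
conn.close()
```

Against the real bookstore.db you would see all three tables listed and a matching row count.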
Step 2: Call SQL Database
We will load the database using the LangChain mechanism. By using the SQLDatabaseToolkit provided by LangChain, you can query the database in natural language.
# Use an absolute path for the database
os.environ["OPENAI_API_KEY"] = ""
database_path = 'C:/Users/mrtar/Desktop/sql_autogen/bookstore.db'
db = SQLDatabase.from_uri(f'sqlite:///{database_path}')
llm = OpenAI(model_name="gpt-3.5-turbo-instruct", temperature=0)
toolkit = SQLDatabaseToolkit(db=db, llm=llm)
Step 3: Function Calling Schema
Let’s create a function-calling schema for each tool in the toolkit, as follows. The generate_llm_config function converts a tool into a function schema, and we collect the results in tools. We also map each tool’s name to its _run method in function_map, so the user proxy can actually execute the calls.
def generate_llm_config(tool: Any) -> Dict[str, Any]:
    """
    Generate a function schema from a LangChain agent tool.

    Args:
        tool (Any): The tool to generate the schema for.

    Returns:
        Dict[str, Any]: The generated function schema.
    """
    function_schema = {
        "name": tool.name.lower().replace(' ', '_'),
        "description": tool.description,
        "parameters": {
            "type": "object",
            "properties": {},
            "required": [],
        },
    }

    # Assuming tool.args is a dictionary or None
    if tool.args:
        function_schema["parameters"]["properties"] = tool.args

    return function_schema

# Build the function schemas and the name-to-implementation map
tools = [generate_llm_config(tool) for tool in toolkit.get_tools()]
function_map = {tool.name: tool._run for tool in toolkit.get_tools()}
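To see what generate_llm_config produces without spinning up the whole toolkit, you can feed it a stand-in object. FakeTool below is hypothetical, not a real LangChain tool; it only mimics the name, description, and args attributes the function reads:

```python
from typing import Any, Dict

class FakeTool:
    # Hypothetical stand-in for a LangChain tool, for illustration only.
    name = "sql db query"
    description = "Execute a SQL query and return the result."
    args = {"query": {"type": "string", "title": "Query"}}

    def _run(self, query: str) -> str:
        return f"ran: {query}"

def generate_llm_config(tool: Any) -> Dict[str, Any]:
    # Same logic as the function defined above.
    function_schema = {
        "name": tool.name.lower().replace(' ', '_'),
        "description": tool.description,
        "parameters": {"type": "object", "properties": {}, "required": []},
    }
    if tool.args:
        function_schema["parameters"]["properties"] = tool.args
    return function_schema

schema = generate_llm_config(FakeTool())
print(schema["name"])  # spaces in the tool name become underscores
```

The real toolkit tools (sql_db_query, sql_db_schema, and so on) are converted the same way, and their _run methods go into function_map so the user proxy can invoke them.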
Step 4: Configuring AutoGen and API Key
Let’s set up the configuration for AutoGen. We create the config_list as follows:
- config_list: a list containing the configuration settings for the model you intend to use.
- timeout: set to 120, a timeout value in seconds for each request.
With this configuration, we are ready to use AI Agents with AutoGen.
config_list = [{
'model': 'gpt-4-1106-preview',
'api_key': ''
}]
# Construct the llm_config
llm_config = {
"functions": tools,
"config_list": config_list,
"timeout": 120,
}
Step 5: Create An Agent
Let’s create the agents as usual. The assistant agent replies with TERMINATE when the task is finished, which stops the UserProxyAgent’s auto-replies. We also register the function_map created earlier so the proxy can execute the tools.
user_proxy = autogen.UserProxyAgent(
name="user_proxy",
is_termination_msg=lambda x: x.get("content", "") and x.get("content", "").rstrip().endswith("TERMINATE"),
human_input_mode="NEVER",
max_consecutive_auto_reply=10,
code_execution_config={"work_dir": "tmp", "use_docker": False},
)
# Register the tools and start the conversation
user_proxy.register_function(
    function_map=function_map
)
Sql_chatbot = autogen.AssistantAgent(
name="Sql_chatbot",
system_message="""Please adhere strictly to the predefined functions for all coding assignments.
Once you have completed the task, respond with the word 'TERMINATE' to signify its completion.""",
llm_config=llm_config,
)
Step 6: Summary of the Results Table
I asked the agent to explain what is in the tables. The full exchange is quite long, so I’ll just show the results. When working with SQL, the toolkit gives us tools like sql_db_query, sql_db_schema, sql_db_list_tables, and sql_db_query_checker. Here, the agent uses sql_db_list_tables to get the list of tables and then sql_db_schema to see each table’s details. It’s a multi-step process, but I’ve confirmed that it works effectively.
We have three tables in the database: `Authors`, `Books`, and `Publishers`. Below is the schema and sample data for each table:
1. `Authors` Table:
- **Schema**:
- `id`: INTEGER (Primary Key)
- `name`: TEXT NOT NULL
- `biography`: TEXT
- **Sample Rows**:
```
id | name | biography
----------------------------------------------
1 | Harper Lee | Author of To Kill a Mockingbird
2 | George Orwell | Author of 1984
3 | Harper Lee | Author of To Kill a Mockingbird
```
2. `Books` Table:
- **Schema**:
- `id`: INTEGER (Primary Key)
- `title`: TEXT NOT NULL
- `author_id`: INTEGER
- `publisher_id`: INTEGER
- `price`: REAL
- **Sample Rows**:
```
id | title | author_id | publisher_id | price
----------------------------------------------------------------
1 | To Kill a Mockingbird | 1 | 1 | 10.99
2 | 1984 | 2 | 2 | 8.99
3 | To Kill a Mockingbird | 1 | 1 | 10.99
```
3. `Publishers` Table:
- **Schema**:
- `id`: INTEGER (Primary Key)
- `name`: TEXT NOT NULL
- `address`: TEXT
- **Sample Rows**:
```
id | name | address
-----------------------------------
1 | Publisher A | 123 Street, City
2 | Publisher B | 456 Avenue, City
3 | Publisher A | 123 Street, City
```
These tables likely represent a simple bookstore database with relationships between authors, their books, and the publishers of these books.
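For reference, the two lookup steps the agent performs (list the tables, then inspect one table’s columns) correspond to plain SQLite calls. A standalone sketch on an in-memory database with the Authors schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE Authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL, biography TEXT)")

# Step 1: list tables (roughly what sql_db_list_tables does under the hood)
tables = [r[0] for r in cur.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]

# Step 2: inspect a table's columns (the information sql_db_schema reports);
# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
columns = [r[1] for r in cur.execute("PRAGMA table_info(Authors)")]

print(tables, columns)
conn.close()
```

The agent chains these same lookups in natural language, without us writing the SQL by hand.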
Step 7: The Result
user_proxy.initiate_chat(
Sql_chatbot,
message="how many books in table",
llm_config=llm_config,
)
Here is the result:
There are 26 books in the database.
Also see: AutoGen + LangChain + RAG + Function Call = Super AI Chatbot
Conclusion:
By applying function calling like this, we can make effective use of LangChain’s capabilities; in this experiment, I used the method to manage SQL.
Databases are common in many settings, yet writing queries is challenging for many people, especially those who aren’t engineers, so I believe this kind of feature is genuinely helpful.