I'll show you how to create any chatbot locally, for free, in just a few minutes. We'll build a chatbot that automatically answers questions about any topic, as well as a multimodal chatbot that answers questions about images. I'll guide you through using the Ollama Python library and LangChain to create both types of chatbots step by step. Plus, I'll provide the full code so you don't have to build it from scratch; you can simply copy and paste it into your project.
So, here's the plan: we're going to create two different chatbots. The first is a multimodal chatbot built on the LLaVA model, which can automatically answer questions about images. The second is a question-answering chatbot that responds to any question you ask. This will be useful for any type of business.
Before we start! 🦸🏻‍♂️
If you like this topic and you want to support me:
- Follow me on Medium and subscribe to get my latest articles 🫶
- Follow me on YouTube: Gao Dalie (高達烈)
Ollama Python library
![[Ollama libraries 🦙] Run Any Chatbot Free Locally on Your Computer](https://cdn-images-1.medium.com/max/1000/1*n_9afFHKdWUsuRMX3XYDGQ.png)
The Ollama Python library helps you integrate Ollama services into your Python projects. Ollama is a platform for running large language models locally, and this library allows you to easily use Ollama's various features.
If you havenāt downloaded Ollama yet, please refer to my previous article where I explain how to download it.
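Once Ollama is installed and running, a minimal sketch of calling a local model looks like this (it assumes you have already pulled the model with `ollama pull llama2`):

```python
import ollama

# One-shot chat call against a locally pulled model.
reply = ollama.chat(
    model='llama2',
    messages=[{'role': 'user', 'content': 'Say hello in one sentence.'}],
)
print(reply['message']['content'])
```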
Features of the Ollama Python library:
The Ollama Python library supports a variety of functions and can perform tasks such as streaming responses, multimodal processing, text completion, and custom model creation.
1- Streaming response:
When you want to receive a continuous response from the server, you can use the `stream=True` option.
```python
import ollama

stream = ollama.chat(
    model='llama2',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)

# Print each chunk as it arrives
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
```
2- Custom Client:
Users can create a custom client to connect to an Ollama server according to their needs, for example when the server runs on another machine or a non-default port.
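A minimal sketch using the library's `Client` class (the host value is an example; adjust it to wherever your Ollama server listens):

```python
from ollama import Client

# Point the client at a specific Ollama server (example host/port).
client = Client(host='http://localhost:11434')
response = client.chat(
    model='llama2',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
)
print(response['message']['content'])
```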
3- Asynchronous support:
The library can also be used in an asynchronous environment through the `AsyncClient` class. This class allows you to perform other tasks while waiting for communication with the Ollama server.
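A minimal sketch of the asynchronous client (same `llama2` model assumption as above):

```python
import asyncio
from ollama import AsyncClient

async def main():
    # Await the chat call without blocking the event loop.
    message = {'role': 'user', 'content': 'Why is the sky blue?'}
    response = await AsyncClient().chat(model='llama2', messages=[message])
    print(response['message']['content'])

asyncio.run(main())
```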
Using these features, the Ollama Python library supports efficient natural language processing tasks. You can find more information in the Ollama library's GitHub repository provided below.
Install Ollama and dependencies
We create a virtual environment with `python3.11 -m venv ollama`, activate it with `source ollama/bin/activate`, and then install the library with `pip install ollama`.
Installing Necessary Libraries
Before we start, let's install the required libraries. Create a file named `requirements.txt` and write the dependencies below.
```
langchain
streamlit
ollama
```
After that, open the terminal in VS Code and run the command below.
```bash
pip install -r requirements.txt
```
Import Libraries
We import Streamlit, time, and ollama, and then import `StreamlitChatMessageHistory` from LangChain.
```python
import streamlit as st
import time
import ollama
from langchain.memory.chat_message_histories import StreamlitChatMessageHistory
```
We set up the page title and icon for the Streamlit web application.
```python
st.set_page_config(page_title="Ollama Chatbot", page_icon="💬")
```
Let's configure the sidebar. The `st.sidebar` context is used to add widgets and content to the sidebar of the Streamlit app.

We set the sidebar title to "Ollama Chatbot" and add a divider to separate the different sections. Then we create a select box labelled "Choose a model", where users can select one of three options: "Mistral", "Solar", or "Code Llama". The selected option is stored in the variable `selected_model`.

Here's what happens in each branch of the if-elif-else statement:
- If the user selects "Mistral", it sets `llm_model` to `"mistral"` and displays information about the Mistral 7B model.
- If the user selects "Solar", it sets `llm_model` to `"solar"` and displays information about the SOLAR 10.7B model released by Upstage.
- If the user selects "Code Llama", it sets `llm_model` to `"codellama"` and displays information about the Code Llama model for generating and discussing code.
```python
with st.sidebar:
    st.title('💬 Ollama Chatbot')
    st.divider()

    # Select the model
    selected_model = st.selectbox('Choose a model', ['Mistral', 'Solar', 'Code Llama'], key='selected_model')
    if selected_model == "Mistral":
        llm_model = "mistral"
        st.caption("""
        The Mistral 7B model released by Mistral AI.
        Mistral 7B is an Apache-licensed 7.3B-parameter model.
        It is available in both instruct (instruction-following) and text-completion variants.
        """)
    elif selected_model == "Solar":
        llm_model = "solar"  # Ollama model names are lowercase
        st.caption("""
        SOLAR 10.7B is released by Upstage.
        Solar is a 10.7B-parameter open LLM outperforming other LLMs of up to 30B parameters,
        including Mistral 7B. 🤯 Solar achieves an MMLU score of 65.48,
        which is only 4 points lower than Meta's Llama 2 while being 7x smaller.
        """)
    else:
        llm_model = "codellama"
        st.caption("""
        Code Llama is a model for generating and discussing code, built on top of Llama 2.
        It's designed to make workflows faster and more efficient for developers, and to make it easier for people to learn how to code.
        It can generate both code and natural language about code.
        Code Llama supports many of the most popular programming languages used today, including Python, C++, Java, PHP, TypeScript (JavaScript), C#, Bash and more.
        """)
    st.divider()
```
Define a text area widget labelled "System Prompt" where users can input text. Its initial value is "You are a helpful assistant who answers questions in short sentences.", and users can modify this text as needed.
```python
system_prompt = st.text_area(
    label="System Prompt",
    value="You are a helpful assistant who answers questions in short sentences."
)
```
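Note that `system_prompt` is not actually used by the `response()` helper shown later. A minimal sketch of wiring it in, assuming you pass it through Ollama's `system` parameter, could look like this:

```python
# Sketch (assumption): steer the model with the sidebar's system prompt.
response = ollama.generate(
    model=llm_model,
    prompt=prompt,
    system=system_prompt,  # applied on top of the model's default template
)
```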
Next we create a toggle labelled "Activate GPU", allowing users to switch GPU usage on or off. When the toggle is on (`gpu_on` is True), the variable `activate_gpu` is set to 1; when it is off, `activate_gpu` is set to 0. This flag lets the application control GPU usage.
```python
gpu_on = st.toggle('Activate GPU')

# Activate the GPU
if gpu_on:
    activate_gpu = 1
else:
    activate_gpu = 0
```
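The snippet above only sets the flag; it is never passed to the model in the code that follows. One plausible way to wire it in, assuming you map the toggle to Ollama's `num_gpu` option (which counts the number of model layers offloaded to the GPU, so a real app might use a larger value than 1), is:

```python
# Sketch (assumption): forward the toggle to Ollama's runtime options.
# num_gpu is the number of layers offloaded to the GPU (0 = CPU only).
response = ollama.generate(
    model=llm_model,
    prompt=prompt,
    options={'num_gpu': activate_gpu},
)
```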
We use `StreamlitChatMessageHistory` to store messages in the Streamlit session state under the specified `key`; the default key is `"langchain_messages"`. Note that `StreamlitChatMessageHistory` only works when run in a Streamlit app.

If the chat message history is currently empty (no messages have been exchanged), we add an initial AI message: "How may I assist you today?"
```python
# Set up memory
msgs = StreamlitChatMessageHistory(key="langchain_messages")
if len(msgs.messages) == 0:
    msgs.add_ai_message("How may I assist you today?")
```
Next we create a `response()` function that generates a reply from the selected `llm_model`. It uses `ollama.generate()` to produce a response for the given prompt.
```python
def response():
    # Generate a completion for the user's prompt with the selected model
    response = ollama.generate(model=llm_model, prompt=prompt)
    return response['response']
```
Then we display the messages stored in the history object. If a message is from a human, it is shown with a technologist avatar (🧑‍💻); otherwise it is shown with a speech bubble avatar (💬). Finally, we write the content of each message.
```python
for msg in msgs.messages:
    if msg.type == "human":
        with st.chat_message(msg.type, avatar="🧑‍💻"):
            st.write(msg.content)
    else:
        with st.chat_message(msg.type, avatar="💬"):
            st.write(msg.content)
```
Next we define a function to clear the chat history, linked to a "Clear Chat History" button in the sidebar. When the button is clicked, it triggers `clear_chat_history()`, which removes all messages stored in the `msgs` object. The button is styled to fill the width of its container in the sidebar.
```python
def clear_chat_history():
    msgs.clear()

st.sidebar.button('Clear Chat History',
                  on_click=clear_chat_history,
                  use_container_width=True)
```
Now we create an input where users can type messages (`st.chat_input`). When the user enters a message and hits Enter, the walrus operator captures it in `prompt`, and we display it with the human avatar.
```python
if prompt := st.chat_input():
    with st.chat_message("human", avatar="🧑‍💻"):
        st.write(prompt)
```
Next we display the AI-generated response within the chat interface, using the speech bubble avatar (💬) for the assistant's message. While the response is being generated, a spinner shows the text "Thinking…". The response is then revealed character by character, with a vertical cursor (▌) appended to simulate typing.
with st.chat_message("assistant", avatar="š¬"):
with st.spinner("Thinking..."):
response = response()
placeholder = st.empty()
full_response = ''
for item in response:
full_response += item
placeholder.markdown(full_response + "ā")
time.sleep(0.05)
placeholder.markdown(full_response)
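Two caveats about this loop: it only replays a string Ollama has already finished generating, and the new exchange is never written back into `msgs`, so it will vanish from the history on the next rerun. A minimal alternative sketch, assuming you want real token streaming via `stream=True` and persistence via the history object:

```python
    # Sketch (assumptions noted above); replaces the typing loop.
    with st.chat_message("assistant", avatar="💬"):
        placeholder = st.empty()
        full_response = ''
        # Stream tokens from Ollama as they are generated
        for chunk in ollama.generate(model=llm_model, prompt=prompt, stream=True):
            full_response += chunk['response']
            placeholder.markdown(full_response + "▌")
        placeholder.markdown(full_response)

    # Persist the turn so it survives Streamlit reruns
    msgs.add_user_message(prompt)
    msgs.add_ai_message(full_response)
```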
Multimodal Chatbot
Now for the second app. We import Streamlit, ollama, `Image` from PIL for image manipulation, and `BytesIO` for handling image data in memory.
```python
import streamlit as st
import ollama
from PIL import Image
from io import BytesIO
```
Define a list named `available_models` containing the model names to offer. Each item represents a specific model, such as "dolphin-mistral", "openhermes", and "llava".
```python
available_models = [
    'dolphin-mistral:7b-v2.6-fp16',
    'openhermes:7b-mistral-v2.5-q6_K',
    'llava',
]
```
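These names must match models you have already pulled. As an optional sanity check (a sketch, assuming the dict-style return value of `ollama.list()` used by this version of the library), you could warn about models that are not available locally:

```python
# Sketch (assumption): warn if a listed model hasn't been pulled yet.
local = {m['name'] for m in ollama.list()['models']}
missing = [m for m in available_models
           if m not in local and f'{m}:latest' not in local]
if missing:
    st.sidebar.warning(f"Not pulled yet: {', '.join(missing)}")
```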
Create a sidebar select box titled "Select a model:", where users can choose from the models in the `available_models` list, and a text area labelled "Ask a question:", whose input is stored in the variable `user_input`.
```python
model_choice = st.sidebar.selectbox('Select a model:', available_models)
user_input = st.text_area('Ask a question:', '', height=150)
```
Next we use HTML and CSS within a Markdown block to define a style called "dashed-box". The style creates a dashed border in orange (#FFA500), rounded corners (`border-radius`), and padding around the content, plus margins above and below the box. It can be applied to elements in the Streamlit app to draw visually distinctive dashed borders around them. The `unsafe_allow_html=True` parameter tells Streamlit to render the raw HTML instead of escaping it.
st.markdown("""
<style>
.dashed-box {
border: 2px dashed #FFA500;
border-radius: 10px;
padding: 10px;
margin: 10px 0;
}
</style>
""", unsafe_allow_html=True)
We create a file uploader that asks the user to upload an image file, displaying the text "Upload an image". Users can only upload files with the extensions "jpg", "png", or "jpeg" because of the specified file types (`type=['jpg', 'png', 'jpeg']`).
```python
uploaded_file = st.file_uploader("Upload an image", type=['jpg', 'png', 'jpeg'])
```
We check whether an image file has been uploaded (`uploaded_file`). If so, we open the image with the Python Imaging Library (PIL) and assign it to the variable `img`. If the image is in RGBA mode (i.e. it has an alpha channel for transparency), we convert it to RGB so it can later be saved as a JPEG.
```python
if uploaded_file:
    img = Image.open(uploaded_file)

    # Convert RGBA image to RGB (JPEG has no alpha channel)
    if img.mode == 'RGBA':
        img = img.convert('RGB')

    st.image(img, caption='Uploaded image', use_column_width=True)
```
Then we check whether the "Submit" button has been clicked. If it has, we initialize a list named `messages` containing a dictionary with the user's input.

If an image was uploaded, it is converted into bytes and attached to the user's message: the `images` key of the first message dictionary (`messages[0]`) is set to a list containing the image bytes (`[img_bytes]`). This links the image data to the user's input message.
```python
if st.button('Submit'):
    messages = [{'role': 'user', 'content': user_input}]

    if uploaded_file:
        with BytesIO() as buffer:
            img.save(buffer, 'jpeg')
            img_bytes = buffer.getvalue()
        messages[0]['images'] = [img_bytes]
```
We then use the Ollama Python library to talk to the selected model: we pass the prepared `messages` list and the chosen model (`model_choice`) to the `chat` function. The `stream=True` parameter indicates that the response should be streamed, meaning it arrives in chunks rather than all at once.
```python
    # Still inside the Submit branch
    response = ollama.chat(
        model=model_choice,
        messages=messages,
        stream=True
    )

    final_response = ''
    for chunk in response:
        if 'content' in chunk.get('message', {}):
            final_response += chunk['message']['content']
```
Then we wrap the final response in the HTML `div` element styled earlier, creating a visually distinct box with a dashed border and padding around the text.
```python
    st.markdown(f'<div class="dashed-box">{final_response}</div>', unsafe_allow_html=True)
```
Conclusion:
The Ollama Python library is the go-to option if you need an easy-to-use tool for running LLMs locally with efficiency and precision. It represents a significant advancement in the open-source AI community and offers robust solutions for a wide range of user requirements.
Reference: