I wanted to see how a fun ChatGPT experiment could be transformed into a business-ready feature. Naturally, I started with a topic of critical international importance: cat facts.
First, I uploaded a PDF of cat facts to ChatGPT. Then, I asked it for interesting tidbits about felines, some of which are fascinating.
But then I started wondering: What would be required to turn a fun, engaging ChatGPT interaction into a fully functional, customized AI feature?
In this guide, I’ll take you through the process of transforming that simple interaction with ChatGPT into a robust LLM bot with custom content. While it’s not quite enterprise-ready, it lays the foundation for scaling and deeper integration—covering everything from sandboxing with Python to creating your very own “Cat Bot”.
Engineering note: This guide is written for macOS and assumes you already have an IDE like Visual Studio Code or PyCharm (a free community edition is available) installed, as well as an OpenAI developer account.
I used Python for this project, and there are a few hoops to jump through to get it working properly on your machine before going any further. If you already have a working Python environment, skip this part.
The goal is to use Homebrew, a macOS package manager, to install pyenv, a tool that makes it easier to install and maintain different Python versions, and then use Python's virtual environment feature to create an isolated test space exclusively for this project. In the end, you should have a working Python setup and not a superfund site.
1. Install the macOS command line tools, which Homebrew needs in order to work. In Terminal, run:
xcode-select --install
2. Install Homebrew, either by pasting the one-line script below into a terminal or by downloading it from GitHub:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
3. Install pyenv:
brew install pyenv
4. Add pyenv to your terminal so it can work its magic. Running pyenv init will give you the terminal-specific instructions for your machine:
> pyenv init
# Load pyenv automatically by appending
# the following to
# ~/.zprofile (for login shells)
# and ~/.zshrc (for interactive shells) :
export PYENV_ROOT="$HOME/.pyenv"
[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init -)"
# Restart your shell for the changes to take effect.
In the above case (the default for most macOS users), this can be done by pasting the lines below into your terminal and hitting “enter” or “return”:
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.zshrc
echo '[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.zshrc
echo 'eval "$(pyenv init -)"' >> ~/.zshrc
Close the terminal window and then open a new one.
5. Run the command pyenv --help to verify it is installed properly. Then, install Python and set it as the default version for your system. Running the commands pyenv install 3.11, pyenv global 3.11, and python --version will produce output like this:
> pyenv install 3.11
python-build: use openssl from homebrew
python-build: use readline from homebrew
Downloading Python-3.11.9.tar.xz...
-> https://www.python.org/ftp/python/3.11.9/Python-3.11.9.tar.xz
Installing Python-3.11.9...
python-build: use tcl-tk from homebrew
python-build: use readline from homebrew
python-build: use ncurses from homebrew
python-build: use zlib from xcode sdk
Installed Python-3.11.9 to /Users/username/.pyenv/versions/3.11.9
> pyenv global 3.11
> python --version
Python 3.11.9
6. Next, follow these steps:
mkdir sandgarden-guide-01
cd sandgarden-guide-01
pyenv local 3.11
python -m venv .venv
echo ".venv*" >> .gitignore
source .venv/bin/activate
python -m pip install --upgrade pip
The important step here is the source .venv/bin/activate line, which makes sure that all commands to Python and any libraries, code, and work you perform will only modify the virtual environment created for this folder.
Now that you've completed these steps, you have a base Python environment to begin your experimentation with LLMs. Move on to the next section to begin working with OpenAI in the environment.
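If you want to double-check that the virtual environment is really the one in use, a small Python check can confirm it. This is only a sketch; the helper name in_virtualenv is my own and not part of any library. It relies on the fact that inside a venv, sys.prefix and sys.base_prefix differ.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
    print("virtual environment active:", in_virtualenv())
```

Run it after source .venv/bin/activate; if it reports False, the activation step most likely did not happen in this terminal.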
1. Request an OpenAI API key from your account page and save it somewhere secure, such as 1Password.
2. Launch VS Code or PyCharm, and select the folder you created for the project. It should open to a relatively blank window.
3. Open a terminal inside either application so you can see the code you write and the output from running it in a single window. In VS Code, press control-`; in PyCharm, press option-F12.
4. For PyCharm you should see something that looks like this:
5. In the terminal, set up your environment with your OPENAI_API_KEY. The easiest way to do that is by exporting it as an environment variable with the command:
export OPENAI_API_KEY="$api_key_here"
6. Install the OpenAI Python library (make sure to activate the environment first):
source .venv/bin/activate
pip install openai
7. Create a file named main.py to work with and add the basic call. This is our first “Hello World” that will verify that we have both Python and OpenAI configured correctly. Paste this into the file:
from openai import OpenAI

# The client reads OPENAI_API_KEY from the environment
client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": "Write a haiku about recursion in programming."
        }
    ]
)

print(completion.choices[0].message.content)
And execute it with the command python main.py in the terminal. You should see a result like this:
> python main.py
Function calls itself,
A loop of thought unfolding,
Endless depths emerge.
You can run it multiple times and get a different haiku each time. Pretty neat.
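One thing that tripped me up more than once is forgetting to export the key in a fresh terminal. A small guard makes that failure obvious. This is a sketch, not part of the OpenAI SDK; require_env is a hypothetical helper name.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it in this terminal first")
    return value
```

Calling require_env("OPENAI_API_KEY") at the top of main.py would stop the script with a readable message before any API call is attempted.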
What seems like an easy task on the surface starts becoming a multistep process on the backend. While you can just drag a file to the web UI and ask ChatGPT to answer questions about it, that's not what you want to build for your business. Instead, in ChatGPT terms, you’ll want to create an assistant (an agent) and provide it with a set of files to reference. Those files are stored in something called a vectorDB, a referenceable set of information that the Assistant will query for relevant information before trying to answer the question.
For instance, in the cat facts example, asking about a Siamese cat would first search the vectorDB for relevant chunks of data that contain information on Siamese cats, and then submit those along with the original request to the LLM. This increases the likelihood that a response will use factual information that you know the source of (because you can track what was retrieved from the vectorDB, which is included in the result) and makes it less likely that the response is made up from data somewhere else in the LLM (what some people call a hallucination and other people call bullshit).
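To make the retrieve-then-ask idea concrete, here is a toy sketch in plain Python. It is not how OpenAI's vector store works internally: it ranks chunks by shared words instead of embeddings, and the function names (tokenize, retrieve, build_prompt) are my own. The two sample chunks are facts quoted later in this guide.

```python
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase word counts; a crude stand-in for a real embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def retrieve(question: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Rank chunks by how many question words they share and keep the best."""
    q = tokenize(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: sum((q & tokenize(chunk)).values()),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine the retrieved context with the original question."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


CHUNKS = [
    "Cats have about thirty-two muscles in each of their ears.",
    "Just like human fingerprints, cats have unique nose prints.",
]

print(build_prompt("How many muscles are in a cat's ears?", CHUNKS))
```

Only the ear-muscle chunk ends up in the prompt, which is the whole point: the model sees the relevant source text, and you know where the answer came from.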
There are many different ways to create the flow of data from user input→agent→vectorDB response. For this exercise we're going to tackle this in two segments:
1. Create a new file called create_assistant.py. This will handle the API calls around defining the vectorDB, uploading files to it, parsing them, creating the assistant, and binding it to the vectorDB. You don't want to upload sensitive data to these outside servers for this experiment, so an example dataset is used instead: the code below uploads cat-facts-2.pdf.
touch create_assistant.py
2. The code below creates a vectorDB named cat_facts. It first uses a function to check whether a DB of that name is already present. If there is one, it uses that and skips the creation step. This lets you run the code over and over again without creating hundreds of useless DBs while testing it.
from pathlib import Path
from openai import OpenAI
client = OpenAI()

# Check if VectorDB of that name already exists
def db_exists(name, client):
    db_list = client.beta.vector_stores.list()
    has_db = [x.id for x in db_list.data if x.name is not None and name in x.name]
    return has_db

# Create VectorDB or return the ID if one exists already
# Either case you get the ID of DB for the next step
def create_db(name, client):
    db_id = None

    db_list = db_exists(name, client)
    if db_list:
        db_id = db_list[0]
        print(f'DB of name "{name}" exists with id of "{db_id}"')
    else:
        resp = client.beta.vector_stores.create(name=name)
        db_id = resp.id
        print(f'DB created, named: "{name}" with id: "{db_id}"')
    return db_id

# Call the create_db() function that returns the db_id
db_id = create_db('cat_facts', client)  # e.g. vs_KdpZjj4EUDhNHN8bzkNuOMg9

# Output DB ID for next steps
print(db_id)
3. Run python create_assistant.py and it should output something like what is below. This verifies that you've gotten the code working properly and your OpenAI token is still valid.
> python create_assistant.py
DB created, named: "cat_facts" with id: "vs_YjFJvRKemJWYzV3jyKPJ40AS"
vs_YjFJvRKemJWYzV3jyKPJ40AS
4. There are multiple ways to get data into the vectorDB. In this instance, you will just upload a single file to OpenAI's servers for indexing, but integrations with other platforms and tools could allow you to recursively load files from an S3 bucket or other systems. The vectorDB doesn't have to be hosted alongside the LLM you are working with; you can deploy and run your own local vectorDB if you have sensitive information you do not want to leave your corporate network. That is outside the scope of this guide but may be covered in the future.
As with the DB step above, the code below repeats the DB code from before and adds three things: checking whether the file has already been uploaded, uploading it if not, and making sure the vectorDB indexes it.
from pathlib import Path
from openai import OpenAI
client = OpenAI()

# Check if VectorDB of that name already exists
def db_exists(name, client):
    db_list = client.beta.vector_stores.list()
    has_db = [x.id for x in db_list.data if x.name is not None and name in x.name]
    return has_db

# Create VectorDB or return the ID if one exists already
# Either case you get the ID of DB for the next step
def create_db(name, client):
    db_id = None

    db_list = db_exists(name, client)
    if db_list:
        db_id = db_list[0]
        print(f'DB of name "{name}" exists with id of "{db_id}"')
    else:
        resp = client.beta.vector_stores.create(name=name)
        db_id = resp.id
        print(f'DB created, named: "{name}" with id: "{db_id}"')

    return db_id

# Check if file has been uploaded and attached to DB
def file_exists(name, db, client):
    file_list = client.files.list()
    has_file = [x.id for x in file_list.data if x.filename is not None and name in x.filename]
    if has_file:
        attached_files = client.beta.vector_stores.files.list(db)
        attached_id = [x.id for x in attached_files.data if has_file[0] in x.id]
    else:
        attached_id = []

    return attached_id

# Upload file if it isn't present and ensure the DB
# knows about it
def upload_file(filepath, db_id, client):
    file_id = None

    file = Path(filepath)
    name = ''
    if file.is_file():
        name = file.name
    else:
        print(f'File: {filepath} does not exist')
        exit(1)

    file_list = file_exists(name, db_id, client)

    if file_list:
        file_id = file_list[0]
        print(f'File of name "{name}" exists with id of "{file_id}"')
    else:
        print(f'Creating: "{name}"')
        file_upload = client.files.create(file=file, purpose='assistants')
        file_id = file_upload.id
        print(f'File of name "{name}" uploaded with id of "{file_id}", attaching')
        file_id = client.beta.vector_stores.files.create_and_poll(file_id, vector_store_id=db_id).id

    return file_id

# Call the create_db() function that returns the db_id
db_id = create_db('cat_facts', client)

# Output DB ID for next steps
print(db_id)

# Call the upload_file() and get the file_id
file_id = upload_file('cat-facts-2.pdf', db_id, client)

# Output File ID for reference
print(file_id)
5. Running python create_assistant.py again will return something like the output below. Notice how it also shows that the script checks that the DB exists before going on to try to associate the uploaded file with it:
> python create_assistant.py
DB of name "cat_facts" exists with id of "vs_YjFJvRKemJWYzV3jyKPJ40AS"
vs_YjFJvRKemJWYzV3jyKPJ40AS
Creating: "cat-facts-2.pdf"
File of name "cat-facts-2.pdf" uploaded with id of "file-LGyE70KxkwWIgTL783zh8xgL", attaching
file-LGyE70KxkwWIgTL783zh8xgL
6. Now, the final task for this part of the guide! Below is the complete code; running it creates an assistant called Cat Bot. Once this step is complete, it will show up in the OpenAI UI alongside any other assistants you have created.
from pathlib import Path
from openai import OpenAI
client = OpenAI()

# Check if VectorDB of that name already exists
def db_exists(name, client):
    db_list = client.beta.vector_stores.list()
    has_db = [x.id for x in db_list.data if x.name is not None and name in x.name]
    return has_db

# Create VectorDB or return the ID if one exists already
# Either case you get the ID of DB for the next step
def create_db(name, client):
    db_id = None

    db_list = db_exists(name, client)
    if db_list:
        db_id = db_list[0]
        print(f'DB of name "{name}" exists with id of "{db_id}"')
    else:
        resp = client.beta.vector_stores.create(name=name)
        db_id = resp.id
        print(f'DB created, named: "{name}" with id: "{db_id}"')

    return db_id

# Check if file has been uploaded and attached to DB
def file_exists(name, db, client):
    file_list = client.files.list()
    has_file = [x.id for x in file_list.data if x.filename is not None and name in x.filename]
    if has_file:
        attached_files = client.beta.vector_stores.files.list(db)
        attached_id = [x.id for x in attached_files.data if has_file[0] in x.id]
    else:
        attached_id = []

    return attached_id

# Upload file if it isn't present and ensure the DB
# knows about it
def upload_file(filepath, db_id, client):
    file_id = None

    file = Path(filepath)
    name = ''
    if file.is_file():
        name = file.name
    else:
        print(f'File: {filepath} does not exist')
        exit(1)

    file_list = file_exists(name, db_id, client)

    if file_list:
        file_id = file_list[0]
        print(f'File of name "{name}" exists with id of "{file_id}"')
    else:
        print(f'Creating: "{name}"')
        file_upload = client.files.create(file=file, purpose='assistants')
        file_id = file_upload.id
        print(f'File of name "{name}" uploaded with id of "{file_id}", attaching')
        file_id = client.beta.vector_stores.files.create_and_poll(file_id, vector_store_id=db_id).id

    return file_id

# Check if assistant has been created already and has vectorDB configured
def assistant_exists(name, db_id, client):
    assistant_id = None

    assistants = client.beta.assistants.list()
    assistant = [x for x in assistants.data if x.name is not None and name in x.name]

    if assistant:
        vector_store_ids = assistant[0].tool_resources.file_search.vector_store_ids
        if db_id in vector_store_ids:
            print('assistant exists and DB is attached')
        else:
            print('assistant exists and DB needs to be attached')
            client.beta.assistants.update(
                assistant_id=assistant[0].id,
                tool_resources={"file_search": {"vector_store_ids": [db_id]}},
            )
        assistant_id = assistant[0].id

    return assistant_id

# Create a new assistant if it doesn't exist, make sure it uses
# the vectorDB for any knowledge base lookups
def create_assistant(name, db_id, client):
    assistant_id = assistant_exists(name, db_id, client)

    if assistant_id:
        print(f'assistant: {name} exists')
    else:
        print(f'Creating assistant: {name}')
        assistant = client.beta.assistants.create(
            name=name,
            instructions="You are a cat expert. Use your knowledge base to answer questions about cats",
            model="gpt-4o",
            tools=[{"type": "file_search"}],
            tool_resources={"file_search": {"vector_store_ids": [db_id]}},
        )

        assistant_id = assistant.id

    return assistant_id


# Call the create_db() function that returns the db_id
db_id = create_db('cat_facts', client)

# Output DB ID for next steps
print(db_id)

# Call the upload_file() and get the file_id
file_id = upload_file('cat-facts-2.pdf', db_id, client)

# Output File ID for reference
print(file_id)

# Create the assistant and get the ID of it
assistant_id = create_assistant('Cat Bot', db_id, client)

# Print the assistant ID, this is used in the next script
print(assistant_id)
7. Running it this time will produce output like this below. Going into the OpenAI Assistant UI should also show it.
> python create_assistant.py
DB of name "cat_facts" exists with id of "vs_YjFJvRKemJWYzV3jyKPJ40AS"
vs_YjFJvRKemJWYzV3jyKPJ40AS
File of name "cat-facts-2.pdf" exists with id of "file-LGyE70KxkwWIgTL783zh8xgL"
file-LGyE70KxkwWIgTL783zh8xgL
Creating assistant: Cat Bot
asst_AhwNp62Vqtu0u6TBqzoYnT2E
8. With this step completed, you've run through all the processes that go into setting yourself up for a Retrieval-Augmented Generation (RAG)-based Assistant. When it’s asked questions about cats, the assistant will be able to use a new source of information that is not in the LLM to determine the right answer.
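One detail worth knowing about the "check first, then create" pattern used throughout the steps above: the lookups match with name in x.name, which is a substring test, so a store called cat_facts_old would also count as cat_facts. The sketch below shows the same pattern with an exact match. FakeStore is a hypothetical in-memory stand-in so the idea can be run without touching the API; it is not an OpenAI class.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class FakeStore:
    """Hypothetical in-memory stand-in for a remote vector store service."""
    names: dict = field(default_factory=dict)  # maps id -> name
    _ids: count = field(default_factory=lambda: count(1))

    def create(self, name: str) -> str:
        store_id = f"vs_{next(self._ids)}"
        self.names[store_id] = name
        return store_id


def get_or_create(store: FakeStore, name: str) -> str:
    """Return the id of the store with exactly this name, creating it if needed."""
    for store_id, existing in store.names.items():
        if existing == name:  # exact match, unlike a substring test
            return store_id
    return store.create(name)
```

Calling get_or_create twice with the same name returns the same id, which is the property that makes the script safe to rerun while testing.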
Now that the Assistant has been completed and set up with a vectorDB that has a document (or more if you decided to upload more to it), you can begin to ask questions. Doing so via the API is a matter for another day, but you can see that you've created the Assistant on the OpenAI page now. This is a good point to explore asking it questions and running queries that way.
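If you do decide to load more than one document, gathering the files is the easy part. A minimal sketch, assuming the documents live in a local folder: collect_documents is a hypothetical helper, and the suffix list is only an example. Each returned path could then be passed to the upload_file() function from create_assistant.py.

```python
from pathlib import Path


def collect_documents(root: str, suffixes=(".pdf", ".txt")) -> list[Path]:
    """Return every file under root whose extension is in suffixes, sorted."""
    base = Path(root)
    return sorted(
        p for p in base.rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes
    )
```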
Also, because we've loaded the assistant with the relevant information, you don't need to run the above create_assistant.py file again. Only if you need to add more information or want to modify other parameters will you need to revise that agent/assistant creation.
Now that the assistant has been created on OpenAI’s servers, we need a way to interact with it via the API. To keep track of the assistant and its state, you can export the assistant ID by using the following command:
export ASSISTANT_ID='asst_AhwNp62Vqtu0u6TBqzoYnT2E'
This will be useful for both testing and the web UI created later.
1. Creating a Thread for Conversations
The following script helps you interact with the assistant. It creates a Thread (OpenAI’s way of keeping a conversation history between API calls) and submits any prompt as a question.
python test_assistant.py "tell me about cats"
This sends the prompt “tell me about cats” to the assistant, waits for a response, and prints the result.
2. Managing Threads and Sessions
The thread ID can be reused for subsequent prompts by setting it with the command:
export THREAD_ID=$thread_id
This prevents creating a new thread for every interaction. If no thread ID is provided, the script will create a new one automatically.
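The reuse-or-create decision is small enough to isolate. In this sketch, resolve_thread_id is a hypothetical helper and create_thread is any callable that returns a new thread ID; in the real script that would be a call to client.beta.threads.create() followed by reading its id.

```python
import os
from typing import Callable


def resolve_thread_id(create_thread: Callable[[], str], env=os.environ) -> str:
    """Reuse THREAD_ID from the environment, or create a thread when it is unset."""
    existing = env.get("THREAD_ID", "").strip()
    return existing if existing else create_thread()
```

Passing env as a parameter is only there to make the behavior easy to try with a plain dict.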
3. Streaming Responses with an Event Handler
The example script uses the OpenAI SDK's streaming helper with an event handler, so the response is printed as it arrives instead of after the whole run has finished. This is not essential for a simple test, but it becomes valuable when creating a web UI, where an async version of the same approach lets the response stream in real time, just like ChatGPT does.
Streaming pays off most once it is integrated into a web UI, where responses are needed incrementally. Here’s the full code of test_assistant.py:
import os
import argparse
from typing_extensions import override
from openai import AssistantEventHandler, OpenAI

class EventHandler(AssistantEventHandler):
    @override
    def on_text_created(self, text) -> None:
        print("\nassistant > ", end="", flush=True)

    @override
    def on_tool_call_created(self, tool_call):
        print(f"\nassistant > {tool_call.type}\n", flush=True)

    @override
    def on_message_done(self, message) -> None:
        # print a citation to the file searched
        message_content = message.content[0].text
        annotations = message_content.annotations
        citations = []
        for index, annotation in enumerate(annotations):
            message_content.value = message_content.value.replace(
                annotation.text, f"[{index}]"
            )
            if file_citation := getattr(annotation, "file_citation", None):
                cited_file = client.files.retrieve(file_citation.file_id)
                citations.append(f"[{index}] {cited_file.filename}")

        print(message_content.value)
        print("\n".join(citations))

ASSISTANT_ID = os.environ['ASSISTANT_ID']

if 'THREAD_ID' in os.environ:
    THREAD_ID = os.environ['THREAD_ID']
else:
    THREAD_ID = ''

client = OpenAI()

if not len(THREAD_ID):
    thread = client.beta.threads.create()
    THREAD_ID = thread.id

print(f'Thread id is {THREAD_ID}')

parser = argparse.ArgumentParser()

parser.add_argument("question", nargs='*', help="Question to ask LLM")

args = parser.parse_args()

sentence = ' '.join(args.question)

print(sentence)

message = client.beta.threads.messages.create(
    thread_id=THREAD_ID,
    role="user",
    content=sentence
)

with client.beta.threads.runs.stream(
    thread_id=THREAD_ID,
    assistant_id=ASSISTANT_ID,
    event_handler=EventHandler(),
) as stream:
    stream.until_done()
The full experience looks like this:
> python test_assistant.py "tell me 2 facts about cats"
Thread id is thread_FmQ3DQsR1vRcLgLvwgJEPTUP
tell me 2 facts about cats
assistant > file_search
assistant >
Here are two more fascinating facts about cats:
1. **Ear Muscles and Movement**: Cats have about thirty-two muscles in each of their ears. This allows them to control the direction of their ears efficiently, enabling them to rotate their ears about 180 degrees and face them in any direction to detect sounds[0].
2. **Unique Identification**: Just like human fingerprints, cats have unique nose prints. The pattern of ridges and bumps on a cat's nose is different for every cat, making it a unique identifier[1].
[0] cat-facts.pdf
[1] cat-facts.pdf
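The [0] and [1] markers in that output come from the on_message_done handler, which swaps each annotation's marker text for a numbered reference. Here is that step on its own as a sketch. The Annotation class and the <<a>> marker are hypothetical stand-ins; the real annotation objects come from the API, and the script looks up the filename through client.files.retrieve.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """Hypothetical stand-in: the marker text in the reply and the cited file."""
    text: str
    filename: str


def render_citations(value: str, annotations: list[Annotation]) -> tuple[str, list[str]]:
    """Replace each marker with [index] and build the matching footnote list."""
    citations = []
    for index, annotation in enumerate(annotations):
        value = value.replace(annotation.text, f"[{index}]")
        citations.append(f"[{index}] {annotation.filename}")
    return value, citations


text, notes = render_citations(
    "Cats have unique nose prints<<a>>.",
    [Annotation(text="<<a>>", filename="cat-facts.pdf")],
)
print(text)
print("\n".join(notes))
```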
4. Real-Time Response Handling
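Once the assistant receives a prompt, the event handler processes the response and updates the user in real time. In the script above, the handler reacts to three events: the start of a text response (on_text_created), a tool call such as file_search (on_tool_call_created), and a completed message (on_message_done), at which point it prints the text along with its citations.
The same streaming approach carries over to the web UI, and each request to the assistant builds upon the previous conversation as long as the thread is maintained.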
Now that we’ve got an assistant working with a thread to keep continuity of conversation (or memory), we need to create a web interface. I found an example showing a FastUI chatbot with a different AI service. Since I'd worked with FastAPI (which FastUI sits on top of), and both use pydantic, a library already used by the OpenAI SDK, I thought it would be a good starting point.
The most complicated part of this was getting used to async functions in Python; my previous usage had typically consisted of quick scripts and one-shots that I’d either batch or hand to a different process to manage any async requests outside of Python itself. The bright side is that I got a web frontend out of it (that doesn't look half bad) thanks to the FastUI library.
The first step is to add the needed libraries (uvicorn is the new one here; it is the server that actually runs the code as a web service):
pip install fastapi uvicorn fastui
Create a new file: web.py. The code below is similar to the example code, but adds app.thread_id as a variable to keep track of the Thread as the web interface is reloaded. This does not isolate the Thread for different users. A full production implementation would add session handling so each user gets a different Thread:
import asyncio
import os
from typing import AsyncIterable
from fastapi import FastAPI
from fastapi.responses import HTMLResponse
from fastui import prebuilt_html, FastUI, AnyComponent
from fastui import components as c
from fastui.components.display import DisplayLookup, DisplayMode
from fastui.events import PageEvent, GoToEvent
from pydantic import BaseModel, Field
from starlette.responses import StreamingResponse
from openai import AsyncOpenAI
from openai import OpenAI

# Create the app object
app = FastAPI()

client = OpenAI()
async_client = AsyncOpenAI()

# Message history
app.message_history = []
# Keeps the process using one thread, however also means each webuser is accessing the same thread at the moment
app.thread_id = None

# Message history model
class MessageHistoryModel(BaseModel):
    message: str = Field(title='Message')

# Chat form - this is what creates the ?chat=<prompt> get response which triggers the streaming call to begin
# it's useful that the chat itself is a standard python model, since this could be reused or extended as needed
class ChatForm(BaseModel):
    chat: str = Field(title=' ', max_length=1000)

# Root endpoint, unchanged from example
@app.get('/api/', response_model=FastUI, response_model_exclude_none=True)
def api_index(chat: str | None = None, reset: bool = False) -> list[AnyComponent]:
    if reset:
        app.message_history = []
    return [
        c.PageTitle(text='FastUI Chatbot'),
        c.Page(
            components=[
                # Header
                c.Heading(text='FastUI Chatbot'),
                c.Paragraph(text='This is a simple chatbot built with FastUI.'),
                # Chat history
                c.Table(
                    data=app.message_history,
                    data_model=MessageHistoryModel,
                    columns=[DisplayLookup(field='message', mode=DisplayMode.markdown, table_width_percent=100)],
                    no_data_message='No messages yet.',
                ),
                # Chat form
                c.ModelForm(model=ChatForm, submit_url=".", method='GOTO'),
                # Reset chat
                c.Link(
                    components=[c.Text(text='Reset Chat')],
                    on_click=GoToEvent(url='/?reset=true'),
                ),
                # Chatbot response
                c.Div(
                    components=[
                        c.ServerLoad(
                            path=f"/sse/{chat}",
                            sse=True,
                            load_trigger=PageEvent(name='load'),
                            components=[],
                        )
                    ],
                    class_name='my-2 p-2 border rounded'),
            ],
        ),
        # Footer
        c.Footer(
            extra_text='Made with FastUI',
            links=[]
        )
    ]
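For reference, the per-user isolation mentioned above could start from something as small as a lookup table from session to Thread. This is a sketch under assumptions, not what web.py does: ThreadRegistry is a hypothetical class, the session ID would have to come from a cookie or similar, and create_thread stands in for a call that creates a Thread and returns its ID.

```python
from typing import Callable


class ThreadRegistry:
    """Maps a session id (for example, from a cookie) to its own thread id."""

    def __init__(self, create_thread: Callable[[], str]):
        self._create_thread = create_thread
        self._threads: dict[str, str] = {}

    def thread_for(self, session_id: str) -> str:
        # Create a thread the first time a session is seen, then reuse it.
        if session_id not in self._threads:
            self._threads[session_id] = self._create_thread()
        return self._threads[session_id]

    def reset(self, session_id: str) -> None:
        # Forget the thread so the next prompt starts a fresh conversation.
        self._threads.pop(session_id, None)
```

Like app.thread_id, this lives in memory only, so the mapping is lost whenever the server restarts.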
The next block of code defines how we want to handle streaming the responses back from the OpenAI endpoint. It ensures the session in the web browser continues to refresh and get new information as the code gets the prompt response from OpenAI. It keeps the web browser from hanging up, thinking the conversation has completed. The big change here is adding the app.thread_id component, which creates a new thread in OpenAI when the first prompt is sent, and then reuses it for all future prompts. It won't save the threads between session executions at the moment. Good enough for an experiment, but again this isn't an exploration in web service building.
# SSE endpoint
@app.get('/api/sse/{prompt}')
async def sse_ai_response(prompt: str) -> StreamingResponse:
    # adds a check to keep the thread consistent or will create a new one if needed
    if app.thread_id is None:
        thread = client.beta.threads.create()
        app.thread_id = thread.id
        print(f'created new thread: {thread.id}')

    # if a user hasn't entered a prompt, just return this, don't submit a query
    if prompt is None or prompt == '' or prompt == 'None':
        return StreamingResponse(empty_response(), media_type='text/event-stream')
    else:
        # Sets up the prompt whose response we retrieve later in `ai_response_generator`
        client.beta.threads.messages.create(
            thread_id=app.thread_id,
            role="user",
            content=prompt
        )
        return StreamingResponse(ai_response_generator(prompt), media_type='text/event-stream')

# Empty response generator
async def empty_response() -> AsyncIterable[str]:
    # Send the message
    m = FastUI(root=[c.Markdown(text='')])
    msg = f'data: {m.model_dump_json(by_alias=True, exclude_none=True)}\n\n'
    yield msg
    # Avoid the browser reconnecting
    while True:
        yield msg
        await asyncio.sleep(10)
The real challenge was converting the example to use a streaming response from an OpenAI Assistant with a Thread instead of storing the prompt history and submitting it with each new prompt. The code keeps the message history from the example but, unlike the example, does not submit that history with each prompt. Thanks to the OpenAI Thread that already exists, the assistant already has access to previous messages and responses, so the message history is only used to fill out the conversation log in the web UI. If OPENAI_API_KEY and ASSISTANT_ID are not set, this will error out.
The previous sse_ai_response function set up the Thread: it sends the prompt to OpenAI's servers and stages it with the assistant and the thread we've created. Because the Thread and prompt are staged but not yet triggered to run, it would be possible in the future to add a file upload screen that lets a user upload a file as they submit a prompt, delaying execution of the prompt until the file has been added to the Assistant's context. That would be a useful refinement for the user experience, but it is not something I did in my code, since the earlier create_assistant.py already uploaded the file I wanted to use.
After kicking off the thread run (with async_client.beta.threads.runs.stream), the code parses each response as it comes back from the run, updating the UI with each payload. The snippet below is really the most important part: it triggers the update of the message in the UI as the run data is received.
async with stream as stream:
    async for text in stream.text_deltas:
        output += text
        m = FastUI(root=[c.Markdown(text=output)])
        msg = f'data: {m.model_dump_json(by_alias=True, exclude_none=True)}\n\n'
        yield msg
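To see the shape of what that loop sends to the browser, here is a stripped-down sketch with no FastUI or OpenAI involved. fake_deltas is a hypothetical stand-in for stream.text_deltas, and the JSON payload is simplified to a single text field. What it does share with the real code is the server-sent events framing: each message is a data: line followed by a blank line, and each one carries the full text so far rather than just the newest piece.

```python
import asyncio
import json
from typing import AsyncIterable, Iterable


async def fake_deltas(parts: Iterable[str]) -> AsyncIterable[str]:
    """Hypothetical stand-in for a stream of text deltas from a run."""
    for part in parts:
        await asyncio.sleep(0)  # yield control, as a real network stream would
        yield part


async def sse_frames(deltas: AsyncIterable[str]) -> AsyncIterable[str]:
    """Emit one SSE frame per delta, each carrying the full text so far."""
    output = ""
    async for text in deltas:
        output += text
        yield f"data: {json.dumps({'text': output})}\n\n"


async def collect(parts: Iterable[str]) -> list[str]:
    """Gather every frame into a list so the result is easy to inspect."""
    return [frame async for frame in sse_frames(fake_deltas(parts))]


frames = asyncio.run(collect(["Cats ", "purr."]))
```

Sending the accumulated text each time means the browser can simply replace what it is showing with the latest frame.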
This is the full code:
# response generator
async def ai_response_generator(prompt: str) -> AsyncIterable[str]:

    # reuses the existing Assistant ID, this will break if not present
    if 'ASSISTANT_ID' in os.environ:
        assistant_id = os.environ['ASSISTANT_ID']
    else:
        assistant_id = ''

    output = f"**User:** {prompt}\n\n"
    msg = ''

    # Prompt template for message history
    # this uses the class we created above, saves the history for web UI view
    # original example also used this to submit with each request to keep memory
    # since we have threads, we don't need to use it
    prompt_template = "Previous messages:\n"
    for message_history in app.message_history:
        prompt_template += message_history.message + "\n"
    prompt_template += f"Human: {prompt}"

    # Create the background process with OpenAI to run the thread
    stream = async_client.beta.threads.runs.stream(
        assistant_id=assistant_id,
        thread_id=app.thread_id
    )

    # Stream the chat
    output += "**Chatbot:** "

    async with stream as stream:
        async for text in stream.text_deltas:
            output += text
            m = FastUI(root=[c.Markdown(text=output)])
            msg = f'data: {m.model_dump_json(by_alias=True, exclude_none=True)}\n\n'
            yield msg

    message = MessageHistoryModel(message=output)
    app.message_history.append(message)

    # Keep resending the last message so the browser does not reconnect
    while True:
        yield msg
        await asyncio.sleep(10)

# Prebuilt HTML
@app.get('/{path:path}')
async def html_landing() -> HTMLResponse:
    """Simple HTML page which serves the React app, comes last as it matches all paths."""
    return HTMLResponse(prebuilt_html(title='FastUI Demo'))
With all of this saved in web.py, the web service itself can finally be started with the command uvicorn web:app. By default it launches a web service listening on localhost:8000 on the workstation running the code. You can now submit questions and get facts about cats, with citations included upon request, and it will remember previous questions.
On the command line, a session looks like the following, including a printout of the thread it creates at the start of a conversation and the submission of the prompt (which in future iterations could be sent as POST data instead of in the URI):
> uvicorn web:app --reload
INFO: Will watch for changes in these directories: ['/src/guide-content/01-rag']
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO: Started reloader process [85000] using StatReload
INFO: Started server process [85002]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: 127.0.0.1:64236 - "GET / HTTP/1.1" 200 OK
INFO: 127.0.0.1:64236 - "GET /api/ HTTP/1.1" 200 OK
created new thread: thread_PxW8DbybHmy2e0oxkz0ja2T0
INFO: 127.0.0.1:64237 - "GET /api/sse/None HTTP/1.1" 200 OK
INFO: 127.0.0.1:64238 - "GET /api/?chat=Give+me+a+fact+about+cats+with+a+citation HTTP/1.1" 200 OK
INFO: 127.0.0.1:64238 - "GET /api/sse/Give%20me%20a%20fact%20about%20cats%20with%20a%20citation HTTP/1.1" 200 OK
^CINFO: Shutting down
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
INFO: Finished server process [85002]
INFO: Stopping reloader process [85000]
With this as the rendered response in the UI:
Not bad for a few lines of Python, eh? We get a full custom ChatGPT interface that can keep one informed with the latest cat facts, knowing that they probably aren't made up on the spot.
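One footnote on the session log: the prompt travels in the URL path, which is why the spaces show up as %20. The sketch below reproduces just that encoding with the standard library; prompt_to_path is a hypothetical helper and is not part of web.py. It only illustrates the encoding, and does not show how the server treats unusual characters, which is one more reason a POST body would be the sturdier choice later on.

```python
from urllib.parse import quote, unquote


def prompt_to_path(prompt: str) -> str:
    """Percent-encode a prompt so it fits in a single URL path segment."""
    # safe="" also encodes "/", which would otherwise split the path.
    return "/api/sse/" + quote(prompt, safe="")


print(prompt_to_path("Give me a fact about cats with a citation"))
```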
Building an AI-driven solution doesn’t have to be a long, complex coding journey—Sandgarden helps make it as seamless and intuitive as possible. We handle the heavy lifting so you can focus on experimenting, scaling, and eventually turning your ideas into business-ready features.
Sandgarden offers the flexibility and speed to grow your AI solution as you refine and develop it further. Whether you’re starting with small experiments or looking for a more scalable solution, we’ve got the tools to help you avoid “integration hell” and provide everything from data management to custom AI integrations.
Ready to move beyond fun experiments and start shaping your AI into a functional tool? With Sandgarden, you won’t have to wrestle with endless lines of code—we’ve already built the foundation for you.
Not ready to build your own AI-powered tool? Let Sandgarden help you get started—faster than you can say “purr.”