Fix flake8 F824 in api.py; add FastAPI backend, Streamlit UI, tests, and CI#3
Fix flake8 F824 in api.py; add FastAPI backend, Streamlit UI, tests, and CI#3joshuvavinith merged 5 commits intomainfrom
Conversation
… api.py Co-authored-by: joshuvavinith <146979257+joshuvavinith@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR expands the project from a Tkinter-only chatbot into a multi-surface app (FastAPI backend + Streamlit UI) while addressing a flake8 F824 CI failure and adding automated tests + CI.
Changes:
- Introduces a FastAPI service with session-based chat, health, training, and session reset endpoints.
- Refactors chatbot logic into
SimpleBot(offline) +LLMBot(OpenAI) behind aChatBotfacade; isolates Tkinter GUI behind__main__. - Adds a Streamlit demo UI, pytest coverage, Docker image modes, and a GitHub Actions CI pipeline.
Reviewed changes
Copilot reviewed 6 out of 8 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
ai_chatbot.py |
Splits core chatbot into offline + LLM backends with a unified facade; keeps GUI runnable via __main__. |
api.py |
Adds FastAPI REST API with in-memory session storage and retrain endpoint. |
web_demo.py |
Adds Streamlit chat UI with session state and optional LLM streaming. |
test_chatbot.py |
Adds pytest suite for SimpleBot, ChatBot, and API endpoints via TestClient. |
requirements.txt |
Adds dependencies for FastAPI, Streamlit, OpenAI, dotenv, and testing. |
Dockerfile |
Adds container build with runtime switch between API and Streamlit modes. |
.github/workflows/ci.yml |
Adds lint → test → docker build workflow. |
.gitignore |
Adds common Python/venv/env/test ignores and generated CSV ignore. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if session_id not in _sessions: | ||
| _sessions[session_id] = ChatBot() | ||
|
|
There was a problem hiding this comment.
_get_or_create_session() constructs a brand-new ChatBot() for any new session. In pattern mode this can trigger Kaggle download / CSV training inside ChatBot.__init__, making first requests for new sessions potentially very slow and repeated. Consider keeping a pre-trained template (or a shared SimpleBot/responses) and instantiating per-session bots without re-downloading/re-training.
api.py
Outdated
| # Retrain a fresh bot and store it as the template for new sessions | ||
| fresh_bot = ChatBot(dialog_file=request.dialog_file, force_offline=True) | ||
| patterns = fresh_bot.pattern_count | ||
|
|
||
| # Clear all existing sessions so next requests pick up new data | ||
| _sessions.clear() | ||
|
|
There was a problem hiding this comment.
/train creates fresh_bot from dialog_file but never stores it anywhere, so future sessions created via ChatBot() in _get_or_create_session() will not actually use the newly provided training data. If the intent is to retrain the pattern backend for subsequent sessions, persist the trained data/template (e.g., keep a module-level default bot or default dialog_file used when creating new sessions).
| """Reload pattern-matching data from a CSV file on the server. | ||
|
|
||
| The file must exist on the server filesystem. This endpoint only | ||
| affects future sessions created after the reload (existing sessions | ||
| keep their current bot instance). | ||
| """ | ||
| if not os.path.exists(request.dialog_file): | ||
| raise HTTPException( | ||
| status_code=404, | ||
| detail=f"File not found: {request.dialog_file}", | ||
| ) | ||
|
|
||
| # Retrain a fresh bot and store it as the template for new sessions | ||
| fresh_bot = ChatBot(dialog_file=request.dialog_file, force_offline=True) | ||
| patterns = fresh_bot.pattern_count | ||
|
|
||
| # Clear all existing sessions so next requests pick up new data | ||
| _sessions.clear() | ||
|
|
There was a problem hiding this comment.
The /train docstring says existing sessions “keep their current bot instance”, but the implementation clears _sessions, which deletes all existing sessions. Either update the docstring to match the behavior or change the logic to preserve existing session bots while only affecting newly created sessions.
| if not os.path.exists(request.dialog_file): | ||
| raise HTTPException( | ||
| status_code=404, | ||
| detail=f"File not found: {request.dialog_file}", | ||
| ) |
There was a problem hiding this comment.
/train accepts an arbitrary filesystem path from the client and returns an error that echoes the full path. This enables filesystem probing and can let remote clients trigger reads of unintended server-local files. Consider restricting training files to an allowlisted directory, accepting uploads instead of paths, and avoiding echoing the raw path in the error message.
| with open(temp_csv_path, "w", encoding="utf-8") as fh: | ||
| fh.write("dialog_id,line_id,text\n") | ||
| for i in range(0, len(lines) - 1, 2): | ||
| q, a = lines[i].strip(), lines[i + 1].strip() | ||
| if q and a: | ||
| dialog_id += 1 | ||
| fh.write(f"{dialog_id},1,{q}\n") | ||
| fh.write(f"{dialog_id},2,{a}\n") |
There was a problem hiding this comment.
_parse_kaggle_dialogs() writes CSV rows via string interpolation without escaping. If a question/answer contains commas, quotes, or newlines, the generated converted_dialog.csv will be malformed and training will mis-parse rows. Use csv.writer (with proper quoting) to emit the CSV instead of manual fh.write(...) formatting.
| with open(temp_csv_path, "w", encoding="utf-8") as fh: | |
| fh.write("dialog_id,line_id,text\n") | |
| for i in range(0, len(lines) - 1, 2): | |
| q, a = lines[i].strip(), lines[i + 1].strip() | |
| if q and a: | |
| dialog_id += 1 | |
| fh.write(f"{dialog_id},1,{q}\n") | |
| fh.write(f"{dialog_id},2,{a}\n") | |
| with open(temp_csv_path, "w", encoding="utf-8", newline="") as fh: | |
| writer = csv.writer(fh) | |
| writer.writerow(["dialog_id", "line_id", "text"]) | |
| for i in range(0, len(lines) - 1, 2): | |
| q, a = lines[i].strip(), lines[i + 1].strip() | |
| if q and a: | |
| dialog_id += 1 | |
| writer.writerow([dialog_id, 1, q]) | |
| writer.writerow([dialog_id, 2, a]) |
| import os | ||
| import tempfile |
There was a problem hiding this comment.
test_chatbot.py has unused imports (os, tempfile, and the top-level TestClient is re-imported inside the fixture). This adds noise and can trigger flake8 F401 if linting is tightened. Consider removing unused imports and keeping a single TestClient import.
| import os | |
| import tempfile |
| data = resp.json() | ||
| assert data["status"] == "retrained" | ||
| assert data["patterns_loaded"] > 0 | ||
|
|
There was a problem hiding this comment.
The current FastAPI tests don’t verify that /train actually affects subsequent /chat sessions (e.g., that a new session after training responds using the newly provided dialog file). Adding a test for this would catch regressions and would have surfaced the missing persistence of fresh_bot/dialog data.
| def test_train_affects_new_chat_session(self, api_client): | |
| client, _ = api_client | |
| # Create a new dialog file with a distinctive response to "hello" | |
| new_dialog_content = """\ | |
| dialog_id,line_id,text | |
| 1,1,hello | |
| 1,2,Hello from NEW dialog! | |
| """ | |
| tmp = tempfile.NamedTemporaryFile(mode="w", suffix=".csv", delete=False) | |
| try: | |
| tmp.write(new_dialog_content) | |
| tmp.flush() | |
| tmp_path = tmp.name | |
| finally: | |
| tmp.close() | |
| try: | |
| # Retrain the API using the new dialog file | |
| resp = client.post("/train", json={"dialog_file": tmp_path}) | |
| assert resp.status_code == 200 | |
| data = resp.json() | |
| assert data["status"] == "retrained" | |
| assert data["patterns_loaded"] > 0 | |
| # Start a new chat session (no session_id) and ensure the reply | |
| # reflects the newly trained dialog file. | |
| chat_resp = client.post("/chat", json={"message": "hello"}) | |
| assert chat_resp.status_code == 200 | |
| chat_data = chat_resp.json() | |
| assert "reply" in chat_data | |
| assert "Hello from NEW dialog" in chat_data["reply"] | |
| finally: | |
| # Clean up the temporary dialog file | |
| if os.path.exists(tmp_path): | |
| os.remove(tmp_path) |
web_demo.py
Outdated
| @st.cache_resource | ||
| def _get_bot_factory() -> ChatBot: | ||
| """Return a template bot (loads data once); actual per-session bots copy from this.""" | ||
| return ChatBot() | ||
|
|
||
|
|
||
| # Per-session bot stored in session_state so each browser tab/user gets its own history | ||
| if "bot" not in st.session_state: | ||
| st.session_state.bot = _get_bot_factory() |
There was a problem hiding this comment.
@st.cache_resource causes _get_bot_factory() to return the same ChatBot instance across Streamlit sessions/users. Because ChatBot holds conversation history, this can leak context between different browser tabs/users and contradicts the comment about “per-session bots”. Consider caching only the training data (or a factory function) and creating a fresh ChatBot per st.session_state, or ensure the cached object is immutable and per-session bots have independent history.
| @st.cache_resource | |
| def _get_bot_factory() -> ChatBot: | |
| """Return a template bot (loads data once); actual per-session bots copy from this.""" | |
| return ChatBot() | |
| # Per-session bot stored in session_state so each browser tab/user gets its own history | |
| if "bot" not in st.session_state: | |
| st.session_state.bot = _get_bot_factory() | |
| # Per-session bot stored in session_state so each browser tab/user gets its own history | |
| if "bot" not in st.session_state: | |
| st.session_state.bot = ChatBot() |
|
@copilot Resolve the conflicts |
…atbot.py, web_demo.py and test_chatbot.py Co-authored-by: joshuvavinith <146979257+joshuvavinith@users.noreply.github.com>
Conflicts resolved in commit 70afb69. The three files that diverged from main (
All 20 tests pass and flake8 reports 0 errors. Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
|
@copilot some CI tests has failed, find and resolve |
…ges reverted by merge commit Co-authored-by: joshuvavinith <146979257+joshuvavinith@users.noreply.github.com>
Fixed in commit
The fix restores all three files. Additionally, Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
api.py- unusedglobal _default_botstatementai_chatbot.pywith refactored version (LLM + pattern-matching backends,ChatBotfacade, GUI only runs when executed directly)api.pywith the flake8 bug fixed (removed unusedglobal _default_bot)test_chatbot.pywith unit/integration testsweb_demo.pyStreamlit web interfaceDockerfilefor web and API modes.github/workflows/ci.ymlCI workflowci.yml(lint/test matrix/docker jobs),README.md(rewritten),requirements.txt(addedpytest-cov)api.py: restored_default_dialog_filemodule var (reverted by merge commit) so/trainpersists training data for new sessions;_get_or_create_sessionnow usesforce_offline=Truewhen a dialog file is set to prevent Kaggle fallback in CI; fixed error message; updated docstringai_chatbot.py:_parse_kaggle_dialogsusescsv.writer; fixed E302 blank lineweb_demo.py: removed@st.cache_resourcefactory (history leak); removed unusedimport ostest_chatbot.py: removed unused top-levelTestClientimport;test_train_affects_new_chat_sessionverifies/trainpersistence💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.