MemoryChat is a lightweight chatbot system built using the Groq API that supports session-based conversational memory with strict token and rate-limit control. It is designed to maintain short-term conversational context while optionally storing structured personal information, without resending full chat histories to the language model.
The project also includes a Streamlit application that provides a simple web-based user interface for interacting with the chatbot.
MemoryChat separates memory into two clearly defined components:
- Short-term conversational memory, used to maintain dialogue coherence
- Session-level personal information, used to store structured user data
This separation allows the chatbot to remain efficient, predictable, and scalable.
**Short-term conversational memory**

- Stores the last N messages exchanged between the user and the assistant
- Implemented using a sliding window mechanism
- Persisted locally in a JSON file
- Older messages are automatically discarded

Only the most recent messages are sent to the language model, ensuring a fixed upper bound on token usage.
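The sliding window described above can be sketched as follows. The class name and default file path are illustrative, not taken from the project:

```python
import json
from pathlib import Path


class SlidingWindowMemory:
    """Keeps only the last `max_messages` chat turns, persisted as JSON."""

    def __init__(self, path="memory/msg_hist.json", max_messages=4):
        self.path = Path(path)
        self.max_messages = max_messages
        # Reload any previously persisted window; start empty otherwise.
        if self.path.exists():
            self.messages = json.loads(self.path.read_text())
        else:
            self.messages = []

    def append(self, role, content):
        self.messages.append({"role": role, "content": content})
        # Trim to the window: older messages are discarded permanently.
        self.messages = self.messages[-self.max_messages:]
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.messages, indent=2))

    def window(self):
        """The only messages ever sent to the model."""
        return list(self.messages)
```

Because trimming happens on every append, the file on disk never grows beyond the window, and a restart restores exactly the context that would have been sent anyway.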
**Session-level personal information**

- Stores structured user information such as preferences or context
- Saved separately from the message history
- Injected into prompts only when relevant
- Not subject to the sliding window

This prevents repeated restatement of user information while keeping prompts concise.
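A minimal sketch of this selective injection, assuming the structured memory is a flat JSON object (function names and the `relevant_keys` parameter are illustrative):

```python
import json
from pathlib import Path


def load_personal_info(path="memory/personal_info.json"):
    """Load structured session memory; a missing file means no stored info."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else {}


def inject_personal_info(system_prompt, info, relevant_keys):
    """Append only the relevant facts to the system prompt.

    Keys outside `relevant_keys` stay out of the prompt entirely, so
    stored information never inflates token usage by default.
    """
    facts = [f"{k}: {info[k]}" for k in relevant_keys if k in info]
    if not facts:
        return system_prompt
    return system_prompt + "\nKnown user info:\n" + "\n".join(facts)
```

Only the keys deemed relevant to the current request ever reach the prompt; everything else remains on disk.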
```
memory/
├── msg_hist.json        # Sliding-window chat history
└── personal_info.json   # Structured session memory
```
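For illustration, the two files might contain (contents hypothetical):

```json
[
  {"role": "user", "content": "What's the capital of France?"},
  {"role": "assistant", "content": "Paris."}
]
```

```json
{"name": "Ada", "preferred_language": "English"}
```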
1. The user submits a message via the interface
2. The message is appended to the message history
3. The history is trimmed to the last N messages
4. Relevant personal information is optionally injected
5. The request is sent to the Groq LLM
6. The assistant's response is saved back into the history
At no point is the full conversation history transmitted.
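The steps above can be sketched end to end. Here `send_to_llm` stands in for the actual Groq chat-completions call, and a window size of N = 4 is assumed; both the function name and the default system prompt are illustrative:

```python
def handle_message(user_text, history, send_to_llm,
                   system_prompt="You are a concise assistant.", n=4):
    """One request/response cycle with a bounded context.

    `history` is a mutable list of {"role", "content"} dicts;
    `send_to_llm(messages)` wraps the model call and returns the
    assistant's reply as a string (in the real project this would
    be a Groq chat-completions request).
    """
    history.append({"role": "user", "content": user_text})
    del history[:-n]                  # keep only the last n messages
    messages = [{"role": "system", "content": system_prompt}] + history
    reply = send_to_llm(messages)     # only the trimmed window is sent
    history.append({"role": "assistant", "content": reply})
    del history[:-n]                  # trim again after saving the reply
    return reply
```

Because trimming happens before the request is built, the payload size is bounded regardless of how long the session runs.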
MemoryChat includes a Streamlit-based web application that:
- Provides an interactive chat interface
- Displays model responses in real time
- Manages session memory transparently
- Allows rapid testing and iteration
The Streamlit app acts purely as a frontend layer and does not alter the underlying memory logic.
MemoryChat is built around four design goals:

- Token efficiency through strict context limits
- Rate-limit safety via predictable prompt size
- Simplicity using JSON persistence instead of databases
- Modularity between memory, responder logic, and UI
MemoryChat is deliberately not:

- A long-term memory or knowledge system
- A replacement for retrieval-augmented generation (RAG)
- Designed for extended multi-hour reasoning sessions
Token and rate-limit control relies on:

- Sliding window size: N = 4
- Minimal system prompt
- Concise assistant responses
- Selective memory injection only when necessary
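These limits can be collected in one place. Apart from the window size of N = 4 stated above, the names and values here are illustrative assumptions:

```python
# Context-control settings (N = 4 per the configuration above;
# other values are assumed for illustration).
WINDOW_SIZE = 4                                   # sliding window: last N messages
SYSTEM_PROMPT = "You are a concise assistant."    # kept deliberately short
MAX_RESPONSE_TOKENS = 256                         # encourages concise replies


def trim(history, n=WINDOW_SIZE):
    """Enforce the fixed upper bound on context size."""
    return history[-n:]
```

Centralizing the knobs this way keeps prompt size predictable: every request carries at most the system prompt plus `WINDOW_SIZE` messages.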
MemoryChat demonstrates a practical and disciplined approach to chatbot memory management. By combining a sliding window context strategy with structured session memory and a Streamlit interface, it delivers reliable conversational continuity without exceeding token or rate limits.