Panda WebUI is a lightweight, user-friendly web interface for running Large Language Models (LLMs) with the Hugging Face transformers library. In addition to standard LLMs, Panda WebUI also supports GGUF models and several popular large Vision-Language Models (VLMs).
- Create a new environment:

```
conda create -n panda python=3.11
conda activate panda
```

- Install pytorch:

```
pip install torch==2.4.1 torchvision==0.19.1 --index-url https://download.pytorch.org/whl/cu121
```

- For `llama-cpp` (GGUF inference) and `deepspeed` (training):

```
conda install -y -c "nvidia/label/cuda-12.1.1" cuda
```

  If training is not needed, one can install the CUDA Runtime instead:

```
conda install -y -c "nvidia/label/cuda-12.1.1" cuda-runtime
```

- Install the packages in requirements.txt
- If unable to install auto-gptq, try installing it separately:

```
pip install auto-gptq --no-build-isolation
```

- If unable to install flash-attention, try installing it using the system terminal
- Create a symbolic link `weights` to the model directory:

```
ln -s /path_to_weights weights
```

To use models from Huggingface, set the Huggingface cache by setting the environment variable in `.bashrc`:

```
export HF_HOME=/path/to/cache
```

Then run the following command to reload and apply the changes made to `.bashrc`:

```
source ~/.bashrc
```

This shell script can be used to run Panda WebUI from the terminal:
```
#!/bin/bash
cd ~/path/to/Panda-LLM/ || { echo "Failed to change directory"; exit 1; }

# Activate the conda environment named 'panda'
source ~/miniforge3/etc/profile.d/conda.sh
conda activate panda

# Check if the conda environment 'panda' is active
if [ "$CONDA_DEFAULT_ENV" = "panda" ]; then
    echo "Successfully activated conda environment 'panda'."
else
    echo "Failed to activate conda environment 'panda'."
    exit 1
fi

# Run the Python script
python webui.py
```

In Models, use the Download feature to download models from Huggingface. To use a downloaded model, refresh and load the model on the left side of the Models page. Once the model is loaded, you can start chatting with it in Main.
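Models downloaded this way are stored in the Hugging Face cache configured earlier through HF_HOME. The sketch below mirrors (rather than calls) huggingface_hub's documented default for how that cache location resolves; the helper name is illustrative, not part of Panda WebUI:

```python
import os
from pathlib import Path

def resolve_hf_cache(env: dict) -> Path:
    """Return the Hugging Face cache directory, preferring HF_HOME.

    Mirrors the documented default: $HF_HOME if set, otherwise
    ~/.cache/huggingface. (Illustrative helper, not part of Panda WebUI.)
    """
    hf_home = env.get("HF_HOME")
    if hf_home:
        return Path(hf_home)
    return Path.home() / ".cache" / "huggingface"

# With HF_HOME set as in .bashrc above, downloads go to that directory
print(resolve_hf_cache({"HF_HOME": "/path/to/cache"}))
# Without it, the default user cache is used
print(resolve_hf_cache(dict(os.environ, HF_HOME="")))
```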
Inference-time FLOP estimation is computed using a single high-definition image (1920x1080 resolution), 128 padded input tokens, and 1 output token. The list of models available for this feature can be found here
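For intuition, a common first-order approximation counts roughly 2 FLOPs per model parameter per processed token. The sketch below applies it to the setup above; the 14x14 patch grid and the 7B parameter count are illustrative assumptions (real VLMs resize or crop images before patching, and Panda WebUI's actual estimator may differ):

```python
# First-order inference FLOP estimate: ~2 FLOPs per parameter per token.
# Patch size and parameter count are assumptions for illustration only.

def image_tokens(width: int, height: int, patch: int = 14) -> int:
    """Token count for a naive non-overlapping patch grid (illustrative)."""
    return (width // patch) * (height // patch)

def estimate_flops(n_params: float, n_tokens: int) -> float:
    """Classic 2*N*T forward-pass approximation (matmuls only)."""
    return 2.0 * n_params * n_tokens

# Setup from the text: one 1920x1080 image, 128 input tokens, 1 output token
tokens = image_tokens(1920, 1080) + 128 + 1
flops = estimate_flops(7e9, tokens)  # e.g. a hypothetical 7B-parameter VLM
print(tokens, f"{flops:.3e}")
```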