An intelligent study assistant system using the Furhat robot with real-time emotion detection and multimodal AI interaction powered by Google Gemini.
This project consists of three main components:
- Furhat Robot - Physical/virtual robot interface
- Vision System - Real-time emotion detection via webcam
- Main Interview Script - Orchestrates the interaction using Gemini AI
- Python 3.8+
- Furhat SDK (download from furhat.io)
- Google Gemini API key (from Google AI Studio)
- Webcam (for emotion detection)
- Model file: `vit_rafdb_4class.pth` (should be in the `visionSystem/` directory)
- Download and install the Furhat SDK from furhat.io
- Launch Furhat SDK application
- Start a virtual Furhat robot (or connect to a physical one)
- Note the IP address (default: `localhost` for a virtual robot)
- Open Furhat SDK
- Go to the Skills section
- Import the skill file: `skill/furhat-remote-api.skill`
- Enable the Remote API skill on your Furhat robot
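With the skill enabled, you can sanity-check the connection from Python using the `furhat-remote-api` client. A minimal sketch (the greeting text is illustrative):

```python
def check_furhat_connection(ip="localhost"):
    """Connect to a Furhat robot running the Remote API skill and say a test phrase.

    The import is inside the function so the sketch can be read without the
    furhat-remote-api package installed.
    """
    from furhat_remote_api import FurhatRemoteAPI

    furhat = FurhatRemoteAPI(ip)  # robot address: "localhost" for a virtual robot
    furhat.say(text="Remote API connection works.", blocking=True)
```

If this raises a connection error, the SDK is not running or the Remote API skill is not active.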
```bash
# Create a virtual environment
python3 -m venv venv

# Activate the virtual environment
# On macOS/Linux:
source venv/bin/activate
# On Windows:
# venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

Create a `.env` file in the project root:

```
API_KEY="your_google_gemini_api_key_here"
```

Replace `your_google_gemini_api_key_here` with your actual Google Gemini API key.
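The scripts read this key at startup. A minimal stdlib-only sketch of how such a `.env` file can be loaded (the actual code may use the `python-dotenv` package instead):

```python
import os

def load_env(path=".env"):
    """Parse simple KEY="value" lines from a .env file into os.environ."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks, comments, and lines without an assignment
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip().strip('"'))

# usage:
# load_env()
# api_key = os.environ["API_KEY"]
```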
Ensure the emotion detection model is present:
```bash
ls visionSystem/vit_rafdb_4class.pth
```

If it is missing, you'll need to obtain or train the ViT model for 4-class emotion detection (angry, happy, neutral, sad).
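The exact architecture is defined in `emotion_api.py`. If you need to load the checkpoint yourself, a hedged sketch assuming a `timm` ViT fine-tuned to 4 classes might look like this (the model name and checkpoint format are assumptions, not confirmed by the project):

```python
def load_emotion_model(checkpoint="visionSystem/vit_rafdb_4class.pth"):
    """Load a 4-class ViT emotion classifier (angry, happy, neutral, sad).

    Imports are inside the function so the sketch is readable without
    torch/timm installed; the timm model name is an assumption.
    """
    import torch
    import timm

    model = timm.create_model("vit_base_patch16_224", num_classes=4)
    state = torch.load(checkpoint, map_location="cpu")
    model.load_state_dict(state)
    model.eval()  # inference mode
    return model
```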
- Launch the Furhat SDK application
- Start your virtual or physical Furhat robot
- Ensure the Remote API skill is active
Open a terminal and run:
```bash
# Activate the virtual environment if not already active
source venv/bin/activate

# Navigate to the visionSystem directory
cd visionSystem

# Start the emotion detection API
python emotion_api.py
```

The vision system will:
- Start on http://127.0.0.1:8000
- Access your webcam
- Continuously process emotions in the background
- Expose a `/mood` endpoint
You can test it by visiting: http://127.0.0.1:8000/mood
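Polling the endpoint programmatically can be as simple as the sketch below (stdlib only; the JSON shape, here `{"mood": "..."}`, is an assumption about `emotion_api.py`'s response):

```python
import json
from urllib.request import urlopen

def get_mood(url="http://127.0.0.1:8000/mood", timeout=2.0):
    """Fetch the latest detected mood from the vision system.

    Returns the mood string, or None if the API is unreachable.
    The {"mood": ...} response shape is an assumption.
    """
    try:
        with urlopen(url, timeout=timeout) as resp:
            return json.load(resp).get("mood")
    except OSError:
        return None
```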
Open a new terminal (keep the vision system running):
```bash
# Activate the virtual environment
source venv/bin/activate

# Run the main script
python mainInterview.py
```

Edit these variables in `mainInterview.py`:

```python
FURHAT_IP = "localhost"          # Change if using a physical robot
QUESTIONS_TO_ASK = 5             # Number of interaction rounds
USE_KEYBOARD = False             # True for keyboard input, False for voice
MOOD_CLASSIFIER_ENABLED = True   # Enable/disable emotion detection
```

Edit these in `visionSystem/emotion_api.py`:

```python
WEBCAM_ID = 0                    # Change if using a different camera
PROCESSING_INTERVAL = 0.5        # Seconds between emotion checks
```

Once everything is running:
- The Furhat robot will greet you and ask for your name
- You'll be asked what subject you're working on
- The system will engage in 5 rounds of conversation
- Your facial emotions are continuously detected and influence the robot's responses
- At the end, you'll receive a summary
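The flow above can be sketched as a loop. This is purely an illustrative outline; helper names like `listen`, `get_mood`, and `ask_gemini` are hypothetical, not the actual names in `mainInterview.py`:

```python
def run_session(furhat, questions_to_ask=5):
    """Illustrative outline of one tutoring session; helper names are hypothetical."""
    furhat.say(text="Hi! What's your name?")
    name = listen(furhat)                        # voice or keyboard input
    furhat.say(text="What subject are you working on?")
    subject = listen(furhat)

    for _ in range(questions_to_ask):
        mood = get_mood()                        # current emotion from the vision API
        reply = ask_gemini(name, subject, mood)  # Gemini reply conditioned on mood
        furhat.say(text=reply)
        listen(furhat)

    furhat.say(text=ask_gemini(name, subject, "summary"))  # closing summary
```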
- Voice mode (default): speak to the robot
- Keyboard mode: set `USE_KEYBOARD = True` and type responses
- Verify Furhat SDK is running
- Check that `FURHAT_IP` matches your robot's address
- Ensure the Remote API skill is active
- Check webcam is connected and not in use by other applications
- Verify the model file exists: `visionSystem/vit_rafdb_4class.pth`
- Check the API is running: `curl http://127.0.0.1:8000/mood`
- Verify the `.env` file exists in the project root
- Check the API key is valid at Google AI Studio
- Ensure proper formatting: `API_KEY="your_key_here"`
```bash
# Reinstall dependencies
pip install --upgrade -r requirements.txt
```

```
.
├── mainInterview.py               # Main orchestration script
├── requirements.txt               # Python dependencies
├── .env                           # Environment variables (API keys)
├── skill/
│   └── furhat-remote-api.skill    # Furhat Remote API skill (72 MB)
├── visionSystem/
│   ├── emotion_api.py             # FastAPI emotion detection server
│   └── vit_rafdb_4class.pth       # Trained emotion detection model
└── output/                        # Output directory (if used)
```
Key libraries:
- `furhat-remote-api` - Furhat robot control
- `google-generativeai` - Gemini AI integration
- `fastapi` + `uvicorn` - Emotion API server
- `opencv-python` - Webcam/face detection
- `torch` + `torchvision` - Deep learning inference
- `timm` - Vision Transformer model
See `requirements.txt` for the complete list.
Built for Uppsala University - Intelligent Robot Interaction course.