AI Talent Scout is a comprehensive Streamlit-based web application designed to streamline the process of managing, searching, and analyzing candidate resumes using advanced LLMs (Large Language Models), MongoDB, MinIO, and vector search with ChromaDB. The app features resume extraction, semantic search, CSV export, skills management, and intelligent candidate matching.
- Resume Upload & Extraction: Upload PDF resumes, extract structured candidate data using LLMs (Groq LLaMA 3.3 or local Ollama), and store them in MongoDB.
- Semantic Skill Matching: Uses ChromaDB and Google Generative AI embeddings to deduplicate and match skills, supporting fuzzy/semantic search.
- Candidate Search & Chat: Query the candidate database using natural language, with LLMs generating MongoDB queries for flexible search.
- CSV Export: View and export candidate summaries as CSV.
- Resume Library: Browse, preview, and download all uploaded resumes.
- Skills Management: Comprehensive interface to manage, add, remove, and synchronize skills across MongoDB and ChromaDB.
- MinIO Integration: Store and retrieve PDF files securely.
- Intelligent Skill Processing: Automatic skill normalization, similarity detection, and database synchronization.
- Frontend: Streamlit
- Backend: Python 3.12
- Database: MongoDB
- Object Storage: MinIO
- Vector Search: ChromaDB + Google Generative AI Embeddings
- LLMs: Groq API (LLaMA 3.3), Ollama (local)

AI Talent Scout Global Architecture
For the ELK stack check the "main-with-elk" branch
Stage2A_1/
├── main.py # Main application entry point
├── pages/ # Streamlit page modules
│ ├── chatPage.py # AI-powered candidate search interface
│ ├── upload_resumePage.py # Resume upload and processing
│ ├── csvPage.py # CSV export and candidate viewing
│ ├── listResume.py # Resume library and browsing
│ └── skillsManagementPage.py # Skills management interface
├── services/ # Business logic services
│ ├── llm_service.py # LLM integration and processing
│ └── dictionaire_service.py # Skills dictionary management
├── clients/ # Database and storage clients
│ ├── mongo_client.py # MongoDB operations
│ └── minio_client.py # MinIO file storage
├── llms/ # LLM client implementations with strategy pattern
│ ├── groqClient.py # Groq API integration
│ ├── ollamaClient.py # Local Ollama integration
│ └── llmClientABC.py # Abstract base class
├── embeddings/ # Vector search and embeddings
│ ├── chroma_gemini_embedding.py # ChromaDB + Google AI integration
│ └── google_langchain_chroma_Adapter.py
├── utils.py # Utility functions
├── docker-compose.yml # MinIO service configuration
- Docker & Docker Compose
- Python 3.12 (for local development)
- MongoDB instance (local or cloud)
- API keys for Groq, Google Generative AI
- Ollama (for local LLM processing)
Create a .env file in the project root with the following variables:
MINIO_ENDPOINT=localhost:9000
MINIO_ROOT_USER=your_minio_user
MINIO_ROOT_PASSWORD=your_minio_password
MONGO_ENDPOINT=mongodb://your_mongo_uri
GOOGLE_API_KEY=your_google_api_key
CHROMA_GOOGLE_GENAI_API_KEY=your_google_api_key
GROQ_API_KEY=your_groq_api_key# Install dependencies
pip install -r requirements.txt
#Launch Minio object storage
docker-compose up -d
# Run the application
streamlit run main.py- The app will be available at http://localhost:8501
- MinIO Console: http://localhost:9001
The AI Talent Scout system follows this comprehensive workflow:
- Resume Upload → PDF text extraction using PyPDF2
- AI Processing → LLM analysis and structured data extraction
- Skills Processing → Automatic skill normalization and similarity matching
- Data Storage → MongoDB for profiles, MinIO for PDFs, ChromaDB for embeddings
- Search & Matching → Natural language queries converted to MongoDB queries
- Export & Analysis → CSV export and detailed candidate viewing
- Skills Management → Centralized skills dictionary management and synchronization
- Navigate to the "Upload Page"
- Select one or more PDF resumes
- Choose your preferred LLM (Groq API or local Ollama)
- The app will extract structured data and store it in MongoDB and MinIO
- Skills are automatically deduplicated and matched using semantic search
- Go to the "Chat Page"
- Select your preferred LLM client
- Ask questions in natural language, such as:
- "Show me Python developers with 5+ years experience"
- "Find candidates who know React and have worked with AWS"
- "Who has experience in machine learning and data science?"
- The LLM generates MongoDB queries and returns matching candidates
- Download candidate resumes directly from the results
- Visit the "CSV Table" page
- Search and filter candidates by name, email, role, or summary
- View a comprehensive table of all candidates
- Download the data as a CSV file for external analysis
- Access the "Resume Library" page
- Browse all uploaded resumes with PDF preview
- Search candidates by various criteria
- Download individual resumes as needed
- Navigate to the "Skills Management" page
- View Skills: Browse all current skills with search and filter capabilities
- Add Skills: Add individual skills or bulk import multiple skills
- Remove Skills: Select and remove unwanted skills from both databases
- Database Sync: Check synchronization status between MongoDB and ChromaDB
- Force Sync: Resolve any database inconsistencies
- Export Skills: Download skills data as CSV for external use
- Intelligent Extraction: LLMs extract structured data from PDF resumes
- Skill Normalization: Automatic skill matching and deduplication
- Semantic Search: Find similar skills using vector embeddings
- Flexible LLM Options: Choose between cloud (Groq) and local (Ollama) processing
- MongoDB: Stores candidate profiles and extracted data
- ChromaDB: Vector database for skill similarity search
- MinIO: Secure file storage for original PDFs
- Automatic Sync: Ensures consistency between all databases
- Natural Language Queries: Ask questions in plain English
- Smart Filtering: AI-generated MongoDB queries for precise results
- Resume Download: Direct access to candidate PDFs
- Export Capabilities: CSV export for external analysis
The application follows a modular architecture:
- Pages: Streamlit UI components for different functionalities
- Services: Business logic and LLM processing
- Clients: Database and external service integrations
- LLMs: Abstracted language model interfaces
- Embeddings: Vector search and similarity matching
The Skills Management page provides comprehensive control over the skills dictionary:
- Skills Viewing: Browse all skills with search and filter capabilities
- Bulk Operations: Add/remove multiple skills simultaneously
- Database Synchronization: Ensure MongoDB and ChromaDB consistency
- Export Capabilities: Download skills data for external analysis
AI Talent Scout represents a modern approach to resume management and candidate discovery, combining the power of Large Language Models with intelligent database management and vector search capabilities. The system provides a comprehensive solution for organizations looking to streamline their recruitment processes while maintaining data quality and consistency.