An AI-powered CV screening platform that automates early-stage candidate selection using Blind Screening (NER-based anonymization), Competency Evaluation (RAG pipeline), and Explainable AI (evidence-based score justifications). Built for fair, transparent, and efficient recruitment — designed as a local single-user system.
| Layer | Technology |
|---|---|
| Backend | FastAPI + SQLAlchemy + SQLite |
| PDF Parsing | PyMuPDF |
| NER (Anonymization) | IndoBERT (ageng-anugrah/indobert-large-p2-finetuned-ner) |
| RAG & Orchestration | LangChain + ChromaDB |
| LLM Inference | DeepSeek V3 (OpenAI-compatible API) |
| Frontend | React + Vite + Tailwind CSS + shadcn/ui |
- Python 3.10+
- Node.js 18+
- DeepSeek API key (get one at platform.deepseek.com)
- ~4 GB disk space (for IndoBERT model cache)
git clone https://github.com/istgrudd/screenai.git
cd screenai# Create and activate virtual environment
python -m venv venv
source venv/bin/activate # Linux/macOS
venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Configure environment
cp .env.example .env
# Open .env and fill in your DEEPSEEK_API_KEYRun the backend:
uvicorn backend.main:app --reload --port 8000API will be available at http://localhost:8000. Health check: GET /api/health.
cd frontend
npm install
npm run devUI will be available at http://localhost:5173.
python -m scripts.seed_rubricThis populates the database with a sample rubric for testing.
screenai/
├── backend/
│ ├── main.py # FastAPI entry point
│ ├── models/ # SQLAlchemy ORM models
│ ├── routers/ # API route handlers
│ ├── services/ # Core pipeline logic
│ │ ├── extractor.py # PDF → raw text (PyMuPDF)
│ │ ├── anonymizer.py # NER-based identity masking
│ │ ├── rag_pipeline.py # LangChain RAG orchestration
│ │ └── scoring.py # Weighted score aggregation
│ └── utils/ # Helpers (PDF, NER, LLM client)
├── frontend/
│ └── src/
│ ├── pages/ # Upload, Dashboard, Detail, Rubric
│ └── components/ # UI components
├── scripts/
│ └── seed_rubric.py # Seed default rubric
├── data/ # Local data (gitignored)
├── requirements.txt
├── .env.example
└── CLAUDE.md # Execution plan
- PDF Upload — Multi-file upload for CVs and language certificates
- Text Extraction & Normalization — Section-aware parsing (education, experience, skills, certifications)
- Blind Screening — Automatic anonymization of names, contacts, institutions via IndoBERT NER + regex fallback
- Rubric Configuration — Define competency dimensions, weights, and indicators per job position
- RAG-based Scoring — Embed rubric → retrieve relevant CV chunks → generate structured scores via DeepSeek V3
- Explainable AI — Every score includes specific CV evidence and a justification
- Profile Summary — Narrative candidate overview generated by LLM
- Candidate Ranking — Dashboard sorted by weighted composite score
- Score Override — Recruiters can manually adjust dimension scores
- Batch Processing — Evaluate all candidates for a rubric in one action
Kelompok 26 — Capstone Design, Telkom University 2026