NLP Researcher | MSc Information Technology, Metropolia University of Applied Sciences — Graduated May 2026 | Based in Espoo, Finland
Context-Aware Document-Level Location Selection for Finnish News Articles: A Hybrid Rule-Based and LLM Approach
Developed in collaboration with Superhood Oy — a Finnish neighbourhood-level news platform. Awarded grade 5/5 — the highest grade at Metropolia University of Applied Sciences.
The research built and evaluated a four-configuration NLP pipeline for extracting and ranking geographic locations at postal-code granularity from Finnish news articles. The system combined:
- Stanza for Finnish NER and morphological normalisation
- Geoapify for dynamic geocoding and candidate resolution
- Postal-first hierarchical ranking for geographic level disambiguation
- Model Context Protocol (MCP) with Llama 3.3 70B via Groq for contextual reasoning
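
The postal-first ranking step listed above can be illustrated with a minimal sketch. It is not the thesis implementation: the `Candidate` fields, the `LEVEL_PRIORITY` ordering, the tie-breaking by mention count, and the example place names and levels are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical level ordering: a lower value means a more specific level,
# which is preferred when ranking ("postal-first").
LEVEL_PRIORITY = {"postal": 0, "district": 1, "city": 2, "region": 3}


@dataclass(frozen=True)
class Candidate:
    name: str      # normalised place name (illustrative)
    level: str     # geographic level, one of the LEVEL_PRIORITY keys
    mentions: int  # how often the name occurs in the article


def rank_candidates(candidates):
    """Sort by level specificity first, then by mention count (descending).

    The name is used as a final tie-breaker so the order is deterministic.
    """
    return sorted(
        candidates,
        key=lambda c: (LEVEL_PRIORITY[c.level], -c.mentions, c.name),
    )


# Illustrative candidates; the levels assigned here are assumptions.
candidates = [
    Candidate("Espoo", "city", 5),
    Candidate("Tapiola", "postal", 2),
    Candidate("Uusimaa", "region", 1),
    Candidate("Leppävaara", "postal", 3),
]

ranked = rank_candidates(candidates)
print([c.name for c in ranked])
# → ['Leppävaara', 'Tapiola', 'Espoo', 'Uusimaa']
```

In this sketch a postal-level candidate outranks a city-level one even when the city is mentioned more often, which is the behaviour the "postal-first" label suggests.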
Key results: 83.33% exact match accuracy with the full hybrid configuration across 60 Finnish news articles, a 16.67 percentage point improvement over the rule-based baseline.
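
For readers unfamiliar with the metric, the sketch below shows how exact match accuracy and a percentage point difference between two configurations can be computed. The function names and the six-item toy data are hypothetical and are not taken from the thesis evaluation set.

```python
def exact_match_accuracy(predicted, gold):
    """Return the percentage of items where the prediction equals the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty dataset")
    hits = sum(p == g for p, g in zip(predicted, gold))
    return 100.0 * hits / len(gold)


def percentage_point_gain(accuracy_a, accuracy_b):
    """Absolute difference between two accuracies, in percentage points."""
    return accuracy_a - accuracy_b


# Toy postal-code labels (illustrative only, not the thesis data).
gold = ["02100", "02600", "00100", "33100", "20100", "90100"]
baseline_pred = ["02100", "02600", "00100", "33100", "00100", "00100"]  # 4 of 6 correct
hybrid_pred = ["02100", "02600", "00100", "33100", "20100", "00100"]    # 5 of 6 correct

baseline_acc = exact_match_accuracy(baseline_pred, gold)
hybrid_acc = exact_match_accuracy(hybrid_pred, gold)
gain = percentage_point_gain(hybrid_acc, baseline_acc)

print(f"baseline: {baseline_acc:.2f}%  hybrid: {hybrid_acc:.2f}%  gain: {gain:.2f} pp")
```

A percentage point gain is the plain difference between the two accuracies, which is distinct from the relative improvement of one over the other.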
Key finding: Systematic lemmatization failures on Finnish postal-level names point to a data coverage gap in Finnish NER training corpora — not a tool architecture limitation.
📄 Published thesis: URN:NBN:fi:amk-2026051311846
📊 Evaluation dataset & results: thesis-nlp-evaluation
- Pursuing PhD opportunities in Clinical NLP, Biomedicine and cross-lingual health information extraction
- Advanced NLP and Machine Learning certifications in progress
| Area | Tools & Technologies |
|---|---|
| NLP | Stanza, Finnish NER, named entity recognition, lemmatization, evaluation framework design, error analysis |
| LLM Integration | MCP (Model Context Protocol), Groq, Llama 3.3 70B, FastMCP |
| Backend | Python, FastAPI, Flask, REST APIs |
| Databases | PostgreSQL, MongoDB, MySQL |
| ML | Regression, classification fundamentals, ablation study design |
| Data | Pandas, NumPy |
Four-configuration hybrid pipeline for Finnish postal-level location extraction. Evaluation dataset and results available in the thesis evaluation repo above. Grade 5/5.
Annotated evaluation dataset, ground truth annotations, and results across all four pipeline configurations. This is the live evaluation repository for the thesis research.
Hands-on exploration of Model Context Protocol — the tool orchestration framework used in the thesis disambiguation layer.
University coursework implementing regression and classification algorithms in Python including Ridge/Lasso regression, SVM, KNN, and structured evaluation.
Prototype language learning app providing real-time feedback on Finnish grammar and pronunciation.
Contributed backend REST API development for a VoIP application supporting calling and scheduling meetings. Implemented CRUD operations and database integration using Flask and MySQL.
Notebooks and experiments from NLP and Machine Learning specialisation coursework. Updated continuously as certifications progress.



