anhpdd/anhpdd

Hi, I'm Robin (Duy Anh) 👋

Data Scientist | Business Analytics Graduate | Building ML Systems That Solve Real Problems

I turn messy, real-world data into production-ready machine learning systems. My edge? A business background that helps me translate technical solutions into stakeholder value—not just optimize metrics.

🎓 Master's in Business Analytics @ Sunway University
🌏 Seeking roles in: Malaysia | Singapore | Vietnam


💡 What Makes Me Different

I didn't start in computer science—I came from International Business, taught myself data analytics in 2021, and pursued a Master's in Business Analytics. That unconventional path means I don't just build models—I solve problems that matter to stakeholders and communicate insights people can actually use.


🚀 What I'm Working On

🏠 Property Price Prediction System – ML model (R² = 0.97) for Malaysia's Klang Valley using geospatial features and DBSCAN clustering
📱 Building in Public – Sharing my data science journey on LinkedIn


💼 Featured Projects

The Challenge: Property valuations in Malaysia take days of manual research and cost RM 400-2,000+ per property.

My Solution: Built an end-to-end ML system that predicts prices in under 5 minutes with R² = 0.97 (the "97% accuracy" quoted throughout; strictly, the model explains about 97% of the variance in prices). The breakthrough wasn't just the algorithm; it was solving a data quality nightmare.

Key Innovation:
Consolidated 18,000+ inconsistent location labels (misspelled road names, duplicate schemes, manual entry errors) into 238 spatial market segments using DBSCAN clustering. This single feature engineering step improved model accuracy from 84% to 97%.
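
A minimal sketch of how coordinate-based DBSCAN clustering can replace free-text location labels with spatial segments. It is illustrative only, not the project's code: the function name `assign_market_segments`, the `eps_km` and `min_samples` values, and the synthetic coordinates are all assumptions, and it presumes each listing has already been geocoded to a latitude/longitude pair.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0088


def assign_market_segments(lat_lon_deg, eps_km=0.5, min_samples=5):
    """Cluster (lat, lon) points in degrees; returns one label per point, -1 = noise.

    eps_km and min_samples are illustrative defaults, not tuned project values.
    """
    coords_rad = np.radians(np.asarray(lat_lon_deg, dtype=float))
    model = DBSCAN(
        eps=eps_km / EARTH_RADIUS_KM,  # haversine distances are measured in radians
        min_samples=min_samples,
        metric="haversine",
        algorithm="ball_tree",
    )
    return model.fit_predict(coords_rad)


# Synthetic points scattered around two made-up centres (not project data).
rng = np.random.default_rng(0)
centre_a = np.array([3.139, 101.687])
centre_b = np.array([3.073, 101.518])
points = np.vstack([
    centre_a + rng.normal(scale=0.001, size=(30, 2)),
    centre_b + rng.normal(scale=0.001, size=(30, 2)),
])
labels = assign_market_segments(points)
```

The resulting label can then be used as a categorical feature in place of the raw road or scheme name, so misspelled variants of the same place fall into the same segment as long as their coordinates are close.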

Tech Stack: Python • scikit-learn • DBSCAN • Random Forest • OpenStreetMap • Geospatial Analysis • pandas

Business Impact:
✅ Reduces valuation time from days → minutes (99% faster)
✅ Maintains 97% accuracy on unseen 2025 data (temporal validation)
✅ Production-ready Python package with 50+ unit tests
✅ Potential cost savings: RM 150,000/month for high-volume agencies

What I Learned: Feature engineering > hyperparameter tuning. I reached 97% with default Random Forest parameters, which suggests that, for this dataset, smart data preparation mattered more than a more complex algorithm.
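
A hedged sketch of what temporal validation with a default Random Forest can look like: fit on earlier years, score R² on a later hold-out year. The helper `temporal_r2`, the two features, and the synthetic prices are assumptions made for illustration; they are not the project's data or pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score


def temporal_r2(X, y, years, holdout_year):
    """Fit on rows dated before holdout_year and return R² on rows from that year."""
    X, y, years = np.asarray(X), np.asarray(y), np.asarray(years)
    train = years < holdout_year
    test = years == holdout_year
    # Only random_state is set (for repeatability); everything else is the library default.
    model = RandomForestRegressor(random_state=0)
    model.fit(X[train], y[train])
    return r2_score(y[test], model.predict(X[test]))


# Synthetic data: price driven by floor area and a segment-level premium.
rng = np.random.default_rng(42)
n = 600
area = rng.uniform(50, 300, size=n)
segment = rng.integers(0, 5, size=n)
years = rng.choice([2023, 2024, 2025], size=n)
price = 3000 * area + 100_000 * segment + rng.normal(scale=20_000, size=n)

X = np.column_stack([area, segment])
score = temporal_r2(X, price, years, holdout_year=2025)
```

Splitting by year rather than at random keeps future transactions out of the training set, which is what makes the hold-out score a fair estimate for unseen periods.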

📂 View Full Project | 📊 Technical Deep Dive


The Problem: Brands need to understand how they're perceived on social media, but manual analysis doesn't scale.

My Solution: Built an NLP pipeline that processes 10,000+ social media posts to extract brand perception insights and competitive positioning.

Tech Stack: Python • NLP • Sentiment Analysis • pandas • Text Processing

Business Value:
✅ Automated sentiment tracking across platforms
✅ Comparative brand analysis (Uniqlo vs Muji positioning)
✅ Network analysis revealing influencer patterns
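
The README does not say which sentiment method the pipeline uses. As one simple possibility, here is a toy lexicon-based scorer that compares average sentiment per brand. The word lists, function names, and sample posts are invented for illustration and are not project data.

```python
import re
from collections import defaultdict

# Tiny illustrative lexicon; a real pipeline would use a far larger one or a trained model.
POSITIVE = {"love", "great", "comfortable", "affordable", "clean"}
NEGATIVE = {"hate", "poor", "expensive", "flimsy", "boring"}


def sentiment_score(text):
    """Return (positive - negative) / (positive + negative) word counts, or 0.0 if none match."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def mean_sentiment_by_brand(posts):
    """posts: iterable of (brand, text) pairs. Returns {brand: mean score}."""
    scores = defaultdict(list)
    for brand, text in posts:
        scores[brand].append(sentiment_score(text))
    return {brand: sum(vals) / len(vals) for brand, vals in scores.items()}


sample_posts = [  # invented examples
    ("Uniqlo", "Love the fit, great and affordable basics"),
    ("Uniqlo", "Bit boring this season"),
    ("Muji", "Clean design but expensive"),
    ("Muji", "Great storage boxes, love them"),
]
by_brand = mean_sentiment_by_brand(sample_posts)
```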

📂 View Project


The Problem: Logistics companies need efficient routing to minimize delivery time and fuel costs.

My Solution: Implemented a genetic algorithm solution for the Traveling Salesman Problem, optimizing delivery routes across 150+ locations in Subang, Malaysia.

Tech Stack: Python • Genetic Algorithms • Optimization • Evolutionary Computing

Impact:
✅ Reduces total route distance by 20-30%
✅ Scalable to real-world logistics scenarios
✅ Demonstrates algorithmic problem-solving
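
A compact sketch of a genetic algorithm for the Traveling Salesman Problem, to show the general shape of the approach. The operators (ordered crossover, swap mutation, keep-the-best-quarter selection), the parameter values, and the random coordinates are assumptions for illustration; the project's actual operators and Subang locations are not shown here.

```python
import math
import random


def route_length(route, points):
    """Total length of the closed tour that visits points in the given order."""
    return sum(
        math.dist(points[route[i]], points[route[(i + 1) % len(route)]])
        for i in range(len(route))
    )


def ordered_crossover(parent_a, parent_b, rng):
    """Copy a slice from parent_a and fill the remaining slots in parent_b's order."""
    i, j = sorted(rng.sample(range(len(parent_a)), 2))
    middle = parent_a[i:j + 1]
    taken = set(middle)
    rest = [city for city in parent_b if city not in taken]
    return rest[:i] + middle + rest[i:]


def swap_mutation(route, rng, rate=0.2):
    """With probability `rate`, swap two cities in a copy of the route."""
    route = route[:]
    if rng.random() < rate:
        a, b = rng.sample(range(len(route)), 2)
        route[a], route[b] = route[b], route[a]
    return route


def solve_tsp(points, population_size=60, generations=200, seed=0):
    """Evolve random tours and return the shortest one found."""
    rng = random.Random(seed)
    n = len(points)
    population = [rng.sample(range(n), n) for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=lambda r: route_length(r, points))
        elite = population[: population_size // 4]  # keep the best quarter unchanged
        children = []
        while len(elite) + len(children) < population_size:
            a, b = rng.sample(elite, 2)
            children.append(swap_mutation(ordered_crossover(a, b, rng), rng))
        population = elite + children
    return min(population, key=lambda r: route_length(r, points))


# Random stand-in coordinates, not real delivery locations.
rng = random.Random(1)
points = [(rng.random(), rng.random()) for _ in range(20)]
best = solve_tsp(points)
```

Because the elite tours are carried over unchanged, the best tour length never gets worse from one generation to the next.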

📂 View Project


🛠️ Tech Stack

Core Skills:
Python • SQL • Machine Learning • Statistical Analysis • Data Visualization

ML & Data Science:
scikit-learn • pandas • NumPy • TensorFlow • DBSCAN • Random Forest • Feature Engineering

Visualization & BI:
Tableau • Power BI • Matplotlib • Seaborn • Plotly

Cloud & DevOps:
AWS (learning) • Git • Jupyter • VS Code • Google Colab

Domain Expertise:
Geospatial Analysis • NLP • Sentiment Analysis • Optimization Algorithms • Time Series Analysis


📊 GitHub Activity

[Stats cards: Anh's GitHub Stats, Top Languages]


🎯 What I Bring to Your Team

  • End-to-end ML execution – From messy data to production-ready models
  • Business acumen – I understand stakeholder needs and translate technical insights into action
  • Communication skills – I explain complex concepts to non-technical audiences (proven through LinkedIn content)
  • Production mindset – I write clean, tested, documented code (see my 50+ unit tests)
  • Continuous learning – Currently expanding into AWS/MLOps to enhance deployment capabilities


📫 Let's Connect

💼 LinkedIn: linkedin.com/in/phan-đức-duy-anh
📧 Email: duyanh.phanduc@gmail.com
🌐 GitHub: github.com/anhpdd

Currently seeking: Data Scientist | Business Intelligence Analyst roles
Available: January 2026
Locations: Malaysia | Singapore | Vietnam
Work Authorization: Graduate Pass sponsorship required


💬 Recent Highlights

📱 Building in Public: Sharing my data science journey on LinkedIn with 3x weekly posts about ML, career lessons, and technical deep-dives

🎓 Academic Recognition: Capstone project supervised by Dr. Norman Arshed & Dr. Mubbasher Munir (Sunway University)

🌱 Current Learning: AWS Cloud Practitioner certification, LLM integration with Gemini API, MLOps best practices


🔥 Fun Facts

  • 🌏 Originally from International Business → self-taught analytics → Master's in Business Analytics
  • 📚 Started learning data science on DataCamp in 2021
  • 🗺️ Fascinated by geospatial analytics and how location data shapes decisions
  • ☕ Best ideas come at 2 AM during debugging sessions

"Data science isn't just about algorithms—it's about solving real problems end-to-end."


⭐ If you found my work interesting, consider giving my repos a star!

💼 Open to collaboration, mentorship, and full-time opportunities starting January 2026.
