DavidFSantillan/Data-Science-Porfolio

🧠 Data Science Portfolio — Python-Scripts

A curated collection of hands-on projects and labs across seven areas of the ML/AI stack: data loading and augmentation, CNNs, transformers, reinforcement learning, NLP and RAG, multimodal computer vision, and an end-to-end capstone project.

Python PyTorch TensorFlow LangChain Jupyter


🗂️ Portfolio Map

| # | Area | Key Topics | Notebooks |
|---|------|------------|-----------|
| 1 | 📦 Data Loading & Augmentation | DataLoaders, transforms, Keras/PyTorch pipelines, memory vs. generator strategy | 4 |
| 2 | 🧱 Deep Learning: CNN & NN | CNNs, image classification, multi-framework comparison, medical & anime datasets | 10 |
| 3 | 🤖 Transformers & Vision Transformers | ViT, self-attention, BERT-style architectures, Keras & PyTorch implementations | 3 |
| 4 | 🎮 Reinforcement Learning | Tabular Q-Learning, Deep Q-Networks (DQN), policy optimization | 3 |
| 5 | 📝 NLP, RAG & Embeddings | LangChain, vector stores, RAG pipelines, watsonx embeddings, QA bots | 5 |
| 6 | 👁️ Computer Vision & Multimodal AI | CNN-ViT hybrid integration, image captioning, satellite scene classification | 3 |
| 7 | 🏆 Capstone Business Projects | End-to-end ML pipelines, game analytics, competitive feature engineering | 1 |

Total: 29 notebooks across 7 skill domains


🛠️ Core Skills Demonstrated

Machine Learning & Deep Learning

  • CNNs for image classification across multiple domains (medical imaging, anime, fashion, MNIST)
  • Vision Transformers (ViT) — built from scratch and fine-tuned in both Keras and PyTorch
  • Multi-framework proficiency: side-by-side Keras (TensorFlow) and PyTorch implementations
  • Reinforcement Learning: from tabular Q-tables to Deep Q-Networks with experience replay
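
To make the last bullet concrete, here is a minimal sketch of the two ideas it names: a tabular Q-learning update and an experience replay buffer. It is not taken from the notebooks. The names `q_update` and `ReplayBuffer`, and the `alpha` and `gamma` defaults, are illustrative assumptions. Only the standard library is used.

```python
import random
from collections import deque


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: Q(s,a) += alpha * (target - Q(s,a))."""
    best_next = max(q_table[next_state])   # greedy value of the next state
    target = reward + gamma * best_next    # bootstrapped TD target
    q_table[state][action] += alpha * (target - q_table[state][action])
    return q_table[state][action]


class ReplayBuffer:
    """Fixed-size FIFO store of transitions, sampled uniformly at random."""

    def __init__(self, capacity):
        # deque with maxlen silently drops the oldest item when full
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


# Toy usage: 3 states x 2 actions, all Q-values start at zero (hypothetical sizes).
q_table = [[0.0, 0.0] for _ in range(3)]
new_value = q_update(q_table, state=0, action=1, reward=1.0, next_state=1)

buffer = ReplayBuffer(capacity=2)
for step in range(3):
    buffer.push((step, 0, 0.0, step + 1, False))  # (s, a, r, s', done)
batch = buffer.sample(2)
```

A DQN replaces the table lookup with a network forward pass but keeps the same target. The buffer exists so that training batches are drawn from past transitions rather than from consecutive, correlated steps.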

NLP & Generative AI

  • LangChain pipelines: document loaders, retrievers, vector stores
  • RAG (Retrieval-Augmented Generation): end-to-end QA systems
  • Embeddings: watsonx enterprise embedding API integration
  • QA chatbots: custom qabot.py with context-aware question answering
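
The retrieve-then-prompt flow behind a RAG pipeline can be sketched without any framework. The example below is an illustration only: it uses a toy bag-of-words vector in place of a real embedding model, and it does not use LangChain, watsonx, or the repository's `qabot.py`. The names `embed`, `cosine`, `retrieve`, `build_prompt`, and the sample documents are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: token -> count."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, docs, k=1):
    """Assemble retrieved context and the question into one prompt string."""
    context = "\n".join(retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


DOCS = [
    "DQN agents store transitions in a replay buffer",
    "Vision Transformers split an image into patches",
    "Vector stores index document embeddings for retrieval",
]
query = "how do vision transformers process an image"
top = retrieve(query, DOCS)
prompt = build_prompt(query, DOCS)
```

In a real pipeline the `embed` step would call an embedding model and the ranking would be delegated to a vector store; the prompt would then be passed to a language model.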

Data Engineering

  • Memory-efficient data loading: generator-based vs. memory-based strategies with benchmarks
  • Data augmentation: Keras ImageDataGenerator, PyTorch transforms pipelines
  • Custom Datasets: torch.utils.data.Dataset and DataLoader patterns
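
The difference between the memory-based and generator-based strategies can be shown with plain Python. This is a sketch under assumed names (`load_all`, `batch_generator`) and synthetic data, not the notebooks' code and not a benchmark.

```python
def load_all(n_samples, sample_size):
    """Memory-based: materialize every sample up front."""
    return [[float(i)] * sample_size for i in range(n_samples)]


def batch_generator(n_samples, sample_size, batch_size):
    """Generator-based: build one batch at a time, only when requested."""
    for start in range(0, n_samples, batch_size):
        stop = min(start + batch_size, n_samples)
        yield [[float(i)] * sample_size for i in range(start, stop)]


in_memory = load_all(10, 4)

# Consuming the generator here only to inspect it; a training loop would
# iterate over it directly and hold a single batch in memory at a time.
batches = list(batch_generator(10, 4, batch_size=3))
batch_sizes = [len(b) for b in batches]
streamed = [sample for batch in batches for sample in batch]
```

Both paths produce the same samples. The trade-off is that the in-memory list pays its full memory cost once and is fast to re-read, while the generator keeps peak memory near one batch at the price of rebuilding data each epoch.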

MLOps & Best Practices

  • Reproducible experiments with fixed seeds
  • Model evaluation, confusion matrices, and performance metrics
  • Feature engineering for real-world datasets
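
As an illustration of the first two bullets, the sketch below fixes a seed and builds a confusion matrix by hand. It assumes the helper names `set_seed`, `confusion_matrix`, and `accuracy`, and uses made-up labels. It seeds only Python's `random` module; NumPy, PyTorch, and TensorFlow keep their own generators and need separate calls (`np.random.seed`, `torch.manual_seed`, `tf.random.set_seed`).

```python
import random


def set_seed(seed):
    """Seed Python's built-in RNG (framework RNGs need their own calls)."""
    random.seed(seed)


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true labels, columns are predicted labels."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Same seed, same random draws: the basis of a reproducible run.
set_seed(42)
first = [random.random() for _ in range(3)]
set_seed(42)
second = [random.random() for _ in range(3)]

# Hypothetical labels for a 3-class problem.
cm = confusion_matrix([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], n_classes=3)
acc = accuracy(cm)
```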

📁 Repository Structure

Python-Scripts/
├── 01_data_loading_augmentation/     # Data pipelines, transforms & augmentation
│   ├── README.md
│   └── *.ipynb (4 notebooks)
│
├── 02_deep_learning_cnn_nn/          # CNN & classical neural networks
│   ├── README.md
│   └── *.ipynb (10 notebooks)
│
├── 03_transformers_and_vit/          # Transformer & Vision Transformer architectures
│   ├── README.md
│   └── *.ipynb (3 notebooks)
│
├── 04_reinforcement_learning/        # RL algorithms & DQN agents
│   ├── README.md
│   └── *.ipynb (3 notebooks)
│
├── 05_nlp_rag_embeddings/            # NLP, RAG systems & embedding pipelines
│   ├── README.md
│   ├── qabot.py
│   └── *.ipynb (4 notebooks)
│
├── 06_computer_vision_multimodal/    # Advanced CV & multimodal models
│   ├── README.md
│   └── *.ipynb (3 notebooks)
│
├── 07_capstone_business_projects/    # End-to-end business analytics
│   ├── README.md
│   └── *.ipynb (1 notebook)
│
├── requirements.txt                  # All dependencies
├── portfolio_app.py                  # Streamlit interactive portfolio
└── README.md                         # This file

🚀 Quick Start

Prerequisites

  • Python 3.10-3.12 for full framework compatibility (including TensorFlow)
  • Python 3.13+ supported for PyTorch/LangChain workflows (TensorFlow skipped)
  • CUDA-capable GPU (recommended for DL notebooks)

Setup

# Clone the repository
git clone <your-repo-url>
cd Python-Scripts

# Create virtual environment
python -m venv .venv
.venv\Scripts\activate        # Windows
# source .venv/bin/activate   # macOS/Linux

# Install dependencies
pip install -r requirements.txt

# Launch Jupyter Lab
jupyter lab

# Or run the interactive portfolio app
streamlit run portfolio_app.py

🌟 Featured Projects

| Project | Area | Highlight |
|---------|------|-----------|
| League of Legends Match Predictor | Business ML | End-to-end pipeline, competitive game analytics |
| CNN-ViT Integration | CV + Transformers | Hybrid architecture for satellite classification |
| LangChain RAG System | NLP / GenAI | Full retrieval-augmented QA pipeline |
| Deep Q-Network | RL | Keras DQN with experience replay |
| Multi-Framework Classifier Comparison | DL | Keras vs PyTorch, same architecture, side-by-side |

📬 Contact


Portfolio built with ❤️ using Python, Jupyter, PyTorch, TensorFlow, and LangChain.
