Computer Vision & 3D Perception Engineer
M.Sc. in Communications & Electronics Engineering at TU Munich (1.3 final grade). Master's thesis at the 3D AI Lab on zero-shot 4D human-object interaction synthesis from text. Teaching Assistant for TUM's Computer Vision III (Detection, Segmentation, Tracking).
I build systems that see, reconstruct, and reason about the 3D world. My project work spans autonomous pursuit vehicles in CARLA, NeRF-based scene reconstruction, real-time safety monitoring, and agentic AI for wildfire response. I'm looking for roles in computer vision, 3D perception, and autonomous systems.
Working Student — Algorithm & Software Developer | Innoligent Technologies, Munich | Sep 2024 – Present
- Researching and training StockFormer, a Transformer + Soft Actor-Critic RL framework for algorithmic portfolio optimization
- Built a real-time workplace safety monitoring desktop app with pose estimation, predictive risk forecasting, and interactive zone editing
- Maintained an in-house video annotation tool for human actions with one-click YOLO-assisted labeling
Research Intern | TU Munich, Chair of Media Technology | Apr 2025 – Jun 2025
- Designed and evaluated four fusion strategies injecting object-level YOLOv8 + CLIP priors into Mask2Former for improved surface material segmentation on the Apple Dense Material Segmentation dataset
Teaching Assistant | TU Munich | Oct 2025 – Mar 2026
- TA for IN2375: Computer Vision III (Detection, Segmentation, and Tracking). Held programming exercise sessions and office hours, and co-graded the final exam for 100+ students
Freelance ML Engineer | Top Rated Plus on Upwork (top 3%) | Dec 2021 – Sep 2023
- Delivered ML and CV solutions to global clients, contributed to RLHF evaluation for ChatGPT, Codex, and Claude, and built classification pipelines with BERT, LSTM, and XGBoost ensembles
Assistant Manager (Technical) | Avant Labs, Islamabad | Jul 2020 – Nov 2022
- Led hardware design and development for custom test and measurement products. Owned the full cycle from customer requirements through schematics, PCB review, Embedded C/C++ firmware, Verilog signal processing, and hardware bring-up to manufacturing handover
Zero-Shot 4D Human-Object Interaction Synthesis from Text — Master's Thesis
3D AI Lab, TU Munich
Generates dynamic 3D human-object interactions from text prompts without requiring paired motion-capture data. LLMs parse prompts into Part Affordance Graphs (PAGs) that encode contact semantics, and video diffusion models serve as a physics prior. Object meshes come from SAM-3D, human motion from GVHMR, and 3D object trajectories from CoTracker-based rigid-transform optimization, all anchored to a consistent scale using Depth-Anything-3 metric depth. Object motion is then refined against fixed human anchors using PAG-derived contact constraints so hands actually meet handles, feet stay on surfaces, and the interaction looks physically plausible.
PyTorch, PyTorch3D, Blender, Foundation Models, LLMs/VLMs, Differentiable Optimization, 3D Reconstruction
Robust Autonomous Vehicle Pursuit — Research Project
Computer Vision Group, TU Munich
An ego vehicle in CARLA locks onto a user-designated target car through traffic and follows it, even when visually similar distractors are nearby. The target is selected with a bounding-box prompt in the first frame, then tracked online with SAM-3 mask propagation. A ConvNeXt-Base pose regressor estimates relative translation and yaw from the masked RGB input, and an MPC controller closes the loop in real time. The pose regressor is trained on a custom large-scale multi-town CARLA dataset where per-target ground-truth poses are generated by 3D CenterPoint detections on LiDAR.
PyTorch, CARLA, SAM-3, ConvNeXt, CenterPoint, MMDetection3D, LiDAR, MPC, Autonomous Driving
Better SceneRF: Improving Self-Supervised Monocular 3D Scene Reconstruction — Semester Project
Improved SceneRF (ICCV 2023) by replacing sinusoidal positional encoding with Random Fourier Features and introducing coarse-to-fine hierarchical volume sampling via inverse-CDF density resampling.
| Task | Improvement |
|---|---|
| Novel Depth Synthesis | +16% |
| 3D Scene Reconstruction | +7% |
| Novel View Synthesis | +2% |
Extended the full pipeline to support the TUM RGB-D dataset.
PyTorch, NeRF, Volume Rendering, 3D Reconstruction, EfficientNet, Docker
FireGPT: AI-Powered Wildfire Intelligence & Drone Operations — Semester Project
An agentic RAG platform for wildfire incident response. A LangGraph ReAct agent autonomously executes up to 15 tool-use steps per query, pulling from a three-tier ChromaDB knowledge base with BGE-M3 embeddings and cross-encoder reranking, computing fire danger scores from four Google Earth Engine satellite datasets, and generating terrain-aware drone waypoints. The whole stack is served via Docker Compose.
LangGraph, FastAPI, ChromaDB, MCP, Google Earth Engine, Ollama, Docker Compose, Agentic AI, RAG
Multi-View Furniture Assembly Action Recognition — Semester Project
Recognizes furniture-assembly actions from 8 synchronized Intel RealSense D435i cameras. The model extracts per-view spatial features with ResNet-50, fuses them across cameras using multi-head attention to handle occlusion, and models temporal dynamics with a Transformer encoder. Achieves 88.4% frame-level accuracy and ~0.82 macro-F1 on the held-out test split.
PyTorch, ResNet-50, Transformers, Multi-View Learning, Attention Mechanisms, Action Recognition
Autonomous valet parking on a physical F1Tenth car running ROS2 on a Jetson Nano with 2D LiDAR as the only sensor. Built a full autonomy stack covering SLAM, particle filter localization, LiDAR-based parking spot detection, RRT motion planning, and PID control, all tuned to run within the Jetson Nano's compute and memory budget.
ROS2, C++, Jetson Nano, SLAM, RRT, PID Control, Embedded Systems, Robotics
| Domain | Technologies |
|---|---|
| Languages | Python, C++, C (Embedded), MATLAB/Simulink |
| AI / ML | Foundation Models, LLMs, VLMs, Diffusion Models, Transformers, Agentic AI, RAG, RL, Self-Supervised Learning |
| Computer Vision / 3D | 3D Reconstruction, NeRFs, Human-Object Interaction, Multi-View Geometry, Detection, Segmentation, Tracking, Autonomous Driving |
| Optimization & Control | Differentiable Optimization, MPC, PID, SLAM, RRT, Particle Filter Localization |
| Autonomous Systems | ROS2, Jetson Nano, LiDAR, CARLA, F1Tenth, Embedded & Real-Time Systems |
| Frameworks | PyTorch, PyTorch3D, OpenCV, Hugging Face Transformers, LangChain, LangGraph, FastAPI, ChromaDB |
| Tools | Docker, Git, CUDA, Blender, Google Earth Engine, MCP |
M.Sc. Communications & Electronics Engineering — Technical University of Munich | Oct 2023 – Jul 2026 | Grade: 1.3
Relevant coursework: Deep Learning for 3D Perception, Deep Learning for Computer Vision, Machine Learning for 3D Geometry, Human Activity Understanding, CV III (Detection, Segmentation, Tracking), Fundamentals of Foundation Models, F1Tenth Autonomous Driving Hands-On
B.E. Electrical Engineering — National University of Sciences and Technology, Pakistan | Sep 2016 – Jul 2020 | Grade: 1.4


