Muhammad Umer Abbasi mumerabbasi

Muhammad Umer Abbasi

Computer Vision & 3D Perception Engineer

M.Sc. in Communications & Electronics Engineering at TU Munich (1.3 final grade). Master's thesis at the 3D AI Lab on zero-shot 4D human-object interaction synthesis from text. Teaching Assistant for TUM's Computer Vision III (Detection, Segmentation, Tracking).

I build systems that see, reconstruct, and reason about the 3D world. My project work spans autonomous pursuit vehicles in CARLA, NeRF-based scene reconstruction, real-time safety monitoring, and agentic AI for wildfire response. I'm looking for roles in computer vision, 3D perception, and autonomous systems.

Professional Experience

Working Student — Algorithm & Software Developer | Innoligent Technologies, Munich | Sep 2024 – Present

Researching and training StockFormer, a Transformer + Soft Actor-Critic RL framework for algorithmic portfolio optimization
Built a real-time workplace safety monitoring desktop app with pose estimation, predictive risk forecasting, and interactive zone editing
Maintained an in-house video annotation tool for human actions with one-click YOLO-assisted labeling

Research Intern | TU Munich, Chair of Media Technology | Apr 2025 – Jun 2025

Designed and evaluated four fusion strategies injecting object-level YOLOv8 + CLIP priors into Mask2Former for improved surface material segmentation on the Apple Dense Material Segmentation dataset

Teaching Assistant | TU Munich | Oct 2025 – Mar 2026

TA for IN2375: Computer Vision III (Detection, Segmentation, and Tracking). Held programming exercise sessions and office hours, and co-graded the final exam for 100+ students

Freelance ML Engineer | Top Rated Plus on Upwork (top 3%) | Dec 2021 – Sep 2023

Delivered ML and CV solutions to global clients, contributed to RLHF evaluation for ChatGPT, Codex, and Claude, and built classification pipelines with BERT, LSTM, and XGBoost ensembles

Assistant Manager (Technical) | Avant Labs, Islamabad | Jul 2020 – Nov 2022

Led hardware design and development for custom test and measurement products. Owned the full cycle from customer requirements through schematics, PCB review, Embedded C/C++ firmware, Verilog signal processing, and hardware bring-up to manufacturing handover

Featured Projects

Zero-Shot 4D Human-Object Interaction Synthesis from Text — Master's Thesis

3D AI Lab, TU Munich

Generates dynamic 3D human-object interactions from text prompts, with no paired motion-capture data needed. LLMs parse prompts into Part Affordance Graphs (PAGs) that encode contact semantics, and video diffusion models serve as a physics prior. Object meshes come from SAM-3D, human motion from GVHMR, and 3D object trajectories from CoTracker-based rigid-transform optimization, all anchored to a consistent scale using Depth-Anything-3 metric depth. Object motion is then refined against fixed human anchors using PAG-derived contact constraints, so that hands actually meet handles, feet stay on surfaces, and the interaction looks physically plausible.

PyTorch, PyTorch3D, Blender, Foundation Models, LLMs/VLMs, Differentiable Optimization, 3D Reconstruction

Robust Autonomous Vehicle Pursuit — Research Project

Computer Vision Group, TU Munich

An ego vehicle in CARLA locks onto a user-designated target car through traffic and follows it, even when visually similar distractors are nearby. The target is selected with a bounding-box prompt in the first frame, then tracked online with SAM-3 mask propagation. A ConvNeXt-Base pose regressor estimates relative translation and yaw from the masked RGB input, and an MPC controller closes the loop in real time. The pose regressor is trained on a custom large-scale multi-town CARLA dataset where per-target ground-truth poses are generated by 3D CenterPoint detections on LiDAR.

PyTorch, CARLA, SAM-3, ConvNeXt, CenterPoint, MMDetection3D, LiDAR, MPC, Autonomous Driving

Better SceneRF: Improving Self-Supervised Monocular 3D Scene Reconstruction — Semester Project

Improved SceneRF (ICCV 2023) by replacing sinusoidal positional encoding with Random Fourier Features and introducing coarse-to-fine hierarchical volume sampling via inverse-CDF density resampling.

Task	Improvement
Novel Depth Synthesis	+16%
3D Scene Reconstruction	+7%
Novel View Synthesis	+2%

Extended the full pipeline to support the TUM RGB-D dataset.

PyTorch, NeRF, Volume Rendering, 3D Reconstruction, EfficientNet, Docker
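
The coarse-to-fine sampling mentioned for Better SceneRF can be illustrated with a small, dependency-free sketch of inverse-CDF resampling along a single ray. This is not the project's code: the function name `sample_pdf`, the bin layout, and the smoothing constant are assumptions chosen for illustration, and the real pipeline would do this batched in PyTorch.

```python
import bisect
import random


def sample_pdf(bin_edges, weights, n_samples, rng=None):
    """Draw depths along one ray by inverting a piecewise-constant CDF.

    bin_edges: sorted depths delimiting the coarse bins (len = n_bins + 1).
    weights:   non-negative coarse-pass weights, one per bin (len = n_bins).
    Hypothetical helper for illustration only.
    """
    rng = rng or random.Random(0)
    eps = 1e-5  # assumed smoothing term; keeps every bin's probability > 0
    padded = [w + eps for w in weights]
    total = sum(padded)

    # Build the CDF over bins: cdf[0] = 0, cdf[-1] = 1.
    cdf = [0.0]
    for w in padded:
        cdf.append(cdf[-1] + w / total)
    cdf[-1] = 1.0

    samples = []
    for _ in range(n_samples):
        u = rng.random()  # uniform in [0, 1)
        # Locate the bin whose CDF interval contains u.
        i = bisect.bisect_right(cdf, u) - 1
        i = min(max(i, 0), len(weights) - 1)
        # Interpolate linearly inside that bin.
        t = (u - cdf[i]) / (cdf[i + 1] - cdf[i])
        samples.append(bin_edges[i] + t * (bin_edges[i + 1] - bin_edges[i]))
    return sorted(samples)


if __name__ == "__main__":
    edges = [0.0, 1.0, 2.0, 3.0, 4.0]
    coarse_weights = [0.0, 0.0, 1.0, 0.0]  # density concentrated in [2, 3]
    fine = sample_pdf(edges, coarse_weights, n_samples=8)
    print(fine)
```

Bins with high coarse weight receive most of the fine samples, so the second pass spends its budget near likely surfaces instead of in empty space.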

FireGPT: AI-Powered Wildfire Intelligence & Drone Operations — Semester Project

An agentic RAG platform for wildfire incident response. A LangGraph ReAct agent autonomously executes up to 15 tool-use steps per query, pulling from a three-tier ChromaDB knowledge base with BGE-M3 embeddings and cross-encoder reranking, computing fire danger scores from four Google Earth Engine satellite datasets, and generating terrain-aware drone waypoints. The whole stack is served via Docker Compose.

LangGraph, FastAPI, ChromaDB, MCP, Google Earth Engine, Ollama, Docker Compose, Agentic AI, RAG

Multi-View Furniture Assembly Action Recognition — Semester Project

Recognizes furniture-assembly actions from 8 synchronized Intel RealSense D435i cameras. The model extracts per-view spatial features with ResNet-50, fuses them across cameras using multi-head attention to handle occlusion, and models temporal dynamics with a Transformer encoder. Achieves 88.4% frame-level accuracy and ~0.82 macro-F1 on the held-out test split.

PyTorch, ResNet-50, Transformers, Multi-View Learning, Attention Mechanisms, Action Recognition
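
To give a feel for how attention can down-weight occluded cameras when fusing per-view features, here is a minimal sketch of single-head scaled dot-product attention pooling across views. It is a deliberately simplified stand-in for the multi-head attention used in the project: the function name `attention_fuse`, the fixed query vector, and the toy 2-D features are all assumptions, and the actual model would use learned projections in PyTorch.

```python
import math


def attention_fuse(view_features, query):
    """Fuse per-view feature vectors with softmax attention weights.

    view_features: list of equal-length feature vectors, one per camera.
    query:         vector of the same length (learned in a real model,
                   fixed here for illustration).
    Returns (fused_vector, per_view_weights). Hypothetical helper.
    """
    d = len(query)
    # Scaled dot-product score between the query and each view.
    scores = [
        sum(q * k for q, k in zip(query, feat)) / math.sqrt(d)
        for feat in view_features
    ]
    # Numerically stable softmax over views.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Weighted sum of the view features.
    fused = [
        sum(w * feat[j] for w, feat in zip(weights, view_features))
        for j in range(d)
    ]
    return fused, weights


if __name__ == "__main__":
    # Three toy views; the last one mimics an occluded camera (all zeros).
    views = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
    fused, weights = attention_fuse(views, query=[4.0, 0.0])
    print(fused, weights)
```

Views whose features align with the query get larger weights, so an uninformative view contributes little to the fused representation.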

F1Tenth: Autonomous Valet Parking — Semester Project

Autonomous valet parking on a physical F1Tenth car running ROS2 on a Jetson Nano with 2D LiDAR as the only sensor. Built a full autonomy stack covering SLAM, particle filter localization, LiDAR-based parking spot detection, RRT motion planning, and PID control, all tuned to run within the Jetson Nano's compute and memory budget.

ROS2, C++, Jetson Nano, SLAM, RRT, PID Control, Embedded Systems, Robotics
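
The PID control stage of the F1Tenth stack can be sketched as a discrete controller with output clamping and simple anti-windup. This is an illustrative sketch only: the project itself was written in C++ on ROS2, and the class name `PID`, the gains, and the output limit below are assumed values, not the ones used on the car.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    """Discrete PID controller with a symmetric output limit (illustrative)."""

    kp: float
    ki: float
    kd: float
    out_limit: float  # e.g. a maximum steering angle in radians (assumed)
    integral: float = 0.0
    prev_error: Optional[float] = None

    def step(self, error: float, dt: float) -> float:
        # Derivative term is zero on the first call (no previous error yet).
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error

        # Tentatively integrate, then clamp the command to the actuator range.
        candidate = self.integral + error * dt
        raw = self.kp * error + self.ki * candidate + self.kd * derivative
        out = max(-self.out_limit, min(self.out_limit, raw))

        # Anti-windup: commit the integral only when the output is unsaturated.
        if out == raw:
            self.integral = candidate
        return out


if __name__ == "__main__":
    steering = PID(kp=1.2, ki=0.1, kd=0.05, out_limit=0.4)  # assumed gains
    for lateral_error in (0.30, 0.20, 0.10, 0.02):
        print(steering.step(lateral_error, dt=0.05))
```

Freezing the integral while the output is saturated keeps the controller from overshooting once the error shrinks again, which matters on a small platform with tight steering limits.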

Technical Stack

Domain	Technologies
Languages	Python, C++, C (Embedded), MATLAB/Simulink
AI / ML	Foundation Models, LLMs, VLMs, Diffusion Models, Transformers, Agentic AI, RAG, RL, Self-Supervised Learning
Computer Vision / 3D	3D Reconstruction, NeRFs, Human-Object Interaction, Multi-View Geometry, Detection, Segmentation, Tracking, Autonomous Driving
Optimization & Control	Differentiable Optimization, MPC, PID, SLAM, RRT, Particle Filter Localization
Autonomous Systems	ROS2, Jetson Nano, LiDAR, CARLA, F1Tenth, Embedded & Real-Time Systems
Frameworks	PyTorch, PyTorch3D, OpenCV, Hugging Face Transformers, LangChain, LangGraph, FastAPI, ChromaDB
Tools	Docker, Git, CUDA, Blender, Google Earth Engine, MCP

Education

M.Sc. Communications & Electronics Engineering — Technical University of Munich | Oct 2023 – Jul 2026 | Grade: 1.3

Deep Learning for 3D Perception, Deep Learning for Computer Vision, Machine Learning for 3D Geometry, Human Activity Understanding, CV III (Detection, Segmentation, Tracking), Fundamentals of Foundation Models, F1Tenth Autonomous Driving Hands-On

B.E. Electrical Engineering — National University of Sciences and Technology, Pakistan | Sep 2016 – Jul 2020 | Grade: 1.4
