mumerabbasi/README.md

Muhammad Umer Abbasi

Computer Vision & 3D Perception Engineer

M.Sc. in Communications & Electronics Engineering at TU Munich (1.3 final grade). Master's thesis at the 3D AI Lab on zero-shot 4D human-object interaction synthesis from text. Teaching Assistant for TUM's Computer Vision III (Detection, Segmentation, Tracking).

I build systems that see, reconstruct, and reason about the 3D world. My project work spans autonomous pursuit vehicles in CARLA, NeRF-based scene reconstruction, real-time safety monitoring, and agentic AI for wildfire response. I'm looking for roles in computer vision, 3D perception, and autonomous systems.

LinkedIn Email

PyTorch · PyTorch3D · Python · OpenCV · C++ · ROS2 · Docker · CUDA · Blender · Linux

3D Reconstruction · NeRFs · 3D Perception · Autonomous Driving · Detection, Segmentation & Tracking · Foundation Models · Diffusion Models · Transformers · Reinforcement Learning


Professional Experience

Working Student — Algorithm & Software Developer | Innoligent Technologies, Munich | Sep 2024 – Present

  • Researching and training StockFormer, a Transformer + Soft Actor-Critic RL framework for algorithmic portfolio optimization
  • Built a real-time workplace safety monitoring desktop app with pose estimation, predictive risk forecasting, and interactive zone editing
  • Maintained an in-house video annotation tool for human actions with one-click YOLO-assisted labeling

Research Intern | TU Munich, Chair of Media Technology | Apr 2025 – Jun 2025

  • Designed and evaluated four fusion strategies injecting object-level YOLOv8 + CLIP priors into Mask2Former for improved surface material segmentation on the Apple Dense Material Segmentation dataset

Teaching Assistant | TU Munich | Oct 2025 – Mar 2026

  • TA for IN2375: Computer Vision III (Detection, Segmentation, and Tracking). Held programming exercise sessions and office hours, and co-graded the final exam for 100+ students

Freelance ML Engineer | Top Rated Plus on Upwork (top 3%) | Dec 2021 – Sep 2023

  • Delivered ML and CV solutions to global clients; contributed to RLHF evaluation for ChatGPT, Codex, and Claude; and built classification pipelines with BERT, LSTM, and XGBoost ensembles

Assistant Manager (Technical) | Avant Labs, Islamabad | Jul 2020 – Nov 2022

  • Led hardware design and development for custom test and measurement products. Owned the full cycle from customer requirements through schematics, PCB review, Embedded C/C++ firmware, Verilog signal processing, and hardware bring-up to manufacturing handover

Featured Projects

3D AI Lab, TU Munich

Generates dynamic 3D human-object interactions from text prompts, with no paired motion-capture data needed. LLMs parse prompts into Part Affordance Graphs that encode contact semantics, and video diffusion models serve as a physics prior. Object meshes come from SAM-3D, human motion from GVHMR, and 3D object trajectories from CoTracker-based rigid-transform optimization, all anchored to a consistent scale using Depth-Anything-3 metric depth. Object motion is then refined against fixed human anchors using PAG-derived contact constraints so hands actually meet handles, feet stay on surfaces, and the interaction looks physically plausible.
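The rigid-transform step can be illustrated with a deliberately simplified sketch: a closed-form least-squares fit of a 2D rotation and translation to tracked point correspondences. The project works with 3D trajectories and its actual optimization is not shown here; the function names `fit_rigid_2d` and `apply_rigid_2d` are hypothetical and exist only for this example.

```python
import math

def fit_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src points onto dst."""
    n = len(src)
    # Centroids of both point sets.
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(p[0] for p in dst) / n
    dy = sum(p[1] for p in dst) / n
    # Accumulate dot and cross terms of the centered correspondences.
    dot = 0.0
    cross = 0.0
    for (ax, ay), (bx, by) in zip(src, dst):
        ax, ay = ax - sx, ay - sy
        bx, by = bx - dx, by - dy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that carries the rotated source centroid onto the target centroid.
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return theta, (tx, ty)

def apply_rigid_2d(points, theta, t):
    """Rotate points by theta (counter-clockwise) and shift them by t."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in points]

# Illustrative data: four tracked points before and after a known motion.
tracked_before = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 3.0)]
tracked_after = apply_rigid_2d(tracked_before, 0.5, (2.0, -1.0))
theta, translation = fit_rigid_2d(tracked_before, tracked_after)
```

With noise-free correspondences the fit recovers the motion that generated them. With noisy tracks it returns the least-squares compromise, which is why a later refinement stage against contact constraints can still be useful.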

PyTorch · PyTorch3D · Blender · Foundation Models · LLMs/VLMs · Differentiable Optimization · 3D Reconstruction


Computer Vision Group, TU Munich

An ego vehicle in CARLA locks onto a user-designated target car through traffic and follows it, even when visually similar distractors are nearby. The target is selected with a bounding-box prompt in the first frame, then tracked online with SAM-3 mask propagation. A ConvNeXt-Base pose regressor estimates relative translation and yaw from the masked RGB input, and an MPC controller closes the loop in real time. The pose regressor is trained on a custom large-scale multi-town CARLA dataset where per-target ground-truth poses are generated by 3D CenterPoint detections on LiDAR.
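The regression target described above (relative translation and yaw) can be sketched as follows. This is a minimal example under assumed conventions: planar poses, x pointing forward, counter-clockwise yaw in radians. It does not necessarily match CARLA's own coordinate conventions or the project's label pipeline, and `relative_pose` is a hypothetical helper name.

```python
import math

def relative_pose(ego_xy, ego_yaw, target_xy, target_yaw):
    """Target position and heading expressed in the ego vehicle's frame."""
    dx = target_xy[0] - ego_xy[0]
    dy = target_xy[1] - ego_xy[1]
    c, s = math.cos(ego_yaw), math.sin(ego_yaw)
    # Rotate the world-frame offset by -ego_yaw to land in the ego frame.
    x_rel = c * dx + s * dy
    y_rel = -s * dx + c * dy
    # Wrap the heading difference to [-pi, pi).
    yaw_rel = (target_yaw - ego_yaw + math.pi) % (2.0 * math.pi) - math.pi
    return (x_rel, y_rel), yaw_rel

# Ego faces +y in the world; the target is 10 m further along +y, same heading.
(x_rel, y_rel), yaw_rel = relative_pose((0.0, 0.0), math.pi / 2, (0.0, 10.0), math.pi / 2)
# x_rel is approximately 10 (straight ahead), y_rel and yaw_rel approximately 0.
```

Expressing the target in the ego frame keeps the label independent of where in the map the two vehicles happen to be, which suits a regressor that only sees the ego camera image.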

PyTorch · CARLA · SAM-3 · ConvNeXt · CenterPoint · MMDetection3D · LiDAR · MPC · Autonomous Driving


Improved SceneRF (ICCV 2023) by replacing sinusoidal positional encoding with Random Fourier Features and introducing coarse-to-fine hierarchical volume sampling via inverse-CDF density resampling.

| Task | Improvement |
|---|---|
| Novel Depth Synthesis | +16% |
| 3D Scene Reconstruction | +7% |
| Novel View Synthesis | +2% |

Extended the full pipeline to support the TUM RGB-D dataset.
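The two changes named above can be sketched in plain Python. The first function builds a Random Fourier Features encoder of the form [cos(2πBx), sin(2πBx)] with a fixed Gaussian matrix B; the second draws fine samples along a ray by inverting the CDF of coarse-pass weights. Feature count, `sigma`, bin layout, and all function names are illustrative assumptions, not values or code from the fork.

```python
import bisect
import math
import random

def make_rff_encoder(in_dim, num_features, sigma, seed=0):
    """Return encode(x) -> [cos(2*pi*B@x), sin(2*pi*B@x)] with B ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    B = [[rng.gauss(0.0, sigma) for _ in range(in_dim)] for _ in range(num_features)]

    def encode(x):
        proj = [2.0 * math.pi * sum(b * xi for b, xi in zip(row, x)) for row in B]
        return [math.cos(p) for p in proj] + [math.sin(p) for p in proj]

    return encode

def sample_inverse_cdf(bin_edges, weights, num_samples, rng):
    """Draw fine depths along a ray, concentrated where coarse weights are large."""
    eps = 1e-5  # keeps every bin reachable and avoids division by zero
    w = [wi + eps for wi in weights]
    total = sum(w)
    cdf = [0.0]
    for wi in w:
        cdf.append(cdf[-1] + wi / total)
    samples = []
    for _ in range(num_samples):
        u = rng.random()
        # Find the bin whose CDF interval contains u.
        i = min(bisect.bisect_right(cdf, u) - 1, len(w) - 1)
        # Interpolate linearly inside that bin.
        frac = (u - cdf[i]) / (cdf[i + 1] - cdf[i])
        samples.append(bin_edges[i] + frac * (bin_edges[i + 1] - bin_edges[i]))
    return sorted(samples)

encode = make_rff_encoder(in_dim=3, num_features=16, sigma=4.0)
features = encode([0.1, -0.2, 0.3])  # 32 values in [-1, 1]

# Coarse pass found almost all density in the third bin, between depth 2 and 3.
edges = [0.0, 1.0, 2.0, 3.0, 4.0]
coarse_weights = [0.0, 0.0, 10.0, 0.0]
fine_depths = sample_inverse_cdf(edges, coarse_weights, 64, random.Random(0))
```

The fine samples cluster in the bin that carried the coarse weight, so the second rendering pass spends its budget near the estimated surface instead of on empty space.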

PyTorch · NeRF · Volume Rendering · 3D Reconstruction · EfficientNet · Docker


An agentic RAG platform for wildfire incident response. A LangGraph ReAct agent autonomously executes up to 15 tool-use steps per query, pulling from a three-tier ChromaDB knowledge base with BGE-M3 embeddings and cross-encoder reranking, computing fire danger scores from 4 Google Earth Engine satellite datasets, and generating terrain-aware drone waypoints. The whole stack is served via Docker Compose.
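The step-capped tool-use loop can be sketched without any framework. This is not LangGraph's API: `run_agent`, the `policy` callable, and the `lookup` tool are hypothetical stand-ins that only show how a ReAct-style loop alternates between choosing an action and observing a tool result until it answers or exhausts its budget.

```python
MAX_STEPS = 15  # step cap quoted in the description above

def run_agent(query, policy, tools, max_steps=MAX_STEPS):
    """Alternate between choosing an action and observing a tool result."""
    history = [("query", query)]
    for step in range(max_steps):
        # Hypothetical policy: returns ("final", answer) or (tool_name, argument).
        kind, payload = policy(history)
        if kind == "final":
            return payload, step
        observation = tools[kind](payload)
        history.append((kind, observation))
    return None, max_steps  # budget exhausted without a final answer

def demo_policy(history):
    """Toy policy: look something up twice, then answer."""
    if len(history) < 3:
        return ("lookup", "fire weather")
    return ("final", "answered after %d observations" % (len(history) - 1))

demo_tools = {"lookup": lambda text: text.upper()}
answer, steps_used = run_agent("Where should crews deploy?", demo_policy, demo_tools)
```

Returning the step count alongside the answer makes it easy to log how close a query came to the cap, which helps when tuning the budget.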

LangGraph · FastAPI · ChromaDB · MCP · Google Earth Engine · Ollama · Docker Compose · Agentic AI · RAG


Recognizes furniture-assembly actions from 8 synchronized Intel RealSense D435i cameras. The model extracts per-view spatial features with ResNet-50, fuses them across cameras using multi-head attention to handle occlusion, and models temporal dynamics with a Transformer encoder. Achieves 88.4% frame-level accuracy and ~0.82 macro-F1 on the held-out test split.
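Frame-level accuracy and macro-F1 answer different questions, which is why both are reported. The sketch below computes each from per-frame labels; the class names and toy labels are invented for illustration and are not taken from the dataset.

```python
def accuracy(y_true, y_pred):
    """Fraction of frames whose predicted label matches the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy example: a model that always predicts the majority class.
y_true = ["idle"] * 8 + ["screw"] * 2
y_pred = ["idle"] * 10
acc = accuracy(y_true, y_pred)   # 0.8
mf1 = macro_f1(y_true, y_pred)   # about 0.44
```

The majority-class predictor scores 80% accuracy but a macro-F1 of about 0.44, because the class it never predicts contributes an F1 of zero. A macro-F1 close to the accuracy therefore suggests that the rarer actions are being recognized as well.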

PyTorch · ResNet-50 · Transformers · Multi-View Learning · Attention Mechanisms · Action Recognition


F1Tenth: Autonomous Valet Parking — Semester Project

Autonomous valet parking on a physical F1Tenth car running ROS2 on a Jetson Nano with 2D LiDAR as the only sensor. Built a full autonomy stack covering SLAM, particle filter localization, LiDAR-based parking spot detection, RRT motion planning, and PID control, all tuned to run within the Jetson Nano's compute and memory budget.
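The control stage of such a stack can be illustrated with a textbook discrete PID loop. The project itself runs ROS2 with C++; this Python sketch only shows the update rule, and the gains, output limit, and toy plant are assumptions chosen for the example.

```python
class PID:
    """Discrete PID controller with a symmetric output clamp."""

    def __init__(self, kp, ki, kd, out_limit):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_limit = out_limit
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        self.integral += error * dt
        # No derivative term on the first call, since there is no previous error yet.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        out = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Clamp to the actuator range, e.g. a steering or speed limit.
        return max(-self.out_limit, min(self.out_limit, out))

# Toy plant: position integrates the commanded velocity; drive it to 1.0.
controller = PID(kp=2.0, ki=0.0, kd=0.0, out_limit=1.0)
position, target, dt = 0.0, 1.0, 0.05
for _ in range(200):
    command = controller.update(target - position, dt)
    position += command * dt
```

On constrained hardware such as a Jetson Nano, a controller like this costs a handful of floating-point operations per cycle, so the compute budget is dominated by SLAM, localization, and planning rather than by control.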

ROS2 · C++ · Jetson Nano · SLAM · RRT · PID Control · Embedded Systems · Robotics


Technical Stack

| Domain | Technologies |
|---|---|
| Languages | Python, C++, C (Embedded), MATLAB/Simulink |
| AI / ML | Foundation Models, LLMs, VLMs, Diffusion Models, Transformers, Agentic AI, RAG, RL, Self-Supervised Learning |
| Computer Vision / 3D | 3D Reconstruction, NeRFs, Human-Object Interaction, Multi-View Geometry, Detection, Segmentation, Tracking, Autonomous Driving |
| Optimization & Control | Differentiable Optimization, MPC, PID, SLAM, RRT, Particle Filter Localization |
| Autonomous Systems | ROS2, Jetson Nano, LiDAR, CARLA, F1Tenth, Embedded & Real-Time Systems |
| Frameworks | PyTorch, PyTorch3D, OpenCV, Hugging Face Transformers, LangChain, LangGraph, FastAPI, ChromaDB |
| Tools | Docker, Git, CUDA, Blender, Google Earth Engine, MCP |

Education

M.Sc. Communications & Electronics Engineering — Technical University of Munich | Oct 2023 – Jul 2026 | Grade: 1.3

Deep Learning for 3D Perception, Deep Learning for Computer Vision, Machine Learning for 3D Geometry, Human Activity Understanding, CV III (Detection, Segmentation, Tracking), Fundamentals of Foundation Models, F1Tenth Autonomous Driving Hands-On

B.E. Electrical Engineering — National University of Sciences and Technology, Pakistan | Sep 2016 – Jul 2020 | Grade: 1.4


GitHub Activity

Popular repositories

  1. HumanActionRecognition (Public)

    Exploring different architectures for multiview human action recognition

    Jupyter Notebook 1

  2. FireGPT (Public)

    Python 1

  3. SceneRF (Public)

    Forked from astra-vision/SceneRF

    [ICCV 2023] Official implementation of "SceneRF: Self-Supervised Monocular 3D Scene Reconstruction with Radiance Fields"

    Python

  4. mumerabbasi (Public)

  5. cpdf (Public)

    Simple PDF parser written in C++

    C++

  6. 4DHOI (Public)

    Python