psychopathdev/README.md

Peng Fei

Hi, I’m Peng Fei, a graduate student at Shenzhen University working on multimodal large models, speech large models, and Vision-Language-Action (VLA) systems.

I use this GitHub account as a public notebook for small, runnable research-engineering projects: clean data schemas, transparent evaluation scripts, and prototypes that help me understand model behavior before scaling things up.

Research interests

  • Multimodal large models — image-text/audio-text understanding, instruction following, and evaluation design
  • Speech large models — ASR-oriented workflows, spoken dialogue evaluation, and audio instruction following
  • Vision-Language-Action systems — action schemas, grounding, simulated evaluation, and robotics-oriented interfaces
  • Reproducible ML tooling — small benchmarks, dataset cards, CLI-first experiments, and readable reports

Current projects

Project | What it explores
audio-scene-caption-lab | A small sandbox for audio, speech, and visual scene captioning workflows with lightweight metrics and report generation.
vla-action-grounding-playground | A toy environment for instruction-to-action grounding, action schema design, and VLA-style evaluation traces.

Toolkit

Python · PyTorch · Transformers · NumPy · Jupyter · Linux · Git · LaTeX

I am especially interested in projects that are easy to run, easy to inspect, and honest about their limitations. A good experiment should leave a readable trail.

Currently learning

  • How to evaluate multimodal reasoning beyond single-number accuracy
  • Speech and audio benchmarks that expose real failure modes
  • Action representations for VLA agents in simulated tasks
  • Better experiment organization for small research teams
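
As a rough illustration of the first point, a per-category breakdown is one simple way to look past a single accuracy number. The sketch below is hypothetical: the function names, the category labels, and the sample records are made up for illustration and are not taken from any repository listed here.

```python
from collections import defaultdict


def overall_accuracy(records):
    """Fraction of correct predictions over all (category, is_correct) pairs."""
    records = list(records)
    return sum(1 for _, ok in records if ok) / len(records)


def accuracy_by_category(records):
    """Accuracy computed separately for each category label."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for category, is_correct in records:
        totals[category] += 1
        hits[category] += int(bool(is_correct))
    return {category: hits[category] / totals[category] for category in totals}


# Hypothetical evaluation records: (question category, was the answer correct?)
records = [
    ("counting", True),
    ("counting", False),
    ("spatial", True),
    ("spatial", True),
    ("ocr", False),
]

print(overall_accuracy(records))      # one number for the whole set
print(accuracy_by_category(records))  # the same data, split by category
```

With these made-up records the overall score hides the fact that one category is always answered correctly and another never is, which is the kind of failure pattern a single number cannot show.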

Notes

Most repositories here are learning-oriented prototypes rather than production systems. I try to keep the README clear about what each project can and cannot do.

Pinned

  1. audio-scene-caption-lab

    Python 2 206

  2. parley

    Benchmark toolkit for spoken-instruction Vision-Language-Action (VLA) pipelines — measures how speech-side perturbations propagate to robot-task success.

    Python 2

  3. vla-action-grounding-playground

    Python 2

  4. navi118/codex-desktop-doctor-skill

    Codex Skill for diagnosing Chrome and Computer Use failures in Codex Desktop on Windows.

    PowerShell 35