psychopath psychopathdev

Peng Fei

Hi, I’m Peng Fei, a graduate student at Shenzhen University working on multimodal large models, speech large models, and VLA-style (Vision-Language-Action) systems.

I use this GitHub account as a public notebook for small, runnable research-engineering projects: clean data schemas, transparent evaluation scripts, and prototypes that help me understand model behavior before scaling things up.

Research interests

- Multimodal large models: image-text/audio-text understanding, instruction following, and evaluation design
- Speech large models: ASR-oriented workflows, spoken dialogue evaluation, and audio instruction following
- Vision-Language-Action systems: action schemas, grounding, simulated evaluation, and robotics-oriented interfaces
- Reproducible ML tooling: small benchmarks, dataset cards, CLI-first experiments, and readable reports

Current projects

| Project | What it explores |
| --- | --- |
| `audio-scene-caption-lab` | A small sandbox for audio, speech, and visual scene captioning workflows with lightweight metrics and report generation. |
| `vla-action-grounding-playground` | A toy environment for instruction-to-action grounding, action schema design, and VLA-style evaluation traces. |

Toolkit

Python · PyTorch · Transformers · NumPy · Jupyter · Linux · Git · LaTeX

I am especially interested in projects that are easy to run, easy to inspect, and honest about their limitations. A good experiment should leave a readable trail.

Currently learning

- How to evaluate multimodal reasoning beyond single-number accuracy
- Speech and audio benchmarks that expose real failure modes
- Action representations for VLA agents in simulated tasks
- Better experiment organization for small research teams

Notes

Most repositories here are learning-oriented prototypes rather than production systems. I try to keep the README clear about what each project can and cannot do.
