pgil256/tab_vision

TabVision

Automatic guitar tab transcription from video.

TabVision analyzes video recordings of guitar playing and generates tablature by fusing audio pitch detection with visual finger tracking. The multi-modal approach resolves a limitation audio-only tools share: the same pitch can live at several positions on the fretboard, and only the video knows which one you actually played.
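
To make the ambiguity concrete, here is a minimal Python sketch that lists every fretboard position producing a given pitch. It is an illustration, not code from the project: the `candidate_positions` name, the MIDI numbering of the open strings, the string numbering (1 = high E, 6 = low E), and the 22-fret limit are all assumptions.

```python
# Open-string MIDI notes in standard tuning (EADGBE), keyed by string number.
# Assumed convention: string 1 is the high E, string 6 is the low E.
STANDARD_TUNING = {1: 64, 2: 59, 3: 55, 4: 50, 5: 45, 6: 40}


def candidate_positions(midi_pitch, max_fret=22):
    """Return every (string, fret) pair that sounds the given MIDI pitch."""
    positions = []
    for string, open_pitch in STANDARD_TUNING.items():
        fret = midi_pitch - open_pitch
        if 0 <= fret <= max_fret:
            positions.append((string, fret))
    return positions


# E4 (MIDI 64) is playable on five different strings of a 22-fret guitar.
print(candidate_positions(64))
```

Audio alone cannot tell these positions apart, which is the gap the video signal is meant to close.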

Python Flask Electron React MediaPipe Basic Pitch

How it works

Electron Desktop App (React)              Flask Backend (cloud)
├── Upload or webcam capture              ├── POST /jobs           (upload video)
├── Interactive tab editor                ├── GET  /jobs/:id       (poll status)
└── Export (text / PDF)                   └── GET  /jobs/:id/result
            │                                        │
            └───────── HTTPS ────────────────────────┘
                                                     │
                      Processing pipeline ───────────┘
                      ├── Audio:  ffmpeg → Basic Pitch → MIDI → pitch events
                      ├── Video:  MediaPipe Hands → fretboard geometry → fret/string
                      └── Fusion: align signals → TabDocument + confidence scores
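
The upload-then-poll flow in the diagram could be driven by a loop like the one below. This is a sketch under stated assumptions: `wait_for_result` and the injected `fetch_json` callable (path in, parsed JSON out) are hypothetical, and the response is assumed to carry a `status` field with the values listed under "Core data model".

```python
import time


def wait_for_result(job_id, fetch_json, poll_interval=2.0, timeout=600.0,
                    sleep=time.sleep):
    """Poll GET /jobs/<id> until the job finishes, then fetch its result.

    fetch_json is a hypothetical callable: it takes a request path and
    returns the parsed JSON body as a dict.
    """
    waited = 0.0
    while waited <= timeout:
        job = fetch_json(f"/jobs/{job_id}")
        if job["status"] == "completed":
            return fetch_json(f"/jobs/{job_id}/result")
        if job["status"] == "failed":
            raise RuntimeError(f"job {job_id} failed")
        sleep(poll_interval)
        waited += poll_interval
    raise TimeoutError(f"job {job_id} did not finish within {timeout} s")
```

Passing the HTTP call in as a parameter keeps the loop independent of any particular client library and easy to exercise with a fake.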

Output: an interactive tab editor synced to video playback with per-note confidence highlighting (green / yellow / red). Users can correct notes inline and export to plain-text tab (Ultimate Guitar format) or PDF.
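
As a rough picture of the plain-text export, the sketch below renders notes as six lines of ASCII tab. It is a simplification and not the project's exporter: `render_tab` is a hypothetical name, each note gets its own column in timestamp order, and real timing, chords, and bar lines are ignored.

```python
# Assumed labels: string 1 is the high e, string 6 is the low E.
STRING_LABELS = {1: "e", 2: "B", 3: "G", 4: "D", 5: "A", 6: "E"}


def render_tab(notes):
    """Render (timestamp, string, fret) tuples as six lines of ASCII tab."""
    lines = {string: f"{label}|" for string, label in STRING_LABELS.items()}
    for _, string, fret in sorted(notes, key=lambda note: note[0]):
        cell = str(fret)
        for s in lines:
            # Write the fret on its own string; pad the others with dashes
            # so every line stays the same length.
            lines[s] += "-" + (cell if s == string else "-" * len(cell)) + "-"
    return "\n".join(lines[s] + "|" for s in sorted(lines))


print(render_tab([(0.0, 6, 3), (0.5, 5, 2), (1.0, 1, 12)]))
```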

Tech stack

Layer        Tech
Desktop app  Electron 28, React 18, Zustand, Tailwind CSS
Backend      Python 3.11, Flask, async job queue
Audio        Basic Pitch (Spotify's polyphonic pitch detector), ffmpeg
Video        MediaPipe Hands, OpenCV
Fusion       Custom scoring: combines audio pitch candidates with hand-position evidence
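
The README does not spell out the fusion scoring, so the following is only one plausible shape for it, not the project's algorithm. It assumes a weighted average of the audio confidence and a per-position visual score; `fuse`, `hand_evidence`, and `video_weight` are hypothetical names.

```python
def fuse(candidates, hand_evidence, audio_confidence, video_weight=0.5):
    """Pick the most plausible (string, fret) for one detected pitch.

    candidates:       (string, fret) pairs that produce the pitch
    hand_evidence:    dict mapping (string, fret) to a visual score in [0, 1]
    audio_confidence: pitch detector confidence in [0, 1]
    """
    if not candidates:
        raise ValueError("no candidate positions for this pitch")

    def score(position):
        visual = hand_evidence.get(position, 0.0)
        # The audio term is identical for every candidate, so the visual
        # term decides the ranking; the blend doubles as note confidence.
        return (1 - video_weight) * audio_confidence + video_weight * visual

    best = max(candidates, key=score)
    return best, score(best)


best, confidence = fuse(
    candidates=[(1, 0), (2, 5), (3, 9)],
    hand_evidence={(2, 5): 0.9, (3, 9): 0.2},
    audio_confidence=0.8,
)
```

In this sketch an open string gets no visual support, because no finger sits on a fret. A real scorer would need to treat that case separately.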

Getting started

Backend (Flask)

cd tabvision-server
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate
pip install -r requirements.txt

python run.py              # dev server on :5000
pytest tests/ -v           # run tests

Key dependencies: flask, basic-pitch, mediapipe, opencv-python, ffmpeg-python, numpy.

Frontend (Electron + React)

cd tabvision-client
npm install
npm run dev                # hot-reload development
npm run build              # production build

To package a distributable:

npm install -g electron-builder
npm run dist

Core data model

TabDocument (frontend): an array of TabNote objects, each with this shape:

interface TabNote {
  timestamp: number;            // seconds into the video
  string: 1 | 2 | 3 | 4 | 5 | 6;
  fret: number | "X";
  confidence: number;           // 0.0 – 1.0
  confidenceLevel: "high" | "medium" | "low";
}
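
The `confidenceLevel` field is presumably derived from `confidence`. A backend-side sketch of that bucketing in Python follows. The cutoffs (0.8 and 0.5), the `confidence_level` name, and the mapping of levels to the editor's green / yellow / red highlighting are assumptions, not values taken from the project.

```python
# Assumed mapping from level to the editor's highlight color.
LEVEL_COLORS = {"high": "green", "medium": "yellow", "low": "red"}


def confidence_level(confidence, high=0.8, medium=0.5):
    """Bucket a 0.0 to 1.0 confidence score into "high", "medium" or "low"."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if confidence >= high:
        return "high"
    if confidence >= medium:
        return "medium"
    return "low"


level = confidence_level(0.63)
color = LEVEL_COLORS[level]
```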

Job (backend): tracks processing state with these fields:

status: "pending" | "processing" | "completed" | "failed"
progress: float                  # 0.0 – 1.0
current_stage: "uploading" | "extracting_audio" | "analyzing_audio" |
               "analyzing_video" | "fusing" | "complete"

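One way to model these Job fields is a small dataclass that walks through the stages in order. This is a sketch, not the backend's actual class: the `advance` method, the defaults, and the equal progress weight per stage are assumptions.

```python
from dataclasses import dataclass

# Stage names in the order listed above.
STAGES = [
    "uploading", "extracting_audio", "analyzing_audio",
    "analyzing_video", "fusing", "complete",
]


@dataclass
class Job:
    status: str = "pending"
    progress: float = 0.0
    current_stage: str = "uploading"

    def advance(self):
        """Move to the next stage and update progress and status to match."""
        index = STAGES.index(self.current_stage)
        if index == len(STAGES) - 1:
            return  # already complete; nothing to do
        index += 1
        self.current_stage = STAGES[index]
        # Assumption: every stage counts equally toward overall progress.
        self.progress = index / (len(STAGES) - 1)
        self.status = (
            "completed" if self.current_stage == "complete" else "processing"
        )
```

The "failed" status is not covered here. A real job runner would set it when a stage raises an error.
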
Assumptions & constraints

  • Standard tuning (EADGBE) only
  • Video ≤ ~5 minutes per job
  • Guitar neck must be visible, roughly centered, and horizontal in the frame
  • Webcam capture works but file upload (MP4 / MOV) gives better results
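
Two of these constraints can be checked before a job is queued. The sketch below is illustrative only: `check_upload` is a hypothetical helper, the hard 300-second cutoff is an assumption (the limit above is approximate), and the duration is assumed to be known already, for example from a probe step.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".mp4", ".mov"}
MAX_DURATION_SECONDS = 5 * 60  # assumed hard limit; the stated limit is approximate


def check_upload(filename, duration_seconds):
    """Return a list of problems with an upload; an empty list means it is fine."""
    problems = []
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("unsupported file type (expected MP4 or MOV)")
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append("video is longer than 5 minutes")
    return problems
```

Tuning and neck visibility cannot be checked this cheaply; they only show up once the audio and video stages run.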

Status

Under active development. The full project specification lives in tabvision_specification.md.
