DeonHolo/DataForge

DataForge 🛠️

A Google Teachable Machine Companion App

DataForge is a browser-based tool for building varied image datasets for machine learning models, tailored to Google Teachable Machine.

It allows users to mass-extract frames from videos, apply ML-focused data augmentations, and generate synthetic edge-case data using Google's Gemini AI.
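
Extracting frames at a fixed rate comes down to choosing which timestamps to sample. The following is a minimal sketch of that step, not code from DataForge itself; the helper name `frameTimestamps` and its parameters are assumptions made for illustration.

```typescript
// Hypothetical helper (not from the DataForge source): list the timestamps,
// in seconds, at which frames would be sampled for a given duration and FPS.
function frameTimestamps(durationSec: number, fps: number): number[] {
  // Reject zero, negative, and NaN inputs.
  if (!(durationSec > 0) || !(fps > 0)) return [];

  const count = Math.floor(durationSec * fps);
  const stamps: number[] = [];
  for (let i = 0; i < count; i++) {
    // Derive each timestamp from the index so rounding error does not accumulate.
    stamps.push(i / fps);
  }
  return stamps;
}

// A 2-second clip sampled at 2 FPS yields four frames.
console.log(frameTimestamps(2, 2)); // [0, 0.5, 1, 1.5]
```

In a browser, each timestamp would typically be applied by setting `video.currentTime`, waiting for the `seeked` event, and drawing the video element onto a canvas. Whether DataForge follows exactly this approach is not stated in this README.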

✨ Features

  • 🎥 Mass Video Processing: Drag and drop multiple .mp4 or .mov files at once. Extract frames automatically at your chosen Frames Per Second (FPS).
  • 🌗 ML-Focused Augmentations: Reduce model overfitting by applying automated filters to your frames:
    • Grayscale: Pushes the model to learn shapes and textures rather than memorizing colors.
    • Blur: Simulates out-of-focus webcams or motion blur.
    • Brightness & Contrast: Simulates diverse real-world lighting conditions (overexposure, shadows, harsh lighting, foggy environments).
  • 🤖 Synthetic Data Generation: Integrated with gemini-3.1-flash-image-preview to generate synthetic training images from text prompts. Useful for covering rare or hard-to-capture edge cases.
  • 📦 One-Click Export: Download your entire augmented dataset as a single ZIP file, ready to load into Teachable Machine.
  • 🛡️ Resilient Batching: Built-in error handling skips corrupted files and actively manages browser hardware decoders to prevent freezing during massive batch jobs.
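
The grayscale and brightness/contrast filters above boil down to simple per-pixel arithmetic. The sketch below is illustrative only and is not taken from DataForge's source: the names `toGrayscale` and `adjustBrightnessContrast` are hypothetical, and it assumes frames are available as RGBA buffers like those returned by a canvas `getImageData()` call. The app itself may use a different mechanism, such as canvas filters.

```typescript
// RGBA pixel buffer, 4 bytes per pixel (the layout used by canvas ImageData).
type Pixels = Uint8ClampedArray;

// Hypothetical helper: replace each pixel's color with its luma value.
function toGrayscale(src: Pixels): Pixels {
  const out = new Uint8ClampedArray(src.length);
  for (let i = 0; i < src.length; i += 4) {
    // Rec. 601 luma weights for red, green, and blue.
    const y = 0.299 * src[i] + 0.587 * src[i + 1] + 0.114 * src[i + 2];
    out[i] = y;
    out[i + 1] = y;
    out[i + 2] = y;
    out[i + 3] = src[i + 3]; // keep alpha unchanged
  }
  return out;
}

// Hypothetical helper: `contrast` scales values around mid-gray (1 = no change),
// `brightness` is an additive offset in the 0..255 range (0 = no change).
function adjustBrightnessContrast(src: Pixels, brightness: number, contrast: number): Pixels {
  const out = new Uint8ClampedArray(src.length);
  for (let i = 0; i < src.length; i += 4) {
    for (let c = 0; c < 3; c++) {
      // Uint8ClampedArray rounds and clamps the result to 0..255 on assignment.
      out[i + c] = (src[i + c] - 128) * contrast + 128 + brightness;
    }
    out[i + 3] = src[i + 3]; // keep alpha unchanged
  }
  return out;
}

// One pure-red pixel, converted to gray and then brightened.
const red = new Uint8ClampedArray([255, 0, 0, 255]);
const gray = toGrayscale(red);
const brighter = adjustBrightnessContrast(gray, 50, 1);
console.log(Array.from(gray), Array.from(brighter));
```

Applying several such variants to every extracted frame is what multiplies the dataset: each source frame yields one image per filter setting.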

🚀 Getting Started

Prerequisites

  • Node.js (v18 or higher)
  • A Gemini API Key (for synthetic image generation)

Installation

  1. Clone the repository:

    git clone https://github.com/DeonHolo/DataForge.git
    cd DataForge
  2. Install dependencies:

    npm install
  3. Set up your environment variables: Create a .env file in the root directory and add your Gemini API key:

    GEMINI_API_KEY=your_api_key_here
  4. Start the development server:

    npm run dev

🛠️ Tech Stack

  • Frontend: React 19, TypeScript, Vite
  • Styling: Tailwind CSS, Lucide React (Icons)
  • AI Integration: @google/genai (Gemini 3.1 Flash Image)
  • File Processing: jszip, file-saver
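
With jszip and file-saver in the stack, the export archive is presumably assembled in the browser. One part of that is deciding where each image lives inside the ZIP. The sketch below shows one possible naming scheme, with one folder per class; the scheme, the function name `archivePath`, and its parameters are all assumptions for illustration and may not match what DataForge actually produces.

```typescript
// Hypothetical naming scheme (an assumption, not DataForge's actual layout):
//   <class>/<video>_f<frame>_<variant>.jpg
function archivePath(
  className: string,
  videoName: string,
  frameIndex: number,
  variant: string,
): string {
  // Replace runs of characters that are awkward in file names with "_".
  const safe = (s: string): string => s.trim().replace(/[^A-Za-z0-9_-]+/g, "_");
  // Drop the file extension, e.g. "clip 1.mp4" becomes "clip 1".
  const base = videoName.replace(/\.[^.]+$/, "");
  // Zero-pad the frame index so files sort in frame order.
  const frame = String(frameIndex).padStart(4, "0");
  return `${safe(className)}/${safe(base)}_f${frame}_${safe(variant)}.jpg`;
}

console.log(archivePath("Thumbs Up", "clip 1.mp4", 7, "gray"));
// Thumbs_Up/clip_1_f0007_gray.jpg
```

Paths like these could then be passed to JSZip's `file(path, data)`, with the archive produced by `generateAsync({ type: "blob" })` and handed to file-saver's `saveAs`.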

💡 Why DataForge?

When training models in Teachable Machine, users often record a single video from their webcam. The resulting models tend to overfit to the specific lighting, background, and camera quality of that one recording. DataForge addresses this by taking your base videos and automatically multiplying your dataset with augmented variations, so the final model generalizes better beyond the original recording conditions.

About

Google Teachable Machine video frame extractor and image filter tool.
