A Google Teachable Machine Companion App
DataForge is a powerful, browser-based tool designed to streamline the creation of robust image datasets for machine learning models, specifically tailored for Google Teachable Machine.
It allows users to mass-extract frames from videos, apply ML-focused data augmentations, and generate synthetic edge-case data using Google's Gemini AI.
- 🎥 Mass Video Processing: Drag and drop multiple `.mp4` or `.mov` files at once. Extract frames automatically at your chosen Frames Per Second (FPS).
- 🌗 ML-Focused Augmentations: Prevent model overfitting by applying automated filters to your frames:
  - Grayscale: Forces the model to learn shapes and textures rather than memorizing colors.
  - Blur: Simulates out-of-focus webcams or motion blur.
  - Brightness & Contrast: Simulates diverse real-world lighting conditions (overexposure, shadows, harsh lighting, foggy environments).
- 🤖 Synthetic Data Generation: Integrated with `gemini-3.1-flash-image-preview` to generate synthetic training images via text prompts. Perfect for covering rare or hard-to-capture edge cases.
- 📦 One-Click Export: Download your entire augmented dataset as a perfectly formatted ZIP file, ready to drop directly into Teachable Machine.
- 🛡️ Resilient Batching: Built-in error handling skips corrupted files and actively manages browser hardware decoders to prevent freezing during massive batch jobs.
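
To make the augmentation filters above more concrete, here is a minimal sketch of grayscale and brightness/contrast applied to raw RGBA bytes (the layout used by a canvas `ImageData.data` array). The function names, the Rec. 601 luma weights, and the parameter conventions are assumptions for illustration; DataForge's own filters may be implemented differently, for example with canvas filters.

```typescript
// Hypothetical helpers for illustration only; not DataForge's actual code.
// Pixels are RGBA bytes, four per pixel, as in ImageData.data.

/** Round a channel value and clamp it to the 0..255 byte range. */
function clampByte(value: number): number {
  return Math.max(0, Math.min(255, Math.round(value)));
}

/** Return a grayscale copy using Rec. 601 luma weights; alpha is preserved. */
function toGrayscale(pixels: Uint8ClampedArray): Uint8ClampedArray {
  const out = new Uint8ClampedArray(pixels.length);
  for (let i = 0; i < pixels.length; i += 4) {
    const luma = clampByte(
      0.299 * pixels[i] + 0.587 * pixels[i + 1] + 0.114 * pixels[i + 2]
    );
    out[i] = luma;
    out[i + 1] = luma;
    out[i + 2] = luma;
    out[i + 3] = pixels[i + 3];
  }
  return out;
}

/**
 * Return a copy with brightness and contrast adjusted.
 * brightness: offset added to each color channel (assumed range -255..255).
 * contrast: multiplier applied around mid-gray 128 (1 leaves contrast unchanged).
 */
function adjustBrightnessContrast(
  pixels: Uint8ClampedArray,
  brightness: number,
  contrast: number
): Uint8ClampedArray {
  const out = new Uint8ClampedArray(pixels.length);
  for (let i = 0; i < pixels.length; i += 4) {
    for (let c = 0; c < 3; c++) {
      out[i + c] = clampByte((pixels[i + c] - 128) * contrast + 128 + brightness);
    }
    out[i + 3] = pixels[i + 3]; // leave alpha untouched
  }
  return out;
}
```

In a browser, the input would typically come from `ctx.getImageData(0, 0, width, height).data`, and the result can be written back with `imageData.data.set(result)` followed by `ctx.putImageData(imageData, 0, 0)`.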
- Node.js (v18 or higher)
- A Gemini API Key (for synthetic image generation)
- Clone the repository:
  `git clone https://github.com/yourusername/dataforge.git`
  `cd dataforge`
- Install dependencies:
  `npm install`
- Set up your environment variables: create a `.env` file in the root directory and add your Gemini API key:
  `GEMINI_API_KEY=your_api_key_here`
- Start the development server:
  `npm run dev`
- Frontend: React 19, TypeScript, Vite
- Styling: Tailwind CSS, Lucide React (Icons)
- AI Integration: `@google/genai` (Gemini 3.1 Flash Image)
- File Processing: `jszip`, `file-saver`
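
As a rough illustration of how the file-processing libraries could be fed, the sketch below builds ZIP entry paths with one folder per class. The `FrameRef` type, the naming scheme, and the one-folder-per-class layout are all assumptions made for this example; this README does not specify the exact archive layout DataForge produces.

```typescript
// Hypothetical types and names for illustration; not DataForge's actual export code.
interface FrameRef {
  className: string;  // class label, e.g. "cat"
  videoName: string;  // source file name, e.g. "clip 1.mp4"
  frameIndex: number; // zero-based frame number within the video
  variant: string;    // e.g. "original", "grayscale", "blur"
}

/** Replace runs of characters that are awkward in ZIP paths with "_". */
function sanitizePart(part: string): string {
  return part.replace(/[^A-Za-z0-9._-]+/g, "_");
}

/** Left-pad a frame index with zeros to at least four digits. */
function pad4(n: number): string {
  let s = String(n);
  while (s.length < 4) {
    s = "0" + s;
  }
  return s;
}

/** Build a path such as "cat/clip_1_0007_grayscale.jpg". */
function zipEntryPath(frame: FrameRef): string {
  // Drop the file extension from the video name before sanitizing.
  const stem = sanitizePart(frame.videoName.replace(/\.[^.]+$/, ""));
  return (
    sanitizePart(frame.className) + "/" +
    stem + "_" + pad4(frame.frameIndex) + "_" + sanitizePart(frame.variant) + ".jpg"
  );
}
```

With `jszip`, each path could then be passed to `zip.file(path, blob)`, which creates intermediate folders from the slashes. The archive can be produced with `zip.generateAsync({ type: "blob" })` and handed to `saveAs(blob, "dataset.zip")` from `file-saver`.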
When training models in Teachable Machine, users often record a single video from their webcam. This leads to models that overfit to the specific lighting, background, and camera quality of that single recording. DataForge addresses this by taking your base videos and automatically multiplying your dataset with diverse, augmented variations, which helps the final model generalize beyond the conditions of the original recording.
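
To give a feel for how the dataset multiplies, here is a small sketch that computes the capture timestamps for a video at a chosen FPS and the resulting image count. Both function names are hypothetical, and the sketch assumes every original frame is kept and saved once more per enabled augmentation; DataForge's actual sampling and counting may differ.

```typescript
// Hypothetical helpers for illustration only; not DataForge's actual code.

/**
 * Timestamps (in seconds) at which to capture frames.
 * Sampling starts at 0 and stops before the end of the video, since
 * seeking to exactly the duration usually yields no new frame.
 */
function frameTimestamps(durationSec: number, fps: number): number[] {
  const stamps: number[] = [];
  if (!(durationSec > 0) || !(fps > 0)) {
    return stamps; // invalid or empty input produces no frames
  }
  for (let i = 0; i / fps < durationSec; i++) {
    stamps.push(i / fps);
  }
  return stamps;
}

/** Total images if each frame is kept and also saved once per augmentation. */
function datasetSize(frameCount: number, augmentationCount: number): number {
  return frameCount * (1 + augmentationCount);
}
```

For example, a 2-second clip sampled at 4 FPS gives 8 frames; with 3 augmentations enabled, that becomes 32 images under the assumption above. In a browser, each timestamp would typically be applied by setting `video.currentTime` and waiting for the `seeked` event before drawing the frame to a canvas.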