DataForge 🛠️

A Google Teachable Machine Companion App

DataForge is a browser-based tool that streamlines the creation of robust image datasets for machine learning models, tailored specifically for Google Teachable Machine.

It allows users to mass-extract frames from videos, apply ML-focused data augmentations, and generate synthetic edge-case data using Google's Gemini AI.

✨ Features

🎥 Mass Video Processing: Drag and drop multiple .mp4 or .mov files at once. Frames are extracted automatically at your chosen frame rate (frames per second, FPS).
🌗 ML-Focused Augmentations: Reduce model overfitting by applying automated filters to your frames:
- Grayscale: Forces the model to learn shapes and textures rather than memorizing colors.
- Blur: Simulates out-of-focus webcams or motion blur.
- Brightness & Contrast: Simulates diverse real-world lighting conditions (overexposure, shadows, harsh lighting, foggy environments).
🤖 Synthetic Data Generation: Integrated with gemini-3.1-flash-image-preview to generate synthetic training images from text prompts. Useful for covering rare or hard-to-capture edge cases.
📦 One-Click Export: Download your entire augmented dataset as a single ZIP file, ready to import into Teachable Machine.
🛡️ Resilient Batching: Built-in error handling skips corrupted files and actively manages browser hardware decoders to prevent freezing during massive batch jobs.
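The README doesn't say how frames are sampled at the chosen FPS. One common browser approach, assumed here, is to seek an HTMLVideoElement to each target time, wait for its "seeked" event, and draw the frame onto a canvas with drawImage. The hypothetical helper below covers only the timing part: it lists which timestamps to visit for a given video duration and FPS.

```typescript
// Hypothetical helper (not DataForge's actual code): list the timestamps, in
// seconds, at which to capture frames from a video of `durationSec` seconds
// at `fps` frames per second. In a browser, each timestamp would be assigned
// to video.currentTime; after the "seeked" event fires, the frame would be
// drawn to a canvas and exported as an image.
function frameTimestamps(durationSec: number, fps: number): number[] {
  const times: number[] = [];
  // Reject NaN, zero, negative, or infinite inputs (video.duration can be
  // NaN before metadata loads, or Infinity for some recordings).
  if (!Number.isFinite(durationSec) || durationSec <= 0) return times;
  if (!Number.isFinite(fps) || fps <= 0) return times;
  for (let i = 0; ; i++) {
    // Dividing the index (rather than summing 1/fps repeatedly) avoids
    // accumulating floating-point error over long videos.
    const t = i / fps;
    if (t >= durationSec) break;
    times.push(t);
  }
  return times;
}
```

For example, frameTimestamps(2, 2) yields [0, 0.5, 1, 1.5]: four frames from a two-second clip.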
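DataForge's own filter code isn't shown in this README. In a browser, similar effects can be had by setting CanvasRenderingContext2D.filter (for example "grayscale(1)" or "blur(2px)") before calling drawImage. The sketch below instead works directly on RGBA pixel buffers laid out like ImageData.data, so the arithmetic is visible and it runs outside a browser. The function names, the BT.601 grayscale weights, and the parameter ranges are all assumptions.

```typescript
// Sketch of three augmentations on an RGBA buffer laid out like the
// browser's ImageData.data (4 bytes per pixel: R, G, B, A). Function names
// and parameter ranges are illustrative assumptions, not DataForge's code.
type Pixels = Uint8ClampedArray;

// Grayscale using ITU-R BT.601 luma weights; alpha is preserved.
function toGrayscale(src: Pixels): Pixels {
  const out = new Uint8ClampedArray(src.length);
  for (let i = 0; i < src.length; i += 4) {
    const luma = 0.299 * src[i] + 0.587 * src[i + 1] + 0.114 * src[i + 2];
    // Uint8ClampedArray rounds and clamps to 0..255 on assignment.
    out[i] = luma;
    out[i + 1] = luma;
    out[i + 2] = luma;
    out[i + 3] = src[i + 3];
  }
  return out;
}

// brightness: additive offset (e.g. -80..80); contrast: multiplier around
// mid-gray 128 (1 = unchanged, <1 flatter and "foggy", >1 harsher).
function adjustBrightnessContrast(src: Pixels, brightness: number, contrast: number): Pixels {
  const out = new Uint8ClampedArray(src.length);
  for (let i = 0; i < src.length; i += 4) {
    for (let c = 0; c < 3; c++) {
      out[i + c] = (src[i + c] - 128) * contrast + 128 + brightness;
    }
    out[i + 3] = src[i + 3];
  }
  return out;
}

// 3x3 box blur on RGB; neighbors past the image edge are skipped rather
// than padded. Alpha is copied unchanged.
function boxBlur(src: Pixels, width: number, height: number): Pixels {
  if (src.length !== width * height * 4) throw new Error("buffer size does not match width x height");
  const out = new Uint8ClampedArray(src.length);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const idx = (y * width + x) * 4;
      for (let c = 0; c < 3; c++) {
        let sum = 0;
        let count = 0;
        for (let dy = -1; dy <= 1; dy++) {
          for (let dx = -1; dx <= 1; dx++) {
            const nx = x + dx;
            const ny = y + dy;
            if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
            sum += src[(ny * width + nx) * 4 + c];
            count++;
          }
        }
        out[idx + c] = sum / count;
      }
      out[idx + 3] = src[idx + 3];
    }
  }
  return out;
}
```

In a browser, the input would come from ctx.getImageData(...) and the result could be written back with new ImageData(out, width, height) and ctx.putImageData(...). Running boxBlur several times gives a stronger, roughly Gaussian blur.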
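How DataForge implements its batching isn't documented here. A minimal sketch of the skip-and-continue behavior, assuming files are processed one at a time so that only one video is being decoded at once: each file goes through a hypothetical worker callback, and a failure is recorded rather than aborting the whole batch.

```typescript
// Hypothetical batching helper: run `worker` on each item in order,
// collecting results and recording failures instead of stopping.
interface BatchResult<T> {
  succeeded: T[];
  failed: { name: string; error: string }[];
}

async function processSequentially<I extends { name: string }, T>(
  items: I[],
  worker: (item: I) => Promise<T>,
): Promise<BatchResult<T>> {
  const result: BatchResult<T> = { succeeded: [], failed: [] };
  for (const item of items) {
    try {
      // Awaiting each item before starting the next keeps at most one
      // decode in flight, one way to avoid exhausting hardware decoders.
      result.succeeded.push(await worker(item));
    } catch (err) {
      // A corrupted file is logged and skipped; the loop moves on.
      const message = err instanceof Error ? err.message : String(err);
      result.failed.push({ name: item.name, error: message });
    }
  }
  return result;
}
```

Running strictly in sequence trades throughput for stability; a small fixed concurrency limit would be a middle ground if decoding one file at a time proves too slow.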
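With the @google/genai SDK, an image request goes through ai.models.generateContent({ model, contents }), and generated images come back as response parts carrying inlineData (base64 data plus a MIME type). The sketch below shows only the parsing step, turning those parts into data URLs that could be displayed or added to the dataset. The interfaces are minimal local stand-ins for the SDK's response types, and extractImageDataUrls is a hypothetical name; the SDK call itself appears only in comments.

```typescript
// Minimal local stand-ins for the shape of a generateContent response.
// The real SDK types have more fields; only the ones used here are modeled.
interface InlineData { data?: string; mimeType?: string }
interface Part { text?: string; inlineData?: InlineData }
interface GenerateResponseLike { candidates?: { content?: { parts?: Part[] } }[] }

// Usage with the SDK (not executed in this sketch):
//   import { GoogleGenAI } from "@google/genai";
//   const ai = new GoogleGenAI({ apiKey });
//   const response = await ai.models.generateContent({
//     model: "gemini-3.1-flash-image-preview",
//     contents: "a hand making a thumbs-up gesture in dim light",
//   });
//   const images = extractImageDataUrls(response);

// Collect every inline image part as a data: URL, skipping text parts.
function extractImageDataUrls(response: GenerateResponseLike): string[] {
  const urls: string[] = [];
  for (const candidate of response.candidates ?? []) {
    for (const part of candidate.content?.parts ?? []) {
      const inline = part.inlineData;
      if (inline?.data) {
        // Falling back to PNG when no MIME type is given is an assumption.
        const mime = inline.mimeType ?? "image/png";
        urls.push(`data:${mime};base64,${inline.data}`);
      }
    }
  }
  return urls;
}
```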

🚀 Getting Started

Prerequisites

Node.js (v18 or higher)
A Gemini API Key (for synthetic image generation)

Installation

Clone the repository:
```
git clone https://github.com/yourusername/dataforge.git
cd dataforge
```

Install dependencies:
```
npm install
```
Set up your environment variables: Create a .env file in the root directory (for example, by copying .env.example) and add your Gemini API key:
```
GEMINI_API_KEY=your_api_key_here
```
Start the development server:
```
npm run dev
```
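The steps above don't show how GEMINI_API_KEY, which lacks Vite's default VITE_ prefix, reaches the browser code. One common pattern, assumed here and not checked against this repository's vite.config.ts, loads every .env variable with Vite's loadEnv and substitutes the key at build time via define:

```typescript
// vite.config.ts (sketch; the real file in this repository may differ).
// Plugins such as React or Tailwind are omitted to keep the focus on env handling.
import { defineConfig, loadEnv } from "vite";

export default defineConfig(({ mode }) => {
  // An empty prefix ("") loads every variable from .env, not only VITE_*.
  const env = loadEnv(mode, process.cwd(), "");
  return {
    define: {
      // Replaces occurrences of process.env.GEMINI_API_KEY in source code
      // with the key as a string literal.
      "process.env.GEMINI_API_KEY": JSON.stringify(env.GEMINI_API_KEY),
    },
  };
});
```

Because define inlines the value into the JavaScript bundle, the key is readable by anyone who loads the built app, which matters if you deploy it publicly.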

🛠️ Tech Stack

Frontend: React 19, TypeScript, Vite
Styling: Tailwind CSS, Lucide React (Icons)
AI Integration: @google/genai (Gemini 3.1 Flash Image)
File Processing: jszip, file-saver

💡 Why DataForge?

When training models in Teachable Machine, users often record a single video from their webcam. Models trained this way tend to overfit to the specific lighting, background, and camera quality of that one recording. DataForge addresses this by taking your base videos and automatically multiplying your dataset with diverse, augmented variations, helping your final model generalize to real-world conditions.
