Get your first dataset ready in about 10 minutes! This guide walks you through installing FineFoundry and collecting your first batch of training data.
You'll need:
- A computer running Windows, macOS, or Linux
- Python 3.10 or newer (download it from python.org if you don't have it)
- An internet connection for downloading and collecting data
Optional (for sharing your work online):
- A free Hugging Face account
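If you're not sure which Python version is installed, a short check like the one below can tell you. This is a minimal sketch and not part of FineFoundry; the helper name `python_is_supported` is made up for illustration.

```python
import sys

# Minimum version taken from the requirements list above.
MIN_VERSION = (3, 10)

def python_is_supported(version=None):
    """Return True if a (major, minor, ...) version meets the minimum."""
    if version is None:
        version = sys.version_info  # the interpreter running this script
    return tuple(version[:2]) >= MIN_VERSION

if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_is_supported():
        print(f"Python {current} is new enough for FineFoundry.")
    else:
        print(f"Python {current} is too old; install 3.10 or newer.")
```

Save it under any file name and run it with `python`; if it reports an older version, upgrade before continuing.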
Open your terminal (Command Prompt on Windows, Terminal on Mac/Linux) and run these commands:
git clone https://github.com/SourceBox-LLC/FineFoundry.git FineFoundry-Core
cd FineFoundry-Core
pip install uv
Then start the app:
On Mac/Linux:
chmod +x run_finefoundry.sh
./run_finefoundry.sh
On Windows:
uv run src/main.py
If the above doesn't work, try this instead:
git clone https://github.com/SourceBox-LLC/FineFoundry.git FineFoundry-Core
cd FineFoundry-Core
python -m venv venv
Activate your virtual environment:
- Mac/Linux:
source venv/bin/activate
- Windows (PowerShell):
.\venv\Scripts\Activate.ps1
- Windows (Command Prompt):
venv\Scripts\activate.bat
Then install and run:
pip install -e .
python src/main.py
When FineFoundry opens, you'll see a window with tabs across the top:
- Data Sources — Where you collect data from websites or documents
- Publish — Prepare and share your datasets
- Training — Teach AI models with your data
- Inference — Test your trained models by chatting with them
- Merge Datasets — Combine multiple data collections
- Analysis — Check your data quality
- Settings — Set up accounts and preferences
Let's grab some data to work with:
- Click the "Data Sources" tab
- Choose a source: for this example, select "4chan"
- Pick some boards: click a few board chips like b, pol, or x
- Set your limits:
  - Max Threads: 50
  - Max Pairs: 500
  - Delay: 0.5
  - Min Length: 10
- Click "Start"
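To make the limits above more concrete, here is a rough sketch of how settings like Max Pairs and Min Length could act on collected pairs. This guide doesn't show FineFoundry's actual implementation, so the function `apply_limits`, the setting keys, and the reading of Min Length as a minimum character count are all assumptions made for illustration.

```python
# Hypothetical settings mirroring the values suggested above.
settings = {"max_threads": 50, "max_pairs": 500, "delay": 0.5, "min_length": 10}

def apply_limits(pairs, max_pairs, min_length):
    """Keep pairs whose input and output both reach min_length, up to max_pairs."""
    kept = []
    for pair in pairs:
        if len(kept) >= max_pairs:
            break  # stop once the cap is reached
        if len(pair["input"]) < min_length or len(pair["output"]) < min_length:
            continue  # skip pairs that are too short
        kept.append(pair)
    return kept

# Made-up sample data, for illustration only.
sample = [
    {"input": "hi", "output": "hello there, friend"},
    {"input": "What is a dataset?", "output": "A collection of examples."},
    {"input": "How do models learn?", "output": "By adjusting to examples."},
]
filtered = apply_limits(sample, settings["max_pairs"], settings["min_length"])
print(len(filtered))  # prints 2: the first pair's input is too short
```

The sketch leaves out Max Threads and Delay, which control how much is fetched and how quickly rather than which pairs are kept.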
Watch the progress bar and logs as data flows in. This usually takes 1-3 minutes.
When the collection finishes:
- Click "Preview Dataset"
- You'll see a two-column view showing conversation pairs
Each row shows an "input" (like a question or prompt) and an "output" (the response). This is what the AI will learn from!
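As an illustration of the row structure described above, each pair can be thought of as a small record with an "input" field and an "output" field. The sketch below writes such records as JSON Lines, a common format for training data. Whether FineFoundry stores its data this way is not stated here, so treat the format and the helper names `to_jsonl` and `from_jsonl` as assumptions.

```python
import json

# Made-up example rows; real collected data will look different.
pairs = [
    {"input": "What is fine-tuning?",
     "output": "Training a model further on your own data."},
    {"input": "Why collect pairs?",
     "output": "They show the model a prompt and a matching reply."},
]

def to_jsonl(rows):
    """Serialize rows as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(row) for row in rows)

def from_jsonl(text):
    """Parse JSON Lines text back into a list of dictionaries."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

if __name__ == "__main__":
    print(to_jsonl(pairs))
```

Reading the text back with `from_jsonl` returns the same list of rows, which is a quick way to confirm nothing was lost in the round trip.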
Your data is automatically saved, so you won't lose it if you close the app.
You've just collected your first dataset! Here's what you can do now:
Go to the Training Tab Guide to learn how to teach an AI using your data.
Go to the Publish Tab to upload it to Hugging Face.
To collect more data:
- Try different sources (Reddit, Stack Exchange)
- Collect from multiple boards
- Generate synthetic data from your own documents
Use the Merge Datasets Tab to mix data from different collections.
- App won't start? Make sure Python 3.10+ is installed
- No data collected? Check your internet connection and try different boards
- Other issues? See the Troubleshooting Guide
Still stuck? Ask for help in GitHub Discussions.