This page describes how the FineFoundry codebase is organized.
FineFoundry-Core/
├── docs/ # Documentation
│   ├── user-guide/ # User-facing documentation
│   ├── development/ # Developer documentation
│   ├── api/ # API reference
│   └── deployment/ # Deployment guides
├── src/ # Source code
│   ├── main.py # Main application entry point
│   ├── save_dataset.py # Dataset builder CLI
│   ├── synthetic_cli.py # Synthetic data generation CLI
│   ├── db/ # Database module (sole storage)
│   │   ├── __init__.py # Public exports
│   │   ├── core.py # Connection management, schema
│   │   ├── settings.py # Settings CRUD
│   │   ├── training_configs.py # Training config CRUD
│   │   ├── scraped_data.py # Scrape sessions and pairs
│   │   ├── training_runs.py # Training runs CRUD
│   │   └── logs.py # Database logging handler
│   ├── helpers/ # Helper modules
│   │   ├── common.py # Common utilities
│   │   ├── logging_config.py # Database-backed logging
│   │   ├── settings.py # Settings helper (database)
│   │   ├── settings_ollama.py # Ollama settings (database)
│   │   ├── training_config.py # Training config helper (database)
│   │   ├── scrape_db.py # Scrape data helper (database)
│   │   ├── merge.py # Dataset merging logic
│   │   ├── scrape.py # Scraping helpers
│   │   ├── synthetic.py # Synthetic data generation
│   │   ├── training.py # Training helpers
│   │   ├── training_pod.py # Runpod training helpers
│   │   ├── local_inference.py # Local inference helpers
│   │   ├── build.py # Dataset building helpers
│   │   ├── boards.py # Board listing
│   │   ├── datasets.py # Dataset utilities
│   │   ├── theme.py # UI theme and styling
│   │   ├── ui.py # UI component helpers
│   │   └── proxy.py # Proxy configuration
│   ├── scrapers/ # Data collection modules
│   │   ├── fourchan_scraper.py
│   │   ├── reddit_scraper.py
│   │   └── stackexchange_scraper.py
│   ├── ui/ # UI components
│   │   └── tabs/ # Tab-specific UI (layouts + controllers)
│   │       ├── tab_*.py # Tab layout builders
│   │       ├── *_controller.py # Tab controllers
│   │       └── */sections/ # Tab section modules
│   └── runpod/ # Runpod integration
│       ├── runpod_pod.py # Pod management
│       └── ensure_infra.py # Infrastructure setup
├── tests/ # Test suite
│   ├── unit/ # Unit tests
│   └── integration/ # Integration tests
├── training_outputs/ # Model artifacts (checkpoints, adapters)
├── finefoundry.db # SQLite database (auto-created)
├── img/ # Images and assets
├── pyproject.toml # Project configuration
├── uv.lock # UV lock file
└── README.md # Main README
The main entry point that:
- Initializes the Flet desktop application
- Sets up the global app shell (AppBar, welcome view, shared settings/proxy/HF/Runpod/Ollama controls)
- Delegates tab wiring to dedicated controllers in `ui/tabs/*_controller.py`
- Coordinates between controllers and shared helper modules
Key responsibilities (after controller refactor):
- Imports and logger setup
- Global helpers (user guide dialog, keyboard shortcuts, settings I/O)
- Building each tab by calling `build_*_tab_with_logic(...)` from the appropriate controller
- Adding all tabs to the Flet `Tabs` control and bootstrapping the app
Business logic separated from UI:
- `safe_update()` - Safe UI update wrapper
- `set_terminal_title()` - Terminal title management
- Utility functions used across modules
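To illustrate what a safe UI update wrapper can look like, here is a minimal sketch. It assumes `safe_update()` takes a page-like object with an `update()` method and swallows errors raised when the page is no longer available. The real signature in `helpers/common.py` may differ, and `FakePage` is a hypothetical stand-in for a Flet page.

```python
import logging

logger = logging.getLogger(__name__)


def safe_update(page) -> bool:
    """Call page.update() and report success instead of raising.

    Assumed behavior for illustration; not the project's actual code.
    """
    try:
        page.update()
        return True
    except Exception:
        # A closed window or dropped session should not crash a background task.
        logger.debug("UI update skipped", exc_info=True)
        return False


class FakePage:
    """Hypothetical stand-in for a Flet page, used only in this example."""

    def __init__(self, closed: bool = False):
        self.closed = closed
        self.updates = 0

    def update(self) -> None:
        if self.closed:
            raise RuntimeError("page is closed")
        self.updates += 1
```

Returning a boolean lets long-running operations notice that the UI has gone away without wrapping every call in its own `try` block.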
- Database-backed logging via `DatabaseHandler`
- Console output for real-time monitoring
- Debug mode support via the `FINEFOUNDRY_DEBUG` env var
- See Logging Guide
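The combination of a database handler, console output, and an environment-driven debug switch could be wired roughly as follows. This is a sketch using only the standard library: `SQLiteLogHandler`, `make_logger`, the `logs` table layout, and the accepted values of `FINEFOUNDRY_DEBUG` are all assumptions, not the project's actual `DatabaseHandler`.

```python
import logging
import os
import sqlite3


class SQLiteLogHandler(logging.Handler):
    """Hypothetical handler that writes each log record to a SQLite table."""

    def __init__(self, conn: sqlite3.Connection):
        super().__init__()
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS logs (level TEXT, logger TEXT, message TEXT)"
        )

    def emit(self, record: logging.LogRecord) -> None:
        try:
            self.conn.execute(
                "INSERT INTO logs (level, logger, message) VALUES (?, ?, ?)",
                (record.levelname, record.name, record.getMessage()),
            )
            self.conn.commit()
        except Exception:
            # Standard logging hook for errors raised inside a handler.
            self.handleError(record)


def make_logger(name: str, conn: sqlite3.Connection) -> logging.Logger:
    """Attach database and console handlers; FINEFOUNDRY_DEBUG enables DEBUG."""
    debug = os.environ.get("FINEFOUNDRY_DEBUG", "").lower() in {"1", "true", "yes"}
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    logger.addHandler(SQLiteLogHandler(conn))
    logger.addHandler(logging.StreamHandler())  # console output
    logger.propagate = False
    return logger
```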
- `run_merge()` - Main merge operation
- `preview_merged()` - Dataset preview
- Dataset loading and column mapping
- Interleave and concatenate operations
- Database session and HF dataset handling
- Merged results saved to database with optional JSON export
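The difference between the concatenate and interleave operations can be sketched on plain Python lists. This is an illustration only: the real helper works on database sessions and HF datasets, and its interleaving strategy may differ from the round-robin, run-until-all-exhausted behavior assumed here.

```python
from itertools import chain, zip_longest
from typing import Dict, List

Row = Dict[str, str]
_MISSING = object()  # sentinel marking exhausted sources


def concatenate(sources: List[List[Row]]) -> List[Row]:
    """Append sources one after another, preserving their order."""
    return list(chain.from_iterable(sources))


def interleave(sources: List[List[Row]]) -> List[Row]:
    """Take one row from each source in turn until every source is exhausted."""
    merged: List[Row] = []
    for group in zip_longest(*sources, fillvalue=_MISSING):
        merged.extend(row for row in group if row is not _MISSING)
    return merged
```

Concatenation keeps each source contiguous, while interleaving mixes sources from the first row on, which matters when a consumer reads only a prefix of the merged data.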
- `run_reddit_scrape()` - Reddit scraping wrapper
- `run_real_scrape()` - 4chan scraping wrapper
- `run_stackexchange_scrape()` - Stack Exchange wrapper
- UI update helpers for scraping progress
- `run_synthetic_generation()` - Synthetic data generation using Unsloth's SyntheticDataKit
- Document ingestion (PDF, DOCX, TXT, HTML, URLs)
- Q&A, chain-of-thought, and summary generation
- Local LLM serving via vLLM
- Database integration for generated pairs
- `run_local_training()` - Native local training via Python subprocess
- `stop_local_training()` - Stop local training subprocess
- `build_hp_from_controls()` - Hyperparameter extraction shared between Runpod and local
- `list_saved_configs()` - List configs from database
- `save_config()` / `read_json_file()` / `delete_config()` - Config CRUD operations
- `rename_config()` - Rename existing configs
- `get_last_used_config_name()` / `set_last_used_config_name()` - Track last-used config for auto-load on startup
- All configs stored in SQLite database (no filesystem fallback)
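Config CRUD backed by SQLite might look like the sketch below. The table name, its columns, the explicit `conn` parameter, and the `load_config()` reader are assumptions for illustration; the project's own functions may take different arguments.

```python
import json
import sqlite3
from typing import Any, Dict, List, Optional


def init_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one JSON payload per named config.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS training_configs ("
        "name TEXT PRIMARY KEY, payload TEXT NOT NULL)"
    )


def save_config(conn: sqlite3.Connection, name: str, config: Dict[str, Any]) -> None:
    # INSERT OR REPLACE overwrites an existing config with the same name.
    conn.execute(
        "INSERT OR REPLACE INTO training_configs (name, payload) VALUES (?, ?)",
        (name, json.dumps(config)),
    )
    conn.commit()


def load_config(conn: sqlite3.Connection, name: str) -> Optional[Dict[str, Any]]:
    row = conn.execute(
        "SELECT payload FROM training_configs WHERE name = ?", (name,)
    ).fetchone()
    return json.loads(row[0]) if row else None


def list_saved_configs(conn: sqlite3.Connection) -> List[str]:
    rows = conn.execute("SELECT name FROM training_configs ORDER BY name")
    return [r[0] for r in rows]


def rename_config(conn: sqlite3.Connection, old: str, new: str) -> None:
    conn.execute("UPDATE training_configs SET name = ? WHERE name = ?", (new, old))
    conn.commit()


def delete_config(conn: sqlite3.Connection, name: str) -> None:
    conn.execute("DELETE FROM training_configs WHERE name = ?", (name,))
    conn.commit()
```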
- Helpers for loading a base model + LoRA adapter locally (used by Quick Local Inference)
- Caches loaded models to speed up repeated generations
- Inference tab helpers for global inference over fine-tuned adapters
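Caching loaded models can be sketched with `functools.lru_cache`. The loader below is a hypothetical placeholder that returns a dict rather than a real model, and the cache key (base model name plus adapter path) is an assumption about how the helper identifies a loaded model.

```python
from functools import lru_cache
from typing import Dict, List, Tuple

LOAD_CALLS: List[Tuple[str, str]] = []  # records every real load, for the example


def _load_from_disk(base_model: str, adapter_path: str) -> Dict[str, str]:
    """Hypothetical expensive loader; a real one would load weights."""
    LOAD_CALLS.append((base_model, adapter_path))
    return {"base": base_model, "adapter": adapter_path}


@lru_cache(maxsize=2)
def get_model(base_model: str, adapter_path: str) -> Dict[str, str]:
    """Return a cached model for this (base, adapter) pair, loading it once."""
    return _load_from_disk(base_model, adapter_path)
```

A small `maxsize` keeps memory bounded, since each cached entry would hold a full model in practice.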
- `run_pod_training()` - Runpod training orchestration
- `restart_pod_container()` - Pod restart
- `open_runpod()` / `open_web_terminal()` - Web interfaces
- `copy_ssh_command()` - SSH access helper
- `ensure_infrastructure()` - Volume and template setup
- Teardown operations
- `run_build()` - Dataset building from database sessions or HF datasets
- `run_push_async()` - Async Hub push
- Split creation and validation
- Dataset card generation
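Split creation can be illustrated with a seeded shuffle over plain rows. The function name, the default fractions, and the split keys are assumptions; the real builder produces an HF `DatasetDict` and may split differently.

```python
import random
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def split_rows(
    rows: Sequence[T],
    val_fraction: float = 0.1,
    test_fraction: float = 0.1,
    seed: int = 42,
) -> Dict[str, List[T]]:
    """Shuffle deterministically, then carve out validation and test slices."""
    if val_fraction + test_fraction >= 1.0:
        raise ValueError("validation and test fractions must leave room for train")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    n_test = int(len(shuffled) * test_fraction)
    return {
        "validation": shuffled[:n_val],
        "test": shuffled[n_val:n_val + n_test],
        "train": shuffled[n_val + n_test:],
    }
```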
- `guess_input_output_columns()` - Column detection
- Dataset schema utilities
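Column detection of this kind is usually a name-based heuristic. The sketch below assumes a substring match against two hint lists; the hint words, their priority order, and the return shape are illustrative guesses, not the project's actual rules.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical hint lists, checked in priority order.
INPUT_HINTS = ("input", "prompt", "question", "instruction")
OUTPUT_HINTS = ("output", "response", "answer", "completion")


def guess_input_output_columns(
    columns: List[str],
) -> Tuple[Optional[str], Optional[str]]:
    """Return the first column matching each hint list, or None if none match."""

    def first_match(hints: Sequence[str]) -> Optional[str]:
        for hint in hints:
            for col in columns:
                if hint in col.lower():
                    return col
        return None

    return first_match(INPUT_HINTS), first_match(OUTPUT_HINTS)
```

Returning `None` for an undetected column leaves the final choice to the user, which suits a column-mapping UI.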
- Color definitions
- Icon mappings
- Styling constants
- Accent colors and borders
- `pill()` - Chip/pill components
- `section_title()` - Section headers
- `make_wrap()` - Wrap containers
- `make_selectable_pill()` - Selectable chips
- `two_col_row()` / `two_col_header()` - Two-column layouts
- `compute_two_col_flex()` - Column width calculations
- `apply_proxy_from_env()` - Environment-based proxy setup
- Proxy configuration helpers for all scrapers
Data collection modules:
- `scrape()` - Main scraping function
- `fetch_catalog_pages()` - Catalog fetching
- `fetch_thread()` - Thread fetching
- `build_pairs_normal()` - Adjacent pairing
- `build_pairs_contextual()` - Context-aware pairing
- Text cleaning and normalization
- Quote chain and cumulative strategies
- CLI entry point
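Adjacent pairing, as named above, can be sketched as turning each post and the post that follows it into one training pair. The `input`/`output` keys and the whitespace-only cleaning are assumptions; the real `build_pairs_normal()` may take thread objects and apply more normalization.

```python
from typing import Dict, List


def build_pairs_normal(posts: List[str]) -> List[Dict[str, str]]:
    """Pair each non-empty post with the next one in thread order."""
    cleaned = [p.strip() for p in posts if p and p.strip()]
    return [
        {"input": prev, "output": nxt}
        for prev, nxt in zip(cleaned, cleaned[1:])
    ]
```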
- `crawl()` - Subreddit/post crawling
- `expand_more_comments()` - Comment expansion
- `build_pairs_parent_child()` - Parent-child pairing
- `build_pairs_contextual()` - Context-based pairing
- Comment tree traversal
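Parent-child pairing can be sketched over a flat list of comments that reference their parents by id. The comment shape (`id`, `parent_id`, `body`) and the `input`/`output` keys are assumptions for this example, not the scraper's actual data model.

```python
from typing import Dict, List, Optional

Comment = Dict[str, Optional[str]]


def build_pairs_parent_child(comments: List[Comment]) -> List[Dict[str, str]]:
    """Emit one pair per reply: the parent's text as input, the reply as output."""
    by_id = {c["id"]: c for c in comments}
    pairs: List[Dict[str, str]] = []
    for comment in comments:
        parent = by_id.get(comment.get("parent_id"))
        if parent is not None:
            pairs.append({"input": parent["body"], "output": comment["body"]})
    return pairs
```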
- `scrape()` - Q&A pair scraping
- Stack Exchange API integration
- Backoff handling
- HTML cleaning
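Backoff handling for the Stack Exchange API generally means honoring the `backoff` field (a number of seconds) that a response may carry before sending the next request. The sketch below shows that pattern with an injected page fetcher so it runs without network access; `fetch_all_pages` and its parameters are hypothetical, while `items`, `has_more`, and `backoff` are fields of the API's response wrapper.

```python
import time
from typing import Any, Callable, Dict, List


def fetch_all_pages(
    fetch_page: Callable[[int], Dict[str, Any]],
    sleep: Callable[[float], None] = time.sleep,
    max_pages: int = 10,
) -> List[Dict[str, Any]]:
    """Collect items page by page, waiting whenever the API asks for a backoff."""
    items: List[Dict[str, Any]] = []
    for page in range(1, max_pages + 1):
        data = fetch_page(page)
        items.extend(data.get("items", []))
        if not data.get("has_more"):
            break
        backoff = data.get("backoff")
        if backoff:
            # The API requires waiting this many seconds before the next call.
            sleep(backoff)
    return items
```

Passing `sleep` as a parameter keeps the function testable, since a test can record the requested waits instead of actually sleeping.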
Organized by tab and section for modularity:
- `tab_scrape.py` - Composes Data Sources tab sections
- `tab_build.py` - Composes publish sections
- `tab_training.py` - Composes training sections
- `tab_inference.py` - Composes the Inference tab (global inference over fine-tuned adapters)
- `tab_merge.py` - Composes merge sections
- `tab_analysis.py` - Composes analysis sections
- `tab_settings.py` - Composes settings sections
Each tab builder:
- Imports section builders
- Receives controls from a tab controller (for example, `ui/tabs/scrape_controller.py`, `ui/tabs/build_controller.py`, `ui/tabs/merge_controller.py`, `ui/tabs/analysis_controller.py`, `ui/tabs/training_controller.py`, or `ui/tabs/inference_controller.py`)
- Calls section builders
- Returns the composed layout
Tab controllers own behavior and state for each tab and then delegate layout to the corresponding builder:
- `ui/tabs/scrape_controller.py` → `tab_scrape.py`
- `ui/tabs/build_controller.py` → `tab_build.py`
- `ui/tabs/merge_controller.py` → `tab_merge.py`
- `ui/tabs/analysis_controller.py` → `tab_analysis.py`
- `ui/tabs/training_controller.py` → `tab_training.py`
- `ui/tabs/inference_controller.py` → `tab_inference.py`
src/main.py now calls the exported build_*_tab_with_logic(...) functions from these controllers instead of creating tab controls and handlers inline.
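The controller/builder split can be sketched without Flet by using plain dicts in place of controls. Everything below is a simplified assumption: the function signatures, the dict-based "controls", and the `main()` wiring only mirror the naming pattern described above, not the actual code.

```python
from typing import Any, Callable, Dict, List


def build_scrape_tab(controls: Dict[str, Any]) -> Dict[str, Any]:
    """Layout builder: arranges the controls it is given and holds no behavior."""
    return {
        "title": "Data Sources",
        "children": [controls["start_button"], controls["log"]],
    }


def build_scrape_tab_with_logic(page: Dict[str, Any]) -> Dict[str, Any]:
    """Controller: owns state and handlers, then delegates layout to the builder."""
    log: List[str] = []

    def on_start() -> None:
        log.append("scrape started")

    controls = {
        "start_button": {"label": "Start", "on_click": on_start},
        "log": log,
    }
    return build_scrape_tab(controls)


def main(page: Dict[str, Any]) -> Dict[str, Any]:
    """Entry point: builds each tab through its controller, nothing inline."""
    tab_factories: List[Callable[[Dict[str, Any]], Dict[str, Any]]] = [
        build_scrape_tab_with_logic,
    ]
    page["tabs"] = [factory(page) for factory in tab_factories]
    return page
```

Because the builder only receives controls, it can be exercised in a test with stub controls and no running handlers.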
Example: src/ui/tabs/scrape/sections/
- `source_section.py` - Data source selector
- `boards_section.py` - Board selection
- `params_section.py` - Parameters
- `progress_section.py` - Progress indicators
- `log_section.py` - Log output
- `preview_section.py` - Dataset preview
Benefits:
- Clear separation of concerns
- Easy to test individual sections
- Maintainable codebase
- Reusable components
- `create_pod()` - Pod creation
- `get_pod()` - Pod status
- `stop_pod()` - Pod termination
- `list_pods()` - List user pods
- `pod_logs()` - Stream logs
- API wrapper functions
- `ensure_network_volume()` - Volume creation/reuse
- `ensure_pod_template()` - Template creation/update
- Infrastructure validation
- API key management
User (UI) → Data Sources tab controller (`ui/tabs/scrape_controller.py`)
↓
helpers/scrape.py wrapper
↓
scrapers/{source}_scraper.py
↓
Database session created (db/scraped_data.py)
↓
UI updated with progress

Database session or HF dataset → helpers/build.py
↓
Dataset validation
↓
Train/val/test splits
↓
HF DatasetDict
↓
Save to disk / Push to Hub

Multiple sources (DB sessions / HF) → helpers/merge.py
↓
Column mapping
↓
Dataset loading
↓
Concatenate/Interleave
↓
Save to database (+ optional JSON export)
↓
Preview generation

Configuration (UI) → helpers/training.py or helpers/training_pod.py
↓
Runpod pod creation OR Native local subprocess
↓
Unsloth trainer execution
↓
Checkpoints to /data (Runpod) or local output dir
↓
Optional Hub push
FineFoundry uses simple state management:
- UI State: Managed by Flet controls and their values
- Operation State: Dictionaries passed to async functions
  - `cancel_state` - Cancellation flags
  - `merge_cancel` - Merge cancellation
- Configuration: Stored in UI controls, read when needed
- Settings: Persisted in the SQLite database (`finefoundry.db`)
No complex state management framework needed due to:
- Single-user desktop app
- Synchronous UI updates
- Clear operation boundaries
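Passing a state dictionary to an async function gives a simple cooperative cancellation mechanism. The sketch below assumes a `"cancelled"` key and a loop that checks it between units of work; the actual keys used in `cancel_state` are not shown in this document.

```python
import asyncio
from typing import Dict, List


async def run_operation(items: List[int], cancel_state: Dict[str, bool]) -> List[int]:
    """Process items until done or until the shared flag is set."""
    done: List[int] = []
    for item in items:
        if cancel_state.get("cancelled"):
            break  # stop at the next safe point
        await asyncio.sleep(0)  # yield so the UI (or canceller) can run
        done.append(item)
    return done


async def demo() -> List[int]:
    cancel_state = {"cancelled": False}
    task = asyncio.create_task(run_operation(list(range(1000)), cancel_state))
    await asyncio.sleep(0)  # let the operation start
    cancel_state["cancelled"] = True  # what a Cancel button handler would do
    return await task
```

Unlike `task.cancel()`, a flag lets the operation finish its current item and return partial results.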
- Modules: `snake_case.py`
- Classes: `PascalCase`
- Functions: `snake_case()`
- Constants: `UPPER_SNAKE_CASE`
- UI Builders: `build_{name}_{section}()`
# Standard library
import asyncio
import json
from typing import List
# Third-party
import flet as ft
# Local
from helpers.logging_config import get_logger
from helpers.common import safe_update

Order: standard library → third-party → local
- Create `src/ui/tabs/tab_newtab.py`
- Create sections in `src/ui/tabs/newtab/sections/`
- Add business logic to `src/helpers/` if needed
- Import and integrate in `src/main.py`
- Add documentation in `docs/user-guide/`
- Create `src/scrapers/newsource_scraper.py`
- Implement a `scrape()` function
- Add a wrapper in `src/helpers/scrape.py`
- Add UI in the Data Sources tab sections
- Document in the API reference
from helpers.logging_config import get_logger
logger = get_logger(__name__)
# In your functions
logger.info("Operation started")
logger.debug("Detailed info")
logger.error("Error occurred", exc_info=True)

See Logging Guide for details.
tests/ # Test directory (collected by pytest)
├── unit/ # Unit tests for helpers, save_dataset, etc.
├── integration/ # Integration tests (end-to-end flows and UI/controller smoke tests)
└── fixtures/ # Optional shared test data
See Testing Guide for details on test types, commands, and coverage.
- `pyproject.toml` - Project configuration and dependencies
- `uv.lock` - UV package lock file
- `.env` (not in repo) - Optional local environment variables (for example, `HF_TOKEN`)
- Settings persisted in the SQLite database (`finefoundry.db`)
- `finefoundry.db` - SQLite database (git ignored)
- `training_outputs/` - Model artifacts (git ignored)
- `hf_dataset/` - Built datasets (git ignored)
- `__pycache__/` - Python cache (git ignored)
- `.venv/` - Virtual environment (git ignored)