An AI-powered, multi-pass course generation engine that discovers, curates, and sequences real-world educational resources into personalized, dependency-aware learning paths.
CourseForge does not generate synthetic content. Instead, it acts as an intelligent curriculum architect: it searches the open web for existing high-quality educational material (YouTube tutorials, documentation, academic papers, Wikipedia, university syllabi), extracts the conceptual structure of a topic, builds a pedagogically grounded curriculum DAG (Directed Acyclic Graph), and assigns the best-fit resource to each node using vector similarity search and inference model evaluation.
The result is a complete, structured course where every node has a real learning resource, every prerequisite is mapped, and the entire curriculum is coherent — generated end-to-end in a single pipeline run.
- Architecture Overview
- The Generation Pipeline
- Context Persistence Across Pipeline Stages
- Multi-Model Architecture (BYOM)
- Data Structures & Models
- Key Services & Components
- Technology Stack
- Project Structure
- Getting Started
CourseForge is a full-stack application with a React + Vite frontend and a Node.js / Express backend, backed by MongoDB (with Atlas Vector Search for semantic retrieval). The backend orchestrates a complex, multi-pass generation pipeline that coordinates multiple inference models, search APIs, reranking classifiers, and embedding models to produce a complete course from a single topic string.
The pipeline is designed around three core principles:
- Ground everything in discovered evidence. The inference model never invents the topic structure from parametric knowledge alone. Every concept in the curriculum must trace back to real educational content found during discovery. Concepts are assigned confidence scores based on how many independent source types confirm them.
- Separate concerns across specialized models. Different pipeline stages have fundamentally different computational profiles. Chunk classification needs speed and volume (thousands of binary yes/no calls). Concept extraction needs reasoning depth. Skeleton generation needs creative synthesis. CourseForge allows routing each stage to a different inference model, optimizing cost, latency, and quality simultaneously.
- Preserve context across stateless inference calls. Each inference model call is stateless, but the pipeline is deeply stateful. CourseForge maintains a running context (skeleton design reasoning, per-node design notes, a resource assignment log) and injects it into every downstream prompt so the model can act as a coherent reviewer of its own prior decisions.
┌─────────────────────────────────────────────────────────────────────┐
│                         GENERATION PIPELINE                         │
│                                                                     │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌────────────────┐    │
│  │  Pass 1  │──▶│  Pass 2  │──▶│  Pass 3  │──▶│    Pass 3.5    │    │
│  │Discovery │   │ Skeleton │   │ RAG-Fill │   │ Dedup Checker  │    │
│  └──────────┘   └──────────┘   └──────────┘   └────────────────┘    │
│       │              ▲              │                 │             │
│       ▼              │              ▼                 ▼             │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────────┐      │
│  │  Filter  │   │  Skill   │   │  Vector  │   │    Pass 4    │      │
│  │ + Embed  │   │  Level   │   │  Search  │   │  Coherence   │      │
│  │ + Graph  │   │ + Gate   │   │ + Score  │   │  Validation  │      │
│  └──────────┘   └──────────┘   └──────────┘   └──────────────┘      │
└─────────────────────────────────────────────────────────────────────┘
Pass 1 is the foundation. It discovers, validates, extracts concepts from, and indexes all available educational material for the topic. It runs as a sequence of sub-steps (1A through 1G), each building on the previous.
Before any searches are launched, an inference model decomposes the user's topic into 5–7 distinct sub-domains or learning milestones. For a topic like "MERN Stack", this produces targeted queries like "MongoDB aggregation pipeline tutorial" and "JWT middleware Express.js best practices" rather than generic "MERN Stack full course" variations.
Why this matters: Generic search queries produce overlapping results that cover the same introductory ground. Sub-domain decomposition forces search diversity across the topic's breadth, ensuring the discovery phase finds resources for niche sub-topics that would otherwise be missed.
Four independent discovery streams launch simultaneously via Promise.allSettled:
| Stream | Source | What It Finds |
|---|---|---|
| YouTube | `yt-search` | Tutorial videos, lecture series, crash courses |
| Educational Web | Tavily Search API | Documentation, guides, blog tutorials, reference material |
| Academic | arXiv + Semantic Scholar | Research papers, survey papers, formal treatments |
| Structured | Wikipedia + University Syllabi | Section headings (concept lists), curated curricula |
Each stream is fault-tolerant — a failed stream contributes zero documents but does not abort the pipeline. The Promise.allSettled pattern ensures a Tavily API outage never blocks YouTube discovery.
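The fault-tolerant fan-out can be sketched as follows. This is a minimal illustration, not the project's code: `discoverAll` and the `streams` map are hypothetical names, and each stream is assumed to be a function that resolves to an array of documents.

```javascript
// Run every discovery stream in parallel and keep whatever succeeds.
// `streams` maps a stream name to a function (topic) => Promise<Array> (assumed shape).
async function discoverAll(topic, streams) {
  const names = Object.keys(streams);
  const settled = await Promise.allSettled(
    // Promise.resolve().then(...) also captures synchronous throws as rejections.
    names.map((name) => Promise.resolve().then(() => streams[name](topic)))
  );

  const documents = [];
  const failures = [];
  settled.forEach((result, i) => {
    if (result.status === "fulfilled") {
      documents.push(...result.value);
    } else {
      // A failed stream contributes zero documents but is recorded for logging.
      failures.push({ stream: names[i], reason: String(result.reason) });
    }
  });
  return { documents, failures };
}
```

Because `Promise.allSettled` never rejects, one failing stream cannot short-circuit the others, which is the property the paragraph above relies on.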
Why multi-source matters: Each source type has different strengths. YouTube provides accessible worked examples but suffers from simplification bias. Academic papers provide formal rigor but lack practical demonstrations. Wikipedia provides editorial consensus on what concepts exist. University syllabi provide expert-designed prerequisite orderings. No single source is sufficient; the intersection of all four produces a robust concept space.
All discovered documents pass through source-type-specific filters:
- YouTube: Content density pre-filter on description text, then caption fetching. Videos without captions or with captions shorter than 200 characters are discarded (captions are essential for concept extraction and chunk embedding).
- Web/Academic: Content density scoring using a ratio of explanatory language markers (`"because"`, `"for example"`, `"this means"`) vs. declarative markers (`"announces"`, `"will launch"`, `"according to"`). Documents scoring below the density threshold are removed.
- Structured: Fast-passed. Wikipedia outlines and syllabi represent editorial consensus and are always accepted if they have content.
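One way to picture the content density score is the sketch below. The marker lists come from the text above; the normalization (explanatory share of all markers found) and the `0.5` threshold are assumptions made for illustration, since the actual formula and threshold are not stated here.

```javascript
const EXPLANATORY = ["because", "for example", "this means"];
const DECLARATIVE = ["announces", "will launch", "according to"];

// Count non-overlapping, case-insensitive occurrences of each marker.
function countMarkers(text, markers) {
  const lower = text.toLowerCase();
  let total = 0;
  for (const marker of markers) {
    let idx = lower.indexOf(marker);
    while (idx !== -1) {
      total += 1;
      idx = lower.indexOf(marker, idx + marker.length);
    }
  }
  return total;
}

// Share of explanatory markers among all markers found, in [0, 1].
// Returns 0 when the text contains no markers at all (an assumption).
function contentDensity(text) {
  const explanatory = countMarkers(text, EXPLANATORY);
  const declarative = countMarkers(text, DECLARATIVE);
  const total = explanatory + declarative;
  return total === 0 ? 0 : explanatory / total;
}

// Hypothetical threshold; the real value is not given in this document.
function passesDensityFilter(text, threshold = 0.5) {
  return contentDensity(text) >= threshold;
}
```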
Each filtered document is processed to extract the concepts it teaches, their inferred dependencies, and any explicitly stated prerequisites.
Map-Reduce Strategy: Documents are split into boundary-aware semantic chunks (1200–4800 characters each, using topic-shift markers like "now let's talk about", "the next concept is", and markdown headers). Each chunk is sent to an inference model independently, and the results are merged per-document via a reduce phase that deduplicates concepts and unions dependency relationships.
Why chunk-level extraction: Sending an entire 30-minute video transcript to a model causes token truncation and dilutes signal. Boundary-aware chunking ensures each inference call receives a focused, semantically coherent unit — the same units that will later be embedded and stored for vector search.
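A rough sketch of boundary-aware chunking under the size limits mentioned above. The boundary regex uses only the markers named in the text; the merge and hard-split rules are assumptions for illustration and may differ from the real chunker.

```javascript
const MIN_CHARS = 1200;
const MAX_CHARS = 4800;

// Split points: a markdown header at the start of a line, or a topic-shift phrase.
const BOUNDARY = /(?=^#{1,6}\s)|(?=now let's talk about)|(?=the next concept is)/im;

function chunkText(text, min = MIN_CHARS, max = MAX_CHARS) {
  const segments = text.split(BOUNDARY);
  const chunks = [];
  let current = "";

  for (const segment of segments) {
    // Close the running chunk at a boundary once it is long enough,
    // or when appending the next segment would exceed the maximum.
    const wouldOverflow = current.length > 0 && current.length + segment.length > max;
    if (current.length >= min || wouldOverflow) {
      chunks.push(current);
      current = "";
    }
    current += segment;
    // A segment with no internal boundary may still be too long: hard-split it.
    while (current.length > max) {
      chunks.push(current.slice(0, max));
      current = current.slice(max);
    }
  }

  if (current.length > 0) {
    // Fold a short tail into the previous chunk when the result stays within max.
    const last = chunks[chunks.length - 1];
    if (last !== undefined && current.length < min && last.length + current.length <= max) {
      chunks[chunks.length - 1] = last + current;
    } else {
      chunks.push(current);
    }
  }
  return chunks;
}
```

In this sketch no chunk exceeds `max`, and concatenating the chunks reproduces the input. The minimum is a target rather than a guarantee: a short chunk is still emitted when the next segment would push it past `max`.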
AIMD Congestion Control: All extraction calls across all streams share a single global queue governed by an Additive Increase / Multiplicative Decrease (AIMD) algorithm — the same congestion control strategy used in TCP. On each successful call, the concurrency window grows by +0.5. On a rate-limit (HTTP 429) or local inference server capacity error (HTTP 400 from KV-cache exhaustion), the window is halved and the queue pauses for the provider's requested retry duration. This allows a high-throughput API key to quickly scale to 20+ concurrent calls while a free-tier key safely settles at 1–3 concurrent calls without ever receiving an abort.
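The window arithmetic described above can be sketched as a small class. The class name, the `max` cap of 32, and the method names are hypothetical; only the +0.5 increase, the halving, and the pause-for-retry behavior come from the text.

```javascript
// Additive Increase / Multiplicative Decrease concurrency window (illustrative sketch).
class AimdWindow {
  constructor({ initial = 1, min = 1, max = 32, increase = 0.5 } = {}) {
    this.window = initial;
    this.min = min;
    this.max = max; // assumed upper bound, not stated in the document
    this.increase = increase;
    this.pausedUntil = 0;
  }

  // Whole number of calls allowed in flight right now.
  get concurrency() {
    return Math.floor(this.window);
  }

  // Additive increase: grow the window by a fixed step per successful call.
  onSuccess() {
    this.window = Math.min(this.max, this.window + this.increase);
  }

  // Multiplicative decrease: halve the window and pause for the provider's retry hint.
  onCapacityError(retryAfterMs = 0, now = Date.now()) {
    this.window = Math.max(this.min, this.window / 2);
    this.pausedUntil = Math.max(this.pausedUntil, now + retryAfterMs);
  }

  isPaused(now = Date.now()) {
    return now < this.pausedUntil;
  }
}
```

A queue built on this would dispatch work only while the number of in-flight calls is below `concurrency` and `isPaused()` is false.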
All per-document concept maps are merged into a single, deduplicated concept graph. Each concept accumulates:
- Sources: Which source types mentioned it (YouTube, academic, web, structured)
- Confidence: A weighted average based on source reliability weights:
- Structured (0.90) — editorial consensus, expert curricula
- Academic (0.85) — peer-reviewed, formal terminology
- Educational Web (0.70) — practical coverage, variable quality
- YouTube (0.55) — accessible but simplification-prone
A concept confirmed by Wikipedia AND an arXiv paper AND a tutorial blog has much higher confidence than one mentioned only in a single YouTube video. This confidence score directly influences skeleton generation — high-confidence concepts are treated as confirmed curriculum material, while low-confidence concepts are flagged and included only if pedagogically necessary.
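To make the merge step concrete, here is a minimal sketch that takes the most literal reading of the description: confidence is the mean of the reliability weights of the distinct source types that mention a concept. That formula is an assumption; the real implementation may also reward the number of confirming sources. `mergeConcept` and the lowercase-name key are hypothetical.

```javascript
// Source reliability weights as listed above.
const SOURCE_WEIGHTS = { structured: 0.9, academic: 0.85, web: 0.7, youtube: 0.55 };

// Mean reliability weight over the distinct, known source types (assumed formula).
function conceptConfidence(sourceTypes) {
  const distinct = [...new Set(sourceTypes)].filter((t) => Object.hasOwn(SOURCE_WEIGHTS, t));
  if (distinct.length === 0) return 0;
  const sum = distinct.reduce((acc, t) => acc + SOURCE_WEIGHTS[t], 0);
  return sum / distinct.length;
}

// Merge one mention of a concept into the graph, deduplicating by normalized name.
function mergeConcept(graph, name, sourceType) {
  const key = name.trim().toLowerCase();
  const entry = graph.get(key) ?? { name: name.trim(), sources: new Set() };
  entry.sources.add(sourceType);
  entry.confidence = conceptConfidence([...entry.sources]);
  graph.set(key, entry);
  return entry;
}
```

Under this reading, a concept seen in structured, academic, and web sources scores about 0.82, while one seen only on YouTube scores 0.55.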
Observed vs. Inferred Prerequisites: The graph stores two types of dependency relationships. Inferred dependencies are the inference model's judgment about what concept A requires. Observed prerequisites are explicitly stated in source text — phrases like "assuming you already understand closures" or "prerequisite: linear algebra". Observed prerequisites are treated as high-confidence DAG edges during skeleton generation.
A coverage analysis determines the health of the discovery results: which source type dominates, the overall weighted confidence, whether structured/academic coverage exists, and coverage warnings (e.g., "No practical tutorial content found. Course may lack worked examples."). This profile is injected into downstream prompts so the inference model can calibrate its assumptions.
All filtered documents are chunked using boundary-aware semantic chunking, classified as pedagogical or non-pedagogical (using a three-layer strategy: regex heuristic → inference model binary classifier → fail-open fallback), and the pedagogical chunks are embedded using a vector embedding model (Voyage AI voyage-3-large, 1024 dimensions, asymmetric query/document modes) and stored in MongoDB as ContentChunk documents.
Pedagogical Classification: A three-layer classifier filters out non-educational content (channel intros, sponsor reads, call-to-action segments, outro filler) from the embedding store:
- Heuristic pre-check (sync, zero cost) — regex patterns for known non-pedagogical phrases
- Inference model classifier (async, API call) — binary yes/no pedagogical judgment
- Fail-open fallback — if the API call fails, the chunk is included rather than dropped
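The three layers above might be wired together like this. The regex patterns are illustrative examples only, and `classifyWithModel` is a hypothetical async function assumed to resolve to a "yes"/"no" string.

```javascript
// Layer 1 patterns: illustrative examples of non-pedagogical phrasing.
const NON_PEDAGOGICAL = [
  /don't forget to (like|subscribe)/i,
  /this video is sponsored by/i,
  /link in the description/i,
];

async function isPedagogical(chunkText, classifyWithModel) {
  // Layer 1: synchronous heuristic, no API cost.
  if (NON_PEDAGOGICAL.some((pattern) => pattern.test(chunkText))) {
    return false;
  }
  try {
    // Layer 2: binary yes/no judgment from the inference model.
    const answer = await classifyWithModel(chunkText);
    return String(answer).trim().toLowerCase().startsWith("yes");
  } catch (err) {
    // Layer 3: fail open, so an API failure keeps the chunk instead of dropping it.
    return true;
  }
}
```

Failing open trades a slightly noisier embedding store for never losing educational content to a transient API error.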
The merged concept graph, sub-domain list, source URLs, and coverage profile are persisted to the Course document. This is the knowledge base that all downstream passes operate on.
The pipeline includes two human-in-the-loop interaction gates where the user makes decisions that shape the remaining generation:
After Pass 1 completes, the inference model analyzes the concept graph and coverage profile to generate 2–5 skill level options calibrated to the actual discovered content. Each option specifies:
- A skill level (`novice`, `beginner`, `capable`, `intermediate`, `advanced`)
- A topic-specific label and description
- An `assumedKnowledge` list: concepts the learner already knows at this level
- An estimated `skippedNodeCount`
The user selects their level, and the pipeline resumes with a feasibility check — the inference model estimates how many weeks the topic requires at the selected skill level, split into core weeks and scaffolding weeks (prerequisite ramp-up). This produces the minimum and recommended duration constraints for the duration picker.
The user picks their target duration (in weeks). The system computes a timePressure signal (tight, comfortable, or generous) by comparing the chosen duration to the feasibility recommendation. This signal influences resource selection — tight budgets bias toward concise text articles; generous budgets allow deeper, longer resources.
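As an illustration of how such a signal could be derived, the sketch below compares the chosen duration with the recommended one. The 15% tolerance band is an assumption; the document does not state the actual cutoffs.

```javascript
// Classify the chosen duration relative to the feasibility recommendation.
// `tolerance` is a hypothetical band around the recommendation.
function computeTimePressure(chosenWeeks, recommendedWeeks, tolerance = 0.15) {
  const ratio = chosenWeeks / recommendedWeeks;
  if (ratio < 1 - tolerance) return "tight";
  if (ratio > 1 + tolerance) return "generous";
  return "comfortable";
}
```

For example, choosing 4 weeks against an 8-week recommendation yields `"tight"`, which would bias resource selection toward concise articles.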
These two gates transform the pipeline from a one-size-fits-all generator into a learner-calibrated engine. A novice studying "Machine Learning" for 12 weeks gets a fundamentally different curriculum than an intermediate learner studying it for 4 weeks.
For novice and beginner learners only, the pipeline runs a targeted prerequisite discovery pass before skeleton generation. It examines the concept graph's observedAssumes chains, asks the inference model which prerequisites are pedagogically critical for this learner profile, and runs focused YouTube + web searches for foundational content on those prerequisites. The discovered prerequisite material is filtered, chunked, embedded, and added to the vector store so that the skeleton and RAG-fill passes have material to draw from for scaffold nodes.
Why this matters: Without prerequisite augmentation, a novice studying "Distributed Systems" would have a skeleton that references concepts like "consensus algorithms" but the vector store would have no content about "network protocols" or "client-server architecture" — the foundational material the novice actually needs first.
Pass 2 generates the curriculum DAG — the structural backbone of the course. The inference model receives:
- The full concept map with confidence scores
- The learner's assumed knowledge (from skill level selection)
- Time budget constraints (duration × 10 hours/week)
- A scaffolding directive (deep for novices, moderate for beginners, none for higher levels)
- Sub-domain organization from Pass 1
It produces a chapter-organized DAG where each node has:
- A title, type (`concept`, `skill`, `project`, `assessment`), and learning objective
- Key terms, estimated duration, prerequisite node references
- `isAssumedKnowledge` flag for nodes the learner can skip
- `conceptConfidenceLow` flag for concepts backed only by low-reliability sources
- `optional` flag for nodes that can be cut under time pressure
- `nodeRole`: `scaffold` (prerequisite) or `core` (main topic)
- A `designNote` explaining why this node is a distinct topic
The skeleton also includes a skeletonReasoning — a 1–2 paragraph explanation of the curriculum's pedagogical progression, which is persisted and injected into all downstream inference calls.
DAG Validation: The generated skeleton is validated for cycles using iterative DFS, topologically sorted using a chapter-aware variant of Kahn's algorithm (which preserves chapter ordering when multiple nodes have in-degree zero), and persisted as TopicNode documents with their learning objective embeddings.
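The two graph steps named above can be sketched as follows. The node shape (`id`, numeric `chapter`, `prerequisites` as an array of ids) is an assumption for illustration, and the tie-break rule (lowest chapter, then original position) is one plausible reading of "chapter-aware".

```javascript
// Iterative three-color DFS over prerequisite edges; returns true if a cycle exists.
function hasCycle(nodes) {
  const prereqs = new Map(nodes.map((n) => [n.id, n.prerequisites]));
  const state = new Map(); // undefined = unvisited, 1 = on stack, 2 = finished
  for (const start of prereqs.keys()) {
    if (state.get(start)) continue;
    const stack = [[start, 0]];
    state.set(start, 1);
    while (stack.length > 0) {
      const frame = stack[stack.length - 1];
      const [id, next] = frame;
      const edges = prereqs.get(id) ?? [];
      if (next < edges.length) {
        frame[1] += 1;
        const dep = edges[next];
        if (state.get(dep) === 1) return true; // back edge: dep is still on the stack
        if (!state.get(dep)) {
          state.set(dep, 1);
          stack.push([dep, 0]);
        }
      } else {
        state.set(id, 2);
        stack.pop();
      }
    }
  }
  return false;
}

// Kahn's algorithm; among ready nodes, prefer the lowest chapter, then input order.
// Assumes every prerequisite id refers to a node in `nodes`.
function topoSortByChapter(nodes) {
  const byId = new Map(nodes.map((n) => [n.id, n]));
  const position = new Map(nodes.map((n, i) => [n.id, i]));
  const inDegree = new Map(nodes.map((n) => [n.id, n.prerequisites.length]));
  const dependents = new Map(nodes.map((n) => [n.id, []]));
  for (const n of nodes) {
    for (const p of n.prerequisites) dependents.get(p).push(n.id);
  }
  const ready = nodes.filter((n) => n.prerequisites.length === 0).map((n) => n.id);
  const order = [];
  while (ready.length > 0) {
    ready.sort(
      (a, b) => byId.get(a).chapter - byId.get(b).chapter || position.get(a) - position.get(b)
    );
    const id = ready.shift();
    order.push(id);
    for (const d of dependents.get(id)) {
      inDegree.set(d, inDegree.get(d) - 1);
      if (inDegree.get(d) === 0) ready.push(d);
    }
  }
  if (order.length !== nodes.length) throw new Error("cycle detected");
  return order;
}
```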
Pass 3 walks the DAG in topological order and assigns a real learning resource to each node using Retrieval-Augmented Generation:
- Vector Search: The node's learning objective embedding is used for approximate nearest-neighbor search against all stored `ContentChunk` embeddings (via Atlas Vector Search, with an in-memory cosine similarity fallback for non-Atlas deployments). Results are grouped by source, and scores are adjusted by `sourceConfidence` and penalized if the resource was already assigned to a prior node.
- Video Duration Filtering: YouTube candidates are enriched with metadata (duration, channel name, thumbnail) via `yt-search`, and videos whose duration falls outside 0.5×–3× the node's estimated duration are penalized.
- LLM Candidate Scoring: The top candidates are presented to an inference model along with the node's learning objective, key terms, prior node objectives, the resource assignment log, skeleton reasoning, the node's design note, and time pressure / node role signals. The model selects the best-fit resource and returns a `fitScore`, `conceptsTaught`, `conceptsAssumed`, concept connections, and reasoning.
- Multi-Source Gap-Fill Search: If the vector store yields no candidate above the minimum fit threshold (0.65), a cascading gap-fill search launches:
  - Web articles via Tavily (with full content extraction)
  - YouTube videos via `yt-search` (scored by the inference model)
  - Academic papers via arXiv + Semantic Scholar

  Each gap-fill candidate passes through the three-tier filter (algorithmic scoring → Jina + Cohere rerankers → inference model spot-check) before being considered.
- Resource Persistence: Accepted resources are persisted as `Resource` documents with full metadata: fit score, concepts taught, assignment reasoning, reading guides (generated by the inference model for text articles), and concept connections.
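The retrieval and score-adjustment steps in this list can be approximated with the in-memory sketch below. The multiplicative adjustments, the penalty factors, keeping only the best chunk per source, and the chunk fields `sourceId` and `durationMinutes` are all assumptions for illustration; only cosine similarity, `sourceConfidence`, the reuse penalty, and the 0.5×–3× duration window come from the text.

```javascript
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

const REUSE_PENALTY = 0.8; // assumed factor for already-assigned resources
const DURATION_PENALTY = 0.7; // assumed factor for videos outside 0.5x-3x

// Score every chunk, keep the best chunk per source, and sort descending.
function rankCandidates(objectiveEmbedding, chunks, options = {}) {
  const { assignedSourceIds = new Set(), nodeMinutes } = options;
  const bestBySource = new Map();
  for (const chunk of chunks) {
    let score = cosineSimilarity(objectiveEmbedding, chunk.embedding) * chunk.sourceConfidence;
    if (assignedSourceIds.has(chunk.sourceId)) score *= REUSE_PENALTY;
    if (nodeMinutes && chunk.durationMinutes !== undefined) {
      const ratio = chunk.durationMinutes / nodeMinutes;
      if (ratio < 0.5 || ratio > 3) score *= DURATION_PENALTY;
    }
    const previous = bestBySource.get(chunk.sourceId);
    if (!previous || score > previous.score) {
      bestBySource.set(chunk.sourceId, { sourceId: chunk.sourceId, score });
    }
  }
  return [...bestBySource.values()].sort((x, y) => y.score - x.score);
}
```

In the pipeline described above, the top entries of such a ranking would then go to the inference model for fit scoring, and the gap-fill cascade would start only if nothing clears the 0.65 threshold.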
After resource filling, an algorithmic scan detects cases where the same resource (identified by YouTube video ID or article URL) was assigned to multiple nodes. Each duplicate group is classified by severity:
- Critical: Two nodes share all their resources — they may be redundant nodes
- High: Two single-resource nodes share that one resource
- Moderate: Partial overlap with other unique resources
For each duplicate group, the inference model receives full context (both nodes' design notes, learning objectives, assignment reasoning, and available alternative candidates from the vector store) and makes a three-way decision:
- Reassign: Replace the duplicate with an alternative resource on one node
- Merge: Combine the two nodes into one (with DAG surgery — prerequisite re-pointing, resource migration, topological re-sort)
- Confirm: The duplication is intentional (same resource genuinely serves both nodes)
This runs for up to 2 rounds to catch cascading duplicates created by reassignments.
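The algorithmic scan that precedes the model's decision could look like the sketch below. The severity rules are one interpretation of the definitions above (two single-resource nodes sharing their only resource are treated as High, identical multi-resource sets as Critical); the input shape, a plain object mapping node ids to resource keys, is hypothetical.

```javascript
// Classify the overlap between two nodes' resource lists, or return null if none.
function classifyOverlap(resourcesA, resourcesB) {
  const a = new Set(resourcesA);
  const b = new Set(resourcesB);
  const shared = [...a].filter((key) => b.has(key));
  if (shared.length === 0) return null;
  const identical = shared.length === a.size && shared.length === b.size;
  if (identical && a.size === 1) return "high"; // both nodes have only this one resource
  if (identical) return "critical"; // every resource is shared: nodes may be redundant
  return "moderate"; // partial overlap alongside unique resources
}

// Compare every pair of nodes; resource keys are video ids or article URLs.
function findDuplicateGroups(assignments) {
  const ids = Object.keys(assignments);
  const groups = [];
  for (let i = 0; i < ids.length; i++) {
    for (let j = i + 1; j < ids.length; j++) {
      const severity = classifyOverlap(assignments[ids[i]], assignments[ids[j]]);
      if (severity) groups.push({ nodes: [ids[i], ids[j]], severity });
    }
  }
  return groups;
}
```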
The final pass presents the entire ordered curriculum — every node's title, type, learning objective, design note, assigned resources with fit scores and reasoning — to the inference model for a holistic review. It identifies:
- Knowledge gaps: Nodes where the assigned resource doesn't adequately cover the learning objective
- Redundancies: Node pairs that overlap significantly
- Ordering issues: Nodes that should appear earlier or later in the sequence
For every flagged gap (including gaps from Pass 3 where no resource met the fit threshold), the pipeline runs a targeted refetch — a fresh vector search biased by the gap description, followed by the full multi-source gap-fill cascade if needed. Resolved gaps clear their coherenceFlags; unresolved gaps remain flagged for the user.
Finally, the pipeline computes the initial UserProgress state — marking assumed-knowledge nodes as completed and unlocking the first available node via topological traversal.
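A minimal sketch of that initial state, assuming the nodes arrive already in topological order and carry `id`, `isAssumedKnowledge`, and `prerequisites` fields. The returned shape (`completed`, `unlocked`, `current`) is hypothetical.

```javascript
// Mark assumed-knowledge nodes completed, then unlock the first node
// (in topological order) whose prerequisites are all completed.
function initialProgress(orderedNodes) {
  const completed = new Set(
    orderedNodes.filter((n) => n.isAssumedKnowledge).map((n) => n.id)
  );
  const first = orderedNodes.find(
    (n) => !completed.has(n.id) && n.prerequisites.every((p) => completed.has(p))
  );
  const unlocked = first ? [first.id] : [];
  return { completed: [...completed], unlocked, current: unlocked[0] ?? null };
}
```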
A critical design challenge in multi-pass inference pipelines is that each model call is stateless, but curriculum design is inherently a stateful process — resource selection for node 15 should be informed by what was already assigned to nodes 1–14.
CourseForge solves this through context persistence — a set of data structures that capture the reasoning from earlier pipeline stages and inject relevant context into every downstream prompt:
| Context Signal | Generated In | Injected Into | Purpose |
|---|---|---|---|
| `skeletonReasoning` | Pass 2 | Pass 3, 3.5, 4 | Why the curriculum is structured this way |
| `designNote` (per node) | Pass 2 | Pass 3 | Why this node is a distinct topic |
| `assignmentLog` (running) | Pass 3 | Pass 3 (subsequent nodes), Pass 3.5 | What was already assigned and why |
| `assignmentReasoning` (per resource) | Pass 3 | Pass 4 | Why each resource was chosen |
| `timePressure` | Gate 2 | Pass 3 | Resource type bias (concise articles vs. deep videos) |
| `nodeRole` | Pass 2 | Pass 3 | Scaffold nodes prefer video; core nodes flex by time pressure |
This allows the inference model to act as a coherent reviewer of its own prior decisions — detecting intentional vs. accidental resource reuse, understanding why neighboring nodes are separate topics, and evaluating whether the executed curriculum matches the intended pedagogical progression.
CourseForge implements a Bring Your Own Model (BYOM) architecture where the user configures separate inference model credentials for four distinct pipeline tiers:
| Tier | Pipeline Stages | Computational Profile |
|---|---|---|
| Chunk Classification | Pedagogical classification, content density | High volume (hundreds of calls), binary yes/no, latency-sensitive |
| Spot Checking | Resource filter spot-checks | Medium volume, short structured output |
| Concept Extraction | Per-chunk concept extraction (Step 1C) | High volume, moderate reasoning depth, AIMD-controlled |
| Course Generation | Feasibility, skeleton, resource scoring, coherence, duplicate review | Low volume, deep reasoning, long structured output |
Each tier is wired to a createLLMClient instance from the unified LLM abstraction layer (llmService.js), which provides a single complete(prompt, options) interface over 11 supported providers:
- Cloud: OpenAI, Anthropic (Claude), Google Gemini, Mistral, Groq, OpenRouter
- Cloud Inference: OpenAI-compatible inference endpoints
- Local: LM Studio, Ollama
- Enterprise: Amazon Bedrock (Converse API with native AWS Sig V4 signing — no SDK required)
All providers are implemented via raw fetch() calls — no provider SDKs are used. The abstraction handles:
- Retry with exponential backoff for transient errors (429, 502, 503, 504): up to 10 attempts with full-jitter backoff
- Multi-strategy JSON parsing: markdown fence stripping, regex block extraction, JavaScript-to-JSON normalization (for local models that return single-quoted keys or trailing commas)
- Thinking model support: `<think>...</think>` tag stripping for reasoning models (DeepSeek-R1, Qwen3), minimum token floor enforcement for local inference servers
- Malformed JSON retry: automatic re-prompt with stricter instructions on first parse failure
This architecture means a user can route high-volume classification calls through a fast, cheap model while using a more capable model for the creative skeleton generation — optimizing cost and quality simultaneously.
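Two of the behaviors the abstraction handles, full-jitter backoff and multi-strategy JSON parsing, can be sketched as below. This is an illustration under assumptions, not the contents of `llmService.js`: the function names, the 500 ms base and 30 s cap, and the exact normalization rules are hypothetical.

```javascript
// Full-jitter backoff: a uniform random delay in [0, min(cap, base * 2^attempt)).
function fullJitterDelay(attempt, baseMs = 500, capMs = 30000, random = Math.random) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

const FENCE = "`".repeat(3); // markdown code fence marker
const FENCED_BLOCK = new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE, "i");

// Best-effort extraction of a JSON value from raw model output.
function parseModelJson(raw) {
  // 1. Drop reasoning traces emitted by thinking models.
  let text = raw.replace(/<think>[\s\S]*?<\/think>/gi, "").trim();
  // 2. Unwrap a markdown code fence if one is present.
  const fenced = text.match(FENCED_BLOCK);
  if (fenced) text = fenced[1].trim();
  // 3. Keep only the outermost {...} or [...] block.
  const block = text.match(/[\[{][\s\S]*[\]}]/);
  if (block) text = block[0];
  try {
    return JSON.parse(text);
  } catch {
    // 4. Normalize JavaScript-style literals: trailing commas, single quotes.
    // Naive: breaks on apostrophes or double quotes inside string values.
    const normalized = text
      .replace(/,\s*([}\]])/g, "$1")
      .replace(/'([^'\\]*)'/g, '"$1"');
    return JSON.parse(normalized); // throws if still malformed; the caller may re-prompt
  }
}
```

A retry loop would sleep for `fullJitterDelay(attempt)` after a 429, 502, 503, or 504 response. A throw from `parseModelJson` is the natural trigger for the malformed-JSON re-prompt.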
The top-level document representing a generated course. Stores the topic, user preferences, generation progress, concept map (with full confidence-weighted graph and coverage profile), skill level options, feasibility results, selected duration, skeleton reasoning, coherence report, and a timestamped generation log.
A single node in the curriculum DAG. Contains the title, type, learning objective, key terms, prerequisite references (as ObjectId edges), topological index, estimated duration, design note, node role (scaffold/core), optional flag, assumed knowledge flag, concept confidence flag, coherence flags, and a 1024-dimensional objective embedding vector for semantic matching.
A semantically coherent text segment extracted from a discovered resource. Stores the source type, source identifiers (video ID or article URL), raw text, a 1024-dimensional embedding vector, pedagogical classification, chunk index, and source confidence weight. These documents are the retrieval targets for Atlas Vector Search during Pass 3.
A learning resource assigned to a specific TopicNode. Polymorphic — stores YouTube-specific metadata (video ID, thumbnail, channel name, duration) or article-specific metadata (URL, site name, raw content, reading guide, estimated read time). Also stores the assignment reasoning, fit score, concepts taught, concepts assumed, and concept connections.
Tracks the learner's advancement through the DAG — completed nodes, unlocked nodes, and current position. The initial state is computed by the pipeline (assumed-knowledge nodes start completed; the first topologically-reachable node is unlocked).
Stores authentication (Google OAuth), profile information, and the BYOM configuration — four separate model configs (provider, model ID, API key) for the four pipeline tiers.
| Service | Responsibility |
|---|---|
| `generationService.js` | Pipeline orchestrator: coordinates all passes, manages SSE streaming, implements feasibility validation, skeleton generation, RAG-fill, duplicate checking, and coherence validation |
| `llmService.js` | Unified LLM abstraction: provider-agnostic `complete()` interface with retry, JSON parsing, thinking model support |
| `conceptGraphService.js` | Concept extraction (Map-Reduce over chunks), AIMD queue, concept graph merge, coverage profile computation |
| `embeddingService.js` | Vector embeddings (Voyage AI), boundary-aware semantic chunking, pedagogical classification, Atlas Vector Search with in-memory fallback |
| `filterService.js` | Three-tier resource filter: algorithmic scoring (URL structure, page structure, content density, video channel signals), classifier layer (Jina Reranker + Cohere Rerank), LLM spot-check |
| `graphService.js` | Pure graph algorithms: iterative DFS cycle detection, chapter-aware topological sort (Kahn's), unlocked node computation, shortest prerequisite path (BFS + topo-sort) |
| `searchService.js` | Tavily web search integration for educational content discovery |
| `youtubeService.js` | YouTube search and caption extraction |
| `academicService.js` | arXiv and Semantic Scholar paper discovery |
| `structuredReferenceService.js` | Wikipedia outline and university syllabus extraction |
The React frontend (built with Vite, Tailwind CSS, Framer Motion) provides:
- Dashboard — course cards with progress tracking
- Course Viewer — split-panel curriculum navigation with a DAG-based node progression system, content rendering (video embeds, article reader with markdown support, PDF rendering with arXiv probe-and-fallback), and tabbed multi-resource views
- Settings — BYOM model configuration for all four pipeline tiers
- Real-time Progress — SSE-driven generation progress streaming with phase-specific status messages
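For the real-time progress stream, the server side of SSE comes down to writing framed text over a long-lived response. The sketch below shows that framing. The helper names and the `progress` event name are hypothetical, and `res` is assumed to be a Node/Express-style response object.

```javascript
// Serialize one Server-Sent Event: an event name, a single JSON data line, a blank line.
function formatSseEvent(eventName, payload) {
  return `event: ${eventName}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Open an SSE response and return a function that sends events on it.
function openSseStream(res) {
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });
  return (eventName, payload) => res.write(formatSseEvent(eventName, payload));
}
```

`JSON.stringify` never emits raw newlines, so each payload fits on one `data:` line. A route handler might call `const send = openSseStream(res)` and then `send("progress", { phase: "pass1" })` as each phase advances.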
- Runtime: Node.js with Express
- Database: MongoDB with Mongoose ODM
- Vector Search: MongoDB Atlas Vector Search (cosine similarity, 1024-dim) with in-memory fallback
- Embeddings: Voyage AI (`voyage-3-large`), asymmetric query/document embedding modes
- Search APIs: Tavily (web), yt-search (YouTube), arXiv API, Semantic Scholar API
- Rerankers: Jina Reranker v2, Cohere Rerank v3.5
- Authentication: Google OAuth 2.0 with JWT session tokens
- Real-time: Server-Sent Events (SSE) for generation progress streaming
- Inference Models: Provider-agnostic via unified abstraction (OpenAI, Anthropic, Gemini, Mistral, Groq, OpenRouter, LM Studio, Ollama, Amazon Bedrock)
- Framework: React 19 with Vite
- Styling: Tailwind CSS 4
- Animations: Framer Motion
- Icons: Lucide React
- SSE Client: `@microsoft/fetch-event-source`
- Layout: `react-resizable-panels` for split-panel course viewer
- Routing: React Router v7
- Markdown: `react-markdown` with `remark-gfm`
CourseForge2/
├── client/                      # React frontend (Vite)
│   └── src/
│       ├── api/                 # Axios API client
│       ├── components/
│       │   ├── course/          # Course viewer components
│       │   └── dashboard/       # Dashboard components
│       ├── context/             # React context providers
│       ├── pages/               # Route pages (Dashboard, CourseViewer, Login, Settings)
│       └── utils/               # Client utilities
│
├── server/                      # Express backend
│   ├── index.js                 # Server entry point
│   └── src/
│       ├── config/              # Database and environment config
│       ├── controllers/         # Route handlers (auth, course, graph)
│       ├── middleware/          # Auth, validation, error handling
│       ├── models/              # Mongoose schemas (Course, TopicNode, ContentChunk, Resource, User, UserProgress)
│       ├── routes/              # Express route definitions
│       └── services/            # Core business logic (generation, LLM, embedding, filter, graph, search, etc.)
│
└── CourseForge_sequence_of_proposals/   # Historical design documents (development context)
- Node.js 18+
- MongoDB (local or Atlas — Atlas required for vector search; in-memory fallback available for local)
- At least one inference model API key (see BYOM configuration)
Server (server/.env):
MONGODB_URI=mongodb+srv://...
JWT_SECRET=your-jwt-secret
GOOGLE_CLIENT_ID=your-google-oauth-client-id
# Embedding & Classification (server-managed)
VOYAGE_API_KEY=your-voyage-api-key
# Search APIs
TAVILY_API_KEY=your-tavily-api-key
# Rerankers (used in resource filtering)
JINA_API_KEY=your-jina-api-key
COHERE_API_KEY=your-cohere-api-key
Client (client/.env):
VITE_API_URL=http://localhost:3001
VITE_GOOGLE_CLIENT_ID=your-google-oauth-client-id
# Install server dependencies
cd server && npm install
# Install client dependencies
cd ../client && npm install
# Start the backend (with file watching)
cd server && npm run dev
# Start the frontend (in a separate terminal)
cd client && npm run dev
If using MongoDB Atlas, create a Vector Search index on the contentchunks collection:
- Index name: `content_embed_index`
- Index type: Atlas Vector Search (NOT Atlas Search)
{
"fields": [
{
"type": "vector",
"path": "embedding",
"numDimensions": 1024,
"similarity": "cosine"
},
{
"type": "filter",
"path": "courseId"
}
]
}
If running locally without Atlas, the system automatically falls back to in-memory cosine similarity search.
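With the index definition above in place, a query against it would typically go through the `$vectorSearch` aggregation stage. The sketch below builds such a pipeline. The helper name, the projected fields `text` and `sourceType`, and the 10× `numCandidates` ratio are assumptions; the index name, `embedding` path, and `courseId` filter field come from the index definition.

```javascript
// Build an aggregation pipeline for Atlas Vector Search over content chunks.
function buildVectorSearchPipeline(queryVector, courseId, limit = 10) {
  return [
    {
      $vectorSearch: {
        index: "content_embed_index",
        path: "embedding",
        queryVector,
        numCandidates: limit * 10, // oversample candidates for ANN recall (assumed ratio)
        limit,
        filter: { courseId: { $eq: courseId } },
      },
    },
    {
      // Field names other than the score are hypothetical.
      $project: { text: 1, sourceType: 1, score: { $meta: "vectorSearchScore" } },
    },
  ];
}
```

With Mongoose, the result could be passed to `ContentChunk.aggregate(pipeline)`. The `filter` clause only works because `courseId` is declared as a `filter` field in the index.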
This project is private and not currently licensed for distribution.