STELLAROS
Agentic AI Video Generation and Editing Platform
Project Documentation | March 2026
License: MIT | Python 3.10+ | React 18 | FastAPI
Project Overview
STELLAROS is an AI-powered SaaS platform designed for researchers, educators, and professionals who need to turn complex documents into clear, structured explainer videos. The platform handles the entire production pipeline automatically, from reading your document and writing a narration script through to rendering and exporting a finished video.
What makes STELLAROS different from other AI video tools is its modular, scene-level architecture. Most platforms regenerate the entire video whenever you make a single change. STELLAROS works differently: every scene is independent, so editing one scene only affects that scene. The rest of your video remains untouched.
For example, if you have a ten-scene video and want to adjust the narration in Scene 3, STELLAROS rewrites Scene 3, regenerates its audio, and re-renders its clip. Scenes 1, 2, and 4 through 10 stay exactly as they were, so on longer projects you avoid regenerating and re-rendering content that has not changed.
What STELLAROS accepts as input
Users can upload any of the following file types to begin a project:
• PDF files such as research papers, reports, and articles
• Word documents (.docx) including manuscripts and notes
• PowerPoint presentations (.pptx) such as lecture slides and decks
• Excel spreadsheets (.xlsx) containing data tables and statistics
• Images in PNG or JPG format, such as diagrams and charts
• Existing MP4 video files that need re-editing
Core Features
The following capabilities are built into the platform. Each one is explained in more detail in the sections that follow.
AI Script Generation
When you upload a document, the Script Agent reads it and breaks it into narration paragraphs, one per video scene. Each paragraph is automatically tagged with a reference to the section of the source document it came from. This means every line of narration is traceable back to your original content, which is especially important for research and academic work where accuracy is non-negotiable.
Storyboard Generation
Once the script is ready, the platform converts it into a structured storyboard. Each scene card in the storyboard contains the narration text, a description of what the visual should show, the tone and depth settings, an estimated duration, and a confidence score. The confidence score tells you how well each scene's narration is grounded in your source document, which helps you know exactly which scenes need closer attention.
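As a rough illustration of what one scene card might hold, the sketch below models the fields listed above as a Python dataclass and serialises it to JSON. The field names, the `SceneCard` class, and the sample values are all hypothetical; the project's actual schema may differ.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class SceneCard:
    """One storyboard entry. Field names are illustrative, not the project's schema."""
    scene_id: int
    narration: str
    visual_description: str
    tone: str
    depth: str
    estimated_duration_s: float
    confidence: float  # 0 to 100, as assigned by the Fact-Check Agent
    source_ref: str    # section of the source document the narration came from


# Made-up sample values, for illustration only.
card = SceneCard(
    scene_id=3,
    narration="The model was trained on labelled samples from three sites.",
    visual_description="Bar chart of sample counts per site",
    tone="formal",
    depth="overview",
    estimated_duration_s=12.5,
    confidence=86.0,
    source_ref="Section 2.1, Methods",
)

card_json = json.dumps(asdict(card), indent=2)
print(card_json)
```

Keeping the source reference and confidence score on the card itself is one way to let the workbench show both without a second lookup.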
Automatic Visual Generation
Visuals are generated automatically for every scene using HuggingFace diffusion models for illustrated scenes and Matplotlib or Plotly for data-driven scenes. The system selects the appropriate visual type based on the content: a flowchart for a step-by-step process, a bar chart for statistical data, a concept diagram for explanatory content, and so on. You can also provide a custom prompt to regenerate any visual exactly the way you want it.
Text-to-Speech Narration
Each scene's narration text is converted into a natural-sounding audio file using Edge-TTS, which uses Microsoft's neural voice technology. You can choose between male and female voices, adjust between formal and casual delivery styles, and set the speech speed anywhere from half-speed to double-speed. Audio is generated per scene and can be regenerated independently at any point.
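Edge-TTS expresses speaking rate as a signed percentage string rather than a multiplier, so a speed setting between half-speed and double-speed has to be converted first. The sketch below shows one way to do that. The helper name `speed_to_rate` and the `VOICES` mapping are assumptions for illustration, not taken from the project code.

```python
# Example voice names only; the voices the project actually uses are not documented here.
VOICES = {
    "male": "en-US-GuyNeural",
    "female": "en-US-JennyNeural",
}


def speed_to_rate(speed: float) -> str:
    """Convert a speed multiplier (0.5 to 2.0) into a signed percentage string.

    1.0 means normal speed, 0.5 half-speed, 2.0 double-speed.
    """
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    percent = round((speed - 1.0) * 100)
    return f"{percent:+d}%"


for speed in (0.5, 1.0, 1.5, 2.0):
    print(speed, "->", speed_to_rate(speed))
```

The resulting string could then be passed as the `rate` argument when constructing an `edge_tts.Communicate` object, assuming the installed edge-tts release accepts percentage strings of this form.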
Scene Video Rendering
Once a scene has its visual and audio ready, it is rendered into an individual video clip using MoviePy and FFmpeg. Scenes can be rendered one at a time or all at once. Because each scene renders independently, you never have to wait for scenes you have already approved to re-render when you make a change elsewhere.
Interactive Video Editor Workbench
The workbench is the main editing environment where you review all scenes together. Every scene appears as a card with its thumbnail, narration text, confidence badge, source reference, and approval status. You can click into any scene to edit its narration, replace its visual, adjust its timing, or regenerate its audio, all without leaving the workbench and without disturbing any other scene.
Scene Version Control
Every edit you make to a scene is saved in that scene's version history. The history records the original content, the edit instruction you gave, the new content, and a timestamp. You can roll back any scene to any previous version at any time.
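A per-scene history of this kind can be sketched as an append-only list. The version below records the four fields mentioned above and treats a rollback as one more recorded edit, so nothing is ever lost. The class and method names are hypothetical, and the real changelog_service.py may work differently.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SceneVersion:
    original: str     # narration before the edit
    instruction: str  # the edit instruction the user gave
    updated: str      # narration after the edit
    timestamp: str    # ISO 8601, UTC


@dataclass
class SceneHistory:
    narration: str
    versions: list[SceneVersion] = field(default_factory=list)

    def apply_edit(self, instruction: str, new_text: str) -> None:
        """Record the edit, then make new_text the current narration."""
        stamp = datetime.now(timezone.utc).isoformat()
        self.versions.append(SceneVersion(self.narration, instruction, new_text, stamp))
        self.narration = new_text

    def rollback_to(self, index: int) -> None:
        """Restore the narration as it stood before edit number `index` (0-based).

        The rollback is itself stored as a new version entry.
        """
        target = self.versions[index].original
        self.apply_edit(f"rollback to before edit {index}", target)


history = SceneHistory("v0")
history.apply_edit("make this simpler", "v1")
history.apply_edit("use a more formal tone", "v2")
history.rollback_to(1)  # back to the text as it was before the second edit
print(history.narration, len(history.versions))
```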
Final Video Export
When all your scenes are approved and rendered, the platform stitches them together into the final video. You can export as an MP4 video, an MP3 audio-only file, a PDF of the full narration script with source references, or a JSON file of the complete storyboard data.
AI Architecture
STELLAROS uses a multi-agent system where three specialised AI agents work together across the production pipeline. Each agent has a clearly defined responsibility and operates on individual scenes rather than on the video as a whole.
The Script Agent
The Script Agent is responsible for reading your uploaded document and generating narration. It uses the LLaMA 3 model (8b-8192), served through the Groq API, to understand your content, extract the key ideas from each section, and write narration paragraphs that are accurate, well-structured, and sourced from the original document. When you later edit a scene using a natural language instruction such as "make this simpler" or "use a more formal tone", the Script Agent handles that rewrite, and it only touches the one scene you asked it to change.
The Visual Agent
The Visual Agent reads each scene's narration and decides what type of visual would best support it. For scenes with data or statistics, it generates charts and infographics using Matplotlib. For conceptual or explanatory scenes, it generates illustrated visuals using HuggingFace's diffusion models. If you provide a custom prompt when regenerating a visual, the Visual Agent uses that prompt to produce something more specific to your needs.
The Fact-Check Agent
The Fact-Check Agent runs automatically every time narration is written or rewritten. It compares the generated narration against the relevant section of your source document and assigns a confidence score between 0 and 100 percent. Scenes scoring 80 percent or above are well-grounded. Scores between 50 and 79 percent indicate partial grounding and warrant a review. Scores below 50 percent are flagged for manual checking, as they may contain claims that are not sufficiently supported by the original document.
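The three score bands can be written as a small lookup function. This is a minimal sketch: the function name and band labels are hypothetical, and treating a score of exactly 80 as well-grounded is an assumption about where the upper boundary sits.

```python
def confidence_band(score: float) -> str:
    """Map a 0 to 100 confidence score to a review band.

    Assumption: a score of exactly 80 counts as well-grounded.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 80:
        return "well-grounded"
    if score >= 50:
        return "needs-review"
    return "flagged"


for score in (92, 80, 65, 49):
    print(score, confidence_band(score))
```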
How the three agents work together
When you upload a document, the Script Agent generates the narration for all scenes. The Fact-Check Agent then verifies each scene and assigns confidence scores. Once you approve the script, the Visual Agent generates visuals for every scene. If you later edit a scene, the same cycle repeats, but only for that scene. The Script Agent rewrites it, the Fact-Check Agent scores it, and if you approve, the Visual Agent generates a new visual. All of this happens without any involvement from the rest of your project.
Technology Stack
The following tables summarise the main technologies used across the backend, AI layer, frontend, and media processing components.
Backend

| Technology | Version | Purpose |
| --- | --- | --- |
| FastAPI | 0.110+ | REST API server with async endpoint support |
| Python | 3.10+ | Core backend language |
| MongoDB | 6.0+ | Project storage, storyboard data, and version history |
| MoviePy | 1.0.3 | Scene video composition |
| Edge-TTS | Latest | Neural text-to-speech narration |
AI Models

| Model | Provider | Used For |
| --- | --- | --- |
| LLaMA 3 (8b-8192) | Groq API | Script generation, scene rewriting, fact-checking |
| Gemini 1.5 Flash | Google AI | Fallback AI when Groq is unavailable |
| Diffusion Models | HuggingFace | Scene image generation |
Frontend

| Technology | Purpose |
| --- | --- |
| React 18 | Component-based user interface |
| TailwindCSS | Utility-first styling system |
| Framer Motion | Scene card animations and transition effects |
Video and Media Processing

| Technology | Purpose |
| --- | --- |
| MoviePy | Combines visuals and audio into scene clips |
| FFmpeg | Low-level video processing and format conversion |
| Matplotlib and Plotly | Data-driven chart and infographic generation |
| Pillow | Image manipulation and slide creation |
System Architecture
STELLAROS processes your content through a sequential pipeline where each stage builds on the previous one. The pipeline runs end-to-end on the first pass. After that, any individual stage can be re-run for a single scene without touching the rest.
The full pipeline, stage by stage
Stage 1: Document Input. The user uploads a file. The system reads it using pdfplumber, python-docx, or python-pptx depending on the file type. Each paragraph is extracted and labelled with its section heading and location within the document.
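Choosing a reader by file type can be sketched as a lookup on the file extension. The mapping below covers only the three libraries named in Stage 1. How spreadsheets, images, and MP4 uploads are handled is not described there, so the sketch raises an error for them rather than guessing. The names `EXTRACTORS` and `pick_extractor` are hypothetical.

```python
from pathlib import Path

# Extension -> extraction library, limited to the libraries named in Stage 1.
EXTRACTORS = {
    ".pdf": "pdfplumber",
    ".docx": "python-docx",
    ".pptx": "python-pptx",
}


def pick_extractor(filename: str) -> str:
    """Return the name of the library that should read this file."""
    suffix = Path(filename).suffix.lower()
    try:
        return EXTRACTORS[suffix]
    except KeyError:
        raise ValueError(f"No extractor registered for '{suffix}'") from None


for name in ("paper.PDF", "notes.docx", "lecture.pptx"):
    print(name, "->", pick_extractor(name))
```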
Stage 2: Script Generation. The Script Agent calls Groq's LLaMA 3 model and generates narration paragraphs from the extracted content. Each paragraph is tagged with a citation pointing back to the source section it was derived from.
Stage 3: Storyboard Creation. The platform converts the script into a structured JSON storyboard. Each scene card includes the narration text, a visual description, tone, depth, estimated duration, and the confidence score assigned by the Fact-Check Agent.
Stage 4: Visual Generation. The Visual Agent generates an image for each scene, either a diffusion-model image or a Matplotlib chart, depending on the scene content. These are saved as PNG files and linked to each scene.
Stage 5: Audio Narration. Edge-TTS converts each scene's narration text into an MP3 file using a neural voice. The voice settings (gender, tone, speed) are applied per scene.
Stage 6: Scene Video Rendering. MoviePy combines the scene's visual image and audio file into a short MP4 clip. Each scene renders independently.
Stage 7: Editor Workbench. The user reviews all scene clips in the interactive workbench. Any scene can be edited, and only that scene is re-processed through the relevant earlier stages.
Stage 8: Final Export. All approved scene clips are stitched together into the final video. The user can export as MP4, MP3, PDF script, or Storyboard JSON.
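One common way to stitch pre-rendered clips is FFmpeg's concat demuxer, which reads a text file listing the clips in order. The sketch below only builds that list and the command line; it does not run FFmpeg. This is an illustrative alternative rather than a description of how the platform stitches clips, and the helper name `build_concat_job` and the file names are hypothetical.

```python
from pathlib import Path


def build_concat_job(clips: list[str], output: str,
                     list_file: str = "clips.txt") -> tuple[str, list[str]]:
    """Return (contents of the concat list file, ffmpeg argument list).

    Assumes clip paths contain no single quotes, which would need escaping.
    """
    lines = [f"file '{Path(clip).as_posix()}'" for clip in clips]
    listing = "\n".join(lines) + "\n"
    cmd = [
        "ffmpeg", "-y",
        "-f", "concat",  # use the concat demuxer
        "-safe", "0",    # accept paths the demuxer would otherwise reject as unsafe
        "-i", list_file,
        "-c", "copy",    # stream copy: no re-encoding of the scene clips
        output,
    ]
    return listing, cmd


listing, cmd = build_concat_job(
    ["static/video/scene_1.mp4", "static/video/scene_2.mp4"],
    "final.mp4",
)
print(listing)
print(" ".join(cmd))
```

Stream copy only works when every clip shares the same codec, resolution, and frame rate. That is plausible when all scenes come from one renderer, but it is still an assumption here.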
What happens when you edit a single scene
This is the core behaviour that sets STELLAROS apart. When a user edits Scene 3, for example, the following sequence runs, and only this sequence:
- The user types an instruction such as "make this more formal" into the edit panel.
- The request is sent to the backend endpoint /edit-scene with only Scene 3's data.
- The Script Agent rewrites Scene 3's narration using Groq.
- The Fact-Check Agent verifies the new narration against the source document.
- The frontend shows the user a diff — original text on the left, new text on the right.
- The user clicks Approve. Scene 3's audio is regenerated by Edge-TTS.
- Scene 3's video clip is re-rendered by MoviePy.
- Scene 3 is updated in the storyboard JSON. Scenes 1, 2, and 4 through 10 are completely untouched.
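The steps above can be sketched as a single function that takes the agents as plain callables. Everything here is illustrative: the `edit_scene` signature, the dictionary keys, and the stub lambdas standing in for the Script Agent, the Fact-Check Agent, Edge-TTS, and MoviePy are assumptions, not the project's actual interfaces.

```python
from typing import Callable

Scene = dict[str, object]


def edit_scene(
    storyboard: dict[int, Scene],
    scene_id: int,
    instruction: str,
    *,
    rewrite: Callable[[str, str], str],        # stands in for the Script Agent
    fact_check: Callable[[str, str], float],   # stands in for the Fact-Check Agent
    approve: Callable[[str, str], bool],       # the user's decision on the diff
    synthesize: Callable[[int, str], str],     # stands in for TTS; returns an audio path
    render: Callable[[int, str, str], str],    # stands in for rendering; returns a clip path
) -> dict[int, Scene]:
    """Re-process one scene and return a new storyboard; other scenes are reused as-is."""
    scene = storyboard[scene_id]
    old_text = str(scene["narration"])
    new_text = rewrite(old_text, instruction)
    confidence = fact_check(new_text, str(scene["source_text"]))
    if not approve(old_text, new_text):
        return storyboard  # rejected: nothing changes
    audio = synthesize(scene_id, new_text)
    clip = render(scene_id, str(scene["visual"]), audio)
    updated = {**scene, "narration": new_text, "confidence": confidence,
               "audio": audio, "clip": clip}
    return {**storyboard, scene_id: updated}


# Ten stub scenes and stub agents, for illustration only.
storyboard = {
    n: {"narration": f"scene {n} text", "source_text": "source", "visual": f"scene_{n}.png"}
    for n in range(1, 11)
}
result = edit_scene(
    storyboard, 3, "make this more formal",
    rewrite=lambda text, instr: text.upper(),
    fact_check=lambda new, src: 90.0,
    approve=lambda old, new: True,
    synthesize=lambda sid, text: f"scene_{sid}.mp3",
    render=lambda sid, visual, audio: f"scene_{sid}.mp4",
)
print(result[3]["narration"], result[3]["clip"])
```

Because the function never reads or writes any scene other than the one named by `scene_id`, the untouched scenes in the result are the very same objects as in the input.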
Project Structure
The repository is organised into three main areas: the backend, which runs the FastAPI server and all AI agents; the frontend, which contains the React application; and the static folder, which stores generated media files.
Backend
The agents folder contains the three AI agents: script_agent.py for narration generation and rewriting, visual_agent.py for visual descriptions and infographic logic, and fact_check_agent.py for source verification and confidence scoring.
The services folder contains the supporting utilities — pdf_extractor.py, tts_service.py for Edge-TTS with gTTS fallback, image_service.py for HuggingFace generation, chart_service.py for Matplotlib and Plotly, video_service.py for MoviePy rendering, shotstack_service.py for cloud video stitching, and changelog_service.py for version history management.
The api folder contains all route handlers and Pydantic schemas. The routes are separated by function: documents, script, storyboard, scenes, audio, visuals, video, and changelog. The main server entry point is server.py.
Frontend
The components folder contains all reusable UI elements including SceneCard, EditPanel, DiffViewer, ConfidenceBadge, ChangelogDrawer, and CoreSpinLoader. The pages folder contains the five main pages: the landing page, login, upload, workbench, and export. The editor folder contains the scene workbench grid, pipeline navigation, and video preview components. The lib folder contains the API client and shared utility functions.
Static files
The static folder is where the platform stores all generated media. Visuals are saved as PNG files in the visuals subfolder, narration audio as MP3 files in the audio subfolder, and rendered scene clips as MP4 files in the video subfolder.
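The folder layout described above maps naturally onto a small path helper. In the sketch below, the subfolder names and extensions come from the description, while the `scene_<id>` file naming and the `media_path` helper are hypothetical.

```python
from pathlib import Path

STATIC_ROOT = Path("static")

# Media kind -> (subfolder, file extension), following the layout described above.
LAYOUT = {
    "visual": ("visuals", ".png"),
    "audio": ("audio", ".mp3"),
    "video": ("video", ".mp4"),
}


def media_path(kind: str, scene_id: int) -> Path:
    """Build the path for one scene's generated file (the naming scheme is an assumption)."""
    subfolder, extension = LAYOUT[kind]
    return STATIC_ROOT / subfolder / f"scene_{scene_id}{extension}"


for kind in LAYOUT:
    print(media_path(kind, 3).as_posix())
```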
Environment Variables
Create a .env file in the project root by copying .env.example. The following variables are required or recommended:
| Variable | Required | What it is used for |
| --- | --- | --- |
| GROQ_API_KEY | Yes | Primary AI engine for all three agents. Free tier gives 6,000 requests per day at console.groq.com |
| GEMINI_API_KEY | Recommended | Fallback AI when Groq is unavailable. Free tier gives 1M tokens per day at aistudio.google.com |
| HUGGINGFACE_API_KEY | Yes | Used for scene image generation via diffusion models. Free tier available at huggingface.co |
| SHOTSTACK_API_KEY | Recommended | Cloud video stitching for final export. Free tier gives 50 renders per month at shotstack.io. Falls back to MoviePy locally if limit is reached. |
| MONGODB_URI | Yes | MongoDB connection string. Use mongodb://localhost:27017 for local or your Atlas URI for cloud |
| FRONTEND_URL | Yes | Used for CORS configuration. Set to http://localhost:3000 for local development |
STELLAROS is intentionally designed to work within free API tiers. If Groq is unavailable, the system automatically falls back to Gemini. If Shotstack reaches its monthly limit, video rendering falls back to MoviePy running locally.
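The fallback behaviour can be sketched as a wrapper that tries a primary provider and switches to a secondary one only when the primary signals that it is unavailable. The `ProviderUnavailable` exception, the `with_fallback` helper, and the two stub functions are hypothetical stand-ins; the real Groq and Gemini clients raise their own error types.

```python
from typing import Callable


class ProviderUnavailable(Exception):
    """Raised by a provider wrapper on a rate limit, outage, or timeout (illustrative)."""


def with_fallback(primary: Callable[[str], str],
                  fallback: Callable[[str], str]) -> Callable[[str], str]:
    """Return a callable that uses `fallback` only when `primary` is unavailable."""
    def call(prompt: str) -> str:
        try:
            return primary(prompt)
        except ProviderUnavailable:
            return fallback(prompt)
    return call


# Stub providers for illustration: the primary always fails here.
def groq_stub(prompt: str) -> str:
    raise ProviderUnavailable("daily request limit reached")


def gemini_stub(prompt: str) -> str:
    return f"gemini: {prompt}"


generate = with_fallback(groq_stub, gemini_stub)
print(generate("summarise section 2"))
```

Catching one specific exception type, rather than every exception, keeps genuine bugs in the primary path visible instead of silently routing them to the fallback.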
Demo Flow
For a hackathon or live presentation, the following sequence demonstrates the core value of STELLAROS in approximately 90 seconds.
The key point to make during the demo: every other AI video tool would have regenerated all eight scenes in step four. STELLAROS only touched Scene 3.
API Reference
Full interactive API documentation is available at /docs when the server is running. The main endpoints are listed below.
Future Improvements
The following capabilities are planned for future development. Items marked as high priority are expected in the next release cycle.
| Feature | Description | Priority |
| --- | --- | --- |
| Real-time Video Editor | Timeline-based drag-and-drop editor for ordering and trimming scenes | High |
| AI Visual Animation | Animate static scene visuals using video diffusion models | High |
| Collaboration Features | Multi-user projects with shared scene review and approval | Medium |
| Team Storyboard Review | Role-based access with Author, Reviewer, and Approver roles and comment threads | Medium |
| Cloud Rendering Queue | Background async rendering with email notification on completion | Medium |
| Voice Cloning | Upload a voice sample to generate narration in a custom voice | Medium |
| Multi-language Support | Script generation and TTS narration in over 20 languages | Medium |
| LMS Integration | One-click export to Moodle, Canvas, and Google Classroom | Low |
| Analytics Dashboard | Per-scene viewer engagement tracking in exported videos | Low |
Contributing
Contributions are welcome. To contribute to STELLAROS, fork the repository, create a feature branch, make your changes with docstrings on all new functions, and open a pull request. Please follow the existing code patterns and ensure new backend endpoints include appropriate error handling and fallback logic.
License
STELLAROS is released under the MIT License. You are free to use, copy, modify, merge, publish, distribute, sublicense, and sell copies of the software. The full license text is included in the LICENSE file in the root of the repository.
Copyright (c) 2026 STELLAROS
Built for researchers who cannot afford a video that gets the science wrong. STELLAROS — Edit one scene. Touch nothing else.