feat: editing and subtitle agents using videodb.editor#185
feat: editing and subtitle agents using videodb.editor#185ashish-spext merged 2 commits intovideo-db:mainfrom
videodb.editor#185Conversation
📝 WalkthroughWalkthroughThis PR restructures the editing agent from a single monolithic module into a modular package with dedicated components (timeline models, code executor, media handler), refactors the subtitle agent to leverage the Editor API with template support and translation capabilities, and introduces per-agent context management in sessions alongside dependency version updates. Changes
Sequence Diagram(s)sequenceDiagram
participant Agent as EditingAgent
participant LLM as Language Model
participant Executor as CodeExecutor
participant Timeline as Timeline Engine
participant Stream as VideoDB Stream
Agent->>Agent: Initialize with collection
Agent->>LLM: Send prompt + media context
LLM->>LLM: Generate Timeline code
LLM-->>Agent: Return code + metadata
Agent->>Executor: execute_code(generated_code)
Executor->>Executor: Validate & prepare environment
Executor->>Timeline: Execute code (create timeline, add tracks)
alt Success
Timeline->>Stream: generate_stream()
Stream-->>Executor: stream_url
Executor-->>Agent: AgentResponse(stream_url)
else Audio/Transcoding Error
Executor->>Executor: _generate_video_only_fallback()
Executor->>Timeline: Execute fallback code
Timeline->>Stream: generate_stream()
Stream-->>Executor: stream_url
Executor-->>Agent: AgentResponse(stream_url)
else All Retries Failed
Executor-->>Agent: AgentResponse(error)
end
Agent-->>Agent: Return final response
sequenceDiagram
participant Agent as SubtitleAgent
participant VideoDB as VideoDB Tool
participant Translator as Translation Service
participant CaptionFactory as Caption Builder
participant Timeline as Timeline Engine
participant Stream as VideoDB Stream
Agent->>Agent: Parse parameters (template, languages, styling)
Agent->>VideoDB: Fetch transcript for video
VideoDB-->>Agent: transcript_text
alt Translation Required
Agent->>Translator: translate_transcript(text, target_language)
Translator-->>Agent: translated_text
else No Translation
Agent->>Agent: Use original transcript
end
Agent->>CaptionFactory: _create_caption_clip(config, duration, transcript)
CaptionFactory->>CaptionFactory: Build CaptionAsset with styling
CaptionFactory->>Timeline: Create Track + add Clip
Timeline->>Timeline: add_track(track)
Timeline->>Stream: generate_stream()
Stream-->>Agent: stream_url
Agent-->>Agent: Return AgentResponse with metadata
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
videodb.editor
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/director/core/session.py (1)
338-359: Avoid dropping persisted agent contexts on save.
Becauseget_context_messages(agent_name=...)doesn’t populateself.agent_context, a latersave_context_messages()can overwrite DB state with only reasoning messages, effectively erasing other agents’ contexts.🔧 Suggested fix (seed agent_context when loading)
def get_context_messages(self, agent_name: str = None): """Get the reasoning context messages from the database or use edited context if provided.""" if agent_name: context = self.edited_context or self.db.get_context_messages(self.session_id) - return [ + messages = [ ContextMessage.from_json(message) for message in context.get(agent_name, []) ] + self.agent_context[agent_name] = messages + return messages if not self.reasoning_context: context = self.edited_context or self.db.get_context_messages(self.session_id) self.reasoning_context = [ ContextMessage.from_json(message) for message in context.get("reasoning", []) ] + for name, msgs in context.items(): + if name != "reasoning": + self.agent_context[name] = [ + ContextMessage.from_json(m) for m in msgs + ]
🤖 Fix all issues with AI agents
In `@backend/director/agents/editing/agent.py`:
- Around line 721-734: The review points out the prompt calls a non-existent
tool generate_timeline_code(), which will trigger an "Unknown tool" path; fix by
either removing that call and invoking the available tools (get_media and
code_executor) to perform timeline generation or implement a local wrapper
function generate_timeline_code() that internally uses code_executor (or
get_media where appropriate) so the name is resolved at runtime; also ensure the
required import statement from videodb.editor (Timeline, Track, Clip,
VideoAsset, ImageAsset, AudioAsset, TextAsset, Position, Offset, Filter,
Transition, Fit, Font, Border, Shadow, Background, Alignment,
HorizontalAlignment, VerticalAlignment, TextAlignment) is present at top of the
module so symbols used by timeline generation are available.
- Around line 930-1005: The except block can raise UnboundLocalError because
video_content may not be defined if an exception occurs before its creation
(e.g., during MediaHandler initialization); to fix, ensure video_content is
always defined before the try/except or guard its use in the except: initialize
a default VideoContent (with agent_name=self.agent_name, status=MsgStatus.error
and a generic status_message) prior to the try or check/create video_content
inside the except before setting status/status_message and publishing via
self.output_message.publish(); update references to video_content in the except
handler (and keep logging with logger.exception(f"Error in {self.agent_name}"))
so the original error is not masked.
In `@backend/director/agents/editing/code_executor.py`:
- Around line 14-27: The code currently injects the entire __builtins__ into
exec_globals inside execute_code (which calls _execute_with_retry) creating a
security risk; replace this by whitelisting only the minimal safe names required
by the Timeline code (e.g., VideoAsset and any explicit helper
functions/constants) and remove passing __builtins__ wholesale, i.e., construct
a restricted builtins dict (or use a known safe_builtins mapping) and set
exec_globals to include only "conn" plus those explicit safe symbols before
calling _execute_with_retry; additionally, consider moving code execution into a
separate restricted subprocess or sandboxed environment with limited OS
permissions if feasible to further isolate untrusted code.
In `@backend/director/agents/subtitle.py`:
- Around line 429-483: When needs_translation is true we may call
translate_transcript with a None target_language_iso_code; add an early
validation in the subtitle generation flow to check target_language_iso_code
before calling VideoDBTool.translate_transcript (e.g. right where
needs_translation is True) and if it's missing set subtitles_content.status =
MsgStatus.error, set a clear subtitles_content.status_message and publish the
output_message, then return an AgentResponse with AgentStatus.ERROR and a
descriptive message; ensure you reference needs_translation,
target_language_iso_code, translate_transcript, subtitles_content and
output_message in the guard so the error path mirrors the existing exception
path.
🧹 Nitpick comments (1)
backend/director/agents/editing/media_handler.py (1)
41-49: Preserve failures in batch retrieval.
get_media_listcurrently drops failed IDs silently, which can mask missing assets upstream. Consider returning(results, errors)or a list ofAgentResponseobjects to keep failure context.
|
|
||
| Your import MUST be: | ||
| ```python | ||
| from videodb.editor import ( | ||
| Timeline, Track, Clip, | ||
| VideoAsset, ImageAsset, AudioAsset, TextAsset, | ||
| Position, Offset, Filter, Transition, Fit, | ||
| Font, Border, Shadow, Background, Alignment, | ||
| HorizontalAlignment, VerticalAlignment, TextAlignment | ||
| ) | ||
| ``` | ||
|
|
||
| Now generate code based on user requirements and **ALWAYS call generate_timeline_code()** tool! | ||
| """.strip() |
There was a problem hiding this comment.
Prompt references a non-existent tool.
The prompt instructs the LLM to call generate_timeline_code(), but only get_media and code_executor are registered. This can force an “Unknown tool” error path.
🔧 Suggested prompt fix
-Now generate code based on user requirements and **ALWAYS call generate_timeline_code()** tool!
+Now generate code based on user requirements and **ALWAYS call code_executor** tool!📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| Your import MUST be: | |
| ```python | |
| from videodb.editor import ( | |
| Timeline, Track, Clip, | |
| VideoAsset, ImageAsset, AudioAsset, TextAsset, | |
| Position, Offset, Filter, Transition, Fit, | |
| Font, Border, Shadow, Background, Alignment, | |
| HorizontalAlignment, VerticalAlignment, TextAlignment | |
| ) | |
| ``` | |
| Now generate code based on user requirements and **ALWAYS call generate_timeline_code()** tool! | |
| """.strip() | |
| Your import MUST be: |
🤖 Prompt for AI Agents
In `@backend/director/agents/editing/agent.py` around lines 721 - 734, The review
points out the prompt calls a non-existent tool generate_timeline_code(), which
will trigger an "Unknown tool" path; fix by either removing that call and
invoking the available tools (get_media and code_executor) to perform timeline
generation or implement a local wrapper function generate_timeline_code() that
internally uses code_executor (or get_media where appropriate) so the name is
resolved at runtime; also ensure the required import statement from
videodb.editor (Timeline, Track, Clip, VideoAsset, ImageAsset, AudioAsset,
TextAsset, Position, Offset, Filter, Transition, Fit, Font, Border, Shadow,
Background, Alignment, HorizontalAlignment, VerticalAlignment, TextAlignment) is
present at top of the module so symbols used by timeline generation are
available.
| try: | ||
| self.prompt = prompt | ||
| self.collection_id = collection_id | ||
| self.iterations = 25 | ||
| self.stop_flag = False | ||
|
|
||
| self.media_handler = MediaHandler(collection_id) | ||
| self.output_message.actions.append("Preparing your editing workspace..") | ||
|
|
||
| video_content = VideoContent( | ||
| agent_name=self.agent_name, | ||
| status=MsgStatus.progress, | ||
| status_message="Generating editing instructions...", | ||
| ) | ||
| self.output_message.content.append(video_content) | ||
| self.output_message.push_update() | ||
|
|
||
| input_context = ContextMessage( | ||
| content=f"{self.prompt}", role=RoleTypes.user | ||
| ) | ||
| if not self.agent_context: | ||
| system_context = ContextMessage( | ||
| content=EDITING_PROMPT, role=RoleTypes.system | ||
| ) | ||
| self._update_context(system_context) | ||
| self._update_context(input_context) | ||
|
|
||
| self.output_message.actions.append("Crafting your video edit...") | ||
| self.output_message.push_update() | ||
|
|
||
| iteration = 0 | ||
| while self.iterations > 0: | ||
| self.iterations -= 1 | ||
| logger.info(f"Code generation iteration {iteration}") | ||
|
|
||
| if self.stop_flag: | ||
| break | ||
|
|
||
| self.run_llm() | ||
| iteration += 1 | ||
| logger.info("Timeline code generation completed") | ||
|
|
||
| if ( | ||
| self.editing_response | ||
| and self.editing_response.status == AgentStatus.SUCCESS | ||
| ): | ||
| stream_url = self.editing_response.data.get("stream_url") | ||
|
|
||
| if stream_url: | ||
| video_content.video = VideoData(stream_url=stream_url) | ||
| video_content.status = MsgStatus.success | ||
| video_content.status_message = ( | ||
| "Editing instructions executed successfully." | ||
| ) | ||
| self.output_message.actions.append("Video edit complete!") | ||
| else: | ||
| video_content.status = MsgStatus.error | ||
| video_content.status_message = ( | ||
| "No stream URL generated from Timeline execution." | ||
| ) | ||
| else: | ||
| video_content.status = MsgStatus.error | ||
| video_content.status_message = ( | ||
| "Something went wrong with Timeline execution. Please try again." | ||
| ) | ||
|
|
||
| self.output_message.publish() | ||
|
|
||
| except Exception as e: | ||
| logger.exception(f"Error in {self.agent_name}") | ||
| video_content.status = MsgStatus.error | ||
| video_content.status_message = "Error in Timeline code generation." | ||
| self.output_message.publish() | ||
| return AgentResponse( | ||
| status=AgentStatus.ERROR, message=f"Agent failed with error: {e}" | ||
| ) |
There was a problem hiding this comment.
Exception path can crash due to video_content scope.
If an error occurs before video_content is created (e.g., MediaHandler init), the except block will raise UnboundLocalError, masking the root cause.
🛠️ Suggested guard
- def run(self, prompt: str, collection_id: str, *args, **kwargs) -> AgentResponse:
- try:
+ def run(self, prompt: str, collection_id: str, *args, **kwargs) -> AgentResponse:
+ video_content = VideoContent(
+ agent_name=self.agent_name,
+ status=MsgStatus.progress,
+ status_message="Generating editing instructions...",
+ )
+ try:
self.prompt = prompt
self.collection_id = collection_id
self.iterations = 25
self.stop_flag = False
self.media_handler = MediaHandler(collection_id)
self.output_message.actions.append("Preparing your editing workspace..")
- video_content = VideoContent(
- agent_name=self.agent_name,
- status=MsgStatus.progress,
- status_message="Generating editing instructions...",
- )
self.output_message.content.append(video_content)
self.output_message.push_update()
@@
except Exception as e:
logger.exception(f"Error in {self.agent_name}")
- video_content.status = MsgStatus.error
- video_content.status_message = "Error in Timeline code generation."
- self.output_message.publish()
+ if video_content:
+ video_content.status = MsgStatus.error
+ video_content.status_message = "Error in Timeline code generation."
+ self.output_message.publish()
return AgentResponse(
status=AgentStatus.ERROR, message=f"Agent failed with error: {e}"
)🤖 Prompt for AI Agents
In `@backend/director/agents/editing/agent.py` around lines 930 - 1005, The except
block can raise UnboundLocalError because video_content may not be defined if an
exception occurs before its creation (e.g., during MediaHandler initialization);
to fix, ensure video_content is always defined before the try/except or guard
its use in the except: initialize a default VideoContent (with
agent_name=self.agent_name, status=MsgStatus.error and a generic status_message)
prior to the try or check/create video_content inside the except before setting
status/status_message and publishing via self.output_message.publish(); update
references to video_content in the except handler (and keep logging with
logger.exception(f"Error in {self.agent_name}")) so the original error is not
masked.
| def execute_code(self, code: str, description: str = "") -> AgentResponse: | ||
| try: | ||
| conn = self.session.state.get("conn") | ||
| if not conn: | ||
| raise ValueError("No VideoDB connection available in session") | ||
|
|
||
| exec_globals = { | ||
| "conn": conn, | ||
| "__builtins__": __builtins__, | ||
| } | ||
| exec_locals = {} | ||
|
|
||
| logger.info("Executing generated code.") | ||
| stream_url = self._execute_with_retry(code, exec_globals, exec_locals) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, read the complete file to understand the full context
wc -l backend/director/agents/editing/code_executor.pyRepository: video-db/Director
Length of output: 178
🏁 Script executed:
# Read the entire code_executor.py file
cat -n backend/director/agents/editing/code_executor.pyRepository: video-db/Director
Length of output: 8336
🏁 Script executed:
# Search for _execute_with_retry method implementation
rg "_execute_with_retry" -A 20 backend/director/agents/editing/Repository: video-db/Director
Length of output: 3502
🏁 Script executed:
# Check where execute_code is called from
rg "execute_code" --type py -B 3 -A 3 backend/Repository: video-db/Director
Length of output: 2318
🏁 Script executed:
# Look for any security documentation or comments in the file
rg "sandbox|security|restrict|unsafe" -i backend/director/agents/editing/code_executor.pyRepository: video-db/Director
Length of output: 43
🏁 Script executed:
# Check the session and state structure to understand code source
rg "session.*state" -B 5 -A 5 backend/director/agents/editing/code_executor.pyRepository: video-db/Director
Length of output: 431
Restrict exec() builtins for safer code execution.
The exec() call with full __builtins__ poses an unnecessary security risk. Although the code is LLM-generated (not direct user input), it should still be sandboxed to prevent potential exploits if the LLM is influenced through prompt injection or if the code generation constraints are bypassed. Restrict builtins to only what the Timeline code requires (likely just VideoAsset and a few others), and consider executing in a separate process with limited permissions.
🧰 Tools
🪛 Ruff (0.14.13)
18-18: Abstract raise to an inner function
(TRY301)
18-18: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
In `@backend/director/agents/editing/code_executor.py` around lines 14 - 27, The
code currently injects the entire __builtins__ into exec_globals inside
execute_code (which calls _execute_with_retry) creating a security risk; replace
this by whitelisting only the minimal safe names required by the Timeline code
(e.g., VideoAsset and any explicit helper functions/constants) and remove
passing __builtins__ wholesale, i.e., construct a restricted builtins dict (or
use a known safe_builtins mapping) and set exec_globals to include only "conn"
plus those explicit safe symbols before calling _execute_with_retry;
additionally, consider moving code execution into a separate restricted
subprocess or sandboxed environment with limited OS permissions if feasible to
further isolate untrusted code.
| video_lang = video_language.lower() | ||
| target_lang = (target_language or video_language).lower() | ||
| needs_translation = video_lang != target_lang | ||
|
|
||
| self.output_message.actions.append( | ||
| "Retrieving the subtitles in the video's original language" | ||
| "Initializing professional subtitle generation with Editor API" | ||
| ) | ||
| video_content = VideoContent( | ||
| subtitles_content = VideoContent( | ||
| agent_name=self.agent_name, | ||
| status=MsgStatus.progress, | ||
| status_message="Processing...", | ||
| ) | ||
| self.output_message.push_update() | ||
|
|
||
| self.output_message.actions.append("Retrieving video information") | ||
| self.output_message.push_update() | ||
| video_info = self.videodb_tool.get_video(video_id) | ||
| video_duration = video_info.get("length", 30) | ||
|
|
||
| try: | ||
| self.videodb_tool.get_transcript(video_id, text=False) | ||
| final_transcript = self.videodb_tool.get_transcript(video_id, text=False) | ||
| except InvalidRequestError: | ||
| logger.info( | ||
| f"Transcript not available for video {video_id}. Indexing spoken words.." | ||
| f"Transcript not available for video {video_id}. Indexing spoken words in {video_language}.." | ||
| ) | ||
| self.output_message.actions.append("Indexing video spoken words..") | ||
| self.output_message.actions.append(f"Indexing spoken words in {video_language}...") | ||
| self.output_message.push_update() | ||
|
|
||
| self.videodb_tool.index_spoken_words(video_id) | ||
| language_code = GEMINI_SUPPORTED_LANGUAGES.get(video_lang, None) | ||
| self.videodb_tool.index_spoken_words(video_id, language_code=language_code) | ||
| final_transcript = self.videodb_tool.get_transcript(video_id, text=False) | ||
|
|
||
| self.output_message.content.append(subtitles_content) | ||
| if needs_translation: | ||
| self.output_message.actions.append( | ||
| f"Translating subtitles from {video_language} to {target_language}.." | ||
| ) | ||
| self.output_message.push_update() | ||
|
|
||
| try: | ||
| final_transcript = self.videodb_tool.translate_transcript( | ||
| video_id=video_id, | ||
| language=target_language_iso_code, | ||
| additional_notes=notes, | ||
| ) | ||
| except Exception as e: | ||
| logger.error(f"Translation failed: {e}") | ||
| subtitles_content.status = MsgStatus.error | ||
| subtitles_content.status_message = "Translation failed. Please try again." | ||
| self.output_message.publish() | ||
| return AgentResponse( | ||
| status=AgentStatus.ERROR, | ||
| message=f"Translation failed: {str(e)}", | ||
| ) | ||
| else: |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, get an overview of the file and find the function containing lines 429-483
wc -l backend/director/agents/subtitle.pyRepository: video-db/Director
Length of output: 100
🏁 Script executed:
# Read the relevant section with context to understand the function signature and variable initialization
sed -n '380,490p' backend/director/agents/subtitle.py | cat -nRepository: video-db/Director
Length of output: 6505
🏁 Script executed:
# Get the function signature to see all parameters and their defaults
sed -n '410,430p' backend/director/agents/subtitle.py | cat -nRepository: video-db/Director
Length of output: 1447
🏁 Script executed:
# Look for the function definition line that contains the parameters
grep -n "def.*subtitle" backend/director/agents/subtitle.pyRepository: video-db/Director
Length of output: 43
🏁 Script executed:
# Get the complete function signature
sed -n '395,425p' backend/director/agents/subtitle.py | cat -nRepository: video-db/Director
Length of output: 2325
🏁 Script executed:
# Search for the function definition starting with "def" around line 380
sed -n '370,405p' backend/director/agents/subtitle.py | cat -nRepository: video-db/Director
Length of output: 1757
🏁 Script executed:
# Now check the complete flow after line 483 to see if there's any other validation or context
sed -n '429,520p' backend/director/agents/subtitle.py | cat -nRepository: video-db/Director
Length of output: 4821
Validate target_language_iso_code before translating.
When needs_translation is true, translate_transcript is called with target_language_iso_code even though it defaults to None. Add an early validation guard to provide a clear error message when the ISO code is missing.
Suggested guard
if needs_translation:
+ if not target_language_iso_code:
+ subtitles_content.status = MsgStatus.error
+ subtitles_content.status_message = (
+ "Please provide target_language_iso_code for translation."
+ )
+ self.output_message.publish()
+ return AgentResponse(
+ status=AgentStatus.ERROR,
+ message="Missing target_language_iso_code for translation.",
+ )
self.output_message.actions.append(
f"Translating subtitles from {video_language} to {target_language}.."
)🧰 Tools
🪛 Ruff (0.14.13)
474-474: Do not catch blind exception: Exception
(BLE001)
475-475: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
481-481: Use explicit conversion flag
Replace with conversion flag
(RUF010)
🤖 Prompt for AI Agents
In `@backend/director/agents/subtitle.py` around lines 429 - 483, When
needs_translation is true we may call translate_transcript with a None
target_language_iso_code; add an early validation in the subtitle generation
flow to check target_language_iso_code before calling
VideoDBTool.translate_transcript (e.g. right where needs_translation is True)
and if it's missing set subtitles_content.status = MsgStatus.error, set a clear
subtitles_content.status_message and publish the output_message, then return an
AgentResponse with AgentStatus.ERROR and a descriptive message; ensure you
reference needs_translation, target_language_iso_code, translate_transcript,
subtitles_content and output_message in the guard so the error path mirrors the
existing exception path.
subtitleandeditingagent with the updatedvideodb.editormoduleSummary by CodeRabbit
New Features
Refactor
Dependencies
✏️ Tip: You can customize this high-level summary in your review settings.