Multimedia collaboration with Gemini to create visual storyboards for children on the autism spectrum.
StepPrep is a multimodal application designed to help parents of children on the autism spectrum prepare for upcoming events by generating visual storyboards. Storyboarding helps the child understand what is happening and the steps involved, and allows everyone to anticipate challenges before they arise.
The app leverages Gemini 3's advanced multimodal capabilities (specifically the Gemini Live API for real-time audio/speech interaction, plus image generation) and is built entirely on Google Cloud serverless infrastructure.
You can read more about it in our blog post or view a short video of it in action!
- Frontend: React (Progressive Web App - PWA) - Provides excellent out-of-the-box support for the Web Audio APIs and WebSockets required by the Gemini Live API, matching Google's primary implementation examples. Packaged as a PWA, it offers a native-like mobile experience for parents during the event.
- Backend: Python (FastAPI) deployed on Google Cloud Run - Handles orchestration of AI calls, secure proxying for WebSockets, and business logic. Cloud Run hosts the frontend and backend in separate services, allowing for autoscaling, observability, and production deployment patterns.
- Database: Firebase Firestore - Serverless NoSQL document database to store storyboard states, steps, and metadata. Allows real-time syncing across devices.
- Storage: Google Cloud Storage (GCS) - Stores generated media for future reference via a permanent URL.
- AI/ML: Google Vertex AI / Gemini Live API - For real-time conversational processing of audio input via WebSockets, generating ideation steps, function calling, and creating images. Uses the GenAI SDK against Vertex AI for production-grade deployment, observability, and scaling.
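As a small illustration of the "permanent URL" idea above: a publicly readable GCS object is addressable at a stable HTTPS path. The bucket name and object layout below are assumptions for the sketch, not the app's actual naming scheme.

```python
# Sketch: composing the permanent public URL for a generated storyboard panel.
# Bucket name and object layout are illustrative placeholders.
def panel_url(bucket: str, storyboard_id: str, step_index: int) -> str:
    # Public GCS objects are served at https://storage.googleapis.com/<bucket>/<object>
    return (
        f"https://storage.googleapis.com/{bucket}"
        f"/storyboards/{storyboard_id}/step-{step_index}.png"
    )

url = panel_url("stepprep-media", "abc123", 0)
```

Storing only the bucket and object path in Firestore keeps the document small; the full URL can be recomposed anywhere.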
The workflow is designed to seamlessly weave text, images, and audio together into an easy-to-use, productive, iterative session that produces helpful storyboards.
Input & Ideation (Gemini Live API):
- User engages in a real-time voice conversation describing the event (e.g., "We are going to a restaurant tomorrow for lunch").
- Frontend establishes a WebSocket connection with the Backend, which proxies a secure connection to the Gemini Live API (or connects directly if using secure client-side tokens).
- User and Gemini 3 converse to brainstorm and extract key milestones interactively.
- Gemini 3 generates a structured JSON response (via function calling or structured output) with the steps (Arrive, Host, Seat, Order, Eat, Pay, Leave) and suggested visual themes.
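The structured response in the last step might look like the following. This is a minimal sketch: the field names ("title", "theme", "steps") are assumptions for illustration, not the app's actual schema.

```python
import json

# Illustrative shape of the structured storyboard JSON described above.
SAMPLE_RESPONSE = json.dumps({
    "title": "Lunch at a restaurant",
    "theme": "watercolor",
    "steps": ["Arrive", "Host", "Seat", "Order", "Eat", "Pay", "Leave"],
})

def parse_storyboard(raw: str) -> dict:
    """Validate the model's JSON output before handing it to the frontend."""
    data = json.loads(raw)
    for key in ("title", "theme", "steps"):
        if key not in data:
            raise ValueError(f"missing field: {key}")
    if not isinstance(data["steps"], list) or not data["steps"]:
        raise ValueError("steps must be a non-empty list")
    return data

board = parse_storyboard(SAMPLE_RESPONSE)
print(board["steps"][0])  # "Arrive"
```

Validating the JSON server-side before writing to Firestore keeps malformed model output from ever reaching the UI.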
Review & Customization:
- Frontend displays the steps and themes.
- Parent interactively discusses edits to the steps with Gemini over voice, selects a theme, and approves the storyboard.
Storyboard Generation:
- Backend receives the approved storyboard.
- Backend iterates through steps, calling Gemini 3 / Vertex AI Image Generation to create a storyboard in the selected theme.
- Gemini's thoughts during image generation are displayed in the frontend as it works through the task.
- Media is saved to Cloud Storage.
- Document is updated in Firestore with media URLs.
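The generation loop above can be sketched as follows. The `render` and `upload` callables are stand-ins for the Vertex AI image call and the GCS upload, not the app's real helpers.

```python
from typing import Callable

def generate_storyboard(
    steps: list[str],
    theme: str,
    render: Callable[[str, str], bytes],  # stand-in for the Vertex AI image call
    upload: Callable[[str, bytes], str],  # stand-in for the GCS upload; returns a URL
) -> list[dict]:
    """Render one panel per approved step and collect the media URLs
    destined for the Firestore document update."""
    panels = []
    for i, step in enumerate(steps):
        image = render(step, theme)
        url = upload(f"step-{i}.png", image)
        panels.append({"step": step, "url": url})
    return panels

# Usage with fakes (no cloud calls):
def fake_render(step: str, theme: str) -> bytes:
    return f"{theme}:{step}".encode()

def fake_upload(name: str, data: bytes) -> str:
    return f"https://storage.googleapis.com/demo/{name}"

panels = generate_storyboard(["Arrive", "Order"], "cartoon", fake_render, fake_upload)
```

Injecting the render/upload functions keeps the loop unit-testable without touching Vertex AI or GCS.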
Execution (The Event):
- Each storyboard is assigned a unique ID and is accessible via a permanent URL.
- UI presents the storyboard in a "Step-Through" mode.
- Child/Parent marks items as "Done" with a clear visual indicator.
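The "Step-Through" done-marking could be modeled minimally like this; the document shape is an assumption for illustration, not the app's actual Firestore schema.

```python
# Minimal sketch of Step-Through state: marking panels done and reporting progress.
class StepThrough:
    def __init__(self, steps: list[str]):
        # Each entry mirrors a hypothetical Firestore sub-document per step.
        self.steps = [{"label": s, "done": False} for s in steps]

    def mark_done(self, index: int) -> None:
        self.steps[index]["done"] = True

    def progress(self) -> str:
        done = sum(1 for s in self.steps if s["done"])
        return f"{done}/{len(self.steps)} done"

board = StepThrough(["Arrive", "Host", "Seat"])
board.mark_done(0)
```

Because Firestore syncs in real time, persisting each `done` flag lets both parent and child devices see the same progress indicator.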
```mermaid
flowchart TB
subgraph Frontend [Client-Side / React PWA]
UI[React User Interface]
WebAudio[Web Audio API & Worklets]
end
subgraph GCP [Google Cloud Platform]
subgraph Compute [Serverless Compute]
FastAPI[Python FastAPI Backend]
CloudRun((Google Cloud Run))
FastAPI -.- CloudRun
end
subgraph GenAI [Vertex AI Models]
LiveAPI[Gemini Multimodal Live API<br/>Real-time Audio/Text]
ImageModel[Gemini Nano Banana / Gemini<br/>Storyboard Panels]
end
subgraph Data [Data & Storage]
Firestore[(Firebase Firestore)]
Storage[(Google Cloud Storage)]
end
end
%% Client Interactions
WebAudio <-->|1. PCM Audio WebSockets| FastAPI
UI -->|2. POST / SSE Text Stream| FastAPI
UI -.->|3. Real-time Listen| Firestore
UI -.->|4. Load Generated Images| Storage
%% Backend Interactions
FastAPI <-->|5. Bi-directional WebSockets| LiveAPI
FastAPI -->|6. Generate Panel Requests| ImageModel
ImageModel -->|7. Return Image Bytes| FastAPI
FastAPI -->|8. Upload Assets| Storage
FastAPI -->|9. Save App State & URLs| Firestore
classDef gcp fill:#e3f2fd,stroke:#1e3a8a,stroke-width:2px,color:#1e3a8a;
classDef frontend fill:#f0fdf4,stroke:#047857,stroke-width:2px,color:#047857;
classDef ai fill:#fff1f2,stroke:#be123c,stroke-width:2px,color:#be123c;
classDef db fill:#fef3c7,stroke:#be123c,stroke-width:2px,color:#be123c;
class CloudRun gcp;
class UI,WebAudio frontend;
class LiveAPI,ImageModel ai;
class Firestore,Storage db;
```
- You have a CLI ticketing tool available to you called `tk`. Run `tk --help` for instructions on how to use it to plan tasks, mark them in progress, update them, and mark them done as needed.
- The current directory is initialized with a Python virtual environment via `uv`. Please always use this virtual environment (or `uv`) for executing and installing any Python libraries.
- If you need to pick a Google Cloud region, please use us-central1.
- The project does not have any services enabled, you can use gcloud commands to enable and deploy services as needed.
- When using Firestore, please use the '(default)' database to make use of the free tier provided by Firestore.
- If you need to run Docker locally, we are using Colima, a Docker Desktop workalike. Colima is already running and should accept native docker commands.
This project features an automated build pipeline based on Google's Cloud Build system and its native integrations with GitHub. A push to the 'main' branch will trigger a build guided by custom cloudbuild.yaml files to perform either a terraform plan or apply as desired.
This project follows the pattern established in https://github.com/jeffbryner/gcp-cloudrun-adkwebv2 but uses a single, pre-established GCP project meant to house the production instance for rapid iteration.
To get started, bootstrap the repo into a GCP Cloud Build pipeline:
- fork the repo, clone locally and operate in the main branch
- set the variables in the .tfvars files (use .tfvars.example as a guide)
- open a shell in cicd/prod
- render the backend.tf file inert (we don't have a state bucket yet) by renaming it to backend.tf.inert
- run `terraform init` to initialize terraform and providers
- run `terraform plan -target=module.gcp_project_setup` to check the bootstrap build plan
- run `terraform apply -target=module.gcp_project_setup` to bootstrap the project and build pipeline
Note that terraform may not complete due to some chicken-and-egg problems:
- Some services may not complete activation. Solution: wait a bit to allow activation and retry.
- Authorization: if you do not have the Google Cloud Build app for GitHub installed, you'll need to follow the steps below.
You will need to authorize the Google Cloud Build app to access your GitHub repo. You can use a URL like this, substituting your project, to allow access: https://console.cloud.google.com/cloud-build/triggers;region=global/connect?project=123456789
Clicking it will take you to GCP to complete the authorization.
Before turning things over to the CICD pipeline, you will need to set the state bucket:
Rename backend.tf.inert to backend.tf to enable state to be stored in the bucket created in the bootstrap step.
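After the rename, backend.tf typically holds a GCS backend block along these lines; the bucket name and prefix here are placeholders for this sketch, not the project's actual values:

```hcl
terraform {
  backend "gcs" {
    bucket = "YOUR-STATE-BUCKET" # the bucket created in the bootstrap step
    prefix = "prod"              # state object prefix; adjust as desired
  }
}
```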
Then re-init terraform to allow it to transfer state to GCS. From the /cicd/prod directory:

```shell
terraform init -force-copy -backend-config="bucket=<name of the bucket from terraform output>"
```
Lastly, to avoid terraform vars ending up in the repo AND to allow our CICD pipeline to use the terraform state in the bucket, we will add variables to the 'tfvars' Google Cloud secret.
Create a text file with the following variables (don't include the <> brackets, but do enclose the values in quotes):
```hcl
project_name = "<your full name for the project>"
github_org   = "<your github org name>"
github_repo  = "<the github repo you want to pull code from>"
bucket       = "<name of the bucket from terraform output>"
```
In the GCP console for Secret Manager (https://console.cloud.google.com/security/secret-manager), upload this file as a new 'version' of the 'tfvars' secret. This will be used by Cloud Build at build time. Note: technically the bucket isn't a real terraform variable, but we store it here harmlessly to avoid having to manage an extra secret for a single value.
Triggers for Cloud Build can be created directly in the Cloud Build console, allowing you to choose whether you'd like the triggers to run automatically, manually, or with approval. Reference the /prod/cloudbuild.yaml file for a trigger that runs terraform plan, and the cloudbuild-apply.yaml file for terraform apply.
If you'd rather avoid terraform, you can ensure your Google Cloud project has all the necessary services enabled for the backend, Gemini generation, and future deployment by running the following gcloud commands.
Ensure your CLI is pointed to the correct project:
```shell
gcloud config set project prj-project-name
```

This single command will enable all the foundational APIs required by our architecture (Vertex AI, Cloud Run, Cloud Storage, Firestore, and Cloud Build):

```shell
gcloud services enable \
  aiplatform.googleapis.com \
  run.googleapis.com \
  cloudbuild.googleapis.com \
  storage.googleapis.com \
  firestore.googleapis.com \
  firebase.googleapis.com
```

If you haven't already created the default Firestore database for the project, you can initialize it with this command (you can change --location to your preferred region, like us-east1 or europe-west1):

```shell
gcloud firestore databases create --location=us-central1 --type=firestore-native
```

Once these services are enabled, ensure you have project owner permissions; your local backend will then have the correct permissions to interact with Vertex AI (Gemini/Imagen), Cloud Storage, and Firestore using your Application Default Credentials (`gcloud auth application-default login`).
Since we are using uv for package management, you can start the backend by running the following commands in your first terminal:
```shell
cd backend

# Install dependencies into a virtual environment using uv
uv venv
source .venv/bin/activate
uv pip install -r requirements.txt

# Ensure you have your Google Cloud credentials set up.
# You can authenticate via the gcloud CLI if you haven't already:
# gcloud auth application-default login
# gcloud config set project <YOUR_PROJECT>

# You will need a Gemini API key to be able to reliably create images.
# Get one via aistudio.google.com and either place it in a local .env file as
# GEMINI_IMAGE_API_KEY=AI......
# or add it as the 'latest' version of the secret in Secret Manager.
# Terraform will have created the secret, but not its value. You can add it via:
# https://console.cloud.google.com/security/secret-manager

# Start the server
uvicorn main:app --reload --port 8000
```

The backend will now be running at http://localhost:8000.
In your second terminal window, run the following to start the React development server:
```shell
cd frontend

# Install the Node.js dependencies
npm install

# Start the Vite development server
npm run dev
```

The frontend should now be running at http://localhost:5173.
Open http://localhost:5173 in your browser. You should be greeted by the LiveChat UI where you can test the microphone and real-time voice connection to the Gemini Live API!

