diff --git a/.gitignore b/.gitignore index 7595c72..54eca50 100644 --- a/.gitignore +++ b/.gitignore @@ -64,6 +64,17 @@ assets/final/*.mp4 assets/words/ assets/chained/ +# Phase 4 corpus — video bytes + per-clip pose JSON are huge; we only +# track the manifest JSON and the FAISS index file. +assets/corpus/openasl/ +assets/corpus/openasl_poses/ +assets/corpus/openasl_embeddings.npy +assets/corpus/aslcitizen/ +assets/corpus/aslcitizen_poses/ +assets/corpus/aslcitizen_embeddings.npy +# Phase 4 WLASL pose-library fallback output +assets/pose_library/ + # Pipeline stage disk cache (regenerated on first run) data/cache/ diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..d477d19 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,196 @@ +# CLAUDE.md — Working in this repository + +This file gives any AI assistant (Claude Code, Cursor, etc.) the +minimum context needed to make good edits here. **Read it once per +session, then defer to the canonical docs it points to.** + +--- + +## What this project is + +GenASL is an AI pipeline that produces a **3D ASL interpreter avatar** +overlay for YouTube videos. It mimics how a human interpreter works: +listen → analyse emotion + prosody → decide signing strategy with an +LLM → drive a Ready Player Me VRM avatar in the browser via three.js. + +> **Status:** prototype in build-out. Phases 1–3 are shipped; +> Phases 4–7 are pending and have detailed plans under `docs/plan/`. + +--- + +## Read these before editing + +In this order: + +1. **[`README.md`](README.md)** — what the project does and how to run it. +2. **[`docs/architecture-overview.md`](docs/architecture-overview.md)** — canonical technical reference. +3. **[`docs/plan/README.md`](docs/plan/README.md)** — implementation roadmap; if you're working a specific phase, also read the matching `docs/plan/phase-N-*.md`. +4. 
**[`business/feasibility-study/01-technology-feasibility.md`](business/feasibility-study/01-technology-feasibility.md)** — why this architecture and not the others. + +If those four contradict this file, **the docs win**; flag the +inconsistency and ask before reconciling. + +--- + +## Non-negotiable invariants + +These come from the feasibility study and the user's explicit instructions. +Violating them invalidates the work. + +1. **No word-level ASL output.** Word-level gloss is a *valid internal + representation* inside `AslPlanSegment.sign_sequence`, but it is + **never surfaced to the user**. The Chrome extension never shows + gloss text. We do not ship the old WLASL clip-stitching pipeline. + +2. **Phrase-level retrieval-augmented, not pure generative.** Tightened + on 2026-05-24: every output segment's motion comes from a Deaf-signer + recording, *and the default tier is a continuous clip retrieved at + phrase level* from `assets/corpus/openasl/` (with ASL Citizen as a + lexical secondary). Per-gloss WLASL stitching from + `assets/pose_library/` is the last-resort fallback, always tagged + `fidelity="stitched"` (or `"degraded"` if > 50% of glosses miss). + AI orchestrates known-good primitives; generative steps only fill + *transitions* and *NMM augmentation on top of* the retrieved face. + If a phase implementation makes this invariant un-verifiable after + the fact, the phase plan is wrong — flag it before shipping. + +3. **Platform-agnostic and platform-pays.** The B2B monetization model + is platforms paying for the SDK, not end users paying for access. + Do not add consumer paywalls or restrict accessibility behind a + user-tier gate. Free for Deaf-led orgs is non-negotiable. + +4. **Per-stage disk cache or it doesn't ship.** Every pipeline stage + subclasses `Stage[InT, OutT]` from `src/pipeline/stages/base.py` + and implements a deterministic `fingerprint()`. Reruns must be + JSON-read fast. + +5. 
**Pydantic models, not dicts, between stages.** The schema in + `src/pipeline/models.py` is authoritative; new fields land there. + Bump `schema_version` only on a breaking change to `AvatarRenderPlan` + (current target: `5.1` once Phase 5 lands with the retrieval + metadata fields). + +6. **Market expansion, not substitution.** GenASL serves the underserved — content that today has no ASL at all because human interpretation isn't economically viable for it. Human interpreters remain the gold standard for live, high-stakes, nuanced settings, and broader ambient ASL exposure created by GenASL increases demand and visibility for their work. Public-facing copy must reflect this: we expand the pie, we don't take a slice from interpreters. + +--- + +## Repository layout (essential bits only) + +``` +src/ +├── api/server.py # /health, /asl/avatar +├── audio/ +│ ├── source_video.py # yt-dlp source MP4 (Stage 1 input) +│ └── ... # Phase 2 lands extractor, asr, prosody, emotion, analyzer +├── interpreter/ # Phase 3 — chunker, prompt, planner +├── avatar/ # Phase 4–5 — retrieval, pose extractor, vrm retarget, +│ # motion synth, NMM, vrm schema +├── core/ +│ ├── config.py # Pydantic Settings; get_settings() singleton +│ ├── paths.py # all filesystem paths +│ ├── ffmpeg.py # find_ffmpeg / find_ffprobe +│ └── logging.py +├── llm/providers/ # Ollama / Gemini / OpenAI; one chat() method +├── pipeline/ +│ ├── models.py # v5.0 Pydantic schema (authoritative) +│ ├── pipeline_avatar.py # InterpreterAvatarPipeline orchestrator +│ ├── run_pipeline.py # CLI entry +│ ├── io.py # save_avatar_plan + print_summary +│ └── stages/ +│ ├── base.py # Stage[InT, OutT] ABC + cache +│ └── ... 
# concrete stages land per phase plans +chrome-extension/ # MV3; Phase 6 wires three.js + VRM +docs/{architecture-overview, plan/, ...} +business/{README, feasibility-study/} +``` + +--- + +## Common commands + +```bash +# Tests +pytest tests/ -v + +# Run the pipeline CLI on a YouTube video ID +python -m src.pipeline.run_pipeline 31y2Bq1RYQA + +# Run the local API server +python -m src.api.server # http://127.0.0.1:8794 +curl http://127.0.0.1:8794/health +``` + +`config.yaml` (root) overrides Pydantic defaults from `src/core/config.py`. +API keys (`GEMINI_API_KEY`, `OPENAI_API_KEY`) come from the environment, +never from config. + +--- + +## Conventions + +- **Stages live in `src/pipeline/stages/.py`**, one class per + file, `name` class-var = snake_case matching the filename. +- **Domain logic** (the heavy lifting a stage delegates to) goes under + `src/{audio,interpreter,avatar}/` so stages stay thin and testable. +- **Tests** mirror module paths: `tests/test_.py`. New stage + tests follow `tests/test_stage_cache.py`. Integration smoke tests + follow `tests/test_avatar_pipeline_bootstrap.py`. +- **LLM access** goes through `src.llm.providers.make_provider`. + Never import `openai` directly outside the providers dir. +- **Paths** import from `src.core.paths`, never re-derive with + `Path(__file__).parents[N]`. +- **Heavy library imports** (faster-whisper, librosa, mediapipe) are + lazy — inside functions, not at module top-level — so importing a + module is free for tests that don't exercise it. +- **One-line module docstrings** on the first line stating purpose + and phase of origin. + +--- + +## What NOT to do + +- ❌ Resurrect the gloss pipeline. v4.0 schema, `Pipeline` class, + `compose_pip`, `transcript_ingestion`, and the WLASL clip-chaining + code are gone deliberately. Git history preserves them; don't + cherry-pick back into the active tree. +- ❌ Build a consumer payment tier or premium toggle. Platforms pay. 
+- ❌ Add a `mode` toggle returning to word-level output. There is one + pipeline mode now. +- ❌ Ship a pure-neural sign synthesiser (SignDiff/T2S-GPT style) + without the retrieval anchor. The corpus is the moat. +- ❌ Auto-install dependencies, modify `cookies.txt`, or commit secrets. + `cookies.txt` is tracked but session-refresh diffs to it should be + reverted, not pushed. +- ❌ Edit `src/pipeline/models.py` shapes without bumping + `schema_version` if it would break the extension's JSON consumer. +- ❌ Skip the `fingerprint()` on a new stage. "It's just a prototype" + is not an excuse; cache invariants are load-bearing. + +--- + +## When something is unclear + +1. Check `docs/architecture-overview.md` — it's the canonical reference. +2. Check the matching `docs/plan/phase-N-*.md` for the phase you're in. +3. Check the feasibility study under `business/feasibility-study/` + for the *why*. +4. If still unclear, leave a `# TODO(phaseN-clarify):` comment and a + brief note in the phase doc's **Open questions** section. Ship the + rest; don't block. + +--- + +## Phase status (mirror of `docs/plan/README.md`) + +| Phase | Status | +|-------|--------| +| 1 — Bootstrap | **Done** | +| 2 — Audio backbone | **Done** | +| 3 — Interpreter brain | **Done** | +| 4 — Corpus retrieval (OpenASL + ASL Citizen; WLASL fallback) | Pending | +| 5 — Motion synthesis (retrieval-driven) + NMM | Pending | +| 6 — Chrome extension VRM | Pending | +| 7 — API + end-to-end | Pending | + +When you ship a phase, update **both** this table and +`docs/plan/README.md`. 
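The per-stage cache invariant in CLAUDE.md requires that every stage subclass `Stage[InT, OutT]` and implement a deterministic `fingerprint()`. The sketch below illustrates that invariant. It is not the contents of `src/pipeline/stages/base.py`, which this diff does not show. The following are all assumptions: the `process`, `to_json`, `from_json`, `cache_key` and `run` methods, the truncated SHA-256 key, and the `<cache_dir>/<name>/<key>.json` layout. Plain JSON stands in for the repo's Pydantic models so the example runs without dependencies.

```python
"""Hypothetical sketch of a fingerprinted, disk-cached stage; not the repo's base.py."""
from __future__ import annotations

import hashlib
import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, ClassVar, Dict, Generic, TypeVar

InT = TypeVar("InT")
OutT = TypeVar("OutT")


class Stage(ABC, Generic[InT, OutT]):
    """Calls process() once per distinct fingerprint; later runs read cached JSON."""

    name: ClassVar[str]  # snake_case, matching the stage's filename

    def __init__(self, cache_dir: Path) -> None:
        self.cache_dir = cache_dir

    @abstractmethod
    def fingerprint(self, inp: InT) -> Dict[str, Any]:
        """Everything that can change the output (inputs, params, model versions)."""

    @abstractmethod
    def process(self, inp: InT) -> OutT:
        """The expensive work the cache exists to skip."""

    @abstractmethod
    def to_json(self, out: OutT) -> Any: ...

    @abstractmethod
    def from_json(self, data: Any) -> OutT: ...

    def cache_key(self, inp: InT) -> str:
        # sort_keys makes the key independent of dict insertion order.
        blob = json.dumps(self.fingerprint(inp), sort_keys=True)
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()[:16]

    def run(self, inp: InT) -> OutT:
        path = self.cache_dir / self.name / f"{self.cache_key(inp)}.json"
        if path.exists():  # cache hit: a JSON read, no recomputation
            return self.from_json(json.loads(path.read_text(encoding="utf-8")))
        out = self.process(inp)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(self.to_json(out)), encoding="utf-8")
        return out


class WordCount(Stage[str, int]):
    """Toy stage used only to exercise the cache."""

    name = "word_count"
    calls = 0  # counts real process() invocations

    def fingerprint(self, inp: str) -> Dict[str, Any]:
        return {"text": inp, "version": 1}

    def process(self, inp: str) -> int:
        WordCount.calls += 1
        return len(inp.split())

    def to_json(self, out: int) -> Any:
        return out

    def from_json(self, data: Any) -> int:
        return int(data)


with tempfile.TemporaryDirectory() as tmp:
    stage = WordCount(Path(tmp))
    first = stage.run("hello deaf world")
    second = stage.run("hello deaf world")  # served from disk
    print(first, second, WordCount.calls)  # → 3 3 1
```

The key hashes a sorted-key JSON dump of the fingerprint. Anything left out of `fingerprint()`, such as a model version or a threshold, will therefore silently serve stale output. That is why the invariant treats the method as load-bearing.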
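Invariant 2's tier order and fidelity tagging can be sketched as a pure function. Only two details come from CLAUDE.md: the tier order (phrase-level clip, then lexical secondary, then per-gloss stitching) and the `"stitched"` / `"degraded"` labels with their > 50% miss threshold. The `"phrase"` and `"lexical"` labels, the `Retrieval` record, and the lookup callables are hypothetical placeholders for the real retrieval code.

```python
"""Hypothetical sketch of the retrieval tier order and fidelity tagging in invariant 2."""
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical lookup signatures: return None when the tier has no match.
PhraseLookup = Callable[[Sequence[str]], Optional[str]]
LexicalLookup = Callable[[Sequence[str]], Optional[List[str]]]


@dataclass
class Retrieval:
    fidelity: str  # "phrase"/"lexical" are made-up labels; "stitched"/"degraded" are from CLAUDE.md
    clips: List[str]  # clip ids, in signing order
    missing: List[str] = field(default_factory=list)  # glosses with no clip anywhere


def retrieve(
    glosses: Sequence[str],
    phrase_lookup: PhraseLookup,
    lexical_lookup: LexicalLookup,
    gloss_library: Dict[str, str],
) -> Retrieval:
    # Tier 1 (default): one continuous phrase-level clip for the whole segment.
    clip = phrase_lookup(glosses)
    if clip is not None:
        return Retrieval("phrase", [clip])
    # Tier 2: lexical secondary (ASL Citizen in the plan).
    clips = lexical_lookup(glosses)
    if clips is not None:
        return Retrieval("lexical", clips)
    # Tier 3 (last resort): per-gloss stitching, always tagged.
    found = [gloss_library[g] for g in glosses if g in gloss_library]
    missing = [g for g in glosses if g not in gloss_library]
    fidelity = "degraded" if len(missing) > 0.5 * len(glosses) else "stitched"
    return Retrieval(fidelity, found, missing)


library = {"HELLO": "wlasl_0001", "NAME": "wlasl_0042"}
print(retrieve(["HELLO", "MY", "NAME"], lambda g: None, lambda g: None, library).fidelity)  # stitched
print(retrieve(["HELLO", "WHAT", "YOUR", "JOB"], lambda g: None, lambda g: None, library).fidelity)  # degraded
```

The first call misses one of three glosses (33%), so it stays `"stitched"`. The second misses three of four (75%) and crosses the threshold into `"degraded"`. Exactly half missing still counts as `"stitched"` because the rule is strictly greater than 50%.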
diff --git a/README.md b/README.md index 03e0107..133b9f8 100644 --- a/README.md +++ b/README.md @@ -153,8 +153,8 @@ that any contributor (human or AI) can pick up a phase cold: | Phase | What it delivers | Status | |---|---|---| | [1 — Bootstrap](docs/plan/phase-1-bootstrap.md) | Config sections, v5.0 schema, skeleton, mode toggle | **Done** | -| [2 — Audio backbone](docs/plan/phase-2-audio-backbone.md) | Whisper + librosa + emotion → `AudioAnalysis` | Pending | -| [3 — Interpreter brain](docs/plan/phase-3-interpreter-brain.md) | LLM persona producing `AslPlanSegment` | Pending | +| [2 — Audio backbone](docs/plan/phase-2-audio-backbone.md) | Whisper + librosa + emotion → `AudioAnalysis` | **Done** | +| [3 — Interpreter brain](docs/plan/phase-3-interpreter-brain.md) | LLM persona producing `AslPlanSegment` | **Done** | | [4 — Pose library](docs/plan/phase-4-pose-library.md) | Mediapipe → per-gloss joint-angle JSON | Pending | | [5 — Motion synthesis + NMM](docs/plan/phase-5-motion-synthesis.md) | Retrieve + spline + prosody-driven NMM | Pending | | [6 — Chrome extension VRM](docs/plan/phase-6-chrome-extension-vrm.md) | three.js + @pixiv/three-vrm in PiP | Pending | diff --git a/business/01-executive-summary.md b/business/01-executive-summary.md index f7ea4d0..e1950c3 100644 --- a/business/01-executive-summary.md +++ b/business/01-executive-summary.md @@ -1,27 +1,45 @@ # 1 — Executive Summary -> **The verdict in one line:** GenASL is a feasible and innovative project with a real business path — **if** it pivots from "ASL replacement for captions" to **"ASL augmentation layer for regulated video content and ASL learners,"** and prioritizes Deaf-community co-design before any paid GTM. 
+> **The verdict in one line:** GenASL is a feasible, innovative, and business-viable +> accessibility-infrastructure play — **if** it ships the *retrieval-augmented, +> grammar-aware* ASL avatar it has committed to, sells to **platforms** (not Deaf +> viewers), and makes Deaf-community co-design the first hire rather than the last check. --- -## The opportunity in three facts +## The committed approach in one paragraph + +A production GenASL ingests speech, chunks it on prosody and clause boundaries, translates +to an ASL *plan* with an LLM (gloss + topic-comment structure, classifiers, role shifts, +question/negation flags — **internal only, never shown to a user**), and drives a rigged +VRM avatar with motion that is **anchored to real Deaf-signer recordings**. The default +tier retrieves a *continuous clip at the phrase level* from a real corpus; a lexical +secondary covers gaps; per-gloss stitching is a tagged last resort. Generative steps fill +*only* transitions and the non-manual-marker (NMM) channel synthesised from prosody. The +result is a **platform-agnostic ASL track** — a JS SDK any video player embeds, billed B2B +per minute. This is the "middle" of ASL: more than word clips, short of pure neural +synthesis. It needs clean data, compute, and Deaf partnership — and that cost *is* the moat. + +--- + +## The opportunity in three facts (refreshed May 2026) | | Fact | Source | |---|------|--------| -| 1 | **~48 million** US adults report hearing loss; ~2M are functionally Deaf; ~500k–1M use ASL as a primary language. Worldwide, the WFD estimates **~72M Deaf signers**. | [NIDCD](https://www.nidcd.nih.gov/health/statistics/quick-statistics-hearing); [WFD](https://wfdeaf.org/) | -| 2 | **Digital accessibility lawsuits hit 4,187 in 2024**, pacing **+37% in 2025**, with settlements **$10k–$75k per violation**. ADA Title II compliance deadline for state/local gov is **April 24, 2026**. The EU Accessibility Act began enforcement **June 28, 2025**. 
| [Deque](https://www.deque.com/blog/companys-videos-sued-ada-noncompliance/); [3Play Media](https://www.3playmedia.com/blog/european-accessibility-act-eaa/) | -| 3 | The **closed-captioning market is ~$2.5B in 2025**, projected to ~$8B by 2033 at **~15% CAGR**. North America is ~40% of the global market. ASL is the next compliance frontier as captions become commoditized. | [GlobalGrowthInsights](https://www.globalgrowthinsights.com/market-reports/captioning-and-subtitling-market-111936) | +| 1 | **~70M Deaf signers worldwide** (WFD), across 300+ sign languages. In the US: ~500k–1M primary ASL users, ~6.4–7.0M total signers (~2.8% of adults), ~2M functionally Deaf, ~48M with some hearing loss. | [WFD](https://wfdeaf.org/); [ASL Bloom](https://www.aslbloom.com/blog/how-many-people-use-asl); [NIDCD](https://www.nidcd.nih.gov/health/statistics/quick-statistics-hearing) | +| 2 | **The compliance runway moved toward us.** The ADA Title II web deadline was **extended to April 26, 2027** (≥50k pop.) / **2028** (smaller) by a DOJ interim final rule effective April 20, 2026 — which explicitly cites *the limits of current AI to remediate accessibility at scale*. The EU Accessibility Act has been live across 27 states since **June 28, 2025** and names sign-language interpretation for audiovisual media. | [Federal Register](https://www.federalregister.gov/documents/2026/04/20/2026-07663/extension-of-compliance-dates-for-nondiscrimination-on-the-basis-of-disability-accessibility-of-web); [3Play — EAA](https://www.3playmedia.com/blog/european-accessibility-act-eaa/) | +| 3 | **Digital-accessibility litigation rebounded** to ~3,900 filings in 2025 (+24% YoY), and **sign-language-specific markets are growing 8–20% CAGR** — interpretation services ~$0.89B (2026) → $1.72B (2034); translation software ~$0.5–1.2B (2026) → $2.5–4.5B (2033). 
| [EcomBack](https://www.ecomback.com/annual-2025-ada-website-accessibility-lawsuit-report); [Business Research Insights](https://www.businessresearchinsights.com/market-reports/sign-language-interpretation-services-market-112737) | --- ## What GenASL does well (and doesn't) -| Strength | Weakness | -|----------|----------| -| **Hybrid retrieval architecture** (LLM gloss + WLASL clips) is cheaper, more deterministic, and easier to QA than pure neural avatar synthesis. | **Word-level gloss is not real ASL.** It lacks ASL grammar (topic-comment structure, classifiers, non-manual markers). Native Deaf signers will reject it for primary consumption. | -| **Browser overlay** is the right distribution surface — it meets users on the platforms they already use (YouTube), instead of forcing them to a destination site. | **WLASL has known label-quality issues**, and 2,000 glosses ≈ a fraction of conversational ASL vocabulary. Coverage will be a persistent ceiling. | -| **Provider-agnostic LLM layer** (Ollama, Gemini, OpenAI) means enterprises can self-host — a real wedge against incumbents like 3Play that require cloud. | **Single-platform (YouTube) + dependency on `youtube-transcript-api`** is fragile. Any TOS change breaks distribution. | -| **Pipeline architecture is clean** (recent refactor to a staged `Pipeline` class) — readable, testable, well-documented. | **No Deaf-community validation yet.** Sprint docs explicitly mark this as a student PoC. This is the most important blocker for monetization. | +| Strength | Weakness / open risk | +|----------|----------------------| +| **Retrieval anchoring to Deaf-signer recordings** bounds the failure modes that sink pure-neural avatars (no six-fingered hands), and produces an *auditable* artifact a compliance officer can defend. | **Clean data is the hard part.** Phrase-level retrieval needs a curated, consented, NMM-annotated corpus. 
Public datasets (OpenASL, ASL Citizen) are the floor; the proprietary corpus is a multi-quarter, paid-Deaf-signer effort. | +| **Grammar-aware plan stage** encodes topic-comment, classifiers, and NMMs as explicit labels — the structure word-level systems can't represent. | **Idiomatic / classifier-heavy / narrative ASL is fundamentally generative**, not lexical. Poetry and storytelling stay out of scope for years; this must be disclosed, not hidden. | +| **Platform-agnostic SDK** meets viewers on the platforms they already use and removes single-platform (YouTube) dependency. | **Incumbent risk is now live.** [Sorenson acquired Hand Talk + OmniBridge](https://sorenson.com/newsroom/sorenson-acquires-omnibridge-and-hand-talk-to-develop-automated-sign-language-translation-capabilities/) and is demoing ASL avatars. The window is ~24 months. | +| **Per-stage cached, Pydantic-typed pipeline** (Phases 1–3 shipped: audio backbone + interpreter brain) is real, testable, and reproducible — not a slide. | **No Deaf-community validation yet.** This is the single most important blocker for monetisation and the first gate in the plan. 
| --- @@ -29,36 +47,39 @@ | Layer | Definition | Size | |-------|------------|------| -| **TAM** | Global video accessibility tools (captioning, audio description, sign language, transcription) | **~$3.0B in 2026**, growing to ~$8B by 2033 | -| **SAM** | English-speaking markets requiring ASL/BSL for regulated digital video (US, UK, CA, AU, IE) | **~$650M** addressable in 2026 | -| **SOM** | Realistic 5-year capture: 0.5% of SAM through education + mid-market enterprise + creator tools | **~$15–25M ARR by year 5** | +| **TAM** | Global video-accessibility tooling (captioning, audio description, sign language, transcription) | **~$3.5–4B in 2026**, ~$8–10B by early 2030s | +| **SAM** | English-speaking regulated digital video (US, UK, CA, AU, IE), ASL/BSL slice | **~$750M in 2026**, ~$1.8B by 2030 | +| **SOM** | Realistic 5-year capture via platform-pays B2B | **~$22M ARR by Year 5** (~1–3% of SAM) | +| **Induced** | Net-new ASL-content market the tool *creates* (see [F3](feasibility-study/03-market-expansion.md)) | **~+$4.5B/yr by 2035** (~3× baseline) | --- ## The product evolution path -GenASL today is a **demo**. The path to a defensible business has three rungs: +GenASL today is a **working pipeline through Phase 3**.
The path to a defensible business +runs through data and trust, not features: ``` ┌─────────────────────────────────────────────────────────────┐ - │ YEAR 1 → EDUCATION WEDGE │ - │ K-12 + community college ASL learners; Chrome extension │ - │ freemium + $9/mo individual; B2B school district pilot │ - │ Gross profit goal: break-even on infra; learn product │ + │ M0–M6 → FOUNDATION & DATA │ + │ Deaf advisory board + first Deaf hire; corpus from public │ + │ sets (OpenASL/ASL Citizen) + first proprietary capture; │ + │ Phases 4–5 (retrieval + motion synth) land │ + │ Goal: intelligibility ≥ 3.5/5 on a Deaf-rater panel │ └─────────────────────────────────────────────────────────────┘ ↓ ┌─────────────────────────────────────────────────────────────┐ - │ YEAR 2 → ENTERPRISE AUGMENTATION LAYER │ - │ LMS, MOOC, gov portal video — ASL-on-top-of-captions │ - │ $0.40–$1.20/min pricing, audited compliance reports │ - │ Self-hosted option for regulated buyers │ + │ M6–M18 → PLATFORM SDK + FIRST PAID CONTRACTS │ + │ Phases 6–7 (VRM extension + API); platform-agnostic SDK; │ + │ compliance reporting (WCAG/EAA/508); 3–5 paid pilots │ + │ Goal: $300k+ ARR; SOC 2 Type I; panel ≥ 3.8/5 │ └─────────────────────────────────────────────────────────────┘ ↓ ┌─────────────────────────────────────────────────────────────┐ - │ YEAR 3+ → GENERATIVE ASL PLATFORM │ - │ Deaf-led co-design; sentence-level synthesis; │ - │ white-label SDK for creators, EdTech, telehealth │ - │ Defensible moat: certified ASL corpus + community trust │ + │ M18–M24+ → SCALE │ + │ 10+ platform contracts; Tier-3 strategic pipeline; │ + │ BSL/AUSLAN reuse of the same architecture │ + │ Goal: ~$2M ARR run-rate; Series A; panel ≥ 4.0/5 │ └─────────────────────────────────────────────────────────────┘ ``` @@ -66,31 +87,51 @@ GenASL today is a **demo**. 
The path to a defensible business has three rungs: ## Why this is innovative -Existing AI sign-language tools fall into two camps and both have problems: +Existing AI sign-language tools fall into camps that each hit a wall: | Camp | Examples | Limitation | |------|----------|------------| -| **Pure avatar synthesis** | Signapse, SignAvatar, Hand Talk's Hugo | High-effort 3D avatar; Deaf community pushback on lack of facial grammar; expensive to render | -| **Translation-as-a-service** | SignAll, Sorenson AI | Heavy ML stack; cloud-only; designed for interpreting, not media | +| **Word/clip retrieval** | Old GenASL PoC; Hand Talk clip mode | No grammar, no NMMs; not real ASL | +| **Notation-driven avatar** | JASigning (HamNoSys), Paula | Every sign hand-authored by linguists; doesn't scale | +| **MoCap playback** | Signapse (Kara avatar) | Vocabulary bounded by what was captured; coverage scales linearly with studio time | +| **End-to-end neural** | SignDiff, T2S-GPT; Sorenson's text-to-sign POC | BLEU-4 still in the teens; hallucinated handshapes; uncanny faces | -**GenASL is the first credible attempt to be a *browser-native overlay* using a *retrieval-augmented* approach.** That makes it cheaper to ship, easier to audit, and uniquely positioned for the regulated-video market where deterministic outputs are a feature, not a bug. +**GenASL's lane is the uncontested fifth: retrieval-augmented + parallel-NMM + SDK +distribution + platform-pays.** It is cheaper to QA, defensible by corpus ownership, and +the only approach that simultaneously clears fidelity, Deaf-acceptance, and auditability +bars (full scoring in [03-competitive-landscape.md](03-competitive-landscape.md)). --- ## Why this is risky -Three risks dominate. All are surmountable but must be confronted directly. +Three risks dominate; all are surmountable but must be confronted directly. -1. 
**Cultural-acceptability risk.** Research consistently shows Deaf users reject avatars / synthetic ASL that lack non-manual markers and authentic grammar (see [PMC 8866438](https://pmc.ncbi.nlm.nih.gov/articles/PMC8866438/)). The mitigation is co-design and explicit positioning ("ASL augmentation, not interpretation"). -2. **Platform risk.** YouTube can break the transcript API, throttle extensions, or ship native ASL features. The mitigation is multi-platform support (Vimeo, Coursera, Brightcove, Kaltura) and a B2B SDK that runs without YouTube at all. -3. **Coverage risk.** 2,000 glosses ≈ ~70% lexical coverage of common educational content but ~40% of conversational content. The mitigation is corpus expansion via a paid Deaf signer panel — which doubles as a community-trust signal. +1. **Cultural-acceptability risk.** Deaf users reject avatars that lack NMMs and authentic + grammar ([PMC 8866438](https://pmc.ncbi.nlm.nih.gov/articles/PMC8866438/)). Mitigation: + Deaf-led co-design from day 0; explicit "augmentation, not replacement" position; + compensated corpus contributors. This is a gate, not a workstream. +2. **Incumbent / platform-build risk.** Sorenson is moving; a platform could ship native + ASL. Mitigation: move first, win 3+ platform reference logos by month 18, differentiate + on Deaf-trust + auditable corpus + media-overlay (not point-of-service) use case; + acquisition by an incumbent is a legitimate outcome, not only a threat. +3. **Data / coverage risk.** Long-tail vocabulary (medical, legal, technical) and + classifier-heavy ASL are hard. Mitigation: domain-specific capture in later phases; + honest scope disclosure; phrase-level retrieval degrades gracefully (tagged fidelity). -Full risk register in [06-go-to-market-and-risk.md](06-go-to-market-and-risk.md). +Full register in [06-go-to-market-and-risk.md](06-go-to-market-and-risk.md). --- ## Recommendation -**Continue. 
Pivot from "consumer ASL captions" to "education + enterprise compliance augmentation."** The architecture and team are good. The product needs a sharper wedge and a Deaf-community-first validation loop. The market is unambiguously real, mandated by law in two of the world's largest economies, and underserved by current solutions. +**Proceed — on the committed thesis, not the old one.** Build the retrieval-augmented, +grammar-aware avatar; raise a real seed (~$4–5M, not a pre-seed bridge); hire Deaf-first; +sell only to platforms. The architecture is sound, Phases 1–3 are shipped, the corpus is +the moat, and the regulatory runway (extended ADA Title II, live EAA) lands inside the +24-month build window. -See [04-value-proposition.md](04-value-proposition.md) for the product strategy and [06-go-to-market-and-risk.md](06-go-to-market-and-risk.md) for the 24-month operating plan. +**If the four conditions in [§5.2 of the verdict](feasibility-study/05-feasibility-verdict.md) +cannot be met in their time-frames, stop and reorganise as a research / open-source +contribution.** That is a legitimate outcome — and far better than a venture that fails for +the wrong reasons in year 3. diff --git a/business/02-market-analysis.md b/business/02-market-analysis.md index 31d1631..358d750 100644 --- a/business/02-market-analysis.md +++ b/business/02-market-analysis.md @@ -1,12 +1,16 @@ # 2 — Market Analysis -This section sizes the opportunity from three angles: **who needs ASL**, **why someone will pay for it**, and **how big the addressable market actually is**. +This section sizes the opportunity from four angles: **who needs ASL**, **why someone +will pay for it**, **how big the addressable market is**, and — the question the old plan +under-counted — **how much a credible tool *grows* that market**. 
--- ## 2.1 — Population & demand: who actually uses ASL -Sign-language demographics are noisy because surveys conflate three different populations: people with hearing loss, people who are functionally Deaf, and people who use a signed language daily. The numbers below separate them. +Sign-language demographics are noisy because surveys conflate three populations: people +with hearing loss, people who are functionally Deaf, and people who use a signed language +daily. The numbers below separate them and are refreshed to May 2026. ### United States @@ -15,122 +19,184 @@ Sign-language demographics are noisy because surveys conflate three different po | Adults reporting **some hearing loss** | **~48 million** | NIDCD; broadest definition | | **Functionally Deaf** adults | **~2 million** | Cannot hear normal conversation | | **Culturally Deaf** (capital-D, ASL-using community) | **~500,000 – 1,000,000** | Primary-language ASL users | -| Adults claiming **some sign-language knowledge** | **~6.4 – 7.0 million** | ACS 2014 extrapolation; ~83% hearing | -| **ASL learners** (high school + college + adult ed) | **~250,000 – 500,000 active learners/year** | ASL is the 3rd most-studied language in US universities | +| Adults using **some sign language** | **~6.4 – 7.0 million** | ~2.8% of US adults; ~83% hearing | +| **ASL learners** (HS + college + adult ed) | **~250,000 – 500,000 active/year** | ASL is the **3rd most-studied language** in US universities | -Source: [NIDCD](https://www.nidcd.nih.gov/health/statistics/quick-statistics-hearing), [RIT InfoGuides](https://infoguides.rit.edu/c.php?g=380750&p=9393643), [ASL Bloom](https://www.aslbloom.com/blog/how-many-people-use-asl). +Sources: [NIDCD](https://www.nidcd.nih.gov/health/statistics/quick-statistics-hearing), +[ASL Bloom](https://www.aslbloom.com/blog/how-many-people-use-asl), +[RIT InfoGuides](https://infoguides.rit.edu/c.php?g=380750&p=9393643). 
### Global | Population | Estimate | |-----------|----------| | People with disabling hearing loss worldwide (WHO) | ~430 million | -| Deaf signers globally (WFD) | **~72 million** | -| Recognized national sign languages | 200+ (only ~82 with legal recognition as of 2025) | +| Deaf signers globally (WFD) | **~70 million** | +| Sign languages in use | 300+ (only ~82 with legal recognition) | Source: [WFD](https://wfdeaf.org/). ### What this means for GenASL -- **The "primary user" market is small but high-conviction.** ~1M ASL-first users in the US is a niche by mass-consumer standards, but they are highly engaged, advocacy-organized, and legally protected. -- **The "ASL-adjacent" market is 10–15× larger.** ASL learners, families of Deaf children (CODAs), interpreters in training, healthcare workers — these are the populations who will pay for an *imperfect* learning aid where Deaf-native users won't. -- **Globally, ASL is *one* of 200+ signed languages.** GenASL's English+gloss approach generalizes to BSL, AUSLAN, and PSE (Pidgin Signed English). Brazil's Hand Talk has shown that a regional signed-language SaaS can hit 10M+ downloads. +- **The primary-user market is small but high-conviction.** ~1M ASL-first users in the US + is a niche by mass-consumer standards, but they are highly engaged, advocacy-organised, + and legally protected. *They are not the customer — they are the reason the customer pays.* +- **ASL is a growth language.** In the most recent MLA census, ASL was one of only three + languages (with Korean and biblical Hebrew) whose US university enrolments *grew* + (~108k enrolees, 487 programmes). The learner market is expanding while most language + study contracts. +- **The architecture generalises.** English→ASL-plan→retrieval generalises to BSL, AUSLAN, + and other signed languages with their own corpora — a reuse path, not a rebuild. 
--- -## 2.2 — Demand drivers: why someone will write a check +## 2.2 — Demand drivers: why a platform writes a check -Three forces are pulling money into accessible video. GenASL must align with at least one. +Three forces pull money into accessible video. GenASL must align with at least one; it +aligns with all three. -### Driver A — Compliance & litigation +### Driver A — Compliance & litigation (refreshed) -| Lever | Detail | -|-------|--------| -| **ADA Title II deadline** | April 24, 2026 — state and local government websites must meet WCAG 2.1 AA. Video content is in scope. | -| **ADA litigation volume** | 4,187 digital-accessibility lawsuits in 2024, pacing **+37% in 2025**. ~77% target companies with **< $25M revenue** — i.e. the mid-market is the litigation hotspot. | -| **Per-violation settlements** | Typically **$10,000 – $75,000**, plus remediation costs. | -| **EU Accessibility Act (EAA)** | Enforcement began June 28, 2025 across 27 member states. Audiovisual media services must offer captions, audio description, **and sign-language interpretation** for certain content types. | -| **Section 508 / CVAA** | US federal procurement requires accessible video; CVAA covers online video that previously aired on TV. | +| Lever | Detail (May 2026) | +|-------|--------------------| +| **ADA Title II — deadline *extended*** | The web/mobile accessibility deadline moved from April 24, 2026 to **April 26, 2027** (entities ≥50k pop.) and **April 26, 2028** (smaller / special districts), via a DOJ interim final rule effective April 20, 2026. The rule explicitly cites *the limits of current technology, including generative AI, to automate accessibility remediation at scale* — a buying signal as much as a delay. 
([Federal Register](https://www.federalregister.gov/documents/2026/04/20/2026-07663/extension-of-compliance-dates-for-nondiscrimination-on-the-basis-of-disability-accessibility-of-web)) | +| **ADA litigation volume** | After dipping in 2023–24, filings **rebounded to ~3,900 in 2025 (+24% YoY)**; H1 2025 was +37% vs. H1 2024. Settlements run **$10k–$75k per violation** plus remediation. ([EcomBack](https://www.ecomback.com/annual-2025-ada-website-accessibility-lawsuit-report)) | +| **EU Accessibility Act (EAA)** | Live across all 27 member states since **June 28, 2025**. Audiovisual media services must provide captions, audio description, and — where appropriate — **sign-language interpretation**. Penalties vary (Italy: up to 5% of turnover; Germany: up to €100k). ([3Play — EAA](https://www.3playmedia.com/blog/european-accessibility-act-eaa/)) | +| **Section 508 / CVAA** | US federal procurement requires accessible video; CVAA covers online video previously aired on TV. | -Source: [Deque](https://www.deque.com/blog/companys-videos-sued-ada-noncompliance/), [3Play Media — EAA guide](https://www.3playmedia.com/blog/european-accessibility-act-eaa/). +**Why the extension *helps* GenASL.** A deadline already in the past creates remediation +panic that favours quick caption fixes. A deadline in **2027–2028** creates a procurement +*planning* window — exactly the horizon on which a buyer can adopt a new ASL line item, and +exactly the 24 months GenASL needs to ship a defensible product. The DOJ itself flagging +that current AI can't remediate at scale is an invitation to the vendor who can. -### Driver B — Pure quality / UX gap +### Driver B — The quality / UX gap captions don't close -YouTube auto-captions still have a **~30% error rate** on real-world video. For Deaf viewers, that means **1 in 3 words is wrong**. 
AI transcription benchmarks at 80–95% accuracy — below the 99% threshold needed for accessibility-grade output ([Taption](https://www.taption.com/blog/en/video-accessibility-compliance-2025-en)). The captioning industry has not solved this; ASL adds an *additional* channel rather than fixing captions. +YouTube auto-captions still carry a **~30% error rate** on real-world video; AI +transcription benchmarks at 80–95%, below the 99% accessibility threshold +([Taption](https://www.taption.com/blog/en/video-accessibility-compliance-2025-en)). ASL +is not a better caption — it is a *different channel*, and for ~1M primary ASL users it is +the channel in their first language. Captions and ASL are complements, not substitutes. -### Driver C — ASL is the next vertical for the captioning industry +### Driver C — ASL is the next vertical for the accessibility industry -The captioning market is consolidating around 3Play, Verbit, Rev, AI Media. ASL is the next product surface for these incumbents to upsell. GenASL is either: -- a **feature** they'll build internally (acquisition exit), or -- a **specialized layer** that integrates with their pipelines (partnership/SDK play). - -Either way, demand for "ASL on top of existing captioning" is forming, not stagnant. +The captioning market is consolidating around 3Play, Verbit, Rev, and AI Media; ASL is +the next surface to upsell. The defining 2025–26 event proves it: **Sorenson** — the +incumbent VRS provider — [acquired Hand Talk and OmniBridge](https://sorenson.com/newsroom/sorenson-acquires-omnibridge-and-hand-talk-to-develop-automated-sign-language-translation-capabilities/) +and is [demoing AI ASL avatars](https://sorenson.com/newsroom/sorenson-communications-unveils-ai-sign-language-translation-ast-proofs-of-concept/). 
+Demand for "ASL on top of existing access services" is forming, not stagnant — which makes +GenASL either a feature an incumbent builds (acquisition exit) or a specialised layer that +integrates with their pipelines (partnership/SDK play). --- ## 2.3 — Market sizing: TAM / SAM / SOM -### TAM — Total Addressable Market +Analyst figures for "captioning" disagree by an order of magnitude because some scope +*media localisation* (foreign-language subtitling, ~$30B+) and some scope *focused +accessibility captioning* (hundreds of millions to low billions). We use the **focused** +frame and triangulate against **sign-language-specific** reports, which are smaller but +more honest about GenASL's actual market. -The broadest credible frame is **video accessibility tooling**. +### TAM — Total Addressable Market -| Segment | 2025 size | Source | +| Segment | 2026 size | Source | |---------|-----------|--------| -| Closed captioning services (focused) | ~$370M – $2.5B (range across analysts) | [GlobalGrowthInsights](https://www.globalgrowthinsights.com/market-reports/captioning-and-subtitling-market-111936), [DataIntelo](https://dataintelo.com/report/global-closed-captioning-services-market) | -| Captioning + subtitling solutions (broad) | $32B (incl. media localization) | [MRFR](https://www.marketresearchfuture.com/reports/captioning-subtitling-solution-market-28263) | -| **Video accessibility tooling (our blended estimate)** | **~$3.0B in 2026, ~$8B by 2033 (~15% CAGR on focused captioning)** | Synthesis | +| Closed-captioning *services* (focused) | ~$0.6B–$2.5B (analyst range), growing ~10–12% CAGR | [Research Nester](https://www.researchnester.com/reports/captioning-and-subtitling-solutions-market/6638), [Verified Market Reports](https://www.verifiedmarketreports.com/product/closed-captioning-services-market/) | +| Captioning + subtitling *solutions* (broad, incl. 
localisation) | ~$36B in 2026 → ~$66B by 2035 (6.8% CAGR) | [MRFR](https://www.openpr.com/news/4400913/captioning-subtitling-solution-market-is-estimated-to-grow-usd) | +| Sign-language interpretation *services* | ~$0.89B (2026) → $1.72B (2034), 8.5% CAGR | [Business Research Insights](https://www.businessresearchinsights.com/market-reports/sign-language-interpretation-services-market-112737) | +| Sign-language translation *software/tech* | ~$0.5B–$1.2B (2025–26) → $2.5B–$4.5B (2033), 8–20% CAGR | [DataInsights](https://www.datainsightsmarket.com/reports/sign-language-translation-software-1956596) | +| **Video-accessibility tooling (our blended TAM)** | **~$3.5–4B in 2026, ~$8–10B by early 2030s** | Synthesis | -We use the **focused captioning + accessibility-services figure (~$3.0B in 2026)** as TAM because the $32B figure is dominated by localization (foreign-language subtitling), which is not GenASL's market. +We anchor TAM on **focused video-accessibility tooling (~$3.5–4B)** because the $30B+ +localisation figure is not GenASL's market, and the pure sign-language-tech figures +(~$1–1.5B) under-count the captioning budget GenASL prices *against*. ### SAM — Serviceable Addressable Market -GenASL's near-term reachable market is **English-speaking, regulated digital video** in the US, UK, Canada, Australia, and Ireland. +GenASL's near-term reachable market is **English-speaking, regulated digital video** in +the US, UK, Canada, Australia, and Ireland. -**Derivation:** -- North America = ~40% of global captioning demand → ~$1.2B -- UK + AU + IE + CA add ~10% more → ~$1.5B addressable in English-speaking markets -- Of that, the **ASL/BSL slice** is a fraction. Today sign-language services are perhaps 5–8% of the accessibility budget in regulated buyers, but the EAA and ADA Title II are pulling that ratio up.
+- North America ≈ 40% of global captioning demand → ~$1.4B +- UK + AU + IE + CA add ~10% → ~$1.7B addressable in English-speaking markets +- The **ASL/BSL slice** is ~5–8% of accessibility budgets today, but ADA Title II and the + EAA (which *names* sign language) are pulling that ratio up. -**SAM estimate: ~$650M in 2026, growing to ~$1.6B by 2030** as compliance demand expands sign-language line items. +**SAM estimate: ~$750M in 2026, growing to ~$1.8B by 2030** as compliance demand expands +sign-language line items. -### SOM — Realistic 5-year Capture +### SOM — Realistic 5-year capture (platform-pays) -| Year | Segment | Capture | Revenue | -|------|---------|---------|---------| -| Y1 | Education (ASL learners, K-12 pilot) | 5k paid individuals + 3 districts | ~$700k | -| Y2 | + EdTech / LMS partnerships | 10 enterprise contracts | ~$2.4M | -| Y3 | + Mid-market enterprise (training, HR) | 30 contracts | ~$6M | -| Y4 | + Government / public sector | 50 contracts | ~$12M | -| Y5 | + Creator economy (Patreon-tier indie publishers) | Mature mix | **~$18 – $25M ARR** | +| Year | Mix | Revenue | +|------|-----|--------:| +| Y1 | First self-serve SDK pilots | ~$30k | +| Y2 | + first mid-market platform contracts | ~$0.8M | +| Y3 | + Tier-2 platforms scale; first Tier-3 strategic | ~$3.9M | +| Y4 | + public-sector & strategic accounts | ~$10.2M | +| Y5 | Mature platform mix | **~$22M ARR** | -**SOM = ~$15–25M ARR by Year 5 = ~3–4% of SAM.** That is well below Hand Talk's traction in Brazil and consistent with what a focused EdTech/accessibility startup can capture in 5 years with a $5–10M cumulative raise. +**SOM ≈ $22M ARR by Year 5 ≈ 1–3% of SAM** (~3% of the 2026 SAM, ~1% of the 2030 projection) — achievable with ~90 platform contracts on a +$4M seed → $15M Series A path. Full model in +[05-pricing-and-business-model.md](05-pricing-and-business-model.md).
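The TAM/SAM arithmetic above comes down to two operations: compounding a base-year figure at a stated CAGR, and summing regional shares of a global total. A minimal Python sketch using figures quoted in this section; the function names are illustrative and not part of any GenASL module:

```python
def project(value: float, cagr: float, years: int) -> float:
    """Compound a starting market size forward: value * (1 + cagr) ** years."""
    return value * (1.0 + cagr) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Back out the constant annual growth rate linking two market sizes."""
    return (end / start) ** (1.0 / years) - 1.0


def regional_share(tam: float, shares: dict) -> float:
    """Sum the fractions of a global market attributed to each region."""
    return tam * sum(shares.values())


# Sign-language interpretation services: ~$0.89B (2026) at 8.5% CAGR for 8 years.
print(round(project(0.89, 0.085, 8), 2))  # → 1.71 ($B, 2034)

# SAM: ~$750M (2026) -> ~$1.8B (2030); the annual growth rate this implies.
print(round(implied_cagr(750, 1800, 4), 3))

# English-speaking slice of a ~$3.5B focused TAM: North America ~40% plus ~10% more.
print(regional_share(3.5, {"north_america": 0.40, "uk_au_ie_ca": 0.10}))
```

The first line reproduces the interpretation-services projection from the TAM table, the second shows that the SAM trajectory implies roughly 24.5% annual growth, and the third lands at ~$1.75B, in line with the ~$1.7B English-speaking figure.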
--- -## 2.4 — Customer segments ranked by willingness-to-pay +## 2.4 — Induced demand: the tool grows the market + +The biggest correction to the old analysis: the ASL-content market is **not fixed**. When +the marginal cost of adding ASL to a video drops from **$300–800/min** (human interpreter) +to **~$0.10–0.40/min** (retrieval-augmented pipeline), the market for the complement grows — +the same dynamic that expanded encyclopaedias (Wikipedia), video (YouTube hosting), and +language learning (Duolingo) by orders of magnitude. + +Three growth channels (full model in [F3](feasibility-study/03-market-expansion.md)): + +- **A — Latent Deaf demand.** ~1M US primary ASL users abandon the long tail of + YouTube/Coursera/Khan/TED today because nothing offers ASL. A trustworthy track unlocks + them. Even +30 min/day of engagement ≈ ~90M incremental user-hours/year in the US alone. +- **B — Hearing learners.** ASL has no Duolingo; the bottleneck is exposure to real ASL in + everyday content. Conservative Duolingo-style trajectories imply learners growing from + ~250–500k/yr to ~1.5–3M/yr by 2035. +- **C — Content supply.** If creators' marginal cost to add ASL drops to ~$0, ASL inventory + explodes — tens of thousands of hours/day if even 1% of educational/news content gets a + track. + +**Modelled induced-demand wedge: ~+$4.5B/yr by 2035, ~3× the baseline market.** The catch, +which founders must internalise: **user counts grow faster than direct revenue**, because +most new beneficiaries (Deaf viewers, learners) don't pay. GenASL captures the slice +*platforms* reallocate from compliance + engagement budgets. This is the central argument +for platform-pays ([F4](feasibility-study/04-pricing-strategy-comparison.md)). 
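The latent-demand and cost-drop claims above are simple arithmetic. A minimal sketch, assuming the engagement figure counts only the ~1M US primary ASL users and treating the share who actually adopt an ASL track as a free parameter (the `adoption` knob and the function names are illustrative assumptions, not part of the model in F3):

```python
MINUTES_PER_HOUR = 60
DAYS_PER_YEAR = 365


def incremental_user_hours(users: float, adoption: float, minutes_per_day: float) -> float:
    """Annual user-hours added when a share of a population watches extra minutes daily."""
    return users * adoption * minutes_per_day * DAYS_PER_YEAR / MINUTES_PER_HOUR


def cost_ratio(human_per_min: float, pipeline_per_min: float) -> float:
    """How many times cheaper the pipeline is than a human interpreter, per minute."""
    return human_per_min / pipeline_per_min


US_PRIMARY_ASL_USERS = 1_000_000

for adoption in (1.0, 0.5):
    hours = incremental_user_hours(US_PRIMARY_ASL_USERS, adoption, 30)
    print(f"adoption {adoption:.0%}: {hours / 1e6:.1f}M user-hours/year")

# Cost-drop range: cheapest human vs. priciest pipeline, then the reverse.
print(cost_ratio(300, 0.40), cost_ratio(800, 0.10))
```

At 30 min/day, full adoption by ~1M users gives ~182M user-hours/year and half adoption ~91M, so the headline figure depends heavily on how many users switch on an ASL track. The per-minute cost drop spans roughly 750× to 8,000×.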
+ +--- + +## 2.5 — Customer segments ranked by willingness-to-pay | Segment | Pain | WTP | Notes | |---------|------|-----|-------| -| **K-12 / college ASL programs** | Need engaging media; ASL is the 3rd most-studied language; word-level gloss is pedagogically *correct* for learners | **High** ($) | Easiest first market; institutional purchasing | -| **Higher-ed LMS / MOOC platforms** | NAD vs. Harvard/MIT precedent; massive video libraries; compliance ROI | **Very high** ($$$) | Slow sales cycle; long pilots | -| **Government & public-sector portals** | ADA Title II deadline; explicit mandate | **Very high** ($$$) | Procurement friction high | -| **Mid-market corporate training / HR** | EEOC, internal accessibility commitments | **Medium** ($$) | Crowded; need clear ROI vs. 3Play | -| **YouTube creators (long-tail)** | Audience growth, viewer loyalty | **Low** ($) | Won't pay unless free-tier or ad-funded | -| **Deaf-native primary consumers** | Genuine need but Deaf community is rightly skeptical of avatars/synthetic ASL | **Very low** unless co-designed | Critical for credibility, not for revenue | +| **EdTech / LMS / MOOC platforms** | Massive video libraries; Section 508; institutional procurement | **Very high** ($$$) | One integration reaches millions of learners | +| **Government & public-sector portals** | ADA Title II (2027/28); explicit mandate | **Very high** ($$$) | Procurement friction high; deadline now a planning horizon | +| **Streaming / UGC / media platforms** | EAA names sign language; engagement + brand | **High** ($$$) | Competitive parity once one ships ASL | +| **Enterprise publishers (banks, health, training)** | EEOC, brand, internal accessibility | **Medium-High** ($$) | Self-hosted option unlocks regulated buyers | +| **YouTube creators (long-tail)** | Audience growth | **Low** | Reached *through* platform integrations, not billed directly | +| **Deaf-native primary consumers** | Genuine need; rightly skeptical of avatars | **N/A — 
never billed** | Critical for credibility, not revenue; free forever | -The takeaway: **revenue comes from publishers and institutions, not from Deaf end-users.** This is the same economic structure as captioning today. +The takeaway, unchanged but sharpened: **revenue comes from platforms and publishers, +never from Deaf end-users.** This is the same economic structure as captioning today. --- -## 2.5 — Market timing assessment +## 2.6 — Market timing assessment -**It is a good moment to start, but a difficult moment to be late.** +**A good moment to start; a dangerous moment to be late.** | Tailwind | Headwind | |----------|----------| -| ADA Title II deadline (April 2026) creating procurement urgency | LLM costs falling — incumbents may build in-house | -| EAA enforcement (June 2025) opening EU market | Big platforms (YouTube, TikTok) may ship native ASL features | -| Generative AI making sign-synthesis cheaper to prototype | Deaf-community skepticism is rising in tandem with hype | -| Signapse, Hand Talk raising capital → validation | Captioning incumbents (3Play, Verbit) will likely acquire-or-build | - -**Conclusion: the window is ~24 months to establish credibility and a defensible corpus.** After that, distribution will be dominated by incumbents or platform-native features. +| ADA Title II extended to 2027/28 → a real *planning* window for new line items | LLM/synthesis costs falling — incumbents can build in-house | +| EAA live since June 2025, explicitly naming sign language | Sorenson (post-Hand-Talk) shipping ASL avatars with a huge Deaf customer base | +| Sign-language-tech markets growing 8–20% CAGR | A platform (YouTube/Netflix) could ship native ASL | +| DOJ on record that current AI can't remediate at scale → vendor opening | Deaf-community skepticism rises with hype | + +**Conclusion: the window is ~24 months** to establish Deaf-community trust and a defensible +corpus. 
After that, distribution will be dominated by incumbents (Sorenson) or +platform-native features. The corpus and the community relationships are the only assets +that don't evaporate when a better model ships. diff --git a/business/03-competitive-landscape.md b/business/03-competitive-landscape.md index 8fef2f0..42f0884 100644 --- a/business/03-competitive-landscape.md +++ b/business/03-competitive-landscape.md @@ -1,107 +1,158 @@ # 3 — Competitive Landscape -GenASL operates at the intersection of three adjacent markets: **sign-language generation**, **video captioning**, and **ASL education**. Each has different incumbents and different competitive dynamics. +GenASL competes on two planes at once: **which technical approach** to sign production +wins, and **which company** owns distribution. This section maps both, then locates the +white space — which is narrower than it was a year ago. --- -## 3.1 — Direct competitors: AI sign-language generation +## 3.1 — The five technical families + +Sign-language production splits into five technical families. Most products mix two or +three; few are pure. GenASL is the only one committed to the fifth. + +| # | Family | Representative systems | One-line description | +|---|--------|------------------------|----------------------| +| 1 | **Word/clip retrieval** | Old GenASL PoC; Hand Talk clip mode | English → gloss → look up one clip per word → concatenate. No grammar, no NMMs. | +| 2 | **Notation-driven avatar** | JASigning (SiGML/HamNoSys), Paula (EASIER) | Linguists author each sign in symbolic notation; avatar renders it. Doesn't scale. | +| 3 | **MoCap playback** | Signapse (Kara avatar) | Capture Deaf signers; play back per sentence with limited stitching. Coverage bounded by capture. | +| 4 | **End-to-end neural** | SignDiff, T2S-GPT, Sign-MExD; **Sorenson text-to-sign POC** | Text → motion in one shot, no retrieval anchor. BLEU-4 in the teens; hallucination risk. 
| +| 5 | **Hybrid retrieval + generative** ← **GenASL** | (no widely productised ASL system) | Phrase-level retrieval of Deaf-signer clips + generative transitions + parallel NMM channel. | + +### Comparison matrix (1–5, higher is better) + +| Dimension | (1) Clip | (2) Notation | (3) MoCap | (4) Neural | (5) Hybrid (GenASL) | +|---|:--:|:--:|:--:|:--:|:--:| +| Manual-sign fidelity | 4 | 3 | **5** | 3 | **5** | +| Non-manual markers (NMMs) | 1 | 2 | 4 | 3 | 4 | +| ASL grammar (topic-comment, classifiers) | 1 | 3 | 3 | 3 | 4 | +| Vocabulary coverage | 2 | 5 | 2 | 4 | 4 (scales with corpus) | +| Determinism / auditability | **5** | **5** | **5** | 1 | 4 | +| Failure modes acceptable to Deaf community | 2 | 3 | 4 | 1 (uncanny) | 4 | +| Real-time latency feasible | **5** | 4 | 2 | 3 | 4 | +| Inference cost | **5** | **5** | 4 | 2 | 4 | +| Scales to new content domains | 2 | 3 | 2 | 4 | 4 | +| Defensibility / moat | 1 | 2 | 4 | 2 | **5** (corpus + system) | +| Time-to-MVP | **5** | 3 | 3 | 2 | 2 | +| **TOTAL** | 33 | 38 | 38 | 28 | **44** | + +The hybrid approach is neither cheapest nor fastest, but it is the **only** family that +*simultaneously* clears fidelity, Deaf-acceptance, and auditability. Full scoring rationale +in [F2](feasibility-study/02-competitive-tech-comparison.md). -| Company | HQ | Approach | Funding | Strength | Weakness vs. GenASL | -|--------|----|----------|---------|----------|---------------------| -| **Signapse AI** | UK | 3D AI avatar — BSL & ASL; "SignStudio" SaaS for video translation, "SignStream" free tier | **$3.5M total** (£2M seed April 2024, incl. Innovate UK + Royal Assoc.
for Deaf people) | Deaf-led credibility; institutional backing; both BSL + ASL | Avatar-based, expensive to render; no browser-overlay distribution | -| **Hand Talk** | Brazil | "Hugo" 3D avatar — Libras + ASL; consumer app + B2B website plugin | Multi-stage raised; ~$10M+ raised over years | **10M+ downloads**; deep B2B in Brazilian banking/gov; 100M+ words translated | Libras-first; ASL is a secondary product; avatar criticism applies | -| **SignAll** | US/Hungary | Computer-vision **ASL→English** translation (direction reversed from GenASL); "SignAll Learn" widely adopted in US higher ed | ~$3.6M raised | Footprint in US universities; strong CV stack | Different direction (sign→text), not a competitor for overlay but a partner | -| **Sorenson Communications** | US | Decades-old VRS provider; now investing in AI sign-language translation | Established enterprise; not VC-funded | Massive Deaf customer base; trusted brand | Slow incumbent; not focused on online video | -| **SignAvatar / academic projects** | Various | Speech→ASL animation pipelines (e.g. Speak2Sign3D 2025) | Research grants | Cutting-edge synthesis quality | Not productized | - -Sources: [Slator on Signapse](https://slator.com/ai-sign-language-firm-signapse-raises-usd-2-4m-in-seed-funding/), [Crunchbase](https://www.crunchbase.com/organization/signapse-ec44), [Hand Talk on App Store](https://apps.apple.com/us/app/hand-talk-learn-sign-language/id659816995), [CB Insights — SignAll](https://www.cbinsights.com/company/signall1). +--- -**GenASL's defensible difference:** retrieval+overlay, not avatar synthesis. It's the only player attacking the *YouTube-watching moment* rather than building a destination product or a SaaS endpoint. +## 3.2 — Direct competitors: AI sign-language generation ---- +| Company | HQ | Approach | Funding / scale | Strength | Weakness vs. GenASL | +|---------|----|----------|-----------------|----------|---------------------| +| **Sorenson** | US | Family 3+4. 
Acquired **Hand Talk + OmniBridge** (Jan 2025); April 2026 POCs: text-to-sign **human-looking avatar** + real-time sign-to-text | Largest US VRS base; established enterprise revenue | **The incumbent threat** — Deaf customer base + brand + capital | Pure-neural avatar (experts raised concerns); POC aimed at *point-of-service* interactions (retail, airports), not media overlay; slow institutional velocity | +| **Signapse AI** | UK | Family 3 + light 4 — MoCap of Deaf signers + neural style transfer; BSL + ASL | **~$3.5M total**; ~$6.6M seed valuation (2024); accelerator round Aug 2025 | Deaf-led credibility; transport partnerships (Network Rail, Translink) | Vocabulary bounded by sessions captured; expanding coverage scales linearly with studio time; SaaS-portal, not browser overlay | +| **Hand Talk** | Brazil | Family 2 (Hugo avatar) + neural smoothing; **now part of Sorenson** | 4M+ downloads; 700M+ words; UN "World's Best Social App" | Distribution scale in emerging markets | Libras-first; avatar criticised for stiff motion / missing NMMs; ASL secondary | +| **SignDiff / T2S-GPT / Sign-MExD** | Academic | Family 4, pure neural | Research grants | Generalises to arbitrary input | BLEU-4 ~12–17 on How2Sign; not productised; documented hallucinated handshapes | +| **JASigning** | Academic (UEA) | Family 2, notation | Research | Linguistically rigorous; many languages | Every sign hand-authored; doesn't scale as a runtime | -## 3.2 — Adjacent competitors: captioning incumbents +Sources: [Sorenson newsroom](https://sorenson.com/newsroom/sorenson-acquires-omnibridge-and-hand-talk-to-develop-automated-sign-language-translation-capabilities/), +[Slator on Signapse](https://slator.com/ai-sign-language-firm-signapse-raises-usd-2-4m-in-seed-funding/), +[Crunchbase — Signapse](https://www.crunchbase.com/organization/signapse-ec44), +[Hand Talk on App Store](https://apps.apple.com/us/app/hand-talk-learn-sign-language/id659816995). 
-These are the businesses GenASL must **align with** or **disrupt**. They are the buyers of accessibility budget today. +**GenASL's defensible difference:** phrase-level **retrieval of Deaf-signer recordings** ++ **platform-agnostic media overlay**. Sorenson is attacking the *transactional service +desk*; Signapse the *bounded-vocabulary announcement*; GenASL the *long tail of +instructional/expository online video* that neither addresses and that no human +interpreter is economically viable for. -| Company | Model | Pricing | Implication for GenASL | -|--------|-------|---------|------------------------| -| **3Play Media** | Hybrid AI + human captioning, audio description, transcripts | ~$0.90/min alignment; average enterprise spend **~$117k/yr** | The benchmark for enterprise pricing; partner or get acquired | -| **Verbit** | AI live + post-production captioning | ~$0.95/min alignment | Aggressive EdTech sales; obvious acquirer in 3-5 yrs | -| **Rev / Rev AI** | API-first transcription & captions | **$0.25/min** live AI captions | Sets the floor price for AI-only output | -| **AI Media / AIMG** | Live captioning, broadcast focus | Custom | Established in broadcast | -| **Otter, Descript, Sonix** | Adjacent meeting/podcast captioning | $10–$30/mo seat | Out of scope but show consumer SaaS pricing | +--- -Sources: [3Play pricing](https://www.3playmedia.com/plans-pricing/), [WiscKB vendor pricing](https://kb.wisc.edu/accessibility/15016), [Sonix live captioning roundup](https://sonix.ai/resources/best-live-captioning-software-tools/). +## 3.3 — The data layer: why the corpus is the moat -**Strategic implication:** GenASL should **price as a premium add-on to captioning, not a replacement.** A reasonable buyer mental model: +GenASL's approach is only as good as the corpus it retrieves from. The public datasets set +the floor; the proprietary, consented, NMM-annotated corpus is the asset competitors can't +copy. 
-``` - Captions: $0.50 – $1.00 per minute (commodity) - Audio descr.: $4 – $15 per minute (specialized) - ASL overlay: $1 – $4 per minute ← GenASL target band -``` +| Dataset | Scale | Role in GenASL | +|---------|-------|----------------| +| [**OpenASL**](https://arxiv.org/pdf/2205.12870) | 288 h, 200+ signers, multi-domain — largest public ASL translation set | **Default phrase-level retrieval tier** | +| [**ASL Citizen**](https://www.microsoft.com/en-us/research/project/asl-citizen/dataset-description/) | 83,399 videos, 2,731 signs, 52 signers, consented | **Lexical secondary** (gap coverage) | +| [**YouTube-ASL**](https://arxiv.org/pdf/2306.15162) | 984 h, 11,093 videos (~3× OpenASL), open-domain | Training/retrieval expansion | +| **WLASL** | ~2,000 glosses | **Last-resort per-gloss fallback** (tagged `fidelity="stitched"`/`"degraded"`) | +| **Proprietary capture** | 200 h+ Deaf-signer, NMM-annotated (built over Phases 4+) | **The moat** — consented, royalty-bearing, auditable | -This puts ASL in a defensible "specialty access service" band — above commodity captions, below human ADA-grade audio description. +A neural-only system can be reverse-engineered from public data. A 200 h+ Deaf-signer +corpus you *own*, with explicit consent and royalty agreements, cannot be — and it doubles +as the community-trust signal that wins enterprise deals. 
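The tier roles in the dataset table above amount to a fallback chain. A minimal sketch, assuming each corpus can be queried as a simple key-to-clip lookup. The `Corpus`, `Segment`, and `plan_segment` names and the `"phrase"` fidelity label are hypothetical; the `"stitched"`/`"degraded"` labels and the rule that more than half of glosses missing yields `"degraded"` follow the project's stated invariant. For simplicity the sketch treats any per-gloss assembly (ASL Citizen or WLASL) as stitched:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Corpus:
    """Hypothetical in-memory stand-in for a corpus index: lookup key -> clip id."""
    name: str
    clips: Dict[str, str] = field(default_factory=dict)

    def lookup(self, key: str) -> Optional[str]:
        return self.clips.get(key)


@dataclass
class Segment:
    clips: List[str]
    fidelity: str  # "phrase" is illustrative; "stitched"/"degraded" follow the doc


def plan_segment(phrase: str, glosses: List[str], openasl: Corpus,
                 aslcitizen: Corpus, wlasl: Corpus) -> Segment:
    # Tier 1 (default): one continuous phrase-level clip from OpenASL.
    clip = openasl.lookup(phrase)
    if clip is not None:
        return Segment([clip], "phrase")

    # Tiers 2-3: per-gloss lookup, ASL Citizen first, WLASL as last resort.
    clips: List[str] = []
    misses = 0
    for gloss in glosses:
        hit = aslcitizen.lookup(gloss) or wlasl.lookup(gloss)
        if hit is None:
            misses += 1  # no clip in any corpus for this gloss
        else:
            clips.append(hit)

    # More than half of the glosses missing downgrades the segment.
    fidelity = "degraded" if misses > len(glosses) / 2 else "stitched"
    return Segment(clips, fidelity)
```

A phrase found in OpenASL short-circuits the per-gloss path entirely, which is what keeps the default output a single continuous Deaf-signer recording.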
--- -## 3.3 — Adjacent competitors: ASL education +## 3.4 — Adjacent competitors: captioning incumbents -| Player | Model | Notes | -|--------|-------|-------| -| **ASL University / Lifeprint** | Free + premium courses | Massive long-tail traffic; complement, not competitor | -| **ASLdeafined** | School subscriptions | Education-channel incumbent; possible partner | -| **Lingvano (ASL)** | Duolingo-style app | Strong UX, ~$10/mo | -| **Bill Vicars on YouTube** | YouTube channel | The "Duolingo for ASL" is fragmented; gap exists | -| **Hand Talk Learn** | Consumer app | 10M+ downloads but Libras-first | +These are the businesses GenASL **prices against** and may **partner with or be acquired by.** -**The opening:** *there is no dominant Duolingo-for-ASL.* GenASL's word-level pipeline is *better suited to learners than to native users.* This is a credible entry market — and the path Lingvano, Memrise (back in 2015), and ELSA Speak all followed before pivoting to enterprise. +| Company | Model | Pricing | Implication for GenASL | +|---------|-------|---------|------------------------| +| **3Play Media** | Hybrid AI + human captioning, AD, transcripts | ~$0.90/min; avg enterprise ~$117k/yr | Benchmark for enterprise pricing; partner or acquirer | +| **Verbit** | AI live + post-production captioning | ~$0.95/min | Aggressive EdTech sales; likely acquirer | +| **Rev / Rev AI** | API-first transcription & captions | **$0.25/min** AI-only | Sets the floor price for AI-only output | +| **AI Media / AIMG** | Live captioning, broadcast | Custom | Established in broadcast | ---- +**Strategic implication:** price ASL as a **premium add-on to captioning, not a replacement.** -## 3.4 — Positioning map +``` + Captions: $0.25 – $1.00 / min (commodity) + ASL overlay: $0.30 – $1.20 / min ← GenASL band (1–3× captioning, volume-discounted) + Audio descr.: $4 – $15 / min (specialised human) + Human ASL: $300 – $800 / min (gold standard; what GenASL does NOT replace) +``` -Two axes that 
matter for buyers: +--- + +## 3.5 — Positioning map ``` - CHEAP & COMMODITY - │ - │ Rev AI - │ YouTube auto-CC - │ - │ - BROWSER / │ SAAS / - OVERLAY ──────────────────────────┼────────────────────── DESTINATION - │ - GenASL ◀───┐ │ - │ │ 3Play, Verbit - │ │ Signapse SignStudio - │ │ Hand Talk B2B - │ │ SignAll Learn - │ - PREMIUM / SPECIALIZED + HIGH FIDELITY + │ + Human interpreter + │ + Signapse (MoCap) ● │ ● Sorenson AST + │ (avatar POC) + COMMODITY ─────────────────────────────────────┼────────────────────── BESPOKE + COST │ COST + │ ★ GenASL + │ retrieval-augmented + ● │ (target zone) + Hand Talk Hugo │ + ● JASigning │ + ● SignDiff / T2S-GPT │ + (research) │ + ● old GenASL PoC │ + LOW FIDELITY ``` -**GenASL is the only quadrant occupant: browser-overlay + premium/specialized.** Every other player either (a) sells you a SaaS portal you upload videos into, or (b) sells you a destination app. - -This is the most important strategic finding in this report: **the overlay surface is uncontested**, because incumbents are organizationally built around upload-process-deliver workflows, not real-time augmentation. +The empty quadrant — **high fidelity at commodity cost** — is what the hybrid pipeline +opens. A year ago it was uncontested; today **Sorenson's AST program is aiming at the same +quadrant from above.** The difference: Sorenson is pure-neural and point-of-service; +GenASL is retrieval-anchored and media-overlay. The race is real and the moat is +*corpus + community + integration*, not algorithms. 
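To put the cost axis in concrete terms, a minimal sketch pricing one hour of video at the low and high ends of each per-minute band from 3.4. The bands are the document's; the 60-minute lecture and the names below are illustrative assumptions:

```python
# Per-minute price bands (USD, low/high) as quoted in section 3.4.
PRICE_BANDS = {
    "captions": (0.25, 1.00),
    "asl_overlay": (0.30, 1.20),
    "audio_description": (4.0, 15.0),
    "human_asl": (300.0, 800.0),
}


def cost_range(minutes: float, band: tuple) -> tuple:
    """Low and high cost for a video of the given length at a per-minute band."""
    low, high = band
    return minutes * low, minutes * high


LECTURE_MINUTES = 60  # hypothetical one-hour lecture
for name, band in PRICE_BANDS.items():
    low, high = cost_range(LECTURE_MINUTES, band)
    print(f"{name:>18}: ${low:,.0f} - ${high:,.0f}")
```

Even the cheapest human interpretation of that hour ($18,000) is about 250 times the top of the overlay band ($72).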
--- -## 3.5 — Five forces summary +## 3.6 — Five forces summary | Force | Strength | Notes | |-------|----------|-------| -| **Threat of new entrants** | High | LLM + WLASL is reproducible; barrier is corpus & community trust | -| **Bargaining power of customers** | Medium-High | Enterprises have RFP leverage; individual creators have none | -| **Bargaining power of suppliers** | Low | LLM is multi-provider; WLASL is public; ffmpeg is open | +| **Threat of new entrants** | High | LLM + public datasets are reproducible; the barrier is a *consented, NMM-annotated* corpus and community trust | +| **Bargaining power of customers** | Medium-High | Platforms have RFP leverage; one integration is worth millions of viewers | +| **Bargaining power of suppliers** | Low | LLM is multi-provider; OpenASL/ASL Citizen are public; rendering is open | | **Substitutes** | High | Captions, transcripts, human interpreters all substitute partially | -| **Industry rivalry** | Medium | Niche today, will intensify by 2027 | +| **Industry rivalry** | **Rising fast** | Sorenson's acquisitions + POCs moved this from "niche" to "contested" inside a year | -**Defensible moats GenASL can build (none are present yet):** +**Moats GenASL can build (none are fully present yet):** -1. **A licensed, expanded Deaf-signer corpus.** This is the most valuable asset to build. WLASL's 2k glosses is the floor; a 10k+ corpus with proper non-manual markers, recorded with paid Deaf signers, becomes a real asset. -2. **An audit-grade compliance reporting layer.** Procurement officers buy paperwork as much as software. -3. **Browser-distribution lock-in.** The Chrome Web Store category for accessibility extensions is small; being the dominant ASL extension is a moat against incumbents who don't ship extensions. -4. **Deaf-community endorsement.** A formal advisory board with NAD / Gallaudet partnerships is non-replicable for late entrants. +1. 
**A licensed, expanded Deaf-signer corpus** with NMMs and royalty agreements — the + single most valuable asset. +2. **Integration lock-in.** Once a platform embeds the SDK in its player + compliance + pipeline, switching is a multi-month engineering project. +3. **Audit-grade compliance reporting** mapped to WCAG/EAA/508 — procurement buys paperwork. +4. **Deaf-community endorsement** (NAD / Gallaudet / NTID advisory) — non-replicable for + late entrants and the thing Sorenson's pure-neural avatar most conspicuously lacks. diff --git a/business/04-value-proposition.md b/business/04-value-proposition.md index edee877..f373be4 100644 --- a/business/04-value-proposition.md +++ b/business/04-value-proposition.md @@ -2,34 +2,45 @@ This section answers two questions: -1. **What exactly does GenASL promise — to whom, in language they recognize?** -2. **What does the product become, in 24 months, to deliver on that promise?** +1. **What does GenASL promise — to whom, in language they recognise?** +2. **What does the product become, over 24 months, to deliver on that promise?** --- ## 4.1 — The honest value proposition -Most accessibility AI marketing overclaims. GenASL must do the opposite. Here is the *credible* value claim — phrased differently for each buyer. +Accessibility-AI marketing overclaims. GenASL does the opposite. Here is the *credible* +claim, phrased per buyer — and note that **the Deaf viewer is never a buyer.** -### For ASL learners (B2C) +### For EdTech / LMS / MOOC platforms (primary B2B) -> **"Practice ASL on the videos you already watch."** -> Pause any YouTube video and see word-level ASL signs overlaid in time with the spoken English. It's not a substitute for a teacher — it's a millions-of-hours-richer-than-Duolingo flashcard built into every educational video on the internet. 
+> **"An embeddable ASL track for your video library — no re-uploads, no human bottleneck."**
+> Drop our SDK into your player; we render an ASL avatar in your learners' first language,
+> anchored to real Deaf-signer recordings. Coverage and fidelity are reported per video for
+> Section 508 and procurement. One integration reaches every learner you have.

-### For school districts and ASL programs (B2B education)
+### For government & public-sector portals (primary B2B)

-> **"A free CALL (Computer-Assisted Language Learning) tool for ASL classrooms."**
-> Word-level gloss matches how ASL I/II curricula already teach. Students get sign exposure on TED-Ed, Crash Course, Khan Academy, and any teacher-assigned YouTube video. Schools get usage analytics and a centrally-managed extension deployment.
+> **"Make the Title II planning window count."**
+> The deadline moved to 2027–2028 and the DOJ itself flagged that current AI can't
+> remediate at scale. We are the ASL line item you can adopt now, with audit-grade
+> coverage reports mapped to WCAG 2.1 AA, and a self-hosted option for data-residency rules.

-### For LMS / EdTech / corporate L&D (B2B mid-market)
+### For streaming / media platforms (B2B)

-> **"An ASL augmentation layer for your existing video library — without re-uploading."**
-> Plug our SDK into your video player; we generate aligned ASL clips on the fly. Compliance reports document coverage. Self-hosted option available for data-residency-sensitive buyers.
+> **"ASL parity, before your competitor ships it."**
+> The EAA names sign-language interpretation for audiovisual media. We give you a
+> platform-agnostic ASL overlay with per-minute pricing your accessibility budget already
+> understands — and an avatar your viewers won't reject, because it's built *with* the
+> Deaf community, not at it.

-### For Deaf-community partners (non-monetary)
+### For the Deaf community (non-monetary, non-negotiable)

-> **"A pre-production tool, not a replacement for human interpretation."**
-> GenASL produces *gloss-level scaffolding* a Deaf editor can refine into a polished sign-language track. The product is built *with* Deaf collaborators and pays them for the corpus.
+> **"Augmentation, not replacement — and you are never billed for access."**
+> GenASL puts an ASL track on the long tail of content that has *none* today, because no
+> human interpreter is economically viable for it. Human interpretation remains the gold
+> standard for live, high-stakes, nuanced settings. Corpus contributors are paid, with
+> royalties. Deaf-led organisations use it free, forever.

 ---

@@ -37,96 +48,110 @@ Most accessibility AI marketing overclaims. GenASL must do the opposite. Here is

 | Buyer | Functional job | Emotional job | Social job |
 |-------|----------------|---------------|------------|
-| ASL learner | "Help me practice on real content, not flashcards" | Feel like progress is happening | Identify as a serious learner |
-| ASL teacher | "Give my students homework on authentic media" | Confidence the tool reinforces what I teach | Be seen as innovative |
-| EdTech accessibility lead | "Cover ASL line item in WCAG compliance plan" | De-risk the legal review | Win the procurement narrative |
-| Government webmaster | "Get the Title II deadline off my desk" | Avoid being on the news | Show measurable progress |
-| Creator (long-tail YouTuber) | "Be the accessible channel in my niche" | Pride in inclusive content | Audience differentiation |
+| EdTech accessibility lead | "Cover the ASL line item across my whole library" | De-risk the legal review | Win the procurement narrative |
+| Government webmaster | "Be ready for the 2027 Title II deadline" | Avoid being the headline | Show measurable progress |
+| Media platform PM | "Match EAA expectations and competitor parity" | Confidence it won't be rejected by Deaf users | Be seen as genuinely inclusive |
+| Enterprise L&D lead | "Make training accessible without per-video human cost" | Predictable budget | Brand as an inclusive employer |
+| Deaf viewer (beneficiary, not buyer) | "Watch the content hearing people watch, in ASL" | Belonging, not afterthought | Participate in the same culture |

 ---

-## 4.3 — The product wedge: what to actually build first
+## 4.3 — Why retrieval-augmented is the defensible product (not word clips, not pure neural)

-Given the competing options, here is the recommended wedge.
+The product wedge *is* the architecture. Three properties make it sellable where the
+alternatives aren't:

-```
- ┌─────────────────────────────────────────────────┐
- │ WEDGE: "ASL Practice Mode" for YouTube │
- │ │
- │ • Chrome extension, freemium │
- │ • Pause-on-sign learning mode (key UX twist) │
- │ • Vocabulary tracker / streaks (light gamify) │
- │ • Teacher-friendly classroom mode (B2B hook) │
- └─────────────────────────────────────────────────┘
-```
+1. **Buyers buy paperwork.** A compliance officer challenged by a Deaf advocacy group needs
+ a defensible artifact. *"Every segment is anchored to a Deaf-signer recording; the model
+ only interpolates timing and NMMs"* is defensible. *"A neural net generated it"* is not.
+2. **Failure modes are bounded.** A retrieval miss is a momentary gap or a slightly
+ off-context sign (tagged `fidelity="stitched"`). A generative failure is an *uncanny*
+ output — a six-fingered hand, a dead face — which is reputationally catastrophic with the
+ Deaf community and is exactly the critique levelled at pure-neural avatars.
+3. **Corpus expansion has linear, ownable payoff.** Each capture session directly improves
+ coverage and *is owned*. Neural-only systems need orders of magnitude more data per
+ quality jump and can be reverse-engineered from public sets.

-**Why "ASL Practice Mode" beats "ASL Captions for the Deaf" as a wedge:**
-
-1. **Word-level gloss is actually correct for learners.** It matches ASL I curriculum. It's wrong for native consumption — but learners need exactly this granularity.
-2. **B2C learner traction → B2B education sales.** Once teachers see students using it on their own, district pilots get easy.
-3. **It defers the cultural-acceptability question** until the product has earned standing to enter the conversation.
-4. **It generates the data flywheel** — usage logs of which words confuse learners feed corpus prioritization.
-
-The existing GenASL codebase already does ~80% of what this wedge requires. The remaining 20% is UX polish, gamification, and a learner-mode toggle.
+This is **motion-RAG** — the same insight (retrieval beats free generation for
+high-stakes, auditable output) that made RAG win in document QA. Detail in
+[F1 §1.5](feasibility-study/01-technology-feasibility.md).

 ---

-## 4.4 — Product roadmap (24 months)
+## 4.4 — Product roadmap (mapped to the actual pipeline phases)
+
+The codebase has shipped **Phases 1–3** (audio backbone + interpreter brain). The business
+roadmap is the remaining phases plus the data and trust work that gates them.

-### Phase 1 — Months 0–6: Validation & wedge launch
+### M0–M6 — Foundation & data (Phases 4–5 begin)

 | Workstream | Deliverable | Why |
 |-----------|-------------|-----|
-| **Deaf community advisory** | 5-person paid advisory board (Gallaudet alumni network is the obvious starting place) | Cannot be skipped; everything else depends on this |
-| **Privacy & ToS hardening** | Replace `youtube-transcript-api` with official Data API + caption upload pipeline | Eliminate the single biggest fragility |
-| **Learner UX** | Pause-on-sign mode; per-sign confidence indicator; "I don't know this sign" feedback button | The wedge product |
-| **Chrome Web Store launch** | Public extension, freemium tier | Distribution begins |
-| **K-12 pilot** | 3 schools, free 1-year pilot with feedback contract | Reference customers |
+| **Deaf community advisory** | 5-person paid board; first non-founder hire is Deaf | The gate everything depends on |
+| **Corpus v1** | OpenASL + ASL Citizen indexed for phrase-level retrieval; first proprietary capture session | Phase 4 (retrieval) lands |
+| **Motion synthesis** | Retrieval-driven motion + NMM channel from prosody | Phase 5 lands |
+| **Closed demo** | Avatar v1 (VRM, single identity, basic NMMs) on instructional clips | Demoable for design partners |
+| **Gate** | Deaf-rater panel intelligibility **≥ 3.5/5** | No paid GTM before this |

-### Phase 2 — Months 6–12: Education GTM
+### M6–M12 — SDK + first contracts (Phases 6–7)

 | Workstream | Deliverable |
 |-----------|-------------|
-| **Pricing live** | $9/mo individual; $4/seat/yr education | First revenue |
-| **LMS integrations** | Canvas + Brightspace add-ons (read-only assignments mode) | EdTech beachhead |
-| **Corpus expansion** | 2,000 → 4,000 glosses; signed by paid Deaf signers, with non-manual markers captured | Quality differentiator |
-| **Compliance reporting v1** | Coverage report PDF per video for procurement teams | Enterprise prep |
+| **Chrome extension** | Three.js + VRM overlay (Phase 6) — the showcase surface |
+| **Platform SDK + API** | Embeddable on any HTML5 `