ashhuxt/scribeforge-ai

⚒️ ScribeForge AI

AI-Powered Intelligent Notes Workspace

🧠 Schema-Guarded Full-Stack Knowledge Engine

Built with FastAPI • Supabase • Gemini 2.5 Flash





A production-grade AI workspace that transforms raw human notes into structured, searchable, and intelligence-ready knowledge assets.


🧠 Research & Engineering Context

ScribeForge AI was designed as a production-oriented AI infrastructure system focused on:

  • Structured AI generation
  • Reliable schema-constrained inference
  • Intelligent note lifecycle management
  • Scalable backend architecture
  • Human-AI collaborative workflows

Unlike typical AI note applications, ScribeForge emphasizes:

✅ Deterministic AI outputs
✅ Backend reliability
✅ Structured knowledge extraction
✅ Searchable information systems
✅ Production-grade API architecture


🚀 Executive Overview

Modern note-taking systems suffer from a fundamental problem:

Humans generate unstructured information faster than they can organize it.

ScribeForge AI solves this by converting fragmented raw notes into:

  • Intelligent summaries
  • Extracted action items
  • Searchable tagged entities
  • AI-generated titles
  • Structured knowledge records

The platform acts as an AI-powered knowledge refinement engine rather than a simple note editor.


🎯 Problem Statement

Traditional note systems face several limitations:

  • Notes become unsearchable over time
  • Raw text lacks structure
  • Important action items are buried
  • AI outputs often break JSON parsing pipelines
  • Scaling AI-assisted systems introduces instability

Most AI-integrated note apps rely on:

❌ Free-form AI responses
❌ Weak validation
❌ Fragile parsing logic
❌ Monolithic backend systems

The result is unreliable production behavior.


💡 Solution: Schema-Guarded AI Architecture

ScribeForge introduces a strictly validated AI orchestration pipeline.

Instead of trusting raw LLM text, the system forces AI outputs into validated application schemas.
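
The idea of refusing unvalidated model text can be sketched without any dependencies. The example below is an illustration only: the project itself uses Pydantic v2 contracts, while this stand-in uses standard-library dataclasses, and the `NoteInsights` fields, `SchemaError`, and `parse_insights` are hypothetical names chosen for the sketch.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class NoteInsights:
    """Hypothetical target schema; the field names are assumptions."""
    title: str
    summary: str
    action_items: list
    tags: list


class SchemaError(ValueError):
    """Raised when model output does not match the expected schema."""


def _require_str(data: dict, key: str) -> str:
    value = data.get(key)
    if not isinstance(value, str) or not value.strip():
        raise SchemaError(f"{key!r} must be a non-empty string")
    return value.strip()


def _require_str_list(data: dict, key: str) -> list:
    value = data.get(key)
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise SchemaError(f"{key!r} must be a list of strings")
    return value


def parse_insights(raw_text: str) -> NoteInsights:
    """Validate raw model text before any downstream code touches it."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        raise SchemaError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise SchemaError("top-level value must be a JSON object")
    return NoteInsights(
        title=_require_str(data, "title"),
        summary=_require_str(data, "summary"),
        action_items=_require_str_list(data, "action_items"),
        tags=_require_str_list(data, "tags"),
    )
```

Anything that fails these checks raises `SchemaError` at the boundary, so storage and search code only ever see a fully populated `NoteInsights` object.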


⚙️ Core Features

🧠 Structured AI Summarization

Transforms unorganized text into:

  • High-level summaries
  • Key insights
  • Structured metadata

✅ Action Item Extraction

Automatically detects:

  • Tasks
  • Decisions
  • Follow-ups
  • Priority actions

🏷️ Intelligent Tagging & Search

Supports:

  • Real-time fuzzy search
  • Tag containment filtering
  • Server-side querying
  • Indexed retrieval pipelines
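
As a rough illustration of how tag containment and fuzzy matching combine, here is an in-memory sketch. It is not the project's server-side query: the note dictionaries, `fuzzy_score`, `search_notes`, and the 0.7 threshold are all assumptions made for the example, and the fuzzy matching uses the standard-library `difflib` rather than database search.

```python
from difflib import SequenceMatcher


def fuzzy_score(query: str, text: str) -> float:
    """Return a 0..1 similarity; a substring hit counts as a perfect match."""
    q, t = query.lower(), text.lower()
    if q in t:
        return 1.0
    # Compare against each word so a single typo does not sink the score.
    return max((SequenceMatcher(None, q, word).ratio() for word in t.split()), default=0.0)


def search_notes(notes, query="", required_tags=(), threshold=0.7):
    """Filter by tag containment first, then rank by fuzzy title match."""
    wanted = {tag.lower() for tag in required_tags}
    hits = []
    for note in notes:
        note_tags = {tag.lower() for tag in note["tags"]}
        if not wanted <= note_tags:  # the note must carry every requested tag
            continue
        score = fuzzy_score(query, note["title"]) if query else 1.0
        if score >= threshold:
            hits.append((score, note))
    hits.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [note for _, note in hits]
```

Filtering on tags before scoring keeps the more expensive fuzzy comparison off notes that could never match.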

🔒 Schema-Guarded AI Responses

Uses:

  • Gemini structured output mode
  • Pydantic v2 contracts
  • Strict response validation

Ensuring:

✅ Zero malformed AI payloads
✅ Reliable downstream processing
✅ Deterministic backend behavior


🗂️ Soft-Deletion Archive Layer

Instead of destructive deletion:

  • Notes are archived using is_archived
  • Historical indexing remains intact
  • Database fragmentation is minimized
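
The soft-deletion pattern can be sketched as follows. This is a minimal illustration that uses an in-memory SQLite database as a stand-in for Supabase (PostgreSQL); only the `is_archived` column name comes from the project, while the table layout and the helper functions are assumptions.

```python
import sqlite3

# In-memory stand-in for the real notes table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE notes ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " is_archived INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO notes (title) VALUES (?)", [("Standup",), ("Retro",)])


def archive_note(note_id: int) -> bool:
    """Soft-delete: flip the flag instead of issuing DELETE."""
    cur = conn.execute(
        "UPDATE notes SET is_archived = 1 WHERE id = ? AND is_archived = 0",
        (note_id,),
    )
    return cur.rowcount == 1  # False if the note is missing or already archived


def list_active_titles() -> list:
    """Default listing hides archived rows; the rows themselves still exist."""
    rows = conn.execute("SELECT title FROM notes WHERE is_archived = 0 ORDER BY id")
    return [title for (title,) in rows]
```

Because archived rows are never removed, restoring a note is a single flag update rather than a data recovery exercise.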

🌐 Anonymous Public Sharing

Implements isolated public-read endpoints for:

  • Shared notes
  • Public references
  • Knowledge distribution

Without exposing protected infrastructure.
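
One way to keep a public-read path isolated is to return only a whitelisted projection of a note. The sketch below is hypothetical: the `is_public` flag, the `PUBLIC_FIELDS` whitelist, and `public_view` are assumptions for illustration and are not taken from the project's code.

```python
from typing import Optional

# Assumed whitelist: only these fields ever leave the public endpoint.
PUBLIC_FIELDS = ("title", "summary", "tags")


def public_view(note: Optional[dict]) -> Optional[dict]:
    """Return a sanitized copy of a shared note, or None if it is not shareable."""
    if note is None:
        return None
    if not note.get("is_public") or note.get("is_archived"):
        return None
    # Build a fresh dict so owner IDs, raw text, and other fields cannot leak.
    return {field: note.get(field) for field in PUBLIC_FIELDS}
```

A route handler built on this helper would typically translate a `None` result into a 404 response, so private and nonexistent notes look identical to anonymous callers.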


🏗️ System Architecture

Raw User Notes
        ↓
FastAPI Backend
        ↓
AI Orchestration Layer
        ↓
Gemini 2.5 Flash
        ↓
Pydantic Schema Validation
        ↓
Structured Knowledge Objects
        ↓
Supabase Persistence Layer
        ↓
Search + Retrieval APIs

🛠️ Tech Stack

| Layer             | Technology            |
|-------------------|-----------------------|
| Backend Framework | FastAPI               |
| Language          | Python 3.12           |
| Database          | Supabase (PostgreSQL) |
| AI Engine         | Gemini 2.5 Flash      |
| Validation        | Pydantic v2           |
| Authentication    | JWT + Passlib         |
| API Standard      | REST                  |
| Runtime Model     | Asynchronous Python   |

📂 Project Structure

backend/
├── app/
│   ├── routes/
│   │   ├── auth.py
│   │   └── notes.py
│   │
│   ├── ai_service.py
│   ├── auth_utils.py
│   ├── main.py
│   └── schemas.py
│
├── .env.example
└── requirements.txt

🚀 Local Setup

1️⃣ Clone Repository

git clone https://github.com/ashhuxt/scribeforge-ai.git
cd scribeforge-ai/backend

2️⃣ Create Virtual Environment

python -m venv venv

Activate:

Windows

.\venv\Scripts\activate

Linux / macOS

source venv/bin/activate

3️⃣ Install Dependencies

pip install -r requirements.txt

4️⃣ Configure Environment Variables

Create .env

GOOGLE_API_KEY=your_google_api_key
GEMINI_API_KEY=your_gemini_api_key
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_KEY=your_supabase_key
JWT_SECRET=your_secret_key
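
A startup check for these variables might look like the sketch below. It assumes the values from .env have already been loaded into the process environment (how the project does that is not shown here), and `load_settings` is a hypothetical helper, not part of the repository.

```python
import os

# Variable names taken from the .env template above.
REQUIRED_VARS = (
    "GOOGLE_API_KEY",
    "GEMINI_API_KEY",
    "SUPABASE_URL",
    "SUPABASE_KEY",
    "JWT_SECRET",
)


def load_settings(env=None) -> dict:
    """Fail fast at startup if any required variable is missing or empty."""
    source = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not source.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: source[name] for name in REQUIRED_VARS}
```

Failing at startup surfaces a misconfigured deployment immediately instead of on the first request that needs the missing key.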

🏃 Running the Application

python -m uvicorn app.main:app --reload

Swagger Documentation:

http://127.0.0.1:8000/docs

🔬 Engineering Decisions & Research Thinking


✅ Schema-Driven AI Reliability

Many AI integrations fail in production because LLMs return inconsistently formatted output.

Solution

Gemini responses are bound directly to:

  • Pydantic schemas
  • Typed validation contracts
  • Structured response enforcement

This removes:

❌ JSON corruption
❌ Parsing instability
❌ Invalid payload crashes


✅ Fully Decoupled Backend Design

Feature-isolated routers:

  • auth.py
  • notes.py

Prevent:

  • Circular imports
  • Tight coupling
  • Scaling bottlenecks

✅ Defensive Data Layer Engineering

All retrieval systems include:

  • Explicit index checks
  • Empty-state guards
  • Safe array handling

Protecting the backend from:

  • Runtime failures
  • Indexing crashes
  • Null reference errors
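
The guards listed above can be illustrated with two small helpers. This is a sketch under assumptions: `first_row` and `safe_tags` are hypothetical names, and the input is assumed to be a list of row dictionaries such as a database client might return.

```python
from typing import Optional, Sequence


def first_row(rows: Optional[Sequence[dict]]) -> Optional[dict]:
    """Empty-state guard: never index into a missing or empty result set."""
    if not rows:
        return None
    return rows[0]


def safe_tags(row: Optional[dict]) -> list:
    """Safe array handling: always hand callers a list, even for null columns."""
    if row is None:
        return []
    tags = row.get("tags")
    return list(tags) if isinstance(tags, (list, tuple)) else []
```

Callers can then branch on `None` explicitly instead of catching `IndexError` or `TypeError` after the fact.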

✅ Storage Layer Optimization

Notes carry an is_archived flag instead of being destructively deleted.

Benefits:

  • Historical preservation
  • Efficient indexing
  • Lower fragmentation
  • Better auditability

⚠️ Research Gap & Limitations

While ScribeForge AI achieves strong reliability and structured AI orchestration, several important challenges remain:

  • AI-generated summaries may still miss contextual nuance
  • Current retrieval pipeline is keyword-centric, not semantic
  • No vector embedding search layer currently implemented
  • Multi-document reasoning is limited
  • Long-term memory and knowledge graph relationships are not modeled

These limitations highlight the transition required from structured generation systems toward semantic knowledge reasoning systems.


🚀 Research Direction

ScribeForge AI serves as a foundational system for exploring the future of:

  • AI-assisted productivity systems
  • Structured knowledge engineering
  • Semantic retrieval architectures
  • Human-AI collaborative workflows

Proposed Future Directions

🧠 Semantic Retrieval Systems

  • Vector embeddings
  • Hybrid retrieval pipelines
  • Context-aware ranking

🤖 Agentic AI Workflows

  • Autonomous knowledge agents
  • Multi-step reasoning systems
  • AI task delegation

🕸️ Knowledge Graph Integration

  • Relationship extraction
  • Entity linking
  • Long-term memory systems

⚡ Distributed AI Infrastructure

  • Queue-based orchestration
  • Async distributed workers
  • Horizontal scalability

Key Research Questions

  • How can AI systems reliably structure human knowledge?
  • How can semantic retrieval outperform keyword-based systems?
  • Can AI-generated knowledge systems maintain long-term consistency?
  • How should AI agents collaborate with human productivity workflows?

This positions ScribeForge AI at the intersection of:

  • Knowledge Engineering
  • Information Retrieval
  • AI Systems Design
  • Human-Centered AI

📈 Why This Project Stands Out

Unlike basic CRUD note applications, ScribeForge demonstrates:

✅ Production-grade backend engineering
✅ Structured AI integration
✅ Typed schema enforcement
✅ Async API orchestration
✅ Real-world system reliability
✅ Research-oriented architectural thinking


👨‍💻 Developer

Ashish Patel

Focused on building:

  • AI Infrastructure Systems
  • Intelligent Backend Architectures
  • Knowledge Engineering Platforms
  • Scalable Production APIs

🌟 Final Statement

ScribeForge AI is not just a notes application. It is a production-oriented foundation for intelligent knowledge systems.


⚒️ ScribeForge Turns Raw Information Into Structured Intelligence.

About

A resilient, schema-guarded full-stack knowledge engine built with FastAPI, Supabase, and Gemini 2.5 Flash to transform unstructured text into validated, structured assets.
