Annotex is a full-stack data labeling platform designed to address the opacity and compensation issues in traditional crowdsourcing. It integrates automated validation, real-time analytics, and blockchain-based incentives to create a trustworthy ecosystem for high-quality machine learning data.
High-quality labeled data is essential for modern machine learning systems. Existing platforms often suffer from opacity in quality assessment, delayed compensation, and limited performance visibility.
Annotex solves these challenges by allowing contributors to perform labeling tasks while transparently tracking completion rates, quality metrics, and earnings. Administrators can upload datasets, configure validation rules, monitor quality through dashboards, and manage crypto-based payouts.
- Task Discovery: Browse available tasks with detailed descriptions and transparent reward information.
- Real-time Analytics: Track personal performance metrics, including accuracy rates and productivity trends.
- Immediate Feedback: Receive instant updates on label acceptance status.
- Secure Payouts: Connect Solana wallets for secure, verifiable earnings.
- Dataset Management: Upload datasets in various formats (CSV, JSON, Images) and define labeling parameters.
- Automated Validation: Implement majority voting with configurable acceptance thresholds and consensus rules.
- Quality Monitoring: Visualize inter-annotator agreement and overall dataset reliability.
- Batch Payments: Execute bulk payouts via blockchain transactions to ensure timely compensation.
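The automated validation feature above can be illustrated with a minimal sketch of majority voting under a configurable acceptance threshold. The names `ValidationConfig`, `ValidationResult`, and `validateByMajority`, the default values, and the tie-handling rule are assumptions made for illustration and are not taken from the Annotex codebase.

```typescript
// Hypothetical validation settings; the names and defaults are illustrative only.
interface ValidationConfig {
  acceptanceThreshold: number; // minimum share of votes the top label needs (0..1)
  minVotes: number;            // minimum number of submissions before validating
}

interface ValidationResult {
  accepted: boolean;
  winningLabel: string | null;
  agreement: number; // share of votes received by the top label
}

function validateByMajority(
  votes: string[],
  config: ValidationConfig = { acceptanceThreshold: 0.6, minVotes: 3 },
): ValidationResult {
  // Not enough submissions yet: nothing can be accepted.
  if (votes.length === 0 || votes.length < config.minVotes) {
    return { accepted: false, winningLabel: null, agreement: 0 };
  }

  // Count how many contributors chose each label.
  const counts = new Map<string, number>();
  for (const vote of votes) {
    counts.set(vote, (counts.get(vote) ?? 0) + 1);
  }

  // Find the most frequent label and detect ties for first place.
  let winningLabel: string | null = null;
  let topCount = 0;
  let tied = false;
  for (const [label, count] of counts) {
    if (count > topCount) {
      winningLabel = label;
      topCount = count;
      tied = false;
    } else if (count === topCount) {
      tied = true;
    }
  }

  const agreement = topCount / votes.length;
  // Assumption: a tie for first place is treated as "no consensus".
  if (tied) {
    return { accepted: false, winningLabel: null, agreement };
  }
  return { accepted: agreement >= config.acceptanceThreshold, winningLabel, agreement };
}
```

With the default settings, `validateByMajority(["cat", "cat", "dog"])` accepts `"cat"` because two of three votes exceed the 0.6 threshold, while a two-way tie is rejected regardless of the threshold.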
The platform follows a modular three-tier architecture (Presentation, Application, Data).
| Component | Technology | Description |
|---|---|---|
| Frontend | Next.js 14+ | Server-side rendered React with App Router and Tailwind CSS. |
| Backend | Express.js / Node.js | RESTful API with JWT authentication and Prisma ORM. |
| Database | PostgreSQL | Relational storage ensuring ACID compliance and data integrity. |
| Blockchain | Solana Pay / @solana/web3.js | Wallet integration and transaction verification on Solana devnet. |
| DevOps | Docker | Containerized deployment via Docker Compose. |
| Auth | JWT / Bcrypt | Secure session management and password hashing. |
Node.js is used as the runtime and tooling layer for every JavaScript/TypeScript part of Annotex:
- Backend runtime: The API is an Express application that runs on Node.js and starts from `backend/src/server.ts`.
- Frontend runtime/build system: The web app is a Next.js application that uses Node.js for the dev server, production server, and build pipeline.
- Package management: `npm` installs dependencies and runs project scripts from the root, frontend, and backend `package.json` files.
- TypeScript execution: Development commands use Node-based tools such as `tsx`, `nodemon`, and `tsc`.
- Testing and data tooling: Backend tests run through Node/Jest, and Prisma CLI commands also execute in the Node environment.
- Containers: Both Dockerfiles use `node:20-alpine`, so Docker development and production builds are also standardized around Node 20.
- Root workspace: `npm run dev`, `npm run build`, `npm run test`
- Backend: `npm run dev`, `npm run build`, `npm start`, `npm run prisma:generate`
- Frontend: `npm run dev`, `npm run build`, `npm run start`
Use Node.js 20.19+ for local development. That matches the backend engine requirement and the Node 20 Docker images used by both services.
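A local setup script could guard against older runtimes with a small version comparison. The sketch below is only an illustration: `satisfiesMinimum` is a hypothetical helper, not something the repository is known to contain. It compares versions numerically, so `20.9.0` is correctly treated as older than `20.19.0`.

```typescript
// Returns true when `version` (e.g. "20.19.0" or "v22.1.0") is at least `minimum`.
// Hypothetical helper for illustration; only major.minor.patch are compared.
function satisfiesMinimum(version: string, minimum: string): boolean {
  const parse = (v: string): number[] =>
    v
      .replace(/^v/, "")
      .split(".")
      .map((part) => parseInt(part, 10) || 0);

  const actual = parse(version);
  const required = parse(minimum);

  for (let i = 0; i < 3; i++) {
    const a = actual[i] ?? 0;
    const r = required[i] ?? 0;
    if (a !== r) {
      return a > r; // the first differing component decides
    }
  }
  return true; // all three components are equal
}
```

In a Node script this could be called as `satisfiesMinimum(process.versions.node, "20.19.0")` before starting the dev servers.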
- Task Creation: Admins upload data and set validation rules/rewards.
- Label Submission: Contributors complete assignments via task-specific interfaces.
- Validation Engine: Submissions are processed using majority voting algorithms.
- Metric Updates: Accepted labels automatically update quality scores.
- Payout Processing: Rewards are calculated and transferred via blockchain.
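The payout step of this workflow can be sketched as a pure calculation that runs before any blockchain call: sum each contributor's rewards, convert SOL to lamports (1 SOL = 1,000,000,000 lamports), and split the transfers into batches. The names `Reward`, `Transfer`, and `buildPayoutBatches`, as well as the default batch size of 10, are assumptions for illustration. Sending each batch (for example with `SystemProgram.transfer` instructions from `@solana/web3.js`) is deliberately left out.

```typescript
const LAMPORTS_PER_SOL = 1_000_000_000; // 1 SOL = 10^9 lamports

// Hypothetical shapes; field names are illustrative, not the Annotex schema.
interface Reward {
  walletAddress: string;
  amountSol: number;
}

interface Transfer {
  walletAddress: string;
  lamports: number;
}

function buildPayoutBatches(rewards: Reward[], maxTransfersPerBatch = 10): Transfer[][] {
  if (maxTransfersPerBatch < 1) {
    throw new Error("maxTransfersPerBatch must be at least 1");
  }

  // Sum rewards per wallet in integer lamports to avoid accumulating float error.
  const totals = new Map<string, number>();
  for (const reward of rewards) {
    const lamports = Math.round(reward.amountSol * LAMPORTS_PER_SOL);
    totals.set(reward.walletAddress, (totals.get(reward.walletAddress) ?? 0) + lamports);
  }

  // One transfer per wallet, skipping wallets with nothing to pay out.
  const transfers: Transfer[] = [];
  for (const [walletAddress, lamports] of totals) {
    if (lamports > 0) {
      transfers.push({ walletAddress, lamports });
    }
  }

  // Split the transfers into fixed-size batches.
  const batches: Transfer[][] = [];
  for (let i = 0; i < transfers.length; i += maxTransfersPerBatch) {
    batches.push(transfers.slice(i, i + maxTransfersPerBatch));
  }
  return batches;
}
```

Keeping the aggregation separate from the network call makes the amounts easy to test and to record alongside the resulting transaction hashes.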
The system uses a normalized PostgreSQL schema designed for integrity and efficient querying:
- Users: Authentication credentials, roles, wallet addresses, and earnings.
- Tasks: Dataset references, label types, validation configurations, and status.
- Labels: Submission values, timestamps, and validation/acceptance flags.
- Payouts: Transaction hashes, amounts, and completion status.
- Metrics: Aggregated performance data for dashboards.
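To make the relationships between these entities concrete, here is a rough TypeScript sketch of three of them together with a function that derives a contributor's metrics from their labels. All field names, the status values, and `computeMetric` are assumptions for illustration; the actual Prisma schema may differ.

```typescript
// Hypothetical record shapes loosely mirroring the entities listed above.
interface Label {
  id: string;
  taskId: string;
  userId: string;
  value: string;
  submittedAt: Date;
  validated: boolean; // has the validation engine processed this label?
  accepted: boolean;  // outcome of validation (meaningful only when validated)
}

interface Payout {
  id: string;
  userId: string;
  amount: number;
  txHash: string | null; // null until the transaction is submitted
  status: "PENDING" | "COMPLETED" | "FAILED";
}

interface Metric {
  userId: string;
  labelsSubmitted: number;
  labelsAccepted: number;
  accuracy: number; // accepted / validated, or 0 when nothing is validated yet
}

// Aggregate one contributor's labels into a dashboard-style metric row.
function computeMetric(userId: string, labels: Label[]): Metric {
  const own = labels.filter((label) => label.userId === userId);
  const validated = own.filter((label) => label.validated);
  const labelsAccepted = validated.filter((label) => label.accepted).length;

  return {
    userId,
    labelsSubmitted: own.length,
    labelsAccepted,
    accuracy: validated.length === 0 ? 0 : labelsAccepted / validated.length,
  };
}
```

Labels that have not been validated yet count toward `labelsSubmitted` but are excluded from `accuracy`, so pending work does not lower a contributor's score.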
- Docker & Docker Compose
- Node.js 20.19+ (for local frontend/backend development)
- npm 9+ recommended
- Clone the Repository

      git clone https://github.com/Yashb404/Annotex.git
      cd Annotex

- Run the Stack

      docker-compose up --build

  This starts the Frontend, Backend, and Database containers.

- Access the Application

  Open http://localhost:3000 in your browser.
  - Default backend API: http://localhost:5000
  - Database: PostgreSQL on port 5433
If you want to run services locally:
- Backend Setup

      cd backend
      npm install
      cp .env.example .env
      npm run dev

- Frontend Setup

      cd frontend
      npm install
      npm run dev

- Database: PostgreSQL must be running (see `.env` for connection details).
The repository now includes a production stack file at docker-compose.prod.yml.
- Prepare production environment variables:

      cp .env.production.example .env.production

- Update `.env.production` with real secrets and public domains.
- Start the production stack:

      docker compose --env-file .env.production -f docker-compose.prod.yml up -d --build

- Verify services:

      docker compose --env-file .env.production -f docker-compose.prod.yml ps
      docker compose --env-file .env.production -f docker-compose.prod.yml logs -f backend

Notes:
- Backend runs Prisma migrations at startup using `prisma migrate deploy`.
- Backend refuses to start in production when required env vars are missing or JWT secrets are left on insecure defaults.
- Use reverse proxy/TLS (Nginx, Caddy, or a cloud load balancer) for public HTTPS.
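The startup check described in the notes could look roughly like the sketch below. The variable names `DATABASE_URL` and `JWT_SECRET`, the list of insecure placeholder values, and the function `findConfigProblems` are assumptions for illustration; consult `.env.production.example` for the variables the backend actually requires.

```typescript
// Hypothetical variable names and placeholder values, for illustration only.
const REQUIRED_VARS = ["DATABASE_URL", "JWT_SECRET"];
const INSECURE_DEFAULTS = ["changeme", "secret", "your-secret-key"];

// Returns a list of human-readable problems; an empty list means the config looks usable.
function findConfigProblems(
  env: Record<string, string | undefined>,
  required: string[] = REQUIRED_VARS,
): string[] {
  const problems: string[] = [];

  // Every required variable must be present and non-blank.
  for (const name of required) {
    const value = env[name];
    if (value === undefined || value.trim() === "") {
      problems.push(`${name} is missing`);
    }
  }

  // A secret left on a well-known placeholder is as bad as a missing one.
  const secret = env["JWT_SECRET"];
  if (secret !== undefined && INSECURE_DEFAULTS.includes(secret.toLowerCase())) {
    problems.push("JWT_SECRET is set to an insecure default");
  }

  return problems;
}
```

A server entry point could call `findConfigProblems(process.env)` when `NODE_ENV` is `production` and exit with a non-zero status if the returned list is not empty, so a misconfigured container fails fast instead of serving traffic.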
- ML-Assisted Pre-labeling: Integration of models to suggest labels and speed up workflow.
- Reputation System: Advanced contributor scoring to identify high-value