Skip to content

MeisnerDan/awesome-opensource-ai

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

44 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Awesome Open Source AI

Awesome Open Source AI

A curated list of notable open-source AI models, libraries, infrastructure, and developer tools.

Awesome PRs Welcome License: CC0-1.0

Managed by MoltFounders

by Boring Dystopia Development

boringdystopia.ai   X @alvinunreal   Telegram Join channel


📋 Contents


🧬 1. Core Frameworks & Libraries

Core libraries and frameworks used to build, train, and run AI and machine learning systems.

Deep Learning Frameworks

  • PyTorch GitHub stars - Dynamic computation graphs, Pythonic API, dominant in research and production. The current standard for most frontier AI work.
  • TensorFlow GitHub stars - End-to-end platform with excellent production deployment, TPU support, and large-scale serving tools.
  • JAX GitHub stars + Flax GitHub stars - High-performance numerical computing with composable transformations (JIT, vmap, grad). Rising favorite for research and scientific ML.
  • Keras GitHub stars - High-level, beginner-friendly API that now runs on multiple backends (TensorFlow, JAX, PyTorch). Perfect for rapid experimentation.

Rust ML Frameworks

  • Burn GitHub stars - Next-generation deep learning framework in Rust. Backend-agnostic with CPU, GPU, WebAssembly support.
  • Candle (Hugging Face) GitHub stars - Minimalist ML framework for Rust. PyTorch-like API with focus on performance and simplicity.

NLP & Transformers

Data Processing & Manipulation

  • Pandas GitHub stars - The gold standard for data analysis and manipulation in Python.
  • Polars GitHub stars - Blazing-fast DataFrame library (Rust backend) - modern alternative to pandas for large-scale workloads.
  • Dask GitHub stars - Parallel computing for big data - scales pandas/NumPy/scikit-learn to clusters.
  • NumPy GitHub stars - Fundamental array computing library that powers almost every AI stack.
  • SciPy GitHub stars - Scientific computing algorithms (optimization, linear algebra, statistics, signal processing).

Classical ML & Gradient Boosting

  • scikit-learn GitHub stars - Industry-standard library for traditional machine learning (classification, regression, clustering, pipelines).
  • XGBoost GitHub stars - Scalable, high-performance gradient boosting library. Still dominates Kaggle and tabular competitions.
  • LightGBM GitHub stars - Microsoft's ultra-fast gradient boosting framework, optimized for speed and memory.
  • CatBoost GitHub stars - Gradient boosting that handles categorical features natively with great out-of-the-box performance.

AutoML & Hyperparameter Optimization

  • Optuna GitHub stars - Modern, define-by-run hyperparameter optimization with pruning and visualizations. Extremely popular in 2026.
  • AutoGluon GitHub stars - AWS AutoML toolkit for tabular, image, text, and multimodal data - state-of-the-art with almost zero code.
  • FLAML GitHub stars - Microsoft's fast & lightweight AutoML focused on efficiency and low compute.
  • AutoKeras GitHub stars - Neural architecture search on top of Keras.
  • TPOT GitHub stars - Genetic programming-based AutoML for full pipeline optimization.

Model Training & Optimization Utilities

  • Hugging Face Accelerate GitHub stars - Simple API to make training scripts run on any hardware (multi-GPU, TPU, mixed precision) with minimal code changes.
  • DeepSpeed GitHub stars - Microsoft's deep learning optimization library for extreme-scale training (ZeRO, offloading, MoE).
  • Transformers GitHub stars - Library of pretrained transformer models and utilities for text, vision, audio, and multimodal training and inference.
  • FlashAttention GitHub stars - Fast exact attention kernels that reduce memory usage and accelerate transformer training and inference.
  • xFormers GitHub stars - Optimized transformer building blocks and attention operators for PyTorch.
  • PyTorch Lightning GitHub stars - High-level wrapper for PyTorch that removes boilerplate and adds best practices.
  • ONNX Runtime GitHub stars - High-performance inference and training for ONNX models across hardware.

🧠 2. Open Foundation Models

Pretrained language, multimodal, speech, and video models with publicly available weights.

Large Language Models (Base + Chat)

  • Qwen3.5 (Alibaba) GitHub stars - Native multimodal open series spanning from small to frontier-scale models (0.8B–122B). Released Feb 2026 with 397B total MoE params (17B active), strong in coding, math, vision, and instruction following.
  • DeepSeek-V3.2 / R1 (DeepSeek) GitHub stars - Mixture-of-Experts family with exceptional reasoning, math, and efficient large-scale inference.
  • Gemma 3 (Google) GitHub stars - Lightweight yet powerful open models with excellent efficiency for on-device use. Strong multilingual support across 100+ languages.
  • MiniMax-M2.1 / M1 (MiniMax) GitHub stars - Open-weight MiniMax model line spanning long-context reasoning and agentic software tasks, with strong tool use and publicly released weights for local deployment.
  • Kimi K2.5 (Moonshot AI) GitHub stars - Frontier open-weight MoE model with 256K context, strong coding and reasoning performance, and native multimodal + tool-use support for agentic workflows.
  • Mistral Large / Nemo / Small - High-performance model family with strong multilingual capability, tool use, and efficient deployment profiles.
  • Phi-4 / Phi-3.5 (Microsoft) GitHub stars - Small but highly capable models optimized for reasoning, edge devices, and on-device inference.
  • GLM-5 (Zhipu AI) GitHub stars - Strong open model line with solid coding, reasoning, and agentic-task performance.
  • OLMo 2 (Allen AI) GitHub stars - Fully open-source LLMs (1B–32B) with complete transparency: models, data, training code, and logs. Designed by scientists, for scientists.

Coding & Reasoning Models

Multimodal Models (Vision + Language)

  • Qwen3-VL (Alibaba) GitHub stars - Latest flagship VLM with native 256K context (expandable to 1M), visual agent capabilities, 3D grounding, and superior multimodal reasoning. Major leap over Qwen2.5-VL.
  • InternVL3 (OpenGVLab) GitHub stars - Native multimodal pretraining with mixed preference optimization (MPO). Superior perception and reasoning over InternVL 2.5, extends to GUI agents and 3D vision.
  • GLM-4.5V / GLM-4.1V-Thinking (Zhipu AI) GitHub stars - Strong multimodal reasoning with scalable reinforcement learning. Compares favorably with Gemini-2.5-Flash on benchmarks.
  • LLaVA-OneVision GitHub stars - Successor to LLaVA 1.6 with expanded capabilities across vision-language tasks.
  • MiniCPM-V 2.6 GitHub stars - Handles images up to 1.8M pixels with top-tier OCR performance. Excellent for on-device deployment.
  • Gemma 3 (Google) GitHub stars - Lightweight multimodal supporting vision-language input, optimized for efficiency and on-device use.

Speech & Audio Models (TTS, STT, Music)

Video & Animation Models


⚡ 3. Inference Engines & Serving

Inference runtimes, serving systems, and optimization tools for running models locally or in production.

Local / On-device Inference

  • llama.cpp GitHub stars - Pure C/C++ inference engine with GGUF format support. The gold standard for CPU/GPU/Apple Silicon on-device running. Includes llama-server for OpenAI-compatible API.
  • Ollama GitHub stars - Dead-simple local LLM runner with a one-line install, model registry, and OpenAI-compatible API.
  • MLX GitHub stars (Apple) - High-performance array framework + LLM inference optimized for Apple Silicon.
  • MLC-LLM GitHub stars - Deployment engine that compiles and runs LLMs across browsers, mobile devices, and local hardware.
  • llama-cpp-python GitHub stars - Official Python bindings for llama.cpp.
  • KoboldCpp GitHub stars - User-friendly llama.cpp fork focused on role-playing and creative writing.
  • Potato OS GitHub stars - Linux distribution for fully local AI inference on Raspberry Pi 5 and 4. Optimized for running open models at the edge.

High-performance Serving & API Servers

  • vLLM GitHub stars - State-of-the-art serving engine with PagedAttention and continuous batching. Currently the fastest production-grade LLM server.
  • Text Generation Inference (TGI) GitHub stars - Hugging Face's production-ready Rust-based server.
  • SGLang GitHub stars - Next-gen serving framework with RadixAttention.
  • TensorRT-LLM GitHub stars - NVIDIA's official high-performance inference backend.
  • Aphrodite Engine GitHub stars - vLLM fork optimized for role-play and creative writing.
  • Open Model Engine (OME) GitHub stars - Kubernetes operator for LLM serving. GPU scheduling, model lifecycle management. Works with vLLM, SGLang, TensorRT-LLM.

Quantization, Distillation & Optimization

  • GGUF GitHub stars (part of llama.cpp) - Modern quantized format that powers most local inference.
  • bitsandbytes GitHub stars - 8-bit and 4-bit optimizers + quantization.
  • AutoAWQ GitHub stars - Activation-aware Weight Quantization toolkit.
  • AutoGPTQ GitHub stars - GPTQ quantization framework.
  • HQQ GitHub stars - Half-Quadratic Quantization - ultra-fast method rising in 2026.
  • ExLlamaV2 GitHub stars - Highly optimized CUDA kernels for 4-bit/8-bit inference.
  • Optimum GitHub stars - Hardware-specific acceleration and quantization.

🤖 4. Agentic AI & Multi-Agent Systems

Frameworks and platforms for building agent-based systems and multi-agent workflows.

Single-Agent Frameworks

  • LangGraph GitHub stars - Stateful, controllable agent orchestration.
  • CrewAI GitHub stars - Role-based agent framework.
  • AutoGen (AG2) GitHub stars - Flexible multi-agent conversation framework.
  • DSPy GitHub stars - Framework for programming language model pipelines with modules, optimizers, and evaluation loops.
  • Semantic Kernel GitHub stars - SDK for building and orchestrating AI agents and workflows across multiple programming languages.
  • smolagents GitHub stars - Lightweight agent framework centered on tool use and code-executing workflows.
  • LangChain GitHub stars - Foundational library for agents, chains, and memory.
  • smolagents (Hugging Face) GitHub stars - Minimalist agent library. Build agents in 3 lines of code with code-first action execution.

Multi-Agent Orchestration

  • MetaGPT GitHub stars - Simulates an entire "AI software company".
  • Swarm GitHub stars - Lightweight multi-agent orchestration from OpenAI.
  • Swarms GitHub stars - Bleeding-edge enterprise multi-agent orchestration.
  • Llama-Agents GitHub stars - Async-first multi-agent system.
  • Mastra GitHub stars - TypeScript-first agent framework with built-in RAG, workflows, tool integrations, observability and observational memory.
  • Mission Control GitHub stars - Cockpit for the agentic era — manage AI agent swarms with autonomous daemon, Field Ops for real-world execution, encrypted vault, and approval workflows.

Autonomous Coding Agents

  • OpenHands (ex-OpenDevin) GitHub stars - Full-featured open-source AI software engineer.
  • Goose GitHub stars - Extensible on-machine AI agent for development tasks.
  • OpenCode GitHub stars - Terminal-native autonomous coding agent.
  • Aider GitHub stars - Command-line pair-programming agent.
  • Pi (badlogic) GitHub stars - Terminal coding agent with hash-anchored edits, LSP integration, subagents, MCP support, and package ecosystem.
  • Mistral-Vibe (Mistral) GitHub stars - Minimal CLI coding agent by Mistral. Lightweight, fast, and designed for local development workflows.
  • Nanocoder (Nano-Collective) GitHub stars - Beautiful local-first coding agent running in your terminal. Built for privacy and control with support for multiple AI providers via OpenRouter.

Domain-Specific Agents

  • Langflow GitHub stars - Visual low-code platform for agentic workflows.
  • Dify GitHub stars - Production-ready agentic workflow platform.
  • OWL (camel-ai/owl) GitHub stars - Advanced multi-agent collaboration system.
  • SuperAGI GitHub stars - Dev-first autonomous AI agent platform.

Agent Memory & State

  • Letta (ex-MemGPT) GitHub stars - Platform for building stateful agents with advanced memory that learn and self-improve over time.
  • Mem0 GitHub stars - Universal memory layer for AI agents. Persistent, multi-session memory across models and environments.
  • Forgetful GitHub stars - MCP server for persistent AI agent memory. Stores atomic single-concept notes and auto-links them into a knowledge graph via semantic similarity. SQLite or PostgreSQL.

🔍 5. Retrieval-Augmented Generation (RAG) & Knowledge

Retrieval systems, vector databases, embedding models, and related tooling for RAG pipelines.

Vector Databases & Search Engines

  • Chroma GitHub stars - Most popular open-source embedding database.
  • Qdrant GitHub stars - High-performance vector search engine in Rust.
  • Weaviate GitHub stars - GraphQL-native vector search engine.
  • Milvus GitHub stars - Scalable cloud-native vector database.
  • Faiss GitHub stars - Similarity search and clustering library for dense vectors with CPU and GPU implementations.
  • NornicDB GitHub stars - Golang Low-latency graph + vector hybrid retrieval, Neo4j and qDrant driver compatible.
  • LanceDB GitHub stars - Serverless vector DB optimized for multimodal data.
  • pgvector GitHub stars - PostgreSQL extension for vector similarity search.

Embedding Models

RAG Frameworks & Advanced Retrieval Tools

  • LlamaIndex GitHub stars - Full-featured RAG pipeline with advanced indexing.
  • Haystack GitHub stars - End-to-end NLP and RAG framework.
  • RAGFlow GitHub stars - Deep-document-understanding RAG engine.
  • GraphRAG (Microsoft) GitHub stars - Knowledge-graph-based RAG.
  • Verba (Weaviate) GitHub stars - Golden RAG frontend with intuitive UI for retrieval and exploration.
  • RAGatouille GitHub stars - Advanced retrieval tools with late interaction models (ColBERT).
  • Docling GitHub stars - Document processing toolkit for turning PDFs and other files into structured data for GenAI workflows.
  • Unstructured GitHub stars - Best-in-class document preprocessing.
  • ColPali / ColQwen GitHub stars - Vision-language models for document retrieval.

Web Data Ingestion

  • Crawl4AI GitHub stars - LLM-friendly web crawler that turns websites into clean Markdown for RAG and agentic workflows.
  • Lightpanda GitHub stars - Machine-first headless browser in Zig; rendering-free and ultra-lightweight for AI agent browsing.

🎨 6. Generative Media Tools

Open-source models and applications for image, video, audio, and 3D generation and editing.

Image Generation & Editing

Video Generation

Audio / Music / Voice Generation

3D & Creative Tools


🛠️ 7. Training & Fine-tuning Ecosystem

Tools for model training, fine-tuning, synthetic data generation, and distributed training.

Full Training Frameworks

  • LLaMA-Factory GitHub stars - One-stop unified framework for SFT, DPO, ORPO, KTO with web UI.
  • Axolotl GitHub stars - YAML-driven full pipeline for SFT, DPO, GRPO.
  • Unsloth GitHub stars - 2× faster, 70% less memory fine-tuning.
  • LitGPT GitHub stars - Clean from-scratch implementations of 20+ LLMs.
  • torchtune GitHub stars - PyTorch-native library for post-training, fine-tuning, and experimentation with LLMs.
  • TRL (Transformers Reinforcement Learning) GitHub stars - Official library for RLHF, SFT, DPO, ORPO.

LoRA / PEFT Tools

Synthetic Data Generation

  • distilabel GitHub stars - End-to-end pipeline for synthetic instruction data.
  • Data-Juicer GitHub stars - High-performance data processing for LLM training.
  • Argilla GitHub stars - Open-source data labeling + synthetic data platform.
  • SDV (Synthetic Data Vault) GitHub stars - High-fidelity tabular and relational synthetic data.

Distributed Training

  • DeepSpeed GitHub stars - Extreme-scale training optimizations.
  • Colossal-AI GitHub stars - Unified system for 100B+ models.
  • Megatron-LM GitHub stars - Distributed training framework and reference codebase for large transformer models at scale.
  • Ray Train GitHub stars - Scalable distributed training.

📊 8. MLOps / LLMOps & Production

Tooling for tracking, deploying, monitoring, and operating AI systems in production.

Experiment Tracking & Versioning

  • MLflow GitHub stars - End-to-end open platform for the ML/LLM lifecycle.
  • DVC (Data Version Control) GitHub stars - Git-like versioning for data and models.
  • ClearML GitHub stars - Open-source platform for experiment tracking, orchestration, data management, and model serving.
  • Weights & Biases Weave GitHub stars - Open-source tracing and experiment tracking.

Deployment & Orchestration

  • BentoML GitHub stars - Unified framework to build, ship, and scale AI apps.
  • Ray Serve GitHub stars - Scalable model serving library.
  • ZenML GitHub stars - Pipeline and orchestration framework for taking ML and LLM systems from development to production.
  • Kubeflow GitHub stars - Kubernetes-native ML/LLM platform.
  • KServe GitHub stars - Kubernetes-based model serving.

Monitoring, Evaluation & Observability

Guardrails & Safety Tools


📈 9. Evaluation, Benchmarks & Datasets

Benchmarks, evaluation frameworks, datasets, and supporting tools for model assessment.

Benchmark Suites

  • lm-evaluation-harness (EleutherAI) GitHub stars - De-facto standard for generative model evaluation.
  • HELM (Stanford) GitHub stars - Holistic Evaluation of Language Models.
  • GAIA - Real-world multi-step agentic benchmark.
  • LiveCodeBench GitHub stars - Contamination-free coding benchmark.
  • MMLU-Pro / GPQA GitHub stars - Hardened expert-level benchmarks.
  • OpenCompass GitHub stars - Evaluation platform for benchmarking language and multimodal models across large benchmark suites.
  • SWE-rebench (Nebius) - Continuously updated benchmark with 21,000+ real-world SWE tasks for evaluating agentic LLMs. Decontaminated, mined from GitHub.

Evaluation Frameworks

  • DeepEval GitHub stars - The "Pytest for LLMs".
  • RAGAs GitHub stars - End-to-end RAG evaluation framework.
  • Lighteval GitHub stars - Evaluation toolkit for LLMs across multiple backends with reusable tasks, metrics, and result tracking.
  • Hugging Face Evaluate GitHub stars - Standardized evaluation metrics.

High-quality Open Datasets & Data Tools


🛡️ 10. AI Safety, Alignment & Interpretability

Tools for alignment, interpretability, safety evaluation, and adversarial testing.

Alignment & RLHF Tools

  • Safe-RLHF GitHub stars - Safe reinforcement learning from human feedback.
  • Alignment Handbook GitHub stars - Complete recipes for full-stack alignment.
  • OpenRLHF GitHub stars - High-performance distributed RLHF framework.

Interpretability & Explainability

  • TransformerLens GitHub stars - Gold-standard for mechanistic interpretability.
  • nnsight GitHub stars - Scalable library for intervening on neural networks.
  • SAELens GitHub stars - Sparse autoencoders for interpretable features.
  • Captum GitHub stars - PyTorch's official interpretability library.

Adversarial & Red-teaming Tools


🧩 11. Specialized Domains

Computer Vision

  • OpenCV GitHub stars - World's most widely used computer vision library.
  • Ultralytics YOLO GitHub stars - State-of-the-art real-time object detection.
  • Detectron2 GitHub stars - High-performance object detection library.
  • SAM 2 GitHub stars - Promptable image and video segmentation model with released checkpoints and training code.
  • Kornia GitHub stars - Differentiable computer vision library.
  • MediaPipe GitHub stars - Cross-platform multimodal pipelines.

Reinforcement Learning & Robotics

Time Series & Scientific AI

  • Time Series Library (TSLib) GitHub stars - Comprehensive benchmark for time-series models.
  • Chronos (Amazon) GitHub stars - Pretrained foundation models for time-series forecasting.
  • Darts GitHub stars - Easy-to-use time-series forecasting library.
  • AutoTS GitHub stars - Automated time series forecasting with broad model selection, ensembling, anomaly detection, and holiday effects. Designed for production deployment with minimal setup.

Edge / On-device AI


🖥️ 12. User Interfaces & Self-hosted Platforms

Local AI Chat UIs & Personal Assistants

  • OpenClaw GitHub stars - Local-first personal AI assistant with multi-channel integrations and full agentic task execution.
  • Open WebUI GitHub stars - Most popular self-hosted ChatGPT-style interface.
  • text-generation-webui GitHub stars - Web UI for running local LLMs with multiple backends, extensions, and model formats.
  • LobeChat GitHub stars - Sleek modern chat UI.
  • LibreChat GitHub stars - Feature-packed multi-LLM interface.
  • HuggingChat (self-hosted) GitHub stars - Official open-source codebase for HuggingChat.
  • Khoj GitHub stars - Self-hostable personal AI assistant for search, chat, automation, and workflows over local and web data.
  • Newelle GitHub stars - GNOME/Linux desktop virtual assistant with integrated file editor, global hotkeys, and profile manager.

Full Self-hosted AI Platforms

  • AnythingLLM GitHub stars - All-in-one RAG + agents platform.
  • Dify GitHub stars - Complete AI application platform with visual builder.
  • Langflow GitHub stars - Visual low-code platform for LangChain flows.
  • Flowise GitHub stars - Drag-and-drop LLM app builder.

Desktop & Mobile AI Apps

  • GPT4All GitHub stars - Privacy-first local desktop chatbot.
  • Jan GitHub stars - Local-first AI app framework.
  • SillyTavern GitHub stars - Highly customizable role-playing frontend.

🧪 13. Developer Tools & Integrations

AI Coding Assistants (open-source)

  • Continue GitHub stars - Open-source AI coding autopilot for VS Code & JetBrains.
  • Tabby GitHub stars - Self-hosted AI coding assistant.
  • Cline GitHub stars - Open-source IDE coding agent that can edit files, run commands, and use tools with user approval.
  • Open Interpreter GitHub stars - Lets LLMs run code locally.
  • Roo Code GitHub stars - Open-source editor-based coding agent with multiple modes and tool integrations.
  • Aider GitHub stars - Terminal-based AI pair programmer.

IDE Plugins & Extensions

  • llama.vim GitHub stars - Local LLM-powered code completion plugin for Vim/Neovim using llama.cpp. Fast, privacy-first, no API key needed.
  • CodeCompanion.nvim GitHub stars - AI-powered coding assistant for Neovim. Inline code generation, chat, actions, and tool use with support for multiple LLM providers.
  • Continue VS Code / JetBrains GitHub stars - Most installed open-source AI extension.
  • Jupyter AI GitHub stars - Chat and code generation inside notebooks.

Testing & Debugging Tools


📚 14. Resources & Learning

Papers with Open Implementations

Communities, Forums & Newsletters

Courses & Interactive Playgrounds

Starter Projects & Examples


Contributing

Contributions are highly welcome! Please read the CONTRIBUTING.md for guidelines (quality standards, formatting, license requirements, etc.).

  • Only OSI-approved licenses
  • Projects must be actively maintained (commits in last 6 months)
  • High-quality, well-documented, real adoption

License

This list itself is licensed under CC0 1.0 Universal. Feel free to use it for any purpose.


Made with ❤️ for the open-source AI community. Star the repo if you find it useful - it helps more people discover the best open tools!


About

Curated list of the best truly open-source AI projects, models, tools, and infrastructure.

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors