From cbd965adf228830c33828abe36d3958bd0041953 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Tue, 27 Jan 2026 05:44:04 +0000 Subject: [PATCH 1/2] Initial plan From e3f3bd6aad8fd0cb31aea40048d6bddaeec08967 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Tue, 27 Jan 2026 05:47:08 +0000 Subject: [PATCH 2/2] Add repository-level agent instructions for PR review Co-authored-by: XucSh <107600043+XucSh@users.noreply.github.com> --- .github/copilot-instructions.md | 124 +++++++++ .../instructions/pr-review.instructions.md | 239 ++++++++++++++++++ 2 files changed, 363 insertions(+) create mode 100644 .github/copilot-instructions.md create mode 100644 .github/instructions/pr-review.instructions.md diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md new file mode 100644 index 0000000000..1cbd735414 --- /dev/null +++ b/.github/copilot-instructions.md @@ -0,0 +1,124 @@ +# Mooncake Repository Guidelines for AI Assistants + +## Repository Overview + +Mooncake is a **KVCache-centric disaggregated architecture for LLM serving**, designed to serve large language models at scale with high efficiency. It is production-proven infrastructure powering [Kimi](https://kimi.ai/) by Moonshot AI. + +### Key Technologies +- **Primary Languages**: C++ (core systems), CUDA (GPU acceleration), Python (bindings/integrations), CMake (build system), Shell (scripts) +- **Communication Protocols**: RDMA (InfiniBand/RoCEv2/eRDMA/NVIDIA GPUDirect), TCP, NVMe over Fabric (NVMe-of) +- **Core Components**: Transfer Engine, Mooncake Store, P2P Store, Elastic Expert Parallelism +- **Integrations**: vLLM, SGLang, LMDeploy, TensorRT-LLM, LMCache + +### Project Structure +- `mooncake-transfer-engine/`: Core data transfer framework +- `mooncake-store/`: Distributed KVCache store +- `mooncake-p2p-store/`: Peer-to-peer object sharing +- `mooncake-integration/`: Integration with LLM inference systems +- `mooncake-ep/`: Elastic expert parallelism support +- `mooncake-common/`: Shared utilities and libraries +- `benchmarks/`: Performance evaluation tools +- `docs/`: Documentation and guides + +## Coding Standards + +### Style Guides +- **C++**: Follow [Google C++ Style Guide](https://google.github.io/styleguide/cppguide.html) +- **Python**: Follow [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html) +- **Documentation**: Well-documented code is essential for future contributors + +### Pre-commit Hooks +The repository uses [pre-commit](https://pre-commit.com/) to enforce consistent formatting: +- **C/C++**: `clang-format` (see `.clang-format`) +- **Python**: `ruff` for linting and formatting +- **CMake**: `cmake-format` +- **Spelling**: `codespell` + +**Setup**: +```bash +pip install -r requirements-dev.txt +pre-commit install +``` + +**Run checks**: +```bash +pre-commit run --all-files +``` + +## Build & Test + +### Dependencies Installation +```bash +sudo bash -x dependencies.sh -y +``` + +### Build Commands +```bash +mkdir build +cd build +cmake .. -DUSE_HTTP=ON -DUSE_ETCD=ON -DSTORE_USE_ETCD=ON -DENABLE_ASAN=ON -DCMAKE_BUILD_TYPE=Debug +make -j$(nproc) +sudo make install +``` + +### Testing +- **Unit Tests**: Located in respective component test directories +- **Integration Tests**: See `.github/workflows/integration-test.yml` +- **Coverage**: Use `lcov` and `gcovr` for coverage reports + +### CI/CD +- **Main CI**: `.github/workflows/ci.yml` - Build & test on Ubuntu 22.04 with Python 3.10/3.12 +- **Integration Tests**: `.github/workflows/integration-test.yml` +- **Release**: `.github/workflows/release.yaml` and `.github/workflows/release-non-cuda.yaml` + +## Performance Considerations + +### CUDA and GPU Operations +- **Memory Management**: Be mindful of VRAM allocations; use proper CUDA memory management APIs +- **Synchronization**: Ensure proper CUDA stream synchronization to avoid race conditions +- **Kernel Launches**: Verify kernel launch configurations for optimal performance +- **GPUDirect RDMA**: Understand RDMA-GPU memory interactions for zero-copy transfers + +### RDMA and Networking +- **Bandwidth Aggregation**: Leverage multiple RDMA NICs when available +- **Topology Awareness**: Consider NUMA affinity in path selection +- **Error Handling**: Implement robust retry mechanisms for temporary network failures + +### Concurrency +- **Thread Safety**: Ensure proper locking mechanisms in multi-threaded contexts +- **Lock-Free Data Structures**: Prefer lock-free designs where appropriate for high-throughput paths +- **Async Operations**: Use asynchronous I/O for non-blocking data transfers + +## Documentation + +When modifying user-facing behaviors: +- Update `docs/` with relevant guides and API documentation +- Follow existing documentation structure +- Include code examples where appropriate +- Update README.md if adding new features or components + +## PR Classification + +Use appropriate prefixes in PR titles: +- `[Bugfix]` - Bug fixes +- `[CI/Build]` - Build or continuous integration improvements +- `[Doc]` - Documentation fixes and improvements +- `[Integration]` - Changes in `mooncake-integration` +- `[P2PStore]` - Changes in `mooncake-p2p-store` +- `[Store]` - Changes in `mooncake-store` +- `[TransferEngine]` - Changes in `mooncake-transfer-engine` +- `[Misc]` - Other changes (use sparingly) + +## RFC Process + +For major architectural changes (>500 LOC excluding tests): +- Open a GitHub issue as an RFC (Request for Comments) +- Discuss technical design and justification +- Gather community feedback before implementation + +## Community and Support + +- **Documentation**: https://kvcache-ai.github.io/Mooncake/ +- **Slack**: https://join.slack.com/t/mooncake-project/shared_invite/zt-3ig4fjai8-KH1zIm3x8Vm8WqyH0i_JaA +- **Issues**: GitHub Issues for bug reports and feature requests +- **Contributing Guide**: See `CONTRIBUTING.md` for detailed guidelines diff --git a/.github/instructions/pr-review.instructions.md b/.github/instructions/pr-review.instructions.md new file mode 100644 index 0000000000..62a4b572cd --- /dev/null +++ b/.github/instructions/pr-review.instructions.md @@ -0,0 +1,239 @@ +# PR Review Instructions for AI Code Reviewers + +## Your Role as a PR Reviewer + +You are reviewing code changes for **Mooncake**, a production-grade KVCache-centric disaggregated architecture for LLM serving. Your reviews must be thorough, technically precise, and focused on maintaining the high quality and reliability required for production infrastructure. + +## Core Review Responsibilities + +### 1. Correctness +- **Logic Validation**: Verify the code logic is sound and achieves the intended functionality +- **Edge Cases**: Identify missing edge case handling (null pointers, empty containers, boundary conditions) +- **Error Handling**: Ensure proper error handling and error propagation +- **API Contracts**: Verify that interfaces and APIs are used correctly according to documentation +- **Type Safety**: Check for type mismatches, implicit conversions, and unsafe casts (especially in C++) + +### 2. Concurrency and Thread Safety +- **Race Conditions**: Identify potential data races in multi-threaded code +- **Locking Discipline**: Verify proper use of mutexes, locks, and atomic operations +- **Deadlock Prevention**: Check for potential deadlock scenarios (lock ordering, lock hierarchies) +- **Lock-Free Algorithms**: Validate correctness of lock-free data structures if used +- **Shared State**: Ensure shared state is properly protected or immutable +- **CUDA Synchronization**: Verify proper CUDA stream synchronization and event handling + +### 3. Performance and Efficiency +- **Algorithmic Complexity**: Evaluate time and space complexity, flag O(n²) or worse in hot paths +- **Memory Allocations**: Identify unnecessary allocations in critical paths +- **Copy Operations**: Look for avoidable copies; suggest move semantics where appropriate +- **Cache Efficiency**: Consider data locality and cache-friendly access patterns +- **CUDA-Specific Performance**: + - **Kernel Launch Overhead**: Check for excessive kernel launches; suggest batching if applicable + - **Memory Transfer Overhead**: Identify unnecessary host-device transfers + - **Memory Coalescing**: Verify memory access patterns are coalesced for GPU efficiency + - **Stream Utilization**: Ensure proper use of CUDA streams for concurrency + - **Synchronization Points**: Flag unnecessary `cudaDeviceSynchronize()` calls + - **Occupancy**: Consider kernel occupancy and register usage when relevant +- **RDMA and Network Performance**: + - **Bandwidth Utilization**: Ensure efficient use of RDMA bandwidth (aggregation across NICs) + - **Zero-Copy**: Verify zero-copy techniques are used where applicable + - **Batching**: Check if operations can be batched to reduce latency overhead + +### 4. Security and Privacy +- **Input Validation**: Ensure all external inputs are validated and sanitized +- **Buffer Overflows**: Check for potential buffer overflows (especially in C/C++) +- **Integer Overflows**: Verify arithmetic operations handle overflow safely +- **Resource Leaks**: Identify memory leaks, file descriptor leaks, GPU memory leaks +- **Injection Attacks**: Look for SQL injection, command injection, or similar vulnerabilities +- **Sensitive Data**: Ensure no logging or exposure of sensitive information +- **Privilege Escalation**: Verify proper access controls and permission checks +- **Cryptography**: If crypto is used, ensure use of standard libraries (no custom crypto) + +### 5. Build System and CI +- **Build Correctness**: Verify CMakeLists.txt changes don't break builds +- **Dependency Management**: Check that new dependencies are necessary and properly declared +- **Backwards Compatibility**: Ensure changes don't break existing build configurations +- **CI/CD Impact**: Consider impact on CI workflows and test execution time +- **Platform Support**: Verify cross-platform compatibility (Linux distributions, CUDA versions) + +### 6. Testing +- **Test Coverage**: Ensure new code paths have corresponding unit tests +- **Test Quality**: Verify tests actually validate the intended behavior +- **Edge Case Testing**: Check that edge cases are covered in tests +- **Integration Tests**: For user-facing changes, ensure integration tests exist +- **Performance Tests**: For performance-critical changes, verify benchmarks are included +- **Regression Tests**: Ensure bug fixes include regression tests + +## Anti-Hallucination Guidelines + +**CRITICAL**: If you lack information to make a judgment, **ASK** — do not assume or guess. + +### When to Ask Questions +- **Unclear Intent**: When the purpose or reasoning behind a change is not clear +- **Missing Context**: When you need more information about the surrounding codebase +- **Ambiguous Behavior**: When code behavior is ambiguous or undocumented +- **Incomplete Information**: When the PR description lacks necessary details +- **Design Decisions**: When architectural choices need clarification + +### How to Ask Questions +- Be specific about what information you need +- Reference file paths and line numbers +- Explain why the information is needed for the review +- Suggest what you would check if you had the information + +**Example**: "In `file.cpp:45`, the buffer size is hardcoded to 1024. Can you clarify if this is sufficient for all use cases, or should this be configurable? I cannot verify correctness without understanding the maximum expected data size." + +## Review Output Format + +Structure your review using the following template with severity levels: + +### Severity Levels +- **šŸ”“ Blocker**: Critical issues that must be fixed before merge (security vulnerabilities, correctness bugs, data corruption risks, race conditions) +- **🟠 Major**: Significant issues that should be fixed (performance problems, incorrect API usage, missing error handling, potential resource leaks) +- **🟔 Minor**: Improvements that enhance code quality (code style violations, minor inefficiencies, better naming) +- **ā„¹ļø Info**: Suggestions and observations (alternative approaches, nitpicks, educational comments) + +### Review Template + +```markdown +## Summary +[Brief overview of the PR and your assessment] + +## šŸ”“ Blockers +[List critical issues that must be fixed before merge] + +**File**: `path/to/file.cpp` (Line X-Y) +**Issue**: [Describe the issue] +**Impact**: [Explain why this is critical] +**Recommendation**: [Specific fix or approach] + +--- + +## 🟠 Major Issues +[List significant issues that should be addressed] + +**File**: `path/to/file.cpp` (Line X) +**Issue**: [Describe the issue] +**Impact**: [Explain the consequences] +**Recommendation**: [Suggested fix] + +--- + +## 🟔 Minor Issues +[List code quality improvements] + +**File**: `path/to/file.cpp` (Line X) +**Issue**: [Describe the issue] +**Recommendation**: [Suggested improvement] + +--- + +## ā„¹ļø Informational +[List suggestions, observations, or alternative approaches] + +**File**: `path/to/file.cpp` (Line X) +**Observation**: [Your observation] +**Suggestion**: [Optional suggestion] + +--- + +## ā“ Questions for Author +[List questions where you need clarification to complete the review] + +1. **File**: `path/to/file.cpp` (Line X) + **Question**: [Your specific question] + **Context**: [Why you're asking] + +2. [Additional questions...] + +--- + +## āœ… Positive Observations +[Highlight good practices, clever solutions, or improvements] + +--- + +## Verification Recommendations +[Specific commands or steps to verify the changes] + +- [ ] Build: `cd build && cmake .. [options] && make -j$(nproc)` +- [ ] Unit Tests: [specific test commands or reference to README/CI] +- [ ] Integration Tests: [specific test commands] +- [ ] Performance Benchmarks: [if applicable] +- [ ] Pre-commit Checks: `pre-commit run --all-files` +``` + +## Verification Guidance + +### Build Verification +Always recommend building the affected components to verify changes: +```bash +mkdir -p build && cd build +cmake .. -DUSE_HTTP=ON -DUSE_ETCD=ON -DSTORE_USE_ETCD=ON -DENABLE_ASAN=ON -DCMAKE_BUILD_TYPE=Debug +make -j$(nproc) +``` + +### Test Verification +Recommend running relevant tests: +```bash +# Unit tests - consult component-specific test directories +# Integration tests - see .github/workflows/integration-test.yml +# Performance benchmarks - see benchmarks/ directory +``` + +If specific test commands are unknown, recommend: **"Consult README.md, CI workflows (.github/workflows/), or component-specific test documentation."** + +### Pre-commit Verification +Always recommend running pre-commit hooks: +```bash +pre-commit run --all-files +``` + +## Mooncake-Specific Considerations + +### Transfer Engine Reviews +- **Protocol Selection**: Verify appropriate protocol (TCP/RDMA/NVMe-of) for use case +- **Multi-NIC Handling**: Check proper aggregation across multiple RDMA NICs +- **Topology Awareness**: Ensure NUMA affinity is considered in device selection +- **Error Recovery**: Verify retry logic and fallback paths for network errors + +### CUDA Code Reviews +- **Memory Safety**: Check `cudaMalloc`/`cudaFree` pairing, no leaks +- **Stream Management**: Verify CUDA streams are created, used, and destroyed properly +- **Kernel Correctness**: Validate grid/block dimensions, shared memory usage +- **Error Checking**: Ensure `cudaGetLastError()` or similar checks after kernel launches +- **GPUDirect**: Verify correct use of GPUDirect RDMA APIs if applicable + +### Store Reviews (KVCache, P2P) +- **Consistency**: Verify cache consistency guarantees are maintained +- **Eviction Policies**: Check cache eviction logic is correct +- **Distributed Coordination**: Ensure proper use of etcd or other coordination services +- **Serialization**: Verify tensor serialization/deserialization is correct + +### Integration Reviews (vLLM, SGLang, etc.) +- **API Compatibility**: Ensure changes don't break integration contracts +- **Version Compatibility**: Check compatibility with supported versions +- **Performance Impact**: Consider impact on end-to-end inference performance + +## What NOT to Do + +- āŒ **Don't nitpick style** if pre-commit hooks will catch it +- āŒ **Don't request major refactors** unless there's a clear correctness/security/performance issue +- āŒ **Don't make assumptions** about code behavior without verification +- āŒ **Don't approve** if you have unanswered questions that affect correctness/security +- āŒ **Don't block** on minor style preferences +- āŒ **Don't suggest optimizations** without evidence they're needed (avoid premature optimization) + +## Final Checklist + +Before submitting your review, ensure: +- [ ] All comments reference specific file paths and line numbers +- [ ] Severity levels are assigned appropriately +- [ ] Questions are clearly stated and justified +- [ ] Recommendations are specific and actionable +- [ ] Positive observations are included where applicable +- [ ] Verification steps are provided +- [ ] No assumptions are made where information is missing + +--- + +**Remember**: Your goal is to help maintain high code quality while being respectful, constructive, and avoiding blocking on trivial matters. Focus on correctness, security, and performance in production-critical infrastructure.