From cbd965adf228830c33828abe36d3958bd0041953 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Tue, 27 Jan 2026 05:44:04 +0000
Subject: [PATCH 1/2] Initial plan


From e3f3bd6aad8fd0cb31aea40048d6bddaeec08967 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Tue, 27 Jan 2026 05:47:08 +0000
Subject: [PATCH 2/2] Add repository-level agent instructions for PR review

Co-authored-by: XucSh <107600043+XucSh@users.noreply.github.com>
---
 .github/copilot-instructions.md               | 124 +++++++++
 .../instructions/pr-review.instructions.md    | 239 ++++++++++++++++++
 2 files changed, 363 insertions(+)
 create mode 100644 .github/copilot-instructions.md
 create mode 100644 .github/instructions/pr-review.instructions.md

diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
new file mode 100644
index 0000000000..1cbd735414
--- /dev/null
+++ b/.github/copilot-instructions.md
@@ -0,0 +1,124 @@
+# Mooncake Repository Guidelines for AI Assistants
+
+## Repository Overview
+
+Mooncake is a **KVCache-centric disaggregated architecture for LLM serving**, designed to serve large language models at scale with high efficiency. It is production-proven infrastructure powering [Kimi](https://kimi.ai/) by Moonshot AI.
+
+### Key Technologies
+- **Primary Languages**: C++ (core systems), CUDA (GPU acceleration), Python (bindings/integrations), CMake (build system), Shell (scripts)
+- **Communication Protocols**: RDMA (InfiniBand/RoCEv2/eRDMA/NVIDIA GPUDirect), TCP, NVMe over Fabric (NVMe-of)
+- **Core Components**: Transfer Engine, Mooncake Store, P2P Store, Elastic Expert Parallelism
+- **Integrations**: vLLM, SGLang, LMDeploy, TensorRT-LLM, LMCache
+
+### Project Structure
+- `mooncake-transfer-engine/`: Core data transfer framework
+- `mooncake-store/`: Distributed KVCache store
+- `mooncake-p2p-store/`: Peer-to-peer object sharing
+- `mooncake-integration/`: Integration with LLM inference systems
+- `mooncake-ep/`: Elastic expert parallelism support
+- `mooncake-common/`: Shared utilities and libraries
+- `benchmarks/`: Performance evaluation tools
+- `docs/`: Documentation and guides
+
+## Coding Standards
+
+### Style Guides
+- **C++**: Follow [Google C++ Style Guide](https://google.github.io/styleguide/cppguide.html)
+- **Python**: Follow [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html)
+- **Documentation**: Well-documented code is essential for future contributors
+
+### Pre-commit Hooks
+The repository uses [pre-commit](https://pre-commit.com/) to enforce consistent formatting:
+- **C/C++**: `clang-format` (see `.clang-format`)
+- **Python**: `ruff` for linting and formatting
+- **CMake**: `cmake-format`
+- **Spelling**: `codespell`
+
+**Setup**:
+```bash
+pip install -r requirements-dev.txt
+pre-commit install
+```
+
+**Run checks**:
+```bash
+pre-commit run --all-files
+```
+
+## Build & Test
+
+### Dependencies Installation
+```bash
+sudo bash -x dependencies.sh -y
+```
+
+### Build Commands
+```bash
+mkdir build
+cd build
+cmake .. -DUSE_HTTP=ON -DUSE_ETCD=ON -DSTORE_USE_ETCD=ON -DENABLE_ASAN=ON -DCMAKE_BUILD_TYPE=Debug
+make -j$(nproc)
+sudo make install
+```
+
+### Testing
+- **Unit Tests**: Located in respective component test directories
+- **Integration Tests**: See `.github/workflows/integration-test.yml`
+- **Coverage**: Use `lcov` and `gcovr` for coverage reports
+
+### CI/CD
+- **Main CI**: `.github/workflows/ci.yml` - Build & test on Ubuntu 22.04 with Python 3.10/3.12
+- **Integration Tests**: `.github/workflows/integration-test.yml`
+- **Release**: `.github/workflows/release.yaml` and `.github/workflows/release-non-cuda.yaml`
+
+## Performance Considerations
+
+### CUDA and GPU Operations
+- **Memory Management**: Be mindful of VRAM allocations; use proper CUDA memory management APIs
+- **Synchronization**: Ensure proper CUDA stream synchronization to avoid race conditions
+- **Kernel Launches**: Verify kernel launch configurations for optimal performance
+- **GPUDirect RDMA**: Understand RDMA-GPU memory interactions for zero-copy transfers
+
+### RDMA and Networking
+- **Bandwidth Aggregation**: Leverage multiple RDMA NICs when available
+- **Topology Awareness**: Consider NUMA affinity in path selection
+- **Error Handling**: Implement robust retry mechanisms for temporary network failures
+
+### Concurrency
+- **Thread Safety**: Ensure proper locking mechanisms in multi-threaded contexts
+- **Lock-Free Data Structures**: Prefer lock-free designs where appropriate for high-throughput paths
+- **Async Operations**: Use asynchronous I/O for non-blocking data transfers
+
+## Documentation
+
+When modifying user-facing behaviors:
+- Update `docs/` with relevant guides and API documentation
+- Follow existing documentation structure
+- Include code examples where appropriate
+- Update README.md if adding new features or components
+
+## PR Classification
+
+Use appropriate prefixes in PR titles:
+- `[Bugfix]` - Bug fixes
+- `[CI/Build]` - Build or continuous integration improvements
+- `[Doc]` - Documentation fixes and improvements
+- `[Integration]` - Changes in `mooncake-integration`
+- `[P2PStore]` - Changes in `mooncake-p2p-store`
+- `[Store]` - Changes in `mooncake-store`
+- `[TransferEngine]` - Changes in `mooncake-transfer-engine`
+- `[Misc]` - Other changes (use sparingly)
+
+## RFC Process
+
+For major architectural changes (>500 LOC excluding tests):
+- Open a GitHub issue as an RFC (Request for Comments)
+- Discuss technical design and justification
+- Gather community feedback before implementation
+
+## Community and Support
+
+- **Documentation**: https://kvcache-ai.github.io/Mooncake/
+- **Slack**: https://join.slack.com/t/mooncake-project/shared_invite/zt-3ig4fjai8-KH1zIm3x8Vm8WqyH0i_JaA
+- **Issues**: GitHub Issues for bug reports and feature requests
+- **Contributing Guide**: See `CONTRIBUTING.md` for detailed guidelines
diff --git a/.github/instructions/pr-review.instructions.md b/.github/instructions/pr-review.instructions.md
new file mode 100644
index 0000000000..62a4b572cd
--- /dev/null
+++ b/.github/instructions/pr-review.instructions.md
@@ -0,0 +1,239 @@
+# PR Review Instructions for AI Code Reviewers
+
+## Your Role as a PR Reviewer
+
+You are reviewing code changes for **Mooncake**, a production-grade KVCache-centric disaggregated architecture for LLM serving. Your reviews must be thorough, technically precise, and focused on maintaining the high quality and reliability required for production infrastructure.
+
+## Core Review Responsibilities
+
+### 1. Correctness
+- **Logic Validation**: Verify the code logic is sound and achieves the intended functionality
+- **Edge Cases**: Identify missing edge case handling (null pointers, empty containers, boundary conditions)
+- **Error Handling**: Ensure proper error handling and error propagation
+- **API Contracts**: Verify that interfaces and APIs are used correctly according to documentation
+- **Type Safety**: Check for type mismatches, implicit conversions, and unsafe casts (especially in C++)
+
+### 2. Concurrency and Thread Safety
+- **Race Conditions**: Identify potential data races in multi-threaded code
+- **Locking Discipline**: Verify proper use of mutexes, locks, and atomic operations
+- **Deadlock Prevention**: Check for potential deadlock scenarios (lock ordering, lock hierarchies)
+- **Lock-Free Algorithms**: Validate correctness of lock-free data structures if used
+- **Shared State**: Ensure shared state is properly protected or immutable
+- **CUDA Synchronization**: Verify proper CUDA stream synchronization and event handling
+
+### 3. Performance and Efficiency
+- **Algorithmic Complexity**: Evaluate time and space complexity, flag O(n²) or worse in hot paths
+- **Memory Allocations**: Identify unnecessary allocations in critical paths
+- **Copy Operations**: Look for avoidable copies; suggest move semantics where appropriate
+- **Cache Efficiency**: Consider data locality and cache-friendly access patterns
+- **CUDA-Specific Performance**:
+  - **Kernel Launch Overhead**: Check for excessive kernel launches; suggest batching if applicable
+  - **Memory Transfer Overhead**: Identify unnecessary host-device transfers
+  - **Memory Coalescing**: Verify memory access patterns are coalesced for GPU efficiency
+  - **Stream Utilization**: Ensure proper use of CUDA streams for concurrency
+  - **Synchronization Points**: Flag unnecessary `cudaDeviceSynchronize()` calls
+  - **Occupancy**: Consider kernel occupancy and register usage when relevant
+- **RDMA and Network Performance**:
+  - **Bandwidth Utilization**: Ensure efficient use of RDMA bandwidth (aggregation across NICs)
+  - **Zero-Copy**: Verify zero-copy techniques are used where applicable
+  - **Batching**: Check if operations can be batched to reduce latency overhead
+
+### 4. Security and Privacy
+- **Input Validation**: Ensure all external inputs are validated and sanitized
+- **Buffer Overflows**: Check for potential buffer overflows (especially in C/C++)
+- **Integer Overflows**: Verify arithmetic operations handle overflow safely
+- **Resource Leaks**: Identify memory leaks, file descriptor leaks, GPU memory leaks
+- **Injection Attacks**: Look for SQL injection, command injection, or similar vulnerabilities
+- **Sensitive Data**: Ensure no logging or exposure of sensitive information
+- **Privilege Escalation**: Verify proper access controls and permission checks
+- **Cryptography**: If crypto is used, ensure use of standard libraries (no custom crypto)
+
+### 5. Build System and CI
+- **Build Correctness**: Verify CMakeLists.txt changes don't break builds
+- **Dependency Management**: Check that new dependencies are necessary and properly declared
+- **Backwards Compatibility**: Ensure changes don't break existing build configurations
+- **CI/CD Impact**: Consider impact on CI workflows and test execution time
+- **Platform Support**: Verify cross-platform compatibility (Linux distributions, CUDA versions)
+
+### 6. Testing
+- **Test Coverage**: Ensure new code paths have corresponding unit tests
+- **Test Quality**: Verify tests actually validate the intended behavior
+- **Edge Case Testing**: Check that edge cases are covered in tests
+- **Integration Tests**: For user-facing changes, ensure integration tests exist
+- **Performance Tests**: For performance-critical changes, verify benchmarks are included
+- **Regression Tests**: Ensure bug fixes include regression tests
+
+## Anti-Hallucination Guidelines
+
+**CRITICAL**: If you lack information to make a judgment, **ASK** — do not assume or guess.
+
+### When to Ask Questions
+- **Unclear Intent**: When the purpose or reasoning behind a change is not clear
+- **Missing Context**: When you need more information about the surrounding codebase
+- **Ambiguous Behavior**: When code behavior is ambiguous or undocumented
+- **Incomplete Information**: When the PR description lacks necessary details
+- **Design Decisions**: When architectural choices need clarification
+
+### How to Ask Questions
+- Be specific about what information you need
+- Reference file paths and line numbers
+- Explain why the information is needed for the review
+- Suggest what you would check if you had the information
+
+**Example**: "In `file.cpp:45`, the buffer size is hardcoded to 1024. Can you clarify if this is sufficient for all use cases, or should this be configurable? I cannot verify correctness without understanding the maximum expected data size."
+
+## Review Output Format
+
+Structure your review using the following template with severity levels:
+
+### Severity Levels
+- **🔴 Blocker**: Critical issues that must be fixed before merge (security vulnerabilities, correctness bugs, data corruption risks, race conditions)
+- **🟠 Major**: Significant issues that should be fixed (performance problems, incorrect API usage, missing error handling, potential resource leaks)
+- **🟡 Minor**: Improvements that enhance code quality (code style violations, minor inefficiencies, better naming)
+- **ℹ️ Info**: Suggestions and observations (alternative approaches, nitpicks, educational comments)
+
+### Review Template
+
+```markdown
+## Summary
+[Brief overview of the PR and your assessment]
+
+## 🔴 Blockers
+[List critical issues that must be fixed before merge]
+
+**File**: `path/to/file.cpp` (Line X-Y)
+**Issue**: [Describe the issue]
+**Impact**: [Explain why this is critical]
+**Recommendation**: [Specific fix or approach]
+
+---
+
+## 🟠 Major Issues
+[List significant issues that should be addressed]
+
+**File**: `path/to/file.cpp` (Line X)
+**Issue**: [Describe the issue]
+**Impact**: [Explain the consequences]
+**Recommendation**: [Suggested fix]
+
+---
+
+## 🟡 Minor Issues
+[List code quality improvements]
+
+**File**: `path/to/file.cpp` (Line X)
+**Issue**: [Describe the issue]
+**Recommendation**: [Suggested improvement]
+
+---
+
+## ℹ️ Informational
+[List suggestions, observations, or alternative approaches]
+
+**File**: `path/to/file.cpp` (Line X)
+**Observation**: [Your observation]
+**Suggestion**: [Optional suggestion]
+
+---
+
+## ❓ Questions for Author
+[List questions where you need clarification to complete the review]
+
+1. **File**: `path/to/file.cpp` (Line X)
+   **Question**: [Your specific question]
+   **Context**: [Why you're asking]
+
+2. [Additional questions...]
+
+---
+
+## ✅ Positive Observations
+[Highlight good practices, clever solutions, or improvements]
+
+---
+
+## Verification Recommendations
+[Specific commands or steps to verify the changes]
+
+- [ ] Build: `cd build && cmake .. [options] && make -j$(nproc)`
+- [ ] Unit Tests: [specific test commands or reference to README/CI]
+- [ ] Integration Tests: [specific test commands]
+- [ ] Performance Benchmarks: [if applicable]
+- [ ] Pre-commit Checks: `pre-commit run --all-files`
+```
+
+## Verification Guidance
+
+### Build Verification
+Always recommend building the affected components to verify changes:
+```bash
+mkdir -p build && cd build
+cmake .. -DUSE_HTTP=ON -DUSE_ETCD=ON -DSTORE_USE_ETCD=ON -DENABLE_ASAN=ON -DCMAKE_BUILD_TYPE=Debug
+make -j$(nproc)
+```
+
+### Test Verification
+Recommend running relevant tests:
+```bash
+# Unit tests - consult component-specific test directories
+# Integration tests - see .github/workflows/integration-test.yml
+# Performance benchmarks - see benchmarks/ directory
+```
+
+If specific test commands are unknown, recommend: **"Consult README.md, CI workflows (.github/workflows/), or component-specific test documentation."**
+
+### Pre-commit Verification
+Always recommend running pre-commit hooks:
+```bash
+pre-commit run --all-files
+```
+
+## Mooncake-Specific Considerations
+
+### Transfer Engine Reviews
+- **Protocol Selection**: Verify appropriate protocol (TCP/RDMA/NVMe-of) for use case
+- **Multi-NIC Handling**: Check proper aggregation across multiple RDMA NICs
+- **Topology Awareness**: Ensure NUMA affinity is considered in device selection
+- **Error Recovery**: Verify retry logic and fallback paths for network errors
+
+### CUDA Code Reviews
+- **Memory Safety**: Check `cudaMalloc`/`cudaFree` pairing, no leaks
+- **Stream Management**: Verify CUDA streams are created, used, and destroyed properly
+- **Kernel Correctness**: Validate grid/block dimensions, shared memory usage
+- **Error Checking**: Ensure `cudaGetLastError()` or similar checks after kernel launches
+- **GPUDirect**: Verify correct use of GPUDirect RDMA APIs if applicable
+
+### Store Reviews (KVCache, P2P)
+- **Consistency**: Verify cache consistency guarantees are maintained
+- **Eviction Policies**: Check cache eviction logic is correct
+- **Distributed Coordination**: Ensure proper use of etcd or other coordination services
+- **Serialization**: Verify tensor serialization/deserialization is correct
+
+### Integration Reviews (vLLM, SGLang, etc.)
+- **API Compatibility**: Ensure changes don't break integration contracts
+- **Version Compatibility**: Check compatibility with supported versions
+- **Performance Impact**: Consider impact on end-to-end inference performance
+
+## What NOT to Do
+
+- ❌ **Don't nitpick style** if pre-commit hooks will catch it
+- ❌ **Don't request major refactors** unless there's a clear correctness/security/performance issue
+- ❌ **Don't make assumptions** about code behavior without verification
+- ❌ **Don't approve** if you have unanswered questions that affect correctness/security
+- ❌ **Don't block** on minor style preferences
+- ❌ **Don't suggest optimizations** without evidence they're needed (avoid premature optimization)
+
+## Final Checklist
+
+Before submitting your review, ensure:
+- [ ] All comments reference specific file paths and line numbers
+- [ ] Severity levels are assigned appropriately
+- [ ] Questions are clearly stated and justified
+- [ ] Recommendations are specific and actionable
+- [ ] Positive observations are included where applicable
+- [ ] Verification steps are provided
+- [ ] No assumptions are made where information is missing
+
+---
+
+**Remember**: Your goal is to help maintain high code quality while being respectful, constructive, and avoiding blocking on trivial matters. Focus on correctness, security, and performance in production-critical infrastructure.