docs(eval): add reproducible memory evaluation guide #204

Open
RerankerGuo wants to merge 1 commit into TencentCloud:main from RerankerGuo:docs/reproducible-memory-evals

@RerankerGuo

Description

Add a lightweight reproducible memory evaluation guide for benchmark runs and future eval integrations.

The guide is framework-neutral and documents:

  • required run metadata such as commit, host, model, dataset split, session layout, and resolved config
  • short-term, long-term, layer stress, and temporal drift evaluation designs
  • L0/L1/L2/L3 layer audit checkpoints
  • recommended metrics for accuracy, token use, recall quality, budget use, and latency
  • a result template that contributors can reuse when reporting benchmark reproductions

This intentionally avoids adding a full benchmark runner or new runtime dependency.
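As a rough illustration of the metadata and metrics listed above, a run record could be shaped as in the sketch below. This is a minimal sketch only: the interface names, field names, and the `missingMetadata` and `formatReport` helpers are all hypothetical and are not taken from the guide or from the repository. The values in the usage lines are placeholders, not benchmark results.

```typescript
// Hypothetical record shapes; field names are illustrative assumptions,
// chosen to mirror the metadata and metrics named in the description.
interface RunMetadata {
  commit: string;
  host: string;
  model: string;
  datasetSplit: string;
  sessionLayout: string;
  resolvedConfig: Record<string, unknown>;
}

interface RunMetrics {
  accuracy: number; // fraction in [0, 1]
  totalTokens: number; // token use across the run
  recallQuality: number; // fraction in [0, 1]
  budgetUsed: number; // fraction of the configured budget
  latencyMs: number; // latency per query, in milliseconds
}

const REQUIRED_FIELDS: (keyof RunMetadata)[] = [
  "commit",
  "host",
  "model",
  "datasetSplit",
  "sessionLayout",
  "resolvedConfig",
];

// Returns the names of required metadata fields that are absent or blank.
function missingMetadata(meta: Partial<RunMetadata>): string[] {
  return REQUIRED_FIELDS.filter((key) => {
    const value = meta[key];
    if (value === undefined) return true;
    if (typeof value === "string") return value.trim() === "";
    return false;
  });
}

// Renders a plain-text report with one "name: value" line per field.
function formatReport(meta: RunMetadata, metrics: RunMetrics): string {
  return [
    `commit: ${meta.commit}`,
    `host: ${meta.host}`,
    `model: ${meta.model}`,
    `dataset split: ${meta.datasetSplit}`,
    `session layout: ${meta.sessionLayout}`,
    `resolved config: ${JSON.stringify(meta.resolvedConfig)}`,
    `accuracy: ${metrics.accuracy.toFixed(3)}`,
    `total tokens: ${metrics.totalTokens}`,
    `recall quality: ${metrics.recallQuality.toFixed(3)}`,
    `budget used: ${metrics.budgetUsed.toFixed(3)}`,
    `latency (ms): ${metrics.latencyMs}`,
  ].join("\n");
}

// Usage with placeholder values: flag an incomplete record before reporting.
const draft: Partial<RunMetadata> = { commit: "abc1234", host: "", model: "example-model" };
console.log(missingMetadata(draft));
```

Checking for missing fields before publishing a result keeps reports comparable across contributors without requiring a benchmark runner or any new dependency.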

Related Issue

Related to #106

Change Type

  • Bug fix
  • New feature
  • Documentation update
  • Code optimization

Self-test Checklist

  • Verified locally
  • No existing features affected

Additional Notes

Verified with npm test and npm run build using Node v24.15.0.

Signed-off-by: Ziyang Guo <121015044+RerankerGuo@users.noreply.github.com>
@Maxwell-Code07

Collaborator

A lightweight evaluation guide is a welcome addition for benchmark reproducibility. The framework-neutral design makes it easy for contributors to report results consistently. We'll take a look.

