Merged
1 change: 1 addition & 0 deletions .github/workflows/docs.yml
@@ -5,6 +5,7 @@ on:
branches:
- main
- master
- 1.4.x
paths:
- "docs/**"
- ".github/workflows/docs.yml"
9 changes: 8 additions & 1 deletion .gitignore
@@ -63,4 +63,11 @@ attachments

# mvn flatten
.flattened-pom.xml
.dump
.dump

# github pages
docs/.bundle
docs/.jekyll-cache
docs/_site
docs/vendor
docs/Gemfile.lock
9 changes: 3 additions & 6 deletions docs/Gemfile
@@ -1,10 +1,7 @@
source "https://rubygems.org"

gem "jekyll", "~> 4.3"
gem "just-the-docs", "~> 0.10"
gem "sass-embedded", "~> 1.69" # pin a compatible version to avoid native-extension build problems with 1.100.0
gem "just-the-docs"

group :jekyll_plugins do
# declare additional plugins here if needed
end

gem "jekyll-seo-tag"
end
12 changes: 11 additions & 1 deletion docs/_config.yml
@@ -4,7 +4,13 @@ baseurl: "/evalkit-framework"
url: "https://zendodx.github.io"

# just-the-docs theme
remote_theme: just-the-docs/just-the-docs@v0.10.0
theme: just-the-docs

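# Navigation: expand all submenus by default, no folding
nav_fold: false

# In-page table of contents (right-hand TOC): auto-generate anchors for h2/h3 headings
heading_anchors: true

# Full-text search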
search_enabled: true
@@ -58,6 +64,10 @@ kramdown:
block:
line_numbers: false

# Jekyll plugins
plugins:
- jekyll-seo-tag

# Exclude directories that do not need to be rendered
exclude:
- "*.gemspec"
2 changes: 1 addition & 1 deletion docs/changelog.md
@@ -6,7 +6,7 @@ nav_order: 4

# ChangeLog

## [1.3.1] - 2026-06-02
## [1.4.0] - 2026-06-04

### Added

1 change: 1 addition & 0 deletions docs/dev-guide/delta-eval.md
@@ -3,6 +3,7 @@ layout: default
title: Incremental Evaluation Technical Notes
parent: Developer Guide
nav_order: 1
has_toc: true
---

# Incremental Evaluation Technical Notes
57 changes: 57 additions & 0 deletions docs/dev-guide/github-pages.md
@@ -0,0 +1,57 @@
How to Use GitHub Pages

## Configuration

(1) Create a Gemfile in the docs directory

```text
source "https://rubygems.org"

gem "just-the-docs"

group :jekyll_plugins do
gem "jekyll-seo-tag"
end
```

## Local debugging workflow

(1) Enter the docs directory

```shell
cd docs
```

(2) Install dependencies

```shell
bundle install --path vendor/bundle
```

If installation hangs at this step, configure a gem mirror hosted in mainland China.

```text
# Use the Ruby China mirror
bundle config mirror.https://rubygems.org https://gems.ruby-china.com

# Or edit the Gemfile directly and change the first line to:
source "https://gems.ruby-china.com"
```

(3) Start the preview

```shell
bundle exec jekyll serve --livereload
```

LiveReload address: http://127.0.0.1:35729

Server address: http://127.0.0.1:4000/evalkit-framework/

## GitHub Actions workflow

(1) Add the docs workflow configuration file

```text
.github/workflows/docs.yml
```


10 changes: 5 additions & 5 deletions docs/index.md
@@ -8,7 +8,7 @@ nav_order: 1

> A Java-based automated AI evaluation framework built around DAG workflows, for quickly running automated evaluations of large models or AI business interfaces.

[Quick Start](./user-guide/overview){: .btn .btn-primary .fs-5 .mb-4 .mb-md-0 .mr-2 }
[Quick Start]({{ site.baseurl }}/user-guide/overview){: .btn .btn-primary .fs-5 .mb-4 .mb-md-0 .mr-2 }
[View on GitHub](https://github.com/zendodx/evalkit-framework){: .btn .fs-5 .mb-4 .mb-md-0 }

---
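@@ -25,8 +25,8 @@ nav_order: 1

## Quick Navigation

- 📖 [User Guide](./user-guide/): complete usage instructions for each framework module
- 🛠️ [Developer Guide](./dev-guide/): internal framework design and extension development
- 📋 [ChangeLog](./changelog): version history
- 🤝 [Contributing Guide](./contribute): commit conventions and contribution process
- 📖 [User Guide]({{ site.baseurl }}/user-guide/): complete usage instructions for each framework module
- 🛠️ [Developer Guide]({{ site.baseurl }}/dev-guide/): internal framework design and extension development
- 📋 [ChangeLog]({{ site.baseurl }}/changelog): version history
- 🤝 [Contributing Guide]({{ site.baseurl }}/contribute): commit conventions and contribution process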

6 changes: 1 addition & 5 deletions docs/user-guide/api-completion-wrapper.md
@@ -3,13 +3,13 @@ layout: default
title: API Result Decorator (ApiCompletionWrapper)
parent: User Guide
nav_order: 9
has_toc: true
---

# API Result Decorator (ApiCompletionWrapper)

The API result decorator runs **after the API call completes and before the evaluators execute**. It transforms, cleans, or supplements the `ApiCompletionResult`, converting the raw API output into the format the evaluators need.
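
The ordering described above (raw output, then wrapper, then evaluator) can be sketched in plain Java. This is a minimal illustration and not the framework's API: `WrapperStageSketch`, `clean`, and `evaluate` are hypothetical names, plain strings stand in for `ApiCompletionResult`, and the `RESULT:` label is an invented example of noise in the raw output.

```java
public class WrapperStageSketch {
    // Hypothetical wrapper step: strips surrounding whitespace and a leading "RESULT:" label.
    static String clean(String rawOutput) {
        String text = rawOutput.strip();
        if (text.startsWith("RESULT:")) {
            text = text.substring("RESULT:".length()).strip();
        }
        return text;
    }

    // Hypothetical evaluator that expects the already-cleaned format.
    static boolean evaluate(String cleanedOutput, String expected) {
        return cleanedOutput.equals(expected);
    }

    public static void main(String[] args) {
        String raw = "  RESULT: 42 \n";          // stage 1: raw interface output
        String cleaned = clean(raw);              // stage 2: wrapper runs before the evaluator
        System.out.println(evaluate(cleaned, "42")); // stage 3: evaluator only sees cleaned text
    }
}
```

The point of the sketch is only that the evaluator never sees the raw string; the cleaning happens one stage earlier.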

---

## Architecture

@@ -18,7 +18,6 @@ ApiCompletionWrapper (abstract base class)
└── LLMBasedApiCompletionWrapper (abstract)  uses an LLM to transform the API output
```

---

## ApiCompletionWrapper (base class)

@@ -73,7 +72,6 @@ protected void wrapper(DataItem dataItem) {
}
```

---

## LLMBasedApiCompletionWrapper

@@ -121,7 +119,6 @@ LLMBasedApiCompletionWrapper wrapper = new LLMBasedApiCompletionWrapper(
};
```

---

## Using it in a workflow

@@ -142,7 +139,6 @@ new WorkflowBuilder()
.execute();
```

---

## Notes

6 changes: 1 addition & 5 deletions docs/user-guide/api-completion.md
@@ -3,13 +3,13 @@ layout: default
title: API Caller (ApiCompletion)
parent: User Guide
nav_order: 8
has_toc: true
---

# API Caller (ApiCompletion)

The API caller **invokes the business API under test** for each test data item and stores the returned result in the `apiCompletionResult` field of `DataItem`, where the evaluators that run afterwards can read it.

---

## Architecture

@@ -19,7 +19,6 @@ ApiCompletion (abstract base class)
└── OrderedApiCompletion (abstract)  guarantees that requests within the same session run in order
```

---

## ApiCompletion (base class)

@@ -107,7 +106,6 @@ public ScorerResult eval(DataItem dataItem) {
}
```

---

## HttpApiCompletion

@@ -181,7 +179,6 @@ HttpApiCompletion httpApiCompletion = new HttpApiCompletion(
};
```

---

## OrderedApiCompletion

@@ -244,7 +241,6 @@ OrderedApiCompletion orderedApiCompletion = new OrderedApiCompletion(
| Concurrency | All data items fully parallel | Different sessions in parallel, same session serial |
| Methods to implement | `invoke()` | `invoke()` + `prepareOrderKey()` + `prepareComparator()` |
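
The "different sessions in parallel, same session serial" behaviour in the table can be illustrated with a small standalone sketch. It assumes an order key plays the role of `prepareOrderKey()` and a comparator plays the role of `prepareComparator()`; the class `OrderedCallSketch`, the method `runOrdered`, and the `"session:turn"` item format are hypothetical, and the framework's real scheduling may differ.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class OrderedCallSketch {
    static <T, K> Map<K, List<String>> runOrdered(List<T> items,
            Function<T, K> orderKey, Comparator<T> comparator, Function<T, String> invoke) {
        // 1. Bucket items by their order key (for example, a session id).
        Map<K, List<T>> groups = new LinkedHashMap<>();
        for (T item : items) {
            groups.computeIfAbsent(orderKey.apply(item), k -> new ArrayList<>()).add(item);
        }
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // 2. One task per group: groups may run in parallel, items inside a group run serially.
            Map<K, Future<List<String>>> futures = new LinkedHashMap<>();
            for (Map.Entry<K, List<T>> entry : groups.entrySet()) {
                List<T> sorted = new ArrayList<>(entry.getValue());
                sorted.sort(comparator);
                futures.put(entry.getKey(), pool.submit(() -> {
                    List<String> results = new ArrayList<>();
                    for (T item : sorted) {
                        results.add(invoke.apply(item));
                    }
                    return results;
                }));
            }
            // 3. Collect results per key, preserving the in-group order.
            Map<K, List<String>> out = new LinkedHashMap<>();
            for (Map.Entry<K, Future<List<String>>> entry : futures.entrySet()) {
                out.put(entry.getKey(), entry.getValue().get());
            }
            return out;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("ordered invocation failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Items are "session:turn" strings; this format is invented for the example.
        Map<String, List<String>> out = runOrdered(
                List.of("s1:2", "s2:1", "s1:1"),
                s -> s.split(":")[0],
                Comparator.comparingInt((String s) -> Integer.parseInt(s.split(":")[1])),
                s -> "called " + s);
        System.out.println(out);
    }
}
```

Submitting one task per key is the simplest way to get per-session ordering; it trades some parallelism for a guarantee that turn 2 of a session never starts before turn 1 has returned.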

---

## Notes

9 changes: 1 addition & 8 deletions docs/user-guide/checker.md
@@ -3,13 +3,13 @@ layout: default
title: Checker
parent: User Guide
nav_order: 11
has_toc: true
---

# Checker

A checker is an internal component of `MultiCheckerBasedScorer`. It performs a **fine-grained check along a single dimension** of the evaluation data. Each checker contains several **check items (CheckItem)**, and the final score is aggregated from the results of all check items.

---

## Core Concepts

@@ -28,7 +28,6 @@ MultiCheckerBasedScorer
- A `Checker` contains multiple `CheckItem`s
- `CheckItem` is the smallest scoring unit

---

## CheckItem (check item)

@@ -91,7 +90,6 @@ CheckItem llmItem = CheckItem.builder()
.build();
```

---

## Checker interface

@@ -108,7 +106,6 @@ public interface Checker {
}
```

---

## AbstractChecker (abstract base class)

@@ -133,7 +130,6 @@ public interface Checker {
| `MinCheckItemScoreMergeStrategy` | Takes the minimum check-item score |
| Custom | Implement the `CheckItemScoreMergeStrategy` interface |
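
What "take the minimum check-item score" and "custom strategy" mean can be sketched as follows. The signature of the real `CheckItemScoreMergeStrategy` is not visible in this diff, so the `ScoreMerge` interface below is a hypothetical stand-in, and returning `0.0` for an empty list is an assumption made for the example.

```java
import java.util.List;

public class MergeStrategySketch {
    // Hypothetical stand-in for a merge-strategy interface; the framework's real signature may differ.
    interface ScoreMerge {
        double merge(List<Double> checkItemScores);
    }

    // Minimum strategy: one failed check item caps the checker's score.
    // Assumption: an empty list merges to 0.0.
    static final ScoreMerge MIN = scores ->
            scores.stream().mapToDouble(Double::doubleValue).min().orElse(0.0);

    // A custom strategy for comparison: arithmetic mean of the item scores.
    static final ScoreMerge MEAN = scores ->
            scores.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);

    public static void main(String[] args) {
        List<Double> scores = List.of(1.0, 0.5, 0.0);
        System.out.println(MIN.merge(scores));  // strict: the worst item decides
        System.out.println(MEAN.merge(scores)); // lenient: failures are averaged out
    }
}
```

A minimum merge suits checks where every item is mandatory, while an averaging strategy tolerates isolated failures.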

---

## Example: rule-based checker

@@ -211,7 +207,6 @@ public class RequiredFieldsChecker extends AbstractChecker {
}
```

---

## Example: LLM checker

@@ -279,7 +274,6 @@ public class QualityChecker extends LLMBasedChecker {
}
```

---

## Multi-turn dialogue checks (advanced LLMBasedChecker usage)

@@ -316,7 +310,6 @@ public class MultiTurnChecker extends LLMBasedChecker {
}
```

---

## Complete example: combining multiple checkers

8 changes: 1 addition & 7 deletions docs/user-guide/counter.md
@@ -3,13 +3,13 @@ layout: default
title: Statistics Counter (Counter)
parent: User Guide
nav_order: 12
has_toc: true
---

# Statistics Counter (Counter)

The counter runs after all evaluators have finished. It **aggregates statistics** over the evaluation results of all data items and stores them in the context for the Reporter to use.
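
As a rough illustration of this aggregation step, the sketch below folds per-item scores into a few summary values and puts them in a map that stands in for the context. Which statistics `BasicCounter` actually computes is not shown in this diff; the keys `total`, `passed`, and `averageScore`, the pass threshold, and the class `SummaryCounterSketch` are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SummaryCounterSketch {
    // Aggregates per-item scores into summary statistics stored in a context-like map.
    static Map<String, Object> summarize(List<Double> scores, double passThreshold) {
        long passed = scores.stream().filter(s -> s >= passThreshold).count();
        double average = scores.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);

        Map<String, Object> context = new LinkedHashMap<>();
        context.put("total", scores.size());   // number of evaluated items
        context.put("passed", passed);         // items at or above the threshold
        context.put("averageScore", average);  // mean score, 0.0 when there are no items
        return context;
    }

    public static void main(String[] args) {
        // A later reporting step would read these values back out of the context.
        System.out.println(summarize(List.of(1.0, 0.0, 1.0, 1.0), 0.6));
    }
}
```

Keeping aggregation separate from scoring means the reporting step only depends on the summary keys, not on how individual items were scored.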

---

## Architecture

@@ -21,7 +21,6 @@ Counter (abstract base class)
└── AttributeCounterV2  LLM attribution analysis V2 (with category, sentiment polarity, and confidence)
```

---

## BasicCounter

@@ -72,7 +71,6 @@ BasicCounter counter = new BasicCounter();

That single line is all it takes. No configuration is needed; just place it after all Scorers.

---

## MetricCounter

@@ -111,7 +109,6 @@ MetricCounter metricCounter = new MetricCounter() {
};
```

---

## AttributeCounter (attribution analysis V1)

@@ -152,7 +149,6 @@ AttributeCounter attributeCounter = new AttributeCounter(myLLMService);
}
```

---

## AttributeCounterV2 (attribution analysis V2)

@@ -208,7 +204,6 @@ AttributeCounterV2 attributeCounterV2 = new AttributeCounterV2(myLLMService);
}
```

---

## Custom counter

@@ -238,7 +233,6 @@ Counter customCounter = new Counter() {
};
```

---

## Typical usage: combining multiple counters
