Skip to content

Feat/cost web#35

Merged
kooksee merged 26 commits intov2from
feat/cost-web
Mar 24, 2026
Merged

Feat/cost web#35
kooksee merged 26 commits intov2from
feat/cost-web

Conversation

@kooksee
Copy link
Copy Markdown
Collaborator

@kooksee kooksee commented Mar 24, 2026

Summary by CodeRabbit

  • New Features

    • Added context-aware dependency injection APIs
    • Introduced provider execution timeout and slow provider threshold configuration options
    • Added four new HTTP diagnostic endpoints: runtime statistics, recent errors, diagnostic file queries, and trace event timeline
    • Implemented error tracking with recent error history
    • Added in-memory and file-based trace event logging with querying capabilities
  • Documentation

    • Added configuration guidance for provider timeouts and slow provider warnings
    • Documented trace logging and diagnostic collection via environment variables

kooksee and others added 22 commits March 19, 2026 15:13
- Introduced ProviderRuntimeStats struct to capture metrics such as call count, total duration, average duration, last duration, last error, and last run timestamp.
- Enhanced GetProviderDetails to include function file and line number for better debugging.
- Implemented GetProviderRuntimeStats method to return sorted runtime statistics for all providers.
- Added recordProviderStat method to track provider execution metrics.
- Updated the Dix struct to maintain provider statistics.
- Modified error handling to optionally print stack traces based on log level.
- Added tests for runtime statistics retrieval and validation of provider initialization.
- Created new server runtime test file to validate HTTP endpoint for runtime stats.
… logging

- Implemented a new tracing system in `dixtrace` package to manage spans and events.
- Added `RunDetachedSpan` and `BeginSpan` functions for span creation and management.
- Introduced `MemorySink` and `FileSink` for event storage and querying.
- Created tests for tracing functionality, ensuring correct span relationships and event logging.
- Updated HTTP server logs to include new trace endpoint.
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 24, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7825b9d9-4621-491e-aff3-37e02e0bf5a6

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/cost-web

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the diagnostic and observability capabilities of the dix dependency injection framework. It introduces mechanisms to monitor provider execution times, detect slow or timed-out providers, and collect detailed trace and error information. These diagnostics are accessible both through console logging, a new JSONL diagnostic file, and a comprehensive HTTP API, making it easier to troubleshoot and understand dependency resolution flows during application startup and runtime. The changes aim to provide developers with deeper insights into the DI container's behavior, improving debugging and performance analysis.

Highlights

  • Enhanced Diagnostic Capabilities: Introduced new diagnostic features including provider timeout and slow provider warnings, detailed DI trace logging, diagnostic file collection, and in-memory trace querying for deeper insights into dependency resolution.
  • New HTTP API Endpoints: The dixhttp module now exposes new HTTP API endpoints (/api/runtime-stats, /api/errors, /api/diagnostics, /api/trace) to access these diagnostic data, facilitating real-time monitoring and troubleshooting.
  • Context-Aware Injection: Added InjectContext and InjectTContext methods to allow dependency injection operations to propagate context.Context, enabling better integration with tracing and request-scoped concerns.
  • Improved Error Reporting: Enhanced internal error handling and reporting to provide more granular information, including specific error types, root causes, and actionable hints, which are also exposed via the new diagnostic APIs.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces comprehensive diagnostic and tracing capabilities to the dependency injection framework. Key changes include adding provider timeout and slow provider warning options, implementing context-aware injection methods, and introducing a new dixtrace package for in-memory event tracing. It also adds support for collecting detailed diagnostic records to a file (DIX_DIAG_FILE) and exposes new HTTP API endpoints in dixhttp for querying runtime statistics, recent errors, diagnostic files, and trace timelines. The documentation has been updated to reflect these new features, and the InjectContext function was corrected to prevent silent failures when injecting into struct values by requiring a pointer.

@kooksee
Copy link
Copy Markdown
Collaborator Author

kooksee commented Mar 24, 2026

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a comprehensive set of features for diagnostics, tracing, and observability. This includes provider timeouts, slow provider warnings, detailed error recording, and a new dixtrace package for lightweight tracing. The dixhttp web UI is enhanced with new endpoints to visualize this data. The changes are extensive and well-tested. I have a few suggestions regarding code consistency, particularly with hardcoded language strings and file handling behavior, to improve maintainability and user experience.

Comment on lines +616 to +643
func buildErrorHint(operation, stage string, timedOut bool) string {
if timedOut {
return "provider 初始化超时:可先优化初始化逻辑,或临时调大 WithProviderTimeout;排查时可用 /api/errors 与 /api/runtime-stats 联动定位"
}

switch operation {
case "provide", "try_provide":
return "检查 provider 签名(必须是函数且返回值数量合法);若不希望中断启动,请优先用 TryProvide 并继续启动 Web 诊断页"
case "inject", "try_inject":
switch stage {
case "cycle_check":
return "检测到循环依赖:请拆分相互引用的组件,或改为接口/延迟注入;可用 /api/dependencies 观察依赖环"
default:
return "注入失败:先确认依赖是否已注册、类型是否一致;生产路径建议改用 TryInject 避免进程直接退出"
}
case "provider_execute":
switch stage {
case "resolve_input":
return "provider 输入依赖未解析:检查对应输入类型是否有 provider、命名空间(map/list)是否匹配、是否遗漏导入"
case "return_error":
return "provider 主动返回 error:优先检查外部资源可用性(DB/Redis/HTTP)与配置参数;必要时在 provider 内增加上下文日志"
default:
return "provider 执行失败:查看 provider_function、output_type 与 input_types 定位具体构造链路"
}
default:
return "可先查看 /api/errors 获取最近错误详情,并结合 /api/runtime-stats 定位慢/失败 provider"
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The error hints in buildErrorHint are hardcoded in Chinese. While helpful, this is inconsistent with the rest of the library's code and documentation, which is in English. This could pose a challenge for non-Chinese speaking users and for future maintenance. It would be better to either provide these hints in English or implement a simple internationalization (i18n) mechanism.

logStartupScenarioResult(di, "provider_timeout", err, before)
}

// 循环依赖演示使用临时容器,避免污染主容器导致后续所有注入失败。
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This comment is in Chinese, while the rest of the code and comments in this file are in English. For consistency and to make the example accessible to a wider audience, it would be better to write comments in English.

if dir != "." && dir != "" {
_ = os.MkdirAll(dir, 0o755)
}
fd, err := os.OpenFile(f.path, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0o644)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The FileSink for DIX_TRACE_FILE truncates the file on creation (O_TRUNC), while the sink for DIX_DIAG_FILE (in dixinternal/diag_file.go) appends to its file. This inconsistency can be surprising and lead to data loss if a user expects trace files to be appended across application restarts. It would be better to use O_APPEND for consistency with DIX_DIAG_FILE, or clearly document this truncating behavior.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
dixhttp/README.md (1)

405-414: ⚠️ Potential issue | 🟡 Minor

Markdown syntax issue: unclosed code block.

The JSON code block for /api/type/{typeName}?depth=2 (starting around line 395) is not properly closed before the next ### GET /api/group-rules section. There's also an extra closing triple-backtick at line 414 that appears orphaned.

📝 Suggested fix
   ]
 }
+```
 
 ### GET `/api/group-rules`
 Returns backend-registered group rules (used as UI defaults)
 
 ```json
 [
   {"name": "service", "prefixes": ["github.com/acme/app/service"]}
 ]

-```

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @dixhttp/README.md around lines 405 - 414, The README has an unclosed JSON
code block for the GET /api/type/{typeName}?depth=2 section and an orphaned
triple-backtick before the ### GET /api/group-rules heading; close the earlier
JSON block properly (add the missing json block terminator) and remove the extra standalone that follows the ### GET /api/group-rules section so both
code blocks are correctly delimited and the headings render properly.


</details>

</blockquote></details>
<details>
<summary>dixhttp/README_zh.md (1)</summary><blockquote>

`386-395`: _⚠️ Potential issue_ | _🟡 Minor_

**Markdown syntax issue: unclosed code block (same as English README).**

The JSON code block for `/api/type/{typeName}?depth=2` is not properly closed before the `/api/group-rules` section, and there's an orphaned closing triple-backtick at line 395.


<details>
<summary>📝 Suggested fix</summary>

```diff
   ]
 }
+```
 
 ### GET `/api/group-rules`
 返回后端注册的分组清单(前端默认配置)
 
 ```json
 [
   {"name": "service", "prefixes": ["github.com/acme/app/service"]}
 ]
 ```
-```
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@dixhttp/README_zh.md` around lines 386 - 395, There is an unclosed code block
in README_zh.md: close the JSON fenced code block introduced in the
`/api/type/{typeName}?depth=2` section so it does not bleed into the subsequent
`GET /api/group-rules` section; locate the code fences around the JSON example
(the triple-backtick starting the JSON block) and add the matching closing
triple-backtick immediately after the JSON example, ensuring the `GET
/api/group-rules` heading and its JSON example are outside that block.
```

</details>

</blockquote></details>
<details>
<summary>dixinternal/provider.go (1)</summary><blockquote>

`52-70`: _⚠️ Potential issue_ | _🟡 Minor_

**Consider bilingual or English-only hint message.**

The `hint` field on line 69 is in Chinese only. For international users, this may not be helpful. Consider providing an English version or bilingual message.


<details>
<summary>📝 Suggested fix</summary>

```diff
-				"hint", "provider 内部发生 panic:建议在 provider 内捕获异常并返回 error;可临时开启 debug 日志查看堆栈")
+				"hint", "provider panicked: catch exceptions inside provider and return error; enable debug logging for stack trace")
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@dixinternal/provider.go` around lines 52 - 70, The panic handler's
logger.Error call currently uses a Chinese-only hint string; change it to a
bilingual (Chinese + English) or English-only message and wire that into the
existing logger call: update the hint value in the defer recover block (the
anonymous func that calls maybePrintStack and sets err), e.g., replace the
Chinese-only literal passed as the "hint" field with a new variable or literal
containing both languages (or English), ensuring you preserve surrounding fields
(error, error_type from buildErrorType, fn_name via GetFnName(n.fn), fn_type,
input_data, etc.) so the log entry remains structured and only the hint text is
internationalized.
```

</details>

</blockquote></details>

</blockquote></details>
🧹 Nitpick comments (4)
dixinternal/option.go (1)

34-45: Clarification: Merge vs direct option application.

The Merge method's > 0 guard is intentional - it prevents merging zero values from overwriting existing non-zero values. However, since newDix applies options directly via the Option function type (not through Merge), WithProviderTimeout(0) correctly disables timeout by directly setting opts.ProviderTimeout = 0 after defaults are applied.

This design is correct but subtle. Consider adding a comment to Merge explaining when it's used vs direct option application.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dixinternal/option.go` around lines 34 - 45, The Merge method on Options uses
>0 guards to avoid overwriting existing non-zero values, which is intentional
and different from the direct Option application used by constructors like
newDix where Option functions (e.g., WithProviderTimeout) set fields directly
(so WithProviderTimeout(0) can disable timeouts after defaults are applied); add
a concise comment above func (o Options) Merge(opt Options) explaining this
distinction: state that Merge preserves existing non-zero values and is used for
merging two Options structs, whereas Option functions passed to newDix apply
directly and may intentionally set zero values, with references to Merge,
Options, newDix, and WithProviderTimeout for clarity.
dixtrace/trace_test.go (1)

144-160: Consider verifying truncated content is valid JSONL.

The test confirms the seed content is removed but doesn't verify the written event is valid JSON. This could miss encoding issues in FileSink.

💡 Optional: Add content format verification
 	if strings.Contains(string(b), "seed\n") {
 		t.Fatalf("expected existing content to be truncated, got: %q", string(b))
 	}
+	if !strings.Contains(string(b), `"event":"span.start"`) {
+		t.Fatalf("expected valid JSON event, got: %q", string(b))
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dixtrace/trace_test.go` around lines 144 - 160, Update
TestFileSinkAlwaysTruncatesExistingFile to not only assert the seed content was
removed but also validate that the new file contains valid JSONL for the written
event: after reading the file (using the existing os.ReadFile result), split
into lines and unmarshal the first non-empty line into the Event struct (from
the diff) and assert fields like ID==1, OccurredAt==1 and Event=="span.start";
this ensures NewFileSink/FileSink output is valid JSONL and guards against
encoding issues.
dixinternal/diag_file.go (2)

29-58: Consider consolidating duplicate struct definitions.

diagFileRecord and DiagFileRecord are identical except for visibility. Consider using a single exported type or embedding to reduce duplication.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dixinternal/diag_file.go` around lines 29 - 58, The two structs
diagFileRecord and DiagFileRecord are identical; remove the duplicate by keeping
a single canonical definition (prefer keeping the exported DiagFileRecord) and
replace the other with a type alias (e.g., add "type diagFileRecord =
DiagFileRecord") so callers within the package can still use the unexported
name; update any references to the removed struct name to use the alias or the
exported type and ensure JSON tags remain unchanged.

345-425: Memory concern: Entire file loaded before filtering.

ReadDiagFileRecords reads all matching records into memory before sorting and paginating. For large diagnostic files, this could cause memory pressure. Consider streaming/early-termination optimizations for production use.

💡 Consider adding a hard cap or streaming approach

For now, you could add a pre-filter cap to limit memory usage:

 	lineNo := 0
 	all := make([]DiagFileRecord, 0, 256)
+	const maxRecordsToLoad = 50000 // Hard cap to prevent OOM
 	for scanner.Scan() {
+		if len(all) >= maxRecordsToLoad {
+			break
+		}
 		lineNo++

This is a tradeoff - it may miss older records, but prevents unbounded memory growth.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dixinternal/diag_file.go` around lines 345 - 425, The current implementation
accumulates all matching records in the slice all before sorting/paginating
which can OOM on large files; change ReadDiagFileRecords to stream and keep only
up to limit matching records using a min-heap (priority queue) keyed by
OccurredAt (and RecordID as tiebreaker) so you maintain the top-N most recent
records without storing everything. Concretely: in the scanner loop increment a
match counter for result.Total, push each matching rec into a fixed-size
min-heap (use a small struct comparator based on rec.OccurredAt then RecordID),
and if heap size > limit pop the smallest; after scanning drain heap into a
slice, sort that slice descending (same comparator used today), set
result.Records, result.Returned, and result.NextBefore from the last returned
RecordID; keep error handling and normalizeDiagRecordID/diagRecordMatchesQuery
calls as-is and use defaultDiagReadLimit(query.Limit) for heap capacity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dixinternal/api.go`:
- Around line 255-305: GetProviderRuntimeStats (and similarly GetRecentErrors)
reads dix.providerStats and dix.recentErrors concurrently while
recordProviderStat and recordRecentErrorWithContext mutate them; snapshot these
structures under a Dix RWMutex to avoid concurrent map/slice panics and torn
snapshots: add an RWMutex field to Dix (e.g., statsMu), acquire a read lock in
GetProviderRuntimeStats, copy dix.providerStats (map) and any slices you iterate
(and recentErrors in GetRecentErrors) into local variables while holding the
lock, release the lock and then build/sort the ProviderRuntimeStats from the
local copies; ensure recordProviderStat and recordRecentErrorWithContext use the
same mutex for writes (write lock) when mutating providerStats/recentErrors.

In `@dixinternal/dix.go`:
- Around line 412-423: The provider runtime stat is using the expanded
registration type (p.output.typ) which can differ from the function's declared
return type; update the stat creation in the block that sets
dix.providerStats[p.fn] to use the function's declared return type obtained from
p.fn.Type().Out(0).String() for OutputType instead of p.output.typ.String(),
keeping other fields (FunctionName via GetFnName(p.fn) and map insertion)
unchanged so providerRuntimeStat reflects the original function signature.

In `@dixinternal/provider.go`:
- Around line 76-97: The timeout implementation in providerFn.callWithTimeout
currently leaves the goroutine running after the caller unblocks, so document
this behavior and recommend cooperative cancellation: update the
WithProviderTimeout option's documentation (and any public docs for provider
timeouts) to state that timeouts only unblock the caller and do not cancel or
stop the provider goroutine, that side effects and resource usage may continue,
and advise implementers to accept a context or implement their own
cancellation-aware logic; also add a note suggesting a future enhancement to add
a context parameter to provider functions for cooperative cancellation
(reference providerFn.callWithTimeout and the WithProviderTimeout option in your
docs).

In `@dixtrace/trace.go`:
- Around line 331-337: The directory and file creation in the trace persistence
code use world-readable perms; update the mkdir and file permissions to be
private and enforce them after creation: when creating the dir for f.path, use
0700 instead of 0755 (MkdirAll), open the file with flags as before but use 0600
instead of 0644, and immediately call Chmod(0o600) on the returned fd (or the
created path) to ensure permissions apply regardless of umask; apply the same
change pattern used around f.path/OpenFile/Chmod in dixtrace/trace.go and mirror
it in dixinternal/diag_file.go for the Event persistence logic.

---

Outside diff comments:
In `@dixhttp/README_zh.md`:
- Around line 386-395: There is an unclosed code block in README_zh.md: close
the JSON fenced code block introduced in the `/api/type/{typeName}?depth=2`
section so it does not bleed into the subsequent `GET /api/group-rules` section;
locate the code fences around the JSON example (the triple-backtick starting the
JSON block) and add the matching closing triple-backtick immediately after the
JSON example, ensuring the `GET /api/group-rules` heading and its JSON example
are outside that block.

In `@dixhttp/README.md`:
- Around line 405-414: The README has an unclosed JSON code block for the GET
`/api/type/{typeName}?depth=2` section and an orphaned triple-backtick before
the `### GET /api/group-rules` heading; close the earlier JSON block properly
(add the missing ```json block terminator) and remove the extra standalone ```
that follows the `### GET /api/group-rules` section so both code blocks are
correctly delimited and the headings render properly.

In `@dixinternal/provider.go`:
- Around line 52-70: The panic handler's logger.Error call currently uses a
Chinese-only hint string; change it to a bilingual (Chinese + English) or
English-only message and wire that into the existing logger call: update the
hint value in the defer recover block (the anonymous func that calls
maybePrintStack and sets err), e.g., replace the Chinese-only literal passed as
the "hint" field with a new variable or literal containing both languages (or
English), ensuring you preserve surrounding fields (error, error_type from
buildErrorType, fn_name via GetFnName(n.fn), fn_type, input_data, etc.) so the
log entry remains structured and only the hint text is internationalized.

---

Nitpick comments:
In `@dixinternal/diag_file.go`:
- Around line 29-58: The two structs diagFileRecord and DiagFileRecord are
identical; remove the duplicate by keeping a single canonical definition (prefer
keeping the exported DiagFileRecord) and replace the other with a type alias
(e.g., add "type diagFileRecord = DiagFileRecord") so callers within the package
can still use the unexported name; update any references to the removed struct
name to use the alias or the exported type and ensure JSON tags remain
unchanged.
- Around line 345-425: The current implementation accumulates all matching
records in the slice all before sorting/paginating which can OOM on large files;
change ReadDiagFileRecords to stream and keep only up to limit matching records
using a min-heap (priority queue) keyed by OccurredAt (and RecordID as
tiebreaker) so you maintain the top-N most recent records without storing
everything. Concretely: in the scanner loop increment a match counter for
result.Total, push each matching rec into a fixed-size min-heap (use a small
struct comparator based on rec.OccurredAt then RecordID), and if heap size >
limit pop the smallest; after scanning drain heap into a slice, sort that slice
descending (same comparator used today), set result.Records, result.Returned,
and result.NextBefore from the last returned RecordID; keep error handling and
normalizeDiagRecordID/diagRecordMatchesQuery calls as-is and use
defaultDiagReadLimit(query.Limit) for heap capacity.

In `@dixinternal/option.go`:
- Around line 34-45: The Merge method on Options uses >0 guards to avoid
overwriting existing non-zero values, which is intentional and different from
the direct Option application used by constructors like newDix where Option
functions (e.g., WithProviderTimeout) set fields directly (so
WithProviderTimeout(0) can disable timeouts after defaults are applied); add a
concise comment above func (o Options) Merge(opt Options) explaining this
distinction: state that Merge preserves existing non-zero values and is used for
merging two Options structs, whereas Option functions passed to newDix apply
directly and may intentionally set zero values, with references to Merge,
Options, newDix, and WithProviderTimeout for clarity.

In `@dixtrace/trace_test.go`:
- Around line 144-160: Update TestFileSinkAlwaysTruncatesExistingFile to not
only assert the seed content was removed but also validate that the new file
contains valid JSONL for the written event: after reading the file (using the
existing os.ReadFile result), split into lines and unmarshal the first non-empty
line into the Event struct (from the diff) and assert fields like ID==1,
OccurredAt==1 and Event=="span.start"; this ensures NewFileSink/FileSink output
is valid JSONL and guards against encoding issues.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 091174f1-f87a-40fb-bc02-08a187a18249

📥 Commits

Reviewing files that changed from the base of the PR and between cbd40cc and a425689.

📒 Files selected for processing (23)
  • .gitignore
  • README.md
  • README_zh.md
  • dix.go
  • dixhttp/README.md
  • dixhttp/README_zh.md
  • dixhttp/server.go
  • dixhttp/server_runtime_test.go
  • dixhttp/template.html
  • dixinternal/api.go
  • dixinternal/diag_file.go
  • dixinternal/diag_file_query_test.go
  • dixinternal/dix.go
  • dixinternal/dix_test.go
  • dixinternal/logger.go
  • dixinternal/logger_test.go
  • dixinternal/option.go
  • dixinternal/provider.go
  • dixinternal/trace_chain_test.go
  • dixinternal/util.go
  • dixtrace/trace.go
  • dixtrace/trace_test.go
  • example/http/main.go

Comment on lines +255 to +305
func (dix *Dix) GetProviderRuntimeStats() []ProviderRuntimeStats {
stats := make([]ProviderRuntimeStats, 0, len(dix.providers))
seen := make(map[reflect.Value]bool)

for _, providerList := range dix.providers {
for _, p := range providerList {
if p == nil || seen[p.fn] {
continue
}
seen[p.fn] = true

outputType := ""
if p.output != nil && p.output.typ != nil {
outputType = p.output.typ.String()
}

item := ProviderRuntimeStats{
FunctionName: GetFnName(p.fn),
OutputType: outputType,
}

if s, ok := dix.providerStats[p.fn]; ok && s != nil {
avg := time.Duration(0)
if s.CallCount > 0 {
avg = s.TotalDuration / time.Duration(s.CallCount)
}
item.FunctionName = s.FunctionName
if s.OutputType != "" {
item.OutputType = s.OutputType
}
item.CallCount = s.CallCount
item.TotalDuration = s.TotalDuration
item.AverageDuration = avg
item.LastDuration = s.LastDuration
item.LastError = s.LastError
item.LastRunAtUnixNano = s.LastRunAt.UnixNano()
}

stats = append(stats, item)
}
}

sort.Slice(stats, func(i, j int) bool {
if stats[i].TotalDuration == stats[j].TotalDuration {
return stats[i].FunctionName < stats[j].FunctionName
}
return stats[i].TotalDuration > stats[j].TotalDuration
})

return stats
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the Dix struct definition and understand its current fields
fd "dix.go" dixinternal/ --exec head -100 {}

Repository: pubgo/dix

Length of output: 2288


🏁 Script executed:

# Search for the Dix struct definition to see if there's already a mutex
rg "type Dix struct" -A 30 dixinternal/

Repository: pubgo/dix

Length of output: 1403


🏁 Script executed:

# Look at the second getter around lines 309-342
sed -n '309,342p' dixinternal/api.go

Repository: pubgo/dix

Length of output: 1001


🏁 Script executed:

# Search for recordProviderStat and recordRecentErrorWithContext implementations
rg "func.*recordProviderStat|func.*recordRecentErrorWithContext" -A 10 dixinternal/

Repository: pubgo/dix

Length of output: 995


🏁 Script executed:

# Search for HTTP endpoint handlers for /api/runtime-stats and /api/errors
rg "runtime-stats|/api/errors" -r dixinternal/

Repository: pubgo/dix

Length of output: 1418


🏁 Script executed:

# Get the full implementation of recordProviderStat
rg "func \(dix \*Dix\) recordProviderStat" -A 30 dixinternal/

Repository: pubgo/dix

Length of output: 1368


🏁 Script executed:

# Get the full implementation of recordRecentErrorWithContext
rg "func \(dix \*Dix\) recordRecentErrorWithContext" -A 40 dixinternal/

Repository: pubgo/dix

Length of output: 1981


🏁 Script executed:

# Check the HTTP handlers
rg "HandleRuntimeStats|HandleErrors" -A 15 dixhttp/

Repository: pubgo/dix

Length of output: 5412


🏁 Script executed:

# Search for where recordProviderStat is called to see if there's any locking context
rg "recordProviderStat" dixinternal/ -B 3 -A 1

Repository: pubgo/dix

Length of output: 1398


Snapshot diagnostics under a lock before serving them.

GetProviderRuntimeStats() and GetRecentErrors() read from dix.providerStats and dix.recentErrors while recordProviderStat() and recordRecentErrorWithContext() mutate them on every provider execution and injection failure. With HTTP handlers calling these getters concurrently, concurrent map reads and slice appends can trigger Go's concurrent map/slice panic or return torn snapshots. Add an RWMutex on Dix and copy both structures under the read lock before building the response.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dixinternal/api.go` around lines 255 - 305, GetProviderRuntimeStats (and
similarly GetRecentErrors) reads dix.providerStats and dix.recentErrors
concurrently while recordProviderStat and recordRecentErrorWithContext mutate
them; snapshot these structures under a Dix RWMutex to avoid concurrent
map/slice panics and torn snapshots: add an RWMutex field to Dix (e.g.,
statsMu), acquire a read lock in GetProviderRuntimeStats, copy dix.providerStats
(map) and any slices you iterate (and recentErrors in GetRecentErrors) into
local variables while holding the lock, release the lock and then build/sort the
ProviderRuntimeStats from the local copies; ensure recordProviderStat and
recordRecentErrorWithContext use the same mutex for writes (write lock) when
mutating providerStats/recentErrors.

Comment on lines +412 to +423
stat, ok := dix.providerStats[p.fn]
if !ok {
outputType := ""
if p.output != nil && p.output.typ != nil {
outputType = p.output.typ.String()
}
stat = &providerRuntimeStat{
FunctionName: GetFnName(p.fn),
OutputType: outputType,
}
dix.providerStats[p.fn] = stat
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's find and examine the relevant files
find . -type f -name "*.go" | grep -E "(dix|provider)" | head -20

Repository: pubgo/dix

Length of output: 543


🏁 Script executed:

# Look at the structure of the dix.go file around the lines mentioned
wc -l dixinternal/dix.go

Repository: pubgo/dix

Length of output: 76


🏁 Script executed:

# Examine the provider-related code and registration logic
rg "type.*Provider" dixinternal/ -A 5 | head -100

Repository: pubgo/dix

Length of output: 1744


🏁 Script executed:

# Look for how p.output.typ is set and what it represents
rg "output\.typ" dixinternal/ -B 3 -A 3

Repository: pubgo/dix

Length of output: 3120


🏁 Script executed:

# Check the GetProviderRuntimeStats function mentioned in the comment
rg "GetProviderRuntimeStats" dixinternal/ -A 20

Repository: pubgo/dix

Length of output: 2884


🏁 Script executed:

# Let's examine the provider structure and understand what p.fn and p.output are
sed -n '1,150p' dixinternal/provider.go | head -100

Repository: pubgo/dix

Length of output: 2475


🏁 Script executed:

# Look at the full provider struct definition
rg "type.*provider.*struct" dixinternal/ -A 20

Repository: pubgo/dix

Length of output: 3663


🏁 Script executed:

# Examine how providers with slice/map returns are registered
sed -n '700,900p' dixinternal/dix.go

Repository: pubgo/dix

Length of output: 7270


🏁 Script executed:

# Check what GetFnName and related functions do
rg "func GetFnName" dixinternal/ -A 5

Repository: pubgo/dix

Length of output: 297


🏁 Script executed:

# Look for any struct-return provider expansion logic
rg "struct" dixinternal/dix.go | head -20

Repository: pubgo/dix

Length of output: 1233


🏁 Script executed:

# Examine the complete registration logic for different output types
sed -n '600,750p' dixinternal/dix.go

Repository: pubgo/dix

Length of output: 4038


🏁 Script executed:

# Let's look at the complete registration logic to see how providers are registered
sed -n '500,650p' dixinternal/dix.go

Repository: pubgo/dix

Length of output: 4293


🏁 Script executed:

# Check if there are tests for slice/map/struct provider registration
rg "func.*\[\].*{" dixinternal/dix_test.go -B 2 -A 10 | head -50

Repository: pubgo/dix

Length of output: 142


🏁 Script executed:

# Let's look at the struct-return provider handling
sed -n '650,800p' dixinternal/dix.go

Repository: pubgo/dix

Length of output: 4777


🏁 Script executed:

# Verify what p.fn.Type().Out(0) would return vs p.output.typ
rg -A 30 "case reflect.Slice:" dixinternal/dix.go | head -50

Repository: pubgo/dix

Length of output: 2266


🏁 Script executed:

# Look at the Provide function to understand provider registration completely
sed -n '250,450p' dixinternal/dix.go

Repository: pubgo/dix

Length of output: 6473


🏁 Script executed:

# Check the struct-return provider registration
rg "reflect.Struct" dixinternal/dix.go -B 5 -A 15

Repository: pubgo/dix

Length of output: 4606


🏁 Script executed:

# Look at the API code to see if it has similar issues
sed -n '1,100p' dixinternal/api.go

Repository: pubgo/dix

Length of output: 4233


🏁 Script executed:

# Check the complete GetProviderRuntimeStats function
rg "func.*GetProviderRuntimeStats" dixinternal/api.go -A 50

Repository: pubgo/dix

Length of output: 1378


🏁 Script executed:

# Look for tests related to runtime stats for aggregate providers
rg "GetProviderRuntimeStats\|outputType\|OutputType" dixinternal/dix_test.go -B 3 -A 3

Repository: pubgo/dix

Length of output: 35


🏁 Script executed:

# Verify that p.fn always has at least one output (to validate the safety check necessity)
rg "Provide.*func" dixinternal/dix.go -B 5 -A 15 | head -80

Repository: pubgo/dix

Length of output: 2626


🏁 Script executed:

# Check if there's any validation in handleProvide about output count
sed -n '300,350p' dixinternal/dix.go

Repository: pubgo/dix

Length of output: 2137


🏁 Script executed:

# Check if providers can have zero outputs (validate necessity of NumOut() check)
rg "NumOut.*0\|NumOut.*<" dixinternal/dix.go

Repository: pubgo/dix

Length of output: 35


🏁 Script executed:

# Verify that all registered providers have at least one output
sed -n '165,200p' dixinternal/dix.go

Repository: pubgo/dix

Length of output: 1239


🏁 Script executed:

# Let's check the complete Provide validation to understand if providers can have zero outputs
rg "func.*Provide\(" dixinternal/dix.go -A 30 | head -80

Repository: pubgo/dix

Length of output: 1452


🏁 Script executed:

# Check what the actual validation is for providers
sed -n '900,1000p' dixinternal/dix.go

Repository: pubgo/dix

Length of output: 4287


🏁 Script executed:

# Verify that the provide function validates output count
rg "provide.*func" dixinternal/dix.go -B 5 -A 30 | grep -A 30 "func.*provide" | head -50

Repository: pubgo/dix

Length of output: 1724


🏁 Script executed:

# Check the provide method to see if it validates NumOut
sed -n '1200,1300p' dixinternal/dix.go

Repository: pubgo/dix

Length of output: 4075


Use the provider's declared return type, not the expanded lookup type.

p.output.typ is the expanded type after registration (element type for slices/maps, field type for structs), not the declared return type. This causes runtime stats to report misleading output types. Use p.fn.Type().Out(0).String() to capture the actual function signature.

Minimal fix for the recorded stat
 	if !ok {
 		outputType := ""
-		if p.output != nil && p.output.typ != nil {
-			outputType = p.output.typ.String()
+		if p.fn.IsValid() && p.fn.Type().NumOut() > 0 {
+			outputType = p.fn.Type().Out(0).String()
 		}
 		stat = &providerRuntimeStat{
 			FunctionName: GetFnName(p.fn),
 			OutputType:   outputType,
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dixinternal/dix.go` around lines 412 - 423, The provider runtime stat is
using the expanded registration type (p.output.typ) which can differ from the
function's declared return type; update the stat creation in the block that sets
dix.providerStats[p.fn] to use the function's declared return type obtained from
p.fn.Type().Out(0).String() for OutputType instead of p.output.typ.String(),
keeping other fields (FunctionName via GetFnName(p.fn) and map insertion)
unchanged so providerRuntimeStat reflects the original function signature.

Comment on lines +76 to +97
func (n providerFn) callWithTimeout(in []reflect.Value, timeout time.Duration) (outputs []reflect.Value, err error, timedOut bool) {
if timeout <= 0 {
outputs, err = n.call(in)
return outputs, err, false
}

resultCh := make(chan providerCallResult, 1)
go func() {
outputs, callErr := n.call(in)
resultCh <- providerCallResult{outputs: outputs, err: callErr}
}()

timer := time.NewTimer(timeout)
defer timer.Stop()

select {
case res := <-resultCh:
return res.outputs, res.err, false
case <-timer.C:
return nil, fmt.Errorf("provider execution timeout after %s", timeout), true
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Provider goroutine continues after timeout - document this behavior.

When a timeout occurs, the goroutine running n.call(in) continues executing in the background. The result is discarded via the buffered channel, but:

  1. The provider may still complete side effects (DB writes, API calls, etc.)
  2. Resources held by the provider won't be released until it completes
  3. Multiple timeouts can accumulate background goroutines

This is a fundamental Go limitation (no goroutine preemption), but users should be aware that timeout doesn't cancel the provider - it only unblocks the caller.

Consider documenting this behavior in the WithProviderTimeout option or adding a context parameter to providers for cooperative cancellation in the future.

Would you like me to draft documentation text explaining this timeout behavior, or open an issue to track adding context-based cancellation support?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dixinternal/provider.go` around lines 76 - 97, The timeout implementation in
providerFn.callWithTimeout currently leaves the goroutine running after the
caller unblocks, so document this behavior and recommend cooperative
cancellation: update the WithProviderTimeout option's documentation (and any
public docs for provider timeouts) to state that timeouts only unblock the
caller and do not cancel or stop the provider goroutine, that side effects and
resource usage may continue, and advise implementers to accept a context or
implement their own cancellation-aware logic; also add a note suggesting a
future enhancement to add a context parameter to provider functions for
cooperative cancellation (reference providerFn.callWithTimeout and the
WithProviderTimeout option in your docs).

Comment on lines +331 to +337
dir := filepath.Dir(f.path)
if dir != "." && dir != "" {
_ = os.MkdirAll(dir, 0o755)
}
fd, err := os.OpenFile(f.path, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0o644)
if err != nil {
return nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's explore the structure and find the trace.go file
find . -name "trace.go" -type f | head -20

Repository: pubgo/dix

Length of output: 72


🏁 Script executed:

# Get the file structure to understand the codebase
git ls-files | grep -E "(trace|dixtrace)" | head -20

Repository: pubgo/dix

Length of output: 125


🏁 Script executed:

# Get file size and read the relevant section
wc -l ./dixtrace/trace.go

Repository: pubgo/dix

Length of output: 76


🏁 Script executed:

# Read the section around lines 331-337 with context
sed -n '320,345p' ./dixtrace/trace.go

Repository: pubgo/dix

Length of output: 496


🏁 Script executed:

# Search for Event type definition to understand what data is traced
rg "type Event struct" ./dixtrace/trace.go -A 10

Repository: pubgo/dix

Length of output: 674


🏁 Script executed:

# Look at the Write method to see what's written
sed -n '347,370p' ./dixtrace/trace.go

Repository: pubgo/dix

Length of output: 385


🏁 Script executed:

# Search for what data is logged/written
rg "Write|Sink" ./dixtrace/trace.go -B 2 -A 5 | head -60

Repository: pubgo/dix

Length of output: 939


🏁 Script executed:

# Get complete Event struct definition
sed -n '40,100p' ./dixtrace/trace.go

Repository: pubgo/dix

Length of output: 1260


🏁 Script executed:

# Search for Event struct with more context
ast-grep --pattern 'type Event struct { $$$ }'

Repository: pubgo/dix

Length of output: 1674


🏁 Script executed:

# Check for other file creation patterns in the codebase
rg "OpenFile|MkdirAll" . --type go -B 2 -A 2

Repository: pubgo/dix

Length of output: 997


🏁 Script executed:

# Check if there are tests that might show what data is logged
head -50 ./dixtrace/trace_test.go

Repository: pubgo/dix

Length of output: 1503


🏁 Script executed:

# Search for examples or documentation about the trace data
rg "FileSink|MemorySink" . --type go --type md | head -30

Repository: pubgo/dix

Length of output: 1617


Use private file permissions for persisted trace data.

The Event struct includes Message, Error, and arbitrary Attrs, making these files sensitive. Creating directories as 0755 and files as 0644 allows other local users to read diagnostics on shared systems. Change to 0700 for directories and 0600 for files. Call Chmod(0o600) after OpenFile to ensure permissions apply regardless of umask.

Suggested fix
 	dir := filepath.Dir(f.path)
 	if dir != "." && dir != "" {
-		_ = os.MkdirAll(dir, 0o755)
+		_ = os.MkdirAll(dir, 0o700)
 	}
-	fd, err := os.OpenFile(f.path, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0o644)
+	fd, err := os.OpenFile(f.path, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0o600)
 	if err != nil {
 		return nil
 	}
+	_ = fd.Chmod(0o600)
 	f.f = fd
 	return f.f

Note: Similar permission issue exists in dixinternal/diag_file.go.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dixtrace/trace.go` around lines 331 - 337, The directory and file creation in
the trace persistence code use world-readable perms; update the mkdir and file
permissions to be private and enforce them after creation: when creating the dir
for f.path, use 0700 instead of 0755 (MkdirAll), open the file with flags as
before but use 0600 instead of 0644, and immediately call Chmod(0o600) on the
returned fd (or the created path) to ensure permissions apply regardless of
umask; apply the same change pattern used around f.path/OpenFile/Chmod in
dixtrace/trace.go and mirror it in dixinternal/diag_file.go for the Event
persistence logic.

@kooksee kooksee merged commit 637d99a into v2 Mar 24, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant