Skip to content

feat: 新增 dmesg 错误日志检查插件#1415

Merged
kongfei605 merged 5 commits intoflashcatcloud:mainfrom
0x0034:feat/addDmesgPlugins
Mar 11, 2026
Merged

feat: 新增 dmesg 错误日志检查插件#1415
kongfei605 merged 5 commits intoflashcatcloud:mainfrom
0x0034:feat/addDmesgPlugins

Conversation

@0x0034
Copy link
Contributor

@0x0034 0x0034 commented Mar 10, 2026

No description provided.

0x0034 added 2 commits March 10, 2026 17:42
Signed-off-by: ruochen <wanxialianwei@gmail.com>
Signed-off-by: ruochen <wanxialianwei@gmail.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

新增一个 dmesg input 插件,用于从 Linux 内核环形缓冲区(/dev/kmsg)读取并解析日志,并基于内置关键字与用户配置关键字上报错误命中指标。

Changes:

  • 新增 /dev/kmsg 读取与解析逻辑(dmesg decode)。
  • 新增 dmesg input 插件:遍历 dmesg 文本并按关键字上报指标。
  • 在 agent 自动注册列表中加入 inputs/dmesg

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

File Description
inputs/dmesg/dmesg_decode.go 新增 /dev/kmsg 读取与消息解析实现,并提供对外 API。
inputs/dmesg/dmesg.go 新增 dmesg input:关键字匹配并上报采样。
agent/metrics_agent.go dmesg input 纳入自动注册。

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1 to +3
// Package dmesg provides interfaces to get log messages from linux kernel ring buffer like
// cmd util 'dmesg' by reading data from /dev/kmsg.
package dmesg
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This package reads /dev/kmsg and uses linux-specific syscall behavior, but it has no build constraints. Because agent/metrics_agent.go always blank-imports this input (like other linux-only inputs), non-Linux builds can fail (e.g., “build constraints exclude all Go files” or missing syscalls). Add //go:build linux (and // +build linux) to the real implementation files and add a dmesg_nolinux.go stub with //go:build !linux, following the pattern used by inputs/conntrack and inputs/kernel.

Copilot uses AI. Check for mistakes.

msg.Text = string(data[prefixEnd+1 : textEnd])
if textEnd == dataLen-1 {
return nil
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parseData returns nil when the log line ends right after the newline (textEnd == dataLen-1). For the common /dev/kmsg format where a message is just ...;text\n (no extra device-info lines), this drops valid messages so the plugin won’t ever match keywords. Return the parsed Msg even when there’s no device info (and leave DeviceInfo nil/empty).

Suggested change
return nil
// No additional device info lines; return the parsed message as-is.
return &msg

Copilot uses AI. Check for mistakes.
val, _ := strconv.ParseInt(string(prefix), 10, 64)
msg.TsUsec = val
case 3:
msg.IsFragment = prefix[0] != '-'
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential panic: msg.IsFragment = prefix[0] != '-' assumes prefix is non-empty. If the prefix section has consecutive commas (missing fields) this will be an index-out-of-range panic. Guard with len(prefix) > 0 before indexing.

Suggested change
msg.IsFragment = prefix[0] != '-'
if len(prefix) > 0 {
msg.IsFragment = prefix[0] != '-'
}

Copilot uses AI. Check for mistakes.
Comment on lines +75 to +84
if info[0] != ' ' {
continue
}

kv := bytes.Split(info, []byte("="))
if len(kv) != 2 {
continue
}

msg.DeviceInfo[string(kv[0])] = string(kv[1])
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential panics while parsing device info: if info[0] != ' ' will panic on empty info, and bytes.Split(info, []byte("=")) can return more than 2 parts if the value contains '='. Add len(info) > 0 guard, use bytes.SplitN(..., 2), and consider trimming the leading space so map keys don’t include it.

Suggested change
if info[0] != ' ' {
continue
}
kv := bytes.Split(info, []byte("="))
if len(kv) != 2 {
continue
}
msg.DeviceInfo[string(kv[0])] = string(kv[1])
if len(info) == 0 || info[0] != ' ' {
continue
}
kv := bytes.SplitN(info, []byte("="), 2)
if len(kv) != 2 {
continue
}
key := bytes.TrimSpace(kv[0])
msg.DeviceInfo[string(key)] = string(kv[1])

Copilot uses AI. Check for mistakes.
Comment on lines +113 to +131
buf := make([]byte, bufSize)
_, err := syscall.Read(int(fd), buf)
if err != nil {
syscallError = err
// EINVAL means buf is not enough, data would be truncated, but still can continue.
if !errors.Is(err, syscall.EINVAL) {
return true
}
}

if fetchRaw {
d.raw = append(d.raw, buf)
} else {
msg := parseData(buf)
if msg == nil {
continue
}
d.msg = append(d.msg, *msg)
}
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

syscall.Read’s return length is ignored, and the code appends/parses the full bufSize buffer (including trailing zero bytes). This can break parsing (e.g., textEnd == dataLen-1 never true) and makes RawDmesgWithBufSize return padded garbage. Use the returned n to slice (buf[:n]) before appending/parsing; also consider reusing the buffer to avoid allocating each loop iteration.

Copilot uses AI. Check for mistakes.
Comment on lines +28 to +33
const (
_ = iota // 跳过 0 和 1,让 iota 从 2 开始计数
ErrCode1 // 2
ErrCode2 // 3
ErrCode3 // 4
ErrCode4 // 5
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This iota block doesn’t do what the comment says: with _ = iota, ErrCode1 will be 1 (not 2), ErrCode2 will be 2, etc. If you need codes starting at 2, either skip two values or offset iota (e.g., ErrCode1 = iota + 2).

Copilot uses AI. Check for mistakes.
Comment on lines +112 to +114
for _, d := range dmesgs {
slist.PushFront(types.NewSample(inputName, "up", 1, nil))
for _, e := range errorList {
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

up is pushed once per dmesg message inside the loop, producing many duplicate up=1 samples per Gather. Move the success up=1 emission outside the loop so it’s reported once per collection interval.

Copilot uses AI. Check for mistakes.
Comment on lines +19 to +20
Level uint64 // SYSLOG lvel
Facility uint64 // SYSLOG facility
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comment: SYSLOG lvelSYSLOG level.

Copilot uses AI. Check for mistakes.
0x0034 added 3 commits March 10, 2026 23:59
Signed-off-by: ruochen <wanxialianwei@gmail.com>
Signed-off-by: ruochen <wanxialianwei@gmail.com>
Signed-off-by: ruochen <wanxialianwei@gmail.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 9 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +137 to +146
_, err := syscall.Read(int(fd), buf)
if err != nil {
syscallError = err
// EINVAL means buf is not enough, data would be truncated, but still can continue.
if !errors.Is(err, syscall.EINVAL) {
return true
}
}

msg := parseData(buf)
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The read loop ignores the byte count returned by syscall.Read and always passes the full fixed-size buffer to parseData. That means parseData is parsing trailing zero bytes and can mis-handle end-of-line logic. Capture n and call parseData(buf[:n]) (and consider skipping parse on non-nil errors like EINVAL if the message is known-truncated).

Suggested change
_, err := syscall.Read(int(fd), buf)
if err != nil {
syscallError = err
// EINVAL means buf is not enough, data would be truncated, but still can continue.
if !errors.Is(err, syscall.EINVAL) {
return true
}
}
msg := parseData(buf)
n, err := syscall.Read(int(fd), buf)
if err != nil {
syscallError = err
// EINVAL means buf is not enough, data would be truncated, but still can continue.
if errors.Is(err, syscall.EINVAL) {
// Skip parsing known-truncated data.
continue
}
return true
}
if n <= 0 {
continue
}
msg := parseData(buf[:n])

Copilot uses AI. Check for mistakes.
msg.DeviceInfo = make(map[string]string, 2)
deviceInfo := bytes.Split(data[textEnd+1:dataLen-1], []byte("\n"))
for _, info := range deviceInfo {
if info[0] != ' ' {
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a potential panic in device-info parsing: if info[0] != ' ' will panic if info is empty (e.g., consecutive newlines or an empty trailer slice). Guard len(info) > 0 before indexing.

Suggested change
if info[0] != ' ' {
if len(info) == 0 || info[0] != ' ' {

Copilot uses AI. Check for mistakes.
)

type Msg struct {
Level uint64 // SYSLOG lvel
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spelling: comment says "SYSLOG lvel"; should be "SYSLOG level".

Suggested change
Level uint64 // SYSLOG lvel
Level uint64 // SYSLOG level

Copilot uses AI. Check for mistakes.
Comment on lines +187 to +241
func parseData(data []byte) *Msg {
msg := Msg{}

dataLen := len(data)
prefixEnd := bytes.IndexByte(data, ';')
if prefixEnd == -1 {
return nil
}

for index, prefix := range bytes.Split(data[:prefixEnd], []byte(",")) {
switch index {
case 0:
val, _ := strconv.ParseUint(string(prefix), 10, 64)
msg.Level = val & levelMask
msg.Facility = val & (^levelMask)
case 1:
val, _ := strconv.ParseUint(string(prefix), 10, 64)
msg.Seq = val
case 2:
val, _ := strconv.ParseInt(string(prefix), 10, 64)
msg.TsUsec = val
case 3:
msg.IsFragment = prefix[0] != '-'
case 4:
msg.Caller = string(prefix)
}
}

textEnd := bytes.IndexByte(data, '\n')
if textEnd == -1 || textEnd <= prefixEnd {
return nil
}

msg.Text = string(data[prefixEnd+1 : textEnd])
if textEnd == dataLen-1 {
return nil
}

msg.DeviceInfo = make(map[string]string, 2)
deviceInfo := bytes.Split(data[textEnd+1:dataLen-1], []byte("\n"))
for _, info := range deviceInfo {
if info[0] != ' ' {
continue
}

kv := bytes.Split(info, []byte("="))
if len(kv) != 2 {
continue
}

msg.DeviceInfo[string(kv[0])] = string(kv[1])
}

return &msg
}
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new parser/keyword matching logic isn’t covered by tests. Since it’s easy to regress and interacts with tricky /dev/kmsg formatting (prefix parsing, no-device-info lines, empty segments), please add unit tests for parseData (at least: normal line without device info, line with device info, and malformed/empty segments).

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +15
package dmesg

import (
"bytes"
"errors"
"log"
"os"
"strconv"
"strings"
"syscall"

"flashcat.cloud/categraf/config"
"flashcat.cloud/categraf/inputs"
"flashcat.cloud/categraf/types"
)
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This input reads /dev/kmsg and uses Linux-only syscall flags (e.g., O_NONBLOCK). Without Linux build tags plus a !linux stub file, the repo will fail to compile on non-Linux platforms (and the blank import in agent/metrics_agent.go will also break builds). Please add //go:build linux to this file and add a corresponding dmesg_* !linux stub (see inputs/conntrack/conntrack.go and inputs/conntrack/conntrack_nolinux.go for the established pattern).

Copilot uses AI. Check for mistakes.
Comment on lines +221 to +237
if textEnd == dataLen-1 {
return nil
}

msg.DeviceInfo = make(map[string]string, 2)
deviceInfo := bytes.Split(data[textEnd+1:dataLen-1], []byte("\n"))
for _, info := range deviceInfo {
if info[0] != ' ' {
continue
}

kv := bytes.Split(info, []byte("="))
if len(kv) != 2 {
continue
}

msg.DeviceInfo[string(kv[0])] = string(kv[1])
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parseData returns nil when the log line ends right after the message text (textEnd == dataLen-1). That drops the common /dev/kmsg case where there is no device-info trailer, so keyword detection will miss most messages. This function should return the parsed Msg even when there is no device info section.

Suggested change
if textEnd == dataLen-1 {
return nil
}
msg.DeviceInfo = make(map[string]string, 2)
deviceInfo := bytes.Split(data[textEnd+1:dataLen-1], []byte("\n"))
for _, info := range deviceInfo {
if info[0] != ' ' {
continue
}
kv := bytes.Split(info, []byte("="))
if len(kv) != 2 {
continue
}
msg.DeviceInfo[string(kv[0])] = string(kv[1])
// Only parse device info if there is data after the message text.
if textEnd < dataLen-1 {
msg.DeviceInfo = make(map[string]string, 2)
deviceInfo := bytes.Split(data[textEnd+1:dataLen-1], []byte("\n"))
for _, info := range deviceInfo {
if len(info) == 0 || info[0] != ' ' {
continue
}
kv := bytes.Split(info, []byte("="))
if len(kv) != 2 {
continue
}
msg.DeviceInfo[string(kv[0])] = string(kv[1])
}

Copilot uses AI. Check for mistakes.
val, _ := strconv.ParseInt(string(prefix), 10, 64)
msg.TsUsec = val
case 3:
msg.IsFragment = prefix[0] != '-'
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a potential panic in parseData: msg.IsFragment = prefix[0] != '-' assumes the 4th prefix field is non-empty. If the field is empty/malformed, this will index out of range. Add a length check before accessing prefix[0] and treat empty as non-fragment (or return nil).

Suggested change
msg.IsFragment = prefix[0] != '-'
if len(prefix) > 0 {
msg.IsFragment = prefix[0] != '-'
} else {
// Treat empty or malformed fragment field as non-fragment.
msg.IsFragment = false
}

Copilot uses AI. Check for mistakes.
_ "flashcat.cloud/categraf/inputs/dcgm"
_ "flashcat.cloud/categraf/inputs/disk"
_ "flashcat.cloud/categraf/inputs/diskio"
_ "flashcat.cloud/categraf/inputs/dmesg"
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blank-importing the dmesg input will break non-Linux builds unless the input package is guarded by Linux build tags and provides a //go:build !linux stub file. Please ensure the dmesg input follows the same cross-platform pattern as other Linux-only inputs (e.g., inputs/conntrack).

Suggested change
_ "flashcat.cloud/categraf/inputs/dmesg"

Copilot uses AI. Check for mistakes.
Comment on lines +131 to +172
msgs := make([]Msg, 0)

var syscallError error = nil
err := ins.conn.Read(func(fd uintptr) bool {
for {
buf := make([]byte, defaultBufSize)
_, err := syscall.Read(int(fd), buf)
if err != nil {
syscallError = err
// EINVAL means buf is not enough, data would be truncated, but still can continue.
if !errors.Is(err, syscall.EINVAL) {
return true
}
}

msg := parseData(buf)
if msg == nil {
continue
}
msgs = append(msgs, *msg)
}
})

// EAGAIN means no more data, should be treated as normal.
if syscallError != nil && !errors.Is(syscallError, syscall.EAGAIN) {
err = syscallError
}

if err != nil {
log.Println("Error reading from /dev/kmsg:", err)
slist.PushFront(types.NewSample(inputName, "up", 0, nil))
return
}

slist.PushFront(types.NewSample(inputName, "up", 1, nil))
for _, d := range msgs {
for keyword := range ins.errorList {
if strings.Contains(d.Text, keyword) {
ins.errorList[keyword]++
}
}
}
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Gather loop allocates a new 16KB buffer and appends Msg structs for every kmsg line, then does a second pass just to count keyword hits. This creates avoidable allocations and memory growth under heavy kernel logging. Consider reusing a single buffer (or a pool) and incrementing counters as each message is parsed, without storing all messages in msgs.

Copilot uses AI. Check for mistakes.
@kongfei605 kongfei605 merged commit f191b2f into flashcatcloud:main Mar 11, 2026
5 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants