feat: 新增 dmesg 错误日志检查插件 by 0x0034 · Pull Request #1415 · flashcatcloud/categraf

0x0034 · 2026-03-10T09:42:23Z

No description provided.

Signed-off-by: ruochen <wanxialianwei@gmail.com>

Copilot

Pull request overview

新增一个 dmesg input 插件，用于从 Linux 内核环形缓冲区（/dev/kmsg）读取并解析日志，并基于内置关键字与用户配置关键字上报错误命中指标。

Changes:

新增 /dev/kmsg 读取与解析逻辑（dmesg decode）。
新增 dmesg input 插件：遍历 dmesg 文本并按关键字上报指标。
在 agent 自动注册列表中加入 inputs/dmesg。

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

File	Description
inputs/dmesg/dmesg_decode.go	新增 `/dev/kmsg` 读取与消息解析实现，并提供对外 API。
inputs/dmesg/dmesg.go	新增 `dmesg` input：关键字匹配并上报采样。
agent/metrics_agent.go	将 `dmesg` input 纳入自动注册。

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-10T09:54:41Z

inputs/dmesg/dmesg_decode.go

+// Package dmesg provides interfaces to get log messages from linux kernel ring buffer like
+// cmd util 'dmesg' by reading data from /dev/kmsg.
+package dmesg


This package reads /dev/kmsg and uses linux-specific syscall behavior, but it has no build constraints. Because agent/metrics_agent.go always blank-imports this input (like other linux-only inputs), non-Linux builds can fail (e.g., “build constraints exclude all Go files” or missing syscalls). Add //go:build linux (and // +build linux) to the real implementation files and add a dmesg_nolinux.go stub with //go:build !linux, following the pattern used by inputs/conntrack and inputs/kernel.

Copilot · 2026-03-10T09:54:41Z

inputs/dmesg/dmesg_decode.go

+
+	msg.Text = string(data[prefixEnd+1 : textEnd])
+	if textEnd == dataLen-1 {
+		return nil


parseData returns nil when the log line ends right after the newline (textEnd == dataLen-1). For the common /dev/kmsg format where a message is just ...;text\n (no extra device-info lines), this drops valid messages so the plugin won’t ever match keywords. Return the parsed Msg even when there’s no device info (and leave DeviceInfo nil/empty).

Suggested change

return nil

// No additional device info lines; return the parsed message as-is.

return &msg

Copilot · 2026-03-10T09:54:41Z

inputs/dmesg/dmesg_decode.go

+			val, _ := strconv.ParseInt(string(prefix), 10, 64)
+			msg.TsUsec = val
+		case 3:
+			msg.IsFragment = prefix[0] != '-'


Potential panic: msg.IsFragment = prefix[0] != '-' assumes prefix is non-empty. If the prefix section has consecutive commas (missing fields) this will be an index-out-of-range panic. Guard with len(prefix) > 0 before indexing.

Suggested change

msg.IsFragment = prefix[0] != '-'

if len(prefix) > 0 {

msg.IsFragment = prefix[0] != '-'

}

Copilot · 2026-03-10T09:54:42Z

inputs/dmesg/dmesg_decode.go

+		if info[0] != ' ' {
+			continue
+		}
+
+		kv := bytes.Split(info, []byte("="))
+		if len(kv) != 2 {
+			continue
+		}
+
+		msg.DeviceInfo[string(kv[0])] = string(kv[1])


Potential panics while parsing device info: if info[0] != ' ' will panic on empty info, and bytes.Split(info, []byte("=")) can return more than 2 parts if the value contains '='. Add len(info) > 0 guard, use bytes.SplitN(..., 2), and consider trimming the leading space so map keys don’t include it.

Suggested change

if info[0] != ' ' {

continue

}

kv := bytes.Split(info, []byte("="))

if len(kv) != 2 {

continue

}

msg.DeviceInfo[string(kv[0])] = string(kv[1])

if len(info) == 0 || info[0] != ' ' {

continue

}

kv := bytes.SplitN(info, []byte("="), 2)

if len(kv) != 2 {

continue

}

key := bytes.TrimSpace(kv[0])

msg.DeviceInfo[string(key)] = string(kv[1])

Copilot · 2026-03-10T09:54:42Z

inputs/dmesg/dmesg_decode.go

+			buf := make([]byte, bufSize)
+			_, err := syscall.Read(int(fd), buf)
+			if err != nil {
+				syscallError = err
+				// EINVAL means buf is not enough, data would be truncated, but still can continue.
+				if !errors.Is(err, syscall.EINVAL) {
+					return true
+				}
+			}
+
+			if fetchRaw {
+				d.raw = append(d.raw, buf)
+			} else {
+				msg := parseData(buf)
+				if msg == nil {
+					continue
+				}
+				d.msg = append(d.msg, *msg)
+			}


syscall.Read’s return length is ignored, and the code appends/parses the full bufSize buffer (including trailing zero bytes). This can break parsing (e.g., textEnd == dataLen-1 never true) and makes RawDmesgWithBufSize return padded garbage. Use the returned n to slice (buf[:n]) before appending/parsing; also consider reusing the buffer to avoid allocating each loop iteration.

Copilot · 2026-03-10T09:54:42Z

inputs/dmesg/dmesg.go

+const (
+	_         = iota // 跳过 0 和 1，让 iota 从 2 开始计数
+	ErrCode1         // 2
+	ErrCode2         // 3
+	ErrCode3         // 4
+	ErrCode4         // 5


This iota block doesn’t do what the comment says: with _ = iota, ErrCode1 will be 1 (not 2), ErrCode2 will be 2, etc. If you need codes starting at 2, either skip two values or offset iota (e.g., ErrCode1 = iota + 2).

Copilot · 2026-03-10T09:54:42Z

inputs/dmesg/dmesg.go

+	for _, d := range dmesgs {
+		slist.PushFront(types.NewSample(inputName, "up", 1, nil))
+		for _, e := range errorList {


up is pushed once per dmesg message inside the loop, producing many duplicate up=1 samples per Gather. Move the success up=1 emission outside the loop so it’s reported once per collection interval.

Copilot · 2026-03-10T09:54:42Z

inputs/dmesg/dmesg_decode.go

+	Level      uint64            // SYSLOG lvel
+	Facility   uint64            // SYSLOG facility


Typo in comment: SYSLOG lvel → SYSLOG level.

Signed-off-by: ruochen <wanxialianwei@gmail.com>

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 9 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-11T06:59:16Z

inputs/dmesg/dmesg.go

+			_, err := syscall.Read(int(fd), buf)
+			if err != nil {
+				syscallError = err
+				// EINVAL means buf is not enough, data would be truncated, but still can continue.
+				if !errors.Is(err, syscall.EINVAL) {
+					return true
+				}
+			}
+
+			msg := parseData(buf)


The read loop ignores the byte count returned by syscall.Read and always passes the full fixed-size buffer to parseData. That means parseData is parsing trailing zero bytes and can mis-handle end-of-line logic. Capture n and call parseData(buf[:n]) (and consider skipping parse on non-nil errors like EINVAL if the message is known-truncated).

Suggested change

_, err := syscall.Read(int(fd), buf)

if err != nil {

syscallError = err

// EINVAL means buf is not enough, data would be truncated, but still can continue.

if !errors.Is(err, syscall.EINVAL) {

return true

}

}

msg := parseData(buf)

n, err := syscall.Read(int(fd), buf)

if err != nil {

syscallError = err

// EINVAL means buf is not enough, data would be truncated, but still can continue.

if errors.Is(err, syscall.EINVAL) {

// Skip parsing known-truncated data.

continue

}

return true

}

if n <= 0 {

continue

}

msg := parseData(buf[:n])

Copilot · 2026-03-11T06:59:16Z

inputs/dmesg/dmesg.go

+	msg.DeviceInfo = make(map[string]string, 2)
+	deviceInfo := bytes.Split(data[textEnd+1:dataLen-1], []byte("\n"))
+	for _, info := range deviceInfo {
+		if info[0] != ' ' {


There is a potential panic in device-info parsing: if info[0] != ' ' will panic if info is empty (e.g., consecutive newlines or an empty trailer slice). Guard len(info) > 0 before indexing.

Suggested change

if info[0] != ' ' {

if len(info) == 0 || info[0] != ' ' {

Copilot · 2026-03-11T06:59:17Z

inputs/dmesg/dmesg.go

+)
+
+type Msg struct {
+	Level      uint64            // SYSLOG lvel


Spelling: comment says "SYSLOG lvel"; should be "SYSLOG level".

Suggested change

Level uint64 // SYSLOG lvel

Level uint64 // SYSLOG level

Copilot · 2026-03-11T06:59:17Z

inputs/dmesg/dmesg.go

+func parseData(data []byte) *Msg {
+	msg := Msg{}
+
+	dataLen := len(data)
+	prefixEnd := bytes.IndexByte(data, ';')
+	if prefixEnd == -1 {
+		return nil
+	}
+
+	for index, prefix := range bytes.Split(data[:prefixEnd], []byte(",")) {
+		switch index {
+		case 0:
+			val, _ := strconv.ParseUint(string(prefix), 10, 64)
+			msg.Level = val & levelMask
+			msg.Facility = val & (^levelMask)
+		case 1:
+			val, _ := strconv.ParseUint(string(prefix), 10, 64)
+			msg.Seq = val
+		case 2:
+			val, _ := strconv.ParseInt(string(prefix), 10, 64)
+			msg.TsUsec = val
+		case 3:
+			msg.IsFragment = prefix[0] != '-'
+		case 4:
+			msg.Caller = string(prefix)
+		}
+	}
+
+	textEnd := bytes.IndexByte(data, '\n')
+	if textEnd == -1 || textEnd <= prefixEnd {
+		return nil
+	}
+
+	msg.Text = string(data[prefixEnd+1 : textEnd])
+	if textEnd == dataLen-1 {
+		return nil
+	}
+
+	msg.DeviceInfo = make(map[string]string, 2)
+	deviceInfo := bytes.Split(data[textEnd+1:dataLen-1], []byte("\n"))
+	for _, info := range deviceInfo {
+		if info[0] != ' ' {
+			continue
+		}
+
+		kv := bytes.Split(info, []byte("="))
+		if len(kv) != 2 {
+			continue
+		}
+
+		msg.DeviceInfo[string(kv[0])] = string(kv[1])
+	}
+
+	return &msg
+}


This new parser/keyword matching logic isn’t covered by tests. Since it’s easy to regress and interacts with tricky /dev/kmsg formatting (prefix parsing, no-device-info lines, empty segments), please add unit tests for parseData (at least: normal line without device info, line with device info, and malformed/empty segments).

Copilot · 2026-03-11T06:59:17Z

inputs/dmesg/dmesg.go

+package dmesg
+
+import (
+	"bytes"
+	"errors"
+	"log"
+	"os"
+	"strconv"
+	"strings"
+	"syscall"
+
+	"flashcat.cloud/categraf/config"
+	"flashcat.cloud/categraf/inputs"
+	"flashcat.cloud/categraf/types"
+)


This input reads /dev/kmsg and uses Linux-only syscall flags (e.g., O_NONBLOCK). Without Linux build tags plus a !linux stub file, the repo will fail to compile on non-Linux platforms (and the blank import in agent/metrics_agent.go will also break builds). Please add //go:build linux to this file and add a corresponding dmesg_* !linux stub (see inputs/conntrack/conntrack.go and inputs/conntrack/conntrack_nolinux.go for the established pattern).

Copilot · 2026-03-11T06:59:17Z

inputs/dmesg/dmesg.go

+	if textEnd == dataLen-1 {
+		return nil
+	}
+
+	msg.DeviceInfo = make(map[string]string, 2)
+	deviceInfo := bytes.Split(data[textEnd+1:dataLen-1], []byte("\n"))
+	for _, info := range deviceInfo {
+		if info[0] != ' ' {
+			continue
+		}
+
+		kv := bytes.Split(info, []byte("="))
+		if len(kv) != 2 {
+			continue
+		}
+
+		msg.DeviceInfo[string(kv[0])] = string(kv[1])


parseData returns nil when the log line ends right after the message text (textEnd == dataLen-1). That drops the common /dev/kmsg case where there is no device-info trailer, so keyword detection will miss most messages. This function should return the parsed Msg even when there is no device info section.

Suggested change

if textEnd == dataLen-1 {

return nil

}

msg.DeviceInfo = make(map[string]string, 2)

deviceInfo := bytes.Split(data[textEnd+1:dataLen-1], []byte("\n"))

for _, info := range deviceInfo {

if info[0] != ' ' {

continue

}

kv := bytes.Split(info, []byte("="))

if len(kv) != 2 {

continue

}

msg.DeviceInfo[string(kv[0])] = string(kv[1])

// Only parse device info if there is data after the message text.

if textEnd < dataLen-1 {

msg.DeviceInfo = make(map[string]string, 2)

deviceInfo := bytes.Split(data[textEnd+1:dataLen-1], []byte("\n"))

for _, info := range deviceInfo {

if len(info) == 0 || info[0] != ' ' {

continue

}

kv := bytes.Split(info, []byte("="))

if len(kv) != 2 {

continue

}

msg.DeviceInfo[string(kv[0])] = string(kv[1])

}

Copilot · 2026-03-11T06:59:17Z

inputs/dmesg/dmesg.go

+			val, _ := strconv.ParseInt(string(prefix), 10, 64)
+			msg.TsUsec = val
+		case 3:
+			msg.IsFragment = prefix[0] != '-'


There is a potential panic in parseData: msg.IsFragment = prefix[0] != '-' assumes the 4th prefix field is non-empty. If the field is empty/malformed, this will index out of range. Add a length check before accessing prefix[0] and treat empty as non-fragment (or return nil).

Suggested change

msg.IsFragment = prefix[0] != '-'

if len(prefix) > 0 {

msg.IsFragment = prefix[0] != '-'

} else {

// Treat empty or malformed fragment field as non-fragment.

msg.IsFragment = false

}

Copilot · 2026-03-11T06:59:18Z

agent/metrics_agent.go

 	_ "flashcat.cloud/categraf/inputs/dcgm"
 	_ "flashcat.cloud/categraf/inputs/disk"
 	_ "flashcat.cloud/categraf/inputs/diskio"
+	_ "flashcat.cloud/categraf/inputs/dmesg"


Blank-importing the dmesg input will break non-Linux builds unless the input package is guarded by Linux build tags and provides a //go:build !linux stub file. Please ensure the dmesg input follows the same cross-platform pattern as other Linux-only inputs (e.g., inputs/conntrack).

Suggested change

_ "flashcat.cloud/categraf/inputs/dmesg"

Copilot · 2026-03-11T06:59:18Z

inputs/dmesg/dmesg.go

+	msgs := make([]Msg, 0)
+
+	var syscallError error = nil
+	err := ins.conn.Read(func(fd uintptr) bool {
+		for {
+			buf := make([]byte, defaultBufSize)
+			_, err := syscall.Read(int(fd), buf)
+			if err != nil {
+				syscallError = err
+				// EINVAL means buf is not enough, data would be truncated, but still can continue.
+				if !errors.Is(err, syscall.EINVAL) {
+					return true
+				}
+			}
+
+			msg := parseData(buf)
+			if msg == nil {
+				continue
+			}
+			msgs = append(msgs, *msg)
+		}
+	})
+
+	// EAGAIN means no more data, should be treated as normal.
+	if syscallError != nil && !errors.Is(syscallError, syscall.EAGAIN) {
+		err = syscallError
+	}
+
+	if err != nil {
+		log.Println("Error reading from /dev/kmsg:", err)
+		slist.PushFront(types.NewSample(inputName, "up", 0, nil))
+		return
+	}
+
+	slist.PushFront(types.NewSample(inputName, "up", 1, nil))
+	for _, d := range msgs {
+		for keyword := range ins.errorList {
+			if strings.Contains(d.Text, keyword) {
+				ins.errorList[keyword]++
+			}
+		}
+	}


The Gather loop allocates a new 16KB buffer and appends Msg structs for every kmsg line, then does a second pass just to count keyword hits. This creates avoidable allocations and memory growth under heavy kernel logging. Consider reusing a single buffer (or a pool) and incrementing counters as each message is parsed, without storing all messages in msgs.

0x0034 added 2 commits March 10, 2026 17:42

feat: 新增 dmesg 错误日志检查插件

1b73c93

Signed-off-by: ruochen <wanxialianwei@gmail.com>

fix: 补充成功采集指标

a55cbe4

Signed-off-by: ruochen <wanxialianwei@gmail.com>

kongfei605 requested a review from Copilot March 10, 2026 09:50

Copilot started reviewing on behalf of kongfei605 March 10, 2026 09:50 View session

Copilot AI reviewed Mar 10, 2026

View reviewed changes

0x0034 added 3 commits March 10, 2026 23:59

fix: 优化 ringbuffer 采集

3cc6d8c

Signed-off-by: ruochen <wanxialianwei@gmail.com>

fix: 修改常量命名方式为驼峰

26a2fc4

Signed-off-by: ruochen <wanxialianwei@gmail.com>

fix: 优化 up 指标吐出,和部分代码逻辑

9c25f21

Signed-off-by: ruochen <wanxialianwei@gmail.com>

kongfei605 requested a review from Copilot March 11, 2026 06:53

Copilot started reviewing on behalf of kongfei605 March 11, 2026 06:54 View session

Copilot AI reviewed Mar 11, 2026

View reviewed changes

kongfei605 merged commit f191b2f into flashcatcloud:main Mar 11, 2026
5 of 7 checks passed

	return nil
	// No additional device info lines; return the parsed message as-is.
	return &msg

		Level uint64 // SYSLOG lvel
		Facility uint64 // SYSLOG facility

Conversation

0x0034 commented Mar 10, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants