ipmi-lite: Fix IPMI SEL rollover (Record ID 0 collision with SEL_RECID_FIRST) #274

Open
louis-nexthop wants to merge 1 commit into facebook:helium from nexthop-ai:fix-ipmi-sel-rollover

@louis-nexthop

Description

Fixes #273

Context

IPMI Specification: https://www.intel.la/content/dam/www/public/us/en/documents/specification-updates/ipmi-intelligent-platform-mgt-interface-spec-2nd-gen-v2-0-spec-update.pdf

The standard IPMI SEL "iterator" protocol

An IPMI client (e.g. ipmitool sel list) walks the log by:

  1. Issuing Get SEL Entry with read_rec_id = 0x0000 (SEL_RECID_FIRST).
  2. Receiving the entry plus a next_rec_id.
  3. Re-issuing Get SEL Entry with read_rec_id = next_rec_id.
  4. Stopping when the returned next_rec_id == 0xFFFF (SEL_RECID_LAST).
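
The four steps above can be sketched as a small client-side loop. This is a minimal illustration, not ipmitool's actual code: the callback type `get_sel_entry_fn`, the function `sel_walk`, and the toy backend `toy_get` are hypothetical names introduced here, and the `max_entries` bound stands in for whatever cycle detection a real client uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SEL_RECID_FIRST 0x0000u
#define SEL_RECID_LAST  0xFFFFu

/* Hypothetical stand-in for a Get SEL Entry transport: returns 0 on success
 * and fills in the record ID read and the ID of the next record. */
typedef int (*get_sel_entry_fn)(void *ctx, uint16_t read_rec_id,
                                uint16_t *rec_id, uint16_t *next_rec_id);

/* Walk the log from the first record. Returns the number of entries
 * visited, or -1 on a transport error or when max_entries is exceeded
 * (treated here as a cycle). */
int sel_walk(get_sel_entry_fn get, void *ctx, int max_entries)
{
    uint16_t read_id = SEL_RECID_FIRST;   /* step 1 */
    int count = 0;

    assert(get != NULL);
    while (count < max_entries) {
        uint16_t rec_id, next_id;
        if (get(ctx, read_id, &rec_id, &next_id) != 0)
            return -1;
        count++;                          /* step 2: one entry received */
        if (next_id == SEL_RECID_LAST)    /* step 4: end of log */
            return count;
        read_id = next_id;                /* step 3: follow the chain */
    }
    return -1;
}

/* Toy backend (assumption): records with IDs 1..n, no wrap-around. */
struct toy_sel { uint16_t n; };

int toy_get(void *ctx, uint16_t read_id, uint16_t *rec_id, uint16_t *next_id)
{
    const struct toy_sel *s = ctx;

    if (s->n == 0)
        return -1;
    if (read_id == SEL_RECID_FIRST)
        read_id = 1;
    else if (read_id == SEL_RECID_LAST)
        read_id = s->n;
    if (read_id < 1 || read_id > s->n)
        return -1;
    *rec_id = read_id;
    *next_id = (read_id == s->n) ? SEL_RECID_LAST : (uint16_t)(read_id + 1);
    return 0;
}
```

Note that the loop relies on the server never returning 0x0000 as a next_rec_id: a client following the chain would read that value as "start over from the first record".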

What was done

Fix the index and record ID calculations in sel_get_entry() and sel_add_entry() so that they are consistent with these conventions from the spec:

  • index is 0-indexed.
  • record ID is 1-indexed.
    • record ID = 0x0000 (SEL_RECID_FIRST) is reserved and has a special meaning in access commands (GET FIRST ENTRY; "give me the first record")
    • record ID = 0xFFFF (SEL_RECID_LAST) is reserved and has a special meaning in access commands (GET LAST ENTRY; "give me the last record")
  • g_sel_hdr[node].begin is the first index of the buffer
  • g_sel_hdr[node].end is the next empty slot of the buffer
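
A minimal sketch of these conventions, assuming a 129-slot ring in which one slot is always left empty (the slot count is taken from the test plan; the real layout in sel.c may differ). The struct `sel_hdr` and the helpers `sel_add`, `sel_resolve`, and `sel_next_recid` are hypothetical names for illustration, not the functions changed by this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SEL_RECID_FIRST 0x0000u
#define SEL_RECID_LAST  0xFFFFu
#define SEL_SLOTS       129   /* assumption: 129 slots, so 128 records max */

/* begin: index of the oldest record; end: index of the next empty slot. */
struct sel_hdr { int begin; int end; };

/* Indices are 0-based, record IDs are 1-based. */
static uint16_t index_to_recid(int index) { return (uint16_t)(index + 1); }
static int recid_to_index(uint16_t rec_id) { return (int)rec_id - 1; }

int sel_count(const struct sel_hdr *h)
{
    return (h->end - h->begin + SEL_SLOTS) % SEL_SLOTS;
}

/* Claim the next empty slot, evicting the oldest record when the ring is
 * full. Returns the record ID of the new entry. */
uint16_t sel_add(struct sel_hdr *h)
{
    int idx;

    assert(h != NULL);
    idx = h->end;
    h->end = (h->end + 1) % SEL_SLOTS;
    if (h->end == h->begin)
        h->begin = (h->begin + 1) % SEL_SLOTS;
    return index_to_recid(idx);
}

/* Map a requested record ID to a slot index, or -1 if it is not present. */
int sel_resolve(const struct sel_hdr *h, uint16_t read_rec_id)
{
    int idx, offset;

    if (h->begin == h->end)
        return -1;                                   /* empty log */
    if (read_rec_id == SEL_RECID_FIRST)
        return h->begin;
    if (read_rec_id == SEL_RECID_LAST)
        return (h->end + SEL_SLOTS - 1) % SEL_SLOTS;
    idx = recid_to_index(read_rec_id);
    if (idx >= SEL_SLOTS)
        return -1;
    offset = (idx - h->begin + SEL_SLOTS) % SEL_SLOTS;
    return offset < sel_count(h) ? idx : -1;
}

/* Record ID to report as "next" for the entry stored at idx. */
uint16_t sel_next_recid(const struct sel_hdr *h, int idx)
{
    int next = (idx + 1) % SEL_SLOTS;
    return next == h->end ? (uint16_t)SEL_RECID_LAST : index_to_recid(next);
}
```

With this mapping the ID that follows the highest slot (index 128, record ID 0x81) is 1 rather than 0, so a wrapped chain never hands the reserved value 0x0000 back to the client.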

Motivation

The rollover logic was flawed: sel.c handed out Record ID 0x0000, which is reserved (SEL_RECID_FIRST).

After the ring buffer rolled over, ipmitool's Get SEL Entry chain received *next_rec_id = 0 at the wrap point, re-issued Get SEL Entry(0), and sel_get_entry() returned the first record again. The result was an infinite loop, broken only by ipmitool's cycle detection. Symptom: on the host, ipmitool sel list collapsed to a single row after rollover.
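
The collision can be shown in a few lines. This sketch contrasts a hypothetical 0-based scheme, where the record ID equals the slot index, with the 1-based scheme described under "What was done". It is an illustration of the failure mode under that assumption, not a copy of the pre-fix code in sel.c; all function names and the 129-slot size are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define SEL_RECID_FIRST 0x0000u
#define SEL_SLOTS       129   /* assumption: slot count from the test plan */

/* Hypothetical 0-based scheme: record ID == slot index. */
uint16_t next_recid_zero_based(int idx)
{
    assert(idx >= 0 && idx < SEL_SLOTS);
    return (uint16_t)((idx + 1) % SEL_SLOTS);
}

/* 1-based scheme: record ID == slot index + 1. */
uint16_t next_recid_one_based(int idx)
{
    assert(idx >= 0 && idx < SEL_SLOTS);
    return (uint16_t)((idx + 1) % SEL_SLOTS + 1);
}

/* Nonzero when a client would read the value as "give me the first record". */
int aliases_first(uint16_t next_rec_id)
{
    return next_rec_id == SEL_RECID_FIRST;
}
```

At the wrap point (index 128) the 0-based scheme yields next_rec_id = 0, which the client re-issues as a request for the first record; the 1-based scheme yields 1, an ordinary record ID.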

Test Plan

Before:

# Initial: have 129 SEL records
[root@fboss ~]# ipmitool sel list | wc -l
129

# Bug: ID "1" is duplicated
[root@fboss ~]# ipmitool sel list | head -4
   1 | 05/14/2026 | 20:40:26 | Processor #0x40 | Uncorrectable machine check exception | Asserted
   1 | 05/14/2026 | 20:40:26 | Processor #0x40 | Uncorrectable machine check exception | Asserted
   2 | 05/14/2026 | 20:42:44 | Processor #0x40 | Uncorrectable machine check exception | Asserted
   3 | 05/14/2026 | 20:45:02 | Processor #0x40 | Uncorrectable machine check exception | Asserted
[root@fboss ~]# ipmitool sel list | tail -3
  7e | 05/28/2026 | 05:01:09 | Temperature #0x17 |  | Asserted
  7f | 05/28/2026 | 05:03:36 | Temperature #0x17 |  | Asserted
  80 | 05/28/2026 | 05:03:50 | Temperature #0x17 |  | Asserted

# Then: add 1 more record
[root@fboss ~]# ipmitool raw 0x0a 0x44 0x01 0x00 0x02 0xab 0xcd 0xef 0x00 0x01 0x00 0x04 0x01 0x17 0x00 0xa0 0x04 0x07
 00 00

# Bug: the new entry is not shown. The only output is the pre-existing record with ID "2"
[root@fboss ~]# ipmitool sel list | wc -l
1
[root@fboss ~]# ipmitool sel list
   2 | 05/14/2026 | 20:42:44 | Processor #0x40 | Uncorrectable machine check exception | Asserted

After (BMC flashed with the new image):

# Correctly report maximum of 128 SEL records
[root@fboss ~]# ipmitool sel list | wc -l
128

# The order is correct (although the record with the incorrect ID "0", written by the previous image, is still shown)
[root@fboss ~]# ipmitool sel list | head -3
   2 | 05/14/2026 | 20:42:44 | Processor #0x40 | Uncorrectable machine check exception | Asserted
   3 | 05/14/2026 | 20:45:02 | Processor #0x40 | Uncorrectable machine check exception | Asserted
   4 | 05/14/2026 | 20:47:20 | Processor #0x40 | Uncorrectable machine check exception | Asserted
[root@fboss ~]# ipmitool sel list | tail -3
  7f | 05/28/2026 | 05:03:36 | Temperature #0x17 |  | Asserted
  80 | 05/28/2026 | 05:03:50 | Temperature #0x17 |  | Asserted
   0 | 05/28/2026 | 05:05:29 | Temperature #0x17 |  | Asserted

# Inject 129 more entries to overflow the log again
[root@fboss ~]# for i in {1..129}; do \
    ipmitool raw 0x0a 0x44 0x01 0x00 0x02 0xab 0xcd 0xef 0x00 0x01 0x00 0x04 0x01 0x17 0x00 0xa0 0x04 0x07; \
    sleep 0.5s; \
    done;

# Report correct number of entries (128 max)
[root@fboss ~]# ipmitool sel list | wc -l
128

# Order looks correct (it's expected that Record ID 2 is evicted; array has 129 slots but 128 entries max)
[root@fboss ~]# ipmitool sel list | head -2
   3 | 05/28/2026 | 05:48:15 | Temperature #0x17 |  | Asserted
   4 | 05/28/2026 | 05:48:16 | Temperature #0x17 |  | Asserted
[root@fboss ~]# ipmitool sel list | tail -2
  81 | 05/28/2026 | 05:49:21 | Temperature #0x17 |  | Asserted
   1 | 05/28/2026 | 05:50:30 | Temperature #0x17 |  | Asserted

Signed-off-by: Louis Maliyam <louis@nexthop.ai>
@meta-cla meta-cla Bot added the CLA Signed label Jun 11, 2026
@meta-codesync

meta-codesync Bot commented Jun 11, 2026

This pull request has been imported. If you are a Meta employee, you can view this in D108252417. (Because this pull request was imported automatically, there will not be any future comments.)

Development

Successfully merging this pull request may close these issues.

IPMI: after SEL overflow, ipmitool sel list collapsed to a single row