ibmveth: Disable GSO for packets with small MSS by Ryan-Chen74 · Pull Request #3 · SUSE/kernel

Ryan-Chen74 · 2026-06-09T15:06:46Z

{"bug": "214307", "email": "rchen1@linux.ibm.com", "label": "backport"}

Some physical adapters on Power systems do not support segmentation offload when the MSS is less than 224 bytes. Attempting to send such packets causes the adapter to freeze, stopping all traffic until manually reset. Implement ndo_features_check to disable GSO for packets with small MSS values. The network stack will perform software segmentation instead. The 224-byte minimum matches ibmvnic commit <f10b09ef687f> ("ibmvnic: Enforce stronger sanity checks on GSO packets") which uses the same physical adapters in SEA configurations. The issue occurs specifically when the hardware attempts to perform segmentation (gso_segs > 1) with a small MSS. Single-segment GSO packets (gso_segs == 1) do not trigger the problematic LSO code path and are transmitted normally without segmentation. Add an ndo_features_check callback to disable GSO when MSS < 224 bytes. Also call vlan_features_check() to ensure proper handling of VLAN packets, particularly QinQ (802.1ad) configurations where the hardware parser may not support certain offload features. Validated using iptables to force small MSS values. Without the fix, the adapter freezes. With the fix, packets are segmented in software and transmission succeeds. Comprehensive regression testing completedd (MSS tests, performance, stability). Fixes: 8641dd8 ("ibmveth: Add support for TSO") Cc: stable@vger.kernel.org Reviewed-by: Brian King <bjking1@linux.ibm.com> Tested-by: Shaik Abdulla <shaik.abdulla1@ibm.com> Tested-by: Naveed Ahmed <naveedaus@in.ibm.com> Signed-off-by: Mingming Cao <mmc@linux.ibm.com> Link: https://patch.msgid.link/20260424162917.65725-1-mmc@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>

…equeue_peeked [ Upstream commit 1b9bc71 ] When sfb has children (eg qfq qdisc) whose peek() callback is qdisc_peek_dequeued(), we could get a kernel panic. When the parent of such qdiscs (eg illustrated in patch #3 as tbf) wants to retrieve an skb from its child (sfb in this case), it will do the following: 1a. do a peek() - and when sensing there's an skb the child can offer, then - the child in this case(sfb) calls its child's (qfq) peek. qfq does the right thing and will return the gso_skb queue packet. Note: if there wasnt a gso_skb entry then qfq will store it there. 1b. invoke a dequeue() on the child (sfb). And herein lies the problem. - sfb will call the child's dequeue() which will essentially just try to grab something of qfq's queue. [ 127.594489][ T453] KASAN: null-ptr-deref in range [0x0000000000000048-0x000000000000004f] [ 127.594741][ T453] CPU: 2 UID: 0 PID: 453 Comm: ping Not tainted 7.1.0-rc1-00035-gac961974495b-dirty #793 PREEMPT(full) [ 127.595059][ T453] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 127.595254][ T453] RIP: 0010:qfq_dequeue+0x35c/0x1650 [sch_qfq] [ 127.595461][ T453] Code: 00 fc ff df 80 3c 02 00 0f 85 17 0e 00 00 4c 8d 73 48 48 89 9d b8 02 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f 85 76 0c 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b [ 127.596081][ T453] RSP: 0018:ffff88810e5af440 EFLAGS: 00010216 [ 127.596337][ T453] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: dffffc0000000000 [ 127.596623][ T453] RDX: 0000000000000009 RSI: 0000001880000000 RDI: ffff888104fd82b0 [ 127.596917][ T453] RBP: ffff888104fd8000 R08: ffff888104fd8280 R09: 1ffff110211893a3 [ 127.597165][ T453] R10: 1ffff110211893a6 R11: 1ffff110211893a7 R12: 0000001880000000 [ 127.597404][ T453] R13: ffff888104fd82b8 R14: 0000000000000048 R15: 0000000040000000 [ 127.597644][ T453] FS: 00007fc380cbfc40(0000) GS:ffff88816f2a8000(0000) knlGS:0000000000000000 [ 127.597956][ T453] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 127.598160][ T453] CR2: 00005610aa9890a8 CR3: 000000010369e000 CR4: 0000000000750ef0 [ 127.598390][ T453] PKRU: 55555554 [ 127.598509][ T453] Call Trace: [ 127.598629][ T453] <TASK> [ 127.598718][ T453] ? mark_held_locks+0x40/0x70 [ 127.598890][ T453] ? srso_alias_return_thunk+0x5/0xfbef5 [ 127.599053][ T453] sfb_dequeue+0x88/0x4d0 [ 127.599174][ T453] ? ktime_get+0x137/0x230 [ 127.599328][ T453] ? srso_alias_return_thunk+0x5/0xfbef5 [ 127.599480][ T453] ? qdisc_peek_dequeued+0x7b/0x350 [sch_qfq] [ 127.599670][ T453] ? srso_alias_return_thunk+0x5/0xfbef5 [ 127.599831][ T453] tbf_dequeue+0x6b1/0x1098 [sch_tbf] [ 127.599988][ T453] __qdisc_run+0x169/0x1900 The right thing to do in #1b is to grab the skb off gso_skb queue. This patchset fixes that issue by changing #1b to use qdisc_dequeue_peeked() method instead. Fixes: e13e02a ("net_sched: SFB flow scheduler") Signed-off-by: Victor Nogueria <victor@mojatatu.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260430152957.194015-3-jhs@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit 6d34594 ] Since the start of the git history, brport_store() always acquired the bridge lock. Back then this decision made sense: The bridge lock protects the STP state of the bridge and its ports and at that time the function was only used by two STP related attributes (cost and priority). Nowadays, brport_store() processes a lot more attributes and most of them do not need the bridge lock: * Bridge flags: Only require RTNL. Read locklessly by the data path. Annotations can be added in net-next. * FDB port flushing: Only requires the FDB lock. * Multicast attributes: Only require the multicast lock. * Group forward mask: Only requires RTNL. Read locklessly by the data path. Annotations can be added in net-next. * Backup port: Only requires RTNL. Read locklessly by the data path. This is a problem as the bridge calls dev_set_promiscuity() when certain bridge port flags change and this function can sleep since the commit cited below, resulting in a splat such as [1]. Fix this by reducing the scope of the bridge lock and only take it when processing the two STP related attributes that require it. Remove the now stale comment from br_switchdev_set_port_flag(). The SWITCHDEV_F_DEFER flag can be removed in net-next. [1] BUG: sleeping function called from invalid context at net/core/dev_addr_lists.c:1262 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 372, name: bash preempt_count: 201, expected: 0 RCU nest depth: 0, expected: 0 5 locks held by bash/372: #0: ffff88810c51c3f0 (sb_writers#7){.+.+}-{0:0}, at: ksys_write (fs/read_write.c:740) #1: ffff888115ce9480 (&of->mutex){+.+.}-{4:4}, at: kernfs_fop_write_iter (fs/kernfs/file.c:343) #2: ffff88810b9fd330 (kn->active#37){.+.+}-{0:0}, at: kernfs_fop_write_iter (fs/kernfs/file.c:80 fs/kernfs/file.c:344) #3: ffffffffa59473a0 (rtnl_mutex){+.+.}-{4:4}, at: brport_store (net/bridge/br_sysfs_if.c:326) #4: ffff8881099d2d58 (&br->lock){+...}-{3:3}, at: brport_store (./include/linux/spinlock.h:348 net/bridge/br_sysfs_if.c:345) Preemption disabled at: 0x0 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:94 lib/dump_stack.c:120) __might_resched.cold (kernel/sched/core.c:9163) netif_rx_mode_run (net/core/dev_addr_lists.c:1262) netif_rx_mode_sync (net/core/dev_addr_lists.c:1428) dev_set_promiscuity (net/core/dev_api.c:289) br_manage_promisc (net/bridge/br_if.c:135 net/bridge/br_if.c:172) br_port_flags_change (net/bridge/br_if.c:242 net/bridge/br_if.c:747) store_learning (net/bridge/br_sysfs_if.c:79 net/bridge/br_sysfs_if.c:235) brport_store (net/bridge/br_sysfs_if.c:346) kernfs_fop_write_iter (fs/kernfs/file.c:352) new_sync_write (fs/read_write.c:595) vfs_write (fs/read_write.c:688) ksys_write (fs/read_write.c:740) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121) Fixes: 78cd408 ("net: add missing instance lock to dev_set_promiscuity") Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260526064818.272516-3-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

…equeue_peeked [ Upstream commit 1b9bc71 ] When sfb has children (eg qfq qdisc) whose peek() callback is qdisc_peek_dequeued(), we could get a kernel panic. When the parent of such qdiscs (eg illustrated in patch #3 as tbf) wants to retrieve an skb from its child (sfb in this case), it will do the following: 1a. do a peek() - and when sensing there's an skb the child can offer, then - the child in this case(sfb) calls its child's (qfq) peek. qfq does the right thing and will return the gso_skb queue packet. Note: if there wasnt a gso_skb entry then qfq will store it there. 1b. invoke a dequeue() on the child (sfb). And herein lies the problem. - sfb will call the child's dequeue() which will essentially just try to grab something of qfq's queue. [ 127.594489][ T453] KASAN: null-ptr-deref in range [0x0000000000000048-0x000000000000004f] [ 127.594741][ T453] CPU: 2 UID: 0 PID: 453 Comm: ping Not tainted 7.1.0-rc1-00035-gac961974495b-dirty #793 PREEMPT(full) [ 127.595059][ T453] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 127.595254][ T453] RIP: 0010:qfq_dequeue+0x35c/0x1650 [sch_qfq] [ 127.595461][ T453] Code: 00 fc ff df 80 3c 02 00 0f 85 17 0e 00 00 4c 8d 73 48 48 89 9d b8 02 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f 85 76 0c 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b [ 127.596081][ T453] RSP: 0018:ffff88810e5af440 EFLAGS: 00010216 [ 127.596337][ T453] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: dffffc0000000000 [ 127.596623][ T453] RDX: 0000000000000009 RSI: 0000001880000000 RDI: ffff888104fd82b0 [ 127.596917][ T453] RBP: ffff888104fd8000 R08: ffff888104fd8280 R09: 1ffff110211893a3 [ 127.597165][ T453] R10: 1ffff110211893a6 R11: 1ffff110211893a7 R12: 0000001880000000 [ 127.597404][ T453] R13: ffff888104fd82b8 R14: 0000000000000048 R15: 0000000040000000 [ 127.597644][ T453] FS: 00007fc380cbfc40(0000) GS:ffff88816f2a8000(0000) knlGS:0000000000000000 [ 127.597956][ T453] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 127.598160][ T453] CR2: 00005610aa9890a8 CR3: 000000010369e000 CR4: 0000000000750ef0 [ 127.598390][ T453] PKRU: 55555554 [ 127.598509][ T453] Call Trace: [ 127.598629][ T453] <TASK> [ 127.598718][ T453] ? mark_held_locks+0x40/0x70 [ 127.598890][ T453] ? srso_alias_return_thunk+0x5/0xfbef5 [ 127.599053][ T453] sfb_dequeue+0x88/0x4d0 [ 127.599174][ T453] ? ktime_get+0x137/0x230 [ 127.599328][ T453] ? srso_alias_return_thunk+0x5/0xfbef5 [ 127.599480][ T453] ? qdisc_peek_dequeued+0x7b/0x350 [sch_qfq] [ 127.599670][ T453] ? srso_alias_return_thunk+0x5/0xfbef5 [ 127.599831][ T453] tbf_dequeue+0x6b1/0x1098 [sch_tbf] [ 127.599988][ T453] __qdisc_run+0x169/0x1900 The right thing to do in #1b is to grab the skb off gso_skb queue. This patchset fixes that issue by changing #1b to use qdisc_dequeue_peeked() method instead. Fixes: e13e02a ("net_sched: SFB flow scheduler") Signed-off-by: Victor Nogueria <victor@mojatatu.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260430152957.194015-3-jhs@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit 6d34594 ] Since the start of the git history, brport_store() always acquired the bridge lock. Back then this decision made sense: The bridge lock protects the STP state of the bridge and its ports and at that time the function was only used by two STP related attributes (cost and priority). Nowadays, brport_store() processes a lot more attributes and most of them do not need the bridge lock: * Bridge flags: Only require RTNL. Read locklessly by the data path. Annotations can be added in net-next. * FDB port flushing: Only requires the FDB lock. * Multicast attributes: Only require the multicast lock. * Group forward mask: Only requires RTNL. Read locklessly by the data path. Annotations can be added in net-next. * Backup port: Only requires RTNL. Read locklessly by the data path. This is a problem as the bridge calls dev_set_promiscuity() when certain bridge port flags change and this function can sleep since the commit cited below, resulting in a splat such as [1]. Fix this by reducing the scope of the bridge lock and only take it when processing the two STP related attributes that require it. Remove the now stale comment from br_switchdev_set_port_flag(). The SWITCHDEV_F_DEFER flag can be removed in net-next. [1] BUG: sleeping function called from invalid context at net/core/dev_addr_lists.c:1262 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 372, name: bash preempt_count: 201, expected: 0 RCU nest depth: 0, expected: 0 5 locks held by bash/372: #0: ffff88810c51c3f0 (sb_writers#7){.+.+}-{0:0}, at: ksys_write (fs/read_write.c:740) #1: ffff888115ce9480 (&of->mutex){+.+.}-{4:4}, at: kernfs_fop_write_iter (fs/kernfs/file.c:343) #2: ffff88810b9fd330 (kn->active#37){.+.+}-{0:0}, at: kernfs_fop_write_iter (fs/kernfs/file.c:80 fs/kernfs/file.c:344) #3: ffffffffa59473a0 (rtnl_mutex){+.+.}-{4:4}, at: brport_store (net/bridge/br_sysfs_if.c:326) #4: ffff8881099d2d58 (&br->lock){+...}-{3:3}, at: brport_store (./include/linux/spinlock.h:348 net/bridge/br_sysfs_if.c:345) Preemption disabled at: 0x0 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:94 lib/dump_stack.c:120) __might_resched.cold (kernel/sched/core.c:9163) netif_rx_mode_run (net/core/dev_addr_lists.c:1262) netif_rx_mode_sync (net/core/dev_addr_lists.c:1428) dev_set_promiscuity (net/core/dev_api.c:289) br_manage_promisc (net/bridge/br_if.c:135 net/bridge/br_if.c:172) br_port_flags_change (net/bridge/br_if.c:242 net/bridge/br_if.c:747) store_learning (net/bridge/br_sysfs_if.c:79 net/bridge/br_sysfs_if.c:235) brport_store (net/bridge/br_sysfs_if.c:346) kernfs_fop_write_iter (fs/kernfs/file.c:352) new_sync_write (fs/read_write.c:595) vfs_write (fs/read_write.c:688) ksys_write (fs/read_write.c:740) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121) Fixes: 78cd408 ("net: add missing instance lock to dev_set_promiscuity") Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260526064818.272516-3-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ibmveth: Disable GSO for packets with small MSS#3

ibmveth: Disable GSO for packets with small MSS#3
Ryan-Chen74 wants to merge 1 commit into
SUSE:SLE15-SP7from
Ryan-Chen74:SLE15-SP7

Ryan-Chen74 commented Jun 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Ryan-Chen74 commented Jun 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant