Add null checks for iface before accessing index in mpnh_compare_node by matteyeux · Pull Request #117 · projectcalico/bird

matteyeux · 2026-01-26T13:39:38Z

Description

This PR fixes #116, a segfault in mpnh_compare_node, by validating that x->iface and y->iface are not null before accessing their index field.

Null interfaces are treated as greater than non-null ones, consistent with the existing null handling pattern for x and y nodes.
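For readers without the diff at hand, the ordering described above can be sketched roughly as follows. This is a minimal illustration, not the patch itself: `struct iface_sketch`, `struct mpnh_sketch`, and `compare_node_sketch` are hypothetical stand-ins, the other comparison keys of the real function are omitted, and returning 0 when both sides are null is an assumption of this sketch.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for BIRD's types; the real
 * structures carry more fields than shown here. */
struct iface_sketch {
  unsigned index;
};

struct mpnh_sketch {
  struct iface_sketch *iface;   /* NULL in the crashing case */
};

/* Three-way comparison of two unsigned values without overflow. */
static int
cmp_uint(unsigned a, unsigned b)
{
  return (a > b) - (a < b);
}

/* Orders nexthops by interface index only (other keys omitted).
 * A null node or null iface sorts after a non-null one; two nulls
 * compare equal (an assumption of this sketch). */
int
compare_node_sketch(const struct mpnh_sketch *x, const struct mpnh_sketch *y)
{
  /* Null node: greater than a non-null node. */
  if (!x || !y)
    return (x == NULL) - (y == NULL);

  /* Same pattern for the interface pointer, checked before any
   * dereference of ->index. */
  if (!x->iface || !y->iface)
    return (x->iface == NULL) - (y->iface == NULL);

  return cmp_uint(x->iface->index, y->iface->index);
}
```

The point of the guard is only that `->index` is never read through a null pointer; whether a nexthop without an interface should exist at all is a separate question.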

Tested on the machine that was crashing. After replacing the binary in the running pod with the patched one from my branch, the segfault no longer occurs (previously it was crashing every second in my pod).

CLAassistant · 2026-01-26T13:39:45Z

All committers have signed the CLA.

nest/rt-attr.c

nelljerram · 2026-01-27T17:38:55Z

/sem-approve

nelljerram · 2026-01-27T17:40:37Z

Looks like we also need to update the Semaphore machine type. I'll do that in a separate PR.

nelljerram · 2026-01-28T15:28:29Z

@matteyeux Please merge master in order to pick up CI fixes from #118

matteyeux · 2026-01-28T15:48:45Z

@nelljerram master seems out of date. Do you mean feature-ipinip? It seems to be the main branch.

nelljerram · 2026-01-28T16:07:34Z

I'm so sorry. Yes, I meant feature-ipinip.

nelljerram · 2026-01-28T16:13:32Z

/sem-approve

matteyeux · 2026-01-28T16:19:10Z

Thanks, CI is green

nelljerram · 2026-01-28T16:26:29Z

@matteyeux I'm sorry to ask another question - but do you know if there is something unusual about your clusters that hits this problem? I've just checked the upstream BIRD code - even the latest 3.2.0 release - and it does not appear to contain a fix like this, which feels very surprising given how widely used BIRD is. Have you also proposed this fix to upstream? It's possible that in the nearish future we might revert to using upstream BIRD instead of our own fork, because I believe we now have different solutions for the scenarios (mainly IP-in-IP) that prompted us to fork originally. Hence it would make sense to propose this fix upstream as well, to eliminate the possibility of your own clusters regressing when we switch to using upstream.

nelljerram · 2026-01-28T16:29:29Z

FTR, in more recent BIRD code mpnh_compare_node has become nexthop_compare_node.

matteyeux · 2026-01-28T16:49:50Z

I do not know what is wrong with my cluster to cause these segfaults. Some of my nodes segfault and others don't (some are k3s workers, others are k3s masters). I also tested on a virtual machine (in the same network as the other nodes) and saw the same behaviour.

Disabling IPv6 on my nodes made the segfault go away, but I also wanted to fix the crash itself.

Thanks for pointing to the new name of the function, I will propose a patch to the upstream BIRD project.

nelljerram · 2026-02-12T14:47:04Z

Many thanks @matteyeux. Given that we're expecting to retire this fork in the next year, we've decided to transition to an "upstream first" policy in advance of that, so please do let us know when this fix has been accepted upstream and then we'll pick it up here too.

marenamat · 2026-02-17T16:50:56Z

Hello, Maria from BIRD here, we don't have this fix basically because it makes no sense to us to have a nexthop without an interface. If such a thing happens, the nexthop is already semantically invalid.

Or maybe I'm wrong, in which case please tell me how such a nexthop is expected to be handled by Netlink.

nelljerram · 2026-02-20T09:28:42Z

Thanks @marenamat for your take on this. I agree that it's important first to understand why this occurs, before applying an apparent fix.

@matteyeux If this segfault is easily reproducible in your setup, I think you will need to add more logging to the code to make progress on understanding that.

matteyeux · 2026-02-23T10:40:08Z

Hello,

My first intent was to avoid the NULL dereference, as I am not sure why or how this happens. I'll continue to investigate to get the full picture.

marenamat · 2026-02-23T12:02:32Z

Your original patch may help you show the offending route table contents, anyway. I would suggest checking the code which generates the nexthop for that offending route. There may be something like wrongly marked route destination, or an omission in some code branch where the nexthop would be otherwise set.

It may also be a deeper problem of some invariant violation, causing this one. I have not studied your data structures, and thus I can't point at the exact location.
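As one possible shape for the extra logging suggested in this thread, the sketch below walks a nexthop list and reports every entry that has no interface. It is only an illustration under stated assumptions: `struct nexthop_sketch`, `struct iface_sketch`, and `report_ifaceless_nexthops` are hypothetical names, the list layout is simplified, and plain `fprintf` stands in for whatever logging facility the daemon actually uses.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for the real data structures. */
struct iface_sketch {
  unsigned index;
};

struct nexthop_sketch {
  struct nexthop_sketch *next;  /* singly linked list of nexthops */
  struct iface_sketch *iface;   /* NULL marks the suspicious entry */
};

/* Walks the list, writes one line to `out` for each nexthop without
 * an interface, and returns how many such nexthops were found.
 * `route_label` is free-form text identifying the route in the log. */
size_t
report_ifaceless_nexthops(const struct nexthop_sketch *nh,
                          const char *route_label, FILE *out)
{
  size_t pos = 0, bad = 0;

  for (; nh; nh = nh->next, pos++)
    if (!nh->iface)
      {
        fprintf(out, "route %s: nexthop #%zu has no interface\n",
                route_label, pos);
        bad++;
      }

  return bad;
}
```

Calling a helper like this at the point where the nexthop list is built, rather than where it is compared, would help narrow down which code path produces the entry without an interface.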

Add null checks for iface before accessing index in mpnh_compare_node

1887a21

marvin-tigera added the release-note-required and docs-pr-required labels on Jan 26, 2026

nelljerram reviewed Jan 27, 2026

Merge remote-tracking branch 'upstream/feature-ipinip' into feature-ipinip

35418d8

5 participants
