Use a repr(C) flow ID to reduce repeated per-packet conversion costs#475
Merged
FelixMcFelix merged 20 commits intomasterfrom May 10, 2024
Merged
Use a repr(C) flow ID to reduce repeated per-packet conversion costs#475FelixMcFelix merged 20 commits intomasterfrom
repr(C) flow ID to reduce repeated per-packet conversion costs#475FelixMcFelix merged 20 commits intomasterfrom
Conversation
`uintptr_t` maybe be easy to stick in a probe, but it is also a footgun. This fixdes the display for e.g. port-process-return, but does not affect the actual panics mentioned in #476.
It appears that JMP'd dtrace probes are being rewritten at load time without a RTN. Near as I can tell, rustc offers no way to say 'don't tail-call optimise this thing', so I'm forcing in a ZST with a drop impl to achieve similar behaviour. Hopefully this suffices.
Collaborator
Author
|
Testing on sn9/14 over a single 100GbE link (
Not sure whether/if we'd expect more improvement with |
FelixMcFelix
commented
Apr 25, 2024
lib/opte/src/engine/flow_table.rs
Outdated
| last_hit: Option<Moment>, | ||
| now: Option<Moment>, | ||
| ) { | ||
| let a = crate::NopDrop; |
Collaborator
Author
There was a problem hiding this comment.
All NopDrop code will be removed once illumos#16480 is merged in and available on the helios-2.0 lab image.
FelixMcFelix
commented
Apr 25, 2024
| } | ||
|
|
||
| #[derive(Debug)] | ||
| // TODO: Represent `name` as joint C+RStr to implement fully. |
Collaborator
Author
There was a problem hiding this comment.
This would fall under the work remaining for #460.
FelixMcFelix
commented
Apr 25, 2024
mkeeter
reviewed
Apr 25, 2024
mkeeter
reviewed
Apr 25, 2024
luqmana
reviewed
Apr 25, 2024
Realised when I was out that this struct only has 2B alignment, so there was a possibility of passing up an unaligned reference to an SDT.
Given that we're now using these to handle nested `Ok(...)` types rather than just error variants, a rename was a bit in order. `DError` might also benefit?
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR replaces
InnerFlowIdwith an equivalent C-ABI friendly struct, from which we can directly pass pointers to our dtrace SDTs. This has proven necessary as we are reconstructing this many times per packet in both the UFT slowpath and fastpath.Unfortunately, this has unearthed #476 for which we have a temporary hack,
NopDrop, which uses ablack_box'dDropimplementation to prevent problem SDTs from being tail-called into.Additionally, this makes partial progress on #460 by using the new
DErrormachinery whenever a packet isOk(Modified | Hairpin | Bypass)to elide string formats on layer/port processing. TheErr(_)andOk(Drop{ .. })cases remain open work, but are less common and in those cases we have at least removed a superfluous realloc + memcpy on the formatted string.From local IGB<->IGB results and comparing 4e2ad7e against master in CI, this seems like O(10%) reduction in packet latency spent in XDE. Certainly less of a total improvement once we account for the rest of MAC, and throughput impact on T6s is TBD.
Closes #462.