feat: Rate limiting in router#3951

Open
rklaehn wants to merge 15 commits into main from rate-limiting-in-router
Conversation

@rklaehn
Contributor

@rklaehn rklaehn commented Feb 17, 2026

Description

Add an optional mechanism for rate limiting and rejecting requests in the iroh router.

The reason to have something like this is if we ever have a single iroh endpoint exposed to the world that can be overloaded or hit by a DoS attack, e.g. a public irpc service, n0des, ...

Unfortunately this has quite some API surface: a trait for the callbacks, and enums for the fn returns.

I also added an example that hammers a server with lots of clients and measures the CPU load. Here is the output of the latest version:

Simplified the sample. The hammering tool now lives in a separate repo, since it got a bit too big for an example.

Connection filter benchmark, 100 attempts per filter

Direct connections:

             no filter:   0 accepted,   0 rejected, 100 closed  (client: 40.63ms | server: 97.30ms cpu,     1028 ops/s, 41.00ms wall)
           ignore addr:   0 accepted, 100 rejected,   0 closed  (client:   5.01s | server:  3.46ms cpu,    28868 ops/s, 10.53ms wall)
           reject addr:   0 accepted, 100 rejected,   0 closed  (client:  8.04ms | server:  5.61ms cpu,    17835 ops/s, 10.08ms wall)
        retry + reject:   0 accepted, 100 rejected,   0 closed  (client: 37.18ms | server: 12.77ms cpu,     7833 ops/s, 39.96ms wall)
           reject alpn:   0 accepted,   0 rejected, 100 closed  (client: 31.34ms | server: 75.25ms cpu,     1329 ops/s, 33.84ms wall)

Relay connections (https://127.0.0.1:53395/):

             no filter:   0 accepted,   0 rejected, 100 closed  (client:   4.10s | server: 174.75ms cpu,      572 ops/s,   4.09s wall)
    reject endpoint id:   0 accepted, 100 rejected,   0 closed  (client:   4.04s | server: 21.50ms cpu,     4652 ops/s,   4.04s wall)
           reject alpn:   0 accepted,   0 rejected, 100 closed  (client:   4.08s | server: 88.07ms cpu,     1135 ops/s,   4.08s wall)

ignore is fastest, reject just a tad slower. reject alpn doesn't buy much over just closing the connection. It is just a convenient place to throttle or reject by alpn.

For relay connections, the cheapest is to reject by endpoint id. This is quite a bit faster than closing the connection since we get the endpoint id early. Rejecting by alpn in this case helps a bit more.

The example starts a server in a subprocess and then hammers it with n connections. The subprocess is measuring its own CPU time.
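
To illustrate the kind of decision a filter callback could make, here is a minimal token-bucket sketch keyed by endpoint id. It is not the API added in this PR: `EndpointKey`, `FilterOutcome` and `RateLimiter` are hypothetical names invented for this example, and the key is a plain byte array rather than iroh's endpoint id type.

```rust
use std::collections::HashMap;
use std::time::Instant;

/// Hypothetical stand-in for a 32-byte endpoint id (not iroh's own type).
pub type EndpointKey = [u8; 32];

/// Hypothetical outcome type, loosely modelled on the accept/reject
/// decisions in the benchmark above (not the enum from this PR).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FilterOutcome {
    Accept,
    Reject,
}

/// One token bucket per key: a key may burst up to `capacity` requests
/// and regains `refill_per_sec` tokens per second.
pub struct RateLimiter {
    capacity: f64,
    refill_per_sec: f64,
    buckets: HashMap<EndpointKey, (f64, Instant)>,
}

impl RateLimiter {
    pub fn new(capacity: u32, refill_per_sec: f64) -> Self {
        Self {
            capacity: capacity as f64,
            refill_per_sec,
            buckets: HashMap::new(),
        }
    }

    /// Decide for a request from `key` arriving at `now`.
    /// Passing `now` in keeps the logic deterministic and easy to test.
    pub fn check(&mut self, key: EndpointKey, now: Instant) -> FilterOutcome {
        let (tokens, last) = self.buckets.entry(key).or_insert((self.capacity, now));
        // Refill according to the time elapsed since the last request.
        let elapsed = now.saturating_duration_since(*last).as_secs_f64();
        *tokens = (*tokens + elapsed * self.refill_per_sec).min(self.capacity);
        *last = now;
        if *tokens >= 1.0 {
            *tokens -= 1.0;
            FilterOutcome::Accept
        } else {
            FilterOutcome::Reject
        }
    }
}
```

A limiter like this would typically sit behind a mutex and be consulted from whichever filter stage has the endpoint id available. The map grows with the number of distinct keys, so a real deployment would also need to evict idle buckets.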

Breaking Changes

None

Notes & open questions

Change checklist

  • Self-review.
  • Documentation updates following the style guide, if relevant.
  • Tests if relevant.
  • All breaking changes documented.
    • List all breaking changes in the above "Breaking Changes" section.
    • Open an issue or PR on any number0 repos that are affected by this breaking change. Give guidance on how the updates should be handled or do the actual updates themselves. The major ones are:

@github-actions

github-actions bot commented Feb 17, 2026

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/3951/docs/iroh/

Last updated: 2026-04-13T15:16:27Z

@n0bot n0bot bot added this to iroh Feb 17, 2026
@github-project-automation github-project-automation bot moved this to 🚑 Needs Triage in iroh Feb 17, 2026
@rklaehn rklaehn self-assigned this Feb 18, 2026
@rklaehn rklaehn requested a review from Arqu February 18, 2026 10:34
@github-actions

github-actions bot commented Feb 19, 2026

Netsim report & logs for this PR have been generated and are available at: LOGS
This report will remain available for 3 days.

Last updated for commit: c4e751d

);

// ALPN filter: runs after the connection is established for both
// relay and direct connections.
Member

Maybe the EndpointId filter should be run here for IncomingAddr::Ip now that the endpoint id is known for ip connections as well? If not, it should be clearly documented that the endpoint id filter is not run for ip connections. I think it would be simpler if we'd run it here, so that you can impl just that filter if all you want is to filter by endpoint id.

Contributor Author

Maybe just naming? Call the other relay filter?

This is the one that is going to be called for every request, the one that takes just endpoint id is the fastest way to intercept a relay request.

You can of course use this one to filter or rate limit by endpoint id.

Member

Also, it just now occurs to me that maybe all this filtering could be done through EndpointHooks with some new hooks instead of adding it as a new concept to the Router?

Copy link
Copy Markdown

Yes, we need more hooks. I am using irpc-iroh and I need a hook to validate a JWT token on each request.

Contributor Author

> Also, it just now occurs to me that maybe all this filtering could be done through EndpointHooks with some new hooks instead of adding it as a new concept to the Router?

I will take a look, but I am not sure about this. We already have all the stages of the state machine exposed with the Incoming -> Accepting -> Connection, with various ways to retry and reject. A state machine is vastly superior to a hook if you want maximum flexibility. And if you want something predefined and more opinionated, that's the router.

Copy link
Copy Markdown

I am using irpc-iroh with the router. I want to verify and decode a JWT token on each method request (not each connection request) and pass that data (user id, endpoint id, user role, etc.) to the RPC method for request authorization. Is this somehow possible?

Contributor Author

So the JWT token comes with each request, or on a separate ALPN?

The latter is already possible. You could have an auth ALPN where you just send the JWT token, then have a map from endpoint id to data that you share between the auth ALPN and the service ALPN.

But I am not sure what exactly you want to do.

Copy link
Copy Markdown

I want to authorize each RPC method request and pass the decoded user info (id, role) to the RPC method, so the JWT token comes with each request. Is there a better method?

Contributor Author

I think it is preferable to do this separately. Otherwise you will have to extend every single protocol with auth.

Basically, you have a server side state that keeps track of endpoints and their data. Then you do a request using ALPN "auth" or whatever that includes the JWT token. At this point you update the shared state and store the JWT data for this endpoint (indefinitely or for some time).

Then in the actual RPC handler you don't need any extra bytes per request. You just check if the request is allowed as soon as you have the endpoint id, otherwise fail.

This is safe because the endpoint id cannot be forged unless you have exposed the private key or Ed25519 is broken.

If you store the credentials only for a limited time, you need logic on the client side to re-send a token, either proactively or when an RPC request fails.

But this is a matter of taste really. I would find having a JWT on each possibly tiny RPC request not just cumbersome but also inefficient.
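
The shared-state approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `AuthState`, `UserInfo` and `EndpointKey` are hypothetical names, the token is assumed to have been verified by the auth handler before `store` is called, and no iroh or irpc types are used.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

/// Hypothetical stand-in for an endpoint id (not iroh's own type).
pub type EndpointKey = [u8; 32];

/// Data decoded from a token that was already verified elsewhere.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UserInfo {
    pub user_id: u64,
    pub role: String,
}

/// State shared between the "auth" handler and the RPC handler.
/// Cloning is cheap: all clones point at the same map.
#[derive(Clone, Default)]
pub struct AuthState {
    inner: Arc<Mutex<HashMap<EndpointKey, (UserInfo, Instant)>>>,
}

impl AuthState {
    /// Called by the auth handler after the token has been validated.
    /// The entry expires `ttl` after `now`.
    pub fn store(&self, endpoint: EndpointKey, info: UserInfo, now: Instant, ttl: Duration) {
        self.inner.lock().unwrap().insert(endpoint, (info, now + ttl));
    }

    /// Called by the RPC handler once the endpoint id is known.
    /// Returns `None` if the endpoint never authenticated or its entry expired.
    pub fn lookup(&self, endpoint: &EndpointKey, now: Instant) -> Option<UserInfo> {
        let mut map = self.inner.lock().unwrap();
        let fresh = map
            .get(endpoint)
            .filter(|(_, expires)| now < *expires)
            .map(|(info, _)| info.clone());
        if fresh.is_none() {
            // Drop expired entries eagerly; removing a missing key is a no-op.
            map.remove(endpoint);
        }
        fresh
    }
}
```

With this shape, the RPC handler pays one map lookup per request instead of parsing a token, and the client only has to re-authenticate when `lookup` starts returning `None`.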

Copy link
Copy Markdown

I think the separate-ALPN solution will not work for my use case. My client app is a full-stack web app accessible on the local LAN. Users log in with their own username and password through the same client endpoint. The irpc-iroh server has a login method for user auth. The login method returns a JWT token, which is then included in every request, so I need a per-request hook or middleware.

@dignifiedquire dignifiedquire moved this from 🚑 Needs Triage to 🏗 In progress in iroh Feb 23, 2026
@rklaehn
Contributor Author

rklaehn commented Feb 24, 2026

https://github.com/n0-computer/quinn would add another place to filter incoming packets by ALPN

Collaborator

@Arqu Arqu left a comment

Looks like a solid improvement over the current state. I do agree that, given we're working against adverse conditions here, we should maybe have a check/guard against attempts at relay-like connections via IP and vice versa.

@rklaehn rklaehn force-pushed the rate-limiting-in-router branch from d1f2aee to 4cf66a9 Compare February 25, 2026 08:26
@rklaehn rklaehn force-pushed the rate-limiting-in-router branch from 0a4e3fa to 8fc9301 Compare February 27, 2026 10:13
@rklaehn rklaehn force-pushed the rate-limiting-in-router branch from d357c42 to a8188b8 Compare April 10, 2026 13:41
@rklaehn rklaehn force-pushed the rate-limiting-in-router branch from a8188b8 to ac4fd73 Compare April 10, 2026 14:00
@rklaehn rklaehn force-pushed the rate-limiting-in-router branch 2 times, most recently from d092a6b to 5ef38a9 Compare April 13, 2026 10:08
@rklaehn rklaehn marked this pull request as ready for review April 13, 2026 10:11
@rklaehn rklaehn force-pushed the rate-limiting-in-router branch from 5ef38a9 to b2b9757 Compare April 13, 2026 10:57
Contributor

@flub flub left a comment

Only reviewed the public api, not the implementation. But that looks nice to me and makes sense. I even love the docs on them!

Member

@Frando Frando left a comment

Looks good in general.

I'm wondering a bit still if this should be an endpoint hook. Or, how we intend to evolve both APIs (endpoint hooks and router hooks) - we have a similar "can be done on both levels" with access control.

But this does not have to hold up the PR. We still have some time to settle until 1.0, and I also don't have a strong opinion in either direction.

/// "validation" itself has no security meaning. However, the retry
/// still imposes a real cost on the client: an extra round trip
/// through the relay plus the work of sending a fresh ClientHello
/// with the token. This filters out spray-and-pray clients that
Member

what's a "spray and pray client"?

/// Two direct endpoints with a filtered router on the first.
///
/// Binds to IPv4 loopback only so retry-token validation works on
/// multi-homed CI hosts (tokens are tied to the source address).
Member

Does that mean retrying is broken when there are multiple interfaces?
If so that should be prominently documented, because this can not only happen in CI but also for regular endpoints, right?
If I have an EndpointAddr with two IPs that are both reachable, and the remote asks for address validation retry, and it doesn't work, this is bad. It basically would mean that you cannot use address validation reliably if your server is reachable over two IP addresses?
Maybe we should add a patchbay test to verify current behavior, and/or document the limitation clearly. I also wonder, though, if we shouldn't fix this?

Contributor Author

Yeah, retrying under some circumstances is broken. I had this happen in CI and had to make sure I only subscribe to a single interface:

https://discord.com/channels/949724860232392765/950683937661935667/1493210239116251157

If we fix it, that should not hold up this PR, since the PR only exposes existing functionality in the iroh Endpoint at the ProtocolHandler level.
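
For readers unfamiliar with why a retry token can fail on a multi-homed host, here is a toy model. QUIC retry tokens are bound to the client's source address, so a client whose retried packet leaves from a different local address presents a token that no longer validates. Whether that is the exact failure seen in CI is an assumption on my part. `issue_token` and `validate_token` are hypothetical helpers, and `DefaultHasher` is used only as a non-cryptographic stand-in for the authenticated encryption a real implementation uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::net::SocketAddr;

/// Toy token: a keyed hash over the client's observed source address.
/// NOT secure; it only models the "token is tied to the address" property.
pub fn issue_token(server_secret: u64, client_addr: SocketAddr) -> u64 {
    let mut hasher = DefaultHasher::new();
    server_secret.hash(&mut hasher);
    client_addr.hash(&mut hasher);
    hasher.finish()
}

/// The server recomputes the token for the address the retried packet
/// actually arrived from and compares it with the presented token.
pub fn validate_token(server_secret: u64, observed_addr: SocketAddr, token: u64) -> bool {
    issue_token(server_secret, observed_addr) == token
}
```

In this model a token issued for `192.0.2.1:4433` validates only when the retry arrives from that same address. If the retry goes out over a second interface, the observed address differs and validation fails, which matches the symptom discussed in this thread.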

Member

@Frando Frando left a comment

Requesting changes for the debug_assert (needs to be documented or removed IMO) and the unused governor dep

modify some tests, remove unused governor dependency.
@rklaehn rklaehn force-pushed the rate-limiting-in-router branch from 3e4e913 to 12ea8ed Compare April 13, 2026 14:46
@rklaehn
Contributor Author

rklaehn commented Apr 13, 2026

> I'm wondering a bit still if this should be an endpoint hook. Or, how we intend to evolve both APIs (endpoint hooks and router hooks) - we have a similar "can be done on both levels" with access control.

It is already more hook like than before, but I think it is fine to have this only on the Router. If you use the endpoint state machine manually you can already do all of this without a hook.

> But this does not have to hold up the PR. We still have some time to settle until 1.0, and I also don't have a strong opinion in either direction.

@rklaehn rklaehn requested a review from Frando April 13, 2026 15:13
);
}
if let Err(err) = incoming.retry() {
err.into_incoming().refuse();
Collaborator

Should we log this err too?

Collaborator

Actually we should definitely log this error, otherwise real users not following the protocol to the dot won't know why they just keep getting refused.

};

if let Some(filter) = &incoming_filter {
match filter(&incoming) {
Collaborator

What if this errors, will it break the entire loop?

Edit: thinking more about it, given we're working against adversarial setups, somebody trying to fumble this can just fuzz against the filter until it finds a panic and just kill the entire accept loop, maybe turning the whole thing into not just a flood attack but outright silent nuke.
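
One possible mitigation for the panic concern, sketched with the standard library only: wrap the user-supplied filter in `std::panic::catch_unwind` and treat a panic as a rejection, so a single bad input cannot take down the accept loop. `run_filter_guarded` and this `FilterOutcome` enum are hypothetical and not taken from the PR. The approach only works when the binary is built with the default `panic = "unwind"` strategy.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Hypothetical outcome type for this sketch (not the enum from this PR).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FilterOutcome {
    Accept,
    Reject,
}

/// Runs a user-supplied filter and converts a panic into `Reject`.
///
/// `AssertUnwindSafe` asserts that observing state after a panic is
/// acceptable here; that holds when the filter keeps no invariants that a
/// half-finished call could break, which the caller has to ensure.
pub fn run_filter_guarded<T, F>(filter: &F, incoming: &T) -> FilterOutcome
where
    F: Fn(&T) -> FilterOutcome,
{
    match catch_unwind(AssertUnwindSafe(|| filter(incoming))) {
        Ok(outcome) => outcome,
        // Fail closed: a panicking filter rejects the connection
        // instead of unwinding through the accept loop.
        Err(_) => FilterOutcome::Reject,
    }
}
```

The default panic hook still prints the panic message to stderr, which doubles as a crude log. Failing closed is a design choice: failing open would let an attacker who can trigger the panic bypass the filter entirely.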

continue;
}
IncomingFilterOutcome::Ignore => {
incoming.ignore();
Collaborator

Do we have any lingering state from these ignored ones on the noq side? Otherwise this might lead to piling up maliciously or not.

Labels

None yet

Projects

Status: 🏗 In progress

6 participants