multiple remote addresses in stats #4

@timdrysdale

The problem: in the stats topic, remote addresses are reported with multiple IPs on the new dev server instance at app.practable.io/dev, which runs on GCE behind a load balancer. This also affects relay. For example, pend18 returns

    "remote_address": "129.215.182.72, 34.117.155.39, 35.191.16.24",

Why is this happening?

In crossbar.go we extract the remote address from a header

    remoteAddr:     r.Header.Get("X-Forwarded-For"),

The three IP addresses in the example above are:
129.215.182.72 is w7545.see.ed.ac.uk, the address we want
34.117.155.39 belongs to Google (domain: 39.155.117.34.bc.googleusercontent.com)
35.191.16.24 belongs to Google (domain: 24-16-191-35.1e100.net)

The two Google addresses are added to the header by the load balancer setup.

This is consistent with the expected behaviour where the order of the IP addresses is specified as client first, then each successive proxy.

    X-Forwarded-For: <client>, <proxy1>, <proxy2>
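To make the ordering concrete, here is a minimal sketch that splits the header into its individual hops. The helper name `forwardedChain` is hypothetical and is not taken from crossbar.go; it only illustrates how the single string currently stored in the stats breaks down into client and proxy entries.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// forwardedChain (hypothetical helper) flattens one or more
// X-Forwarded-For header values into an ordered list of addresses:
// client first, then each successive proxy.
func forwardedChain(values []string) []string {
	chain := []string{}
	for _, v := range values {
		for _, part := range strings.Split(v, ",") {
			if ip := strings.TrimSpace(part); ip != "" {
				chain = append(chain, ip)
			}
		}
	}
	return chain
}

func main() {
	// Header value copied from the pend18 example in this issue.
	h := http.Header{}
	h.Set("X-Forwarded-For", "129.215.182.72, 34.117.155.39, 35.191.16.24")

	for i, ip := range forwardedChain(h.Values("X-Forwarded-For")) {
		fmt.Printf("hop %d: %s\n", i, ip)
	}
}
```

Note that every entry is only as trustworthy as the proxy that wrote it, so splitting the header does not by itself tell us which entry is the real client.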

The combination of proxy addresses differs between streams sent from the same experiment:

relay stats

    "topic": "test00-st-data",
<snip>
    "remoteAddr": "92.239.205.252, 34.117.155.39, 35.191.12.183",
<snip>
    "topic": "test00-st-video",
<snip>
    "remoteAddr": "92.239.205.252, 34.117.155.39, 35.191.19.106",

jump stats

    "remote_address": "92.239.205.252, 34.117.155.39, 35.191.19.101",
<snip>
    "topic": "test00",

If we select only the left-most IP address, without regard for the number of proxies included, the client address could be spoofed by additional untrusted proxies.
If we select only the right-most trusted IP address (the 3rd IP from the right), then we could be selecting a proxy that is the same for multiple experiments and multiple clients, e.g. an institutional proxy. We would then (potentially) be unable to tell an experiment at that institution from a user at that institution, leading to incorrect status information. For example, we might take a viable client stream for a viable experiment stream when the experiment is down (a false positive), or vice versa.
We'd also need to configure the service with information about the number of load balancers, which is doable but ideally avoided.
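The right-most-trusted option could be sketched as below. This is an illustration only: `clientBeforeTrustedHops` and its `trustedHops` parameter are hypothetical names, and the value 2 is an assumption read off the examples in this issue, where the load balancer setup appends two Google addresses.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// clientBeforeTrustedHops (hypothetical helper) returns the entry that sits
// immediately to the left of the last trustedHops entries in an
// X-Forwarded-For value. trustedHops is the number of entries appended by
// infrastructure we control, and would have to be supplied by configuration.
func clientBeforeTrustedHops(header string, trustedHops int) (string, error) {
	if trustedHops < 0 {
		return "", errors.New("trustedHops must not be negative")
	}
	parts := strings.Split(header, ",")
	idx := len(parts) - 1 - trustedHops
	if idx < 0 {
		return "", errors.New("header has fewer entries than trusted hops")
	}
	ip := strings.TrimSpace(parts[idx])
	if ip == "" {
		return "", errors.New("selected entry is empty")
	}
	return ip, nil
}

func main() {
	// Value copied from the relay stats example in this issue.
	ip, err := clientBeforeTrustedHops("92.239.205.252, 34.117.155.39, 35.191.12.183", 2)
	fmt.Println(ip, err)

	// An extra entry injected on the left does not change the result.
	ip, err = clientBeforeTrustedHops("10.0.0.1, 92.239.205.252, 34.117.155.39, 35.191.12.183", 2)
	fmt.Println(ip, err)
}
```

Counting from the right ignores anything an untrusted party prepends, but the selected address may still be a shared institutional proxy, which is the ambiguity described above.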

Either way, it seems we cannot use the circumstantial approach currently proposed in status, which distinguishes experiment stream connections from user stream connections by the IP address of the connection to the host's jump topic (the one without the slash).

Experiment vs user cannot be inferred from the read/write privileges, because both can be granted to experiments and to users alike.

Instead, it seems that we should include information on whether the connection is from an experiment or a user in the trusted JWT for the connection.

The jump JWT already includes client/host scopes, but these are not reported in stats (only read/write are).

Proposed solution: report client/host scope in stats.
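A rough sketch of what this could look like follows. The struct, its field names and JSON keys, and the `reportableScopes` helper are all hypothetical; only the `client` and `host` scope names come from the description above. The actual stats record and JWT claims in jump may be shaped differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// connStats is a hypothetical, cut-down stats record. The real record in
// jump has more fields; Scopes is the proposed addition.
type connStats struct {
	Topic    string   `json:"topic"`
	CanRead  bool     `json:"can_read"`
	CanWrite bool     `json:"can_write"`
	Scopes   []string `json:"scopes"`
}

// reportableScopes (hypothetical helper) keeps only the scopes that say
// which end of the connection this is, in the order they appear in the token.
func reportableScopes(tokenScopes []string) []string {
	out := []string{}
	for _, s := range tokenScopes {
		if s == "host" || s == "client" {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	// Scopes as they might appear in a validated token for an experiment.
	tokenScopes := []string{"read", "write", "host"}

	s := connStats{
		Topic:    "test00",
		CanRead:  true,
		CanWrite: true,
		Scopes:   reportableScopes(tokenScopes),
	}

	b, err := json.MarshalIndent(s, "", "    ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

Because the scopes come from the signed token rather than from request headers, status could tell experiments from users without relying on the remote address at all.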


Labels: enhancement, invalid
