Disconnect invalid and inactive peers#431

Open
jmozah wants to merge 6 commits into develop from fix/p2p-dubious-peers

Conversation

@jmozah jmozah commented Feb 27, 2023

This PR adds checks to identify and ban peers that pass the P2P handshake and are accepted into the application protocol but have other application-level issues.

  • Some clients have valid caps (opera/63, fsnap/1) but invalid client names such as Efireal, go-corex, Geth, etc.
  • Progress messages are checked to confirm that the epoch increases within a nominal duration.
  • An application message must be received within a threshold duration.
  • Recurring application errors now result in banning the peer.

With these checks in place, peers that are valid and working honestly get priority.

Depends on Fantom-foundation/go-ethereum#44

@jmozah jmozah self-assigned this Feb 27, 2023
@jmozah jmozah marked this pull request as ready for review March 16, 2023 11:15
@jmozah jmozah requested a review from andrecronje as a code owner March 16, 2023 11:15
@jmozah jmozah requested a review from uprendis March 16, 2023 11:16
Comment thread gossip/handler.go Outdated
useless = true

// Some clients have compatible caps and thus pass discovery checks and seep in to
// protocol handler. We should band these clients immediately.

nit: little typo

Comment thread gossip/handler.go Outdated
txChanSize = 4096

// percentage of useless peer nodes to allow
uselessPeerPercentage = 20 // 20%

Nit: Why not just use a factor, e.g. 0.2, instead of having to calculate the percentage each time?
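
A minimal sketch comparing the percentage form with the factor form suggested here, assuming `maxPeers` is an `int`. The function names and `uselessPeerFactor` are hypothetical and not part of the PR. One thing worth checking: if `uselessPeerPercentage` is an integer constant, as it appears in the diff, then `uselessPeerPercentage/100` truncates to 0 under Go's integer division, so the order of operations matters.

```go
package main

import "fmt"

// Hypothetical constants, for illustration only.
const (
	uselessPeerPercentage = 20  // percent, as in the diff
	uselessPeerFactor     = 0.2 // the factor form suggested in the review
)

// limitFromPercentage multiplies before dividing so that integer
// arithmetic does not truncate the ratio to zero.
func limitFromPercentage(maxPeers int) int {
	return maxPeers * uselessPeerPercentage / 100
}

// limitFromFactor converts to float64, applies the factor, and truncates.
func limitFromFactor(maxPeers int) int {
	return int(float64(maxPeers) * uselessPeerFactor)
}

func main() {
	maxPeers := 50
	fmt.Println(limitFromPercentage(maxPeers))
	fmt.Println(limitFromFactor(maxPeers))
	// Dividing first truncates 20/100 to 0, so the limit collapses to 0.
	fmt.Println(maxPeers * (uselessPeerPercentage / 100))
}
```

With a limit of 0, a check of the form `UselessNum() >= limit` would always be true, which is probably not the intent.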

Comment thread gossip/handler.go Outdated

// A useless peer is the one which does not support protocols opera/63 & fsnap/1.
useless := !eligibleForSnap(p.Peer)
if !p.Peer.Info().Network.Trusted && useless && h.peers.UselessNum() >= (h.maxPeers*(uselessPeerPercentage/100)) {

Question: I am not yet familiar with this useless stuff, but why do we even allow a percentage of useless peers at all? Why don't we just disconnect them all?

Author

Well, the peer is useless in the context of sync, i.e. it doesn't support fsnap/1 and opera/63.
But old peers supporting opera/62 should still be allowed to participate.


Ah, so I assume useless already implies that the peer is an opera/62 peer, not just any peer. That would make sense.

Comment thread gossip/handler.go Outdated
return err
// progress and application
progressWatchDogTimer := time.NewTimer(noProgressTime)
applicationWatchDogTimer := time.NewTimer(noAppMessageTime)

Aren't we recreating the timers on each iteration of the for loop here? If so, the later Resets are useless. It looks to me like we either have to create the timers outside the for loop and then Reset them as you do now, or recreate them in each iteration and just break where we would Reset, although that produces a lot of garbage-collected timers. Or am I missing something?

Author

Oops... the timer should be outside the loop.

Comment thread gossip/handler.go
err := h.handleMsg(p)
if err != nil {
p.Log().Debug("Message handling failed", "err", err)
if strings.Contains(err.Error(), errorToString[ErrPeerNotProgressing]) {

Can we use errors.Is here instead of comparing strings?

Author

errors.Is() can only be used to compare error values, but here the error is defined as a string.
If we want to change that, we should define all the errors with errors.New().


Yes, agreed. If there are more such string-based errors instead of errors.New()-based ones (which I believe would be better), then this should be addressed in a separate PR. So it's up to you whether to do anything about it in this PR.
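
For reference, a minimal sketch of what the errors.New() approach could look like. The sentinel name mirrors the error code in the diff, but defining it this way, and the `handleMsg` and `shouldBan` helpers, are assumptions for illustration only.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrPeerNotProgressing is a sentinel error value. In the PR it is an
// error code mapped to a string; this definition is an assumption.
var ErrPeerNotProgressing = errors.New("peer is not progressing")

// handleMsg stands in for the real handler. It wraps the sentinel with
// %w so callers can still match it after context is added.
func handleMsg(stalled bool) error {
	if stalled {
		return fmt.Errorf("handling progress message: %w", ErrPeerNotProgressing)
	}
	return nil
}

// shouldBan matches on the error value instead of its text.
func shouldBan(err error) bool {
	return errors.Is(err, ErrPeerNotProgressing)
}

func main() {
	fmt.Println(shouldBan(handleMsg(true)))
	fmt.Println(shouldBan(handleMsg(false)))
}
```

Matching on the value keeps working if the message text changes, which a strings.Contains check would not.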

Comment thread gossip/handler.go Outdated
p.SetProgress(progress)
// If peer has not progressed for noProgressTime minutes, then disconnect the peer.
if !p.IsPeerProgressing() {
return errResp(ErrPeerNotProgressing, "%v: %v %v", "epoch is not progressing for ", noProgressTime, "minutes")

Nit: As noProgressTime is a duration, this would print "epoch is not progressing for 3m0s minutes", I think
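
A small sketch of the formatting behaviour, assuming `noProgressTime` is 3 minutes as the comment implies. `notProgressingMsg` is a hypothetical helper, not the PR's errResp call.

```go
package main

import (
	"fmt"
	"time"
)

// noProgressTime is assumed to be 3 minutes, based on the review comment.
const noProgressTime = 3 * time.Minute

// notProgressingMsg lets the Duration format itself, so no unit suffix
// is appended by hand.
func notProgressingMsg(d time.Duration) string {
	return fmt.Sprintf("epoch is not progressing for %v", d)
}

func main() {
	fmt.Println(notProgressingMsg(noProgressTime)) // epoch is not progressing for 3m0s
	// Alternative: print the number of minutes explicitly.
	fmt.Printf("epoch is not progressing for %.0f minutes\n", noProgressTime.Minutes())
}
```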

Comment thread gossip/handler.go
return errResp(ErrInvalidMsgCode, "%v", msg.Code)
}

if msg.Code != ProgressMsg {

I am not yet familiar with all message codes, but is ProgressMsg the only message which signals that there is progress?

Author

Yes

Comment thread gossip/peer.go

func (p *peer) setPeerAsProgressing(x PeerProgress) {
p.progress = x
p.progressTime = time.Now()

Any specific reason why p.appMessageTime is locked, but p.progressTime isn't?

Author

It's locked in SetProgress() where setPeerAsProgressing() is called.
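
The locking arrangement described here can be sketched as follows: the exported method takes the lock and the unexported helper assumes it is already held. The struct layout and the `Epoch` accessor are simplified assumptions, not the actual `peer` type.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// PeerProgress is reduced to the one field this sketch needs.
type PeerProgress struct{ Epoch uint64 }

// peer is a simplified stand-in for the real peer type.
type peer struct {
	sync.RWMutex
	progress     PeerProgress
	progressTime time.Time
}

// SetProgress is the exported entry point and takes the write lock.
func (p *peer) SetProgress(x PeerProgress) {
	p.Lock()
	defer p.Unlock()
	p.setPeerAsProgressing(x)
}

// setPeerAsProgressing must only be called with the lock held.
func (p *peer) setPeerAsProgressing(x PeerProgress) {
	p.progress = x
	p.progressTime = time.Now()
}

// Epoch reads under the read lock (hypothetical accessor).
func (p *peer) Epoch() uint64 {
	p.RLock()
	defer p.RUnlock()
	return p.progress.Epoch
}

func main() {
	p := &peer{}
	var wg sync.WaitGroup
	for i := 1; i <= 4; i++ {
		wg.Add(1)
		go func(e uint64) {
			defer wg.Done()
			p.SetProgress(PeerProgress{Epoch: e})
		}(uint64(i))
	}
	wg.Wait()
	fmt.Println(p.Epoch() >= 1)
}
```

A comment on the helper stating that the caller must hold the lock makes the contract visible to future readers.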

Comment thread gossip/peer_test.go Outdated
newPeer := getPeer()
ep1 := PeerProgress{Epoch: 1}
newPeer.SetProgress(ep1)
time.Sleep(2 * time.Second) //set the threshold to 2 second

All these Sleep calls accumulate to 9 seconds, making test runs 9 seconds slower as I understand. Isn't there a different way to test this? Do we actually even need to sleep?

@holisticode holisticode left a comment

I am not sure if I should already be the only person approving, but I want to signal that this looks good to me now (at least).
