Requesting a System Administrator or Network Engineer to help solve an issue

NoneTaken

Hey,

We're looking for an experienced system admin or network engineer to help us solve an issue we've been struggling with for a few months. We will try to describe the problem and everything we know about it so far below:

Every few minutes, a player will (seemingly randomly) be kicked from the server. When it occurs, the following error appears in our proxy logs:
Code:
[16:42:31] [Netty Worker IO Thread #44/WARN]: [/xxx:54631|ttryy] -> UpstreamBridge - NativeIoException: recvAddress(..) failed: Connection reset by peer 
[16:42:31] [Netty Worker IO Thread #44/INFO]: [xxx] disconnected with: NativeIoException : recvAddress(..) failed: Connection reset by peer
There are no errors on our main production server, nor in the clients' log files. We can confidently rule out this issue being on the clients' side for a few reasons:
1. It happens regardless of the player's region & ping
2. It happens regardless of how long the player has been online
3. It happens to players on all different versions (1.8.x <-> 1.20.x)
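
To put numbers on "seemingly randomly", the reset lines can be tallied per hour, per player and per bridge. This is only a minimal Python sketch, assuming the proxy log uses exactly the WARN line format shown above; the function name `tally_resets` and the log path passed on the command line are made up for illustration.

```python
import re
import sys
from collections import Counter

# Matches only the WARN line from the proxy log excerpt above, e.g.
# [16:42:31] [Netty Worker IO Thread #44/WARN]: [/xxx:54631|ttryy] -> UpstreamBridge - ...
RESET_LINE = re.compile(
    r"^\[(?P<hour>\d{2}):\d{2}:\d{2}\] "
    r"\[[^\]]*/WARN\]: "
    r"\[(?P<addr>[^|\]]*)\|(?P<player>[^\]]*)\] -> "
    r"(?P<bridge>\w+) - .*Connection reset by peer"
)


def tally_resets(lines):
    """Count 'Connection reset by peer' warnings per hour, player and bridge."""
    per_hour, per_player, per_bridge = Counter(), Counter(), Counter()
    for line in lines:
        match = RESET_LINE.match(line.strip())
        if match is None:
            continue  # not a reset warning (e.g. the INFO disconnect line)
        per_hour[match["hour"]] += 1
        per_player[match["player"]] += 1
        per_bridge[match["bridge"]] += 1
    return per_hour, per_player, per_bridge


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python tally_resets.py /path/to/proxy.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        hours, players, bridges = tally_resets(fh)
    print("per hour:  ", sorted(hours.items()))
    print("per bridge:", bridges.most_common())
    print("top players:", players.most_common(10))
```

If a handful of players or particular hours dominate the counts, the kicks are not as random as they look. An even spread would fit the three observations above.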

This leads us to believe that the cause of the issue is somewhere in our setup, whether that be on the production server, the proxy, or the machine itself. Here is a brief overview of our setup:

Setup:
  • 5 Hub servers, 1 production (Prison) server
  • Flamecord (Java proxy) & Bungeecord with Floodgate (Bedrock/Minehut proxy)
  • 1.8 Server jar
  • Base Pterodactyl install
  • UFW for firewall
  • Databases run within Docker
  • Ubuntu 22.04
  • Hosted at OVH

The issue occurs a lot on our Java and (less so) our Minehut proxy, but hasn't occurred once on our Bedrock proxy recently. It's difficult to know whether this is because the issue isn't present there, or whether there haven't been enough players on the Bedrock proxy for it to occur.

Here's a list of what we have tried in the last couple of months to fix the issue:

  • Updated all plugins
  • Switched dedicated servers
      - From Hetzner to OVH
  • Switched DDoS protection
      - From TCPShield to Papyrus to Cosmic Guard to now bare OVH
  • Changed server jar
      - From Vortex Spigot to a custom Spigot to bare Paperspigot
  • Changed proxy software
      - From Flamecord to Waterfall, and from Bungee-based to Velocity
  • Attempted no proxy at all (?)
      - For about a day, we ran with no proxy at all. We don't know for certain whether or not this fixed our issue because we had network compression disabled on the standalone server, so players were having ping issues and being kicked as a result. By the time we realised that was the issue, we had already switched back to a proxy (Velocity). There were no errors in the logs, but it is also possible that the stack trace was just not printed on the standalone server. We don't know. The only pointer we have is that we didn't see a rise in the player count during this time, which we would expect if the kicking issue were fixed.
  • Tried removing unnecessary plugins
  • Tried reverting old gameplay changes we had recently made
  • Reached out to the Via team to confirm it wasn't caused by ViaVersion
  • Tried running MTRs from clients to the server
  • Checked the firewall configuration (no rate limits / bad rules)
      - It is plausible we overlooked something here; having this checked again is welcome.
  • Ran Wireshark on clients (nothing was found)
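
A rough sketch of a complementary server-side check, in case it helps whoever picks this up. On Linux, `/proc/net/snmp` exposes cumulative TCP counters, including `EstabResets` (established connections that were reset) and `OutRsts` (segments sent with the RST flag). Snapshotting them before and after a batch of kicks gives a hint whether the machine itself is involved in the resets. The helper names below are hypothetical and the watched counters are only a suggestion. None of this has been run on the setup described here.

```python
def parse_tcp_counters(snmp_text):
    """Parse the two 'Tcp:' lines of /proc/net/snmp-style text into a dict."""
    tcp_lines = [
        line.split() for line in snmp_text.splitlines() if line.startswith("Tcp:")
    ]
    if len(tcp_lines) < 2:
        raise ValueError("expected a 'Tcp:' header line and a 'Tcp:' value line")
    names, values = tcp_lines[0][1:], tcp_lines[1][1:]
    return {name: int(value) for name, value in zip(names, values)}


def counter_delta(before, after,
                  keys=("EstabResets", "OutRsts", "AttemptFails", "RetransSegs")):
    """Return how much each watched counter grew between two snapshots."""
    return {key: after[key] - before[key] for key in keys}


def read_tcp_counters(path="/proc/net/snmp"):
    """Read and parse the live counters (Linux only)."""
    with open(path, encoding="ascii") as fh:
        return parse_tcp_counters(fh.read())
```

Typical use would be to call `read_tcp_counters()` once, wait until a few players have been kicked, call it again, and pass both results to `counter_delta`. The counters are host-wide, so traffic from the databases and hub servers is included and the result is only a coarse signal.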

Given what we know about the issue, we can make some educated guesses about what is likely not responsible. We can rule these out because we have already attempted to switch hosts, software and network provider:
  • Network stability
  • Network bandwidth issue
  • Hardware issue with the dedi
  • Server / proxy software (potentially the setup though)

Safe to say, we are almost out of ideas. These are some wild guesses about what could be causing it, or what we could still try changing, but we have no idea anymore:
  • Firewall/related
  • Switching MC version
  • Switching Java version
  • Reinstalling our dedi and changing our setup in some way
  • Some database connection issue

If you're interested in trying to solve this issue with us, please leave your Discord and any desired compensation below so I can get in contact.

Thanks in advance
 
Type: Requesting
Provided by: Team
Operating system: Ubuntu
JasmeowTheCat

Systems Administrator
Please contact Shanny to respond to his chat. We are discussing this there. Feel free to DM me on Discord also and I will add you.
 

NoneTaken

I will ask to be added to the ticket. We are making this post in addition so we can reach as many people as possible, as we need this fixed as soon as possible.
 

Mechanic

Configurator
What Java version are you running? I had this happen before with Java 11; switching to Java 8 solved the issue.
 

NoneTaken

Bumping, issue is still present - looking for someone else to help.

Additional info: We have switched our hub servers to 1.20, but that didn't fix the issue on them.
 