
Load active peers from previous session saved into file. #2230

Merged: 3 commits merged into master from reconect_previous_peers on Apr 10, 2024

Conversation

asoto-iov (Contributor)

A client node will be able to reconnect to previously connected peers on restart.

Description

Persist discovered peers across node restarts

Motivation and Context

Currently, the RSKj node keeps discovered peers in a so-called distance table in memory, so they are lost after a node restart. The next time the node starts, it has to run the peer discovery process from scratch, with only the list of boot nodes to work from.

To improve, and most importantly speed up, that process, we want to persist the list of already discovered peers on disk.

Note: it looks like we could make use of the org.ethereum.util.MapSnapshot<> class for that purpose.

Expected behaviour (see the sketch after the notes below):

  • when the node stops, it should save all discovered peers to a file

  • when the node starts, it should check whether the file with discovered peers exists in the database directory and, if so, load the peer list from there; otherwise, do nothing and proceed as before

Notes:

  • this functionality is part of the peer discovery protocol (see the PeerExplorer class)

  • the node should continue using bootstrap nodes the same way as it does now
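To make that behaviour concrete, here is a minimal sketch, assuming a hypothetical KnownPeersStore helper; the names and the one-address-per-line file format are illustrative only, not the PR's actual implementation (which hooks into the peer discovery layer around PeerExplorer):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Collections;
    import java.util.List;

    class KnownPeersStore {
        private final Path peersFile;

        KnownPeersStore(Path databaseDir) {
            // hypothetical file name inside the database directory
            this.peersFile = databaseDir.resolve("knownPeers");
        }

        // On shutdown: persist one "host:port" line per discovered peer.
        void save(List<String> peerAddresses) throws IOException {
            Files.write(peersFile, peerAddresses);
        }

        // On startup: load saved peers if the file exists; otherwise return
        // an empty list so discovery proceeds from the boot nodes as before.
        List<String> loadOrEmpty() throws IOException {
            if (!Files.exists(peersFile)) {
                return Collections.emptyList();
            }
            return Files.readAllLines(peersFile);
        }
    }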

How Has This Been Tested?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • Tests for the changes have been added (for bug fixes / features)
  • Requires Activation Code (Hard Fork)
  • Other information:

asoto-iov force-pushed the reconect_previous_peers branch from 8be6a43 to a1e85d5 on January 18, 2024 12:01
fmacleal (Contributor) left a comment:

Good job!

I just have a few recommendations. Let me know what you think about them.

asoto-iov force-pushed the reconect_previous_peers branch from 666a418 to 87a3e6a on February 5, 2024 17:14
fmacleal (Contributor) previously approved these changes on Feb 6, 2024, and left a comment:

Good job! I see that all recommendations were addressed, so I am approving it. Well done! :)

@@ -145,7 +145,11 @@ public synchronized void stop() {
     logger.info("Shutting down RSK node");

     for (int i = internalServices.size() - 1; i >= 0; i--) {
-        internalServices.get(i).stop();
+        try {
+            internalServices.get(i).stop();
Contributor: could you elaborate a bit more on why you added this try-catch? Any specific need?

Contributor: ok, I see; so this is not required by the feature itself.

I guess it's a tradeoff: a service that doesn't finish properly may sometimes cause data inconsistency in the services that stop after it. We should be very careful with such changes in the shutdown process.

Contributor Author: (previous deleted reply) If stopping any of the services fails by throwing an exception, execution stops there and the remaining services won't be stopped cleanly.

Contributor Author:

> ok, I see; so this is not required by the feature itself.
>
> I guess it's a tradeoff: a service that doesn't finish properly may sometimes cause data inconsistency in the services that stop after it. We should be very careful with such changes in the shutdown process.

If that could happen, I think it is something we should fix. A wrong or abnormal shutdown of one service shouldn't affect the others.

Contributor: if that happens, then there's a bug in the code which has to be fixed. We don't expect service.stop() to throw any checked exception. Only unchecked exceptions can happen here (e.g. RuntimeException), and if one does, we should propagate it up the call stack. I'm not sure it's a good idea to catch an exception and continue stopping other services when the node is already in an inconsistent state; the best we can do here is catch it at an upper level and log it. So I'd rather not add this try-catch here.

Contributor Author: ok, got it, I'll remove it. But in my opinion the services should be independent, and where there are dependencies we could create inner groups and close them together. Maybe that's something we can try in the future.
Take this service as an example: if it throws an error while persisting the file, or while processing unexpected input data, the other services shouldn't be affected and should still be stopped properly.

Contributor: yeah, that was my point. If a service throws an unchecked exception for any reason, the node most probably cannot recover from that state. If the service can recover from an exception thrown in its stop() method, then the exception should not be propagated up but rather handled inside it.
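The pattern the thread converges on can be sketched as follows; this is illustrative only (the InternalService stand-in and the peers list are assumptions, not RSKj's actual types): recoverable failures are handled inside the service's own stop(), so the shutdown loop stays unchanged and genuinely unrecoverable exceptions still propagate up.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    // Minimal stand-in for the node's internal-service abstraction.
    interface InternalService {
        void start();
        void stop();
    }

    class KnownPeersPersistenceService implements InternalService {
        private static final Logger logger = LoggerFactory.getLogger("knownPeers");

        private final Path peersFile;
        private final List<String> knownPeers; // "host:port" entries, hypothetical shape

        KnownPeersPersistenceService(Path peersFile, List<String> knownPeers) {
            this.peersFile = peersFile;
            this.knownPeers = knownPeers;
        }

        @Override
        public void start() {
            // loading on startup omitted in this sketch
        }

        @Override
        public void stop() {
            try {
                // An I/O failure here is recoverable: losing the saved-peers
                // file only means slower discovery on the next start, so log
                // and return instead of breaking the shutdown of other services.
                Files.write(peersFile, knownPeers);
            } catch (IOException e) {
                logger.warn("Could not persist known peers", e);
            }
            // Any unchecked exception thrown past this point propagates up,
            // which is the behaviour the reviewers prefer for unrecoverable states.
        }
    }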

@@ -61,6 +61,7 @@ public class RskSystemProperties extends SystemProperties {
     private static final int CHUNK_SIZE = 192;

     public static final String PROPERTY_SYNC_TOP_BEST = "sync.topBest";
+    public static final String USE_PEERS_FROM_LAST_SESSION = "peer.usePeersFromLastSession";
Contributor: what about keeping it under the peer.discovery config section?
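For illustration only, the reviewer's suggestion would look roughly like this in the node's HOCON configuration; the placement under peer.discovery is the suggestion being discussed, not what the merged constant (peer.usePeersFromLastSession) actually maps to:

    peer {
        discovery {
            # hypothetical placement suggested in review
            usePeersFromLastSession = true
        }
    }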

Vovchyk (Contributor) commented on Feb 15, 2024:

pipeline:run

Resolved review threads (now outdated):

  • rskj-core/src/main/java/co/rsk/RskContext.java (two threads)
  • rskj-core/src/main/resources/expected.conf

import static org.junit.jupiter.api.Assertions.assertEquals;

class SimpleFileWriterTest {

Code scanning / CodeQL note (test): Unused classes and interfaces. Unused class: SimpleFileWriterTest is not referenced within this codebase. If not used as an external API it should be removed.
}

public void savePropertiesIntoFile(Properties properties, Path filePath) throws IOException {
    File tempFile = File.createTempFile(filePath.toString(), TMP);

Code scanning / CodeQL warning (Medium): Local information disclosure in a temporary directory. Local information disclosure vulnerability due to use of a file readable by other local users.

}

public void saveDataIntoFile(String data, Path filePath) throws IOException {
    File tempFile = File.createTempFile(filePath.toString(), TMP);

Code scanning / CodeQL warning (Medium): Local information disclosure in a temporary directory. Local information disclosure vulnerability due to use of a file readable by other local users.
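A common remediation for this CodeQL finding, shown here as a hedged sketch rather than the change that actually landed in the PR, is to create the temporary file through java.nio.file.Files, which applies owner-only (rw-------) permissions on POSIX file systems:

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;

    class SafeTempFileWrite {
        private static final String TMP = ".tmp";

        public void saveDataIntoFile(String data, Path filePath) throws IOException {
            // Unlike File.createTempFile, Files.createTempFile restricts the
            // new file to the owner on POSIX systems, avoiding the local
            // information disclosure CodeQL warns about.
            Path tempFile = Files.createTempFile(filePath.getFileName().toString(), TMP);
            Files.write(tempFile, data.getBytes(StandardCharsets.UTF_8));
            // Move the fully written temp file over the target.
            Files.move(tempFile, filePath, StandardCopyOption.REPLACE_EXISTING);
        }
    }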
rmoreliovlabs (Contributor) left a comment:
LGTM

asoto-iov force-pushed the reconect_previous_peers branch from 3d37038 to 8e827eb on April 8, 2024 08:48

Commits:

  • …ed into peerExplorer
  • Updating known peers origin, adding tests and small fixes
  • Removing unnecessary log to avoid log injection
  • Load active peers from previous session saved into file.

asoto-iov force-pushed the reconect_previous_peers branch from 8e827eb to aa6bce5 on April 9, 2024 12:36
sonarqubecloud bot commented on Apr 9, 2024

Vovchyk merged commit 12c9b32 into master on Apr 10, 2024; 10 checks passed.
Vovchyk deleted the reconect_previous_peers branch on April 10, 2024 09:42.
aeidelman added this to the Arrowhead 6.2.0 milestone on May 27, 2024.
5 participants