Graceful Shutdown Missing for makerd and DNS Servers on Termination & Errors #322

Open
KnowWhoami opened this issue Dec 12, 2024 · 0 comments · May be fixed by #361

Description:

Currently, our DNS and makerd servers work well: they start up as expected and successfully complete the shutdown procedure whenever they are flagged to shut down.
But a few edge cases around shutting these servers down cleanly are still unhandled:

Signal Termination:

  • Suppose our makerd server is running fine and someone kills the makerd process (SIGTERM) or hits Ctrl+C (SIGINT). The process terminates abruptly, so it never runs the shutdown procedure, and any in-memory data is lost (see the sketch below).
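
A minimal sketch of what catching these signals could look like, assuming the `signal_hook` crate and that `Maker` exposes the same `shutdown: AtomicBool` flag the startup code below polls with `load(Relaxed)`; the handler thread simply flips that flag so the existing shutdown procedure runs instead of the process dying abruptly:

```rust
// Sketch only: assumes the `signal_hook` crate and that `Maker::shutdown` is an
// `AtomicBool`, as suggested by `maker.shutdown.load(Relaxed)` in the server code.
use std::sync::{atomic::Ordering::Relaxed, Arc};
use std::thread;

use signal_hook::{
    consts::{SIGINT, SIGTERM},
    iterator::Signals,
};

fn spawn_termination_handler(maker: Arc<Maker>) -> Result<(), std::io::Error> {
    let mut signals = Signals::new([SIGINT, SIGTERM])?;
    thread::spawn(move || {
        // Blocks until SIGINT (ctrl+c) or SIGTERM (kill) arrives.
        if signals.forever().next().is_some() {
            log::info!("Termination signal received, initiating graceful shutdown");
            // The server loops already poll this flag, so setting it lets the
            // normal shutdown procedure run and persist in-memory data.
            maker.shutdown.store(true, Relaxed);
        }
    });
    Ok(())
}
```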

Error occurred while running these servers:

```rust
pub fn start_maker_server(maker: Arc<Maker>) -> Result<(), MakerError> {
    log::info!("Starting Maker Server");
    // Initialize network connections.
    let (maker_address, tor_thread) = network_bootstrap(maker.clone())?;
    let port = maker.config.port;
    let listener =
        TcpListener::bind((Ipv4Addr::LOCALHOST, maker.config.port)).map_err(NetError::IO)?;
    log::info!(
        "[{}] Listening for client conns at: {}",
        maker.config.port,
        listener.local_addr()?
    );
    listener.set_nonblocking(true)?; // Needed to not block a thread waiting for incoming connection.
    log::info!(
        "[{}] Maker Server Address: {}",
        maker.config.port,
        maker_address
    );
    let heart_beat_interval = HEART_BEAT_INTERVAL_SECS; // All maker internal threads loop at this frequency.

    // Setup the wallet with fidelity bond.
    let network = maker.get_wallet().read()?.store.network;
    let balance = maker.get_wallet().read()?.balance()?;
    log::info!("[{}] Currency Network: {:?}", port, network);
    log::info!("[{}] Total Wallet Balance: {:?}", port, balance);
    setup_fidelity_bond(&maker, &maker_address)?;
    maker.wallet.write()?.refresh_offer_maxsize_cache()?;

    // Global server Mutex, to switch on/off p2p network.
    let accepting_clients = Arc::new(AtomicBool::new(false));

    // Spawn server threads.
    // All thread handles are stored in the thread_pool, which are all joined at server shutdown.
    let mut thread_pool = Vec::new();
    if !maker.shutdown.load(Relaxed) {
        // 1. Bitcoin Core connection checker thread.
        // Ensures that the Bitcoin Core connection is live.
        // If not, it will block p2p connections until Core works again.
        let maker_clone = maker.clone();
        let acc_client_clone = accepting_clients.clone();
        let conn_check_thread: thread::JoinHandle<Result<(), MakerError>> = thread::Builder::new()
            .name("Bitcoin Core Connection Checker Thread".to_string())
            .spawn(move || {
                log::info!("[{}] Spawning Bitcoin Core connection checker thread", port);
                check_connection_with_core(maker_clone, acc_client_clone)
            })?;
        thread_pool.push(conn_check_thread);

        // 2. Idle client connection checker thread.
        // This thread checks the idleness of peers in live swaps.
        // It takes recovery measures if a peer seems to have disappeared in the middle of a swap.
        let maker_clone = maker.clone();
        let idle_conn_check_thread = thread::Builder::new()
            .name("Idle Client Checker Thread".to_string())
            .spawn(move || {
                log::info!(
                    "[{}] Spawning Client connection status checker thread",
                    port
                );
                check_for_idle_states(maker_clone.clone())
            })?;
        thread_pool.push(idle_conn_check_thread);

        // 3. Watchtower thread.
        // This thread checks for broadcasted contract transactions, which usually means violation of the protocol.
        // When a contract transaction is detected in the mempool it will attempt recovery.
        // This can get triggered even when contracts of adjacent hops are published, implying the whole swap route is disrupted.
        let maker_clone = maker.clone();
        let contract_watcher_thread = thread::Builder::new()
            .name("Contract Watcher Thread".to_string())
            .spawn(move || {
                log::info!("[{}] Spawning contract-watcher thread", port);
                check_for_broadcasted_contracts(maker_clone.clone())
            })?;
        thread_pool.push(contract_watcher_thread);

        // 4. The RPC server thread.
        // Used for responding back to `maker-cli` apps.
        let maker_clone = maker.clone();
        let rpc_thread = thread::Builder::new()
            .name("RPC Thread".to_string())
            .spawn(move || {
                log::info!("[{}] Spawning RPC server thread", port);
                start_rpc_server(maker_clone)
            })?;
        thread_pool.push(rpc_thread);

        sleep(Duration::from_secs(heart_beat_interval)); // wait for 1 beat, to complete spawns of all the threads.
        maker.is_setup_complete.store(true, Relaxed);
        log::info!("[{}] Maker setup is ready", maker.config.port);
    }

    // The P2P client connection loop.
    // Each client connection will spawn a new handler thread, which is added back in the global thread_pool.
    // This loop beats at `maker.config.heart_beat_interval_secs`
```

  • For example, we call the start_maker_server API to start the makerd process, and in the normal case it keeps running in the background until the server is flagged to shut down.
  • But there is another possibility: if this API hits an error after spawning some threads and propagates it back to the caller, any in-memory data created up to that point never gets saved to disk, because the error path skips the shutdown procedure (see the sketch after this list).
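
One possible shape for the fix on the error path, sketched with placeholder names (`run_maker_server` and `maker_shutdown_procedure` are hypothetical, not existing functions): keep the fallible logic in an inner function and always run the shutdown procedure before propagating the error.

```rust
// Hypothetical restructuring: `run_maker_server` and `maker_shutdown_procedure`
// are placeholder names for illustration, not existing APIs in the codebase.
pub fn start_maker_server(maker: Arc<Maker>) -> Result<(), MakerError> {
    // All the `?`-propagating startup and serving logic lives in the inner function.
    let result = run_maker_server(maker.clone());
    if let Err(ref e) = result {
        log::error!("Maker server encountered an error: {:?}", e);
    }
    // Runs on both the success and the error path: set the shutdown flag,
    // join the spawned threads, and persist in-memory data to disk.
    if let Err(e) = maker_shutdown_procedure(&maker) {
        log::error!("Error during shutdown procedure: {:?}", e);
    }
    result
}
```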

Solution:

  • Implement signal handling so that sudden termination of the server process (SIGTERM/SIGINT) triggers a graceful shutdown instead of an abrupt exit.
  • Implement the Drop trait for the Maker & DirectoryServer structs. This ensures that whenever the last instance of these structs goes out of scope, the corresponding server shuts down cleanly (see the sketch after this list).
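
A minimal sketch of the Drop idea for `Maker` (the `save_to_disk` call is an assumed persistence method, named here only for illustration; a `DirectoryServer` impl would follow the same pattern):

```rust
use std::sync::atomic::Ordering::Relaxed;

// Sketch only: `save_to_disk` is an assumed wallet persistence method used for
// illustration of where the on-drop cleanup would happen.
impl Drop for Maker {
    fn drop(&mut self) {
        log::info!("Maker dropped, running shutdown procedure");
        // Signal all internal threads to stop looping.
        self.shutdown.store(true, Relaxed);
        // Persist any in-memory wallet data before the process exits.
        if let Ok(wallet) = self.wallet.read() {
            if let Err(e) = wallet.save_to_disk() {
                log::error!("Failed to persist wallet data on drop: {:?}", e);
            }
        }
    }
}
```

Since each spawned server thread holds its own `Arc<Maker>` clone, this drop only fires once the thread_pool has been joined and every clone released, which pairs naturally with the existing shutdown flag.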
@KnowWhoami KnowWhoami self-assigned this Dec 12, 2024
@KnowWhoami KnowWhoami added bug Something isn't working enhancement This enhances the code and improves stuffs labels Dec 12, 2024
@KnowWhoami KnowWhoami added this to the v0.1.0 milestone Dec 12, 2024
@mojoX911 mojoX911 moved this to todo in core lib Dec 19, 2024
@mojoX911 mojoX911 moved this from todo to WIP in core lib Dec 26, 2024
@KnowWhoami KnowWhoami linked a pull request Jan 2, 2025 that will close this issue
@mojoX911 mojoX911 modified the milestones: v0.1.0, v0.1.1 Jan 3, 2025