Skip to content

Commit

Permalink
[WIP] member_registry is ready
Browse files Browse the repository at this point in the history
  • Loading branch information
songweijia committed Feb 8, 2024
2 parents f9526b3 + ddebab0 commit d9a10bb
Show file tree
Hide file tree
Showing 19 changed files with 367 additions and 265 deletions.
22 changes: 9 additions & 13 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ set(ignoreMe "${CMAKE_WARN_DEPRECATED}")

set(CMAKE_MODULE_PATH "${CMAKE_SOURCE_DIR}/cmake/Modules")

set(ENABLE_LEADER_REGISTRY 1)
set(ENABLE_MEMBER_REGISTRY 1)

include(GNUInstallDirs)

Expand Down Expand Up @@ -63,7 +63,7 @@ find_package(OpenSSL 1.1.1 REQUIRED)
# Target: nlohmann_json::nlohmann_json
find_package(nlohmann_json 3.9.0 REQUIRED)

if(${ENABLE_LEADER_REGISTRY})
if(${ENABLE_MEMBER_REGISTRY})
# provides the import target rpclib::rpc
find_package(rpclib 2.3.0 REQUIRED)
endif()
Expand Down Expand Up @@ -94,24 +94,20 @@ add_library(derecho SHARED
$<TARGET_OBJECTS:tcp>
$<TARGET_OBJECTS:persistent>
$<TARGET_OBJECTS:openssl_wrapper>)
if (${USE_VERBS_API})
target_link_libraries(derecho
rdmacm ibverbs rt pthread atomic stdc++fs
spdlog::spdlog
${libfabric_LIBRARIES}
mutils::mutils
mutils::mutils-containers
OpenSSL::Crypto
nlohmann_json::nlohmann_json)
else()
target_link_libraries(derecho
rt pthread atomic stdc++fs
spdlog::spdlog
${libfabric_LIBRARIES}
mutils::mutils
mutils::mutils-containers
OpenSSL::Crypto
nlohmann_json::nlohmann_json)
if (${USE_VERBS_API})
target_link_libraries(derecho rdmacm ibverbs)
else()
target_link_libraries(derecho ${libfabric_LIBRARIES})
endif()
if (${ENABLE_MEMBER_REGISTRY})
target_link_libraries(derecho rpclib::rpc)
endif()
set_target_properties(derecho PROPERTIES
SOVERSION ${derecho_VERSION}
Expand Down
8 changes: 4 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,7 @@ Derecho is aimed at supporting what are called "cloud micro-services", meaning p

The functionality of Derecho centers on:
* Forming a "group" of servers ("processes"). Membership is automatically tracked and changes are reported via upcalls to your code.
* Structuring your server group into subgroups. You would typically have one subgroup for each distinct functionality used. For example, suppose you were implementing a storage service similar to Apache's Hadoop file system (HDFS). HDFS has a name-node service, a meta-data service and a storage service. In Derecho, each would be implemented by a subgroup. Subgroups can have distinct or overlapping membership, under your control. For scalability, a subgroup would typically be sharded into even smaller subgroups. For example, a storage service might have an "array" of mini-storage servers, each handling a subset of the files. There would be one shard per subset, and the members of that shard would replicate the identical contents. Unlike in Paxos-based systems you may have read about elsewhere, Derecho supports a full replication model: every active (non-failed) replica has identical, complete contents.
* Structuring your server group into subgroups. You would typically have one subgroup for each distinct functionality used. For example, suppose you were implementing a storage service similar to Apache's Hadoop file system (HDFS). HDFS has a name-node service, a meta-data service and a storage service. In Derecho, each would be implemented by a subgroup. Subgroups can have distinct or overlapping membership, under your control. For scalability, a subgroup would typically be sharded into even smaller subgroups. For example, a storage service might have an "array" of mini-storage servers, each handling a subset of the files. There would be one shard per subset, and the members of that shard would replicate the identical contents. Unlike in Paxos-based systems you may have read about elsewhere, Derecho supports a full replication model: every active (non-failed) replica has identical, complete contents.
* Automated repair after crashes, recoveries, and automated initialization (from a checkpoint) when a new member joins.
* Automated persisted storage (append-only logging) if desired.

Expand All @@ -52,12 +52,12 @@ Derecho does not have any specific O/S dependency. We've tested most extensivel
This project is organized as a standard CMake-built C++ library: all headers are within the include/derecho/ directory, all CPP files are within the src/ directory, and each subdirectory within src/ contains its own CMakeLists.txt that is included by the root directory's CMakeLists.txt. Within the src/ and include/derecho/ directories, there is a subdirectory for each separate module of Derecho, such as RDMC, SST, and Persistence. Some sample applications and test cases that are built on the Derecho library are included in the src/applications/ directory, which is not included when building the Derecho library itself.

## Installation
Derecho is a library that helps you build replicated, fault-tolerant services in a datacenter with RDMA networking. Here's how to start using it in your projects.
Derecho is a library that helps you build replicated, fault-tolerant services in a datacenter with RDMA networking. Here's how to start using it in your projects.
* You will start by verifying that you have a compatible operating system (we do our development on Ubuntu but CentOS should be fine) and network (we recommend RDMA or TCP).
* Next, decide if you prefer to use a pre-compiled container, which is easier, or would like to build from source.
* If building from source, you will clone the code base in the place you plan to create the release binaries, then follow the instructions for creating a folder in which the binaries will reside. Then from "prerequisites" run the .sh shell scripts one by one, for example "sudo ./install-foo.sh". These should complete without error messages -- if you get warnings or errors, stop and post a question about it on the discussions page.
* Next, if building from source, follow the remainder of the "installing Derecho" instructions.
* Last, clone and build the Cascade code base.
* Last, clone and build the Cascade code base.
* You should now be able to set up a configuration file and run our demos.

### Network requirements (important!)
Expand Down Expand Up @@ -140,7 +140,7 @@ To use Derecho in your code, you simply need to
The configuration file consists of three sections: **DERECHO**, **RDMA**, and **PERS**. The **DERECHO** section includes core configuration options for a Derecho instance, which every application will need to customize. The **RDMA** section includes options for RDMA hardware specifications. The **PERS** section allows you to customize the persistent layer's behavior.

#### Configuring Core Derecho
Applications need to tell the Derecho library which node is the initial leader with the options **leader_ip** and **leader_gms_port**. Each node then specifies its own ID (**local_id**) and the IP address and ports it will use for Derecho component services (**local_ip**, **gms_port**, **state_transfer_port**, **sst_port**, and **rdmc_port**). Also, if using external clients, applications need to specify the ports serving external clients (**leader_external_port** and **external_port**);
Applications need to tell the Derecho library which node is the initial leader with the options **leader_ip** and **leader_gms_port**. Each node then specifies its own ID (**local_id**) and the IP address and ports it will use for Derecho component services (**local_ip**, **gms_port**, **state_transfer_port**, **sst_port**, and **rdmc_port**). Also, if using external clients, applications need to specify the ports serving external clients (**external_port**);

The other important parameters are the message sizes. Since Derecho pre-allocates buffers for RDMA communication, each application should decide on an optimal buffer size based on the amount of data it expects to send at once. If the buffer size is much larger than the messages an application actually sends, Derecho will pin a lot of memory and leave it underutilized. If the buffer size is smaller than the application's actual message size, it will have to split messages into segments before sending them, causing unnecessary overhead.

Expand Down
2 changes: 1 addition & 1 deletion config.h.in
Original file line number Diff line number Diff line change
@@ -1,2 +1,2 @@
#pragma once
#cmakedefine ENABLE_LEADER_REGISTRY
#cmakedefine ENABLE_MEMBER_REGISTRY
41 changes: 18 additions & 23 deletions include/derecho/conf/conf.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -23,16 +23,14 @@ class Conf {
public:

//String constants for config options
#ifdef ENABLE_LEADER_REGISTRY
static constexpr const char* DERECHO_LEADER_REGISTRY_IP = "DERECHO/leader_registry_ip";
static constexpr const char* DERECHO_LEADER_REGISTRY_PORT = "DERECHO/leader_registry_port";
#else
static constexpr const char* DERECHO_LEADER_IP = "DERECHO/leader_ip";
static constexpr const char* DERECHO_LEADER_GMS_PORT = "DERECHO/leader_gms_port";
static constexpr const char* DERECHO_LEADER_EXTERNAL_PORT = "DERECHO/leader_external_port";
static constexpr const char* DERECHO_RESTART_LEADERS = "DERECHO/restart_leaders";
static constexpr const char* DERECHO_RESTART_LEADER_PORTS = "DERECHO/restart_leader_ports";
static constexpr const char* DERECHO_CONTACT_IP = "DERECHO/contact_ip";
static constexpr const char* DERECHO_CONTACT_PORT = "DERECHO/contact_port";
#ifdef ENABLE_MEMBER_REGISTRY
static constexpr const char* DERECHO_MEMBER_REGISTRY_IP = "DERECHO/view_registry_ip";
static constexpr const char* DERECHO_MEMBER_REGISTRY_PORT = "DERECHO/view_registry_port";
#endif
static constexpr const char* DERECHO_RESTART_LEADERS = "DERECHO/restart_leaders";
static constexpr const char* DERECHO_RESTART_LEADER_PORTS = "DERECHO/restart_leader_ports";
static constexpr const char* DERECHO_LOCAL_ID = "DERECHO/local_id";
static constexpr const char* DERECHO_LOCAL_IP = "DERECHO/local_ip";
static constexpr const char* DERECHO_GMS_PORT = "DERECHO/gms_port";
Expand Down Expand Up @@ -86,16 +84,14 @@ class Conf {
// config name --> default value
std::map<const std::string, std::string> config = {
// [DERECHO]
#ifdef ENABLE_LEADER_REGISTRY
{DERECHO_LEADER_REGISTRY_IP, "127.0.0.1"},
{DERECHO_LEADER_REGISTRY_PORT, "50182"},
#else
{DERECHO_LEADER_IP, "127.0.0.1"},
{DERECHO_LEADER_GMS_PORT, "23580"},
{DERECHO_LEADER_EXTERNAL_PORT, "32645"},
#ifdef ENABLE_MEMBER_REGISTRY
{DERECHO_MEMBER_REGISTRY_IP, "127.0.0.1"},
{DERECHO_MEMBER_REGISTRY_PORT, "50182"},
#endif
{DERECHO_CONTACT_IP, "127.0.0.1"},
{DERECHO_CONTACT_PORT, "23580"},
{DERECHO_RESTART_LEADERS, "127.0.0.1"},
{DERECHO_RESTART_LEADER_PORTS, "23580"},
#endif
{DERECHO_LOCAL_ID, "0"},
{DERECHO_LOCAL_IP, "127.0.0.1"},
{DERECHO_GMS_PORT, "23580"},
Expand Down Expand Up @@ -221,20 +217,19 @@ class Conf {
return (this->config.find(key) != this->config.end());
}

#ifdef ENABLE_MEMBER_REGISTRY
/**
* @brief Get leader's ip
* @return A tuple of <leader ip,gms port,external port> representing the information of the current leader.
* @brief Get published
* @return A list of active members.
*/
const std::tuple<std::string,uint16_t,uint16_t> get_leader() const;

#ifdef ENABLE_LEADER_REGISTRY
const std::vector<std::tuple<std::string,uint16_t>> get_active_members() const;
/**
* Push a leader to leader registry
* @param ip The ip address of the new leader.
* @param gms_port The GMS port of the new leader.
* @param ext_port The EXT port of the new leader.
*/
void push_leader(std::string ip,uint16_t gms_port,uint16_t ext_port);
void push_active_members(const std::vector<std::tuple<std::string,uint16_t>>&);
#endif

// Initialize the singleton from the command line and the configuration file.
Expand Down
9 changes: 4 additions & 5 deletions include/derecho/core/detail/external_group_impl.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -254,7 +254,7 @@ ExternalGroupClient<ReplicatedTypes...>::~ExternalGroupClient() {

// wait confirmation from server
bool remove_confirmed;
sock.read(remove_confirmed);
sock.read(remove_confirmed);
} catch(tcp::socket_error&) {
dbg_default_error("Failed to gracefully exit: socket error while sending join request.");
dbg_default_flush();
Expand All @@ -273,24 +273,23 @@ ExternalGroupClient<ReplicatedTypes...>::~ExternalGroupClient() {
template <typename... ReplicatedTypes>
bool ExternalGroupClient<ReplicatedTypes...>::get_view(const node_id_t nid) {
try {
auto leader_tuple = Conf::get()->get_leader();
tcp::socket sock = (nid == INVALID_NODE_ID)
? tcp::socket(std::get<0>(leader_tuple), std::get<1>(leader_tuple))
? tcp::socket(getConfString(Conf::DERECHO_CONTACT_IP), getConfUInt16(Conf::DERECHO_CONTACT_PORT))
: tcp::socket(curr_view->member_ips_and_ports[curr_view->rank_of(nid)].ip_address,
curr_view->member_ips_and_ports[curr_view->rank_of(nid)].gms_port, false);

JoinResponse leader_response;
uint64_t leader_version_hashcode;
sock.exchange(my_version_hashcode, leader_version_hashcode);
if(leader_version_hashcode != my_version_hashcode) {
dbg_default_error("Leader refused connection because Derecho or compiler version did not match! Local version hashcode = {}, leader version hashcode = {}", my_version_hashcode, leader_version_hashcode);
dbg_default_error("Derecho member refused connection because Derecho or compiler version did not match! Local version hashcode = {}, member version hashcode = {}", my_version_hashcode, leader_version_hashcode);
dbg_default_flush();
return false;
}
sock.write(JoinRequest{my_id, true});
sock.read(leader_response);
if(leader_response.code == JoinResponseCode::ID_IN_USE) {
dbg_default_error("Leader refused connection because ID {} is already in use!", my_id);
dbg_default_error("Derecho member refused connection because ID {} is already in use!", my_id);
dbg_default_flush();
return false;
}
Expand Down
5 changes: 0 additions & 5 deletions include/derecho/core/detail/restart_state.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -67,10 +67,6 @@ struct RestartState {
* replicated state.
*/
std::vector<std::vector<int64_t>> restart_shard_leaders;
#ifdef ENABLE_LEADER_REGISTRY
// Leader registry will tell who is the current leader. We don't need to test it in the order of
// "restart leaders".
#else
/**
* List of IP addresses of potential restart leaders (of the overall process)
* in descending priority order
Expand All @@ -85,7 +81,6 @@ struct RestartState {
* into restart_leader_ips.
*/
uint32_t num_leader_failures;
#endif
/**
* Reads the logs stored at this node and initializes logged_ragged_trim
*/
Expand Down
25 changes: 20 additions & 5 deletions include/derecho/core/detail/view_manager.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -543,11 +543,26 @@ class ViewManager {
/** Performs one-time global initialization of RDMC and SST, using the current view's membership. */
void initialize_rdmc_sst();
/**
* Helper for joining an existing group; receives the View and parameters from the leader.
* @return true if the leader successfully sent the View, false if the leader crashed
* (i.e. a socket operation to it failed) before completing the process
* Helper for joining an existing group; sends a join request to the configured
* contact node, handles redirects to the leader, and sends the logged View if
* the leader announces that the group is restarting.
* @return true if the join request interaction completed successfully, false if
* there was a network error while communicating with the leader
*/
bool receive_initial_view();
bool send_join_request();

/**
* Simple wrapper for leader_connection->try_connect that implements a
* timeout correctly. Due to the way sockets work, tcp::socket's try_connect
* will (annoyingly) return immediately if the connection is refused, which
* is the usual result while waiting for a node to start up.
*
* @param ip_address The IP address to try to connect leader_connection to
* @param port The port to try to connect leader_connection to
* @param timeout_ms The time, in milliseconds, to wait for a connection
* @return true if leader_connection connected successfully, false if it timed out
*/
bool try_connect_to_leader(const ip_addr_t& ip_address, uint16_t port, int timeout_ms);

/**
* Constructor helper that initializes TCP connections (for state transfer)
Expand Down Expand Up @@ -797,7 +812,7 @@ class ViewManager {
void register_add_external_connection_upcall(const std::function<void(uint32_t)>& upcall) {
add_external_connection_upcall = upcall;
}

void register_remove_external_connection_upcall(const std::function<void(uint32_t)>& upcall) {
remove_external_connection_upcall = upcall;
}
Expand Down
2 changes: 1 addition & 1 deletion include/derecho/core/external_group.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -145,7 +145,7 @@ class ExternalGroupClient {

/**
* requests a new view from group member nid
* if nid is -1, then request a view from Conf::DERECHO_LEADER_IP
* if nid is -1, then request a view from Conf::DERECHO_CONTACT_IP
* defined in derecho.cfg
*/
bool get_view(const node_id_t nid);
Expand Down
Loading

0 comments on commit d9a10bb

Please sign in to comment.