Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Improvement] Return early in fast_round when the amount of KeyConflict large than or equal to recover quorum #825

Closed
Phoenix500526 opened this issue May 23, 2024 · 2 comments · Fixed by #835
Assignees
Labels
good first issue Good for newcomers Improvement Refactor or optimization, including process, performance or something like that

Comments

@Phoenix500526
Copy link
Collaborator

Currently, the fast_round in crates/curp/src/client/unary.rs looks like

/// Send proposal to all servers
pub(super) async fn fast_round(
    &self,
    propose_id: ProposeId,
    cmd: &C,
    token: Option<&String>,
) -> Result<Result<C::ER, C::Error>, CurpError> {
    let req = ProposeRequest::new(propose_id, cmd, self.state.cluster_version().await);
    let timeout = self.config.propose_timeout;

    let mut responses = self
        .state
        .for_each_server(|conn| {
            let req_c = req.clone();
            let token_c = token.cloned();
            async move { (conn.id(), conn.propose(req_c, token_c, timeout).await) }
        })
        .await;
    let super_quorum = super_quorum(responses.len());

    let mut err: Option<CurpError> = None;
    let mut execute_result: Option<C::ER> = None;
    let mut ok_cnt = 0;

    while let Some((id, resp)) = responses.next().await {
        let resp = match resp {
            Ok(resp) => resp.into_inner(),
            Err(e) => {
                warn!("propose cmd({propose_id}) to server({id}) error: {e:?}");
                if e.should_abort_fast_round() {
                    return Err(e);
                }
                if let Some(old_err) = err.as_ref() {
                    if old_err.priority() <= e.priority() {
                        err = Some(e);
                    }
                } else {
                    err = Some(e);
                }
                continue;
            }
        };
        let deserialize_res = resp.map_result::<C, _, Result<(), C::Error>>(|res| {
            let er = match res {
                Ok(er) => er,
                Err(cmd_err) => return Err(cmd_err),
            };
            if let Some(er) = er {
                assert!(execute_result.is_none(), "should not set exe result twice");
                execute_result = Some(er);
            }
            ok_cnt.add_assign(1);
            Ok(())
        });
        let dr = match deserialize_res {
            Ok(dr) => dr,
            Err(ser_err) => {
                warn!("serialize error: {ser_err}");
                // We blame this error to the server, although it may be a local error.
                // We need to retry as same as a server error.
                err = Some(CurpError::from(ser_err));
                continue;
            }
        };
        if let Err(cmd_err) = dr {
            // got a command execution error early, abort the next requests and return the cmd error
            return Ok(Err(cmd_err));
        }
        // if the propose meets the super quorum and we got the execute result,
        // that means we can safely abort the next requests
        if ok_cnt >= super_quorum {
            if let Some(er) = execute_result {
                debug!("fast round for cmd({}) succeed", propose_id);
                return Ok(Ok(er));
            }
        }
    }

    if let Some(err) = err {
        return Err(err);
    }

    // We will at least send the request to the leader if no `WrongClusterVersion` returned.
    // If no errors occur, the leader should return the ER
    // If it is because the super quorum has not been reached, an error will definitely occur.
    // Otherwise, there is no leader in the cluster state currently, return wrong cluster version
    // and attempt to retrieve the cluster state again.
    Err(CurpError::wrong_cluster_version())
}

As you can see, all the KeyConflict errors are ignored. We can count the KeyConflict errors. If the number of KeyConflict errors is equal to or larger than the recover_quorum, which is about a quarter of the number of nodes in a cluster, we can return early from the fast round since the superquorum condition will never be met.

@Phoenix500526 Phoenix500526 added good first issue Good for newcomers Improvement Refactor or optimization, including process, performance or something like that labels May 23, 2024
Copy link

👋 Thanks for opening this issue!

Reply with the following command on its own line to get help or engage:

  • /contributing-agreement : to print Contributing Agreements.
  • /assignme : to assign this issue to you.

@poltao
Copy link
Contributor

poltao commented Jun 3, 2024

/assignme

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers Improvement Refactor or optimization, including process, performance or something like that
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants