Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[reconfigurator] Retrieve keeper lgif information #6549

Merged
merged 20 commits into from
Sep 19, 2024
Merged
Show file tree
Hide file tree
Changes from 15 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

17 changes: 15 additions & 2 deletions clickhouse-admin/api/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -3,8 +3,10 @@
// file, You can obtain one at https://mozilla.org/MPL/2.0/.

use clickhouse_admin_types::config::{KeeperConfig, ReplicaConfig};
use clickhouse_admin_types::{KeeperSettings, ServerSettings};
use dropshot::{HttpError, HttpResponseCreated, RequestContext, TypedBody};
use clickhouse_admin_types::{KeeperSettings, Lgif, ServerSettings};
use dropshot::{
HttpError, HttpResponseCreated, HttpResponseOk, RequestContext, TypedBody,
};
use omicron_common::api::external::Generation;
use schemars::JsonSchema;
use serde::Deserialize;
Expand Down Expand Up @@ -50,4 +52,15 @@ pub trait ClickhouseAdminApi {
rqctx: RequestContext<Self::Context>,
body: TypedBody<KeeperConfigurableSettings>,
) -> Result<HttpResponseCreated<KeeperConfig>, HttpError>;

/// Retrieve a logically grouped information file from a keeper node.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this command may be good for debugging/introspection aside from the inventory we need for reconfiguration and we should keep it.

I just want to point out that for inventory we'll likely not use this API command directly, but rather use the underlying keeper request along with another request to /keeper/config to get the current committed configuration. That will eliminate an extra network round trip.

Unfortunately, keeper doesn't seem to provide a way to get the configuration and its log index together in one call. If it did that we could use that for inventory. I think that would be a good patch to make to keeper and the keeper cli, but until then we'll need to make two separate calls. And because of the inherent race condition of two calls, we'll likely have to call at least one endpoint twice to see that the config hasn't changed while polling. We would want to do that local to the sled-agent without a round trip from nexus.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah definitely! I realise there would be another endpoint to collect all the information we need for inventory itself. My reasoning was that the lgif command is useful for many things, and in the spirit of keeping PRs as small and easily digestible as possible, I was going to make a separate PR with that endpoint. On hindsight I should have written that on the PRs description. Sorry about that!

/// This information is used internally by ZooKeeper to manage snapshots
/// and logs for consistency and recovery.
#[endpoint {
method = GET,
path = "/keeper/lgif",
}]
async fn lgif(
rqctx: RequestContext<Self::Context>,
) -> Result<HttpResponseOk<Lgif>, HttpError>;
}
26 changes: 19 additions & 7 deletions clickhouse-admin/src/bin/clickhouse-admin.rs
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
use anyhow::anyhow;
use camino::Utf8PathBuf;
use clap::Parser;
use omicron_clickhouse_admin::{Clickward, Config};
use omicron_clickhouse_admin::{ClickhouseCli, Clickward, Config};
use omicron_common::cmd::fatal;
use omicron_common::cmd::CmdError;
use std::net::{SocketAddr, SocketAddrV6};
Expand All @@ -27,6 +27,14 @@ enum Args {
/// Path to the server configuration file
#[clap(long, short, action)]
config: Utf8PathBuf,

/// Address on which the clickhouse server or keeper is listening on
#[clap(long, short = 'l', action)]
listen_address: SocketAddrV6,

/// Path to the clickhouse binary
#[clap(long, short, action)]
binary_path: Utf8PathBuf,
},
}

Expand All @@ -41,17 +49,21 @@ async fn main_impl() -> Result<(), CmdError> {
let args = Args::parse();

match args {
Args::Run { http_address, config } => {
Args::Run { http_address, config, listen_address, binary_path } => {
let mut config = Config::from_file(&config)
.map_err(|err| CmdError::Failure(anyhow!(err)))?;
config.dropshot.bind_address = SocketAddr::V6(http_address);

let clickward = Clickward::new();
let clickhouse_cli =
ClickhouseCli::new(binary_path, listen_address);

let server =
omicron_clickhouse_admin::start_server(clickward, config)
.await
.map_err(|err| CmdError::Failure(anyhow!(err)))?;
let server = omicron_clickhouse_admin::start_server(
clickward,
clickhouse_cli,
config,
)
.await
.map_err(|err| CmdError::Failure(anyhow!(err)))?;
server.await.map_err(|err| {
CmdError::Failure(anyhow!(
"server failed after starting: {err}"
Expand Down
130 changes: 130 additions & 0 deletions clickhouse-admin/src/clickhouse_cli.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,130 @@
// This Source Code Form is subject to the terms of the Mozilla Public
// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at https://mozilla.org/MPL/2.0/.

use anyhow::Result;
use camino::Utf8PathBuf;
use clickhouse_admin_types::Lgif;
use dropshot::HttpError;
use illumos_utils::{output_to_exec_error, ExecutionError};
use slog::Logger;
use slog_error_chain::{InlineErrorChain, SlogInlineError};
use std::ffi::OsStr;
use std::io;
use std::net::SocketAddrV6;
use tokio::process::Command;

#[derive(Debug, thiserror::Error, SlogInlineError)]
pub enum ClickhouseCliError {
#[error("failed to run `clickhouse {subcommand}`")]
Run {
description: &'static str,
subcommand: String,
#[source]
err: io::Error,
},
#[error(transparent)]
ExecutionError(#[from] ExecutionError),
#[error("failed to parse command output")]
Parse {
description: &'static str,
stdout: String,
stderr: String,
#[source]
err: anyhow::Error,
},
}

impl From<ClickhouseCliError> for HttpError {
fn from(err: ClickhouseCliError) -> Self {
match err {
ClickhouseCliError::Run { .. }
| ClickhouseCliError::Parse { .. }
| ClickhouseCliError::ExecutionError(_) => {
let message = InlineErrorChain::new(&err).to_string();
HttpError {
status_code: http::StatusCode::INTERNAL_SERVER_ERROR,
error_code: Some(String::from("Internal")),
external_message: message.clone(),
internal_message: message,
}
}
}
}
}

#[derive(Debug)]
pub struct ClickhouseCli {
/// Path to where the clickhouse binary is located
pub binary_path: Utf8PathBuf,
/// Address on where the clickhouse keeper is listening on
pub listen_address: SocketAddrV6,
pub log: Option<Logger>,
}

impl ClickhouseCli {
pub fn new(binary_path: Utf8PathBuf, listen_address: SocketAddrV6) -> Self {
Self { binary_path, listen_address, log: None }
}

pub fn with_log(mut self, log: Logger) -> Self {
self.log = Some(log);
self
}

pub async fn lgif(&self) -> Result<Lgif, ClickhouseCliError> {
self.keeper_client_non_interactive(
"lgif",
"Retrieve logically grouped information file",
Lgif::parse,
self.log.clone().unwrap(),
)
.await
}

async fn keeper_client_non_interactive<'a, F, T>(
karencfv marked this conversation as resolved.
Show resolved Hide resolved
&self,
query: &str,
subcommand_description: &'static str,
parse: F,
log: Logger,
) -> Result<T, ClickhouseCliError>
where
F: FnOnce(&Logger, &[u8]) -> Result<T>,
{
let mut command = Command::new(&self.binary_path);
command
.arg("keeper-client")
.arg("--host")
.arg(&format!("[{}]", self.listen_address.ip()))
.arg("--port")
.arg(&format!("{}", self.listen_address.port()))
.arg("--query")
.arg(query);

let output = command.output().await.map_err(|err| {
let args: Vec<&OsStr> = command.as_std().get_args().collect();
let args_parsed: Vec<String> = args
.iter()
.map(|&os_str| os_str.to_str().unwrap().to_owned())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if I wasn't clear. I don't think we should panic here if the string isn't UTF8. Instead we should return an error, especially since this function already returns a Result.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry again. For some reason your comments didn't show up on github last I looked. I realize this is unwrapping now because it's arguments we pass. Nonetheless I'd rather be on the safe side and return an error if possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, to_str doesn't really return an error, it's an Option, is the intention to return an error on None?. The only purpose of args_parsed is to provide information for the ClickhouseCliError::Run error. That's why I thought to_str_lossy made sense, that way we can see in the error exactly what arguments are being passed. So, in a way, I was already returning an error?

Maybe I'm misunderstanding, but it feels a bit strange to return an error because of a malformed error? It seems simpler to collect all the arguments, however they were passed and return a single error with that information like it was in the beginning with to_str_lossy. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, we don't even need these args_parsed at all. I just thought it'd be useful to have them as part of the error message when returning an error, so an user could see exactly what arguments had been passed when running the command. The more I think about it, the more I think it's best to have to_str_lossy to see exactly what's being passed even in somehow a non-unicode character made it's way in somehow.

Unless this information isn't useful as part of the error message? In that case I could remove them altogether 🤔

Copy link
Contributor

@andrewjstone andrewjstone Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, looks like I totally misinterpreted what was happening here. Sorry about that @karencfv! I didn't realize this was just for the error message. You were right using to_str_lossy in the first place. It's useful information to have, even if there's some weird non-utf8 char that gets excluded, which is doubtful anyway. That's better than panicking on unwrap. Sorry for the hassle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😄 nw! I've changed the names of the variables, so it's clearer what those are for

.collect();
let args_str = args_parsed.join(" ");
ClickhouseCliError::Run {
description: subcommand_description,
subcommand: args_str,
err,
}
})?;

if !output.status.success() {
return Err(output_to_exec_error(command.as_std(), &output).into());
}

parse(&log, &output.stdout).map_err(|err| ClickhouseCliError::Parse {
description: subcommand_description,
stdout: String::from_utf8_lossy(&output.stdout).to_string(),
stderr: String::from_utf8_lossy(&output.stdout).to_string(),
err,
})
}
}
15 changes: 12 additions & 3 deletions clickhouse-admin/src/context.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,20 +2,29 @@
// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at https://mozilla.org/MPL/2.0/.

use crate::Clickward;
use crate::{ClickhouseCli, Clickward};
use slog::Logger;

pub struct ServerContext {
clickward: Clickward,
clickhouse_cli: ClickhouseCli,
_log: Logger,
}

impl ServerContext {
pub fn new(clickward: Clickward, _log: Logger) -> Self {
Self { clickward, _log }
pub fn new(
clickward: Clickward,
clickhouse_cli: ClickhouseCli,
_log: Logger,
) -> Self {
Self { clickward, clickhouse_cli, _log }
}

pub fn clickward(&self) -> &Clickward {
&self.clickward
}

pub fn clickhouse_cli(&self) -> &ClickhouseCli {
&self.clickhouse_cli
}
}
13 changes: 12 additions & 1 deletion clickhouse-admin/src/http_entrypoints.rs
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,10 @@
use crate::context::ServerContext;
use clickhouse_admin_api::*;
use clickhouse_admin_types::config::{KeeperConfig, ReplicaConfig};
use dropshot::{HttpError, HttpResponseCreated, RequestContext, TypedBody};
use clickhouse_admin_types::Lgif;
use dropshot::{
HttpError, HttpResponseCreated, HttpResponseOk, RequestContext, TypedBody,
};
use std::sync::Arc;

type ClickhouseApiDescription = dropshot::ApiDescription<Arc<ServerContext>>;
Expand Down Expand Up @@ -44,4 +47,12 @@ impl ClickhouseAdminApi for ClickhouseAdminImpl {
let output = ctx.clickward().generate_keeper_config(keeper.settings)?;
Ok(HttpResponseCreated(output))
}

async fn lgif(
rqctx: RequestContext<Self::Context>,
) -> Result<HttpResponseOk<Lgif>, HttpError> {
let ctx = rqctx.context();
let output = ctx.clickhouse_cli().lgif().await?;
Ok(HttpResponseOk(output))
}
}
5 changes: 5 additions & 0 deletions clickhouse-admin/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -11,11 +11,13 @@ use std::error::Error;
use std::io;
use std::sync::Arc;

mod clickhouse_cli;
mod clickward;
mod config;
mod context;
mod http_entrypoints;

pub use clickhouse_cli::ClickhouseCli;
pub use clickward::Clickward;
pub use config::Config;

Expand All @@ -34,6 +36,7 @@ pub type Server = dropshot::HttpServer<Arc<ServerContext>>;
/// Start the dropshot server
pub async fn start_server(
clickward: Clickward,
clickhouse_cli: ClickhouseCli,
server_config: Config,
) -> Result<Server, StartError> {
let (drain, registration) = slog_dtrace::with_drain(
Expand All @@ -56,6 +59,8 @@ pub async fn start_server(

let context = ServerContext::new(
clickward,
clickhouse_cli
.with_log(log.new(slog::o!("component" => "ClickhouseCli"))),
log.new(slog::o!("component" => "ServerContext")),
);
let http_server_starter = dropshot::HttpServerStarter::new(
Expand Down
5 changes: 5 additions & 0 deletions clickhouse-admin/types/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -18,4 +18,9 @@ omicron-workspace-hack.workspace = true
schemars.workspace = true
serde.workspace = true
serde_json.workspace = true
slog.workspace = true
expectorate.workspace = true

[dev-dependencies]
slog-async.workspace = true
slog-term.workspace = true
Loading
Loading