
pageserver: only throttle pagestream requests & bring back throttling deduction for smgr latency metrics (#9962)

## Problem

In the batching PR 
- #9870

I stopped deducting the time-spent-in-throttle from latency metrics,
i.e.,
- smgr latency metrics (`SmgrOpTimer`)
- basebackup latency (+scan latency, which I think is part of
basebackup).

The reason for stopping the deduction was that, with the introduction of
batching, the trick of tracking time-spent-in-throttle inside
`RequestContext` and swap-replacing it from the `impl Drop for
SmgrOpTimer` no longer worked with more than one request in a batch.

However, deducting time-spent-in-throttle is desirable because our
internal latency SLO definition does not account for throttling.
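
To make the restored behavior concrete, here is a reduced sketch of the deduction as implemented in the `SmgrOpTimer` changes below: each timer accumulates the batch's time-spent-in-throttle and subtracts it from wall-clock time when observed. `Timer` and `observed_latency` are simplified names for illustration; the real code is in `impl Drop for SmgrOpTimer` in the metrics.rs diff.

```rust
use std::time::{Duration, Instant};

// Reduced sketch of the deduction restored by this PR; names simplified
// from the metrics.rs diff below.
struct Timer {
    start: Instant,
    throttled: Duration, // accumulated via deduct_throttle()
}

impl Timer {
    fn observed_latency(&self) -> Duration {
        let elapsed = self.start.elapsed();
        // If throttled time exceeds wall-clock time, that is an
        // implementation error; fall back to the un-deducted elapsed
        // time rather than saturating to zero.
        elapsed.checked_sub(self.throttled).unwrap_or(elapsed)
    }
}
```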

## Summary of changes

- Redefine throttling to be a page_service pagestream request throttle
instead of a throttle for repository `Key` reads through `Timeline::get`
/ `Timeline::get_vectored`.
- This means reads done by `basebackup` are no longer subject to any
throttle.
- The throttle applies after batching, before handling of the request.
- Drive-by fix: make throttle sensitive to cancellation.
- Rename metric label `kind` from `timeline_get` to `pagestream` to
reflect the new scope of throttling.

To avoid config format breakage, we leave the config field named
`timeline_get_throttle` and ignore the `task_kinds` field.
This will be cleaned up in a future PR.
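
For illustration, a reduced sketch of the serde pattern this relies on (mirroring the `rename` + `transparent` attributes in the models.rs diff below, with the numeric fields elided): the wire format keeps the legacy `task_kinds` array, while in memory its contents are only interpreted as enabled (non-empty) vs. disabled (empty).

```rust
use serde::{Deserialize, Serialize};

// Reduced sketch of the backwards-compatible encoding from models.rs.
#[derive(Serialize, Deserialize)]
struct ThrottleConfig {
    #[serde(rename = "task_kinds")]
    enabled: ThrottleConfigTaskKinds,
    // ... initial, refill_interval, refill_amount, max elided ...
}

#[derive(Serialize, Deserialize)]
#[serde(transparent)]
struct ThrottleConfigTaskKinds(Vec<String>);

fn main() {
    let legacy = r#"{"task_kinds":["PageRequestHandler"]}"#;
    let cfg: ThrottleConfig = serde_json::from_str(legacy).unwrap();
    assert!(!cfg.enabled.0.is_empty()); // throttle enabled
}
```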

## Trade-Offs

Ideally, we would apply the throttle before reading a request off the
connection, so that we queue the minimal amount of work inside the
process.
However, that's not possible because we need to do shard routing.

The redefinition of the throttle to limit pagestream request rate
instead of repository `Key` rate comes with several downsides:
- We're no longer able to use the throttle mechanism for other
tasks, e.g. image layer creation.
  However, in practice, we never used that capability anyways.
- We no longer throttle basebackup.
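
The `ThrottleConfig` fields (`initial`, `refill_interval`, `refill_amount`, `max`) suggest token-bucket semantics. Purely as an illustration of how such parameters typically interact, here is a minimal sketch; `TokenBucket` and `try_acquire` are hypothetical, and this is an assumption about the semantics, not the pageserver's actual limiter.

```rust
use std::time::{Duration, Instant};

// Hedged sketch only: a token bucket parameterized like ThrottleConfig,
// using a continuous-refill approximation of the discrete refill interval.
struct TokenBucket {
    tokens: f64,        // current balance, starts at `initial`
    max: f64,           // cap on the balance
    refill_amount: f64, // tokens added per interval
    refill_interval: Duration,
    last_refill: Instant,
}

impl TokenBucket {
    // One token per pagestream request, or one per page in a getpage
    // batch, matching the accounting in BatchedFeMessage::throttle below.
    fn try_acquire(&mut self, n: u32) -> bool {
        let now = Instant::now();
        let intervals = now.duration_since(self.last_refill).as_secs_f64()
            / self.refill_interval.as_secs_f64();
        self.tokens = (self.tokens + intervals * self.refill_amount).min(self.max);
        self.last_refill = now;
        if self.tokens >= f64::from(n) {
            self.tokens -= f64::from(n);
            true
        } else {
            false
        }
    }
}
```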
problame authored Dec 3, 2024
1 parent bbe4dfa commit 4d422b9
Showing 9 changed files with 198 additions and 131 deletions.
58 changes: 56 additions & 2 deletions libs/pageserver_api/src/models.rs
@@ -501,18 +501,48 @@ pub struct EvictionPolicyLayerAccessThreshold {

#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)]
pub struct ThrottleConfig {
pub task_kinds: Vec<String>, // TaskKind
/// See [`ThrottleConfigTaskKinds`] for why we do the serde `rename`.
#[serde(rename = "task_kinds")]
pub enabled: ThrottleConfigTaskKinds,
pub initial: u32,
#[serde(with = "humantime_serde")]
pub refill_interval: Duration,
pub refill_amount: NonZeroU32,
pub max: u32,
}

/// Before <https://github.com/neondatabase/neon/pull/9962>
/// the throttle was applied per `Timeline::get`/`Timeline::get_vectored` call.
/// The `task_kinds` field controlled which Pageserver "Task Kind"s
/// were subject to the throttle.
///
/// After that PR, the throttle is applied at pagestream request level
/// and the `task_kinds` field does not apply since the only task kind
/// that is subject to the throttle is that of the page service.
///
/// However, we don't want to make a breaking config change right now
/// because it means we have to migrate all the tenant configs.
/// This will be done in a future PR.
///
/// In the meantime, we use emptiness / non-emptiness of the `task_kinds`
/// field to determine if the throttle is enabled or not.
#[derive(Serialize, Deserialize, Debug, Clone, PartialEq, Eq)]
#[serde(transparent)]
pub struct ThrottleConfigTaskKinds(Vec<String>);

impl ThrottleConfigTaskKinds {
pub fn disabled() -> Self {
Self(vec![])
}
pub fn is_enabled(&self) -> bool {
!self.0.is_empty()
}
}

impl ThrottleConfig {
pub fn disabled() -> Self {
Self {
task_kinds: vec![], // effectively disables the throttle
enabled: ThrottleConfigTaskKinds::disabled(),
// other values don't matter with empty `task_kinds`.
initial: 0,
refill_interval: Duration::from_millis(1),
@@ -526,6 +556,30 @@ impl ThrottleConfig {
}
}

#[cfg(test)]
mod throttle_config_tests {
use super::*;

#[test]
fn test_disabled_is_disabled() {
let config = ThrottleConfig::disabled();
assert!(!config.enabled.is_enabled());
}
#[test]
fn test_enabled_backwards_compat() {
let input = serde_json::json!({
"task_kinds": ["PageRequestHandler"],
"initial": 40000,
"refill_interval": "50ms",
"refill_amount": 1000,
"max": 40000,
"fair": true
});
let config: ThrottleConfig = serde_json::from_value(input).unwrap();
assert!(config.enabled.is_enabled());
}
}

/// A flattened analog of a `pageserver::tenant::LocationMode`, which
/// lists out all possible states (and the virtual "Detached" state)
/// in a flat form rather than using rust-style enums.
95 changes: 59 additions & 36 deletions pageserver/src/metrics.rs
@@ -217,31 +217,16 @@ impl<'a> ScanLatencyOngoingRecording<'a> {
ScanLatencyOngoingRecording { parent, start }
}

pub(crate) fn observe(self, throttled: Option<Duration>) {
pub(crate) fn observe(self) {
let elapsed = self.start.elapsed();
let ex_throttled = if let Some(throttled) = throttled {
elapsed.checked_sub(throttled)
} else {
Some(elapsed)
};
if let Some(ex_throttled) = ex_throttled {
self.parent.observe(ex_throttled.as_secs_f64());
} else {
use utils::rate_limit::RateLimit;
static LOGGED: Lazy<Mutex<RateLimit>> =
Lazy::new(|| Mutex::new(RateLimit::new(Duration::from_secs(10))));
let mut rate_limit = LOGGED.lock().unwrap();
rate_limit.call(|| {
warn!("error deducting time spent throttled; this message is logged at a global rate limit");
});
}
self.parent.observe(elapsed.as_secs_f64());
}
}

pub(crate) static GET_VECTORED_LATENCY: Lazy<GetVectoredLatency> = Lazy::new(|| {
let inner = register_histogram_vec!(
"pageserver_get_vectored_seconds",
"Time spent in get_vectored, excluding time spent in timeline_get_throttle.",
"Time spent in get_vectored.",
&["task_kind"],
CRITICAL_OP_BUCKETS.into(),
)
@@ -264,7 +249,7 @@ pub(crate) static GET_VECTORED_LATENCY: Lazy<GetVectoredLatency> = Lazy::new(||
pub(crate) static SCAN_LATENCY: Lazy<ScanLatency> = Lazy::new(|| {
let inner = register_histogram_vec!(
"pageserver_scan_seconds",
"Time spent in scan, excluding time spent in timeline_get_throttle.",
"Time spent in scan.",
&["task_kind"],
CRITICAL_OP_BUCKETS.into(),
)
@@ -1227,11 +1212,44 @@ pub(crate) struct SmgrOpTimer {
per_timeline_latency_histo: Option<Histogram>,

start: Instant,
throttled: Duration,
op: SmgrQueryType,
}

impl SmgrOpTimer {
pub(crate) fn deduct_throttle(&mut self, throttle: &Option<Duration>) {
let Some(throttle) = throttle else {
return;
};
self.throttled += *throttle;
}
}

impl Drop for SmgrOpTimer {
fn drop(&mut self) {
let elapsed = self.start.elapsed().as_secs_f64();
let elapsed = self.start.elapsed();

let elapsed = match elapsed.checked_sub(self.throttled) {
Some(elapsed) => elapsed,
None => {
use utils::rate_limit::RateLimit;
static LOGGED: Lazy<Mutex<enum_map::EnumMap<SmgrQueryType, RateLimit>>> =
Lazy::new(|| {
Mutex::new(enum_map::EnumMap::from_array(std::array::from_fn(|_| {
RateLimit::new(Duration::from_secs(10))
})))
});
let mut guard = LOGGED.lock().unwrap();
let rate_limit = &mut guard[self.op];
rate_limit.call(|| {
warn!(op=?self.op, ?elapsed, ?self.throttled, "implementation error: time spent throttled exceeds total request wall clock time");
});
elapsed // un-throttled time, more info than just saturating to 0
}
};

let elapsed = elapsed.as_secs_f64();

self.global_latency_histo.observe(elapsed);
if let Some(per_timeline_getpage_histo) = &self.per_timeline_latency_histo {
per_timeline_getpage_histo.observe(elapsed);
@@ -1491,6 +1509,8 @@ impl SmgrQueryTimePerTimeline {
global_latency_histo: self.global_latency[op as usize].clone(),
per_timeline_latency_histo,
start: started_at,
op,
throttled: Duration::ZERO,
}
}

@@ -3299,7 +3319,7 @@ pub(crate) mod tenant_throttling {
use once_cell::sync::Lazy;
use utils::shard::TenantShardId;

use crate::tenant::{self, throttle::Metric};
use crate::tenant::{self};

struct GlobalAndPerTenantIntCounter {
global: IntCounter,
@@ -3318,7 +3338,7 @@
}
}

pub(crate) struct TimelineGet {
pub(crate) struct Metrics<const KIND: usize> {
count_accounted_start: GlobalAndPerTenantIntCounter,
count_accounted_finish: GlobalAndPerTenantIntCounter,
wait_time: GlobalAndPerTenantIntCounter,
@@ -3391,40 +3411,41 @@ pub(crate) mod tenant_throttling {
.unwrap()
});

const KIND: &str = "timeline_get";
const KINDS: &[&str] = &["pagestream"];
pub type Pagestream = Metrics<0>;

impl TimelineGet {
impl<const KIND: usize> Metrics<KIND> {
pub(crate) fn new(tenant_shard_id: &TenantShardId) -> Self {
let per_tenant_label_values = &[
KIND,
KINDS[KIND],
&tenant_shard_id.tenant_id.to_string(),
&tenant_shard_id.shard_slug().to_string(),
];
TimelineGet {
Metrics {
count_accounted_start: {
GlobalAndPerTenantIntCounter {
global: COUNT_ACCOUNTED_START.with_label_values(&[KIND]),
global: COUNT_ACCOUNTED_START.with_label_values(&[KINDS[KIND]]),
per_tenant: COUNT_ACCOUNTED_START_PER_TENANT
.with_label_values(per_tenant_label_values),
}
},
count_accounted_finish: {
GlobalAndPerTenantIntCounter {
global: COUNT_ACCOUNTED_FINISH.with_label_values(&[KIND]),
global: COUNT_ACCOUNTED_FINISH.with_label_values(&[KINDS[KIND]]),
per_tenant: COUNT_ACCOUNTED_FINISH_PER_TENANT
.with_label_values(per_tenant_label_values),
}
},
wait_time: {
GlobalAndPerTenantIntCounter {
global: WAIT_USECS.with_label_values(&[KIND]),
global: WAIT_USECS.with_label_values(&[KINDS[KIND]]),
per_tenant: WAIT_USECS_PER_TENANT
.with_label_values(per_tenant_label_values),
}
},
count_throttled: {
GlobalAndPerTenantIntCounter {
global: WAIT_COUNT.with_label_values(&[KIND]),
global: WAIT_COUNT.with_label_values(&[KINDS[KIND]]),
per_tenant: WAIT_COUNT_PER_TENANT
.with_label_values(per_tenant_label_values),
}
@@ -3447,15 +3468,17 @@
&WAIT_USECS_PER_TENANT,
&WAIT_COUNT_PER_TENANT,
] {
let _ = m.remove_label_values(&[
KIND,
&tenant_shard_id.tenant_id.to_string(),
&tenant_shard_id.shard_slug().to_string(),
]);
for kind in KINDS {
let _ = m.remove_label_values(&[
kind,
&tenant_shard_id.tenant_id.to_string(),
&tenant_shard_id.shard_slug().to_string(),
]);
}
}
}

impl Metric for TimelineGet {
impl<const KIND: usize> tenant::throttle::Metric for Metrics<KIND> {
#[inline(always)]
fn accounting_start(&self) {
self.count_accounted_start.inc();
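A reduced sketch of the const-generic labeling pattern introduced in the metrics.rs diff above: the const parameter selects the metric label at compile time by indexing a static table, so adding a new throttle kind is just another `KINDS` entry plus a type alias. `Metrics`, `KINDS`, and `Pagestream` mirror names from the diff; the `label` field stands in for the real counter fields.

```rust
// Reduced sketch of the const-generic metric-label pattern.
const KINDS: &[&str] = &["pagestream"];

struct Metrics<const KIND: usize> {
    label: &'static str,
}

impl<const KIND: usize> Metrics<KIND> {
    fn new() -> Self {
        // Pick the label string at compile time via the const parameter.
        Self { label: KINDS[KIND] }
    }
}

// Matches the alias in the diff: kind 0 is the pagestream throttle.
type Pagestream = Metrics<0>;

fn main() {
    assert_eq!(Pagestream::new().label, "pagestream");
}
```
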
45 changes: 43 additions & 2 deletions pageserver/src/page_service.rs
@@ -574,6 +574,41 @@ enum BatchedFeMessage {
},
}

impl BatchedFeMessage {
async fn throttle(&mut self, cancel: &CancellationToken) -> Result<(), QueryError> {
let (shard, tokens, timers) = match self {
BatchedFeMessage::Exists { shard, timer, .. }
| BatchedFeMessage::Nblocks { shard, timer, .. }
| BatchedFeMessage::DbSize { shard, timer, .. }
| BatchedFeMessage::GetSlruSegment { shard, timer, .. } => {
(
shard,
// 1 token is probably under-estimating because these
// request handlers typically do several Timeline::get calls.
1,
itertools::Either::Left(std::iter::once(timer)),
)
}
BatchedFeMessage::GetPage { shard, pages, .. } => (
shard,
pages.len(),
itertools::Either::Right(pages.iter_mut().map(|(_, _, timer)| timer)),
),
BatchedFeMessage::RespondError { .. } => return Ok(()),
};
let throttled = tokio::select! {
throttled = shard.pagestream_throttle.throttle(tokens) => { throttled }
_ = cancel.cancelled() => {
return Err(QueryError::Shutdown);
}
};
for timer in timers {
timer.deduct_throttle(&throttled);
}
Ok(())
}
}

impl PageServerHandler {
pub fn new(
tenant_manager: Arc<TenantManager>,
@@ -1157,13 +1192,18 @@ impl PageServerHandler {
Ok(msg) => msg,
Err(e) => break e,
};
let msg = match msg {
let mut msg = match msg {
Some(msg) => msg,
None => {
debug!("pagestream subprotocol end observed");
return ((pgb_reader, timeline_handles), Ok(()));
}
};

if let Err(cancelled) = msg.throttle(&self.cancel).await {
break cancelled;
}

let err = self
.pagesteam_handle_batched_message(pgb_writer, msg, &cancel, ctx)
.await;
@@ -1321,12 +1361,13 @@ impl PageServerHandler {
return Ok(());
}
};
let batch = match batch {
let mut batch = match batch {
Ok(batch) => batch,
Err(e) => {
return Err(e);
}
};
batch.throttle(&self.cancel).await?;
self.pagesteam_handle_batched_message(pgb_writer, batch, &cancel, &ctx)
.await?;
}
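A reduced sketch of the cancellation-sensitive wait in `BatchedFeMessage::throttle` above: the throttle acquisition races a `CancellationToken`, so shutdown is never queued behind the rate limiter. Here `throttle` and `throttle_or_shutdown` are hypothetical stand-ins for `shard.pagestream_throttle.throttle(tokens)` and the surrounding method.

```rust
use std::time::Duration;
use tokio_util::sync::CancellationToken;

// Hypothetical stand-in for the pagestream throttle: resolves once the
// requested tokens are granted, returning how long the caller waited
// (None if it never had to wait).
async fn throttle(_tokens: usize) -> Option<Duration> {
    None
}

// Mirrors the select! in BatchedFeMessage::throttle: cancellation wins
// the race, so a shutting-down connection is not blocked on the limiter.
async fn throttle_or_shutdown(
    cancel: &CancellationToken,
    tokens: usize,
) -> Result<Option<Duration>, &'static str> {
    tokio::select! {
        waited = throttle(tokens) => Ok(waited),
        _ = cancel.cancelled() => Err("shutdown"),
    }
}
```
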

1 comment on commit 4d422b9

@github-actions

6506 tests run: 6206 passed, 0 failed, 300 skipped (full report)


Flaky tests (4)


Code coverage* (full report)

  • functions: 30.7% (8265 of 26917 functions)
  • lines: 47.7% (65164 of 136574 lines)

* collected from Rust tests only


