monotonic counters should store values as uint64_t
We have use cases in the environment where monotonic counters are stored as
uint64_t values, particularly for Open Connect Appliances. Since the monotonic
counter data types in spectatord are convenience wrappers that help users avoid
the need to calculate their own deltas, they should be uint64_t types here. The
actual deltas that are calculated and sent to the backend are doubles, which
matches the backend data type used by Atlas. This is acceptable, because the
deltas are expected to be much smaller than the raw counter values.

Some of the newer thin clients, such as spectator-go, specify the uint64 data
type for monotonic counters.
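
The delta calculation described above can be sketched as follows. This is a minimal illustration, not the spectatord implementation: the `MonotonicDelta` class and its `Update` method are hypothetical names. It assumes that a rollover means a single wrap past the maximum `uint64_t` value, in which case unsigned subtraction still yields the correct delta.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper (not part of spectatord): remembers the previous
// sample of a monotonic source and reports the delta as a double.
class MonotonicDelta {
 public:
  // Returns std::nullopt for the first sample, because a delta needs two.
  std::optional<double> Update(uint64_t sample) {
    if (!has_prev_) {
      prev_ = sample;
      has_prev_ = true;
      return std::nullopt;
    }
    // Unsigned subtraction is performed modulo 2^64, so a single wrap
    // past UINT64_MAX still produces the correct positive delta.
    uint64_t delta = sample - prev_;
    prev_ = sample;
    // The delta is converted to double, matching the backend data type.
    return static_cast<double>(delta);
  }

 private:
  uint64_t prev_ = 0;
  bool has_prev_ = false;
};
```

With this sketch, feeding `UINT64_MAX - 1` and then `3` yields no value for the first call and a delta of `5.0` for the second, since the source advanced five steps across the wrap.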
copperlight committed Jun 3, 2024
1 parent 02c19b3 commit d30a967
Showing 12 changed files with 165 additions and 109 deletions.
27 changes: 15 additions & 12 deletions README.md
@@ -89,18 +89,21 @@ echo -e "t:server.requestLatency:0.042\nd:server.responseSizes:1024" | nc -u -w0

### Metric Types

| Symbol | Metric Type | Description |
|--------|------------|-------------|
| `c` | Counter | The value is the number of increments that have occurred since the last time it was recorded. |
| `d` | Distribution Summary | The value tracks the distribution of events. It is similar to a Timer, but more general, because the size does not have to be a period of time. <br><br> For example, it can be used to measure the payload sizes of requests hitting a server or the number of records returned from a query. |
| `g` | Gauge | The value is a number that was sampled at a point in time. The default time-to-live (TTL) for gauges is 900 seconds (15 minutes) - they will continue reporting the last value set for this duration of time. <br><br> Optionally, the TTL may be specified in seconds, with a minimum TTL of 5 seconds. For example, `g,120:gauge:42.0` specifies a gauge with a 120 second (2 minute) TTL. |
| `m` | Max Gauge | The value is a number that was sampled at a point in time, but it is reported as a maximum gauge value to the backend. |
| `t` | Timer | The value is the number of seconds that have elapsed for an event. |
| `A` | Age Gauge | The value is the time in seconds since the epoch at which an event has successfully occurred, or `0` to use the current time in epoch seconds. After an Age Gauge has been set, it will continue reporting the number of seconds since the last time recorded, for as long as the spectatord process runs. The purpose of this metric type is to enable users to more easily implement the Time Since Last Success alerting pattern. <br><br> To set a specific time as the last success: `A:time.sinceLastSuccess:1611081000`. <br><br> To set `now()` as the last success: `A:time.sinceLastSuccess:0`. <br><br> By default, a maximum of `1000` Age Gauges are allowed per `spectatord` process, because there is no mechanism for cleaning them up. This value may be tuned with the `--age_gauge_limit` flag on the spectatord binary. |
| `C` | Monotonic Counter | The value is a monotonically increasing number. A minimum of two samples must be received in order for `spectatord` to calculate a delta value and report it to the backend. <br><br> A variety of networking metrics may be reported monotically and this metric type provides a convenient means of recording these values, at the expense of a slower time-to-first metric. |
| `D` | Percentile Distribution Summary | The value tracks the distribution of events, with percentile estimates. It is similar to a Percentile Timer, but more general, because the size does not have to be a period of time. <br><br> For example, it can be used to measure the payload sizes of requests hitting a server or the number of records returned from a query. <br><br> In order to maintain the data distribution, they have a higher storage cost, with a worst-case of up to 300X that of a standard Distribution Summary. Be diligent about any additional dimensions added to Percentile Distribution Summaries and ensure that they have a small bounded cardinality. |
| `T` | Percentile Timer | The value is the number of seconds that have elapsed for an event, with percentile estimates. <br><br> This metric type will track the data distribution by maintaining a set of Counters. The distribution can then be used on the server side to estimate percentiles, while still allowing for arbitrary slicing and dicing based on dimensions. <br><br> In order to maintain the data distribution, they have a higher storage cost, with a worst-case of up to 300X that of a standard Timer. Be diligent about any additional dimensions added to Percentile Timers and ensure that they have a small bounded cardinality. |
| `X` | Monotonic Counter with Millisecond Timestamps | The value is a monotonically increasing number, sampled at a specified number of milliseconds since the epoch. A minimum of two samples must be received in order for `spectatord` to calculate a delta value and report it to the backend. <br><br> This is an experimental metric type that can be used to track monotonic sources that were sampled in the recent past, with the value normalized over the reported time period. <br><br> The timestamp in milliseconds since the epoch when the value was sampled must be included as a metric option: `X,1543160297100:monotonic.Source:42` |
| Symbol | Metric Type | Description |
|--------|-----------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `c` | Counter | The value is the number of increments that have occurred since the last time it was recorded. |
| `d` | Distribution Summary | The value tracks the distribution of events. It is similar to a Timer, but more general, because the size does not have to be a period of time. <br><br> For example, it can be used to measure the payload sizes of requests hitting a server or the number of records returned from a query. |
| `g` | Gauge | The value is a number that was sampled at a point in time. The default time-to-live (TTL) for gauges is 900 seconds (15 minutes) - they will continue reporting the last value set for this duration of time. <br><br> Optionally, the TTL may be specified in seconds, with a minimum TTL of 5 seconds. For example, `g,120:gauge:42.0` specifies a gauge with a 120 second (2 minute) TTL. |
| `m` | Max Gauge | The value is a number that was sampled at a point in time, but it is reported as a maximum gauge value to the backend. |
| `t` | Timer | The value is the number of seconds that have elapsed for an event. |
| `A` | Age Gauge | The value is the time in seconds since the epoch at which an event has successfully occurred, or `0` to use the current time in epoch seconds. After an Age Gauge has been set, it will continue reporting the number of seconds since the last time recorded, for as long as the spectatord process runs. The purpose of this metric type is to enable users to more easily implement the Time Since Last Success alerting pattern. <br><br> To set a specific time as the last success: `A:time.sinceLastSuccess:1611081000`. <br><br> To set `now()` as the last success: `A:time.sinceLastSuccess:0`. <br><br> By default, a maximum of `1000` Age Gauges are allowed per `spectatord` process, because there is no mechanism for cleaning them up. This value may be tuned with the `--age_gauge_limit` flag on the spectatord binary. |
| `C` | Monotonic Counter | The value is a monotonically increasing number. A minimum of two samples must be received in order for `spectatord` to calculate a delta value and report it to the backend. The value should be a `uint64` data type, and it will handle rollovers. <br><br> A variety of networking metrics may be reported monotonically and this metric type provides a convenient means of recording these values, at the expense of a slower time-to-first metric. |
| `D` | Percentile Distribution Summary | The value tracks the distribution of events, with percentile estimates. It is similar to a Percentile Timer, but more general, because the size does not have to be a period of time. <br><br> For example, it can be used to measure the payload sizes of requests hitting a server or the number of records returned from a query. <br><br> In order to maintain the data distribution, they have a higher storage cost, with a worst-case of up to 300X that of a standard Distribution Summary. Be diligent about any additional dimensions added to Percentile Distribution Summaries and ensure that they have a small bounded cardinality. |
| `T` | Percentile Timer | The value is the number of seconds that have elapsed for an event, with percentile estimates. <br><br> This metric type will track the data distribution by maintaining a set of Counters. The distribution can then be used on the server side to estimate percentiles, while still allowing for arbitrary slicing and dicing based on dimensions. <br><br> In order to maintain the data distribution, they have a higher storage cost, with a worst-case of up to 300X that of a standard Timer. Be diligent about any additional dimensions added to Percentile Timers and ensure that they have a small bounded cardinality. |
| `X` | Monotonic Counter with Millisecond Timestamps | The value is a monotonically increasing number, sampled at a specified number of milliseconds since the epoch. A minimum of two samples must be received in order for `spectatord` to calculate a delta value and report it to the backend. The value should be a `uint64` data type, and it will handle rollovers. <br><br> This is an experimental metric type that can be used to track monotonic sources that were sampled in the recent past, with the value normalized over the reported time period. <br><br> The timestamp in milliseconds since the epoch when the value was sampled must be included as a metric option: `X,1543160297100:monotonic.Source:42` |

The data type for all numbers except `C` and `X` is `double`. The `C` and `X` values are recorded as `uint64_t`, and
the calculated deltas are passed to the backend as `double`.
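
One likely reason for recording `C` and `X` values as `uint64_t` rather than `double` is precision: a `double` has a 53-bit significand, so not every integer above 2^53 can be represented exactly. The sketch below illustrates this with a hypothetical helper, `SurvivesDoubleRoundTrip`, which is not part of spectatord.

```cpp
#include <cstdint>

// Hypothetical helper (not part of spectatord): returns true when a
// uint64_t value is unchanged after a round trip through double.
bool SurvivesDoubleRoundTrip(uint64_t v) {
  auto d = static_cast<double>(v);
  // Values near UINT64_MAX round up to 2^64 as a double, which cannot
  // be converted back to uint64_t, so treat them as not surviving.
  if (d >= 18446744073709551616.0) {
    return false;
  }
  return static_cast<uint64_t>(d) == v;
}
```

Under these assumptions, `1ULL << 53` survives the round trip while `(1ULL << 53) + 1` does not. Large raw counter values could therefore lose low-order bits if stored as `double`, whereas the deltas between samples are expected to be small enough to avoid this.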

### Metric Name and Tags

50 changes: 30 additions & 20 deletions server/spectatord.cc
@@ -217,7 +217,7 @@ auto Server::parse_statsd_line(const char* buffer)
return {};
}

auto get_measurement(std::string_view measurement_str, std::string* err_msg)
auto get_measurement(char type, std::string_view measurement_str, std::string* err_msg)
-> std::optional<measurement> {
// get name (tags are specified with , but are optional)
auto pos = measurement_str.find_first_of(",:");
@@ -246,20 +246,29 @@ auto get_measurement(std::string_view measurement_str, std::string* err_msg)
pos = v_pos;
}
}

++pos;
auto value_str = measurement_str.begin() + pos;
char* last_char = nullptr;
auto value = std::strtod(value_str, &last_char);
valueT value{};
if (type == 'C' || type == 'X') {
value.u = std::strtoull(value_str, &last_char, 10);
} else {
value.d = std::strtod(value_str, &last_char);
}

if (last_char == value_str) {
// unable to parse a numeric value (uint64_t for C and X, double otherwise)
*err_msg = "Unable to parse value for measurement";
return {};
}
if (*last_char != '\0' && std::isspace(*last_char) == 0) {
// just a warning
*err_msg =
fmt::format("Got {} parsing value, ignoring chars starting at {}",
value, last_char);
if (type == 'C' || type == 'X') {
*err_msg = fmt::format("Got {} parsing value, ignoring chars starting at {}",
value.u, last_char);
} else {
*err_msg = fmt::format("Got {} parsing value, ignoring chars starting at {}",
value.d, last_char);
}
}
auto name_ref = spectator::intern_str(name);
return measurement{spectator::Id{name_ref, tags}, value};
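
The type-dependent parsing in this hunk can be isolated into a small standalone sketch. The names `ValueSketch` and `ParseValueSketch` are hypothetical, and the union is only an assumed shape for `valueT`, whose definition is not shown in this part of the diff.

```cpp
#include <cstdint>
#include <cstdlib>
#include <optional>

// Assumed shape of the value type; the real valueT is defined elsewhere
// in the commit and may differ.
union ValueSketch {
  uint64_t u;
  double d;
};

// Hypothetical helper: parses value_str as uint64_t for the 'C' and 'X'
// metric types, and as double for every other type. Returns std::nullopt
// when no characters could be consumed.
std::optional<ValueSketch> ParseValueSketch(char type, const char* value_str) {
  char* last_char = nullptr;
  ValueSketch value{};
  if (type == 'C' || type == 'X') {
    value.u = std::strtoull(value_str, &last_char, 10);
  } else {
    value.d = std::strtod(value_str, &last_char);
  }
  if (last_char == value_str) {
    // Nothing was parsed, so there is no usable value.
    return std::nullopt;
  }
  return value;
}
```

A caller reads `.u` for `C` and `X` and `.d` otherwise, as the dispatch code in this commit does. Note that `std::strtoull` accepts a leading minus sign and returns the negated value in the unsigned type, so a caller that wants to reject negative input would need an explicit check.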
@@ -502,7 +511,7 @@ auto Server::parse_line(const char* buffer) -> std::optional<std::string> {
}
++p;
std::string err_msg;
auto measurement = get_measurement(p, &err_msg);
auto measurement = get_measurement(type, p, &err_msg);
if (!measurement) {
return err_msg;
}
@@ -515,61 +524,62 @@ auto Server::parse_line(const char* buffer) -> std::optional<std::string> {
case 't':
// timer, elapsed time is reported in seconds
{
auto nanos = static_cast<int64_t>(measurement->value * 1e9);
auto nanos = static_cast<int64_t>(measurement->value.d * 1e9);
registry_->GetTimer(measurement->id)
->Record(std::chrono::nanoseconds(nanos));
}
break;
case 'c':
// counter
registry_->GetCounter(measurement->id)->Add(measurement->value);
registry_->GetCounter(measurement->id)->Add(measurement->value.d);
break;
case 'C':
// monotonic counters
registry_->GetMonotonicCounter(measurement->id)->Set(measurement->value);
registry_->GetMonotonicCounter(measurement->id)->Set(measurement->value.u);
break;
case 'g':
// gauge
if (extra > 0) {
registry_->GetGauge(measurement->id, absl::Seconds(extra))
->Set(measurement->value);
->Set(measurement->value.d);
} else {
// this preserves the previous Ttl, otherwise we would override it
// with the default value if we use the previous constructor
registry_->GetGauge(measurement->id)->Set(measurement->value);
registry_->GetGauge(measurement->id)->Set(measurement->value.d);
}
break;
case 'm':
registry_->GetMaxGauge(measurement->id)->Update(measurement->value);
registry_->GetMaxGauge(measurement->id)->Update(measurement->value.d);
break;
case 'A':
if (measurement->value == 0) {
if (measurement->value.d == 0) {
registry_->GetAgeGauge(measurement->id)->UpdateLastSuccess();
} else {
registry_->GetAgeGauge(measurement->id)
->UpdateLastSuccess(static_cast<int64_t>(measurement->value * 1e9));
->UpdateLastSuccess(static_cast<int64_t>(measurement->value.d * 1e9));
}
break;
case 'd':
// dist summary
registry_->GetDistributionSummary(measurement->id)
->Record(measurement->value);
->Record(measurement->value.d);
break;
case 'T': {
auto nanos = static_cast<int64_t>(measurement->value * 1e9);
auto nanos = static_cast<int64_t>(measurement->value.d * 1e9);
perc_timers_.get_or_create(registry_, measurement->id)
->Record(std::chrono::nanoseconds(nanos));
} break;
case 'D':
perc_ds_.get_or_create(registry_, measurement->id)
->Record(static_cast<int64_t>(measurement->value));
->Record(static_cast<int64_t>(measurement->value.d));
break;
case 'X':
if (extra > 0) {
// TODO: do we need to get this to match uint64_t?
// extra is milliseconds since the epoch
auto nanos = extra * 1000 * 1000;
registry_->GetMonotonicSampled(measurement->id)
->Set(measurement->value, nanos);
->Set(measurement->value.u, nanos);
}
break;
default: