rel: prep 2.7 release (#1255)
Prep 2.7 release

---------

Co-authored-by: Phillip Carter <[email protected]>
kentquirk and cartermp authored Jul 29, 2024
1 parent 70781b0 commit e24f371
Showing 11 changed files with 179 additions and 50 deletions.
50 changes: 50 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,55 @@
# Refinery Changelog

## 2.7.0 2024-07-29

This release incorporates a new publish/subscribe (pubsub) system for faster and cleaner communication between Refinery nodes.
In particular, the way Refinery uses Redis has changed.
See full details in [the Release Notes](./RELEASE_NOTES.md).

### Features

- feat: Add metrics to pubsub and peers (#1226) | [Kent Quirk](https://github.com/kentquirk)
- feat: add otel tracing support for Refinery internal operations (#1218) | [Yingrong Zhao](https://github.com/vinozzZ)
- feat: Add some useful generics (#1206) | [Kent Quirk](https://github.com/kentquirk)
- feat: gossip config reload information (#1241) | [Kent Quirk](https://github.com/kentquirk)
- feat: Health/Ready system imported from R3 (#1231) | [Kent Quirk](https://github.com/kentquirk)
- feat: peer management on pubsub via callbacks (#1220) | [Kent Quirk](https://github.com/kentquirk)
- feat: track config hash on config reload (#1212) | [Yingrong Zhao](https://github.com/vinozzZ)
- feat: use pub/sub for stress relief (#1221) | [Yingrong Zhao](https://github.com/vinozzZ)
- feat: Working, tested, but unused pubsub system (#1205) | [Kent Quirk](https://github.com/kentquirk)

### Fixes

- fix: add injection tags for configwatcher (#1246) | [Yingrong Zhao](https://github.com/vinozzZ)
- fix: add peer logging, add debug log of peers (#1239) | [Kent Quirk](https://github.com/kentquirk)
- fix: allow a single node to activate stress relief mode during significant load increase (#1256) | [Yingrong Zhao](https://github.com/vinozzZ)
- fix: allow sending otel tracing to non honeycomb backend (#1219) | [Yingrong Zhao](https://github.com/vinozzZ)
- fix: Change pubsub interface to use callbacks. (#1217) | [Kent Quirk](https://github.com/kentquirk)
- fix: clean up a print line (#1250) | [Yingrong Zhao](https://github.com/vinozzZ)
- fix: FilePeers implies no Redis (#1251) | [Kent Quirk](https://github.com/kentquirk)
- fix: make sure stress relief pub/sub topic is consistent (#1245) | [Yingrong Zhao](https://github.com/vinozzZ)
- fix: make sure to inject Health object as a pointer (#1237) | [Yingrong Zhao](https://github.com/vinozzZ)
- fix: Record hashes at startup in metrics (#1252) | [Kent Quirk](https://github.com/kentquirk)
- fix: reduce pub/sub messages from stress relief (#1248) | [Yingrong Zhao](https://github.com/vinozzZ)
- fix: remove otel-config-go as a dependency (#1240) | [Yingrong Zhao](https://github.com/vinozzZ)
- fix: remove personal api keys (#1253) | [Kent Quirk](https://github.com/kentquirk)
- fix: Root spans must have a non-empty parent ID field (#1236) | [Mike Goldsmith](https://github.com/MikeGoldsmith)
- fix: sharder should use peer identity from Peers package (#1249) | [Yingrong Zhao](https://github.com/vinozzZ)

### Maintenance

- docs: Tweak docs for reload (#1247) | [Kent Quirk](https://github.com/kentquirk)
- docs: update vulnerability reporting process (#1224) | [Robb Kidd](https://github.com/robbkidd)
- maint: add instrumentation for GoRedisPubSub (#1229) | [Yingrong Zhao](https://github.com/vinozzZ)
- maint: Add jitter to peer traffic, fix startup (#1227) | [Kent Quirk](https://github.com/kentquirk)
- maint: change targeted arch to arm for local development Dockerfile (#1228) | [Yingrong Zhao](https://github.com/vinozzZ)
- maint: last changes before the final release prep (#1254) | [Kent Quirk](https://github.com/kentquirk)
- maint: update doc based on config changes (#1243) | [Yingrong Zhao](https://github.com/vinozzZ)
- maint: Update licenses (#1244) | [Tyler Helmuth](https://github.com/TylerHelmuth)
- maint(deps): bump google.golang.org/grpc from 1.64.0 to 1.64.1 (#1223) | [dependabot[bot]](https://github.com/dependabot[bot])
- maint(deps): bump the minor-patch group across 1 directory with 9 updates (#1232) | [dependabot[bot]](https://github.com/dependabot[bot])


## 2.6.1 2024-06-17

### Fixes
63 changes: 62 additions & 1 deletion RELEASE_NOTES.md
@@ -2,9 +2,70 @@

While [CHANGELOG.md](./CHANGELOG.md) contains detailed documentation and links to all the source code changes in a given release, this document is intended to describe the contents of the release in a more comprehensible form, from the point of view of users of Refinery.

## Version 2.7.0

This release is a minor release focused on better cluster stability and data quality with a new system for communicating peer information across nodes.
As a result, clusters should generally behave more consistently.

Refinery 2.7 lays the groundwork for substantial future changes to Refinery.

### Publish/Subscribe on Redis
In this release, Redis is no longer a database for storing a list of peers.
Instead, it is used as a more general publish/subscribe framework for rapidly sharing information between nodes in the cluster.
The following information is shared over this connection:

- Peer membership
- Stress levels
- News of configuration changes

Because of this mechanism, Refinery will now react more quickly to changes in any of these factors.
When one node detects a configuration change, all of its peers will be told about it immediately.
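
The callback-style publish/subscribe pattern described above can be sketched in a few lines of Go. This is a minimal in-process illustration only: `LocalPubSub`, `Callback`, and the `config` topic name are hypothetical and are not Refinery's actual interface, and the real system delivers messages between nodes over Redis rather than within one process.

```go
package main

import (
	"fmt"
	"sync"
)

// Callback is invoked once for every message published on a subscribed topic.
// Hypothetical type for illustration; not Refinery's actual API.
type Callback func(msg string)

// LocalPubSub is an in-process stand-in for a Redis-backed pub/sub layer.
type LocalPubSub struct {
	mu   sync.RWMutex
	subs map[string][]Callback
}

// NewLocalPubSub returns an empty pub/sub hub.
func NewLocalPubSub() *LocalPubSub {
	return &LocalPubSub{subs: make(map[string][]Callback)}
}

// Subscribe registers cb to receive every later message on topic.
func (p *LocalPubSub) Subscribe(topic string, cb Callback) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.subs[topic] = append(p.subs[topic], cb)
}

// Publish delivers msg synchronously to every callback registered on topic
// and returns the number of callbacks that were invoked.
func (p *LocalPubSub) Publish(topic, msg string) int {
	p.mu.RLock()
	// Copy the slice so callbacks run without holding the lock.
	cbs := append([]Callback(nil), p.subs[topic]...)
	p.mu.RUnlock()
	for _, cb := range cbs {
		cb(msg)
	}
	return len(cbs)
}

func main() {
	ps := NewLocalPubSub()
	// Two simulated nodes listen for configuration change news.
	ps.Subscribe("config", func(msg string) { fmt.Println("node A notified:", msg) })
	ps.Subscribe("config", func(msg string) { fmt.Println("node B notified:", msg) })
	ps.Publish("config", "configuration changed")
}
```

Because every subscriber is notified as soon as a message is published, no node has to wait for its next polling interval to learn about a change.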

In addition, Refinery now publishes individual stress levels between peers.
Nodes calculate a cluster stress level as a weighted average (with nodes that are more stressed getting more weight).
If an individual node is stressed, it can enter stress relief individually.
This may happen, for example, when a single giant trace is concentrated on one node.
If the cluster as a whole is being stressed by a general burst in traffic, the entire cluster should now enter or leave stress relief at approximately the same time.
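
As a rough illustration of the weighted average, the sketch below weights each node by its own stress level, so a heavily stressed node pulls the cluster value up more than a plain mean would. The weighting scheme is an assumption made for this example; these notes do not state the exact formula Refinery uses. The function names `clusterStress` and `effectiveStress` are hypothetical. Taking the maximum of the individual and cluster values follows the description of the `stress_level` metric under "Metrics changes".

```go
package main

import "fmt"

// clusterStress returns a weighted average of per-node stress levels.
// Assumption: each node's weight is its own stress level, so the result
// is sum(s*s) / sum(s). Refinery's real weighting may differ.
func clusterStress(levels []float64) float64 {
	var weighted, total float64
	for _, s := range levels {
		weighted += s * s
		total += s
	}
	if total == 0 {
		return 0 // no nodes, or no node reports any stress
	}
	return weighted / total
}

// effectiveStress returns the larger of a node's own stress level and the
// cluster stress level, which lets one overloaded node act on its own.
func effectiveStress(individual, cluster float64) float64 {
	if individual > cluster {
		return individual
	}
	return cluster
}

func main() {
	levels := []float64{10, 10, 90} // one node holds a single giant trace
	cluster := clusterStress(levels)
	fmt.Printf("cluster stress: %.1f\n", cluster)
	fmt.Printf("effective stress on the busy node: %.1f\n", effectiveStress(90, cluster))
	fmt.Printf("effective stress on a quiet node: %.1f\n", effectiveStress(10, cluster))
}
```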

If your existing Redis instance is particularly small, you may find it necessary to increase its CPU or network allocations.

### Health checks now include both liveness and readiness

Until now, Refinery had only a liveness check on `/alive`, which always returned ok.

Starting with this release, Refinery supports both `/alive` and `/ready`, which are based on internal status reporting.

The liveness check is alive whenever Refinery is awake and internal systems are functional.
It will return a failure if any of the monitored systems fail to report in time.

The readiness check returns ready whenever the monitored systems indicate readiness.
It will return a failure if any internal system returns not ready.
A failing readiness check is usually used to indicate to a load balancer that no new traffic should go to this node.
In this release, this will only happen when a Refinery node is shutting down.
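
A client-side view of the two endpoints might look like the following sketch. It assumes that a passing check returns a 2xx HTTP status and a failing check returns a non-2xx status; the exact status codes and response bodies are not stated in these notes. `probe` and `fakeRefinery` are hypothetical helpers, and the test server stands in for a node that is alive but shutting down.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// probe reports whether a GET request to baseURL+path returns a 2xx status.
func probe(client *http.Client, baseURL, path string) (bool, error) {
	resp, err := client.Get(baseURL + path)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return resp.StatusCode >= 200 && resp.StatusCode < 300, nil
}

// fakeRefinery simulates a node that is still alive but no longer ready.
// Assumption: 503 is used here only as an example of a failing status.
func fakeRefinery() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/alive", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/ready", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusServiceUnavailable)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := fakeRefinery()
	defer srv.Close()

	alive, err := probe(srv.Client(), srv.URL, "/alive")
	if err != nil {
		fmt.Println("liveness probe failed:", err)
		return
	}
	ready, err := probe(srv.Client(), srv.URL, "/ready")
	if err != nil {
		fmt.Println("readiness probe failed:", err)
		return
	}
	fmt.Printf("alive=%v ready=%v\n", alive, ready)
}
```

Keeping the two checks separate lets an orchestrator stop routing new traffic to a node without restarting it.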

### Metrics changes
There have also been some minor changes to metrics in this release:

We have two new metrics called `individual_stress_level` (the stress level as seen by a single node) and `cluster_stress_level` (the aggregated cluster level).
The `stress_level` metric indicates the maximum of the two values; it is this value which is used to determine whether an individual node activates stress relief.

There is also a new pair of metrics, `config_hash` and `rule_config_hash`.
These are numeric Gauge metrics that are set to the numeric value of the last 4 hex digits of the hash of the current config files.
These can be used to track that all refineries are using the same configuration file.
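
To make the gauge value concrete, the sketch below converts the last 4 hex digits of a hex-encoded hash into a number between 0 and 65535. The choice of MD5 and the helper name `hashGaugeValue` are assumptions for illustration; these notes do not say which hash algorithm Refinery uses.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"strconv"
)

// hashGaugeValue parses the last 4 hex digits of a hex-encoded hash as a
// base-16 number, giving a value in the range 0..65535.
func hashGaugeValue(hexHash string) (int64, error) {
	if len(hexHash) < 4 {
		return 0, fmt.Errorf("hash %q is shorter than 4 hex digits", hexHash)
	}
	return strconv.ParseInt(hexHash[len(hexHash)-4:], 16, 64)
}

func main() {
	// Assumption: MD5 is used here only as an example hash function.
	sum := md5.Sum([]byte("example config contents"))
	hexHash := hex.EncodeToString(sum[:])

	value, err := hashGaugeValue(hexHash)
	if err != nil {
		fmt.Println("could not derive gauge value:", err)
		return
	}
	fmt.Printf("hash=%s gauge=%d\n", hexHash, value)
}
```

Nodes that report the same gauge value are very likely running the same configuration, although two different files can collide on 4 hex digits.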

### Disabling Redis and using a static list of peers
Specifying `PeerManagement.Type=file` will cause Refinery to use the fixed list of peers found in the configuration.
This means that Refinery will operate without sharing changes to peers, stress, or configuration, as it has in previous releases.

### Config Change notifications
When deploying a cluster in Kubernetes, it is often the case that configurations are managed as a ConfigMap.
In the default setup, ConfigMaps are eventually consistent.
This may mean that one Refinery node will detect a configuration change and broadcast news of it, but a different node that receives the news will attempt to read the data and get the previous configuration.
In this situation, the change will still be detected by all Refineries within the `ConfigReloadInterval`.

## Version 2.6.1

This is a bug fix release.
This is a bug fix release.
In the log handling logic newly introduced in v2.6.0, Refinery would incorrectly consider log events to be root spans in a trace.
After this fix, log events can never be root spans.
This is recommended for everyone who wants to use the new log handling capabilities.
2 changes: 1 addition & 1 deletion RELEASING.md
@@ -3,7 +3,7 @@
1. Check that licenses are current with `make verify-licenses`
2. Regenerate documentation with `make all` from within the `tools/convert` folder. If there have
been changes to `rules.md`, you may need to manually modify the `rules_complete.yaml` to reflect the same change.
3. If either `refinery_config.md` or `refinery_rules.md` were modified in this release, you must also open a [docs](https://github.com/honeycombio/docs) PR and update these files there under `subpages/refinery/` .
3. If either `refinery_config.md` or `refinery_rules.md` were modified in this release, you must also open a [docs](https://github.com/honeycombio/docs) PR and update these files there under `layouts/shortcodes/subpages/refinery/` .
Replace the underscores (`_`) in the filenames with a dash (`-`) or the docs linter will be upset.
Address any feedback from the docs team and apply that feedback back into this repo.
4. After addressing any docs change, add release entry to [changelog](./CHANGELOG.md)
14 changes: 8 additions & 6 deletions config.md
@@ -1,7 +1,7 @@
# Honeycomb Refinery Configuration Documentation

This is the documentation for the configuration file for Honeycomb's Refinery.
It was automatically generated on 2024-07-24 at 20:24:30 UTC.
It was automatically generated on 2024-07-29 at 17:37:53 UTC.

## The Config file

@@ -628,7 +628,7 @@ In rare circumstances, compression costs may outweigh the benefits, in which cas
`OTelTracing` contains configuration for Refinery's own tracing.
### `Enabled`

Enabled controls whether to send Refinery's own otel traces.
Enabled controls whether to send Refinery's own OpenTelemetry traces.

The setting specifies if Refinery sends traces.

@@ -650,7 +650,7 @@ Refinery's internal traces will be sent to the `/v1/traces` endpoint on this hos
APIKey is the API key used to send Refinery's traces to Honeycomb.

It is recommended that you create a separate team and key for Refinery telemetry.
If this is blank, then Refinery will not set the Honeycomb-specific headers for OpenTelemetry, and your `APIHost` must be set to a valid OpenTelemetry endpoint.
If this value is blank, then Refinery will not set the Honeycomb-specific headers for OpenTelemetry, and your `APIHost` must be set to a valid OpenTelemetry endpoint.

- Not eligible for live reload.
- Type: `string`
@@ -672,7 +672,7 @@ Only used if `APIKey` is specified.
SampleRate is the rate at which Refinery samples its own traces.

This is the Honeycomb sample rate used to sample traces sent by Refinery.
Since each incoming span generates multiple outgoing spans, a sample rate of at least 100 is strongly advised.
Since each incoming span generates multiple outgoing spans, a minimum sample rate of `100` is strongly advised.

- Eligible for live reload.
- Type: `int`
@@ -687,8 +687,10 @@ Type is the type of peer management to use.

Peer management is the mechanism by which Refinery locates its peers.
`file` means that Refinery gets its peer list from the Peers list in this config file.
`redis` means that Refinery uses a Publish/Subscribe mechanism, implemented on Redis, to propagate peer lists much more quickly than the legacy mechanism.
This is the recommended setting, especially for new installations.
It also prevents Refinery from using a publish/subscribe mechanism to propagate peer lists, stress levels, and configuration changes.
`redis` means that Refinery uses a Publish/Subscribe mechanism, implemented on Redis, to propagate peer lists, stress levels, and notification of configuration changes much more quickly than the legacy mechanism.
The recommended setting is `redis`, especially for new installations.
If `redis` is specified, fields in `RedisPeerManagement` must also be set.

- Not eligible for live reload.
- Type: `string`
16 changes: 9 additions & 7 deletions config/metadata/configMeta.yaml
@@ -753,7 +753,7 @@ groups:
default: false
reload: false
firstversion: v2.6
summary: controls whether to send Refinery's own otel traces.
summary: controls whether to send Refinery's own OpenTelemetry traces.
description: >
The setting specifies if Refinery sends traces.
@@ -786,9 +786,9 @@ groups:
It is recommended that you create a separate team and key for
Refinery telemetry.
If this is blank, then Refinery will not set the Honeycomb-specific
headers for OpenTelemetry, and your `APIHost` must be set to a
valid OpenTelemetry endpoint.
If this value is blank, then Refinery will not set the
Honeycomb-specific headers for OpenTelemetry, and your `APIHost` must
be set to a valid OpenTelemetry endpoint.
- name: Dataset
type: string
@@ -813,7 +813,7 @@ groups:
summary: is the rate at which Refinery samples its own traces.
description: >
This is the Honeycomb sample rate used to sample traces sent by Refinery. Since each
incoming span generates multiple outgoing spans, a sample rate of at least 100 is
incoming span generates multiple outgoing spans, a minimum sample rate of `100` is
strongly advised.
- name: PeerManagement
@@ -841,8 +841,10 @@ groups:
`redis` means that Refinery uses a Publish/Subscribe mechanism,
implemented on Redis, to propagate peer lists, stress levels, and
notification of configuration changes much more quickly than the
legacy mechanism. This is the recommended setting, especially for new
installations. If this is specified, fields in `RedisPeerManagement`
legacy mechanism.
The recommended setting is `redis`, especially for new
installations. If `redis` is specified, fields in `RedisPeerManagement`
must also be set.
- name: Identifier
8 changes: 4 additions & 4 deletions config/metadata/rulesMeta.yaml
@@ -674,10 +674,10 @@ groups:
The comparison operator to use. String comparisons are case-sensitive.
For most cases, use negative operators (`!=`, `does-not-contain`, and
`not-exists`) in a rule with a scope of "span".
WARNING: Rules can have `Scope: trace` or `Scope: span`; a negative
operator with `Scope: trace` will be true if **any** single span in the
entire trace matches the negative condition.
This is almost never desired behavior.
WARNING: Rules can have `Scope: trace` or `Scope: span`. Using a negative
operator with `Scope: trace` will cause the condition to be true if **any**
single span in the entire trace matches. Use `Scope: span` with negative
operators.
- name: Value
type: anyscalar
summary: is the value to compare against.
27 changes: 16 additions & 11 deletions config_complete.yaml
@@ -2,7 +2,7 @@
## Honeycomb Refinery Configuration ##
######################################
#
# created on 2024-07-24 at 20:24:30 UTC from ../../config.yaml using a template generated on 2024-07-24 at 20:24:27 UTC
# created on 2024-07-29 at 17:37:52 UTC from ../../config.yaml using a template generated on 2024-07-29 at 17:37:49 UTC

# This file contains a configuration for the Honeycomb Refinery. It is in YAML
# format, organized into named groups, each of which contains a set of
@@ -643,7 +643,7 @@ OTelMetrics:
OTelTracing:
## OTelTracing contains configuration for Refinery's own tracing.
####
## Enabled controls whether to send Refinery's own otel traces.
## Enabled controls whether to send Refinery's own OpenTelemetry traces.
##
## The setting specifies if Refinery sends traces.
##
@@ -664,9 +664,9 @@ OTelTracing:
##
## It is recommended that you create a separate team and key for Refinery
## telemetry.
## If this is blank, then Refinery will not set the Honeycomb-specific
## headers for OpenTelemetry, and your `APIHost` must be set to a valid
## OpenTelemetry endpoint.
## If this value is blank, then Refinery will not set the
## Honeycomb-specific headers for OpenTelemetry, and your `APIHost` must
## be set to a valid OpenTelemetry endpoint.
##
## Not eligible for live reload.
# APIKey: ""
@@ -684,7 +684,7 @@ OTelTracing:
##
## This is the Honeycomb sample rate used to sample traces sent by
## Refinery. Since each incoming span generates multiple outgoing spans,
## a sample rate of at least 100 is strongly advised.
## a minimum sample rate of `100` is strongly advised.
##
## default: 100
## Eligible for live reload.
@@ -701,11 +701,16 @@ PeerManagement:
##
## Peer management is the mechanism by which Refinery locates its peers.
## `file` means that Refinery gets its peer list from the Peers list in
## this config file.
## this config file. It also prevents Refinery from using a
## publish/subscribe mechanism to propagate peer lists, stress levels,
## and configuration changes.
## `redis` means that Refinery uses a Publish/Subscribe mechanism,
## implemented on Redis, to propagate peer lists much more quickly than
## the legacy mechanism. This is the recommended setting, especially for
## new installations.
## implemented on Redis, to propagate peer lists, stress levels, and
## notification of configuration changes much more quickly than the
## legacy mechanism.
## The recommended setting is `redis`, especially for new installations.
## If `redis` is specified, fields in `RedisPeerManagement` must also be
## set.
##
## default: file
## Not eligible for live reload.
@@ -997,8 +1002,8 @@ Specialized:
##
## Eligible for live reload.
# AdditionalAttributes:
# ClusterName: MyCluster
# environment: production
# ClusterName: MyCluster

###############
## ID Fields ##