Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tests and foundations for new cluster peer discovery #1311

Merged
merged 2 commits into from
Jul 31, 2024

Conversation

thampiotr
Copy link
Contributor

@thampiotr thampiotr commented Jul 17, 2024

PR Description

As part of this issue investigation: #1208 we found that there are missing tests and a few inconsistencies in the way that cluster peer discovery currently works.

Since this is a critical feature used by large scale users, we want to ensure that these are well-tested paths and bugs and inconsistencies are eliminated.

In order to isolate these changes and future refactoring, this PR:

  • Introduces a new package and functions that perform peer discovery when PublicPreview stability level is used.
  • The new package is a copy of the existing behaviour for now, but is more testable.
  • Tests are added with some comments about the discovered problems and inconsistencies (see TODO comments).

This sets a foundation for further changes:

  • Use the A/AAAA records for DNS discovery
  • Reduce the inconsistencies and issues mentioned in the tests (see TODO comments)

PR Checklist

  • Tests updated

@thampiotr thampiotr force-pushed the thampiotr/cluster-dns-queries-v2 branch from 5b78f1d to 15e8037 Compare July 17, 2024 13:58
@thampiotr thampiotr force-pushed the thampiotr/cluster-dns-queries-v2 branch from 15e8037 to 6c31bc4 Compare July 30, 2024 15:07
@thampiotr thampiotr changed the title WIP: Improve cluster peers discovery Tests and foundations for new cluster peer discovery Jul 30, 2024
Comment on lines +87 to +105
switch {
case len(opts.JoinPeers) > 0 && opts.DiscoverPeers != "":
return nil, fmt.Errorf("at most one of join peers and discover peers may be set")

case len(opts.JoinPeers) > 0:
config.DiscoverPeers = newStaticDiscovery(opts.JoinPeers, listenPort, opts.Log)

case opts.DiscoverPeers != "":
discoverFunc, err := newDynamicDiscovery(config.Log, opts.DiscoverPeers, listenPort)
if err != nil {
return nil, err
}
config.DiscoverPeers = discoverFunc

default:
// Here, both JoinPeers and DiscoverPeers are empty. This is desirable when
// starting a seed node that other nodes connect to, so we don't require
// one of the fields to be set.
default:
// Here, both JoinPeers and DiscoverPeers are empty. This is desirable when
// starting a seed node that other nodes connect to, so we don't require
// one of the fields to be set.
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diff doesn't make it easy to see that this else is basically the original cluster peer discovery code.

internal/alloycli/cmd_run.go Outdated Show resolved Hide resolved
"github.com/go-kit/log"
)

func TestPeerDiscovery(t *testing.T) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests help to highlight some of the issues that I want to address on top of using A records in the follow-up PR.

@thampiotr thampiotr marked this pull request as ready for review July 30, 2024 15:19
Comment on lines +141 to +142
The `--cluster.use-discovery-v1` flag can be used to switch back to the older, v1 version of the cluster peer discovery mechanism
in case of any issues with the newer version. This flag will be deprecated in the future and eventually removed.
Copy link
Contributor Author

@thampiotr thampiotr Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I add support for A records, I will add here a note about what differences are between these mechanisms in the follow-up PR.

I will also add something to CHANGELOG.md once we actually have functional changes - this PR doesn't make any behaviour changes.

Copy link
Contributor

@wildum wildum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, the new structure and the tests are great. The code duplication is not ideal but it's only temporary

internal/alloycli/cmd_run.go Outdated Show resolved Hide resolved
@thampiotr thampiotr force-pushed the thampiotr/cluster-dns-queries-v2 branch from 5a0900c to 920382e Compare July 31, 2024 13:37
@thampiotr thampiotr force-pushed the thampiotr/cluster-dns-queries-v2 branch from 920382e to d722910 Compare July 31, 2024 13:46
@thampiotr thampiotr merged commit 020713c into main Jul 31, 2024
18 checks passed
@thampiotr thampiotr deleted the thampiotr/cluster-dns-queries-v2 branch July 31, 2024 14:19
thampiotr added a commit that referenced this pull request Jul 31, 2024
thampiotr added a commit that referenced this pull request Jul 31, 2024
@clayton-cornell clayton-cornell added the type/docs Docs Squad label across all Grafana Labs repos label Jul 31, 2024
thampiotr added a commit that referenced this pull request Aug 2, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 31, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
frozen-due-to-age type/docs Docs Squad label across all Grafana Labs repos
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants