Skip to content
This repository has been archived by the owner on Aug 13, 2019. It is now read-only.

move WAL tailing code from Prometheus to TSDB WAL package #606

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

cstyan
Copy link
Contributor

@cstyan cstyan commented May 17, 2019

This is probably a bit of a pain to review, but moving the WAL watcher code from Prometheus to TSDB (into the WAL package) requires moving some code from the root tsdb package. Specifically the Watcher uses RefSeries and RefSample. Moving those requires a bunch of other code to be moved as well.

If anyone knows a nicer way to make the RefSeries/RefSamples change or has thoughts about the new package organization let me know, none of this is set in stone.

cc @tomwilkie @gouthamve

@codesome codesome marked this pull request as ready for review May 31, 2019 12:38
@cstyan cstyan force-pushed the callum-move-wal-watcher branch 3 times, most recently from 7dfb442 to 1caa80c Compare June 5, 2019 00:01
Copy link
Contributor

@csmarchbanks csmarchbanks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally I think moving all the record related code to a record package makes a lot of sense. I would try to limit code in that package to just types/methods directly interacting with a record though. Left a couple more specific comments around things you might be able to move out.

record/internal.go Outdated Show resolved Hide resolved
record/internal.go Outdated Show resolved Hide resolved
record/internal.go Outdated Show resolved Hide resolved
record/tombstones.go Outdated Show resolved Hide resolved
@csmarchbanks
Copy link
Contributor

csmarchbanks commented Jun 6, 2019

Thanks for all the responses, they helped me get my head around all this! As you said, the amount of code in the root package can be problematic to wrap your head around.

My main concern with the current state of this PR is that the record package appears to be random types with no obvious relation. I don't think it will help the code being more isolated/easier to wrap your head around.

After thinking about this for a couple hours:

  1. I am pretty against memSeries being moved away from head.
    • Perhaps have a slice of *memSeries next to the slice of RefSamples in headAppender instead of having RefSample to depend on memSeries?
  2. Should the record code (and RefSample/RefSeries) live in wal? Are records used anywhere else? Otherwise I like them living in their own record package.
  3. Perhaps tombstones should be in their own package? I don't feel too strongly about this either, just having Stone live wherever RefSample does would be fine by me too.

I am definitely curious what you and the maintainers think!

@codesome
Copy link
Contributor

codesome commented Jun 6, 2019

As I am not much familiar with WAL tailing yet, I am still wrapping my head about what the record package should be. But I agree with some points which @csmarchbanks said

  • memSeries looks closely tied with Head and would be good to keep it there.
  • If there are only a few things used from tombstones.go in the record, then duplicate only those constructs and not the entire file. Else tombstones as a separate package would make more sense than being in inside record. (I support the former)

I will take a better look before the weekend hits.

@cstyan
Copy link
Contributor Author

cstyan commented Jun 7, 2019

I've moved the tombstones code to it's own package. Unfortunately keeping memSeries in head and have the records package RefSample not contain memSeries is going to be pretty messy.

@cstyan
Copy link
Contributor Author

cstyan commented Jun 10, 2019

cc @csmarchbanks @codesome

@csmarchbanks
Copy link
Contributor

I like the tombstone move. I think you could remove the memSeries from RefSample without it becoming too messy by keeping two slices in sync. What do you think of
memSeries.patch.txt? You would have to apply to master, not your branch.

@krasi-georgiev
Copy link
Contributor

What is the problem you are trying to solve with this PR?

@cstyan
Copy link
Contributor Author

cstyan commented Jun 10, 2019

@krasi-georgiev the code for tailing/watching the WAL currently lives in the remote write package in prometheus/prometheus. We think it makes more sense for it to live in the WAL package, which is in tsdb repo right now. Unfortunately the WAL tailing code relies on a couple of types that live in the root tsdb package, which would end up causing some circular dependency issues.

@krasi-georgiev
Copy link
Contributor

hm , I see.
In that case need to spend some more time to look into this.

@cstyan cstyan force-pushed the callum-move-wal-watcher branch 2 times, most recently from ca248e3 to faf81f5 Compare June 27, 2019 15:26
@cstyan cstyan force-pushed the callum-move-wal-watcher branch from faf81f5 to e944b75 Compare July 2, 2019 21:07
@cstyan cstyan force-pushed the callum-move-wal-watcher branch from e944b75 to d6ba13b Compare July 3, 2019 18:54
Copy link
Contributor

@csmarchbanks csmarchbanks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did another pass tonight, generally I am liking the refactor and beginnings of pulling things out of the base package.

record/internal.go Outdated Show resolved Hide resolved
record/internal.go Outdated Show resolved Hide resolved
wal/wal_watcher.go Outdated Show resolved Hide resolved
wal/checkpoint.go Outdated Show resolved Hide resolved
wal/checkpoint.go Outdated Show resolved Hide resolved
@cstyan
Copy link
Contributor Author

cstyan commented Jul 5, 2019

@csmarchbanks thanks for the review and for the sync.Pool suggestion #606 (comment)

Copy link
Contributor

@codesome codesome left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not familiar with WAL Watcher, but other changes look fine.

head.go Outdated Show resolved Hide resolved
record/internal.go Outdated Show resolved Hide resolved
wal/watcher.go Show resolved Hide resolved
Copy link
Contributor

@csmarchbanks csmarchbanks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Starting to look good from my perspective! Left another handful of comments, mostly pretty minor.

head.go Show resolved Hide resolved
head.go Outdated Show resolved Hide resolved
record/internal.go Outdated Show resolved Hide resolved
wal/watcher.go Outdated Show resolved Hide resolved
wal/watcher.go Outdated Show resolved Hide resolved
wal/watcher.go Outdated Show resolved Hide resolved
wal/watcher.go Outdated Show resolved Hide resolved
wal/watcher_test.go Outdated Show resolved Hide resolved
@cstyan cstyan force-pushed the callum-move-wal-watcher branch 3 times, most recently from f44209e to 476bbec Compare July 18, 2019 15:28
@cstyan
Copy link
Contributor Author

cstyan commented Jul 19, 2019

@csmarchbanks @codesome could use another review when you have some time

Copy link
Contributor

@csmarchbanks csmarchbanks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple more comments.

wal/watcher.go Outdated
SeriesReset(int)
}

type watcherMetrics struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if this were WatcherMetrics to make it easier to pass around elsewhere.

wal/watcher.go Outdated
}

// NewWatcher creates a new WAL watcher for a given WriteTo.
func NewWatcher(reg prometheus.Registerer, logger log.Logger, name string, writer WriteTo, walDir string) *Watcher {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should take a WatcherMetrics, not a registerer, otherwise creating two watchers with the same registerer would panic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both NewWatcherMetrics and NewLiveReaderMetrics call reg.Register, not reg.MustRegister. But I guess the real question here is whether it's reasonable for consumers of the WAL Watcher to have to create and pass the metrics struct rather than just passing a Registerer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, reg.Register won't panic, but it also won't register the new metrics, so we would be blind for those queues/after a reload.

I think it is reasonable to have a consumer create and pass the metrics struct, just like NewLiveReader.

wal/watcher.go Outdated
}

// NewWatcher creates a new WAL watcher for a given WriteTo.
func NewWatcher(reg prometheus.Registerer, logger log.Logger, name string, writer WriteTo, walDir string) *Watcher {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, reg.Register won't panic, but it also won't register the new metrics, so we would be blind for those queues/after a reload.

I think it is reasonable to have a consumer create and pass the metrics struct, just like NewLiveReader.

wal/watcher.go Outdated
maxSegment int
}

func NewWatcherMetrics(reg prometheus.Registerer) *watcherMetrics {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these take a consumer and provide an Unregister so that if a consumer disappears it can be removed from the metrics?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't Unregister as all the Watcher metrics are under the same metric but just WithLabels. Both QueueManager and Watcher Stop delete their metrics via DeleteLabelValues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@codesome codesome left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I hope I did not miss anything, PR is big.

tombstones/tombstones.go Outdated Show resolved Hide resolved
wal/reader_test.go Show resolved Hide resolved
@codesome
Copy link
Contributor

Forgot to mention I haven't reviewed wal/watcher.go assuming that it was already reviewed when it was added to Prometheus.

cstyan added 15 commits July 29, 2019 11:39
manually creating a checkpoint dir and renaming files.

Signed-off-by: Callum Styan <[email protected]>
Signed-off-by: Callum Styan <[email protected]>
the live reader, expose WriteTo interface.

Signed-off-by: Callum Styan <[email protected]>
Signed-off-by: Callum Styan <[email protected]>
@cstyan cstyan force-pushed the callum-move-wal-watcher branch from 4b7a4fb to 8c40bb9 Compare July 29, 2019 18:56
@cstyan
Copy link
Contributor Author

cstyan commented Jul 29, 2019

thanks @codesome, the only new addition to watcher.go here is how it's metrics are initialized

edit: do we want to merge this before TSDB is moved in prometheus/prometheus? or after, and just apply the diff from here to the other repo?

@cstyan cstyan requested a review from codesome August 2, 2019 23:12
than creating and registering in the constructor.

Signed-off-by: Callum Styan <[email protected]>
Copy link
Contributor

@csmarchbanks csmarchbanks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good overall - I agree with code movement, thanks for this!
However, I think there are some suggestions from my side:

  • PR is huge. Can we just move things around in one PR and add some extra logic (seriesSample) in another? It will help massively even if it's tedious ): e.g I don't understand really why this change.
  • tombstones package can be simplified due to nice clear name
  • some public methods/functions are missing comments.

@@ -607,7 +608,7 @@ func (c *LeveledCompactor) write(dest string, meta *BlockMeta, blocks ...BlockRe
}

// Create an empty tombstones file.
if _, err := writeTombstoneFile(c.logger, tmp, newMemTombstones()); err != nil {
if _, err := tombstones.WriteTombstoneFile(c.logger, tmp, tombstones.NewMemTombstones()); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure tombstones.WriteTombstoneFile needs to be WriteTombstoneFile etc? Why not just tombstones.WriteFile if we have clear package name?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same for NewMemTombstones

err error
}

func newCompactionSeriesSet(i IndexReader, c ChunkReader, t TombstoneReader, p index.Postings) *compactionSeriesSet {
func newCompactionSeriesSet(i IndexReader, c ChunkReader, t tombstones.TombstoneReader, p index.Postings) *compactionSeriesSet {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Suggested change
func newCompactionSeriesSet(i IndexReader, c ChunkReader, t tombstones.TombstoneReader, p index.Postings) *compactionSeriesSet {
func newCompactionSeriesSet(i IndexReader, c ChunkReader, t tombstones.Reader, p index.Postings) *compactionSeriesSet {

// ErrOutOfBounds is returned if an appended sample is out of the
// writable time range.
ErrOutOfBounds = errors.New("out of bounds")

// ErrAmendSample is returned if an appended sample has the same timestamp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why moving around?

//lint:ignore SA6002 safe to ignore and actually fixing it has some performance penalty.
h.appendPool.Put(b[:0])
}

func (h *Head) getSeriesBuffer() []*memSeries {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have to change this while just moving packages around?

@@ -30,7 +32,7 @@ import (
"github.com/prometheus/tsdb/fileutil"
)

const tombstoneFilename = "tombstones"
const TombstoneFilename = "tombstones"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, with tombstones package name you don't need tombstones prefix everywhere, we can simplify (:

@@ -54,8 +71,8 @@ type TombstoneReader interface {
Close() error
}

func writeTombstoneFile(logger log.Logger, dir string, tr TombstoneReader) (int64, error) {
path := filepath.Join(dir, tombstoneFilename)
func WriteTombstoneFile(logger log.Logger, dir string, tr TombstoneReader) (int64, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it is exposed we need comment e.g for goDoc ): Same everywhere else for public functions we add.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants