Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add better support for thread id #38

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

winmutt
Copy link

@winmutt winmutt commented Sep 12, 2018

Thread_id was inadvertently (?) picked up in NumberMetrics but not being parsed out of UserHost. This change will parse it out of UserHost and add it to NumberMetrics without significantly impacting the expected output.

@codecov
Copy link

codecov bot commented Sep 12, 2018

Codecov Report

Merging #38 into master will increase coverage by 0.08%.
The diff coverage is 79.31%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #38      +/-   ##
==========================================
+ Coverage   67.02%   67.11%   +0.08%     
==========================================
  Files           7        7              
  Lines        1389     1411      +22     
==========================================
+ Hits          931      947      +16     
- Misses        348      352       +4     
- Partials      110      112       +2
Impacted Files Coverage Δ
log/log.go 100% <ø> (ø) ⬆️
event/class.go 85.71% <100%> (+0.93%) ⬆️
log/slow/parser.go 71.79% <100%> (+0.74%) ⬆️
event/aggregator.go 82.69% <33.33%> (-10.34%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f5cfaf6...b6be54d. Read the comment docs.

@arvenil
Copy link
Contributor

arvenil commented Sep 18, 2018

Hi @winmutt, thank you for your contribution.
The only issue is that Thread_id is not really a NumberMetric. NumberMetrics can be aggregated e.g. Rows_examined = 2 and Rows_examined = 6 produce average metric 4. Aggregation that produces averages shouldn't be however applied to Thread_id. The aggregate of Thread_ids can be a list of those ids, not average. I would suggest adding ThreadID as separate field in Event struct for now. What do you think?

p.s. May I ask in which project/for what you are using this library?

@winmutt
Copy link
Author

winmutt commented Sep 18, 2018

@arvenil , thanks for the feedback. This had been my initial intention (I even did the work as such and had to roll it back), but was concerned with changing the interface, eg the data is already being caught in NumberMetrics. I'll happily redo that work and move it back out to Event. Do you think the existing behavior should remain in order to ensure backwards compatibility?

We have a need to parse and aggregate slow query logs by thread id, rather than checksum/fingerprint.

@arvenil
Copy link
Contributor

arvenil commented Sep 18, 2018

@winmutt Hey.

We have a need to parse and aggregate slow query logs by thread id, rather than checksum/fingerprint.

I would say, just finish your implementation and I will accept this PR (or version with Event.ThreadID) when it will be ready. Do you plan to use Aggregator too?

@winmutt
Copy link
Author

winmutt commented Sep 19, 2018

I have not looked at aggregator yet, but it sounds like I will.

@winmutt
Copy link
Author

winmutt commented Oct 1, 2018

So this doesn't get closed quite yet, I'll be starting work on it again shortly, been side tracked by a few things.

@CLAassistant
Copy link

CLAassistant commented Oct 16, 2018

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
3 out of 4 committers have signed the CLA.

✅ winmutt
✅ gywndi
✅ arvenil
❌ Rolf Martin-Hoster


Rolf Martin-Hoster seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@winmutt
Copy link
Author

winmutt commented Oct 16, 2018

@arvenil Is that what you envisioned with thread_id? Sorry my PR got a little sloppy, having to go back and fix the author email for the match the CLA I managed to pick up some stray old commits.

Also, here is the related project I am working on https://github.com/winmutt/slow_log_parse. So far it was just a POC but now I am focused on replacing our use of pt-query-digest with it (but not in entirety at all). In testing it was ~5x faster and used ~4x less memory than pt-query-digest which we use heavily sampling random 10min every hour across ~1200 db instances.

@winmutt
Copy link
Author

winmutt commented Oct 18, 2018

I've been looking more closely at the aggregator output and realized I need to add ThreadID there as well, not just the raw events. Incoming commit.

Copy link
Contributor

@arvenil arvenil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@winmutt I'm no longer part of @percona org but 👍 lgtm @rnovikovP feel free to accept
p.s. @winmutt did you consider passing whole list of thread ids or last one is enough for you?

@winmutt
Copy link
Author

winmutt commented Nov 16, 2018

@arvenil thanks so much for getting back and may your future adventures be grand! I did not consider a list of ID's was shooting for best effort.

@rnovikovP would be happy to help answer any questions you have around this PR.

@@ -277,6 +284,9 @@ func (p *SlowLogParser) parseHeader(line string) {
} else if smv[1] == "Log_slow_rate_limit" {
val, _ := strconv.ParseUint(smv[2], 10, 64)
p.event.RateLimit = uint(val)
} else if smv[1] == "Thread_id" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regexp tries to match Id or Thread_id. Shouldn't be the same here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking at this, I'll take a look today

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I updated the regex to only match Id. This Thread_id is matched as part of the "metrics" parsing. It was probably innocuous but better to be safe.

@rnovikovP rnovikovP requested a review from askomorokhov March 21, 2019 22:27
@askomorokhov askomorokhov removed their request for review January 11, 2021 11:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants