Update NewLabel method to use more efficient update mechanism #25777

Merged
sgress454 merged 9 commits into main from sgress454/25555-update-new-label
Jan 31, 2025

Conversation

@sgress454 sgress454 commented Jan 27, 2025

For #25555

Checklist for submitter

  • Changes file added for user-visible changes in changes/, orbit/changes/ or ee/fleetd-chrome/changes.
    See Changes files for more information.
  • Input data is properly validated, SELECT * is avoided, SQL injection is prevented (using placeholders for values in statements)

This PR updates the NewLabel service to use the UpdateLabelMembershipByHostIDs method previously added by @jacobshandling rather than using ApplyLabels. The latter method has performance issues when adding large numbers of hosts at once to a manual label (see #25555) because it does an expensive lookup of host names before transforming those into Fleet host IDs. The new code skips the middleman and transforms host identifiers directly to Fleet host IDs, and does so using a batching strategy to ensure the queries don't get too large.

This PR does update UpdateLabelMembershipByHostIDs slightly to return an updated Label object and host IDs array, as this is the expected return value for NewLabel. I update the method's tests accordingly. I don't think any new tests for NewLabel are needed as it should have the same functionality and return values.
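
As a rough illustration of the batching strategy described above, here is a minimal sketch in Go. It is not the code from this PR: `chunk`, `resolveHostIDs`, the `lookup` callback, and the batch size of 1000 are all hypothetical names and values chosen for the example. The idea is only that identifiers are resolved to host IDs one bounded batch at a time, so no single query grows with the total number of hosts.

```go
package main

import "fmt"

// chunk splits items into consecutive slices of at most size elements.
// A non-positive size returns everything in a single batch.
func chunk(items []string, size int) [][]string {
	if len(items) == 0 {
		return nil
	}
	if size <= 0 {
		return [][]string{items}
	}
	batches := make([][]string, 0, (len(items)+size-1)/size)
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		batches = append(batches, items[start:end])
	}
	return batches
}

// resolveHostIDs is a hypothetical stand-in for the datastore lookup.
// It resolves each batch of identifiers separately and concatenates the results.
func resolveHostIDs(identifiers []string, batchSize int, lookup func([]string) ([]uint, error)) ([]uint, error) {
	var ids []uint
	for _, batch := range chunk(identifiers, batchSize) {
		got, err := lookup(batch)
		if err != nil {
			return nil, fmt.Errorf("resolve host IDs: %w", err)
		}
		ids = append(ids, got...)
	}
	return ids, nil
}

func main() {
	// Build 5000 made-up identifiers, matching the size used in manual testing.
	identifiers := make([]string, 5000)
	for i := range identifiers {
		identifiers[i] = fmt.Sprintf("host-%d", i)
	}

	// fakeLookup pretends to be one SQL query per batch and hands out sequential IDs.
	queries := 0
	nextID := uint(1)
	fakeLookup := func(batch []string) ([]uint, error) {
		queries++
		out := make([]uint, len(batch))
		for i := range batch {
			out[i] = nextID
			nextID++
		}
		return out, nil
	}

	ids, err := resolveHostIDs(identifiers, 1000, fakeLookup)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(ids), "host IDs resolved using", queries, "queries")
}
```

In a real datastore the callback would run one parameterized query per batch; the batch size is a tuning knob and the value here is only a placeholder.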

Manual Testing

On the main branch, I launched my local MySQL with the thread stack size set to the minimum allowed, then used the API to try to create a new label with 5,000 hosts attached. The server returned a 422 response, and the server logs showed:

level=error ts=2025-01-28T15:08:20.465401Z component=http [email protected] method=POST 
uri=/api/latest/fleet/labels took=16.610292ms err="get hostnames by identifiers: Error 1436 (HY000): Thread stack 
overrun:  111136 bytes used of a 131072 byte stack, and 20000 bytes needed.  Use 'mysqld --thread_stack=#' to specify 
a bigger stack."

On this branch, I kept the same MySQL settings and tried my API request again, and it was successful.
[image: screenshot of the successful request]

QA

The script I used to create a new manual label with lots of hosts is at: https://gist.github.com/sgress454/84f12064c437da456c456e25c26d9069

To run it, first grab a bearer token from any API request: open the browser's network tab, click a Fleet API request, and in the headers tab scroll down to Authorization. Only take the part after "Bearer".

Then download the script from that gist and in its folder run:

NODE_TLS_REJECT_UNAUTHORIZED=0 node ./add_hosts_to_label.js <the bearer token> "<a label name>"

e.g.

NODE_TLS_REJECT_UNAUTHORIZED=0 node ./add_hosts_to_label.js U3HpbdtadmJXGKYSB0U/PbwfOpHbBt7FpkWmGKKYolOO1moLNZA6XxP+QO5LVukvAotZ7d+JbNUEEhYHZtxoqg== "some test label"

This will invoke the API on https://localhost:8080 and try to add 5000 hosts to a new label "some test label".

If you need to change the number of hosts or the URL of the server, there are additional arguments:

NODE_TLS_REJECT_UNAUTHORIZED=0 node ./add_hosts_to_label.js <the bearer token> "<a label name>" <number of hosts> <url>

e.g.

NODE_TLS_REJECT_UNAUTHORIZED=0 node ./add_hosts_to_label.js U3HpbdtadmJXGKYSB0U/PbwfOpHbBt7FpkWmGKKYolOO1moLNZA6XxP+QO5LVukvAotZ7d+JbNUEEhYHZtxoqg== "some test label" 10000 https://foo.bar

@sgress454 sgress454 requested a review from a team as a code owner January 27, 2025 17:01
@sgress454 sgress454 changed the title Sgress454/25555 update new label Update NewLabel method to use more efficient update mechanism Jan 27, 2025
@sgress454 sgress454 marked this pull request as draft January 27, 2025 18:14
@sgress454
Contributor Author

@jacobshandling some failing tests, i'll ping you for review once they're resolved.

@sgress454 sgress454 marked this pull request as ready for review January 27, 2025 19:37
@sgress454
Contributor Author

@jacobshandling all set, I just had to do the thing I said I would do 🙄

codecov bot commented Jan 27, 2025

Codecov Report

Attention: Patch coverage is 71.42857% with 4 lines in your changes missing coverage. Please review.

Project coverage is 63.60%. Comparing base (42d7227) to head (2f8c401).
Report is 73 commits behind head on main.

Files with missing lines            Patch %   Lines
server/datastore/mysql/labels.go    50.00%    2 Missing and 1 partial ⚠️
server/service/labels.go            75.00%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #25777      +/-   ##
==========================================
- Coverage   63.61%   63.60%   -0.02%     
==========================================
  Files        1623     1623              
  Lines      155465   155456       -9     
  Branches     4077     4077              
==========================================
- Hits        98905    98883      -22     
- Misses      48759    48769      +10     
- Partials     7801     7804       +3     
Flag      Coverage Δ
backend   64.46% <71.42%> (-0.02%) ⬇️


Contributor

@jacobshandling jacobshandling left a comment

Code looks good.

Can we also update this comment to keep it accurate to the new matching functionality?

@@ -165,8 +165,11 @@ VALUES ` + strings.Join(placeholders, ", ")
}
return nil
})
if err != nil {
Contributor

Is this necessary? Seems like every possible err is being handled already

Contributor Author

This is the error returned from the retry function above. Errors are handled inside the function (e.g. here), but we still need to check whether the retry function itself failed so that we can return from UpdateLabelMembershipByHostIDs early. We were previously doing this implicitly with

return ctxerr.Wrap(ctx, err, "UpdateLabelMembershipByHostIDs transaction")

where err would usually be nil.
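
To make the control flow in this reply concrete, here is a minimal Go sketch. It is an assumption-laden illustration, not Fleet's code: `withRetry`, `updateThenLoad`, and `errSimulated` are hypothetical names standing in for the retry helper, the updated method, and a failing transaction. It shows why the outer `if err != nil` check is still needed once the function goes on to do more work after the retried closure.

```go
package main

import (
	"errors"
	"fmt"
)

// errSimulated is a made-up error used to simulate a failing transaction.
var errSimulated = errors.New("simulated transaction failure")

// withRetry is a hypothetical stand-in for a transaction retry helper.
// It calls fn up to attempts times and returns the last error if every attempt fails.
func withRetry(attempts int, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
	}
	return err
}

// updateThenLoad wraps errors inside the closure, then checks the helper's
// own return value so that load is skipped when the update never succeeded.
func updateThenLoad(update func() error, load func() (string, error)) (string, error) {
	err := withRetry(3, func() error {
		if err := update(); err != nil {
			return fmt.Errorf("update membership: %w", err)
		}
		return nil
	})
	if err != nil {
		// Without this check, load would run even though the update failed.
		return "", fmt.Errorf("update transaction: %w", err)
	}
	return load()
}

func main() {
	loads := 0
	load := func() (string, error) {
		loads++
		return "updated label", nil
	}

	_, err := updateThenLoad(func() error { return errSimulated }, load)
	fmt.Println("failed update:", err, "| loads:", loads)

	label, err := updateThenLoad(func() error { return nil }, load)
	fmt.Println("successful update:", label, err, "| loads:", loads)
}
```

When the last statement of a function is a single wrapped `return` of the helper's error, the check is implicit; once a follow-up read is added after the helper, it has to be explicit.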


return ctxerr.Wrap(ctx, err, "UpdateLabelMembershipByHostIDs transaction")
return ds.labelDB(ctx, labelID, teamFilter, ds.writer(ctx))
Contributor

What about ds.reader(ctx), or even using this function?

Contributor Author

@sgress454 sgress454 Jan 28, 2025

The problem with both of those is that they use the read replica (ds.reader()), which doesn't have the updates yet.
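
The read-after-write concern in this reply can be shown with a toy model in Go. Everything here is hypothetical: the `store` type and its methods are an in-memory stand-in for a primary database with an asynchronously updated replica, not Fleet's datastore or MySQL replication. The sketch assumes the replica only catches up when `replicate` is called.

```go
package main

import "fmt"

// store is a toy model of a primary database with a lagging read replica.
type store struct {
	primary map[uint]string
	replica map[uint]string
}

func newStore() *store {
	return &store{primary: map[uint]string{}, replica: map[uint]string{}}
}

// write applies a change to the primary only.
func (s *store) write(id uint, name string) { s.primary[id] = name }

// replicate copies the primary's rows to the replica, simulating catch-up.
func (s *store) replicate() {
	for id, name := range s.primary {
		s.replica[id] = name
	}
}

// readPrimary reads from the connection that received the write.
func (s *store) readPrimary(id uint) (string, bool) {
	name, ok := s.primary[id]
	return name, ok
}

// readReplica reads from the replica, which may not have the row yet.
func (s *store) readReplica(id uint) (string, bool) {
	name, ok := s.replica[id]
	return name, ok
}

func main() {
	s := newStore()
	s.write(1, "some test label")

	// Immediately after the write, only the primary is guaranteed to have the row.
	_, onReplica := s.readReplica(1)
	name, onPrimary := s.readPrimary(1)
	fmt.Println("replica has row:", onReplica)
	fmt.Println("primary has row:", onPrimary, name)

	s.replicate()
	_, onReplica = s.readReplica(1)
	fmt.Println("replica has row after catch-up:", onReplica)
}
```

Reading the freshly updated label through the writer connection avoids depending on how quickly the replica catches up.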

@sgress454 sgress454 merged commit 1cd37ef into main Jan 31, 2025
35 checks passed
@sgress454 sgress454 deleted the sgress454/25555-update-new-label branch January 31, 2025 15:19