Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor and enhance pipeline with batch processing #607

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

zhuoyuan-liu
Copy link
Contributor

This update introduces a new batch processing mechanism for node last seen time udpate, improving efficiency and reducing redundant operations.

It also removed unused functions. Discontinued historical tracking of IP addresses and node names, as there are no clear use cases for this data.

What is missing in this PR:

  • Graceful Shutdown for the Main Function: The main function needs to be updated to allow graceful shutdown to ensure cached data is properly flushed before shutdown.

* Implement batch processing for node updates and refactor IP address handling

* Remove history IP table

* Remove unused function

* remove history name tables

* Remove unused GetMetadata function from NodeMetadata

* Keep remove unused function
@javuto javuto requested review from javuto and Copilot March 1, 2025 22:48
@javuto javuto added osctrl-tls osctrl-tls related changes backend Backend related issues labels Mar 1, 2025
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Overview

This PR refactors the pipeline by introducing a batch processing mechanism for updating node last seen times and removing legacy historical tracking functions. Key changes include:

  • Implementation of a new batch writer (in cmd/tls/handlers/writers.go) to accumulate and flush node update events.
  • Updates in multiple handler functions (post.go and handlers.go) to use the new batch writer instead of immediate database calls.
  • Removal of unused historical tracking code for IP addresses, hostnames, and localnames in the nodes package.

Reviewed Changes

File Description
cmd/tls/handlers/writers.go Introduces the batch writer and its background processing logic
cmd/tls/handlers/handlers.go Integrates the batch writer into the handlers with hardcoded testing values
cmd/tls/handlers/post.go Updates event handling to use the batch writer instead of immediate refresh calls
pkg/nodes/nodes.go Removes legacy functions and repurposes RefreshLastSeenBatch to update "last_config"
pkg/logging/dispatch.go Removes refresh calls for logging, aligning with the new update strategy
pkg/nodes/metadata.go, names.go, ipaddress.go Remove unused historical tracking functionality

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (2)

cmd/tls/handlers/post.go:421

  • [nitpick] In the QueryWriteHandler, utils.GetIP(r) is redundantly retrieved. Reuse the previously retrieved IP to ensure consistency across updates.
ip := utils.GetIP(r)

pkg/nodes/nodes.go:464

  • The RefreshLastSeenBatch function updates the 'last_config' column, which may be misleading. Verify if this is the intended behavior or if a dedicated 'last_seen' column should be updated instead.
return n.DB.Model(&OsqueryNode{}).Where("id IN ?", nodeID).UpdateColumn("last_config", time.Now()).Error

log.Err(err).Msg("error refreshing last query read")
}
// Refresh node last seen
ip := utils.GetIP(r)
Copy link
Preview

Copilot AI Mar 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] In the QueryReadHandler, utils.GetIP(r) is called a second time even though it was already obtained earlier. Consider capturing the IP address once and reusing it to avoid potential inconsistencies.

Suggested change
ip := utils.GetIP(r)

Copilot is powered by AI, so mistakes are possible. Review output carefully before use.

Positive Feedback
Negative Feedback

Provide additional feedback

Please help us improve GitHub Copilot by sharing more details about this comment.

Please select one or more of the options
return nil
func (n *NodeManager) RefreshLastSeenBatch(nodeID []uint) error {

return n.DB.Model(&OsqueryNode{}).Where("id IN ?", nodeID).UpdateColumn("last_config", time.Now()).Error
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just have a new field called last_seen and not to recycle any of the existing ones.

)

// writeEvent represents a single update request.
type writeEvent struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either we have a typed event for each update, or something generic enough that can be used for any updates

Copy link
Collaborator

@javuto javuto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the two comments about having a new field last_seen and to have generic events. Thanks for the PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend Backend related issues osctrl-tls osctrl-tls related changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants