Skip to content

Centralize log state management into shared package #1701

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

jefchien
Copy link
Contributor

Description of the issue

Both the log tailing input plugin (logfile) and the windows log events input plugin (windows_event_log) maintain state files. The plugins create one state file for each tailed log file and each subscribed windows event respectively. The state files serve as a way to keep track of progress (byte offset) and a recovery mechanism to restart from the last known position if the agent restarts. This helps to maintain log integrity and prevent duplicate processing of already sent logs.

Since these are maintained in two separate packages, there is overlap in how the behavior is defined and makes it more difficult to maintain or make changes to.

Description of changes

Centralizes log state management into a single shared package state which has a FileOffsetManager that manages the update/save loop for the state file.

License

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Tests

Added unit tests and update existing unit tests to use state manager package.

Requirements

Before commit the code, please do the following steps.

  1. Run make fmt and make fmt-sh
  2. Run make lint

@jefchien jefchien requested a review from a team as a code owner May 23, 2025 16:27
"github.com/aws/amazon-cloudwatch-agent/logs"
"github.com/aws/amazon-cloudwatch-agent/plugins/inputs/windows_event_log/wineventlog"
)

const (
forcePullInterval = 250 * time.Millisecond
stateQueueSize = 100
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the math to determine this value away from the default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the previous queue size, so I kept it the same for now. We can make adjustments later, but I want to make sure we aren't changing behavior.

if cfg.QueueSize <= 0 {
cfg.QueueSize = defaultQueueSize
}
if cfg.SaveInterval <= 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any plan to make this configurable instead always using the default? maybe just remove from ManagerConfig for now?

Copy link
Contributor Author

@jefchien jefchien May 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. This is in place for unit testing.

// defaultSaveInterval is the default duration between state file saves
defaultSaveInterval = 100 * time.Millisecond
// defaultQueueSize is the default capacity of the offset queue
defaultQueueSize = 2000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to keep this many in the queue? Is the purpose to reduce the likelihood of older items being dropped when the queue becomes full?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same deal as the windows_event_logs. This was the queue size of the original state managers for the logfile plugin.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants