Read /proc/net files with a single read syscall. #361
Merged
The /proc/net files are not guaranteed to be consistent as a whole; they are
only consistent at the row level. This is probably one of the reasons why
consecutive read calls might return duplicate entries: the kernel is
changing the file as it is being read. In certain situations this might
lead to loop-like behavior, where the same net entry is returned by
consecutive read calls as new connections are added to the kernel TCP table.
This PR tries to reduce the duplication by fetching the contents
of the net files with a single read syscall.
This discussion on Stackoverflow goes into more detail on how the /proc files work in terms of consistency.
In our use case, there were certain situations where the memory usage of the netstat telegraf input plugin spiked. On further inspection, this appeared to be caused by a large number of entries in the /proc/net/tcp file. These entries, however, did not correspond to a higher number of TCP connections; the connection counts remained within the expected range. This leads me to believe that in certain situations loop-like conditions are created and a lot of duplicate rows appear in /proc/net/tcp.
Consider the following graphs:
The memory in the graph is the memory used by the netstat telegraf plugin. You can see how spikes in the /proc/net/tcp length correspond to spikes in netstat's memory usage. Additionally, in those spikes, there are no real spikes in TCP connection counts. The length of /proc/net/tcp was gathered separately using `cat /proc/net/tcp`. `cat` itself issues many read calls to the file, so it has the same behavior as the netstat plugin.

After the patch the graphs look like this:
There are no more spikes in memory.
Slightly more CPU is consumed, but this is compensated by the reduction in duplicates and the stabilization of memory usage. For smaller use cases I don't think CPU usage will be an issue; for larger ones, single reads yield much better performance overall.
In our use case we are observing a 56% improvement in the time required to gather the netstat data: