Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added caching in PyperfNativeStack #36

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

IzabellaRaulin
Copy link

@IzabellaRaulin IzabellaRaulin commented Sep 13, 2022

Before (docker image granulate/gprofiler:1.6.0):
image

[Outdated] After enabling caching
image

IzabellaRaulin added a commit to IzabellaRaulin/gprofiler that referenced this pull request Sep 14, 2022
const uint8_t *NativeStackTrace::stack = NULL;
size_t NativeStackTrace::stack_len = 0;
uintptr_t NativeStackTrace::sp = 0;
uintptr_t NativeStackTrace::ip = 0;
time_t NativeStackTrace::now;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
time_t NativeStackTrace::now;
time_t NativeStackTrace::now;

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There some other formatting issues - you can run ./scripts/git-clang-format to fix them all :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last commit contains git-clang-format formatting changes for files modified in the PR.

return false;
}

UnwindCacheEntry NativeStackTrace::cache_get(const UnwindCache &map, const uint32_t &key) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not return a reference here?

Copy link
Author

@IzabellaRaulin IzabellaRaulin Oct 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My objective was to limit cache_get to read-only access and to make it more explicit that the content of the cache is only changing when cache_put or cache_delete_key are called. The code is then easier to maintain especially since there is no need to have a reference there.

Do you prefer having a reference here? Might you provide briefly why you think so? Maybe I missed something...I am open to your suggestion and looking forward to your opinion and guidelines.

void NativeStackTrace::cache_put(UnwindCache &mp, const uint32_t &key, const unw_cursor_t cursor, const unw_addr_space_t as, void *upt) {
// Check available capacity
if (cache_size() > NativeStackTrace::CacheMaxSizeMB*1024*1024 - cache_single_entry_size()) {
logInfo(2, "The cache usage is %.2f MB, close to reaching the max memory usage (%d MB)\n", cache_size_KB()/1024, NativeStackTrace::CacheMaxSizeMB);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 means it's not printed by default - only if we pass -v 2 or higher. Correct?

(these should not be printed by default, only when debugging)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not printed by default (and longer story below), but it makes sense to put here logLevel=3 to distinguish this info from others that are more of a warning nature.

Long story short, this is related to what I reported regarding my troubles getting/finding logs from PyPerfNativeStack. Increasing verbosity on granulate/gprofiler does not make those bcc/pyperf logs visible in gprofiler stdout. Increasing setVerbosityLevel in PyPerfLoggingHelper.cc) does not help as well.

I understand no one has reported this problem before so maybe it's my lack of knowledge of how the gprofiler handles logs coming from its modules. So, I decided to check the correctness of those new logs (especially passed values - how they look like) by dumping them into a temp file. For this purpose, I modified locally void logInfo to open a /tmp/myfile.txt and process vfprintf to it. Below I share the code diff.

 void logInfo(uint64_t logLevel, const char* fmt, ...) {
+
+  if (logLevel <= 2 ) {
+    va_list va;
+    va_start(va, fmt);
+    FILE * pFile;
+    pFile = fopen("/tmp/gprofiler_tmp/izahelperfile.txt","a");
+    if (pFile) {
+      std::vfprintf(pFile, fmt, va);
+      fclose (pFile);
+      va_end(va);
+    }
 [...]
 }

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an interesting idea to increase verbosity in PyPerf when you increase verbosity in gProfiler indeed. It's not the case. That doesn't explain why PyPerf shows no prints though.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I applied this diff on gProfiler and now I do see log messages in the PyPerf output: print:

diff --git a/gprofiler/profilers/python_ebpf.py b/gprofiler/profilers/python_ebpf.py
index 9113f355..649f9367 100644
--- a/gprofiler/profilers/python_ebpf.py
+++ b/gprofiler/profilers/python_ebpf.py
@@ -175,6 +175,8 @@ class PythonEbpfProfiler(ProfilerBase):
             str(self._EVENTS_BUFFER_PAGES),
             "--symbols-map-size",
             str(self._SYMBOLS_MAP_SIZE),
+            "-v",
+            "9",
             # Duration is irrelevant here, we want to run continuously.
         ] + self._offset_args()

I'll open a ticket to add it, it's a good feature that'll help debug PyPerf in the future.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this->symbols.push_back(error.str());
this->error_occurred = true;
goto out;
// Pseudo-proactive way of implementing TTL - whenever any call is made, all expired entries are removed
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Together with TTL-based eviction - PyPerf already tracks PIDs exiting so we can (and should) make use of that - otherwise we risk PID reuse getting us false results.

See populatePidTable where in the Pruning dead pids step it removes PIDs. We should call cache_delete_key there as well. Please ensure it actually works in removing PIDs that exit

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I push a draft of implementation.
populatePIDTable calls static NativeStackTrace::Pruning_dead_pid(). Please take a look at the proposal and let me know if such an approach is acceptable. It is WIP, so please ignore debug logs. I will remove it later.

In the meantime, I am debugging why the cache version gives missing and do not manage to add native symbols properly.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine with me. Despite written in C++ this entire project is fairly functional.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I removed debug leftovers.

@IzabellaRaulin IzabellaRaulin force-pushed the pyperf_caching branch 4 times, most recently from d753a85 to 3438223 Compare September 29, 2022 16:28
@Jongy
Copy link

Jongy commented Sep 29, 2022

@IzabellaRaulin , next times if you need to update from master, you can just merge it instead of rebasing. Rebase causes github to lose the current review state (I cannot see just the new changes). In any case we use squash-and-merge here, so whether you used merge or rebase to update your branch doesn't matter much eventually :)

@Jongy
Copy link

Jongy commented Sep 29, 2022

Besides the debug prints etc - please also remove all unrelated formatting changes before we merge (e.g all blank lines addition/removals, ...)

@IzabellaRaulin
Copy link
Author

@IzabellaRaulin , next times if you need to update from master, you can just merge it instead of rebasing. Rebase causes github to lose the current review state (I cannot see just the new changes). In any case we use squash-and-merge here, so whether you used merge or rebase to update your branch doesn't matter much eventually :)

Sorry, I didn't realize. But I put new changes in separate commits, so I hope that will help to track what is new

@IzabellaRaulin
Copy link
Author

Added commit 'Fixed missing symbols` to address the observed issue with missing native symbols in the version with enabled caching.

After:

image

Before (to prove the root cause was identified properly):
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants