memory leak ($300 bounty) #278

Closed
louis030195 opened this issue Sep 5, 2024 · 48 comments

@louis030195
Collaborator

louis030195 commented Sep 5, 2024

how does screenpipe work?

  • vision: screenpipe continuously takes screenshots of all your windows on all your monitors and does OCR + mp4 encoding to disk
    • we use the windows native OCR on windows
    • we use tesseract on linux
    • we use the apple native OCR on macos (a Swift lib compiled and called through C from rust)
    • there is also an option to use the cloud unstructured api
  • audio: it also records audio continuously and does STT + mp4 encoding
    • we use voice activity detection models
    • plus whisper tiny/large
    • or deepgram (cloud)
  • server
    • it also indexes all this data in a local sqlite DB
    • and runs an axum api on top of the db

previously noticed memory leaks in dependencies:

what is still to fix:

  • memory grows indefinitely when using vision only
  • memory grows indefinitely when using audio and vision (but it feels like audio is leaking more/faster)

what i tried/did:

  • using fewer Arc and more Weak pointers to avoid circular references (see the sketch right after this list)
  • avoiding data structures that grow without bound (maybe some left?)
  • fixed the virtual device on macos, which lets users capture system audio through a virtual device and avoid the scpk leak (atm closing a partnership with Blackhole to automate virtual device creation)
  • added benchmarks and instrumentation in the resource monitor to track perf automatically (still not satisfied with it)
  • currently my best way to track perf is to write down in my notes how i start the program (settings etc.), run a stopwatch on my phone, and note memory from activity monitor (mac) every 30 min or so
  • added a hack in the app UI settings that restarts the process every x min, which should work around it
  • added a restart interval arg in the CLI, which is broken and leaks even faster :)
  • using fewer .clone() calls
  • used xcode instruments + leaks to debug/profile
  • tried lldb a bit but did not get any value out of it
  • maybe other things
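
a minimal sketch of the Arc -> Weak change from the first bullet above (hypothetical Recorder/Frame types, not actual screenpipe code), showing why the back-reference has to be Weak for a drop to actually free memory:

use std::sync::{Arc, Mutex, Weak};

struct Recorder {
    frames: Mutex<Vec<Frame>>,
}

struct Frame {
    // Weak instead of Arc<Recorder>: a frame does not keep its recorder
    // alive, so there is no Recorder -> Frame -> Recorder cycle
    owner: Weak<Recorder>,
    data: Vec<u8>,
}

fn main() {
    let recorder = Arc::new(Recorder { frames: Mutex::new(Vec::new()) });
    recorder.frames.lock().unwrap().push(Frame {
        owner: Arc::downgrade(&recorder),
        data: vec![0u8; 1024],
    });

    let back_ref = recorder.frames.lock().unwrap()[0].owner.clone();
    assert!(back_ref.upgrade().is_some()); // recorder still alive here

    // if Frame held Arc<Recorder>, this drop would free nothing (the strong
    // count would stay at 1 forever); with Weak the recorder and its frames
    // are freed right here
    drop(recorder);
    assert!(back_ref.upgrade().is_none());
}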

what could be helpful to try:

  • running the main entrypoints on small parts of the code and running the leaks command on them with lots of looping to force the leak, like i did here
  • using profiling tools from other OSes, e.g. linux tools that are not available on macos

what i suspect is still leaking:

  1. circular references with arc: overuse of arc without proper weak references can create reference cycles, preventing memory from being freed.

  2. unbounded channels: using unbounded channels (e.g., mpsc::unbounded_channel()) without proper backpressure can lead to memory growth if producers outpace consumers (see the sketch after this list).

  3. long-running loops: continuous capture loops in vision and audio processing might accumulate data over time if not properly managed.

  4. unmanaged file handles: repeatedly opening file handles for logging or data storage without proper closure could leak file descriptors.

  5. spawned tasks not being cleaned up: tokio tasks that are spawned but not properly awaited or cancelled could lead to resource leaks.

  6. large data structures in long-running processes: storing large amounts of data in memory for extended periods without proper cleanup.

  7. improper error handling: failing to properly handle errors in async contexts might leave resources uncleaned.

  8. caching without limits: implementing caches without size limits or eviction policies could lead to unbounded growth.

  9. improper use of 'static lifetimes: overuse of 'static lifetimes might prevent data from being dropped when it's no longer needed.

  10. resource-intensive callbacks: callbacks for audio or video processing that allocate memory without proper deallocation.

  11. improper management of external resources: not properly releasing resources from external libraries or apis (e.g., ffmpeg, ocr engines).

  12. accumulating historical data: storing historical data (e.g., previous images for comparison) without a retention policy.

  13. inefficient string handling: repeated string allocations and concatenations in logging or data processing without reuse.

  14. improper shutdown procedures: not properly shutting down all components and releasing resources when the application terminates.

  15. memory fragmentation: frequent allocations and deallocations of varying sizes could lead to memory fragmentation, appearing as a "leak".

  16. improper use of lazy_static or similar patterns: global state that grows over time without bounds.

  17. inefficient use of buffers: repeatedly allocating new buffers for audio or video data instead of reusing existing ones.

  18. improper handling of large files: loading large files entirely into memory instead of streaming or chunking.

  19. unclosed streams: not properly closing audio or video streams, especially when dealing with multiple devices.

  20. improper handling of device disconnections: not cleaning up resources when audio or video devices are disconnected unexpectedly.

  21. wrong usage of ffmpeg; maybe switching to ffmpeg-sidecar #194 would help

  22. wrong usage of the sqlite db

  23. maybe using IPC to communicate with ffmpeg would help: [stability/perf] using IPC to communicate with ffmpeg #246

  24. something else
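
for suspect 2, a minimal sketch of swapping an unbounded channel for a bounded one so the capture side gets backpressure (hypothetical CapturedFrame type, assumes a tokio dependency with its usual rt/macros/sync/time features; not actual screenpipe code):

use tokio::sync::mpsc;

struct CapturedFrame {
    bytes: Vec<u8>,
}

#[tokio::main]
async fn main() {
    // capacity 8: at most 8 frames buffered; the producer awaits when the queue is full
    let (tx, mut rx) = mpsc::channel::<CapturedFrame>(8);

    let producer = tokio::spawn(async move {
        for _ in 0..100 {
            // send().await parks the producer when the channel is full, unlike
            // unbounded_send() which always succeeds and just grows memory
            if tx.send(CapturedFrame { bytes: vec![0u8; 1024] }).await.is_err() {
                break; // consumer gone, stop producing
            }
        }
    });

    while let Some(frame) = rx.recv().await {
        // stand-in for slow OCR / transcription work
        tokio::time::sleep(std::time::Duration::from_millis(1)).await;
        let _ = frame.bytes.len();
    }

    let _ = producer.await;
}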

context:

how to reproduce:

  • build the CLI or app and run it for 30-60m
  • open htop or anything similar to monitor memory in parallel (or use the small standalone monitor sketched below)
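
if sampling htop by hand gets tedious, a tiny standalone monitor along these lines (linux only, reads /proc/<pid>/status; a sketch, not part of the repo) can log RSS on a fixed interval:

use std::{env, fs, thread, time::Duration};

fn main() {
    // pass the screenpipe pid as the first argument
    let pid = env::args().nth(1).expect("usage: monitor <pid>");
    loop {
        let status = fs::read_to_string(format!("/proc/{pid}/status"))
            .expect("process exited or /proc unavailable");
        if let Some(line) = status.lines().find(|l| l.starts_with("VmRSS:")) {
            // VmRSS is reported by the kernel in kB
            println!("{}", line.trim());
        }
        thread::sleep(Duration::from_secs(30));
    }
}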

definition of done:

  • i can run the screenpipe CLI or app for 24h at under 6 GB of memory (atm it reaches 8 gb in 30m and roughly 20 gb after 20h)

cc:

bounty $300

/bounty 300

happy to jump on a call if useful or for efficiency

Copy link

linear bot commented Sep 5, 2024

Copy link

algora-pbc bot commented Sep 5, 2024

💎 $300 bounty • Screenpi.pe

Steps to solve:

  1. Start working: Comment /attempt #278 with your implementation plan
  2. Submit work: Create a pull request including /claim #278 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Thank you for contributing to mediar-ai/screenpipe!


@exi

exi commented Sep 5, 2024

Hi, I'd be interested in giving this a shot if you could give me instructions on how exactly to run this to trigger the problem.

@FractalFir
Contributor

I am (mostly) free ATM, so I would not mind fixing this leak too.

@louis030195
Could you provide more details, like the leaks logs included last time?

@louis030195
Collaborator Author

@FractalFir last time?

this is the current process leaks output (after 27 min, 8 gb used):
https://gist.github.com/louis030195/41914b36910efcbf9cb96e96714eee68

but i think the leaks command or UI is not helpful anymore, this only shows a 7mb leak

that's why i'm looking for other ways to profile. are you on linux? i heard about this https://github.com/flamegraph-rs/flamegraph

but it does not work on mac

to build CLI on linux:

sudo apt-get update
sudo apt-get install -y libavformat-dev libavfilter-dev libavdevice-dev ffmpeg libasound2-dev tesseract-ocr libtesseract-dev
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
git clone https://github.com/mediar-ai/screenpipe
cd screenpipe
cargo build --release 
./target/release/screenpipe

screencapture does not work on Wayland fyi

@umanwizard

umanwizard commented Sep 5, 2024

you can profile heap memory usage with jemalloc_pprof on Linux.

Apply this diff:

diff --git a/screenpipe-server/Cargo.toml b/screenpipe-server/Cargo.toml
index 169b99c..542e4b1 100644
--- a/screenpipe-server/Cargo.toml
+++ b/screenpipe-server/Cargo.toml
@@ -75,6 +75,10 @@ async-trait = "0.1.68"
 ndarray = "0.15.6"
 rust-stemmers = "1.2.0"
 
+tikv-jemallocator = { version = "0.5.0", features = ["profiling", "unprefixed_malloc_on_supported_platforms"] }
+jemalloc_pprof = "0.4.2"
+
+
 [dev-dependencies]
 tempfile = "3.3.0"
 
diff --git a/screenpipe-server/src/bin/screenpipe-server.rs b/screenpipe-server/src/bin/screenpipe-server.rs
index 0cb2b5e..d2aed29 100644
--- a/screenpipe-server/src/bin/screenpipe-server.rs
+++ b/screenpipe-server/src/bin/screenpipe-server.rs
@@ -69,8 +69,47 @@ fn get_base_dir(custom_path: Option<String>) -> anyhow::Result<PathBuf> {
     Ok(base_dir)
 }
 
+#[cfg(not(target_env = "msvc"))]
+#[global_allocator]
+static ALLOC: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;
+
+#[allow(non_upper_case_globals)]
+#[export_name = "malloc_conf"]
+pub static malloc_conf: &[u8] = b"prof:true,prof_active:true,lg_prof_sample:19\0";
+
+use axum::http::StatusCode;
+use axum::response::IntoResponse;
+
+pub async fn handle_get_heap() -> Result<impl IntoResponse, (StatusCode, String)> {
+    let mut prof_ctl = jemalloc_pprof::PROF_CTL.as_ref().unwrap().lock().await;
+    require_profiling_activated(&prof_ctl)?;
+    let pprof = prof_ctl
+        .dump_pprof()
+        .map_err(|err| (StatusCode::INTERNAL_SERVER_ERROR, err.to_string()))?;
+    Ok(pprof)
+}
+
+/// Checks whether jemalloc profiling is activated and returns an error response if not.
+fn require_profiling_activated(prof_ctl: &jemalloc_pprof::JemallocProfCtl) -> Result<(), (StatusCode, String)> {
+    if prof_ctl.activated() {
+        Ok(())
+    } else {
+        Err((axum::http::StatusCode::FORBIDDEN, "heap profiling not activated".into()))
+    }
+}
+
 #[tokio::main]
 async fn main() -> anyhow::Result<()> {
+    let app = axum::Router::new()
+        .route("/debug/pprof/heap", axum::routing::get(handle_get_heap));
+
+    // run our app with hyper, listening globally on port 3000
+    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
+
+    tokio::spawn(async {
+        axum::serve(listener, app).await.unwrap();
+    });
+
     let cli = Cli::parse();
 
     if find_ffmpeg_path().is_none() {

install pprof (assuming you have golang installed):

go install github.com/google/pprof@latest

get a heap pprof and analyze it with the pprof tool:

curl http://localhost:3000/debug/pprof/heap > out.pprof && ~/go/bin/pprof -http : out.pprof

(I suggest navigating to "flamegraph" in the pprof UI)

@FractalFir
Contributor

last time?

I meant like when I worked on fixing the leak in screencapturekit-rs.

Yes, I am on Linux, currently downloading and building the CLI. I have used things like cargo-flamegraph quite a bit before, because I was dealing with high memory usage in my own projects.

Well, the output still tells us a few things. None of the leaks are > 64 bytes, which suggests that the leaked object is small. It is unlikely to be a video frame / audio sample.

@umanwizard

@louis030195

screencapture does not work on Wayland fyi

what is "screencapture" ?

And does this mean it is impossible to reproduce the leak on wayland?

@louis030195
Collaborator Author

louis030195 commented Sep 5, 2024

@louis030195

screencapture does not work on Wayland fyi

what is "screencapture" ?

And does this mean it is impossible to reproduce the leak on wayland?

screenpipe continuously takes screenshots of all your windows on all your monitors and does OCR + mp4 encoding to disk

it also records audio continuously and does STT + mp4 encoding

and it means you cannot reproduce the vision leaks on wayland; you can reproduce the audio leaks though (might break the bounty down into smaller ones if there is a leak in both audio and vision)

you can disable audio or vision using --disable-vision or --disable-audio

@umanwizard

I'm repeatedly getting this error when running (in X):

[2024-09-05T18:31:41Z ERROR screenpipe_server::video] Failed to write frame to ffmpeg: Broken pipe (os error 32)

@louis030195
Collaborator Author

I'm repeatedly getting this error when running (in X):

[2024-09-05T18:31:41Z ERROR screenpipe_server::video] Failed to write frame to ffmpeg: Broken pipe (os error 32)

hmm

this is actually another issue that nobody has figured out how to reproduce: #228

@FractalFir
Contributor

I think I can replicate the leak on my machine, and it seems a bit bigger on Linux.

[2024-09-05T18:43:49Z INFO  screenpipe_server::resource_monitor] Runtime: 310s, Total Memory: 21% (3.33 GB / 15.72 GB), Total CPU: 17%
[2024-09-05T18:44:19Z INFO  screenpipe_server::resource_monitor] Runtime: 340s, Total Memory: 23% (3.65 GB / 15.72 GB), Total CPU: 617%

300 MB in 30 seconds is quite a lot. I will be analysing the exact cause.

@FractalFir
Contributor

It looks like memory usage goes up in very sudden bursts of allocations.

[screenshot: memory usage graph showing sudden bursts of allocations]

@louis030195
Collaborator Author

louis030195 commented Sep 5, 2024

keep in mind we load a whisper-large model into memory at boot for audio transcription (on nvidia with the cuda feature, on apple hardware with the metal feature, otherwise RAM+CPU)

create_whisper_channel(audio_transcription_engine.clone()).await?

also, leaks shows a big leak at boot every time, but i cannot see the full stack in the UI for some reason and it does not show in the CLI: huggingface/candle#2271 (comment)

seems correlated with model loading but not sure

@FractalFir
Contributor

Yeah, I will let it run for a bit longer to have more accurate data. I thought 1 minute would be enough to initialize everything, but giving it more time will not hurt.

@louis030195
Collaborator Author

louis030195 commented Sep 5, 2024

(will keep updating this msg with perf logs) atm trying different setups myself:

11:47 am

pid 57490 - alacritty - ./target/release/screenpipe --fps 0.2 --audio-transcription-engine whisper-large --audio-device "MacBook Pro Microphone (input)" --data-dir /tmp/sp --ocr-engine apple-native --port 3038

pid 57424 - app - /Applications/screenpipe.app/Contents/MacOS/screenpipe --port 3030 --fps 0.2 --audio-transcription-engine whisper-large --ocr-engine apple-native --audio-device "MacBook Pro Microphone (input)"

pid 57647 - cursor - cargo run --bin screenpipe -- --disable-audio --fps 0.2 --ocr-engine apple-native --port 3031 --data-dir /tmp/spp

at 7m:
[screenshot: memory usage at 7m (2024-09-05, 11:52)]

30m

[screenshot: memory usage at 30m (12:15)]

1h40m

[screenshot: memory usage at 1h40m (13:20)]

ofc running stuff in parallel adds more noise

@FractalFir
Contributor

I will let it run for some more time to get a better picture of what is happening.

@FractalFir
Contributor

This is quite a weird issue.

I have run the executable under heaptrack to see the exact cause of the leak.

I think the memory is leaking, but according to heaptrack, the memory usage seems to stay the same.

Heaptrack also thinks that the peak memory usage was 3.7 GB (or 4.6 GB including heaptrack overhead). However, this is not the case according to the memory usage metrics, which claim a higher usage.

Runtime: 491s, Total Memory: 30% (4.71 GB / 15.72 GB), Total CPU: 787%

So, it seems like heaptrack will not be enough, and I will try using valgrind. It is much slower, but should give more accurate info.

@FractalFir
Contributor

FractalFir commented Sep 5, 2024

I have run the program under valgrind for some time, and have some initial results.

==147072== LEAK SUMMARY:
==147072==    definitely lost: 7,800 bytes in 110 blocks
==147072==    indirectly lost: 11,223 bytes in 60 blocks
==147072==      possibly lost: 12,615,327 bytes in 165,968 blocks
==147072==    still reachable: 3,139,704,411 bytes in 14,658 blocks
==147072==                       of which reachable via heuristic:
==147072==                         length64           : 292,104 bytes in 1,346 blocks
==147072==         suppressed: 332 bytes in 2 blocks
==147072== 
==147072== For lists of detected and suppressed errors, rerun with: -s
==147072== ERROR SUMMARY: 595 errors from 595 contexts (suppressed: 2 from 2)

The still-reachable blocks are memory which is still accessible to the program but was not freed when I stopped it (for example, the whisper model).

Directly and indirectly lost blocks are pieces of memory valgrind knows can no longer be freed. However, those were allocated in the C code of some Linux audio utilities and should not be the cause of the problem.

The 12.6 MB of "possibly lost" memory kind of looks like it could be the leak, but I am not sure.

The thing about "possibly lost" blocks is that they could be still reachable, so false positives are not out of the question.

Some things seem to suggest that at least some of the leaks you have observed are included in that "possibly lost" memory.

You have said that you think you might have a leak related to model loading. This to me looks like it could be that leak:

==147072== 19,660,800 bytes in 1 blocks are still reachable in loss record 3,096 of 3,110
==147072==    at 0x5758866: malloc (vg_replace_malloc.c:446)
==147072==    by 0x4A407B7: UnknownInlinedFun (alloc.rs:98)
==147072==    by 0x4A407B7: UnknownInlinedFun (alloc.rs:181)
==147072==    by 0x4A407B7: UnknownInlinedFun (alloc.rs:241)
==147072==    by 0x4A407B7: UnknownInlinedFun (raw_vec.rs:478)
==147072==    by 0x4A407B7: with_capacity_in<alloc::alloc::Global> (raw_vec.rs:425)
==147072==    by 0x4A407B7: with_capacity_in<f32, alloc::alloc::Global> (raw_vec.rs:202)
==147072==    by 0x4A407B7: with_capacity_in<f32, alloc::alloc::Global> (mod.rs:698)
==147072==    by 0x4A407B7: with_capacity<f32> (mod.rs:480)
==147072==    by 0x4A407B7: from_iter<f32, core::iter::adapters::map::Map<core::slice::iter::Iter<half::binary16::f16>, candle_core::cpu_backend::utils::unary_map::{closure_env#0}<half::binary16::f16, f32, candle_core::cpu_backend::{impl#27}::to_dtype::{closure_env#14}>>> (spec_from_iter_nested.rs:52)
==147072==    by 0x4A407B7: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter (spec_from_iter.rs:33)
==147072==    by 0x4AB478A: from_iter<f32, core::iter::adapters::map::Map<core::slice::iter::Iter<half::binary16::f16>, candle_core::cpu_backend::utils::unary_map::{closure_env#0}<half::binary16::f16, f32, candle_core::cpu_backend::{impl#27}::to_dtype::{closure_env#14}>>> (mod.rs:2986)
==147072==    by 0x4AB478A: collect<core::iter::adapters::map::Map<core::slice::iter::Iter<half::binary16::f16>, candle_core::cpu_backend::utils::unary_map::{closure_env#0}<half::binary16::f16, f32, candle_core::cpu_backend::{impl#27}::to_dtype::{closure_env#14}>>, alloc::vec::Vec<f32, alloc::alloc::Global>> (iterator.rs:2000)
==147072==    by 0x4AB478A: candle_core::cpu_backend::utils::unary_map (utils.rs:285)
==147072==    by 0x4A30394: <candle_core::cpu_backend::CpuStorage as candle_core::backend::BackendStorage>::to_dtype (mod.rs:1721)
==147072==    by 0x4AC2B29: UnknownInlinedFun (storage.rs:182)
==147072==    by 0x4AC2B29: candle_core::tensor::Tensor::to_dtype (tensor.rs:2019)
==147072==    by 0x4A10D27: <candle_core::safetensors::MmapedSafetensors as candle_nn::var_builder::SimpleBackend>::get (var_builder.rs:382)
==147072==    by 0x4A0F67C: <alloc::boxed::Box<dyn candle_nn::var_builder::SimpleBackend> as candle_nn::var_builder::Backend>::get (var_builder.rs:86)
==147072==    by 0x490742A: get_with_hints_dtype<alloc::boxed::Box<dyn candle_nn::var_builder::SimpleBackend, alloc::alloc::Global>, (usize, usize, usize)> (var_builder.rs:198)
==147072==    by 0x490742A: get_with_hints<alloc::boxed::Box<dyn candle_nn::var_builder::SimpleBackend, alloc::alloc::Global>, (usize, usize, usize)> (var_builder.rs:181)
==147072==    by 0x490742A: candle_nn::var_builder::VarBuilderArgs<B>::get (var_builder.rs:186)
==147072==    by 0x48F6C3D: candle_transformers::models::whisper::model::conv1d (model.rs:13)
==147072==    by 0x48FC8B9: load (model.rs:259)
==147072==    by 0x48FC8B9: candle_transformers::models::whisper::model::Whisper::load (model.rs:382)
==147072==    by 0x268B5F9: screenpipe_audio::stt::WhisperModel::new (stt.rs:77)
==147072==    by 0x1E741DD: {async_fn#0} (stt.rs:731)
==147072==    by 0x1E741DD: {async_fn#0} (core.rs:51)
==147072==    by 0x1E741DD: screenpipe::main::{{closure}}::{{closure}}::{{closure}} (screenpipe-server.rs:327)
==147072== 
==147072== 26,214,400 bytes in 1 blocks are still reachable in loss record 3,097 of 3,110
==147072==    at 0x5758866: malloc (vg_replace_malloc.c:446)
==147072==    by 0x4A407B7: UnknownInlinedFun (alloc.rs:98)
==147072==    by 0x4A407B7: UnknownInlinedFun (alloc.rs:181)
==147072==    by 0x4A407B7: UnknownInlinedFun (alloc.rs:241)
==147072==    by 0x4A407B7: UnknownInlinedFun (raw_vec.rs:478)
==147072==    by 0x4A407B7: with_capacity_in<alloc::alloc::Global> (raw_vec.rs:425)
==147072==    by 0x4A407B7: with_capacity_in<f32, alloc::alloc::Global> (raw_vec.rs:202)
==147072==    by 0x4A407B7: with_capacity_in<f32, alloc::alloc::Global> (mod.rs:698)
==147072==    by 0x4A407B7: with_capacity<f32> (mod.rs:480)
==147072==    by 0x4A407B7: from_iter<f32, core::iter::adapters::map::Map<core::slice::iter::Iter<half::binary16::f16>, candle_core::cpu_backend::utils::unary_map::{closure_env#0}<half::binary16::f16, f32, candle_core::cpu_backend::{impl#27}::to_dtype::{closure_env#14}>>> (spec_from_iter_nested.rs:52)
==147072==    by 0x4A407B7: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter (spec_from_iter.rs:33)
==147072==    by 0x4AB478A: from_iter<f32, core::iter::adapters::map::Map<core::slice::iter::Iter<half::binary16::f16>, candle_core::cpu_backend::utils::unary_map::{closure_env#0}<half::binary16::f16, f32, candle_core::cpu_backend::{impl#27}::to_dtype::{closure_env#14}>>> (mod.rs:2986)
==147072==    by 0x4AB478A: collect<core::iter::adapters::map::Map<core::slice::iter::Iter<half::binary16::f16>, candle_core::cpu_backend::utils::unary_map::{closure_env#0}<half::binary16::f16, f32, candle_core::cpu_backend::{impl#27}::to_dtype::{closure_env#14}>>, alloc::vec::Vec<f32, alloc::alloc::Global>> (iterator.rs:2000)
==147072==    by 0x4AB478A: candle_core::cpu_backend::utils::unary_map (utils.rs:285)
==147072==    by 0x4A30394: <candle_core::cpu_backend::CpuStorage as candle_core::backend::BackendStorage>::to_dtype (mod.rs:1721)
==147072==    by 0x4AC2B29: UnknownInlinedFun (storage.rs:182)
==147072==    by 0x4AC2B29: candle_core::tensor::Tensor::to_dtype (tensor.rs:2019)
==147072==    by 0x4A10D27: <candle_core::safetensors::MmapedSafetensors as candle_nn::var_builder::SimpleBackend>::get (var_builder.rs:382)
==147072==    by 0x4A10B7C: UnknownInlinedFun (var_builder.rs:86)
==147072==    by 0x4A10B7C: candle_nn::var_builder::VarBuilderArgs<B>::get_with_hints_dtype (var_builder.rs:198)
==147072==    by 0x4A1058A: get_with_hints<alloc::boxed::Box<dyn candle_nn::var_builder::SimpleBackend, alloc::alloc::Global>, (usize, usize)> (var_builder.rs:181)
==147072==    by 0x4A1058A: candle_nn::linear::linear (linear.rs:62)
==147072==    by 0x4906469: candle_transformers::models::with_tracing::linear (with_tracing.rs:57)
==147072==    by 0x48F9B04: candle_transformers::models::whisper::model::ResidualAttentionBlock::load (model.rs:163)
==147072==    by 0x4905071: {closure#0} (model.rs:263)
==147072==    by 0x4905071: {closure#0}<usize, core::result::Result<candle_transformers::models::whisper::model::ResidualAttentionBlock, candle_core::error::Error>, (), core::ops::control_flow::ControlFlow<core::ops::control_flow::ControlFlow<candle_transformers::models::whisper::model::ResidualAttentionBlock, ()>, ()>, candle_transformers::models::whisper::model::{impl#2}::load::{closure_env#0}, core::iter::adapters::{impl#0}::try_fold::{closure_env#0}<core::iter::adapters::map::Map<core::ops::range::Range<usize>, candle_transformers::models::whisper::model::{impl#2}::load::{closure_env#0}>, core::result::Result<core::convert::Infallible, candle_core::error::Error>, (), core::iter::traits::iterator::Iterator::try_for_each::call::{closure_env#0}<candle_transformers::models::whisper::model::ResidualAttentionBlock, core::ops::control_flow::ControlFlow<candle_transformers::models::whisper::model::ResidualAttentionBlock, ()>, fn(candle_transformers::models::whisper::model::ResidualAttentionBlock) -> core::ops::control_flow::ControlFlow<candle_transformers::models::whisper::model::ResidualAttentionBlock, ()>>, core::ops::control_flow::ControlFlow<candle_transformers::models::whisper::model::ResidualAttentionBlock, ()>>> (map.rs:95)
==147072==    by 0x4905071: try_fold<core::ops::range::Range<usize>, (), core::iter::adapters::map::map_try_fold::{closure_env#0}<usize, core::result::Result<candle_transformers::models::whisper::model::ResidualAttentionBlock, candle_core::error::Error>, (), core::ops::control_flow::ControlFlow<core::ops::control_flow::ControlFlow<candle_transformers::models::whisper::model::ResidualAttentionBlock, ()>, ()>, candle_transformers::models::whisper::model::{impl#2}::load::{closure_env#0}, core::iter::adapters::{impl#0}::try_fold::{closure_env#0}<core::iter::adapters::map::Map<core::ops::range::Range<usize>, candle_transformers::models::whisper::model::{impl#2}::load::{closure_env#0}>, core::result::Result<core::convert::Infallible, candle_core::error::Error>, (), core::iter::traits::iterator::Iterator::try_for_each::call::{closure_env#0}<candle_transformers::models::whisper::model::ResidualAttentionBlock, core::ops::control_flow::ControlFlow<candle_transformers::models::whisper::model::ResidualAttentionBlock, ()>, fn(candle_transformers::models::whisper::model::ResidualAttentionBlock) -> core::ops::control_flow::ControlFlow<candle_transformers::models::whisper::model::ResidualAttentionBlock, ()>>, core::ops::control_flow::ControlFlow<candle_transformers::models::whisper::model::ResidualAttentionBlock, ()>>>, core::ops::control_flow::ControlFlow<core::ops::control_flow::ControlFlow<candle_transformers::models::whisper::model::ResidualAttentionBlock, ()>, ()>> (iterator.rs:2405)
==147072==    by 0x4905071: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::try_fold (map.rs:121)
==147072==    by 0x48EDFE9: UnknownInlinedFun (mod.rs:191)
==147072==    by 0x48EDFE9: UnknownInlinedFun (iterator.rs:2467)
==147072==    by 0x48EDFE9: UnknownInlinedFun (mod.rs:174)
==147072==    by 0x48EDFE9: from_iter<candle_transformers::models::whisper::model::ResidualAttentionBlock, core::iter::adapters::GenericShunt<core::iter::adapters::map::Map<core::ops::range::Range<usize>, candle_transformers::models::whisper::model::{impl#2}::load::{closure_env#0}>, core::result::Result<core::convert::Infallible, candle_core::error::Error>>> (spec_from_iter_nested.rs:24)
==147072==    by 0x48EDFE9: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter (spec_from_iter.rs:33)

(the bytes in the log are the total count, not the count in that leak).

However, once again, those are possible leaks, not "guaranteed leaks". Also, I am not sure if the leak seen on macOS is also present on Linux.

Could you try running the program under valgrind yourself? I just want to make sure the issue is present on both platforms.

EDIT: it looks like valgrind is not supported on ARM Macs :( I guess we will need to use something different.

@exi

exi commented Sep 5, 2024

Just to spare someone else the effort:
I ran it under bytehound, and the results are similar to heaptrack.
It claims the memory is lower than it actually is (according to htop) and does not even see the ever-increasing memory usage.

@louis030195
Collaborator Author

Just to spare someone else the effort: I ran it under bytehound, and the results are similar to heaptrack. It claims the memory is lower than it actually is (according to htop) and does not even see the ever-increasing memory usage.

interesting

actually our resource monitor also never properly recorded memory: https://github.com/mediar-ai/screenpipe/blob/main/screenpipe-server/src/resource_monitor.rs

@louis030195
Collaborator Author

i need to go away for max 1h, will come back on this issue after (~2 pm here)

@louis030195
Collaborator Author

louis030195 commented Sep 5, 2024

also you can run these other binaries to run smaller parts btw:

cargo build --release
./target/release/screenpipe # (end to end vision, audio, db, api)
./target/release/screenpipe-vision # just record vision + ocr (does not save files)
./target/release/screenpipe-audio-forever # just record audio + stt (save files)

# or through cargo run:
cargo run --bin screenpipe
# etc.

@FractalFir
Contributor

i need to go away for max 1h, will come back on this issue after (~2 pm here)

Understandable, I too will have to go away in a few hours.

I have found a way to make the leak faster, and more visible.

  1. By using the smaller whisper model, we can make the base memory usage smaller. This should make the leak more visible, since it is going to be relatively larger.
  2. Setting higher fps makes memory usage grow faster on my machine.

With those settings, the memory usage seems to grow from 0.67 GB at the start to 2GB after ~5.5 minutes.

[2024-09-05T21:09:16Z INFO  screenpipe_server::resource_monitor] Runtime: 320s, Total Memory: 13% (2.02 GB / 15.72 GB), Total CPU: 688%

this seems to suggest the issue is related to video recording. However, this could also be a false positive, since video recording is resource-intensive in general.

@louis030195
Collaborator Author

louis030195 commented Sep 5, 2024

11:47 am

57490 - ./target/release/screenpipe --fps 0.2 --audio-transcription-engine whisper-large --audio-device "MacBook Pro Microphone (input)" --data-dir /tmp/sp --ocr-engine apple-native --port 3038

57424 - /Applications/screenpipe.app/Contents/MacOS/screenpipe --port 3030 --fps 0.2 --audio-transcription-engine whisper-large --ocr-engine apple-native --audio-device "MacBook Pro Microphone (input)"

57647 - cargo run --bin screenpipe -- --disable-audio --fps 0.2 --ocr-engine apple-native --port 3031 --data-dir /tmp/spp

1.29 pm

87637 - cargo run --bin screenpipe -- --disable-audio --port 3031 --data-dir /tmp/spp

2.03 pm

98906 - cargo run --bin screenpipe-vision

2.08 pm

1808 - cargo run --features metal --bin screenpipe-audio-forever -- --audio-device "MacBook Pro Microphone (input)"

stopped all at 2.55 pm:

[screenshot: memory usage when all processes were stopped (14:55)]

@louis030195
Collaborator Author

louis030195 commented Sep 5, 2024

easy way to reproduce:

  • vision leak: cargo run --bin screenpipe-vision -- --fps 30 grows quickly, despite not running any audio, server or db code
  • audio leak:
    • cargo run --bin screenpipe-audio-forever -- --list-audio-devices
    • cargo run --bin screenpipe-audio-forever -- --audio-device "your audio device" --audio-chunk-duration 1

@FractalFir
Contributor

FractalFir commented Sep 5, 2024

This does not seem to work on my machine. When I run

cargo run --bin screenpipe-vision -- --fps 30

I get:

error: unexpected argument '--fps' found

Usage: screenpipe-vision [OPTIONS]

For more information, try '--help'.

When I run cargo run --bin screenpipe-vision --help
I get:

Usage: screenpipe-vision [OPTIONS]

Options:
      --save-text-files  Save text files
      --cloud-ocr-off    Disable cloud OCR processing
  -h, --help             Print help
  -V, --version          Print version

So, I am unable to run the vision code standalone.

Are you on some specific branch?

@louis030195
Collaborator Author

git pull (just updated)

@exi

exi commented Sep 5, 2024

I think there might be more than one issue here, or a more fundamental one.
I'm currently looking at screenpipe-audio-forever, which has a bug causing ffmpeg to fail immediately after the second recording.

This still grows.

@louis030195
Collaborator Author

louis030195 commented Sep 5, 2024

fixed screenpipe-audio-forever (maybe try 5 for the chunk duration instead) (git pull)

@exi

exi commented Sep 5, 2024

Also, at least on linux/ubuntu with pipewire as the alsa backend, each time we list the devices it seems to leak a bit inside pipewire.
This particular leak can be seen in bytehound: there are a lot of libdbus allocations still hanging around, and the alsa/pcm allocations also keep growing:

[screenshot: DBus leakage]
[screenshot: accumulated devices]

Both of these seem outside our control.
My initial suspicion was that we are hanging onto the device handles, but that seems not to be the case because I can see the "drop" happening in my debugger.

@exi

exi commented Sep 5, 2024

I'll look at this more tomorrow, it's midnight here. Good luck 👋

@FractalFir
Contributor

FractalFir commented Sep 5, 2024

It is also midnight for me, so I too will be heading to bed soon.

BTW: I can't reproduce the vision leak on Linux. The memory grows for some time, but then it stabilizes.

Question: could you provide the output of the leaks command for just the vision module?

export MallocStackLogging=1
leaks cargo run --bin screenpipe-vision -- --fps 30

I don't know if the leaks command tracks child processes, so while it might not have worked when running the whole project, it could work when running just the faulty component.

@louis030195
Collaborator Author

louis030195 commented Sep 5, 2024

@FractalFir

i reached 12 gb after running screenpipe-vision with 30 fps for 48 min

will share leaks, do you think it could be apple native OCR?

also i mostly heard of memory issues from mac users and less on windows and linux, but some windows users still found it using too much memory/cpu sometimes (esp on machines with only 16 gb ram or no GPU)

this is the code:

https://github.com/mediar-ai/screenpipe/blob/main/screenpipe-vision/src/apple.rs

https://github.com/mediar-ai/screenpipe/blob/main/screenpipe-vision/src/ocr.swift

leaks: https://gist.github.com/louis030195/92717aaedfde57e592bb424567aeeeb6

(note that i made a few changes to the core.rs vision code just before running the leaks command, which could have improved perf; i just see unnecessary overuse of arc and clones in the vision code)

@FractalFir
Contributor

FractalFir commented Sep 5, 2024

The leaks output is sadly not very helpful (since it seems to be mostly empty).

will share leaks, do you think it could be apple native OCR?

Well, there is a way to check if this is caused by Apple native OCR.
Is the leak still present when you switch to a different OCR engine?
E.g. when you run with --ocr-engine unstructured?

If the leak is present with a different OCR engine, then the issue must be somewhere else. If it disappears after changing engines, then this must be related to Apple OCR.

@louis030195
Collaborator Author

running screenpipe-vision with tesseract at 120 fps right now to see

@louis030195
Collaborator Author

louis030195 commented Sep 5, 2024

[screenshot: memory usage at 16:18, tesseract vs apple-native OCR]

well

with tesseract screenpipe uses less than 50 mb while apple native uses 4 gb

looking into the swift code now

@louis030195
Collaborator Author

trying some changes on the swift/rs code related to apple ocr
[screenshot]

@louis030195
Collaborator Author

(does not seem to be the issue)

@FractalFir
Contributor

Does just calling this function in a loop leak memory?

If so, then we know that the issue is in that function and that function alone.

If the issue is there, can you replicate it in swift? For example, by just passing some hardcoded image?
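
For reference, a loop harness along these lines is usually enough to tell whether the wrapper alone leaks under leaks / Activity Monitor (the perform_ocr_apple name and signature here are hypothetical placeholders for whatever the Rust wrapper in apple.rs is actually called):

fn main() {
    // one fixed frame, so the input itself cannot be the source of growth
    let image: Vec<u8> = vec![0u8; 1920 * 1080 * 4]; // RGBA placeholder

    for i in 0..10_000 {
        // placeholder call standing in for the real wrapper around ocr.swift
        let _text = perform_ocr_apple(&image, 1920, 1080);
        if i % 100 == 0 {
            eprintln!("iteration {i}");
        }
    }
}

// stub so the sketch is self-contained; in practice this would be the existing
// FFI wrapper from screenpipe-vision/src/apple.rs
fn perform_ocr_apple(_data: &[u8], _width: usize, _height: usize) -> String {
    String::new()
}

If RSS still climbs with a loop like that, the leak is inside the Swift/FFI call; if it stays flat, the leak is somewhere else in the vision pipeline.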

@FractalFir
Contributor

OK, so it looks like the leak is there.

There must be some kind of bug in the swift code, so the next logical step would be looking closer at that code to find the exact cause.

I am not a swift expert, but I would suggest disabling certain parts of the swift code until the leak disappears.

For example, you could check if this swift code alone:


  guard let dataProvider = CGDataProvider(data: Data(bytes: imageData, count: length) as CFData),
    let cgImage = CGImage(
      width: width,
      height: height,
      bitsPerComponent: 8,
      bitsPerPixel: 32,
      bytesPerRow: width * 4,
      space: CGColorSpaceCreateDeviceRGB(),
      bitmapInfo: CGBitmapInfo(rawValue: CGImageAlphaInfo.premultipliedLast.rawValue),
      provider: dataProvider,
      decode: nil,
      shouldInterpolate: false,
      intent: .defaultIntent
    )
  else {
    return strdup("Error: Failed to create CGImage")
  }

  // Preprocess the image
  let ciImage = CIImage(cgImage: cgImage)
  let context = CIContext(options: nil)

  // Apply preprocessing filters (slightly reduced contrast compared to original)
  let processed = ciImage
    .applyingFilter("CIColorControls", parameters: [kCIInputSaturationKey: 0, kCIInputContrastKey: 1.08])
    .applyingFilter("CIUnsharpMask", parameters: [kCIInputRadiusKey: 0.8, kCIInputIntensityKey: 0.4])

  guard let preprocessedCGImage = context.createCGImage(processed, from: processed.extent) else {
    return strdup("Error: Failed to create preprocessed image")
  }

  var ocrResult = ""
  var textElements: [[String: Any]] = []
  var totalConfidence: Float = 0.0
  var observationCount: Int = 0
    // disable all the code after this statement by returning early
  return strdup("Is this enough to leak?")

leaks memory. If this first part leaks memory, then the issue is likely there. If calling this stub does nothing, then we know that the leak is somewhere further down the line. You can repeat this process until you find the exact cause of the leak.

Sadly, I have to go now. I will take a closer look at this tomorrow.

@louis030195
Collaborator Author

louis030195 commented Sep 6, 2024

hey everyone, i fixed the leak, doing a few more tests and will distribute the bounty shortly

@louis030195
Collaborator Author

/tip $150 @FractalFir
/tip $50 @exi

thanks a lot 🙏

feel free to have a look at other issues, we do a bunch of bounties. also we unfortunately did not have the opportunity to test much on linux (still trying to set up a cloud desktop with audio and vision available)


algora-pbc bot commented Sep 6, 2024

🎉🎈 @FractalFir has been awarded $150! 🎈🎊


algora-pbc bot commented Sep 6, 2024

@exi: You just got a $50 tip! 👉 Complete your Algora onboarding to collect your payment.


algora-pbc bot commented Sep 6, 2024


algora-pbc bot commented Sep 6, 2024

🎉🎈 @exi has been awarded $50! 🎈🎊
