Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v0 fix self healing #114

Closed
wants to merge 2 commits into from
Closed

v0 fix self healing #114

wants to merge 2 commits into from

Conversation

louis030195
Copy link
Collaborator

@louis030195 louis030195 commented Aug 6, 2024

fix and implement a self-healing mode

why?

sometimes the frame or audio recording become stale after long time or on windows when computer sleep long enough or IDK

i'd rather make the code antifragile/redundant at high level first than spending decades digging into low parts

  • it checks if the /health endpoint returns unhealthy, which is a state when last frame and last audio chunk has not been added to the db for a while
  • it avoids infinite loop restart through cooldown
  • it properly stop inner loops running in threads etc

different ways to test this PR:

cargo build --release # add --features metal on mac
./target/release/screenpipe --self-healing
  1. passive: run for decades (24h+) and search in the log "restart" or things like this, or see if everything still runs smoothly,
  2. active:
  • try to make your computer sleep etc.
  • try to kill stuff, like turning off permissions, (killing ffmpeg unlikely will be detected as crash as our KPI here is data inserted in db), try to delete your db maybe? or turning the audio device currently used, or blocking the API idk

on my side it was working, but need to test deeper

PS: went kinda bulldozer removing lot of dead code, so would be nice to merge fast otherwise feel like this PR will die

@louis030195 louis030195 requested a review from m13v August 6, 2024 16:46
@louis030195 louis030195 marked this pull request as ready for review August 6, 2024 16:46
}

let (restart_sender, mut restart_receiver) = channel(10);
let resource_monitor =
ResourceMonitor::new(cli.self_healing, Duration::from_secs(60), 3, restart_sender);
ResourceMonitor::new(cli.self_healing, Duration::from_secs(5), 3, restart_sender);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo set 60 back when merging

@@ -289,27 +275,6 @@ pub(crate) async fn stop_device(
}))
}

pub(crate) async fn start_recording(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunately restart api was kinda broken, lets re-add once we have more traction on this feat

@@ -156,47 +162,39 @@ pub async fn continuous_capture(
previous_image = Some(Arc::new(image.clone()));
// debug!("ocr_task_running {} BEFORE if if !ocr_task_running.load(Ordering::SeqCst)", ocr_task_running.load(Ordering::SeqCst));

if !ocr_task_running.load(Ordering::SeqCst) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi this atomic bool stuff was not necessary

it was used back in the day we did ocr on multi thread but here its just sequential, no need

ocr_task_running.store(true, Ordering::SeqCst);
// debug!("ocr_task_running {}", ocr_task_running.load(Ordering::SeqCst));
let ocr_engine_clone = ocr_engine.clone();
tokio::spawn(async move {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also this thread spawn was not necessary at all?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no effect because we await inside

could have been useful if we didnt await the inside func maybe? i do that sometime in js but i think async here is slightly different

@@ -501,9 +462,6 @@ impl Server {
.route("/audio/stop", post(stop_device))
.route("/audio/status", post(get_device_status))
.route("/audio/list", get(get_devices))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could remove audio restart api too, just dead code atm until we ensure it work well

@m13v
Copy link
Contributor

m13v commented Aug 7, 2024

  1. first i tried to launch it with existing server on 3030, it failed, it couldn't do anything about it
  2. then i tried switching audio devices from headphones to desktop, it couldn't switch, keeps status : Loading
    It tries to restart, says restart already in progress, and goes on in a loop for 11 attemps, then it stops. It cancels attempt, and starts again restarting without success
  3. when it restarts it doesn't remember that I chose cloud to process audio, it fallback into whisper

@m13v
Copy link
Contributor

m13v commented Aug 7, 2024

now the audio mic stopped working after like a minute:
[2024-08-07T01:46:10Z DEBUG screenpipe_server::resource_monitor] Health data: HealthCheckResponse { status: "Unhealthy", last_frame_timestamp: Some(2024-08-07T01:44:18.563041Z), last_audio_timestamp: Some(2024-08-07T01:45:01.008412Z), frame_status: "Stale", audio_status: "Stale", message: "Some systems are not functioning properly. Frame status: Stale, Audio status: Stale", verbose_instructions: Some("If you're experiencing issues, please try the following steps:\n1. Restart the application.\n2. If using a desktop app, reset your Screenpipe OS audio/screen recording permissions.\n3. If the problem persists, please contact support with the details of this health check at [email protected].\n4. Last, here are some FAQ to help you troubleshoot: https://github.com/louis030195/screen-pipe/blob/main/content/docs/NOTES.md") }

@m13v
Copy link
Contributor

m13v commented Aug 7, 2024

but when i run screenpipe from main branch, all works fine

@louis030195
Copy link
Collaborator Author

now the audio mic stopped working after like a minute: [2024-08-07T01:46:10Z DEBUG screenpipe_server::resource_monitor] Health data: HealthCheckResponse { status: "Unhealthy", last_frame_timestamp: Some(2024-08-07T01:44:18.563041Z), last_audio_timestamp: Some(2024-08-07T01:45:01.008412Z), frame_status: "Stale", audio_status: "Stale", message: "Some systems are not functioning properly. Frame status: Stale, Audio status: Stale", verbose_instructions: Some("If you're experiencing issues, please try the following steps:\n1. Restart the application.\n2. If using a desktop app, reset your Screenpipe OS audio/screen recording permissions.\n3. If the problem persists, please contact support with the details of this health check at [email protected].\n4. Last, here are some FAQ to help you troubleshoot: https://github.com/louis030195/screen-pipe/blob/main/content/docs/NOTES.md") }

thats strange

i never had such a good performance on my side

been running for 16h now, <4s delay

@louis030195
Copy link
Collaborator Author

okay one of the issue new with this PR is that i feel like tesseract is duplicated in the background which leads to kind of memory leak and program get killed by OS at some point

can solve it by properly stop tesseract upon reboot

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants