
Optimize AsyncVideoDecoder for Better Memory + Switch to Bounded Channels (Possible memory leak fix) #122

Open: wants to merge 3 commits into main
Conversation

@newideas99 (Contributor) commented Oct 13, 2024

Pull Request: Optimize AsyncVideoDecoder for Better Memory Management and Performance

Changes Made

  1. Added new constants for cache management:

    const MAX_CACHE_MEMORY: usize = 1024 * 1024 * 1024; // 1 GB max cache size
    const CACHE_CLEANUP_INTERVAL: Duration = Duration::from_secs(60); // Clean up every 60 seconds
  2. Introduced CachedFrame struct for better frame management:

    #[derive(Clone)]
    struct CachedFrame {
        frame_number: u32,
        frame: DecodedFrame,
        last_accessed: Instant,
    }
  3. Changed cache data structure from BTreeMap to VecDeque:

    let mut cache = VecDeque::new();
  4. Implemented new cache management methods:

    fn cleanup_cache(cache: &mut VecDeque<CachedFrame>, cache_size: &mut usize) {
        // ... (implementation)
    }
    
    fn aggressive_cleanup(cache: &mut VecDeque<CachedFrame>, cache_size: &mut usize) {
        // ... (implementation)
    }
  5. Modified frame decoding and caching logic:

    • Introduced frame_to_send variable to store the frame to be sent.
    • Updated cache insertion and removal logic.
    • Implemented periodic cache cleanup.
  6. Moved sender.send() outside of all loops:

    sender.send(frame_to_send).ok();
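A minimal sketch of how the pieces above could fit together is shown below. The eviction thresholds, the `Vec<u8>` stand-in for `DecodedFrame`, and the exact eviction policy are illustrative assumptions, not the PR's actual code:

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

const MAX_CACHE_MEMORY: usize = 1024 * 1024 * 1024; // 1 GB max cache size

#[derive(Clone)]
struct CachedFrame {
    frame_number: u32,
    frame: Vec<u8>, // stand-in for the real DecodedFrame type
    last_accessed: Instant,
}

// Evict frames that haven't been accessed recently, oldest first,
// always keeping at least one frame resident.
fn cleanup_cache(cache: &mut VecDeque<CachedFrame>, cache_size: &mut usize) {
    let stale = Duration::from_secs(60);
    while let Some(front) = cache.front() {
        if front.last_accessed.elapsed() > stale && cache.len() > 1 {
            let removed = cache.pop_front().unwrap();
            *cache_size -= removed.frame.len();
        } else {
            break;
        }
    }
}

// Drop frames from the front until the cache is back under the
// memory ceiling, regardless of how recently they were used.
fn aggressive_cleanup(cache: &mut VecDeque<CachedFrame>, cache_size: &mut usize) {
    while *cache_size > MAX_CACHE_MEMORY && cache.len() > 1 {
        let removed = cache.pop_front().unwrap();
        *cache_size -= removed.frame.len();
    }
}
```

Because the `VecDeque` is kept in decode order, the oldest frames sit at the front, so FIFO eviction is a single `pop_front` rather than a search.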

Explanation

The main issues addressed in this update are related to memory management, performance optimization, and potential undefined behavior due to improper use of the oneshot sender.

Prevention of Memory Leaks and Improved Performance

  1. Better Cache Management: The new CachedFrame struct and VecDeque data structure allow for more efficient frame caching and access patterns. The cleanup_cache and aggressive_cleanup methods ensure that the cache doesn't grow beyond specified limits, preventing potential memory leaks.

  2. Single Use of Sender: By moving the sender.send() call outside of all loops, we ensure that the sender is used exactly once per request. This prevents potential memory leaks that could occur if frames were sent multiple times or if the sender was dropped without being used.

  3. Efficient Frame Caching: The frame_to_send variable allows us to store the required frame as soon as it's decoded, without immediately sending it. This approach prevents unnecessary decoding of frames after the target frame is found, reducing memory usage and improving performance.

  4. Periodic Cache Cleanup: The implementation of periodic cache cleanup helps maintain a reasonable memory footprint over time, preventing the accumulation of unused frames in memory.
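The single-send pattern described in point 2 can be sketched as follows, using `std::sync::mpsc` as a stand-in for the oneshot sender and a hypothetical `decode_next` closure in place of the real decoder loop:

```rust
use std::sync::mpsc;

type DecodedFrame = Vec<u8>; // stand-in for the real frame type

// Decode until the requested frame is found, then send exactly once,
// after the loop. `decode_next` is a hypothetical decoder step that
// yields (frame_number, frame) pairs until the stream ends.
fn decode_request(
    requested: u32,
    mut decode_next: impl FnMut() -> Option<(u32, DecodedFrame)>,
    sender: mpsc::Sender<Option<DecodedFrame>>,
) {
    let mut frame_to_send: Option<DecodedFrame> = None;
    while let Some((frame_number, frame)) = decode_next() {
        if frame_number == requested {
            frame_to_send = Some(frame);
            break; // stop decoding once the target frame is found
        }
    }
    // Single send site: the sender is used exactly once per request,
    // whether or not the frame was found.
    sender.send(frame_to_send).ok();
}
```

Having one send site at the end of the function, rather than several inside the loop, makes it easy to verify by inspection that every request produces exactly one response.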

Problems Addressed

  1. Memory Leaks: The previous implementation risked accumulating too many frames in memory without a proper cleanup mechanism. The new cache management system addresses this.

  2. Inefficient Caching: The old BTreeMap implementation was a poor fit for the mostly sequential access pattern of video decoding. VecDeque offers O(1) insertion and eviction at both ends and better cache locality, which suits a FIFO-style frame cache.

  3. Potential Undefined Behavior: The previous code had multiple points where the oneshot sender could be used, risking using it more than once or not at all in some code paths. The new implementation ensures consistent, single use of the sender.

  4. Performance Issues: The old code continued decoding frames even after finding the requested frame. The new implementation breaks the decoding loop as soon as the required frame is found, improving performance.

Conclusion

These changes significantly improve the robustness, efficiency, and safety of the AsyncVideoDecoder. By implementing better cache management, ensuring proper use of the oneshot sender, and optimizing the frame decoding process, we've mitigated the risk of memory leaks, improved overall performance, and enhanced the reliability of the video decoding system.

Change Number 2:

Transition to Bounded Channels and Performance Enhancements

Changes

In desktop/src-tauri/src/lib.rs:

  1. Added explicit import for mpsc:

    use tokio::sync::mpsc;
  2. Replaced unbounded channel with bounded channel:

    // Old
    // let (tx_image_data, mut rx_image_data) = tokio::sync::mpsc::unbounded_channel::<Vec<u8>>();
    
    // New
    let buffer_size = 60 * 3; // 3 seconds at 60 fps, or 6 seconds at 30 fps
    let (tx_image_data, mut rx_image_data) = mpsc::channel::<Vec<u8>>(buffer_size);
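The backpressure effect of a bounded channel can be illustrated with the standard library's `sync_channel`, which behaves analogously to tokio's `mpsc::channel`: once the buffer is full, `send` blocks (awaits, in the async version) instead of letting the queue grow without bound. This is a synchronous sketch, not the PR's tokio code:

```rust
use std::sync::mpsc;
use std::thread;

fn demo() -> usize {
    // Bounded buffer: a fast producer can only run 4 frames ahead
    // of the consumer before `send` blocks, capping memory use.
    let (tx, rx) = mpsc::sync_channel::<Vec<u8>>(4);
    let producer = thread::spawn(move || {
        for i in 0..100u8 {
            // Blocks whenever the buffer holds 4 undelivered frames.
            tx.send(vec![i; 1024]).unwrap();
        }
        // `tx` is dropped here, closing the channel.
    });
    let mut received = 0;
    while let Ok(_frame) = rx.recv() {
        received += 1;
    }
    producer.join().unwrap();
    received
}
```

With an unbounded channel the producer could enqueue all 100 frames immediately; here at most 4 are ever buffered, which is the property the 180-frame buffer above relies on.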

In rendering/lib.rs:

  1. Updated imports:

    use tokio::sync::mpsc;
    use tokio::time::{timeout, Duration};
    use futures::stream::{self, StreamExt};
    use std::sync::Arc;
    use num_cpus;
  2. Changed render_video_to_channel function signature:

    // Old
    // sender: tokio::sync::mpsc::UnboundedSender<Vec<u8>>,
    
    // New
    sender: mpsc::Sender<Vec<u8>>,
  3. Implemented parallel frame processing using stream::iter and buffer_unordered:

    let stream = stream::iter(0..total_frames)
        .map(|frame_number| {
            // ... (frame processing logic)
        })
        .buffer_unordered(num_cpus::get());
  4. Added timeout for frame production:

    match timeout(
        Duration::from_secs(30),
        produce_frame(constants, &screen_frame, &camera_frame, background, uniforms),
    )
    .await
    {
        // ... (timeout handling logic)
    }
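The combination of concurrent frame production and a per-frame timeout can be sketched with std threads and `recv_timeout` standing in for `buffer_unordered` and `tokio::time::timeout`. This is a rough synchronous analogue under assumed names (`produce_frame` here is a trivial placeholder), and unlike `buffer_unordered(num_cpus::get())` it does not cap concurrency:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

type Frame = Vec<u8>;

// Hypothetical stand-in for the real `produce_frame`.
fn produce_frame(frame_number: u32) -> Frame {
    vec![frame_number as u8; 16]
}

// Produce each frame on its own thread, then collect every result
// with a deadline so one stuck frame cannot hang the whole render.
fn render_all(total_frames: u32, deadline: Duration) -> Vec<Option<Frame>> {
    // Spawn all producers up front (the std analogue of running
    // the futures concurrently).
    let receivers: Vec<_> = (0..total_frames)
        .map(|frame_number| {
            let (tx, rx) = mpsc::channel();
            thread::spawn(move || {
                tx.send(produce_frame(frame_number)).ok();
            });
            rx
        })
        .collect();
    // A `None` entry means that frame timed out.
    receivers
        .into_iter()
        .map(|rx| rx.recv_timeout(deadline).ok())
        .collect()
}
```

In the real async code, a timed-out frame would be handled in the `timeout` match arm rather than collected as `None`, but the shape of the control flow is the same.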

Rationale

  1. Memory Management: Transitioning from unbounded to bounded channels prevents unbounded memory growth by providing backpressure: if the consumer falls behind, producers wait instead of queueing frames without limit.

  2. Parallel Processing: Utilizing buffer_unordered with num_cpus::get() allows for efficient parallel processing of frames, potentially improving render speed on multi-core systems.

  3. Timeout Mechanism: Adding a timeout for frame production helps prevent the renderer from hanging indefinitely on problematic frames.

  4. Performance Optimization: The buffer size of 180 frames (3 seconds at 60 fps) provides a good balance between smooth performance and efficient resource usage for most modern systems.

Potential Impact

  • Improved memory management, reducing the risk of out-of-memory errors during long rendering sessions.
  • Potential performance improvements on multi-core systems due to parallel frame processing.
  • More robust error handling and prevention of indefinite hangs during rendering.

Testing Recommendations

  1. Test rendering of videos with various durations, especially long videos, to ensure no memory leaks occur.
  2. Verify performance improvements on both single-core and multi-core systems.
  3. Test with different video resolutions and frame rates to ensure the buffer size is appropriate across various use cases.
  4. Verify that the timeout mechanism works as expected for problematic frames.

Future Considerations

  • Implement adaptive buffer sizing based on system performance and available resources.


vercel bot commented Oct 13, 2024

@newideas99 is attempting to deploy a commit to the Cap Software Inc Team on Vercel.

A member of the Team first needs to authorize it.

@newideas99 (Contributor, Author) commented:
Everything seemed to build and work when I tested, but give it a test before pushing.

@newideas99 newideas99 changed the title Optimize AsyncVideoDecoder for Better Memory (Possible memory leak fix) Optimize AsyncVideoDecoder for Better Memory + Switch to Bounded Channels (Possible memory leak fix) Oct 14, 2024
@Brendonovich (Member) commented:

Using VecDeque and switching to bounded channels are great changes, thanks. Wanted to note a couple of things:

> Single Use of Sender: By moving the sender.send() call outside of all loops, we ensure that the sender is used exactly once per request. This prevents potential memory leaks that could occur if frames were sent multiple times or if the sender was dropped without being used.

sender is a oneshot; it's impossible for more than one frame to get sent over it. Dropping the sender without using it is also fine, we treat it the same as sending None.

> Performance Issues: The old code continued decoding frames even after finding the requested frame. The new implementation breaks the decoding loop as soon as the required frame is found, improving performance.

This is done on purpose to warm the cache ahead of time during playback, we keep decoding past the requested frame until we reach the limit or until a new frame request comes in.
