
Optimize AsyncVideoDecoder for Better Memory + Switch to Bounded Channels (Possible memory leak fix) #122

Open: wants to merge 3 commits into main
Conversation

@newideas99 (Contributor) commented Oct 13, 2024

Pull Request: Optimize AsyncVideoDecoder for Better Memory Management and Performance

Changes Made

  1. Added new constants for cache management:

    const MAX_CACHE_MEMORY: usize = 1024 * 1024 * 1024; // 1 GB max cache size
    const CACHE_CLEANUP_INTERVAL: Duration = Duration::from_secs(60); // Clean up every 60 seconds
  2. Introduced CachedFrame struct for better frame management:

    #[derive(Clone)]
    struct CachedFrame {
        frame_number: u32,
        frame: DecodedFrame,
        last_accessed: Instant,
    }
  3. Changed cache data structure from BTreeMap to VecDeque:

    let mut cache = VecDeque::new();
  4. Implemented new cache management methods:

    fn cleanup_cache(cache: &mut VecDeque<CachedFrame>, cache_size: &mut usize) {
        // ... (implementation)
    }
    
    fn aggressive_cleanup(cache: &mut VecDeque<CachedFrame>, cache_size: &mut usize) {
        // ... (implementation)
    }
  5. Modified frame decoding and caching logic:

    • Introduced frame_to_send variable to store the frame to be sent.
    • Updated cache insertion and removal logic.
    • Implemented periodic cache cleanup.
  6. Moved sender.send() outside of all loops:

    sender.send(frame_to_send).ok();
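A minimal sketch of how the pieces above could fit together is shown below. The eviction thresholds, the `Vec<u8>` stand-in for `DecodedFrame`, and the exact eviction policy are illustrative assumptions, not the PR's actual code:

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

const MAX_CACHE_MEMORY: usize = 1024 * 1024 * 1024; // 1 GB max cache size

#[derive(Clone)]
struct CachedFrame {
    frame_number: u32,
    frame: Vec<u8>, // stand-in for the real DecodedFrame type
    last_accessed: Instant,
}

// Evict frames that haven't been accessed recently, oldest first,
// always keeping at least one frame resident.
fn cleanup_cache(cache: &mut VecDeque<CachedFrame>, cache_size: &mut usize) {
    let stale = Duration::from_secs(60);
    while let Some(front) = cache.front() {
        if front.last_accessed.elapsed() > stale && cache.len() > 1 {
            let removed = cache.pop_front().unwrap();
            *cache_size -= removed.frame.len();
        } else {
            break;
        }
    }
}

// Drop frames from the front until the cache is back under the
// memory ceiling, regardless of how recently they were used.
fn aggressive_cleanup(cache: &mut VecDeque<CachedFrame>, cache_size: &mut usize) {
    while *cache_size > MAX_CACHE_MEMORY && cache.len() > 1 {
        let removed = cache.pop_front().unwrap();
        *cache_size -= removed.frame.len();
    }
}
```

Because the `VecDeque` is kept in decode order, the oldest frames sit at the front, so FIFO eviction is a single `pop_front` rather than a search.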

Explanation

The main issues addressed in this update are related to memory management, performance optimization, and potential undefined behavior due to improper use of the oneshot sender.

Prevention of Memory Leaks and Improved Performance

  1. Better Cache Management: The new CachedFrame struct and VecDeque data structure allow for more efficient frame caching and access patterns. The cleanup_cache and aggressive_cleanup methods ensure that the cache doesn't grow beyond specified limits, preventing potential memory leaks.

  2. Single Use of Sender: By moving the sender.send() call outside of all loops, we ensure that the sender is used exactly once per request. This prevents potential memory leaks that could occur if frames were sent multiple times or if the sender was dropped without being used.

  3. Efficient Frame Caching: The frame_to_send variable allows us to store the required frame as soon as it's decoded, without immediately sending it. This approach prevents unnecessary decoding of frames after the target frame is found, reducing memory usage and improving performance.

  4. Periodic Cache Cleanup: The implementation of periodic cache cleanup helps maintain a reasonable memory footprint over time, preventing the accumulation of unused frames in memory.
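The single-send pattern described in point 2 can be sketched as follows, using `std::sync::mpsc` as a stand-in for the oneshot sender and a hypothetical `decode_next` closure in place of the real decoder loop:

```rust
use std::sync::mpsc;

type DecodedFrame = Vec<u8>; // stand-in for the real frame type

// Decode until the requested frame is found, then send exactly once,
// after the loop. `decode_next` is a hypothetical decoder step that
// yields (frame_number, frame) pairs until the stream ends.
fn decode_request(
    requested: u32,
    mut decode_next: impl FnMut() -> Option<(u32, DecodedFrame)>,
    sender: mpsc::Sender<Option<DecodedFrame>>,
) {
    let mut frame_to_send: Option<DecodedFrame> = None;
    while let Some((frame_number, frame)) = decode_next() {
        if frame_number == requested {
            frame_to_send = Some(frame);
            break; // stop decoding once the target frame is found
        }
    }
    // Single send site: the sender is used exactly once per request,
    // whether or not the frame was found.
    sender.send(frame_to_send).ok();
}
```

Having one send site at the end of the function, rather than several inside the loop, makes it easy to verify by inspection that every request produces exactly one response.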

Problems Addressed

  1. Memory Leaks: The previous implementation risked accumulating too many frames in memory without a proper cleanup mechanism. The new cache management system addresses this.

  2. Inefficient Caching: The old BTreeMap implementation was a poor fit for the mostly sequential access pattern of video decoding. VecDeque offers O(1) insertion and eviction at both ends and better cache locality, which suits a FIFO-style frame cache.

  3. Potential Undefined Behavior: The previous code had multiple points where the oneshot sender could be used, risking using it more than once or not at all in some code paths. The new implementation ensures consistent, single use of the sender.

  4. Performance Issues: The old code continued decoding frames even after finding the requested frame. The new implementation breaks the decoding loop as soon as the required frame is found, improving performance.

Conclusion

These changes significantly improve the robustness, efficiency, and safety of the AsyncVideoDecoder. By implementing better cache management, ensuring proper use of the oneshot sender, and optimizing the frame decoding process, we've mitigated the risk of memory leaks, improved overall performance, and enhanced the reliability of the video decoding system.

Change Number 2:

Transition to Bounded Channels and Performance Enhancements

Changes

In desktop/src-tauri/src/lib.rs:

  1. Added explicit import for mpsc:

    use tokio::sync::mpsc;
  2. Replaced unbounded channel with bounded channel:

    // Old
    // let (tx_image_data, mut rx_image_data) = tokio::sync::mpsc::unbounded_channel::<Vec<u8>>();
    
    // New
    let buffer_size = 60 * 3; // 3 seconds at 60 fps, or 6 seconds at 30 fps
    let (tx_image_data, mut rx_image_data) = mpsc::channel::<Vec<u8>>(buffer_size);
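The backpressure effect of a bounded channel can be illustrated with the standard library's `sync_channel`, which behaves analogously to tokio's `mpsc::channel`: once the buffer is full, `send` blocks (awaits, in the async version) instead of letting the queue grow without bound. This is a synchronous sketch, not the PR's tokio code:

```rust
use std::sync::mpsc;
use std::thread;

fn demo() -> usize {
    // Bounded buffer: a fast producer can only run 4 frames ahead
    // of the consumer before `send` blocks, capping memory use.
    let (tx, rx) = mpsc::sync_channel::<Vec<u8>>(4);
    let producer = thread::spawn(move || {
        for i in 0..100u8 {
            // Blocks whenever the buffer holds 4 undelivered frames.
            tx.send(vec![i; 1024]).unwrap();
        }
        // `tx` is dropped here, closing the channel.
    });
    let mut received = 0;
    while let Ok(_frame) = rx.recv() {
        received += 1;
    }
    producer.join().unwrap();
    received
}
```

With an unbounded channel the producer could enqueue all 100 frames immediately; here at most 4 are ever buffered, which is the property the 180-frame buffer above relies on.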

In rendering/lib.rs:

  1. Updated imports:

    use tokio::sync::mpsc;
    use tokio::time::{timeout, Duration};
    use futures::stream::{self, StreamExt};
    use std::sync::Arc;
    use num_cpus;
  2. Changed render_video_to_channel function signature:

    // Old
    // sender: tokio::sync::mpsc::UnboundedSender<Vec<u8>>,
    
    // New
    sender: mpsc::Sender<Vec<u8>>,
  3. Implemented parallel frame processing using stream::iter and buffer_unordered:

    let stream = stream::iter(0..total_frames)
        .map(|frame_number| {
            // ... (frame processing logic)
        })
        .buffer_unordered(num_cpus::get());
  4. Added timeout for frame production:

    match timeout(
        Duration::from_secs(30),
        produce_frame(constants, &screen_frame, &camera_frame, background, uniforms),
    )
    .await
    {
        // ... (timeout handling logic)
    }
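The combination of concurrent frame production and a per-frame timeout can be sketched with std threads and `recv_timeout` standing in for `buffer_unordered` and `tokio::time::timeout`. This is a rough synchronous analogue under assumed names (`produce_frame` here is a trivial placeholder), and unlike `buffer_unordered(num_cpus::get())` it does not cap concurrency:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

type Frame = Vec<u8>;

// Hypothetical stand-in for the real `produce_frame`.
fn produce_frame(frame_number: u32) -> Frame {
    vec![frame_number as u8; 16]
}

// Produce each frame on its own thread, then collect every result
// with a deadline so one stuck frame cannot hang the whole render.
fn render_all(total_frames: u32, deadline: Duration) -> Vec<Option<Frame>> {
    // Spawn all producers up front (the std analogue of running
    // the futures concurrently).
    let receivers: Vec<_> = (0..total_frames)
        .map(|frame_number| {
            let (tx, rx) = mpsc::channel();
            thread::spawn(move || {
                tx.send(produce_frame(frame_number)).ok();
            });
            rx
        })
        .collect();
    // A `None` entry means that frame timed out.
    receivers
        .into_iter()
        .map(|rx| rx.recv_timeout(deadline).ok())
        .collect()
}
```

In the real async code, a timed-out frame would be handled in the `timeout` match arm rather than collected as `None`, but the shape of the control flow is the same.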

Rationale

  1. Memory Management: Transitioning from unbounded to bounded channels prevents unbounded memory growth by providing backpressure: if the consumer falls behind, producers wait instead of queueing frames without limit.

  2. Parallel Processing: Utilizing buffer_unordered with num_cpus::get() allows for efficient parallel processing of frames, potentially improving render speed on multi-core systems.

  3. Timeout Mechanism: Adding a timeout for frame production helps prevent the renderer from hanging indefinitely on problematic frames.

  4. Performance Optimization: The buffer size of 180 frames (3 seconds at 60 fps) provides a good balance between smooth performance and efficient resource usage for most modern systems.

Potential Impact

  • Improved memory management, reducing the risk of out-of-memory errors during long rendering sessions.
  • Potential performance improvements on multi-core systems due to parallel frame processing.
  • More robust error handling and prevention of indefinite hangs during rendering.

Testing Recommendations

  1. Test rendering of videos with various durations, especially long videos, to ensure no memory leaks occur.
  2. Verify performance improvements on both single-core and multi-core systems.
  3. Test with different video resolutions and frame rates to ensure the buffer size is appropriate across various use cases.
  4. Verify that the timeout mechanism works as expected for problematic frames.

Future Considerations

  • Implement adaptive buffer sizing based on system performance and available resources.


vercel bot commented Oct 13, 2024

@newideas99 is attempting to deploy a commit to the Cap Software Inc Team on Vercel.

A member of the Team first needs to authorize it.

@newideas99 (Contributor, Author) commented:
Everything seemed to build and work when I tested, but give it a test before pushing.

@newideas99 newideas99 changed the title Optimize AsyncVideoDecoder for Better Memory (Possible memory leak fix) Optimize AsyncVideoDecoder for Better Memory + Switch to Bounded Channels (Possible memory leak fix) Oct 14, 2024
@Brendonovich (Member) commented:

Using VecDeque and switching to bounded channels are great changes, thanks. Wanted to note a couple of things:

> Single Use of Sender: By moving the sender.send() call outside of all loops, we ensure that the sender is used exactly once per request. This prevents potential memory leaks that could occur if frames were sent multiple times or if the sender was dropped without being used.

sender is a oneshot; it's impossible for more than one frame to get sent over it. Dropping the sender without using it is also fine, we treat it the same as sending None.

> Performance Issues: The old code continued decoding frames even after finding the requested frame. The new implementation breaks the decoding loop as soon as the required frame is found, improving performance.

This is done on purpose to warm the cache ahead of time during playback, we keep decoding past the requested frame until we reach the limit or until a new frame request comes in.
