
questions regarding low req/s write and how to sync between them as ipc use #11

Closed
sprappcom opened this issue Jun 29, 2024 · 9 comments

Comments


sprappcom commented Jun 29, 2024

  1. Why is the write so slow at 2,500 req/s? How can I make it write as fast as possible?
  2. Is there any mechanism to synchronize the reader and writer, like https://github.com/cloudwego/shmipc-go? How should I coordinate the read and write sides of the IPC?

Thanks for the great work, by the way.

mod common;

use common::HelloWorld;
use mmap_sync::synchronizer::Synchronizer;
use std::sync::{Arc, atomic::{AtomicUsize, Ordering}};
use std::thread;
use std::time::{Duration, Instant};

fn main() {
    let mut synchronizer = Synchronizer::new("/tmp/hello_world".as_ref());

    let data = HelloWorld {
        version: 7,
        messages: vec!["Hello".to_string(), "World".to_string(), "!".to_string()],
    };

    let request_count = Arc::new(AtomicUsize::new(0));
    let request_count_clone = Arc::clone(&request_count);

    // Spawn a thread to print requests per second
    thread::spawn(move || {
        let mut start = Instant::now();
        loop {
            // Check if a second has passed
            if start.elapsed() >= Duration::from_secs(1) {
                let requests = request_count_clone.swap(0, Ordering::Relaxed);
                println!("Server requests per second: {}", requests);
                start = Instant::now();
            }
            // Sleep for a short duration to yield CPU
            thread::sleep(Duration::from_millis(10));
        }
    });

    loop {
        // Write data to shared memory without any delay
        synchronizer.write(&data, Duration::from_nanos(1)).expect("failed to write data");
        request_count.fetch_add(1, Ordering::Relaxed);
    }
}

bocharov commented Jul 5, 2024

Thank you for your feedback and for using our library!

Regarding the slow write performance at 2,500 req/s: tmpfs-backed memory can provide a significant boost. Because tmpfs operates directly on RAM, it avoids disk I/O latency and offers faster reads and writes than conventional disk-based storage. Here's how you can adjust your implementation:

let mut synchronizer = Synchronizer::new("/dev/shm/hello_world".as_ref());

This change points the synchronizer to use a shared memory object located in a tmpfs filesystem, which is typically mounted at /dev/shm on most Linux systems. This should help alleviate some of the bottlenecks associated with disk I/O.

If /dev/shm does not provide enough space or if you want to create a dedicated tmpfs instance, you can set up your own with the desired size. For example, to create a 1GB tmpfs volume, you can use the following command:

sudo mount -t tmpfs -o size=1G tmpfs /mnt/mytmpfs

Additionally, to optimize performance further, make sure to compile and run your program in release mode:

cargo run --release --example writer

After implementing these adjustments and running in release mode, I observed a substantial increase in the server requests per second:

Server requests per second: 34829
Server requests per second: 35479
Server requests per second: 35779

It's also important to note that this library is optimized for a read-heavy data access pattern, where typically 99% of operations are reads from many processes, and only about 1% are writes from a single writer. Nonetheless, by utilizing a tmpfs volume and making these adjustments, you can still achieve considerable performance enhancements even with a higher frequency of write operations.

@bocharov bocharov closed this as completed Jul 5, 2024
sprappcom (Author) commented:

@bocharov the write speed is still too slow.

Can you check this out?
https://github.com/cloudwego/shmipc-go

There the read and write speeds are fairly balanced, whereas here I get millions of read req/s but only thousands of writes/s.

sprappcom (Author) commented:

@bocharov reader vs. writer speed:
reader: Server requests per second: 80476354
writer: Server requests per second: 49318

reader

mod common;

use common::HelloWorld;
use mmap_sync::synchronizer::Synchronizer;
use std::sync::{Arc, atomic::{AtomicUsize, Ordering}};
use std::thread;
use std::time::{Duration, Instant};

fn main() {
    // Initialize the Synchronizer
    let mut synchronizer = Synchronizer::new("/tmp/hello_world".as_ref());

    // Read data from shared memory
    let request_count = Arc::new(AtomicUsize::new(0));
    let request_count_clone = Arc::clone(&request_count);

    thread::spawn(move || {
        let mut start = Instant::now();
        loop {
            // Check if a second has passed
            if start.elapsed() >= Duration::from_secs(1) {
                let requests = request_count_clone.swap(0, Ordering::Relaxed);
                println!("Server requests per second: {}", requests);
                start = Instant::now();
            }
            // Sleep for a short duration to yield CPU
            thread::sleep(Duration::from_millis(10));
        }
    });

    loop {
        // Read data from shared memory within an unsafe block
        unsafe {
            synchronizer.read::<HelloWorld>(false).expect("failed to read data");
        }
        request_count.fetch_add(1, Ordering::Relaxed);
    }
}


bocharov commented Jul 8, 2024

@sprappcom I've profiled the write method, and most of the time is spent serializing the HelloWorld struct. With rkyv, only deserialization is zero-copy; serialization still takes some time, even though it's fast.

If your use case permits, you can consider using the Synchronizer::write_raw method, which accepts already-serialized bytes:

    // serialize the given entity into bytes once, outside the write loop
    let mut serializer = DefaultSerializer::default();
    let _ = serializer.serialize_value(&data).unwrap();
    let bytes = serializer.into_serializer().into_inner();

    loop {
        // Write data to shared memory without any delay
        synchronizer.write_raw::<HelloWorld>(bytes.as_ref(), Duration::from_nanos(10)).expect("failed to write data");
        // synchronizer.write(&data, Duration::from_nanos(10)).expect("failed to write data");
        request_count.fetch_add(1, Ordering::Relaxed);
    }

With this change write requests went from 35k to 115k per second on my laptop.

More performance gains are possible by optimizing DataContainer::write logic to re-use opened mapped file similarly to how DataContainer::data does it. PRs are welcome.

https://github.com/cloudwego/shmipc-go looks interesting too; we could borrow a few things from it for future versions, although it's implemented in Go, so its logic would need to be adapted to Rust.

sprappcom (Author) commented:

@bocharov Go is slower than Rust, so I hope to see this become faster than shmipc-go.

Thanks, by the way.

P.S.: 115k writes/s vs. 80M reads/s.

shmipc-go (Go using shm) does 4M reads/writes per second.

Do you think there's an ETA for this speedup?

bocharov (Collaborator) commented:

@sprappcom good news, I was able to optimize writer performance via #12

You can expect ~4M writes/s with Synchronizer::write method and ~7M writes/s with Synchronizer::write_raw method.
I've added benchmarks so you can run them yourself and report the numbers you're getting on your machine with:

cargo criterion --bench synchronizer

On my Linux laptop with 13th Gen Intel(R) Core(TM) i7-13800H processor I'm getting:

synchronizer/write
    time:   [250.71 ns 251.42 ns 252.41 ns]
    thrpt:  [3.9619 Melem/s 3.9774 Melem/s 3.9887 Melem/s]

synchronizer/write_raw
    time:   [145.25 ns 145.53 ns 145.92 ns]
    thrpt:  [6.8531 Melem/s 6.8717 Melem/s 6.8849 Melem/s]

synchronizer/read/check_bytes_true
    time:   [40.114 ns 40.139 ns 40.186 ns]
    thrpt:  [24.884 Melem/s 24.914 Melem/s 24.929 Melem/s]

synchronizer/read/check_bytes_false
    time:   [26.658 ns 26.673 ns 26.696 ns]
    thrpt:  [37.458 Melem/s 37.491 Melem/s 37.512 Melem/s]

sprappcom (Author) commented:

@bocharov Thanks for the efficiency work, you are great! It's really amazing. Thanks again.


sprappcom commented Jul 10, 2024

@bocharov Given the number of wyhash2 collisions shown here, wouldn't it be better to use xxh3? Could you provide a way for users to define the hash function? That way they could plug in whatever function they want, including xxh3, fnv1a, etc.

https://docs.google.com/spreadsheets/d/1HmqDj-suH4wBFNg7etwE8WVBlfCufvD5-gAnIENs94k/edit?pli=1&gid=1915335726#gid=1915335726

Coming from a Go background, I'm thinking of something along the lines of this hash feature:
https://github.com/elastic/go-freelru

In some cases xxh3 is better, in others fnv1a; wyhash2 is reportedly good for lengths < 192.
I trust xxh3 and fnv1a for their collision performance.

P.S.: alternatively, I suggest an option letting users choose between xxh3, fnv1a, or wyhash2.

bocharov (Collaborator) commented:

@sprappcom The default hasher is wyhash, not wyhash2, and it has decent collision properties as per https://medium.com/@tprodanov/benchmarking-non-cryptographic-hash-functions-in-rust-2e6091077d11

It's still possible to provide a different hasher for the checksum calculation done by Synchronizer: it's the H template parameter on the Synchronizer struct: https://github.com/cloudflare/mmap-sync/blob/main/src/synchronizer.rs#L27
