
Deserialization 3.5x slower than Python pickle, 4x slower than serde_json #14

Open
naktinis opened this issue Mar 6, 2021 · 1 comment


naktinis commented Mar 6, 2021

I set up a simple benchmark with a 67MB pickle and measured deserialization speed in 7 scenarios.

library                      time
Python pickle.load          341 ms
Python json.load            397 ms
serde_json from_str         327 ms
bincode from_slice          314 ms
py-marshal marshal_load     691 ms
serde-pickle from_reader   1250 ms
serde-pickle from_slice    1310 ms

Is this known behavior? Is there any hope of it improving in the foreseeable future? I'm sharing my setup below so you can point out any issues or things I missed.

Data

>>> import random, string
>>> data = [''.join(random.sample(string.ascii_letters, 32)) for _ in range(2_000_000)]
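
As a sanity check on the file sizes below: 2,000,000 strings of 32 ASCII characters each is 64 MB of raw string payload, which lines up with the 67 MB pickle once per-object framing overhead is added:

```python
# Raw character payload of the benchmark data, before any serialization framing
n_strings, chars_each = 2_000_000, 32
payload_mb = n_strings * chars_each / 1_000_000
print(payload_mb)  # 64.0
```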

Python load

>>> import time, pickle, marshal, json
>>> marshal.dump(data, open('test.marshal', 'wb'))
>>> pickle.dump(data, open('test.pickle', 'wb'))
>>> json.dump(data, open('test.json', 'w'))
>>> t = time.time(); _ = pickle.load(open('test.pickle', 'rb')); print(f'{time.time() - t:.3f}s')
0.341s
>>> t = time.time(); _ = json.load(open('test.json', 'rb')); print(f'{time.time() - t:.3f}s')
0.397s
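
The Rust benchmark below reads a test.bincode file that the Python snippet above never writes. A minimal sketch that produces it from Python, assuming bincode 1.x's default configuration (little-endian, fixed-width 64-bit lengths; `dump_bincode` is a hypothetical helper, not part of any library):

```python
import struct

def dump_bincode(strings, path):
    # bincode 1.x default encoding for Vec<String>: u64 little-endian element
    # count, then u64 little-endian byte length + UTF-8 bytes for each string
    with open(path, 'wb') as f:
        f.write(struct.pack('<Q', len(strings)))
        for s in strings:
            encoded = s.encode('utf-8')
            f.write(struct.pack('<Q', len(encoded)))
            f.write(encoded)

dump_bincode(['ab', 'cd'], 'test.bincode')
```

On the benchmark data this would be `dump_bincode(data, 'test.bincode')`.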

Rust load

use std::fs::File;
use std::io::{BufReader, Read};
use std::sync::{Arc, RwLock};
use std::time;

use py_marshal::{read, Obj};
use serde_json as json;
use serde_pickle as pickle;

pub fn load_pickle(path: &str) -> pickle::Value {
    let file = BufReader::new(File::open(path).unwrap());
    pickle::from_reader(file).expect("couldn't load pickle")
}

pub fn load_pickle_slice(path: &str) -> pickle::Value {
    let mut bytes = Vec::new();
    File::open(path).unwrap().read_to_end(&mut bytes).unwrap();
    pickle::from_slice(&bytes).expect("couldn't load pickle")
}

pub fn load_marshal(path: &str) -> Result<Arc<RwLock<Vec<Obj>>>, &'static str> {
    let file = BufReader::new(File::open(path).unwrap());
    match read::marshal_load(file) {
        Ok(obj) => Ok(obj.extract_list().unwrap()),
        Err(_) => Err("error_load"),
    }
}

pub fn load_json(path: &str) -> json::Value {
    let mut s = String::new();
    File::open(path).unwrap().read_to_string(&mut s).unwrap();
    serde_json::from_str(&s).expect("couldn't load json")
}

pub fn load_bincode<T>(path: &str) -> T
    where T: serde::de::DeserializeOwned
{
    let file = BufReader::new(File::open(path).unwrap());
    bincode::deserialize_from(file).unwrap()
}

fn main() {
    println!("Loading pickle...");
    let timer = time::Instant::now();
    let data = load_pickle("test.pickle");
    println!("Load completed in {:.2?}", timer.elapsed());

    println!("Loading pickle slice...");
    let timer = time::Instant::now();
    let data = load_pickle_slice("test.pickle");
    println!("Load completed in {:.2?}", timer.elapsed());

    println!("Loading marshal...");
    let timer = time::Instant::now();
    let data = load_marshal("test.marshal").unwrap();
    println!("Load completed in {:.2?}", timer.elapsed());

    println!("Loading JSON...");
    let timer = time::Instant::now();
    let data = load_json("test.json");
    println!("Load completed in {:.2?}", timer.elapsed());

    println!("Loading Bincode...");
    let timer = time::Instant::now();
    let data: Vec<String> = load_bincode("test.bincode");
    println!("Load completed in {:.2?}", timer.elapsed());
}

Dependencies

[dependencies]
serde-pickle = "0.6"
bincode = "1.3"
serde_json = "1.0"
py-marshal = { git = "https://github.com/sollyucko/py-marshal" }
serde = { version = "1.0", features = ["derive"] }
birkenfeld (Owner) commented

Thanks for the report, I can more or less reproduce the results. (Please include all of the code next time though, it makes it much easier.)

This crate hasn't been optimized for speed (yet), so it's not surprising that it won't outperform Python's pickle module. As for a comparison between different formats, that is always a little more difficult to reason about.

In any case, I can't spend much time on this at present. PRs are welcome, and I expect there might be some easy wins achievable with basic profiling.
