Add deserialization #8

Draft
wants to merge 7 commits into
base: master

kerristrasz
Owner

Implement serde::Deserializer. This closes #2.

@kerristrasz kerristrasz self-assigned this Jul 8, 2024
@kerristrasz
Owner Author

One of the issues that will need to be resolved is how to handle lists. In KeyValues, lists are represented by simply repeating the key of the enclosing variable multiple times. For instance, the following struct:

use serde::{Serialize, Deserialize};

#[derive(Serialize, Deserialize)]
struct Data {
    pub numbers: Vec<i32>,
}

let data = Data { numbers: vec![10, 20, 30] };
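// Assumes kv_to_string returns a Result, as Serde serializers conventionally do.
let text = vdflex::kv_to_string("Data", &data).expect("serialization failed");
println!("{text}");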

could be serialized like this:

"Data"
{
    "numbers" "10"
    "numbers" "20"
    "numbers" "30"
}
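As a rough illustration of why the serializing side is straightforward, here is a minimal sketch that emits one key-value pair per list element. The `write_list` function is hypothetical and is not part of vdflex; it only handles a flat list of integers.

```rust
/// Hypothetical helper (not a vdflex API): writes a list by repeating `key`
/// once per element inside an object named `name`.
fn write_list(name: &str, key: &str, values: &[i32]) -> String {
    // Opening line with the quoted object name, then the opening brace.
    let mut out = format!("\"{name}\"\n{{\n");
    for value in values {
        // Each element becomes its own pair that reuses the same key.
        out.push_str(&format!("    \"{key}\" \"{value}\"\n"));
    }
    out.push_str("}\n");
    out
}

fn main() {
    print!("{}", write_list("Data", "numbers", &[10, 20, 30]));
}
```

The serializer never needs to look ahead or remember anything: each element is written as soon as it is visited.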

This is quite easy to do when serializing (well, relatively easy at least). However, this poses an issue during deserialization. Suppose we have the following struct:

#[derive(Serialize, Deserialize)]
struct FooBar {
    foo: Vec<i32>,
    bar: Vec<i32>,
}

One possible value of this struct might be:

MyDataHere
{
    foo 0
    foo 1
    bar 2
    bar 3
}

In this case, there is no issue. When deserializing, the key foo is enough to know that we are deserializing the vec stored in the field named foo. We consume KeyValues with that key until we encounter one that doesn't match the field name (in this case, bar). Then, we're done with that field and can move on to the next one.
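The "consume until the key changes" idea can be sketched roughly as follows. This is a simplified model, not the actual deserializer: the token stream is assumed to be an iterator of already-parsed `(key, value)` pairs, and `take_run` is a hypothetical helper name.

```rust
use std::iter::Peekable;

/// Hypothetical helper: collects the values of consecutive pairs whose key
/// matches `field`, stopping at the first pair with a different key.
fn take_run<'a, I>(pairs: &mut Peekable<I>, field: &str) -> Vec<i32>
where
    I: Iterator<Item = (&'a str, i32)>,
{
    let mut values = Vec::new();
    // Peek rather than advance, so the first non-matching pair stays in the
    // stream for whichever field is deserialized next.
    while let Some(&(key, value)) = pairs.peek() {
        if key != field {
            break;
        }
        values.push(value);
        pairs.next();
    }
    values
}

fn main() {
    let input = vec![("foo", 0), ("foo", 1), ("bar", 2), ("bar", 3)];
    let mut pairs = input.into_iter().peekable();
    let foo = take_run(&mut pairs, "foo");
    let bar = take_run(&mut pairs, "bar");
    println!("foo = {foo:?}, bar = {bar:?}");
}
```

Only a single pair of lookahead is needed here, which is what makes this approach attractive for a streaming parser.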

The major issue with this approach is that the key order is not guaranteed. In particular, nothing in the KeyValues format (or rather, its lack of a formal standard) requires that entries with the same key be next to each other. Thus, the following is another, perfectly valid representation of the same data:

MyDataHere
{
    foo 0
    bar 2
    foo 1
    bar 3
}

This is a problem because we would have to consume the entire enclosing object (MyDataHere in this case) to know that we have seen every entry for a given key. But a streaming parser can't backtrack if it finds data it doesn't need yet; once something is read and discarded, it's gone. It would be nice if we could just go key-by-key and append to the existing vec whenever we find another value for it, but Serde's data model does not allow that; we must deserialize the whole list in one go.

Note that this is only an issue for streaming parsers. In other words, this would be a non-issue if we simply parsed the entire file into some kind of intermediate structure, then converted that structure into the actual Rust type. I would strongly prefer to avoid this because it would probably be a lot slower and would take more memory, but I can't think of a better way to do this at the moment. It might be worth adding a half-baked implementation just to get deserialization working, and optimizing it later.
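For reference, the buffered fallback could look something like the sketch below. It assumes the whole object has already been parsed into a slice of `(key, value)` pairs; `group_by_key` and `foobar_from_pairs` are hypothetical names, and a real implementation would go through Serde rather than building the struct by hand.

```rust
use std::collections::HashMap;

/// Hypothetical intermediate form: every value seen for a key, in document
/// order, regardless of where in the object the key appeared.
fn group_by_key<'a>(pairs: &[(&'a str, i32)]) -> HashMap<&'a str, Vec<i32>> {
    let mut grouped: HashMap<&'a str, Vec<i32>> = HashMap::new();
    for &(key, value) in pairs {
        grouped.entry(key).or_default().push(value);
    }
    grouped
}

#[derive(Debug, PartialEq)]
struct FooBar {
    foo: Vec<i32>,
    bar: Vec<i32>,
}

/// Builds the struct only after the whole object has been read and grouped.
fn foobar_from_pairs(pairs: &[(&str, i32)]) -> FooBar {
    let mut grouped = group_by_key(pairs);
    FooBar {
        // A missing key is treated as an empty list in this sketch.
        foo: grouped.remove("foo").unwrap_or_default(),
        bar: grouped.remove("bar").unwrap_or_default(),
    }
}

fn main() {
    // Interleaved keys, as in the second MyDataHere example.
    let interleaved = [("foo", 0), ("bar", 2), ("foo", 1), ("bar", 3)];
    println!("{:?}", foobar_from_pairs(&interleaved));
}
```

The cost is exactly the one described above: the entire object is held in memory before any field can be produced, in exchange for key order no longer mattering.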

Successfully merging this pull request may close these issues.

Implement deserialization
1 participant