Generic Array Datatype #127

Open
theduke opened this issue Jul 15, 2022 · 2 comments

Comments

@theduke

theduke commented Jul 15, 2022

Currently there is only a resource-array datatype, which requires nested resources whenever a property has multiple values.

Often, though, I want a property with multiple plain values.

Reasons:

Imagine you want an array of ints or strings.

  • defining an extra resource type is really noisy in the schema
  • wrapping each value in an extra object introduces a lot of overhead for larger databases

So there should be a datatype for "array of type T".

Defining the nested type would run into similar issues as #126 though.

@joepio
Member

joepio commented Jul 16, 2022

I always felt like this was bound to come up at some point. I think you're right, we probably need an Array datatype.

I think that if a Property has the Array datatype, it should also indicate which types of elements are supported. Maybe it has a second datatype, namely innerDatatype, which refers to the shape of the items in the array (e.g. String or Integer).
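
As a rough illustration of the innerDatatype idea, here is a minimal Rust sketch. The `Datatype`, `Property`, and `Value` types and the `validate` function are hypothetical names made up for this example; they are not the atomic-server API. The sketch assumes only flat arrays of strings or integers.

```rust
/// Hypothetical datatypes; `Array` is the new one under discussion.
#[derive(Debug, Clone, PartialEq)]
pub enum Datatype {
    String,
    Integer,
    Array,
}

/// Hypothetical Property shape carrying the suggested `innerDatatype`.
#[derive(Debug, Clone)]
pub struct Property {
    pub shortname: String,
    pub datatype: Datatype,
    /// Only meaningful when `datatype` is `Array`.
    pub inner_datatype: Option<Datatype>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    String(String),
    Integer(i64),
    Array(Vec<Value>),
}

/// True if `value` matches the non-array datatype `dt`.
fn matches_scalar(dt: &Datatype, value: &Value) -> bool {
    matches!(
        (dt, value),
        (Datatype::String, Value::String(_)) | (Datatype::Integer, Value::Integer(_))
    )
}

/// Checks a value against a property, using `inner_datatype` for array items.
pub fn validate(prop: &Property, value: &Value) -> Result<(), String> {
    match (&prop.datatype, value) {
        (Datatype::Array, Value::Array(items)) => {
            // An Array property without an inner datatype is itself invalid.
            let inner = prop
                .inner_datatype
                .as_ref()
                .ok_or_else(|| format!("{}: Array property lacks innerDatatype", prop.shortname))?;
            if items.iter().all(|item| matches_scalar(inner, item)) {
                Ok(())
            } else {
                Err(format!("{}: item does not match {:?}", prop.shortname, inner))
            }
        }
        (Datatype::Array, _) => Err(format!("{}: expected an array", prop.shortname)),
        (dt, v) if matches_scalar(dt, v) => Ok(()),
        _ => Err(format!("{}: value does not match {:?}", prop.shortname, prop.datatype)),
    }
}
```

Note that nothing in the `Property` struct itself ties `inner_datatype` to the `Array` case; that rule lives only in `validate`.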

@theduke
Author

theduke commented Jul 17, 2022

This brings up an interesting modeling problem.

How do you express "array of integers" in the schema?

This is actually the more general problem of "how to refine types".

I see several solutions, all of them with downsides.

Additional Properties on Property Resources

A property of type atomicdata.dev/datatypes/array could use an atomicdata.dev/properties/array-item-type property to specify the expected type of array items.

The big downside here is that it would not be apparent from the schema that this property is expected or required as a refinement of the array datatype, so that makes the schema more cryptic and implementations more complicated.

It's also more complex to "unify" and compare schema types, since libraries now need to understand the array-item-type property and convert it into an Array<T> type for processing.
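
To make the unification step concrete, here is a minimal sketch in Rust of what a library would have to do. It assumes a property resource is available as a flat map from property URL to value URL. The `ResolvedType` enum, the `resolve` function, and the exact URL constants (other than the two named above) are illustrative assumptions, not an existing API.

```rust
use std::collections::HashMap;

// Illustrative identifiers; the array ones are the proposal from this thread.
const DATATYPE: &str = "atomicdata.dev/properties/datatype";
const ARRAY: &str = "atomicdata.dev/datatypes/array";
const ARRAY_ITEM_TYPE: &str = "atomicdata.dev/properties/array-item-type";
const STRING: &str = "atomicdata.dev/datatypes/string";
const INTEGER: &str = "atomicdata.dev/datatypes/integer";

/// Hypothetical internal type a library would use for queries and filters.
#[derive(Debug, PartialEq)]
pub enum ResolvedType {
    String,
    Integer,
    Array(Box<ResolvedType>),
}

/// Maps a scalar datatype URL to its resolved type.
fn scalar(url: &str) -> Result<ResolvedType, String> {
    match url {
        STRING => Ok(ResolvedType::String),
        INTEGER => Ok(ResolvedType::Integer),
        other => Err(format!("unsupported datatype: {other}")),
    }
}

/// Folds `datatype` plus the refining `array-item-type` into one type.
pub fn resolve(property: &HashMap<&str, &str>) -> Result<ResolvedType, String> {
    let datatype = property
        .get(DATATYPE)
        .copied()
        .ok_or_else(|| "property has no datatype".to_string())?;
    if datatype != ARRAY {
        return scalar(datatype);
    }
    // The schema does not say this property is required; the library must know.
    let item = property
        .get(ARRAY_ITEM_TYPE)
        .copied()
        .ok_or_else(|| "array property has no array-item-type".to_string())?;
    Ok(ResolvedType::Array(Box::new(scalar(item)?)))
}
```

The special case for `ARRAY` inside `resolve` is exactly the implicit knowledge described above: it is encoded in library code rather than in the schema.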

Custom Datatypes

Have something like a ../classes/ArrayType class, which requires the array-item-type property.

Properties can then specify their type (usually with a nested resource, probably) as an ArrayType.

The downside here is that libraries now have to understand what an ArrayType means, and need code to unify different ArrayType definitions into an Array<T> type for things like queries, filters, etc. (as above).
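
A small Rust sketch of that unification problem, under the assumption that each ArrayType instance is its own resource with a required item type. `ArrayTypeResource`, `Canonical`, and `same_type` are hypothetical names for this example only, and the subjects are made-up identifiers.

```rust
/// Hypothetical stand-in for one instance of the `ArrayType` class.
#[derive(Debug, Clone)]
pub struct ArrayTypeResource {
    /// Identifier of this datatype resource (illustrative).
    pub subject: String,
    /// The required `array-item-type` property.
    pub array_item_type: String,
}

/// Structural form used to compare types regardless of where they were defined.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Canonical {
    Scalar(String),
    Array(Box<Canonical>),
}

impl ArrayTypeResource {
    /// Drops the resource identity and keeps only the structure, Array<T>.
    pub fn canonical(&self) -> Canonical {
        Canonical::Array(Box::new(Canonical::Scalar(self.array_item_type.clone())))
    }
}

/// Two separately defined ArrayType resources denote the same type
/// when their canonical forms are equal.
pub fn same_type(a: &ArrayTypeResource, b: &ArrayTypeResource) -> bool {
    a.canonical() == b.canonical()
}
```

Two resources with different subjects but the same item type compare equal here, which is the behaviour queries and filters would need.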

Express Types With a Core Type System

In my factordb implementation I went in a somewhat different direction.

I don't allow defining arbitrary datatypes.
Types have to be expressed in terms of the built-in core type system.

A simplified definition of the core types in Rust looks a bit like this:

pub enum ValueType {
    Const(Value),

    Any,

    Unit,

    Bool,
    Int {
        min: Option<i64>,
        max: Option<i64>,
    },
    UInt {
        min: Option<u64>,
        max: Option<u64>,
    },
    Float {
        min: Option<f64>,
        max: Option<f64>,
    },
    String {
        min_length: Option<u64>,
        max_length: Option<u64>,
        regex_validators: Option<Vec<String>>,
    },
    Bytes {
        min_length: Option<u64>,
        max_length: Option<u64>,
    },

    // Containers.
    List {
        item_type: Box<Self>,
        min_length: Option<u64>,
        max_length: Option<u64>,
    },

    /// A mapping from keys to values
    Map {
        key_type: Box<Self>,
        value_type: Box<Self>,
    },

    /// 
    Object(ObjectType),

    /// An anonymous union of different types.
    Union(Vec<Self>),
    /// Tagged union (aka sum type / ADT)
    Variant(VariantType),

    /// Reference (aka foreign key) pointing to another entity
    Reference {
        /// Restrict the allowed entity types.
        allowed_types: Option<HashSet<Ident>>,
    },
    
    /// A custom data type.
    Named(Ident),
}

Properties can specify either a concrete ValueType (serialized as a nested object) or a custom datatype. Custom datatype entities, however, essentially only provide a named definition for a specific ValueType.
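
The following sketch shows how a named datatype could resolve to a core type during validation. It is not factordb's actual code: it uses a trimmed-down subset of the ValueType enum above, replaces `Ident` with `String`, and introduces a hypothetical `Registry` map and `check` function. It does not guard against cyclic `Named` definitions.

```rust
use std::collections::HashMap;

/// Minimal value model for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    List(Vec<Value>),
}

/// Trimmed-down subset of the core types shown above.
#[derive(Debug, Clone)]
pub enum ValueType {
    Int { min: Option<i64>, max: Option<i64> },
    List {
        item_type: Box<ValueType>,
        min_length: Option<u64>,
        max_length: Option<u64>,
    },
    /// A custom data type, referenced by name (String instead of Ident here).
    Named(String),
}

/// Hypothetical lookup table from custom datatype names to definitions.
pub type Registry = HashMap<String, ValueType>;

/// Validates `value` against `ty`, resolving named types through `registry`.
pub fn check(ty: &ValueType, value: &Value, registry: &Registry) -> bool {
    match (ty, value) {
        // A named type is only an alias: look it up and check the core type.
        (ValueType::Named(name), _) => registry
            .get(name)
            .map_or(false, |resolved| check(resolved, value, registry)),
        (ValueType::Int { min, max }, Value::Int(n)) => {
            min.map_or(true, |m| *n >= m) && max.map_or(true, |m| *n <= m)
        }
        (ValueType::List { item_type, min_length, max_length }, Value::List(items)) => {
            let len = items.len() as u64;
            min_length.map_or(true, |m| len >= m)
                && max_length.map_or(true, |m| len <= m)
                && items.iter().all(|item| check(item_type, item, registry))
        }
        _ => false,
    }
}
```

Because every custom type bottoms out in a core type, a client that knows only the core types can still validate a value of a custom type, as long as it can look up the definition.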

The main advantage here is that clients will always be able to understand and work with all data.

More complex types can always be expressed in terms of this core schema, and in the worst case they can just use a byte array or string for arbitrary serialization.

(Including things like ObjectType or Map here is probably very debatable, because they are hard to express in something like a triple/quad format and might be better expressed with something like nested resources, but I don't have that yet.)

2 participants