Generic Array Datatype #127

theduke · 2022-07-15T14:12:38Z

Currently there is only a resource-array datatype, which requires using nested resources if there are multiple values.

Often I would want to have a property with multiple plain values though.

Reasons:

Imagine you want an array of ints or strings:

- Defining an extra resource type is really noisy in the schema.
- Wrapping each value in an extra object introduces a lot of overhead for larger databases.

So there should be a datatype for "array of type T".

Defining the nested type would run into similar issues as #126 though.

joepio · 2022-07-16T08:19:10Z

I always felt like this was bound to come up at some point. I think you're right, we probably need an Array datatype.

I think that if a Property has the Array datatype, it should also indicate which types of elements are supported. Maybe it has a second datatype, namely innerDatatype, which refers to the shape of the items in the array (e.g. String or Integer).
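A rough sketch of what this could look like in Rust. The `Datatype` and `Property` types and the `is_well_formed` check are hypothetical names made up for illustration, not an existing Atomic Data API:

```rust
/// Hypothetical datatypes; only the ones needed for the example.
#[derive(Debug, Clone, PartialEq)]
pub enum Datatype {
    String,
    Integer,
    Array,
}

/// Hypothetical property definition carrying an optional inner datatype.
#[derive(Debug, Clone, PartialEq)]
pub struct Property {
    pub shortname: String,
    pub datatype: Datatype,
    /// Only meaningful when `datatype` is `Datatype::Array`.
    pub inner_datatype: Option<Datatype>,
}

impl Property {
    /// Assumed rule: array properties must name an inner datatype,
    /// and non-array properties must not.
    pub fn is_well_formed(&self) -> bool {
        match (&self.datatype, &self.inner_datatype) {
            (Datatype::Array, Some(_)) => true,
            (Datatype::Array, None) => false,
            (_, None) => true,
            (_, Some(_)) => false,
        }
    }
}
```

An "array of strings" property would then set `datatype` to `Datatype::Array` and `inner_datatype` to `Some(Datatype::String)`.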

theduke · 2022-07-17T08:04:35Z

This brings up an interesting modeling problem.

How do you express "array of integers" in the schema?

This is actually the more general problem of "how to refine types".

I see several solutions, all of them with downsides.

Additional Properties on Property Resources

A property of type atomicdata.dev/datatypes/array could use an atomicdata.dev/properties/array-item-type property to specify the expected type of array items.

The big downside here is that it would not be apparent from the schema that this property is expected or required as a refinement of the array datatype, so that makes the schema more cryptic and implementations more complicated.

It's also more complex to "unify" and compare schema types, since libraries now need to understand the array-item-type property and convert it into an Array<T> type for processing.
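To illustrate the extra work this puts on libraries, here is a minimal sketch. It assumes a property resource is just a map from property URL to value, and that the array and array-item-type URLs exist as proposed above (they don't today). `SchemaType` and `resolve_type` are made-up names:

```rust
use std::collections::HashMap;

// URLs are assumptions for this sketch; the array ones are only proposed here.
pub const DATATYPE: &str = "https://atomicdata.dev/properties/datatype";
pub const ARRAY: &str = "https://atomicdata.dev/datatypes/array";
pub const ARRAY_ITEM_TYPE: &str = "https://atomicdata.dev/properties/array-item-type";
pub const INTEGER: &str = "https://atomicdata.dev/datatypes/integer";
pub const STRING: &str = "https://atomicdata.dev/datatypes/string";

/// Structural type a library would use internally.
#[derive(Debug, Clone, PartialEq)]
pub enum SchemaType {
    Integer,
    String,
    Array(Box<SchemaType>),
}

/// Map a scalar datatype URL to its structural type.
fn scalar(url: &str) -> Result<SchemaType, String> {
    match url {
        INTEGER => Ok(SchemaType::Integer),
        STRING => Ok(SchemaType::String),
        other => Err(format!("unsupported datatype: {other}")),
    }
}

/// The library has to know that `array-item-type` refines `array`;
/// nothing in the schema itself says so.
pub fn resolve_type(property: &HashMap<String, String>) -> Result<SchemaType, String> {
    let datatype = property.get(DATATYPE).ok_or("property has no datatype")?;
    if datatype.as_str() == ARRAY {
        let item = property
            .get(ARRAY_ITEM_TYPE)
            .ok_or("array property is missing array-item-type")?;
        Ok(SchemaType::Array(Box::new(scalar(item)?)))
    } else {
        scalar(datatype)
    }
}
```

Note that the "missing array-item-type" case can only be caught by this special-cased code, which is exactly the downside described above.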

Custom Datatypes

Have something like a ../classes/ArrayType class, which requires the array-item-type property.

Properties can then specify their type (usually with a nested resource, probably) as an ArrayType.

The downside here is that libraries now have to understand what an ArrayType means, and need code to unify different ArrayType definitions into an Array<T> type for things like queries, filters, etc. (as above).
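A sketch of the unification step, with made-up types (`ArrayTypeResource`, `PropertyType`, `Normalized`) standing in for whatever a library would actually use. Two properties may each point at their own ArrayType resource, so the library has to drop resource identity and compare structurally:

```rust
/// Hypothetical instance of the ArrayType class: a resource with its own
/// subject and the required array-item-type property.
#[derive(Debug, Clone)]
pub struct ArrayTypeResource {
    pub subject: String,
    pub item_type: PropertyType,
}

/// Hypothetical type of a property: a built-in or a nested ArrayType resource.
#[derive(Debug, Clone)]
pub enum PropertyType {
    Integer,
    String,
    ArrayType(Box<ArrayTypeResource>),
}

/// Structural form used for queries and filters; resource subjects are ignored.
#[derive(Debug, Clone, PartialEq)]
pub enum Normalized {
    Integer,
    String,
    Array(Box<Normalized>),
}

/// Collapse a property type into its structural Array<T> form.
pub fn normalize(ty: &PropertyType) -> Normalized {
    match ty {
        PropertyType::Integer => Normalized::Integer,
        PropertyType::String => Normalized::String,
        PropertyType::ArrayType(res) => {
            Normalized::Array(Box::new(normalize(&res.item_type)))
        }
    }
}
```

With this, two ArrayType resources with different subjects but the same item type normalize to the same `Normalized::Array` value.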

Express Types With a Core Type System

In my factordb implementation I went in a somewhat different direction.

I don't allow defining arbitrary datatypes.
Types have to be expressed in terms of the built-in core type system.

A simplified definition of the core types in Rust looks a bit like this:

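// `Value`, `ObjectType`, `VariantType` and `Ident` are defined elsewhere (not shown).
use std::collections::HashSet;

pub enum ValueType {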
    Const(Value),

    Any,

    Unit,

    Bool,
    Int {
        min: Option<i64>,
        max: Option<i64>,
    },
    UInt {
        min: Option<u64>,
        max: Option<u64>,
    },
    Float {
        min: Option<f64>,
        max: Option<f64>,
    },
    String {
        min_length: Option<u64>,
        max_length: Option<u64>,
        regex_validators: Option<Vec<String>>,
    },
    Bytes {
        min_length: Option<u64>,
        max_length: Option<u64>,
    },

    // Containers.
    List {
        item_type: Box<Self>,
        min_length: Option<u64>,
        max_length: Option<u64>,
    },

    /// A mapping from keys to values
    Map {
        key_type: Box<Self>,
        value_type: Box<Self>,
    },

    /// A structured object type.
    Object(ObjectType),

    /// An anonymous union of different types.
    Union(Vec<Self>),
    /// Tagged union (aka sum type / ADT)
    Variant(VariantType),

    /// Reference (aka foreign key) pointing to another entity
    Reference {
        /// Restrict the allowed entity types.
        allowed_types: Option<HashSet<Ident>>,
    },
    
    /// A custom data type.
    Named(Ident),
}

Properties can specify either a concrete ValueType as their type (serialized as a nested object) or a custom datatype. Custom datatype entities essentially only provide a named definition for a specific ValueType.
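To make the role of named types concrete, here is a cut-down sketch. It is not the factordb code: the reduced `Value` and `ValueType` enums, the `Registry` alias and the `conforms` function are simplified stand-ins for illustration, and named types are keyed by plain strings instead of `Ident`:

```rust
use std::collections::HashMap;

/// Reduced stand-in for a runtime value.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    Str(String),
    List(Vec<Value>),
}

/// Reduced stand-in for the core type system above.
#[derive(Debug, Clone, PartialEq)]
pub enum ValueType {
    Int { min: Option<i64>, max: Option<i64> },
    String,
    List { item_type: Box<ValueType> },
    /// A custom datatype: just a name for another ValueType.
    Named(String),
}

/// Named definitions, keyed by name (assumed to contain no cycles).
pub type Registry = HashMap<String, ValueType>;

/// Check a value against a type, resolving named types via the registry.
pub fn conforms(value: &Value, ty: &ValueType, named: &Registry) -> bool {
    match (value, ty) {
        // Unknown names never match; known names defer to their definition.
        (_, ValueType::Named(name)) => named
            .get(name)
            .map_or(false, |t| conforms(value, t, named)),
        (Value::Int(i), ValueType::Int { min, max }) => {
            min.map_or(true, |m| *i >= m) && max.map_or(true, |m| *i <= m)
        }
        (Value::Str(_), ValueType::String) => true,
        (Value::List(items), ValueType::List { item_type }) => {
            items.iter().all(|v| conforms(v, item_type, named))
        }
        _ => false,
    }
}
```

A client that has never seen a given named type can still process it, because the name always resolves to a core type it already understands.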

The main advantage here is that clients will always be able to understand and work with all data.

More complex types can always be expressed in terms of this core schema, and in the worst case they can fall back to a byte array or string for arbitrary serialization.

(Including things like ObjectType or Map here is probably very debatable, because they are hard to express in something like a triple/quad format and might be better expressed with something like nested resources, but I don't have that yet.)

theduke mentioned this issue Jul 15, 2022

Generic Set Type #128

Open
