This repository has been archived by the owner on Oct 8, 2020. It is now read-only.

Core concepts

Daniel Wertheim edited this page Nov 29, 2012 · 27 revisions

History

A while back I started to fiddle with Microsoft's CTP edition of Code First in Entity Framework 4. The product is great, but I wanted something else; I wanted something “more” schemaless. I turned to MongoDB and put together an open source driver targeting .NET 4. At the time there were things that I didn’t like, so I built a document DB over Lucene. I relatively quickly discovered that I missed all the great infrastructure that SQL Server provides: security, replication, scheduling, etc. So I prototyped a solution that used JSON to create a document/structure provider over SQL Server, namely: Simple Structure Oriented Db (SisoDb).

How is data stored?

All objects (structures) you pass to SisoDb are stored as JSON and as key-value indexes. SisoDb serializes your POCO graphs (plain old CLR objects) to JSON, which is what gets persisted. For each type of entity, a set of tables is created for you on the fly. One table, the Structure-table, is to be seen as the master table and holds the document (the JSON representation of the structure). There is also one table, the Uniques-table, that holds values of the scalar properties that you mark as unique. This table is only used when inserting and updating documents, and ensures that your unique constraints are enforced. Lastly, there is a set of tables called the Indexes-tables. Read more about them below, but in short, these tables hold the value of each indexed property in the object graph, with the purpose of providing a more performant and query-friendly representation of the entity. You can of course decide what to index.

Models

The complete graph is serialized and deserialized, so there's no magical lazy loading and no proxies. Instead you have to think a bit before designing your documents (as with all pure document-oriented DBs) and teach yourself to think in documents. It isn't a relational data model you are working in. You could of course take advantage of a cache provider and mark queries as cacheable. That way you could pull in references from the cache instead of the Db. But again, it's not a relational model. In some scenarios you might benefit from duplicating data and creating specific read models, which of course causes more writes when you update your models.

Tables

The tables in the database that are required for an entity are created on the fly and are nothing you should have to care about. For each structure, a set of tables is created.

Storage layout

Grouping data of the same type together like this makes it possible to design more effective indexes that speed up queries while retaining insert speed.

CustomerStructure

Holds the document, which by default is represented as JSON, and is what's being deserialized when you retrieve data as entities. You can read more about the serialization here.

CustomerIndexes (integers, fractals, ..., texts)

The primary concern of these tables is querying. All queries generated by SisoDb, unless you query by id, are translated and executed against these tables. They hold the member paths and the typed values of each scalar property in the object hierarchy, as well as pre-converted string values so that effective LIKE queries etc. can be performed.

How are nested enumerable members handled?

If a member is part of something that is enumerable, e.g. ProductNo of an OrderLine in Order, it will result in several rows with the same member path in the Indexes-table. The member path will be: Order.Lines.ProductNo.
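As a rough illustration of the "several rows, same member path" idea, here is a minimal sketch. The `Order` and `OrderLine` classes and the `IndexRowSketch` helper are hypothetical and written only for this example; they are not part of SisoDb's API, and the real extraction is done by SisoDb's cached accessors.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class OrderLine
{
    public string ProductNo { get; set; }
}

public class Order
{
    public Guid Id { get; set; }
    public List<OrderLine> Lines { get; set; } = new List<OrderLine>();
}

// Hypothetical helper for illustration only, not SisoDb's implementation.
public static class IndexRowSketch
{
    // Emits one (member path, value) row per order line.
    // Every row shares the same member path, as described above.
    public static List<KeyValuePair<string, string>> ProductNoRows(Order order)
    {
        return order.Lines
            .Select(line => new KeyValuePair<string, string>("Order.Lines.ProductNo", line.ProductNo))
            .ToList();
    }
}
```

An order with two lines would yield two rows here, both keyed by Order.Lines.ProductNo and differing only in value.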

CustomerUniques

As with the Indexes-table, it is key-value oriented, but the value is always a checksum generated for each scalar property that has been marked with the SisoDb.Annotations.UniqueAttribute. This attribute can be used in one of two ways. Either you mark a scalar property to be unique per instance (e.g. OrderLine.ProductNo) or per type (e.g. Order.OrderNo).

Text vs String

SisoDb distinguishes between strings and texts. From v13.0.0-pre1 this has been simplified: to get content classified as text, your property should be of type string and have a name that ends with: Text | Content | Description | Body. This is a convention you can control by replacing a Func.

db.StructureSchemas
    .SchemaBuilder
    .DataTypeConverter
    .MemberNameIsForTextType = name => name.EndsWith("Foo");

Values classified as strings will end up in the [Entity]Strings table and values classified as texts in the [Entity]Texts table. Strings have a max length of 300 chars while texts don’t. This semantic separation exists so that effective indexes for queries can be created for normal strings, which wouldn’t be feasible if the column were nvarchar(max).

public class BlogPost
{
    public Guid Id { get; set; }
    public string Title { get; set; } //Ends up in BlogPostStrings
    public string MyText { get; set; } //Ends up in BlogPostTexts
    public string MyContent { get; set; } //Ends up in BlogPostTexts
    public string MyDescription { get; set; } //Ends up in BlogPostTexts
    public string MyBody { get; set; } //Ends up in BlogPostTexts
}

var post = new BlogPost
{
    Title = "A title of max 300 chars",                       //Stored in BlogPostStrings
    MyContent = "Some long text that can exceed 300 chars."   //Stored in BlogPostTexts
};
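The default naming convention described above could be approximated as follows. This is a sketch under assumptions: the `TextConventionSketch` class and its `TableFor` helper are hypothetical names introduced here, and whether SisoDb's own check is case-sensitive is not stated in this page, so the ordinal comparison is a guess.

```csharp
using System;
using System.Linq;

// Hypothetical sketch of the text-vs-string convention; not SisoDb's own code.
public static class TextConventionSketch
{
    // Suffixes named in the convention above.
    private static readonly string[] TextSuffixes = { "Text", "Content", "Description", "Body" };

    // Same shape as the replaceable Func: member name in, "is text?" out.
    // Assumption: case-sensitive (ordinal) suffix match.
    public static readonly Func<string, bool> MemberNameIsForTextType =
        name => TextSuffixes.Any(suffix => name.EndsWith(suffix, StringComparison.Ordinal));

    // Picks the [Entity]Texts or [Entity]Strings table name for a string member.
    public static string TableFor(string entityName, string memberName)
    {
        return entityName + (MemberNameIsForTextType(memberName) ? "Texts" : "Strings");
    }
}
```

With the BlogPost class above, `TableFor("BlogPost", "Title")` gives BlogPostStrings while `TableFor("BlogPost", "MyBody")` gives BlogPostTexts.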

JSON

The JSON representation of the structure is what's being deserialized back when you query for your documents. This gives you the opportunity to store a structure as one class or interface and then return it as something else entirely. You can read more about it here: Store as X, return as Y.

Serializer

The default JSON serializer is one of the most performant serializers within .NET, namely: ServiceStack.Text. Read a comparison here: http://daniel.wertheim.se/2011/02/07/json-net-vs-servicestack/. There are also packages that let you use other serializers, and you can of course write one on your own. Read more here.

Key-value indexes

The primary concern of these tables is querying. All SQL queries generated by SisoDb, unless you query by id, are translated and executed against these tables. The tables are created the first time you execute a command against a Session for a certain type of structure (Person, Customer, Order, ...). A schema for that structure is constructed and cached in the Database instance you created via your SisoDbFactory. This schema is essentially made up of property accessors that, via emitted IL, access the values of your structures and give them a key. The key is the complete member path of the property.

public class Customer
{
    public Guid CustomerId { get; set; }
    public int CustomerNo { get; set; }
    public Address Address { get; set; }
}

public class Address
{
    public string Street { get; set; }
    public string Zip { get; set; }
    public string City { get; set; }
}

This will create five different cached index accessors, with the member path and type info:

  • CustomerId : Guid
  • CustomerNo : int
  • Address.Street : string
  • Address.Zip : string
  • Address.City : string

Now, every time you insert or update a structure in the database, these cached accessors will be used to extract the values of the structure. Each value will be stored using the member path as the key and the value as the typed value in a certain indexes table.
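To make the member path idea concrete, here is a minimal sketch that walks the Customer model with plain reflection and lists each scalar property with its path. It is not how SisoDb does it (SisoDb builds cached, IL-emitted accessors); `MemberPathSketch` and its notion of "scalar" are assumptions made for this example, and the walk does not handle collections or cyclic object graphs.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public class Address
{
    public string Street { get; set; }
    public string Zip { get; set; }
    public string City { get; set; }
}

public class Customer
{
    public Guid CustomerId { get; set; }
    public int CustomerNo { get; set; }
    public Address Address { get; set; }
}

// Hypothetical helper for illustration only; uses reflection instead of emitted IL.
public static class MemberPathSketch
{
    // Assumption: these types count as "scalar" for the purpose of the sketch.
    private static bool IsScalar(Type type)
    {
        return type.IsPrimitive || type.IsEnum || type == typeof(string)
            || type == typeof(Guid) || type == typeof(decimal) || type == typeof(DateTime);
    }

    // Returns entries such as "Address.Street : String" (CLR type names).
    public static List<string> GetMemberPaths(Type type, string prefix = "")
    {
        var paths = new List<string>();
        foreach (PropertyInfo property in type.GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            string path = prefix.Length == 0 ? property.Name : prefix + "." + property.Name;
            if (IsScalar(property.PropertyType))
                paths.Add(path + " : " + property.PropertyType.Name);
            else
                paths.AddRange(GetMemberPaths(property.PropertyType, path)); // descend into nested objects
        }
        return paths;
    }
}
```

Calling `MemberPathSketch.GetMemberPaths(typeof(Customer))` yields one entry per scalar property; the Address property itself produces no entry, only its nested members do.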

Control what to index

By default every scalar property in the object graph is extracted and given a key and a value, but you can control what to index. Read more about it here: Control what to index

Unique constraints

There's one attribute you can use in SisoDb, and it's the UniqueAttribute. Using it you can mark a scalar property as being unique, either per instance or per type.

public class Customer
{
    public Guid CustomerId { get; set; }

    [Unique(UniqueModes.PerType)]
    public int CustomerNo { get; set; }

    ...
}

When doing this, a checksum is generated for the CustomerNo, which is inserted in the Uniques-table. That table has unique constraints on it, which will be enforced the next time you insert or update a document of that type.
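The mechanism can be pictured with a small in-memory sketch. Everything here is an assumption made for illustration: `UniquesTableSketch` is a hypothetical class, a HashSet stands in for the unique constraint in SQL Server, and SHA-256 is used only as an example checksum, since this page does not say which algorithm SisoDb uses.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical in-memory stand-in for a per-type Uniques-table; not SisoDb's code.
public class UniquesTableSketch
{
    // Plays the role of the unique constraint over (member path, checksum).
    private readonly HashSet<string> _rows = new HashSet<string>();

    // Assumption: SHA-256 as the checksum; the real algorithm is not stated here.
    public static string Checksum(string value)
    {
        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(value));
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }

    // Returns false when the same value was already stored for that member path.
    public bool TryInsert(string memberPath, string value)
    {
        return _rows.Add(memberPath + ":" + Checksum(value));
    }
}
```

Inserting the same CustomerNo twice through `TryInsert("CustomerNo", "42")` succeeds the first time and is rejected the second time, which mirrors the constraint violation you would get from the database.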

Model updates

If you stick to adding new or dropping existing properties, nothing is needed; it all works. If you remove a member from the model, values for that member are deleted from the indexes-tables and the uniques-table. The structure-table will of course contain the "truth", the JSON, which will not be updated until the structure is "touched", i.e. fetched and re-saved.

More complex model updates?

There is the concept of using a structureset updater to handle more complex model updates. Read more about it here
