Core concepts
A while back I started to fiddle with Microsoft's CTP edition of code first in Entity Framework 4. The product is great, but I wanted something else: I wanted something "more" schemaless. I turned to MongoDB and put together an open source driver targeting .Net 4. At the time there were things that I didn't like, so I built a document DB over Lucene. I relatively quickly discovered that I missed all the great infrastructure that SQL-server provides (security, replication, scheduler etc.), so I prototyped a solution that used JSON to create a document/structure provider over SQL-server, namely: Simple Structure Oriented Db (SisoDb).
All objects (structures) you pass to SisoDb are stored as JSON and as key-value indexes. SisoDb stores your POCO-graphs (plain old CLR objects) using JSON, which enables us to go from a POCO to persistable JSON. For each type of entity, a set of tables is created for you on the fly. One table, the Structure-table, is to be seen as the master table, and it holds the document (the JSON representation of the structure). There is also one table, the Uniques-table, that holds values of the scalar properties that you mark as unique. This table is only used when inserting and updating documents, and it ensures that your unique constraints are enforced. Lastly, there is a set of tables called the Indexes-tables. Read more about them below, but in short, these tables hold the value of each indexed property in the object graph, with the purpose of providing a more performant and query-friendly representation of the entity. You can of course decide what to index.
The complete graph is serialized and deserialized, so there's no magical lazy loading and no proxies. Instead you have to think a bit before designing your documents (as with all pure document-oriented DBs) and teach yourself to think in documents. It isn't a relational data model you are working in. You could of course take advantage of a cache provider and mark queries as cacheable. That way you could pull in references from cache instead of from the Db. But again, it's not a relational model. In some scenarios you might benefit from duplicating data and creating specific read models, which of course causes more writes when you update your models.
The tables in the database that are required for an entity are created on the fly and are nothing you should have to care about. For each structure there will be a set of tables created.
Grouping data of the same type together like this has been done so that more effective indexes can be designed, enhancing queries while retaining insert speed.
The Structure-table holds the document, which by default is represented as JSON and is what is deserialized when you retrieve data as entities. You can read more about the serialization here.
The primary concern of the Indexes-tables is querying. All queries generated by SisoDb, unless you query by id, are translated and executed against these tables. They hold the member paths and the typed values of each scalar property in the object hierarchy, as well as pre-converted string values so that effective LIKE queries etc. can be performed.
If a member is part of something that is enumerable, e.g. ProductNo of an OrderLine in Order, it will result in several rows with the same member path in the Indexes-table. The member path will be: Order.Lines.ProductNo.
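As a rough illustration of the enumerable case, the sketch below flattens an Order with several lines into one row per line, all sharing the same member path. The IndexRow class and the IndexRowSketch helper are hypothetical names made up for this example; they are not SisoDb types, and the real Indexes-table columns may look different.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class OrderLine
{
    public string ProductNo { get; set; }
}

public class Order
{
    public Guid Id { get; set; }
    public List<OrderLine> Lines { get; set; } = new List<OrderLine>();
}

// Hypothetical row shape, only for illustration.
public class IndexRow
{
    public IndexRow(Guid structureId, string memberPath, string value)
    {
        StructureId = structureId;
        MemberPath = memberPath;
        Value = value;
    }

    public Guid StructureId { get; }
    public string MemberPath { get; }
    public string Value { get; }
}

public static class IndexRowSketch
{
    // Produces one row per order line; every row carries the same member path.
    public static List<IndexRow> ForProductNos(Order order)
    {
        return order.Lines
            .Select(line => new IndexRow(order.Id, "Order.Lines.ProductNo", line.ProductNo))
            .ToList();
    }
}
```

An order with two lines therefore yields two rows that differ only in their value.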
Like the Indexes-tables, the Uniques-table is key-value oriented, but the value is always a checksum generated for each scalar property that has been marked with the SisoDb.Annotations.UniqueAttribute. This attribute can be used in one of two ways: you mark a scalar property as unique either per instance (e.g. OrderLine.ProductNo) or per type (e.g. Order.OrderNo).
SisoDb distinguishes between strings and texts. From v13.0.0-pre1 this has been simplified: to get content classified as text, your property should be of type string and have a name that ends with Text, Content, Description or Body. This is a convention you can control by replacing a Func:
db.StructureSchemas
.SchemaBuilder
.DataTypeConverter
.MemberNameIsForTextType = name => name.EndsWith("Foo");
Values classified as strings will end up in the [Entity]Strings table and values classified as texts in the [Entity]Texts table. Strings have a max length of 300 chars while Texts don't. This semantic separation is done so that effective indexes for queries can be created for normal strings, which wouldn't be feasible if the column were nvarchar(max).
public class BlogPost
{
public Guid Id { get; set; }
public string Title { get; set; } //Ends up in BlogPostStrings
public string MyText { get; set; } //Ends up in BlogPostTexts
public string MyContent { get; set; } //Ends up in BlogPostTexts
public string MyDescription { get; set; } //Ends up in BlogPostTexts
public string MyBody { get; set; } //Ends up in BlogPostTexts
}
var post = new BlogPost
{
Title = "A title of max 300 chars",
MyContent = "Some long text that can exceed 300 chars."
};
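The default naming convention described above could be expressed as a predicate along these lines. This is only a sketch: TextConventionSketch and TableFor are hypothetical names, and the case-sensitive suffix match is an assumption rather than something confirmed by SisoDb's source.

```csharp
using System;
using System.Linq;

public static class TextConventionSketch
{
    // Suffixes taken from the convention described in the text.
    private static readonly string[] TextSuffixes = { "Text", "Content", "Description", "Body" };

    // Assumption: the match is case-sensitive.
    public static readonly Func<string, bool> MemberNameIsForTextType =
        name => TextSuffixes.Any(suffix => name.EndsWith(suffix, StringComparison.Ordinal));

    // Returns the table a string member would land in, e.g. BlogPostStrings or BlogPostTexts.
    public static string TableFor(string entityName, string memberName)
    {
        return entityName + (MemberNameIsForTextType(memberName) ? "Texts" : "Strings");
    }
}
```

With this predicate, Title maps to BlogPostStrings, while MyText, MyContent, MyDescription and MyBody map to BlogPostTexts.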
The JSON representation of the structure is what is deserialized back when you query for your documents. This gives you the opportunity to store a structure as one class or interface and then return it as something else entirely. You can read more about it here: Store as X, return as Y.
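The idea of storing as one type and returning as another follows from the JSON being the source of truth. The sketch below shows the principle using System.Text.Json, not SisoDb's API or its default ServiceStack.Text serializer. Person, PersonName and StoreAsReturnAsSketch are hypothetical names for this example.

```csharp
using System.Text.Json;

// The type the structure is stored as.
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

// A slimmer type the same JSON is returned as.
public class PersonName
{
    public string Name { get; set; }
}

public static class StoreAsReturnAsSketch
{
    public static PersonName RoundTrip(Person person)
    {
        // "Store": serialize the full structure to JSON.
        string json = JsonSerializer.Serialize(person);

        // "Return": deserialize the same JSON into another shape.
        // Properties missing on the target type (Age) are ignored by default.
        return JsonSerializer.Deserialize<PersonName>(json);
    }
}
```

Only the members that exist on the target type are populated; everything else in the stored JSON is skipped.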
The default JSON serializer is one of the most performant serializers in .Net, namely ServiceStack.Text. Read a comparison here: http://daniel.wertheim.se/2011/02/07/json-net-vs-servicestack/. There are also custom packages that let you use other serializers, and you can of course write one of your own. Read more here.
The primary concern of the Indexes-tables is querying. All SQL-queries generated by SisoDb, unless you query by id, are translated and executed against these tables. The tables are created the first time you execute a command against a Session for a certain type of structure (Person, Customer, Order, ...). A schema for that structure is constructed and cached in the Database instance you created via your SisoDbFactory. This schema is essentially made up of property accessors that, via IL emit, access the values of your structures and give them a key. The key is the complete member path of the property.
public class Customer
{
public Guid CustomerId { get; set; }
public int CustomerNo { get; set; }
public Address Address { get; set; }
}
public class Address
{
public string Street { get; set; }
public string Zip { get; set; }
public string City { get; set; }
}
This will create five different cached index accessors, with the member path and type info:
- CustomerId : Guid
- CustomerNo : int
- Address.Street : string
- Address.Zip : string
- Address.City : string
Now, every time you insert or update a structure in the database, these cached accessors will be used to extract the values of the structure. Each value will be stored using the member path as the key and the value as the typed value in a certain indexes table.
By default every scalar property in the object graph is extracted and given a key and a value, but you can control what to index. Read more about it here: Control what to index.
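To make the member-path idea concrete, here is a minimal sketch that walks the Customer type and lists the path and type of each scalar property. It uses plain reflection rather than the IL emit that SisoDb is described as using, and MemberPathSketch is a hypothetical helper, not part of the library. The set of types treated as scalar is also an assumption, and collections and cyclic graphs are deliberately not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public class Address
{
    public string Street { get; set; }
    public string Zip { get; set; }
    public string City { get; set; }
}

public class Customer
{
    public Guid CustomerId { get; set; }
    public int CustomerNo { get; set; }
    public Address Address { get; set; }
}

public static class MemberPathSketch
{
    // Yields "path : type" for every scalar property, recursing into nested objects.
    public static IEnumerable<string> Describe(Type type, string prefix = "")
    {
        foreach (PropertyInfo property in type.GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            string path = prefix.Length == 0 ? property.Name : prefix + "." + property.Name;
            Type propertyType = property.PropertyType;

            if (IsScalar(propertyType))
            {
                yield return path + " : " + propertyType.Name;
            }
            else
            {
                foreach (string nested in Describe(propertyType, path))
                {
                    yield return nested;
                }
            }
        }
    }

    // Assumption: a simplified notion of "scalar" for this example.
    private static bool IsScalar(Type type)
    {
        return type.IsPrimitive || type.IsEnum
            || type == typeof(string) || type == typeof(Guid)
            || type == typeof(decimal) || type == typeof(DateTime);
    }
}
```

Calling MemberPathSketch.Describe(typeof(Customer)) yields five entries, one per scalar property. The type names are the CLR names (for example Int32 rather than int).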
There's one attribute you can use in SisoDb, and it's the UniqueAttribute. Using it you can mark a scalar property as being unique, either per instance or per type.
public class Customer
{
public Guid CustomerId { get; set; }
[Unique(UniqueModes.PerType)]
public int CustomerNo { get; set; }
...
}
When doing this, a checksum is generated for the CustomerNo, which is inserted into the Uniques-table. That table has unique constraints on it, which will be enforced the next time you insert or update a document of that type.
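The per-type case can be pictured with the in-memory sketch below, where a HashSet stands in for the unique constraint on the Uniques-table. UniquesSketch is a hypothetical class, and SHA-256 is only an assumed checksum for illustration; the text does not say which algorithm SisoDb uses.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

public class UniquesSketch
{
    // Stand-in for the unique constraint in the Uniques-table (per-type mode only).
    private readonly HashSet<string> _keys = new HashSet<string>();

    // Assumption: SHA-256 over the invariant string form of the value.
    public static string Checksum(object value)
    {
        string text = Convert.ToString(value, CultureInfo.InvariantCulture);
        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(text));
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }

    // Returns false when the same value was already registered for the member path.
    public bool TryRegister(string memberPath, object value)
    {
        return _keys.Add(memberPath + ":" + Checksum(value));
    }
}
```

Registering the same CustomerNo twice returns false the second time, which corresponds to the constraint violation you would get from the database.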
If you limit yourself to adding new or dropping existing properties, nothing is needed; it all works. If you remove a member from the model, values for that member are deleted from the Indexes-tables and the Uniques-table. The Structure-table will of course contain the "truth", the JSON, which will not be updated until the structure is "touched", that is, fetched and re-saved.
For more complex model updates there is the concept of a structureset updater. Read more about it here.