Managed Data for .NET

Ensō is an interesting new language being developed by Alex Loh, William R. Cook, and Tijs van der Storm. The overarching goal is to significantly raise the level of abstraction, partly via declarative data models.

They recently published a paper on this subject for Onward! 2012 titled Managed Data: Modular Strategies for Data Abstraction. Instead of defining concrete classes, programmers using managed data define a schema describing their data model, consisting of the set of fields and the types of those fields. Actual implementations of this schema are provided by "data managers", which interpret the schema and add custom behaviour. This is conceptually similar to aspect-oriented programming, but with a safer, more principled foundation.

A data manager can implement any sort of field-like behaviour. The paper describes a few basic variants:

  • BasicRecord: implements a simple record with getters and setters.
  • LockableRecord: implements locking on a record, rendering it immutable.
  • InitRecord: implements field initialization on records.
  • ObserverRecord: implements the observer pattern, notifying listeners of any field changes.
  • DataflowRecord: registers field dependencies and recalculates dependent fields whenever the fields they depend on change.
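To make the LockableRecord idea concrete, here is a hand-written C# illustration of the behaviour for a single mutable field. It is not the paper's Ruby implementation, and the class name LockableFoo is hypothetical: once Lock() is called, any further write is rejected, which is one plausible reading of "rendering it immutable".

```csharp
using System;

// Hypothetical hand-written illustration of LockableRecord semantics:
// writes succeed until Lock() is called, after which the record is read-only.
public sealed class LockableFoo
{
    string fooz;
    bool locked;

    // After this call every setter throws.
    public void Lock() => locked = true;

    public string Fooz
    {
        get => fooz;
        set
        {
            if (locked) throw new InvalidOperationException("Record is locked.");
            fooz = value;
        }
    }
}
```

A data manager would generate this guard for every field rather than requiring it to be written by hand for each class.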

Managed Data for .NET

The core idea of managed data requires two basic concepts: a declarative means of describing the schema, and a means of interpreting that schema to add behaviour. .NET interfaces can specify simple declarative schemas completely divorced from implementations. The following interface can be seen as the IFoo schema, containing an immutable integer field and a mutable string field:

// the schema for a data object
public interface IFoo
{
  int Bar { get; }
  string Fooz { get; set; }
}

Data managers then generate concrete instances of IFoo with the desired behaviour. To fit this into a typed framework, I had to reorganize the concepts a little from what appears in the paper:

// creates data instances with custom behaviour
public sealed class DataManager
{
  // create an instance of interface type T
  public T Create<T>();
}

I have a single DataManager type which analyzes the interface T and generates an instance with all the same properties as found in T. The DataManager constructor accepts an instance of ISchemaCompiler, which is where the actual magic happens:

public interface ISchemaCompiler
{
  // next compiler in the chain
  ISchemaCompiler Next { get; set; }
  // a new type is being defined
  void Type(TypeBuilder type);
  // a new property is being defined
  void Property(TypeBuilder type, PropertyBuilder property);
  // a new setter is being defined
  void Setter(PropertyBuilder prop, MethodBuilder setter,
              ILGenerator il);
  // a new getter is being defined
  void Getter(PropertyBuilder prop, MethodBuilder getter,
              ILGenerator il);
}

So DataManager creates a dynamic type implementing an interface, and it calls into the ISchemaCompiler chain while it's generating the various properties. The schema compilers can then output IL to customize the behaviour of the various property getters and setters.
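The division of labour between DataManager and the schema compilers can be sketched with System.Reflection.Emit. This is a stripped-down illustration, not the actual implementation. It reuses the IFoo and ISchemaCompiler declarations from this post, but DataManagerSketch and BasicRecordSketch are hypothetical names. It also assumes that the manager, not the compilers, emits the final `ret` after the chain has run. Defaults (Schema.Default) and chaining through Next are left out, so Bar simply reads as 0.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Reflection.Emit;

public interface IFoo
{
    int Bar { get; }
    string Fooz { get; set; }
}

// The schema-compiler interface as listed in the post.
public interface ISchemaCompiler
{
    ISchemaCompiler Next { get; set; }
    void Type(TypeBuilder type);
    void Property(TypeBuilder type, PropertyBuilder property);
    void Setter(PropertyBuilder prop, MethodBuilder setter, ILGenerator il);
    void Getter(PropertyBuilder prop, MethodBuilder getter, ILGenerator il);
}

// Hypothetical BasicRecord-style compiler: one private backing field per property.
public sealed class BasicRecordSketch : ISchemaCompiler
{
    readonly Dictionary<PropertyBuilder, FieldBuilder> fields =
        new Dictionary<PropertyBuilder, FieldBuilder>();

    public ISchemaCompiler Next { get; set; } // end of the chain, unused here
    public void Type(TypeBuilder type) { }

    public void Property(TypeBuilder type, PropertyBuilder property) =>
        fields[property] = type.DefineField("_" + property.Name, property.PropertyType, FieldAttributes.Private);

    public void Setter(PropertyBuilder prop, MethodBuilder setter, ILGenerator il)
    {
        il.Emit(OpCodes.Ldarg_0);             // this
        il.Emit(OpCodes.Ldarg_1);             // value
        il.Emit(OpCodes.Stfld, fields[prop]); // this._Name = value
    }

    public void Getter(PropertyBuilder prop, MethodBuilder getter, ILGenerator il)
    {
        il.Emit(OpCodes.Ldarg_0);             // this
        il.Emit(OpCodes.Ldfld, fields[prop]); // push this._Name
    }
}

// Hypothetical minimal DataManager: emits a sealed class implementing T and lets
// the compiler fill in each accessor body before appending 'ret'.
public sealed class DataManagerSketch
{
    readonly ISchemaCompiler compiler;
    readonly Dictionary<Type, Type> cache = new Dictionary<Type, Type>();
    readonly ModuleBuilder module = AssemblyBuilder
        .DefineDynamicAssembly(new AssemblyName("ManagedDataSketch"), AssemblyBuilderAccess.Run)
        .DefineDynamicModule("ManagedDataSketch");

    public DataManagerSketch(ISchemaCompiler compiler) => this.compiler = compiler;

    public T Create<T>() where T : class
    {
        // generate each implementation type once; redefining a type name would throw
        if (!cache.TryGetValue(typeof(T), out var impl))
            cache[typeof(T)] = impl = Generate(typeof(T));
        return (T)Activator.CreateInstance(impl);
    }

    Type Generate(Type iface)
    {
        var tb = module.DefineType(iface.Name + "_Managed",
            TypeAttributes.Public | TypeAttributes.Sealed | TypeAttributes.Class);
        tb.AddInterfaceImplementation(iface);
        compiler.Type(tb);
        const MethodAttributes attrs = MethodAttributes.Public | MethodAttributes.Virtual |
            MethodAttributes.Final | MethodAttributes.HideBySig |
            MethodAttributes.NewSlot | MethodAttributes.SpecialName;
        foreach (var p in iface.GetProperties())
        {
            var pb = tb.DefineProperty(p.Name, PropertyAttributes.None, p.PropertyType, Type.EmptyTypes);
            compiler.Property(tb, pb);
            if (p.GetGetMethod() is MethodInfo getter)
            {
                var mb = tb.DefineMethod(getter.Name, attrs, p.PropertyType, Type.EmptyTypes);
                var il = mb.GetILGenerator();
                compiler.Getter(pb, mb, il);
                il.Emit(OpCodes.Ret);
                pb.SetGetMethod(mb);
                tb.DefineMethodOverride(mb, getter);
            }
            if (p.GetSetMethod() is MethodInfo setter)
            {
                var mb = tb.DefineMethod(setter.Name, attrs, typeof(void), new[] { p.PropertyType });
                var il = mb.GetILGenerator();
                compiler.Setter(pb, mb, il);
                il.Emit(OpCodes.Ret);
                pb.SetSetMethod(mb);
                tb.DefineMethodOverride(mb, setter);
            }
        }
        return tb.CreateType(); // a default constructor is supplied automatically
    }
}
```

In this layout, a decorating compiler such as NotifyChangedRecord would call Next.Getter/Next.Setter and emit its extra IL before or after that call.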

You'll note, however, that the IFoo schema has an immutable property "Bar". We can specify an initializer for this property using the Schema object that the DataManager uses:

var schema = new Schema();
schema.Type<IFoo>()
      .Default(x => x.Bar, x => 4);

This declares that the Bar property maps to a constant value of 4. It need not be a constant, of course, since the initializer is an arbitrary delegate.
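A fluent call like `Default(x => x.Bar, x => 4)` can be supported with an expression tree to identify the property and a plain delegate for the initializer. The following is a minimal sketch under that assumption. Schema, TypeSchema<T> and TryGetDefault are hypothetical stand-ins, not the post's actual types. It also assumes that the initializer receives the new instance, which is a guess based on the `x => 4` lambda.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Reflection;

public interface IFoo
{
    int Bar { get; }
    string Fooz { get; set; }
}

// Hypothetical Schema: one TypeSchema<T> per interface type.
public sealed class Schema
{
    readonly Dictionary<System.Type, object> types = new Dictionary<System.Type, object>();

    public TypeSchema<T> Type<T>()
    {
        if (!types.TryGetValue(typeof(T), out var ts))
            types[typeof(T)] = ts = new TypeSchema<T>();
        return (TypeSchema<T>)ts;
    }
}

// Hypothetical per-type schema: maps property names to default initializers.
public sealed class TypeSchema<T>
{
    readonly Dictionary<string, Func<T, object>> defaults = new Dictionary<string, Func<T, object>>();

    public TypeSchema<T> Default<TProp>(Expression<Func<T, TProp>> property, Func<T, TProp> init)
    {
        // only simple property accesses such as x => x.Bar are accepted
        if (!(property.Body is MemberExpression member) || !(member.Member is PropertyInfo prop))
            throw new ArgumentException("Expected a property access like x => x.Bar", nameof(property));
        defaults[prop.Name] = x => init(x); // box the result so all defaults share one type
        return this;
    }

    // Runs the initializer registered for a property, if any; a data manager
    // could call this while constructing a new instance.
    public bool TryGetDefault(string propertyName, T instance, out object value)
    {
        value = null;
        if (!defaults.TryGetValue(propertyName, out var init)) return false;
        value = init(instance);
        return true;
    }
}
```

Capturing the property through an expression tree keeps the configuration refactoring-safe, since renaming Bar breaks the build instead of silently breaking a string key.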

The following schema compilers are implemented and tested:

  • BasicRecord: implements the backing fields for the properties.
  • LockableRecord: unlike the paper's lockable record, this version actually calls Monitor.Enter and Monitor.Exit for use in concurrent scenarios.
  • NotifyChangedRecord: implements INotifyPropertyChanged on all properties.
  • ChangesOnlyRecord: only assigns the field if the value differs.
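To see what composing these compilers means at run time, here is a hand-written approximation of a type that ChangesOnlyRecord, NotifyChangedRecord and BasicRecord might produce for IFoo. It is an assumption about the behaviour of the emitted code, not the generated IL itself, and the class name FooManaged is hypothetical. The order follows the composition in the test program: check for a change first, then store, then notify.

```csharp
using System;
using System.ComponentModel;

public interface IFoo
{
    int Bar { get; }
    string Fooz { get; set; }
}

// Hypothetical hand-written equivalent of
// ChangesOnlyRecord -> NotifyChangedRecord -> BasicRecord for IFoo.
public sealed class FooManaged : IFoo, INotifyPropertyChanged
{
    // BasicRecord: backing fields; 4 stands in for Default(x => x.Bar, x => 4)
    readonly int bar = 4;
    string fooz;

    public event PropertyChangedEventHandler PropertyChanged;

    public int Bar => bar;

    public string Fooz
    {
        get => fooz;
        set
        {
            // ChangesOnlyRecord: stop here if the value is unchanged (string value equality)
            if (fooz == value) return;
            // BasicRecord: store the new value
            fooz = value;
            // NotifyChangedRecord: raise the event after the store
            PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(nameof(Fooz)));
        }
    }
}
```

Writing this by hand for every property of every type is exactly the boilerplate the data manager is meant to eliminate.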

Developing programs with managed data consists only of defining interfaces that describe your business model and letting the DataManager provide the instances. This is obviously also excellent for mocking and unit testing, so it's a win all around.

Here's a simple test program that demonstrates the use of managed data via the composition of ChangesOnlyRecord, NotifyChangedRecord and BasicRecord:

var schema = new Schema();
schema.Type<IFoo>()
      .Default(x => x.Bar, x => 4);
// construct the data manager by composing schema compilers
var record = new BasicRecord();
var dm = new DataManager(schema, new ChangesOnlyRecord
{
    Record = record,
    Next = new NotifyChangedRecord { Next = record }
});
// create instance of IFoo
var y = dm.Create<IFoo>();
var inotify = y as INotifyPropertyChanged;
var bar = y.Bar;
var fooz = y.Fooz;
int count = 0;
Assert(bar == 4);
Assert(fooz == null);
// register for notification of Fooz changes
inotify.PropertyChanged += (o, e) =>
{
    if (e.PropertyName == "Fooz")
    {
        fooz = y.Fooz;
        count++;
    }
};
// trigger change notification
y.Fooz = "Hello World!";
Assert(fooz == "Hello World!");
Assert(count == 1);
// no change notification since value unchanged
y.Fooz = "Hello World!";
Assert(count == 1);
// trigger second change notification
y.Fooz = "empty";
Assert(fooz == "empty");
Assert(count == 2);

Closing Thoughts

You can download the current implementation here, but note that it's still an alpha preview. I'll probably eventually integrate this with my Sasa framework under Sasa.Data, together with a few more elaborate data managers. For instance, a data manager could use an SQL server as a backend. Say goodbye to NHibernate mapping files and LINQ attributes, and just let the data manager create and manage your tables!

Comments

Unknown said…
How about adding support for IDataErrorInfo in generated properties? E.g., you could configure a property like so:


schema.Type<Product>()
.Validate(x => x.Price > 0, "Price must be greater than 0")


and have the builder generate the validation code?
Sandro Magi said…
Good suggestion. I was thinking of some way to add contracts to each property. Something like what you have:

schema.Type<Product>()
.Requires(x => x.Price > 0, "Price must be greater than 0.");

schema.Type<Product>()
.Invariant(x => x.ItemNo != null, "Product must have an item#.");

So preconditions, postconditions and invariants. I'm not sure exactly how I'm going to do it, but it seems like a necessary extension.
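As a rough illustration of the IDataErrorInfo suggestion, here is a hand-written sketch of what a generated Product might look like. It assumes that each rule is a predicate over the whole object plus a message, and that a rule is reported under every property it mentions. Product, Price, ItemNo and ValidationRules<T> are hypothetical, and none of this is an existing API in the post's code.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical rule list built from calls like .Validate(x => x.Price > 0, "...")
public sealed class ValidationRules<T>
{
    readonly List<(Func<T, bool> Check, string Message, HashSet<string> Members)> rules =
        new List<(Func<T, bool> Check, string Message, HashSet<string> Members)>();

    public ValidationRules<T> Validate(Expression<Func<T, bool>> rule, string message)
    {
        // remember which properties the rule mentions, for per-property errors
        var members = new HashSet<string>();
        new MemberCollector(members).Visit(rule.Body);
        rules.Add((rule.Compile(), message, members));
        return this;
    }

    // Messages of failing rules, optionally only those mentioning 'property'.
    public IEnumerable<string> Failures(T instance, string property = null) =>
        rules.Where(r => (property == null || r.Members.Contains(property)) && !r.Check(instance))
             .Select(r => r.Message);

    sealed class MemberCollector : ExpressionVisitor
    {
        readonly HashSet<string> names;
        public MemberCollector(HashSet<string> names) => this.names = names;

        protected override Expression VisitMember(MemberExpression node)
        {
            names.Add(node.Member.Name);
            return base.VisitMember(node);
        }
    }
}

// Hypothetical generated type with IDataErrorInfo support.
public sealed class Product : IDataErrorInfo
{
    static readonly ValidationRules<Product> Rules = new ValidationRules<Product>()
        .Validate(x => x.Price > 0, "Price must be greater than 0")
        .Validate(x => x.ItemNo != null, "Product must have an item#.");

    public decimal Price { get; set; }
    public string ItemNo { get; set; }

    // All failing messages, or an empty string when the object is valid.
    public string Error => string.Join("; ", Rules.Failures(this));

    // Failing messages for the rules that mention the given property.
    public string this[string columnName] => string.Join("; ", Rules.Failures(this, columnName));
}
```

Preconditions on setters would instead need the check emitted before the store, which fits the same Next-chaining structure as the other schema compilers.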
William R Cook said…
Part of the beauty of the Ruby implementation of Managed Data is that everything is interpreted. This allows interpreters to be wrapped using inheritance to create new aspects. I'm curious if you are using lots of code generation, and if so whether you find it easy to extend/wrap the code generators with new functionality.
Sandro Magi said…
I could have gone the interpreted route for the backing store as well, but I explicitly chose code gen for performance reasons. Only the ISchemaCompiler interface would need to change to accommodate interpreters. You'd basically just have to change the ILGenerator parameter to an IDictionary of objects.
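The interpreted alternative described here can be sketched by replacing the ILGenerator with a dictionary holding the record's field values. Each link then acts on the data directly instead of emitting IL. This is a hypothetical design under that assumption, and ISchemaInterpreter and the three interpreter classes are illustrative names. The chain is composed the same way as ChangesOnlyRecord, NotifyChangedRecord and BasicRecord in the test program.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interpreted counterpart of ISchemaCompiler.
public interface ISchemaInterpreter
{
    ISchemaInterpreter Next { get; set; }
    void Set(IDictionary<string, object> fields, string name, object value);
    object Get(IDictionary<string, object> fields, string name);
}

// BasicRecord role: stores and loads field values.
public sealed class BasicInterpreter : ISchemaInterpreter
{
    public ISchemaInterpreter Next { get; set; }
    public void Set(IDictionary<string, object> fields, string name, object value) => fields[name] = value;
    public object Get(IDictionary<string, object> fields, string name) =>
        fields.TryGetValue(name, out var v) ? v : null;
}

// ChangesOnlyRecord role: forwards the write only if the value differs.
public sealed class ChangesOnlyInterpreter : ISchemaInterpreter
{
    public ISchemaInterpreter Next { get; set; }
    public void Set(IDictionary<string, object> fields, string name, object value)
    {
        if (!object.Equals(Next.Get(fields, name), value)) Next.Set(fields, name, value);
    }
    public object Get(IDictionary<string, object> fields, string name) => Next.Get(fields, name);
}

// NotifyChangedRecord role: lets the chain store the value, then reports the change.
public sealed class NotifyInterpreter : ISchemaInterpreter
{
    public ISchemaInterpreter Next { get; set; }
    public Action<string> Changed { get; set; }
    public void Set(IDictionary<string, object> fields, string name, object value)
    {
        Next.Set(fields, name, value);
        Changed?.Invoke(name);
    }
    public object Get(IDictionary<string, object> fields, string name) => Next.Get(fields, name);
}
```

The trade-off matches the one described above: each access goes through dictionary lookups and virtual calls, whereas the generated IL reads and writes real fields directly.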

As for complexity, it isn't much, assuming you're familiar with code gen on the CLR. Most of the code gen happens in the shared DataManager class. For example, here's the code gen required to add INotifyPropertyChanged behaviour to a setter:

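// let the rest of the chain (e.g. BasicRecord) store the value first
Next.Setter(prop, setter, il);
// raise property changed event by calling into Events.Raise
il.Emit(OpCodes.Ldarg_0);                 // this
il.Emit(OpCodes.Ldfld, propertyChanged);  // the PropertyChanged delegate field
il.Emit(OpCodes.Ldarg_0);                 // sender = this
il.Emit(OpCodes.Ldstr, prop.Name);        // the property's name
il.Emit(OpCodes.Newobj, ctorEventArgs);   // new event args built from that name
il.Emit(OpCodes.Call, raise);             // Events.Raise(handler, sender, args)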


The getter requires a comparable amount of code. This was also the more complicated of the schema compilers. The rest of the code is just standard boilerplate to deal with the CLR's reflection and code gen abstractions.
