Wednesday, August 29, 2012

Managed Data for .NET

Ensō is an interesting new language being developed by Alex Loh, William R. Cook, and Tijs van der Storm. The overarching goal is to significantly raise the level of abstraction, partly via declarative data models.

They recently published a paper on this subject at Onward! 2012, titled Managed Data: Modular Strategies for Data Abstraction. Instead of defining concrete classes, the programmer defines a schema describing the data model, consisting of the set of fields and their types. Actual implementations of this schema are provided by "data managers", which interpret the schema and add custom behaviour. This is conceptually similar to aspect-oriented programming, but with a safer, more principled foundation.

A data manager can implement any sort of field-like behaviour. The paper describes a few basic variants:

  • BasicRecord: implements a simple record with getters and setters.
  • LockableRecord: implements locking on a record, rendering it immutable.
  • InitRecord: implements field initialization on records.
  • ObserverRecord: implements the observer pattern, notifying listeners of any field changes.
  • DataflowRecord: registers field dependencies and recalculates dependent fields when the fields they depend on change.

Managed Data for .NET

The core idea of managed data requires two basic concepts: a declarative means of describing the schema, and a means of interpreting that schema to add behaviour. .NET interfaces are a means to specify simple declarative schemas completely divorced from implementations. The following interface can be seen as the IFoo schema containing a read-only integer field and a mutable string field:

// the schema for a data object
public interface IFoo
{
  int Bar { get; }
  string Fooz { get; set; }
}
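
To illustrate why an interface can serve as a schema, here is a minimal sketch that reads the field names, types and mutability back out of IFoo using ordinary reflection. The SchemaSketch class and its Describe method are hypothetical names invented for this example; they are not part of the implementation described in this post.

```csharp
using System;
using System.Linq;

// Same shape as the IFoo schema above.
public interface IFoo
{
    int Bar { get; }
    string Fooz { get; set; }
}

public static class SchemaSketch
{
    // Hypothetical helper: lists each property declared on an interface
    // "schema" with its type and whether it can be assigned.
    public static string[] Describe(Type schema)
    {
        return schema.GetProperties()
            // sort by name so the result does not depend on reflection order
            .OrderBy(p => p.Name, StringComparer.Ordinal)
            .Select(p => p.Name + ": " + p.PropertyType.Name
                         + (p.CanWrite ? " (read-write)" : " (read-only)"))
            .ToArray();
    }
}
```

Calling SchemaSketch.Describe(typeof(IFoo)) yields one entry per property, which is the kind of information a data manager needs before it can generate an implementation.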

Data managers then generate concrete instances of IFoo with the desired behaviour. To fit this into a typed framework, I had to reorganize the concepts a little from what appears in the paper:

// creates data instances with custom behaviour
// (signature only; the method body is omitted here)
public sealed class DataManager
{
  // create an instance of interface type T
  public T Create<T>();
}

I have a single DataManager type which analyzes the interface T and generates an instance with all the same properties as found in T. The DataManager constructor accepts an instance of ISchemaCompiler, which is where the actual magic happens:

public interface ISchemaCompiler
{
  // next compiler in the chain
  ISchemaCompiler Next { get; set; }
  // a new type is being defined
  void Type(TypeBuilder type);
  // a new property is being defined
  void Property(TypeBuilder type, PropertyBuilder property);
  // a new setter is being defined
  void Setter(PropertyBuilder prop, MethodBuilder setter,
              ILGenerator il);
  // a new getter is being defined
  void Getter(PropertyBuilder prop, MethodBuilder getter,
              ILGenerator il);
}

So DataManager creates a dynamic type implementing an interface, and it calls into the ISchemaCompiler chain while it's generating the various properties. The schema compilers can then output IL to customize the behaviour of the various property getters and setters.
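
As a rough sketch of what generating a dynamic type for an interface can look like, the following uses System.Reflection.Emit to emit a class with one private backing field per property and trivial getters and setters. It is not the DataManager from this post: RecordSketch and its Create method are hypothetical names, there is no compiler chain and no Schema, and it assumes T is a public interface that declares only properties.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Same shape as the IFoo schema in the post.
public interface IFoo
{
    int Bar { get; }
    string Fooz { get; set; }
}

public static class RecordSketch
{
    // Hypothetical helper: emits a class implementing interface T.
    public static T Create<T>()
    {
        var iface = typeof(T);
        var asm = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("RecordSketch.Dynamic"), AssemblyBuilderAccess.Run);
        var module = asm.DefineDynamicModule("Main");
        var type = module.DefineType(iface.Name + "Impl",
            TypeAttributes.Public | TypeAttributes.Sealed,
            typeof(object), new[] { iface });
        const MethodAttributes attrs =
            MethodAttributes.Public | MethodAttributes.Virtual |
            MethodAttributes.Final | MethodAttributes.NewSlot |
            MethodAttributes.HideBySig | MethodAttributes.SpecialName;

        // only properties declared directly on T are visited
        foreach (var p in iface.GetProperties())
        {
            var field = type.DefineField("_" + p.Name, p.PropertyType,
                                         FieldAttributes.Private);
            var prop = type.DefineProperty(p.Name, PropertyAttributes.None,
                                           p.PropertyType, null);
            if (p.CanRead)
            {
                var getter = type.DefineMethod("get_" + p.Name, attrs,
                                               p.PropertyType, Type.EmptyTypes);
                var il = getter.GetILGenerator();
                il.Emit(OpCodes.Ldarg_0);      // this
                il.Emit(OpCodes.Ldfld, field); // load backing field
                il.Emit(OpCodes.Ret);
                prop.SetGetMethod(getter);
                type.DefineMethodOverride(getter, p.GetGetMethod());
            }
            if (p.CanWrite)
            {
                var setter = type.DefineMethod("set_" + p.Name, attrs,
                                               typeof(void), new[] { p.PropertyType });
                var il = setter.GetILGenerator();
                il.Emit(OpCodes.Ldarg_0);      // this
                il.Emit(OpCodes.Ldarg_1);      // value
                il.Emit(OpCodes.Stfld, field); // store into backing field
                il.Emit(OpCodes.Ret);
                prop.SetSetMethod(setter);
                type.DefineMethodOverride(setter, p.GetSetMethod());
            }
        }
        // a public parameterless constructor is supplied automatically
        return (T)Activator.CreateInstance(type.CreateType());
    }
}
```

In this sketch the places where IL is emitted for the getter and setter are roughly where a chain of schema compilers would get the chance to add their own instructions.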

You'll note however that the IFoo schema has an immutable property "Bar". We can specify an initializer for this property using the Schema object that the DataManager uses:

var schema = new Schema();
schema.Type<IFoo>()
      .Default(x => x.Bar, x => 4);

This declares that the Bar property maps to a constant value of 4. It need not be a constant of course, since the initializer is an arbitrary delegate.

The following schema compilers are implemented and tested:

  • BasicRecord: implements the backing fields for the properties.
  • LockableRecord: unlike the paper's lockable record, this version actually calls Monitor.Enter and Monitor.Exit for use in concurrent scenarios.
  • NotifyChangedRecord: implements INotifyPropertyChanged on all properties.
  • ChangesOnlyRecord: only assigns the field if the value differs.

Developing programs with managed data consists only of defining interfaces that describe your business model and letting the DataManager provide the instances. This also lends itself well to mocking and unit testing, so it's a win all around.

Here's a simple test program that demonstrates the use of managed data via the composition of ChangesOnlyRecord, NotifyChangedRecord and BasicRecord:

var schema = new Schema();
schema.Type<IFoo>()
      .Default(x => x.Bar, x => 4);
// construct the data manager by composing schema compilers
var record = new BasicRecord();
var dm = new DataManager(schema, new ChangesOnlyRecord
{
    Record = record,
    Next = new NotifyChangedRecord { Next = record }
});
// create instance of IFoo
var y = dm.Create<IFoo>();
var inotify = y as INotifyPropertyChanged;
var bar = y.Bar;
var fooz = y.Fooz;
int count = 0;
Assert(bar == 4);
Assert(fooz == null);
// register for notification of Fooz changes
inotify.PropertyChanged += (o, e) =>
{
    if (e.PropertyName == "Fooz")
    {
        fooz = y.Fooz;
        count++;
    }
};
// trigger change notification
y.Fooz = "Hello World!";
Assert(fooz == "Hello World!");
Assert(count == 1);
// no change notification since value unchanged
y.Fooz = "Hello World!";
Assert(count == 1);
// trigger second change notification
y.Fooz = "empty";
Assert(fooz == "empty");
Assert(count == 2);
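
For comparison, here is a hand-written class that behaves the way the test above expects from the composed compilers. It is only an approximation under stated assumptions: the class name HandWrittenFoo is hypothetical, and the actual generated IL may be structured differently. The one ordering detail taken from the test is that the notification fires after the new value has been stored, since the handler reads y.Fooz and sees the new value.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;

// Same shape as the IFoo schema in the post.
public interface IFoo
{
    int Bar { get; }
    string Fooz { get; set; }
}

// Hypothetical hand-written stand-in for the generated type.
public sealed class HandWrittenFoo : IFoo, INotifyPropertyChanged
{
    private readonly int bar = 4; // mirrors .Default(x => x.Bar, x => 4)
    private string fooz;          // BasicRecord: backing field

    public event PropertyChangedEventHandler PropertyChanged;

    public int Bar { get { return bar; } }

    public string Fooz
    {
        get { return fooz; }
        set
        {
            // ChangesOnlyRecord: skip the write when the value is unchanged
            if (EqualityComparer<string>.Default.Equals(fooz, value)) return;
            fooz = value;
            // NotifyChangedRecord: raise PropertyChanged after the store
            var handler = PropertyChanged;
            if (handler != null)
                handler(this, new PropertyChangedEventArgs("Fooz"));
        }
    }
}
```

Every property in a business model would need this boilerplate repeated by hand, which is the repetition that managed data is meant to remove.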

Closing Thoughts

You can download the current implementation here, but note that it's still an alpha preview. I'll probably eventually integrate this with my Sasa framework under Sasa.Data, together with a few more elaborate data managers, such as one that uses a SQL server as a backend. Say goodbye to NHibernate mapping files and LINQ attributes, and just let the data manager create and manage your tables!

Saturday, August 25, 2012

M3U.NET: Parsing and Output of .m3u files in .NET

I've been reorganizing my media library using the very cool MusicBrainz Picard, but of course all my m3u files broke. So I wrote the free M3U.NET library, and then wrote a utility called FixM3U that regenerates an M3U file by searching your music folder for the media files based on whatever extended M3U information is available:

> FixM3u.exe /order:title,artist foo.m3u bar.m3u ...

The M3U.NET library itself has a fairly simple interface:

// Parsing and writing M3U files (signatures only).
public static class M3u
{
  // Write a media list to an extended M3U file.
  public static string Write(IEnumerable<MediaFile> media);
  // Parse an M3U file.
  public static IEnumerable<MediaFile> Parse(
         string input,
         DirectiveOrder order);
  // Parse an M3U file.
  public static IEnumerable<MediaFile> Parse(
         IEnumerable<string> lines,
         DirectiveOrder order);
}

The three exported types are straightforward. A MediaFile just has a full path to the file itself and a list of directives supported by the extended M3U format:

// A media file description.
public sealed class MediaFile
{
    // The full absolute path to the file.
    public string Path { get; set; }
    // Extended M3U directives.
    public List<MediaDirective> Directives { get; set; }
}

The directives are represented in this library as key-value pairs:

// An extended M3U directive.
public struct MediaDirective
{
    // The directive name.
    public string Name { get; set; }
    // The directive value.
    public string Value { get; set; }
    // The separator delineating this field from the next.
    public char? Separator { get; set; }
}

The currently supported keys are "Artist", "Title" and "Length".

The M3U format is supposed to order directives as "length, artist - title", but iTunes seems to reverse the order of artist and title. I've thus made this configurable via a parsing parameter of type DirectiveOrder, and you can specify the ordering when parsing:

// The order of the title and artist directives.
public enum DirectiveOrder
{
    // Artist followed by title.
    ArtistTitle,
    // Title followed by artist.
    TitleArtist,
}
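
To make the ordering issue concrete, here is a small standalone sketch that splits one extended M3U line of the form "#EXTINF:length,artist - title" and swaps the two text parts according to the requested order. It is not the M3U.NET parser: ExtInfSketch and TryParse are hypothetical names, the enum is repeated only to keep the block self-contained, and the sketch assumes that the first " - " separates the two parts, which will misfire on artists or titles that themselves contain " - ".

```csharp
using System;
using System.Globalization;

// Repeated from the post so the sketch compiles on its own.
public enum DirectiveOrder
{
    ArtistTitle,
    TitleArtist,
}

public static class ExtInfSketch
{
    // Hypothetical helper, not part of M3U.NET.
    public static bool TryParse(string line, DirectiveOrder order,
                                out int length, out string artist, out string title)
    {
        length = 0; artist = null; title = null;
        const string prefix = "#EXTINF:";
        if (line == null ||
            !line.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            return false;

        // everything before the first comma is the length in seconds
        var body = line.Substring(prefix.Length);
        var comma = body.IndexOf(',');
        if (comma < 0 ||
            !int.TryParse(body.Substring(0, comma).Trim(), NumberStyles.Integer,
                          CultureInfo.InvariantCulture, out length))
            return false;

        var rest = body.Substring(comma + 1);
        var dash = rest.IndexOf(" - ", StringComparison.Ordinal);
        if (dash < 0)
        {
            // no separator: treat the whole remainder as the title
            title = rest.Trim();
            return true;
        }

        var first = rest.Substring(0, dash).Trim();
        var second = rest.Substring(dash + 3).Trim();
        bool artistFirst = order == DirectiveOrder.ArtistTitle;
        artist = artistFirst ? first : second;
        title = artistFirst ? second : first;
        return true;
    }
}
```

With DirectiveOrder.TitleArtist the same line is read with the two text parts swapped, which is the behaviour needed for files written in the reversed order.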

Monday, August 20, 2012

Delete Duplicate Files From the Command-line with .NET

Having run into a scenario where I had directories with many duplicate files, I just hacked up a simple command-line solution based on cryptographic hashes. It's the same idea used in source control systems like Git and Mercurial: content is identified by a SHA-1 hash, here the hash of each file's contents.

Sample usage:

DupDel.exe [target-directory]

The utility will recursively analyze any sub-directories under the target directory and build an index of all files based on their content. Once complete, duplicates are processed interactively, and the user is presented with a choice of which duplicate to keep:

Keep which of the following duplicates:
1. \Some foo.txt
2. \bar\some other foo.doc
>

The types of files under the target directory are not important, so you can pass in directories of documents, music files, pictures, etc. My computer churned through 30 GB of data in about 5 minutes, so it's reasonably fast.
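
The indexing step can be sketched in a few lines. This is a minimal illustration rather than the DupDel source: DuplicateSketch, HashFile and FindDuplicates are hypothetical names, the interactive prompt is left out, and files are treated as duplicates purely because their SHA-1 hashes match.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class DuplicateSketch
{
    // Hex-encoded SHA-1 hash of a file's contents.
    public static string HashFile(string path)
    {
        using (var sha1 = SHA1.Create())
        using (var stream = File.OpenRead(path))
            return BitConverter.ToString(sha1.ComputeHash(stream));
    }

    // Groups every file under root (recursively) by content hash and
    // returns only the groups containing more than one path.
    public static List<List<string>> FindDuplicates(string root)
    {
        return Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories)
            .GroupBy(path => HashFile(path))
            .Where(group => group.Count() > 1)
            .Select(group => group.OrderBy(p => p, StringComparer.Ordinal).ToList())
            .ToList();
    }
}
```

Each returned group corresponds to one prompt of the kind shown above, listing the paths that share the same content.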