Thursday, April 17, 2008

Object/Relational Mapping and Factories

My day job is has stuck me with C#, but I try to make the best of it, as evidenced by my FP# and Sasa C# libraries.

One thing that still gets in my way more than it should is O/R mapping. No other mapper I've come across encourages a true object-oriented application structure. Granted, I've only really used NHibernate, and I had built my own mapper before that was even available, but I've read up quite a bit on the the other mappers.

By true OO structure, I mean that all application objects are only constructed from other application objects, which doesn't involve dependencies on environment-specific code (ie. if you're running under ASP.NET, Windows forms, Swing, etc.). A pure structure encourages a proper separation between core application code, and display and controller code, which allows more flexible application evolution.

Instead, controller logic often manually constructs application objects, passing in default arguments to properly initialize the required fields. This means constructor and initialization code must be duplicated when running in another environment, or tedious refactoring is needed when changing the constructor interface. Further, the defaults are hardcoded in the code, which means changes in defaults require an application upgrade.

Instead, O/R mappers should promote a factory pattern for constructing application objects. Factories themselves are constructed when the application is initialized, and are henceforth singletons within a given application instance. O/R mappers don't support or encourage factories or singletons in this manner however, as they always map a key/identifier, to an object instance. Factories are slightly different as they are generally singletons.

For example, let's assume we have a simple Product class:
public abstract class Product
{
int productId;
decimal price;
protected Product()
{
}
}
Now we have here a public constructor which requires a Quote object to initialize the base Product object. You can't sell 'abstract' products, so we need a concrete product, like a Table:
public class Table : Product
{
int length;
int width;
public Table(int length, int width) : base()
{
this.length = length;
this.width = width;
}
}
Of course, a Table with dimensions of 0'x0' is invalid, so we need to ensure that a Table is initialized with a proper length and width. We can pass in a pair of default dimensions when constructing a Table instance in a controller, but chances are the default values will be the same everytime you construct an instance of Table. So why duplicate all that code?

For instance, suppose we have another class "DiningSet" which consists of a Table and a set of Chairs. Do we call the Table constructor with the same default values within the DiningSet constructor?

Of course, many of you might now be thinking, "just create an empty constructor which invokes the parameterized constructor with the default values; done". All well and good because your language likely supports the int type very well. Now suppose that constructor needs an object that cannot be just constructed at will from within application code, such as an existing object in the database.

Enter factories:
public interface IProductFactory
{
Product Make();
}
public sealed class TableFactory : IProductFactory
{
int defaultLength;
int defaultWidth;
public Product Make()
{
return new Table(defaultLength, defaultWidth);
}
}
The IProductFactory abstract all factories which construct products. Any parameters that the base Product class accepts in its constructor are passed in to the Make() method, as this is shared across all Product Factories. TableFactory is mapped to a table with a single record containing the default length and width values. If the constructor requires an existing database object, this can be referenced via a foreign key constraint, and the O/R mapper will load the object reference and its dependencies for you.

Since factories are generally singletons, it would be nice if O/R mappers provided special loading functions:
public interface ISession
{
T Load<T>(object id);
T Singleton<T>();
}
This models and O/R mapper session interface after the one in NHibernate. Note that a special Singleton() method simply loads the singleton of the given type without needing an object identifier.

Our controller code is thus reduced to:
...
Product table = session.Singleton<TableFactory>().Make();
...
Which encapsulates all the constructor details in application objects, does not hardcode any default values since they live in the database and can be upgraded on the fly, isolates refactorings which alter the Table constructor interface to the TableFactory alone, and simplifies controller code as we don't need to load any objects. This is a "pure" object-oriented design, in that the application can almost bootstrap itself, instead of relying on its environment to properly endow it with "god-given" defaults.

This approach also enables another useful application pattern which I may describe in a future post.

[Edit: I've just realized that the above is misleading in some parts, so I'll amend soon. Singletons aren't needed as much as I suggest above.]

Tuesday, April 1, 2008

Permutations with Duplicates in C

Calculating permutations has some fairly nifty algorithms out there. I recently ran into a permutation problem for which I couldn't find an existing algorithm. I admit that I didn't look too hard though. Basically, I needed the permutations of the elements of a set of size N over K slots. However, the permutations should include duplicate elements from the set, as K > N is valid configuration. This corresponds to NK permutations. Most algorithms I found did not permit duplicate elements.

As an example of an application for such a permutation algorithm, imagine the set of all function signatures of arity K-1 over N types. This corresponds to K slots with N possible choices for each slot.

I devised a fairly simple implementation of such a permutation algorithm. Essentially, N forms the base of an arbitrary-precision integer of size K. In other words, we have an array of elements with a maximum of N which index our set. To permute, we simply increment the first element and propagate any overflow across the rest of the array. If the carry is 1 when we're done iterating over the whole array, then we're done generating permutations.

Calculating permutations has been reduced to counting! Here is the C code:
#include <stdio.h>

void print(const unsigned *slots, const unsigned K)
{
unsigned i;
for (i = 0; i < K; ++i) {
printf("%4d", slots[i] );
}
printf("\n");
}

unsigned incr(unsigned *slots, const unsigned K, const unsigned N) {
unsigned i, carry;
print(slots, K);
for (i=0, carry=1; i < K; ++i) {
unsigned b = slots[i] + carry;
carry = b/N;
slots[i] = b % N;
}
return !carry;
}

void count(const unsigned N, const unsigned K) {
unsigned i;
unsigned *slots = calloc(K, sizeof(unsigned));
while(incr(slots, K, N)) {
}
}

//output:
// 0 0 0
// 1 0 0
// 0 1 0
// 1 1 0
// 0 0 1
// 1 0 1
// 0 1 1
// 1 1 1
int main(int argc, char ** argv) {
count(2, 3);
getchar();
}

The only assumption is that N is less than UINT_MAX. Also, for some insane reason that I cannot fathom, I can't get the slots to print in reverse order. The trivial reversal of the print loop induces an access violation under Windows:
void print(const unsigned *slots, const unsigned K)
{
unsigned i;
for (i = K-1; i >= 0; --i) {
printf("%4d", slots[i] );
}
printf("\n");
}

The mind boggles.

[Edit: a commenter pointed out the problem, so the solution can be found in the comments.]