Wednesday, August 26, 2015

Algebra.NET: A Simple Algebra eDSL for .NET

Algebra.NET is a simple library designed to facilitate easy expression and manipulation of algebraic functions. For instance, here's a simple function:

Function<Func<double, double>> a = Algebra.Function(x => 2 * x + 1);

We can compile such a function to efficient IL:

Func<double, double> func = a.Compile("times2plus1");

Or we can apply some algebraic identities to rewrite it:

Identity associative = Algebra.Identity(x => x + 1 == 1 + x);
Identity mulEqAdd = Algebra.Identity(x => 2 * x == x + x);
Console.WriteLine(a);
Console.WriteLine(a.Rewrite(1, associative, mulEqAdd));

// Prints:
// ((2 * x) + 1)
// (1 + (x + x))

Rewrites can sometimes loop forever (consider "x + y == y + x"), so the Rewrite method takes a number indicating the maximum number of iterations to perform all the rewrites.

All the usual arithmetic operations are available, including an extension method for exponentiation:

var f = Algebra.Function(x => x.Pow(3));
Console.WriteLine(x);

// Prints:
// (x ^ (3))

Design

As of this writing, Algebra.NET is a functional example of a simple term rewriting system. Term rewriting is usually pretty awkward to express in an object-oriented language, and I banged my head against the keyboard to figure out a nice way to do it, until I hit on just doing unification (of course!).

So I reused the term language and added an equality operator to generate an identity that conceptually maps one term to another. I then perform unification on the left hand side, and generate a set of substitutions to transform the matching term into the right hand side of the identity.

It was ultimately quite simple, consisting of 3 methods on Term:

Term Rewrite(Identity e, Term[] bindings)
bool TryUnify(Term e, Term[] bindings)
Term Subsitute(Term[] bindings)

Rewrite tries to recursively unify the Identity's left hand side with the current term using TryUnify. On success, the 'bindings' array will have been populated by TryUnify with the substitutions to perform, so it substitutes the bindings into the identity's right hand side to generate the new term.

There are only 3 term types: constants, variables and binary operations. Negation is handled as a binary operation "0 - x" for simplicity. The unification methods on each of the term types are only a few lines of code each, but are quite powerful!

So if you want to understand expression compilation to CIL, unification, or term rewriting, this is pretty much as simple as it gets.

Algebra.NET doesn't perform any term simplification at this point, only term rewriting. Some rewrites may of course be simplifications, but a term like "0 - 3" will not be simplified to "-3".

Future Work

As mentioned, Algebra.NET doesn't perform simplification, so that's a big one. I started developing this to work on symbolic and automatic differentiation for numerical optimization problems. I'm aware of other .NET libraries for this, but I didn't like how clumsy it was to write algebraic expressions, nor did they have nice and extensible facilities for rewriting expressions. So I created this in my spare time and intended to continue fleshing it out as needed.

Monday, August 10, 2015

Sasa v0.17.0 Released

A few new features in this release, and a few bugfixes and enhancements in MIME parsing. As before, the online docs are available on my site, the full release including experimental assemblies on Sourceforge, and the production assemblies on Nuget. The changelog:

 * sasametal can now replace columns and FK associations with enums that are
   provided separately
 * sasametal can now output an interface file, which generates an interface
   describing the database entities
 * many MIME fixes thanks to a few bug reports on Sasa discussion list
 * added System.Data, which provides useful extensions to ADO.NET and
   IQueryable, like a simple interface for batched SQL queries
 * MIME parser now accepts an optional parsing parameter that controls various
   parsing options, like how strict e-mail address formatting accepted,
   and whether HTML views should be parsed into MailMessage.Body
 * added enum as a primitive to Sasa.Dynamics IReducer and IBuilder
 * fixed Sasa.Enums.IsSequential check

Sasa.Dynamics probably isn't used much, but in my experiments I found it increasingly important to handle enums as a primitive, so type reduction and construction (IReduce and IBuild respectively) now includes it as such.

I created sasametal to ease the pain of maintaining several legacy systems that depend upon Microsoft's Linq2Sql. It first renames some associations to make them more meaningful, since Linq2Sql does a bad job of this in some cases. This made some integrations with ORMs like EntityFramework and NHibernate a little easier, but there still remained some painful points.

One of these pains was handling enums, ie. an entity set whose members are effectively fixed. Sasametal now takes enums into account via a command-line option, /enum:[file path], that accepts a file listing table-to-enum rewrites, one per line. For instance, if an entity Foo had a StatusId column with an FooStatus enum for the status, we could call sasametal like so:

C:\somepath> sasametal ...[other option]... /code:FooDb.cs /enum:enums.txt

and the enums.txt contains merely:

dbo.FooStatuses=Namespace.Statuses.FooStatus

You should provide the full table name including schema that sqlmetal would generate, and provide the fully qualified name of the enum you want to substitute. Sasametal will then eliminate the FooStatus entity and any FK associations linked to that entity, then it will update the type of any properties for that foreign key to use the enum type. The entities generated by sasametal are thus somewhat more efficient, and I've found it easier to use Linq2Sql as-is for quick projects against legacy systems.

When combined with sasametal's other new feature, generating interfaces describing the entities, it also eases the transition to more sophisticated ORMs that already handle enums.

Monday, June 15, 2015

Sasa v0.16.1 Released

Just a bugfix release, mainly for the MIME parsing. Changelog:

* added support for 8bit transfer encoding, even if not supported by .NET <4.5
* support content types whose prologue is empty
* added support for arbitrarily nested multipart messages
* added alternative to Enum.HasFlag that invokes Sasa's faster enum code

As usual, online docs are available, or the a CHM file is available on Sourceforge.

Saturday, June 6, 2015

C# Enums with [Flags]

I've had numerous posts here describing instances where C# has come so close to getting it right, and yet misses the mark by an inch. The original C# enums have a simple semantics:

enum SomeEnum
{
  First,   // compiler implicitly assigns 0
  Second,  // compiler implicitly assigns 1
  Third,   // compiler implicitly assigns 2
  Fourth,  // compiler implicitly assigns 3
}

This worked nicely as a concise expression of a need for a set of distinct of values, but without caring what they are. C# later introduced the [Flags] attribute, which signals to the compiler that a particular enum isn't actually a set of disjoint values, but a set of bit flags. However, the compiler doesn't actually change its behaviour given this change of semantics. For instance, the following enum is completely unchanged, despite the semantically meaningful change to a set of bitwise flags:

[Flags]
enum SomeEnum
{
  First,   // compiler implicitly assigns 0
  Second,  // compiler implicitly assigns 1
  Third,   // compiler implicitly assigns 2
  Fourth,  // compiler implicitly assigns 3
}

SomeEnum.First is now not a valid flag, and SomeEnum.Fourth is now equivalent to (SomeEnum.Second | SomeEnum.Third). The native enum behaviour is useless as a set of enumerated flags.

I suspect most people would argue that if you're interested in bitwise flags, you generally need to explicitly specify what the flag values ought to be. I don't think this is true. In fact, the only places where this is true is where the flags are defined in an external system, for instance read/write flags when opening file streams.

Exactly the same argument could be levelled against the native enum behaviour too, but like native enums provide a default correct semantics for when you don't care, so flags should provide a default correct semantics for when you don't care.

Monday, April 13, 2015

Sasa v0.16.0 Released

Mainly a bugfix release, with only a few new features. As always, the docs are available here online, or as a compiled CHM file on sourceforge. Here's the changelog:

 * a few bugfixes to MIME parsing for header and word decoding (Thanks Evan!)
 * added combined SubstringSplit function for more space efficient parsing
 * explicitly parse e-mail addresses for more flexible address separators
 * NonNull now throws an exception if encapsulated value is null, which
   is used in the case when the instance is invalidly constructed (Thanks Mauricio!)
 * build now checks for the presence of the signing key and ignores it if
   it's not present
 * a more efficient Enum.HasFlag using codegen
 * added a new ImmutableAttribute which Sasa's runtime analyses respects
 * FilePath's encapsulated string now exposed as a property

Thanks to Evan/iaiken for a number of bug reports, and to Mauricio for feedback on the NonNull<T> pattern.

Monday, March 2, 2015

Sasa v0.15.0 Released

This release features a few new conveniences, a few bugfixes and a vector that's faster than any other I've managed to find for .NET. Here's the changelog since the last v0.14.0 release:

 * more elaborate benchmarking and testing procedures
 * added a minor optimization to purely functional queues that caches
   the reversed enqueue list
 * FingerTree was switched to use arrays as opposed to a Node type
 * optimized FingerTree enumerator
 * FingerTree array nodes are now reused whenever possible
 * Sasa.Collections no longer depends on Sasa.Binary
 * MIME message's body encoding is now defaulted to ASCII if none specified
 * added convenient ExceptNull to filter sequences of optional values
 * NonNull<T> no longer requires T to be a class type, which inhibited its
   use in some scenarios
 * fixed null error when Weak<T> improperly constructed
 * added variance annotations on some base interfaces
 * added a super fast Vector<T> to Sasa.Collections
 * sasametal: fixed member name generated for compound keys
 * Sasa's Option<T> now integrates more seamlessly in LINQ queries
   with IEnumerable<T> thanks to two new SelectMany overloads
 * added two new array combinators to extract values without exceptions
 * added a few efficient Enumerables.Interleave extensions to interleave
   the values of two or more sequences
 * decorated many collection methods with purity attributes
 * fixed a bug with the Enums<T>.Flags property

A forthcoming post on the recent optimizations I've applied to Sasa's immutable collections will show the dramatic performance when compared to Microsoft's and F#'s immutable collections. Sasa's vector is an order of magnitude faster than F#'s, and Sasa's trie is actually faster than the standard mutable System.Collections.Generic.Dictionary up to around 100 elements.

As usual, you can find the individual packages on nuget, or download the full release via Sourceforge. The documentation is available in full as a .CHM file from Sourceforge, or you can peruse it online.

Tuesday, February 24, 2015

µKanren.NET - Featherweight Relational Logic Programming in C#

The µKanren paper is a nice introduction to a lightweight logic programming language which is a simplification of the miniKanren family of languages. The existing µKanren implementation in C# was a translation from Scheme, and thus is verbose, untyped with lots of casts, and non-idiomatic. I also found most of the other Kanren implementations unnecessarily obscure, heavily relying on native idioms that aren't clear to newcomers.

uKanren.NET provides a clear presentation of the core principles of µKanren using only IEnumerable<T> and lambdas, showing that µKanren's search is fundamentally just a set of combinators for transforming sequences of states. The values of the sequence are sets of bound variables that satisfy a set of equations. For instance, given the following expression:

  Kanren.Exists(x => x == 5 | x == 6)

You can read it off as saying there exists an integer value to which we can bind variable x, such that x equals either 5 or 6 [1]. Solving this equation produces this sequence of values:

x[0] = 5,
x[0] = 6,

where each line represents a different value that satisfies the equation. These equations can be arbitrarily nested or chained using | and &, just like ordinary logical expressions, and µKanren will find a solution for all variables, if a solution exists. A further benefit of this implementation, is that you cannot accidentally mix up boolean equality expressions with relational logic expressions, because relational equality, conjunction and disjunction operators all operate on non-boolean types.

Logic Variables, Goals & States

We start with the simple core, which is just a logic variable identified uniquely by an integer, and with a convenient local name:

public sealed class Kanren
{
    internal int id;

    // the variable's public name, extracted from the delegate parameter
    public string Name { get; internal set; }

    public static Goal operator ==(object left, Kanren right)
    {
        return Equals(left, right);
    }

    public static Goal operator ==(Kanren left, object right)
    {
        return Equals(left, right);
    }

    public static Goal operator ==(Kanren left, Kanren right)
    {
        return Equals(left, right);
    }
    ...
}

µKanren.NET variables override the equality operators to generate "Goals". A Goal is represented by a delegate that takes a State and generates a sequence of States:

public delegate IEnumerable<State> Goal(State state);

A State is simply a set of variable bindings that satisfy the equations described by the Goal:

public sealed class State
{
    internal Trie<Var, Kanren> subs;
    internal int next = 0;
    internal Func<Goal> incomplete;

    internal Kanren Get(Var x)
    {
        Kanren v;
        return subs.TryGetValue(x, out v) ? v : null;
    }

    internal State Extend(Var x, Kanren v)
    {
        return new State
        {
            subs = subs.Add(x, v),
            next = next,
            incomplete = incomplete,
        };
    }

    internal State Next()
    {
        return new State
        {
            subs = subs,
            next = next + 1,
            incomplete = incomplete,
        };
    }
    ...
}

Each new variable introduced by Kanren.Exists increments the unique variable index we track in State.next, so each variable gets a unique number. Variable bindings are tracked in an immutable hash map provided by Sasa.Collections.Trie<T> (a fast immutable IDictionary).

µKanren Operators

There are 3 core operations on Goals, constructing a Goal with Kanren.Exists, and combining Goals using either disjunction or conjunction:

public sealed class Kanren
{
    ...
    public static Goal Exists(Func<Kanren, Goal> body)
    {
        var arg = body.Method.GetParameters()[0].Name;
        return state =>
        {
            var goal = body(new Kanren { id = state.next, Name = arg });
            return goal(state.Next());
        };
    }

    public static Goal Conjunction(Goal left, Goal right)
    {
        return state => left(state).SelectMany(x => right(x));
    }

    public static Goal Disjunction(Goal left, Goal right)
    {
        return state => left(state).Concat(right(state));
    }

    public static Goal Recurse(Func<Kanren, Goal> body, Kanren x)
    {
        return state => new[]
        {
            new State
            {
                subs = state.subs,
                next = state.next,
                incomplete = () => body(x)
            }
        };
    }
    ...
}

Kanren.Exists just creates a fresh variable using the next number in the sequence, and invokes the function body that produces the Goal. The Goal is then evaluated using the given State, but with an updated variable counter to ensure variables are unique.

Conjunction consists of equations like x == 6 & y == 7. For every State generated by the 'left' Goal, conjunction generates a set of States built from applying the 'right' Goal to that same state. So, x == 6 produces a binding assigning x to 6, which is then passed to the Goal for y == 7, which then adds a binding for y = 7. Thus, "x[0] = 6, y[1] = 7" is produced.

Disjunction consists of equations like x == 6 | x == 7. These Goals are independent and don't interfere with each other, and so any States generated by 'right' are simply appended to the States generated by 'left'. The 'left' Goal produces x = 6, and 'right' produce x = 7, so the result will be:

x[0] = 6,
x[0] = 7,

Because C# features eager evaluation, we have to handle recursion in Kanren expressions manually. We do this via Kanren.Recurse, which takes a Goal constructor and a kanren variable, and returns an "incomplete" State, which basically means a State that was only partially evaluated and may produce further states if you resume evaluation:

public sealed class State
{
    ...
    // True if this state is final, such that all bindings have values,
    // false if some computation remains to be done.
    public bool IsComplete
    {
        get { return incomplete == null; }
    }

    // Get the pairs of bound variables and their values.
    public IEnumerable<KeyValuePair<Kanren, object>> GetValues()
    {
        return subs;
    }

    // Return the remaining stream of states, if any.
    public IEnumerable<State> Continue()
    {
        if (IsComplete) throw new InvalidOperationException("State is complete.");
        return incomplete()(this);
    }
}

The Kanren client is currently responsible for handling incomplete States while iterating over the results. It's possible to handle this in a more uniform manner with a little more execution overhead, but uKanren.NET doesn't currently do so.

Unification

Finally, the real meat of logic programming is resolving equality between logic variables and values. Logic programming solves equations using an algorithm called "unification":

public sealed class Kanren
{
    ...
    static object Walk(object uvar, State env)
    {
        // search for final Kanren binding in env
        Kanren v;
        while (true)
        {
            v = uvar as Kanren;
            if (ReferenceEquals(v, null)) break;
            var tmp = env.Get(v);
            if (ReferenceEquals(tmp, null)) break;
            uvar = tmp;
        }
        return uvar;
    }

    static State Unify(object uorig, object vorig, State s)
    {
        var u = Walk(uorig, s);
        var v = Walk(vorig, s);
        var uvar = u as Kanren;
        var vvar = v as Kanren;
        if (!ReferenceEquals(uvar, null) && !ReferenceEquals(vvar, null)
            && uvar.Equals(vvar))
            return s;
        if (!ReferenceEquals(uvar, null))
            return s.Extend(uvar, v);
        if (!ReferenceEquals(vvar, null))
            return s.Extend(vvar, u);
        if (u.Equals(v))
            return s;
        return null;
    }
    ...
}

Unification brings together all equality declarations between variables and values and ensures they are all consistent. If two variables are unified, then they are either the same variable, or the variable values must unify. If a variable and value are unified, then that value is bound to that variable in the State by extending it. If two values are unified, then they must be equal to each other. If none of these conditions hold, then a null State is returned, indicating an inconsistent set of equality conditions.

Wrap Up

The real uKanren.NET implementation provides a few tweaks to interleave certain streams, and to provide convenient operators on Goals so you can easily combine more sophisticated expressions:

static Goal Fives(Kanren x)
{
    return x == 5 | Kanren.Recurse(Fives, x);
}

static Goal Sixes(Kanren x)
{
    return 6 == x | Kanren.Recurse(Sixes, x);
}

static Goal FivesAndSixes()
{
    return Kanren.Exists(x => Fives(x) | Sixes(x));
}

// output of FivesAndSixes:
// x[0] = 5,
// x[0] = 6,
// x[0] = 5, x[0] = 5, ... [infinite stream of fives]
// x[0] = 6, x[0] = 6, ... [infinite stream of sixes]

However, the implementation described above covers the core operation of µKanren, and the rest is just syntactic sugar to make it nice in C#.

[1] The "Exists" operator was called "call/fresh" in the original paper and in most other implementations. I found it easier to read off uKanren expressions as English with the form I settled on.