Monday, March 25, 2013

Sasa.Weak<T> - Typed Weak References

This is the ninth post in my ongoing series covering the abstractions in Sasa. Previous posts:

System.WeakReference is a special runtime class used to reference a resource without preventing its collection and finalization once no other live references remain. The only problem is that the encapsulated value is of type "object", and so using it requires a lot more casting than it should. More often than not, a WeakReference will only ever encapsulate values of a single type, so this casting is superfluous.
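To illustrate the casting dance the untyped interface forces on you, here's the typical pattern using only the standard WeakReference API:

```csharp
var weak = new WeakReference("some expensive string");

// Target is typed as object, so every access needs a cast,
// and the result must be null-checked in case it was collected
var target = weak.Target as string;
if (target != null)
    Console.WriteLine(target.Length);
```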

Enter Sasa.Weak<T>, which is a struct that wraps WeakReference and provides a typed interface to encapsulated values. As a struct, it does not incur any additional memory allocation costs, and the casts it performs are operations you would likely have to do anyway, so the overhead of the typed interface is virtually nil. Sasa.Weak<T> is available in the core Sasa.dll assembly.

Sasa.Weak<T>.HasValue

Sasa.Weak<T>.HasValue serves the same purpose as Weak<T>.IsAlive, but also satisfies the IResolvable<T> interface:

string original = new string("hello world!".ToCharArray());
Weak<string> foo = original;
Console.WriteLine(foo.HasValue);

// lose all live references and run collection
original = null;
GC.Collect();
GC.WaitForPendingFinalizers();
Console.WriteLine(foo.HasValue);

// outputs:
// true
// false

Sasa.Weak<T>.IsAlive

Sasa.Weak<T>.IsAlive checks whether the target of the weak reference has yet to be collected:

string original = new string("hello world!".ToCharArray());
Weak<string> foo = original;
Console.WriteLine(foo.IsAlive);

// lose all live references and run collection
original = null;
GC.Collect();
GC.WaitForPendingFinalizers();
Console.WriteLine(foo.IsAlive);

// outputs:
// true
// false

Sasa.Weak<T>.Value

Sasa.Weak<T>.Value obtains the current value encapsulated in the weak reference, or null if the object has been collected:

string original = new string("hello world!".ToCharArray());
Weak<string> foo = original;
Console.WriteLine(foo.Value);

// lose all live references and run collection
original = null;
GC.Collect();
GC.WaitForPendingFinalizers();
Console.WriteLine(foo.Value ?? "<null>");

// outputs:
// hello world!
// <null>

The Value property also fulfills the requirements for the IValue<T> interface.

Sasa.Weak<T>.TryGetValue

Sasa.Weak<T>.TryGetValue checks whether a weak reference to a target is alive, and returns a live reference to that target in one step:

string original = new string("hello world!".ToCharArray());
Weak<string> foo = original;
Console.WriteLine(foo.TryGetValue(out original));
Console.WriteLine(original);

// lose all live references and run collection
original = null;
GC.Collect();
GC.WaitForPendingFinalizers();
Console.WriteLine(foo.TryGetValue(out original));
Console.WriteLine(original ?? "<null>");

// outputs:
// true
// hello world!
// false
// <null>

This method also satisfies the requirements for the IVolatile<T> interface.

Sunday, March 24, 2013

Tabular v1.0 - Import/Export Tabular Data to Excel and CSV

A while ago, I wrote a small library for describing, importing, and exporting tabular data in Excel XML and CSV formats. I just never got around to releasing it, but I'm making a more concerted effort recently to push releases forward. The full documentation is online, and the binaries here. The license is LGPL.

Describing Tabular Data

Tabular.dll is an assembly with which you can declaratively describe tabular data:

var table1 = new Table
{
    Name = "SomeTable",
    Rows =
    {
        new Row { 1, "foo", 5.7 },
        new Row { 2, "bar", 99.99M },
        new Row { 3, "baz", 0.0 },
    }
};

There are really only three classes of interest, Table, Row, and Cell. A table consists of a series of rows, a row consists of a series of cells, and each cell consists of a string value together with an alleged data type describing the string contents.

The DataType enumeration is the list of recognized data strings. Cell provides numerous implicit coercions from CLR types to Cell and sets the appropriate data type, so declarative tables are simple to describe as the above code sample demonstrates.

It's also quite simple to describe a table that's derived from some enumerable source:

IEnumerable<object> list = ...
var table = new Table
{
    Name = "EnumerableTable",
    Bind = list.Select((x, i) => new Row { i, x })
};

You can also imperatively build up the table using the AddRow method.
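As a sketch of the imperative style (I'm assuming here that AddRow accepts a Row instance; check the docs for the exact signature):

```csharp
var table = new Table { Name = "SomeTable" };
// hypothetical usage: AddRow's exact parameters may differ
table.AddRow(new Row { 1, "foo", 5.7 });
table.AddRow(new Row { 2, "bar", 99.99M });
table.AddRow(new Row { 3, "baz", 0.0 });
```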

Import/Export of CSV Data

Tabular.Csv.dll provides import and export of tabular data to CSV format:

var table1 = new Table
{
    Name = "SomeTable",
    Rows =
    {
        new Row { 1, "foo", 5.7 },
        new Row { 2, "bar", 99.99M },
        new Row { 3, "baz", 0.0 },
    }
};
var csv = Csv.ToCsv(table1, CsvFormat.Quoted);
Console.WriteLine(csv.Data);

// output:
// "1","foo","5.7"
// "2","bar","99.99"
// "3","baz","0"

var table2 = Csv.FromCsv(csv);
// table2 is equivalent to table1

The CsvFormat enumeration describes whether the CSV data should be formatted in the safer quoted data format, or in the less safe raw format.
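To see why the quoted format is the safer choice, consider a cell that itself contains a comma; the quoting makes the cell boundary unambiguous:

```csharp
// a cell containing a comma would break the raw format,
// but survives the quoted format intact
var tricky = new Table
{
    Name = "Tricky",
    Rows = { new Row { 1, "hello, world" } }
};
var quoted = Csv.ToCsv(tricky, CsvFormat.Quoted);
Console.WriteLine(quoted.Data);
// the row would render as: "1","hello, world"
```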

Import/Export of Excel Data

Tabular.Excel.dll provides import and export features for the Excel XML file format. The Excel 2002 XML schema is used for simplicity. See the docs for the full details, but here's a relatively straightforward overview:

var table1 = new Table
{
    Name = "SomeTable",
    Rows =
    {
        new Row { 1, "foo", 5.7 },
        new Row { 2, "bar", 99.99M },
        new Row { 3, "baz", 0.0 },
    }
};
// generate in-memory XDocument that you can work with
XDocument excel = Excel.ToXml(table1);

// or write directly to stream
using (var stream = File.OpenWrite("foo.xml"))
{
    Excel.WriteTo(table1, stream);
}

// you can also import the xml data
Table table2 = Excel.FromXml(excel);

// or import by reading from a stream
using (var stream = File.OpenRead("foo.xml"))
{
    table2 = Excel.ReadFrom(stream);
}

Edit: just an update, I've uploaded the Tabular v1.0 release to NuGet.

Saturday, March 23, 2013

Sasa.Types - Runtime Types And CLR Metadata

This is the eighth post in my ongoing series covering the abstractions in Sasa. Previous posts:

Sasa.Types is a static class containing a number of extension methods on System.Type, together with extensions that mirror some of the CLR metadata instructions which aren't typically available in C#. It's available in the core Sasa.dll.

Sasa.Types.Constructor

Sasa.Types.Constructor accepts a lambda expression whose body is a "new" expression designating a type's constructor. It then extracts and returns the ConstructorInfo used in that expression:

struct Foo
{
   public Foo(int i)
   {
     ...
   }
}
...
var x = Types.Constructor(() => new Foo(3));
Console.WriteLine(x);
// output:
// Void .ctor(Int32)

Sasa.Types.Create

Sasa.Types.Create is a static method used to create a dynamic type in a dynamic assembly, often for code generation purposes. It automates various steps and provides a boolean parameter indicating whether to save the assembly to a file, so you can run verification passes on it:

var newType = Types.Create("TypeFoo",
                           saveAssembly:true,
                           generate: typeBuilder =>
{
    // see Microsoft's docs on TypeBuilder:
    // http://msdn.microsoft.com/en-us/library/system.reflection.emit.typebuilder.aspx
    ...
});

Sasa.Types.Field

Sasa.Types.Field is a set of extension methods used to extract FieldInfo metadata from a simple member access expression for that field:

struct Foo
{
   public int Instance;
   public static string Static;
}
...
var instanceField = Types.Field<Foo, int>(x => x.Instance);
var staticField = Types.Field<string>(() => Foo.Static);
Console.WriteLine(instanceField);
Console.WriteLine(staticField);
// output:
// Int32 Instance
// String Static

The overload with two generic parameters is for instance fields, since it accepts an expression taking an instance and returning the field value; the overload with a single generic parameter is for static fields.

The one caveat is that C# can't enforce that you properly reference fields, or that you're using the right overload for static vs. instance fields. Instance fields require an instance in order to reference them, so you must use the overload that accepts two generic arguments. Static fields only require the use of one generic argument.

Sasa.Types.FieldName

Sasa.Types.FieldName is an extension method on FieldInfo that extracts a "normalized" field name. By "normalized", I mean that if the field was a compiler-generated backing field for an auto property, then it will extract the property name. Otherwise, it will just return the field name itself:

public class Foo
{
    public int normalField;
    public int AutoProperty { get; set; }
}
var backingField = Types.Property<Foo, int>(x => x.AutoProperty)
                        .GetBackingField();
var normalField = Types.Field<Foo, int>(x => x.normalField);

Console.WriteLine(backingField.FieldName());
Console.WriteLine(normalField.FieldName());
// output:
// AutoProperty
// normalField

Sasa.Types.GetBackingField

Sasa.Types.GetBackingField is an extension method on PropertyInfo that attempts to extract the compiler-generated field metadata:

public class Foo
{
    public int AutoProperty { get; set; }
}
var backingField = Types.Property<Foo, int>(x => x.AutoProperty)
                        .GetBackingField();
Console.WriteLine(backingField.FieldName());
Console.WriteLine(backingField.Name);
// output:
// AutoProperty
// <AutoProperty>k__BackingField

Note that this method currently depends on the naming convention used by the compiler, so it may not be 100% future-proof. If the convention ever does change, I anticipate updating this implementation to reflect that.

Sasa.Types.GetTypeTree

Sasa.Types.GetTypeTree is an extension method on System.Type that extracts the whole sequence of generic type instances, declarations and arguments that make up a type:

var args = typeof(Pair<int, Pair<string,char>>).GetTypeTree();
// args = new[]
// {
//     typeof(Pair<int, Pair<string, char>>),
//     typeof(Pair<,>),
//          typeof(int),
//          typeof(Pair<string, char>),
//          typeof(Pair<,>),
//              typeof(string), typeof(char)
// };

Sasa.Types.HasAutoField

Sasa.Types.HasAutoField is an extension method on PropertyInfo that checks whether a property is an auto-property with a compiler-generated backing field:

class Foo
{
    public int Bar
    {
        get { return 0; }
    }
    public int AutoProp { get; set; }
}
var autop = Types.Property<Foo, int>(x => x.AutoProp);
var normp = Types.Property<Foo, int>(x => x.Bar);

Console.WriteLine(autop.HasAutoField());
Console.WriteLine(normp.HasAutoField());
// output:
// true
// false

Sasa.Types.IsBackingField

Sasa.Types.IsBackingField is an extension method on FieldInfo that checks whether a field is a compiler-generated backing field for a property:

public class Foo
{
    public int normalField;
    public int AutoProperty { get; set; }
}
var backingField = Types.Property<Foo, int>(x => x.AutoProperty)
                        .GetBackingField();
var normalField = Types.Field<Foo, int>(x => x.normalField);

Console.WriteLine(backingField.IsBackingField());
Console.WriteLine(normalField.IsBackingField());
// output:
// true
// false

Sasa.Types.IsGenericTypeInstance

Sasa.Types.IsGenericTypeInstance is an extension method on System.Type that checks whether the Type instance is an instantiation of a generic type definition:

struct Foo { }
struct Foo<T> { }

Console.WriteLine(typeof(Foo).IsGenericTypeInstance());
Console.WriteLine(typeof(Foo<>).IsGenericTypeInstance());
Console.WriteLine(typeof(Foo<int>).IsGenericTypeInstance());

// output:
// false
// false
// true

Sasa.Types.Method

Sasa.Types.Method is a static method that extracts the MethodInfo of a delegate expression:

struct Foo
{
   public static int DoSomething(string x) { ... }
   public int Otherwise() { ... }
}
...
var doSomething = Types.Method<Func<string, int>>(() => Foo.DoSomething);
var otherwise = Types.Method<Func<Foo, Func<int>>>(x => x.Otherwise);

Console.WriteLine(doSomething);
Console.WriteLine(otherwise);

// output:
// Int32 DoSomething(String)
// Int32 Otherwise()

The second overload is for instance methods, so it accepts an expression that takes an instance, and returns a delegate to the method of interest.

Sasa.Types.Property

Sasa.Types.Property is a set of extension methods that extract the PropertyInfo of a property access expression:

public class Foo
{
    public static int StaticProperty { get; set; }
    public int AutoProperty { get; set; }
}
var sprop = Types.Property(() => Foo.StaticProperty);
var iprop = Types.Property<Foo, int>(x => x.AutoProperty);

Console.WriteLine(sprop);
Console.WriteLine(iprop);
// output:
// Int32 StaticProperty
// Int32 AutoProperty

The first overload is for static properties, and the second overload is for instance properties since it accepts an expression taking an instance and returning the property value.

Sasa.Types.ShortGenericType

Sasa.Types.ShortGenericType is a set of extension methods on System.Type that generates an abbreviated string describing a generic type, ie. assembly references omit versioning and public key information:

var nullableInt = typeof(int?);
var nullable = nullableInt.GetGenericTypeDefinition();
Console.WriteLine(nullable.ShortGenericType(nullableInt.GetGenericArguments()));
// output:
// [System.Nullable`1[[System.Int32, mscorlib]], mscorlib]

Sasa.Types.ShortName

Sasa.Types.ShortName is a set of extension methods on System.Type that generates an abbreviated string describing any type, including generic types:

Console.WriteLine(typeof(int?).ShortName());
// output:
// [System.Nullable`1[[System.Int32, mscorlib]], mscorlib]

Sasa.Types.Subtypes

Sasa.Types.Subtypes is a set of extension methods on System.Type that check subtyping relationships on runtime types and type arguments:

Console.WriteLine(typeof(int).Subtypes(typeof(object)));
Console.WriteLine(typeof(int).Subtypes<object>());
Console.WriteLine(typeof(int).Subtypes<string>());

// output:
// true
// true
// false

However, this check has an important limitation when dealing with type parameters. See Type.IsAssignableFrom.

C# for Haskell and ML Programmers

An interesting question was posed to /r/Haskell today: is there a quick intro to C# from Haskell programmers?

Well, there are plenty of explanations of C#'s basic syntax, classes, structs, etc., but nothing that specifically addresses the functional mindset a Haskell programmer would be starting with. So I wrote a reply linking to the various familiar concepts from functional programming found in C#, and describing various caveats that might surprise a Haskell user. I'll reproduce the post here for posterity:

I think there's a reasonable C# subset for functional programming, so if you stick to that you should be able to pick it up relatively quickly. Read up on:

  • LINQ -- you can use the query comprehension syntax, or the regular first-class function syntax. The former is sugar for the latter.
  • Tuples
  • Lambdas and delegates (delegates are first-class functions)
  • Parametric polymorphism is known as generics. You can place generic parameters on methods/functions, and type declarations.
  • The standard System.Func* and System.Action* delegates. These are delegates with generic parameters. Most first-class functions you deal with will be one of these two classes of delegates.
  • Type constraints

Caveats:

  • lambdas and delegates are nominally typed, so you can't implicitly or explicitly coerce a delegate of one type into a delegate of a compatible signature, ie. System.Predicate<int> is signature compatible with System.Func<int, bool>, but they are not interconvertible without doing some magic like I do in my Sasa.Func.Coerce library function.
  • methods and delegates are multiparameter, not curried like in OCaml and Haskell, thus leading to all the Func* and Action* overloads.
  • The "void" return type is not a type, so it can't be used as a generic argument. Hence the need for all the Action* delegate types distinct from the Func* delegate types. Action* differ only in the fact that they return void.
  • generic parameters on methods are strictly more general than generic parameters on delegates (which are types). Method generics support first-class polymorphism, while type declaration generics do not.
  • classes are always implicitly option types, ie. they are nullable, while struct types always have a "valid" value, ie. are not nullable. Struct types are then useful for eliminating null reference exceptions in programs, as long as the default struct value is meaningful.
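The currying caveat in particular is worth a concrete example: multiparameter delegates must be curried by hand, there is no automatic partial application:

```csharp
// uncurried: one delegate taking two parameters
Func<int, int, int> add = (x, y) => x + y;

// curried by hand: a function returning a function
Func<int, Func<int, int>> addCurried = x => y => x + y;

Console.WriteLine(add(1, 2));        // 3
Console.WriteLine(addCurried(1)(2)); // 3
```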

Friday, March 22, 2013

Sasa.Strings - General String Extensions

This is the seventh post in my ongoing series covering the abstractions in Sasa. Previous posts:

The Sasa.Strings static class provides a number of useful extension methods on System.String, including slicing, simple tokenizing, conversions and formatting. It is available in the core Sasa.dll.

Sasa.Strings.Format

Sasa.Strings.Format is a set of extension methods that wrap System.String.Format:

Console.WriteLine("i = {0}".Format(1234567));

string invariant = "{0:#,##0.00}".Format(CultureInfo.InvariantCulture,
                                         10000);
Console.WriteLine(invariant);

string de = "{0:#,##0.00}".Format(CultureInfo.GetCultureInfo("de-de"),
                                  10000);
Console.WriteLine(de);
// output:
// i = 1234567
// 10,000.00
// 10.000,00

Sasa.Strings.FromBase64

Sasa.Strings.FromBase64 is a convenient extension method for converting a Base64 encoded string into an ordinary Unicode string, with an optional encoding parameter used to interpret the data:

string encoded = "foo bar".ToBase64(); // utf-8
Console.WriteLine(encoded);
string decoded = encoded.FromBase64();
Console.WriteLine(decoded);
Console.WriteLine(decoded.ToBase64("utf-16"));
// output:
// Zm9vIGJhcg==
// foo bar
// ZgBvAG8AIABiAGEAcgA=

Sasa.Strings.HardWrapAt

Sasa.Strings.HardWrapAt is a set of extension methods that insert newlines directly at the specified column index, without regard to words or other considerations, and return an enumerable sequence of lines:

string text = "Lorem ipsum dolor sit amet, consectetur";
foreach (var line in text.HardWrapAt(8))
{
    Console.WriteLine(line);
}
// output:
// Lorem ip
// sum dolo
// r sit am
// et, cons
// ectetur

Sasa.Strings.IfNullOrEmpty

Sasa.Strings.IfNullOrEmpty is a set of overloaded extension methods that return a fallback value when the input string is null or empty:

string[] list = new string[] { null, "", "foo!" };
foreach(var x in list)
{
    Console.WriteLine(x.IfNullOrEmpty("null or empty!"));
}
// output:
// null or empty!
// null or empty!
// foo!

The overload with the generic type parameter simply calls ToString() on its argument. It was added to avoid the overhead of calling ToString on the second argument before knowing whether the first string is null or empty:

string[] list = new string[] { null, "", "foo!" };
foreach(var x in list)
{
    Console.WriteLine(x.IfNullOrEmpty(3.4));
}
// output:
// 3.4
// 3.4
// foo!

Sasa.Strings.IsNullOrEmpty

Sasa.Strings.IsNullOrEmpty is a convenient extension method that simply calls string.IsNullOrEmpty:

string[] list = new string[] { null, "", "foo!" };
foreach(var x in list)
{
    if (!x.IsNullOrEmpty())
        Console.WriteLine(x);
}
// output:
// foo!

Sasa.Strings.Lines

Sasa.Strings.Lines is an extension method that splits a string into a sequence of lines:

var lines = "foo\r\n \t bar\r\n\r\nend".Lines();
foreach (var x in lines)
{
    Console.WriteLine(x);
}
// output:
// foo
//    bar
// end

Sasa.Strings.Slice

Sasa.Strings.Slice is an extension method that complements string.Substring: Substring takes an index and the number of characters to extract, whereas Slice takes two indices:

string large = "This is a sample sentence.";
string firstWord = large.Slice(0, 3);
Console.WriteLine(firstWord);
// output:
// This

Sasa.Strings.SliceEquals

Sasa.Strings.SliceEquals takes an input string, an index, and a substring, and checks whether the substring occurs in the input string at the given index:

string[] list = new string[] { "Flub", "Foo", "Baz_Foob", "Bar Foo!" };
foreach(var x in list)
{
    Console.WriteLine(x.SliceEquals(4, "Foo"));
}
// output:
// false
// false
// true
// true

Sasa.Strings.Split

Sasa.Strings.Split is a set of extension method wrappers for System.String.Split that accept StringSplitOptions:

var csv = "foo,comma,bar,,baz";
foreach (var x in csv.Split(StringSplitOptions.RemoveEmptyEntries, ','))
{
    Console.WriteLine(x);
}
// output:
// foo
// comma
// bar
// baz

The overloads take a variable-length list of either characters or strings to use for splitting, basically reversing the argument order used in the base class libraries for convenience.
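Side by side, the reversal looks like this:

```csharp
var csv = "foo,comma,bar,,baz";

// BCL: separator array first, options last
var bcl = csv.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);

// Sasa: options first, then a variable-length list of separators
var sasa = csv.Split(StringSplitOptions.RemoveEmptyEntries, ',');
```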

Sasa.Strings.ToBase64

Sasa.Strings.ToBase64 is a convenient extension method for converting a Unicode string into a Base64 encoded string, using an optional text encoding parameter. If no encoding is provided, UTF-8 is the default:

string encoded = "foo bar".ToBase64(); // utf-8
Console.WriteLine(encoded);
string decoded = encoded.FromBase64();
Console.WriteLine(decoded);
Console.WriteLine(decoded.ToBase64("utf-16"));
// output:
// Zm9vIGJhcg==
// foo bar
// ZgBvAG8AIABiAGEAcgA=

Sasa.Strings.Tokenize

Sasa.Strings.Tokenize is an extension method that searches an input string for a list of string tokens, and returns a sequence of token positions found in the string:

var operators = "4 + 5 - 6^2 == x".Tokenize("+", "-", "^", "==");
foreach (var x in operators)
{
    Console.WriteLine("Found {0} at index {1}", x.Tok, x.Index);
}
// output:
// Found + at index 2
// Found - at index 6
// Found ^ at index 9
// Found == at index 12

Sasa.Strings.Words

Sasa.Strings.Words is an extension method that splits a string along whitespace boundaries:

var words = "foo bar\r\n\tbaz\rfoobar\nbarbaz".Words();
foreach (var x in words)
{
    Console.WriteLine(x);
}
// output:
// foo
// bar
// baz
// foobar
// barbaz

Sasa.Strings.WordWrapAt

Sasa.Strings.WordWrapAt is an extension method that inserts newlines at the whitespace boundary nearest to, but less than, the provided column parameter. In other words, unlike Strings.HardWrapAt, which inserts the newline at the specified index, this method searches backwards for the nearest whitespace boundary before the column limit:

string text = "Lorem ipsum dolor sit amet, consectetur";
foreach (var line in text.WordWrapAt(8))
{
    Console.WriteLine(line);
}
// output:
// Lorem
//  ipsum
//  dolor
//  sit
//  amet,
//  consectetur

Sasa.Numbers - Generic Number Extensions

This is the sixth post in my ongoing series covering the abstractions in Sasa. Previous posts:

The Sasa.Numbers static class provides a number of useful extension methods on all of the CLR's number types. It is available in the core Sasa.dll.

Sasa.Numbers.UpTo

Generating a sequence of numbers is a pretty common operation in programming, so Sasa provides the Sasa.Numbers.UpTo extension method:

// iseq = 1, 2, 3
IEnumerable<int> iseq = 1.UpTo(end: 4);
// iseq2 = 1, 3, 5, 7
IEnumerable<int> iseq2 = 1.UpTo(end: 8, step: 2);
// dseq = 5.0, 5.5, 6.0, 6.5, 7.0
IEnumerable<double> dseq = 5.0.UpTo(end: 7.5, step: 0.5);

The simplest overload simply takes the inclusive lower and exclusive upper bounds for the sequence, and the second overload additionally takes a step size designating the increment between each number emitted.

These are defined as generic methods which make use of Sasa.Operators<T>, Sasa's generic operators class.

Sasa.Numbers.DownTo

Sasa.Numbers.DownTo is the complement to Sasa.Numbers.UpTo, where instead of generating a sequence of increasing numbers, it generates a sequence of decreasing numbers:

// iseq = 4, 3, 2
IEnumerable<int> iseq = 4.DownTo(end: 1);
// iseq2 = 8, 6, 4, 2
IEnumerable<int> iseq2 = 8.DownTo(end: 1, step: 2);
// dseq = 7.5, 7.0, 6.5, 6.0, 5.5
IEnumerable<double> dseq = 7.5.DownTo(end: 5.0, step: 0.5);

The simplest overload simply takes the inclusive upper and exclusive lower bounds for the sequence, and the second overload additionally takes a step size designating the decrement between each number emitted.

These are defined as generic methods which make use of Sasa.Operators<T>, Sasa's generic operators class.

Sasa.Numbers.Bound

The Sasa.Numbers.Bound extension method ensures that a value falls between lower and upper inclusive limits:

int x = 3.Bound(0, 5);          // x=3
int y = 99.Bound(0, 5);         // y=5
double z = 3.0.Bound(4.4, 5.7); // z=4.4

This is defined as a generic method which makes use of Sasa.Operators<T>, Sasa's generic operators class.

Sasa.Numbers.Max

The Sasa.Numbers.Max extension method returns the larger of the two arguments it's given:

int x = Numbers.Max(3, 88);        // x=88
decimal d = Numbers.Max(234M, 8M); // d=234M

This is defined as a generic method which makes use of Sasa.Operators<T>, Sasa's generic operators class.

Sasa.Numbers.Min

The complement to Max, the Sasa.Numbers.Min extension method returns the smaller of two arguments:

int x = Numbers.Min(3, 88);        // x=3
decimal d = Numbers.Min(234M, 8M); // d=8M

This is defined as a generic method which makes use of Sasa.Operators<T>, Sasa's generic operators class.

Sasa.Numbers.ToSentence

The Sasa.Numbers.ToSentence method converts an integer to its English sentence representation:

var list = new[] { 3, 99, -99, int.MinValue };
foreach (var x in list)
{
   Console.WriteLine(Numbers.ToSentence(x));
}
// outputs:
// Three
// Ninety Nine
// Minus Ninety Nine
// Minus Two Billion One Hundred Forty Seven Million Four Hundred Eighty Three Thousand Six Hundred Forty Eight

Tuesday, March 19, 2013

Sasa.Result - Handling Exceptional Values

This is the fifth post in my ongoing series covering the abstractions in Sasa. Previous posts:

Sasa.Result<T> encapsulates the result of a computation: either its return value, or the error it generated, if any. For instance, you could spawn a thread to perform some calculation and deposit the result into an instance of Result<T>. Sasa.Result<T> is available in the core Sasa.dll.

Result<T> implements all the usual equality tests on itself, and on T, so you can perform direct comparisons to values.
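A quick sketch of what those comparisons permit:

```csharp
Result<int> r = 3;
// compare directly against a plain T value
Console.WriteLine(r == 3);
// or against another Result<T>
Console.WriteLine(r == new Result<int>(3));
// output:
// true
// true
```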

Sasa.Result<T>.Value

The Sasa.Result<T>.Value property allows clients to obtain the value that was returned from the computation. If an error was instead generated, then this throws InvalidOperationException with the InnerException property set to the original exception:

Result<int> hasValue = 3;
Result<int> noValue = new Result<int>(new ArgumentException("somearg"));
Console.WriteLine(hasValue.Value);
Console.WriteLine(noValue.Value); //InvalidOperationException
// outputs:
// 3
// InvalidOperationException

This property is also required by IValue<T>, which Result<T> implements.

Sasa.Result<T>.HasValue

The Sasa.Result<T>.HasValue property allows clients to check whether the result is a legitimate value, or if it's an error result. If HasValue returns true, then clients can safely access the Value property. If it's false, then doing so will throw InvalidOperationException:

Result<int> hasValue = 3;
Result<int> noValue = new Result<int>(new ArgumentException("somearg"));
Console.WriteLine(hasValue.HasValue);
Console.WriteLine(noValue.HasValue);
// outputs:
// true
// false

This property is also required by the IOptional<T> interface, which Result<T> implements.

Sasa.Result<T>.Error

The Sasa.Result<T>.Error property allows clients to access the exception that resulted from the computation, with the full stack trace intact:

void EnsureValue<T>(Result<T> result)
{
  if (!result.HasValue)
      Environment.FailFast("Unexpected result!", result.Error);
}

This property is also required by IResult<T>, which Result<T> implements.

Sasa.Result<T>.TryGetValue

TryGetValue permits accessing the encapsulated Value without throwing an exception if the result was an error:

Result<int> someInt = 3;
int x;
if (someInt.TryGetValue(out x))
  Console.WriteLine(x);
else
  Console.WriteLine(someInt.Error);
 
Result<int> noInt = new ArgumentException("noInt");
if (noInt.TryGetValue(out x))
  Console.WriteLine(x);
else
  Console.WriteLine(noInt.Error);
// output is:
// 3
// System.ArgumentException: ...

This method is required by the IVolatile<T> interface, which Result<T> implements.

Sasa.Result<T>.| operator

The C# null coalescing operator, ??, is hard-coded to only apply to reference types or System.Nullable<T>. However, the short-circuiting logical-or operator can be overridden, and Result<T> does so to provide result-coalescing:

Result<int> someInt = 3;
Result<int> noInt = new ArgumentException("noInt");
Console.WriteLine(someInt || 99);
Console.WriteLine(noInt || 99);
// output is:
// 3
// 99

Sasa.Result<T>.Select

Sasa.Result<T>.Select is the first in the series of methods implementing the LINQ query pattern:

Result<int> someInt = 3;
Result<int> noInt = new ArgumentException("noInt");
Console.WriteLine(someInt.Select(x => x > 0));
Console.WriteLine(noInt.Select(x => x > 0));
// output is:
// true
// Result<ArgumentException>

There is also an implementation of Select for the more abstract IResult<T> interface, but because Result<T> is a struct this would incur too much unnecessary boxing, so we override the Select and SelectMany extension methods with instance methods.

Sasa.Result<T>.Where

The Sasa.Result<T>.Where method implements the LINQ where clause for query patterns on Result<T>:

Result<int> someInt = 3;
Result<int> noInt = Result.Fail<int>(new NotSupportedException("noInt"));
Console.WriteLine(someInt.Where(x => x > 0));
Console.WriteLine(someInt.Where(x => x < 0));
Console.WriteLine(noInt.Where(x => x == 0));
// outputs:
// true
// Result<ArgumentException>
// Result<NotSupportedException>

Sasa.Result<T>.SelectMany

The most essential LINQ operator, the Sasa.Result<T>.SelectMany overloads implement chaining for result values:

var val = from y in 2.ToResult()
          from j in 7.ToResult()
          from x in (y + 2 + j).ToResult()
          select x;
Console.WriteLine(val);
 
var noval = from y in 2.ToResult()
            from j in Result.Fail<int>(new ArgumentException("j"))
            from x in (y + 2 + j).ToResult()
            select x;
Console.WriteLine(noval);
// output:
// 11
// Result<ArgumentException>

There is also an implementation of SelectMany for the more abstract IResult<T> interface, but because Result<T> is a struct this would incur too much unnecessary boxing, so we override the Select and SelectMany extension methods with instance methods.

Sasa.Result.Try

The Sasa.Result.Try static extension methods allow clients to execute some code within an exception handling block, which then returns the appropriate result for you, ie. either a value or an error:

Result<int> someInt = Result.Try(() => 3);
Result<int> noInt = Result.Try<int>(() => { throw new Exception(); });
Console.WriteLine(someInt);
Console.WriteLine(noInt);
// output is:
// 3
// Result<Exception>

Sasa.Result.Fail

The static Sasa.Result.Fail method creates a result failure of the appropriate type with the given exception:

Result<int> fail = Result.Fail<int>(new ArgumentException("fail"));
Console.WriteLine(fail);
// outputs:
// Result<ArgumentException>

Sasa.Result.Success

The static Sasa.Result.Success method creates a successful result value of the given type:

Result<int> x = Result.Success(99);
Console.WriteLine(x);
// outputs:
// 99

Sasa.Result.ToResult

The Sasa.Result.ToResult extension method overloads provide some simple conversions from values into results:

Result<int> someInt = 3.ToResult();
Result<int> optInt = 3.ToOption().ToResult();
Result<IEnumerable<int>> intList = new int[] { 3, 99, 66 }.ToResult();

The last overload, over IEnumerable, is useful because such sequences are often lazily evaluated, which means they may hide errors that only surface while iterating. This overload turns a lazy sequence that can throw during iteration into a lazy sequence of exception-safe result values.
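To illustrate — and this is only a sketch of my reading of the semantics, since the exact behaviour of the IEnumerable overload isn't shown here — a deferred query with a latent error could be handled like this:

```csharp
// hedged sketch: assumes the IEnumerable overload captures exceptions
// raised while the sequence is iterated
IEnumerable<int> query = new[] { 2, 1, 0 }.Select(x => 10 / x);

// no DivideByZeroException escapes to the caller here; the error
// surfaces as a failed result when the faulty element is reached
Result<IEnumerable<int>> safe = query.ToResult();
```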

Monday, March 18, 2013

Sasa.Option - Handling Optional Values

This is the fourth post in my ongoing series covering the abstractions in Sasa. Previous posts:

Sasa.Option is an abstraction to deal with nullable/optional values. It is available in the core Sasa.dll. Reference types are already nullable, and structs have System.Nullable, so why write yet another abstraction to deal with optional values?

The answer is pretty simple: there is no other way to write a function whose generic arguments are clearly and plainly optional. This is partly because C# doesn't consider type constraints when selecting among method overloads; otherwise you could do something like this:

int Foo<T>(T nullable)
  where T : class
{
    ...
}
int Foo<T>(T notNullable)
  where T : struct
{
    ...
}

The nullable overload is clear from the type information, but this type information is never used and C# complains about ambiguous methods. Technically, you could simply write Foo like so:

int Foo<T>(T possiblyNull)
{
    if (possiblyNull == null)
        ...
    else
        ...
}

This will work even if you pass in Nullable<int>, because a Nullable<int> can be compared to null, so the proper branch is taken. However, it's not at all clear from the type signature of Foo that possiblyNull is an optional value. So Sasa.Option was created to make this sort of contract easy to specify, and to have the contract enforced by the type system:

int Foo<T>(Option<T> possiblyNull)
{
    if (possiblyNull == null)
        ...
    else
        ...
}

Option<T> has safe implicit conversions from T, and comparisons to null are fully defined.

Sasa.Option<T>.Value

Sasa.Option encapsulates a value that may or may not be present. The Value property of the Option<T> struct returns the encapsulated value if it exists, or throws InvalidOperationException if none exists:

Option<int> someInt = 3;
Console.WriteLine(someInt.Value);
Option<int> noInt = null;
Console.WriteLine(noInt.Value); //InvalidOperationException
// output is:
// 3
// throws InvalidOperationException

This property is also required by IValue<T>, which Option<T> implements.

Sasa.Option<T>.HasValue

To check whether an Option<T> has a value, just check the HasValue property as you would on a Nullable<T>:

Option<int> someInt = 3;
Option<int> noInt = null;
Console.WriteLine(someInt.HasValue);
Console.WriteLine(noInt.HasValue);
// output is:
// true
// false

This property is also required by the IOptional<T> interface, which Option<T> implements.

Sasa.Option<T>.TryGetValue

TryGetValue permits accessing the encapsulated value without throwing an exception when no value is present:

Option<int> someInt = 3;
int x;
if (someInt.TryGetValue(out x))
  Console.WriteLine(x);
else
  Console.WriteLine("null");

Option<int> noInt = null;
if (noInt.TryGetValue(out x))
  Console.WriteLine(x);
else
  Console.WriteLine("null");
// output is:
// 3
// null

This method is required by the IVolatile<T> interface, which Option<T> implements.

Sasa.Option<T>.| operator

The C# null-coalescing operator, ??, is hard-coded to apply only to reference types and System.Nullable<T>. However, the short-circuiting logical-or operator can be overloaded, and Option<T> does so to provide option-coalescing:

Option<int> someInt = 3;
Option<int> noInt = null;
Console.WriteLine(someInt || 99);
Console.WriteLine(noInt || 99);
// output is:
// 3
// 99

Sasa.Option.ToOption

The Sasa.Option.ToOption extension methods allow simple conversions to option types, given any value T or any nullable struct value T?:

Option<int> someInt = 3.ToOption();
Option<int> noInt = new int?().ToOption();
Console.WriteLine(someInt);
Console.WriteLine(noInt);
// output is:
// 3
// null

Sasa.Option.ToNullable

Given any Option<T>, where T is a value type, we can easily convert it to a Nullable<T> using the ToNullable extension method:

int? someInt = 3.ToOption().ToNullable();
Console.WriteLine(someInt ?? 99);
// output is:
// 3

Sasa.Option.Try

The Sasa.Option.Try static methods enable clients to execute a delegate and capture its return value in an Option<T>. If the delegate throws an exception, then Option<T>.None is returned:

Option<int> someInt = Option.Try(() => 3);
Option<int> noInt = Option.Try<int>(() => { throw new Exception(); });
Console.WriteLine(someInt);
Console.WriteLine(noInt);
// output is:
// 3
// null

Sasa.Option.Do

Sasa.Option.Do is an extension method that performs an action, specified by a delegate, only if the Option<T> has a value:

Action<int> print = Console.WriteLine;
Option<int> someInt = 3;
Option<int> noInt = null;
someInt.Do(print);
noInt.Do(print);
// output is:
// 3

Sasa.Option.Select

The Sasa.Option.Select extension methods implement the LINQ select query pattern on Option<T> and Nullable<T>:

Option<int> someInt = 3;
Option<int> noInt = null;
Console.WriteLine(someInt.Select(x => x > 0));
Console.WriteLine(noInt.Select(x => x > 0));
// output is:
// true
// null

Sasa.Option.Where

Sasa.Option.Where extension methods implement the LINQ where clause for query patterns on Option<T> and Nullable<T>:

Option<int> someInt = 3;
Option<int> noInt = null;
Console.WriteLine(someInt.Where(x => x > 0));
Console.WriteLine(someInt.Where(x => x < 0));
Console.WriteLine(noInt.Where(x => x == 0));
// output is:
// true
// null
// null

Sasa.Option.SelectMany

The last essential LINQ operator, Sasa.Option.SelectMany implements all permutations of SelectMany with Nullable<T> and Option<T>, which enables you to use option values in queries:

var val = from y in new int?(2)
          from j in 7.ToOption()
          from x in (y + 2 + j).ToOption()
          select x;
Console.WriteLine(val);

var noval = from y in new int?(2)
            from j in new int?()
            from x in (y + 2 + j).ToOption()
            select x;
Console.WriteLine(noval);
// output:
// 11
// null

Sasa.Func - Type-Safe Delegate Combinators

This is the third post in my series of posts on useful Sasa abstractions:

This post will deal with the Sasa.Func static class in the stand-alone core Sasa assembly. This core assembly is concerned mainly with addressing limitations in the core .NET base class libraries. For instance, it contains type-safe, null-safe and thread-safe event operations, extensions on IEnumerable, useful extensions to numbers, and so on.

Sasa.Func is particularly concerned with providing type-safe extensions on delegates. You can view the whole API online. Sasa.Func is available in the core Sasa.dll.

Sasa.Func.Id

The simplest starting point is Sasa.Func.Id. Use this method whenever you need a delegate that simply returns its argument. This is fairly common when using the System.Linq API. Usage:

int[][] nested = new int[][]
{
   new[] { 0, 1, 2 },
   new[] { 3, 4, 5 },
   new[] { 6, 7, 8 },
};
IEnumerable<int> flattened = nested.SelectMany(Func.Id);

foreach (var x in flattened) Console.Write(" {0},", x);
// prints out:
//  0, 1, 2, 3, 4, 5, 6, 7, 8,

Sasa.Func.Create

Func.Create is a type-safe wrapper for Delegate.CreateDelegate. Usage:

class Foo
{
   public int Bar(int i) { return i; }
}
...
var delegate1 = Func.Create<Func<int, int>>(
                new Foo(),
                typeof(Foo).GetMethod("Bar"));

However, Func.Create is strictly more powerful than Delegate.CreateDelegate because it addresses certain limitations of the CLR I discovered two years ago. It was previously impossible to create an open instance delegate to either virtual methods, or to generic interface methods. Func.Create handles both cases by automatically generating a small dispatch thunk which wraps the invocation for you, and returns a delegate of the appropriate type.
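For example, an open instance delegate to the virtual Object.ToString could be created roughly as follows. This is a sketch: I'm assuming here that Func.Create has an overload taking only a MethodInfo for the open instance case, which may not match the actual signature:

```csharp
// hypothetical overload: no target instance means an open instance delegate
var toString = Func.Create<Func<object, string>>(
               typeof(object).GetMethod("ToString"));

// the generated thunk performs the virtual dispatch for us
Console.WriteLine(toString(42));    // dispatches to Int32.ToString
Console.WriteLine(toString("foo")); // dispatches to String.ToString
```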

The example above uses reflection to access the method metadata, but Sasa does provide a type-safe way to obtain the MethodInfo without reflection via Sasa.Types. This will be covered in a future post.

Sasa.Func.Combine

Func.Combine is basically a type-safe wrapper for Delegate.Combine. Usage:

Action<int> delegate1 = i => Console.WriteLine("i = {0}", i);
Action<int> delegate2 = i => Console.WriteLine("i*2 = {0}", i * 2);
Action<int> combined = Func.Combine(delegate1, delegate2);
// invoking combined(3) prints out:
// i = 3
// i*2 = 6

Sasa.Func.Remove

Similar to Func.Combine, Func.Remove is a type-safe wrapper for Delegate.Remove. Usage:

Action<int> delegate1 = i => Console.WriteLine("i = {0}", i);
Action<int> delegate2 = i => Console.WriteLine("i*2 = {0}", i * 2);
Action<int> combined = Func.Remove(Func.Combine(delegate1, delegate2),
                                  delegate2);
// invoking combined(3) prints out:
// i = 3

Sasa.Func.AsFunc

One somewhat frustrating limitation of the CLR is that System.Void is not considered a value, and so cannot be used as a type parameter that is used in return position. So for instance, you can't create a Func<void>. This relegates void to second-class status, where all other types produce values as first-class citizens.

This effectively divides the logical space of function types into those that return void (System.Action), and those that return a value (System.Func), and you cannot mix the two. Every operation that abstracts over the return type must then be written twice: once for functions that return a value, and again for functions that return void.

Sasa.Func.AsFunc provides a wrapper around the various System.Action delegates, effectively transforming them into the corresponding System.Func instance with a return value of Sasa.Empty. Func.AsFunc is also an extension method on the System.Action overloads, to make this wrapping as concise as possible. Usage:

Action<int> delegate1 = i => Console.WriteLine("i = {0}", i);
Func<int, Empty> delegate2 = delegate1.AsFunc();
// invoking delegate2(3) prints out:
// i = 3

Sasa.Func.Getter

A somewhat recurring pattern in C# programming is generating a delegate to access the value of a property. It's a little wasteful to generate a whole new delegate that closes over an object instance and then accesses the property, considering the object already has a method getter, ie. for property Foo, the C# compiler generates a get_Foo method.

Sasa.Func.Getter allows you to specify an expression naming a property, and will return a typed delegate to the direct method getter for that property. Usage:

struct Foo
{
    public int SomeInt { get; set; }
}
...
var getter = Func.Getter<Foo, int>(x => x.SomeInt);
var foosInt = getter(someFoo);

At the moment, the whole expression tree is generated every time this method is invoked, but a future extension to Sasa's ilrewriter will eliminate this entirely and generate direct operations on CIL metadata.

Sasa.Func.Setter

The dual to Sasa.Func.Getter, Sasa.Func.Setter obtains a typed delegate for the direct setter method of an object. Usage:

struct Foo
{
    public int SomeInt { get; set; }
}
...
var setter = Func.Setter<Foo, int>(x => x.SomeInt);
setter(someFoo, 99);

Similar to Sasa.Func.Getter, this will soon be inlined completely when using Sasa's ilrewriter.

Sasa.Func.Invoke

Sasa.Func.Invoke is a simple typed raw method invocation, ie. a typed equivalent of Delegate.DynamicInvoke. It's defined as an extension method on MethodInfo for clarity. Usage:

// SomeMethod is of type (string,float) -> int
MethodInfo someMethod = typeof(Foo).GetMethod("SomeMethod");
var arg1 = 3.4F;
var returnInt = someMethod.Invoke<int>("arg0", arg1);

Sasa.Func.Constant

Use Sasa.Func.Constant whenever you need a delegate that always returns a constant value, regardless of the parameter passed in. Usage:

int[] numbers = new int[] { 0, 1, 2, 3 };
var sameNumber = Func.Constant<int, int>(99);

foreach (var x in numbers.Select(sameNumber))
{
    Console.Write(" {0},", x);
}
// prints out
// 99, 99, 99, 99,

Sasa.Func.Open, Sasa.Func.OpenAction

A delegate designating Object.ToString can take two forms:

  1. The typical closed delegate has type System.Func<string> and encapsulates the reference to the object being converted to a string.
  2. The open instance delegate would have type System.Func<object, string>, so the object being converted to a string must be passed in each time.

Sasa.Func.Open and Sasa.Func.OpenAction methods serve the same purpose, namely to create a so-called open instance delegate, where the 'this' parameter is not encapsulated within the delegate itself, but is itself the first parameter passed to the delegate.

This allows you to reuse the same delegate multiple times on different objects without needing a different delegate for each object you want to convert to a string, or whatever other operation desired. This is also how efficient dispatch works in Sasa.Dynamics, ie. the cached delegate in Type<T>.Reduce(IReduce, T) is a statically computed, cached open instance delegate to the method that handles type T in the IReduce interface.

Usage:

class Foo
{
    public int Value { get; set; }
    public int Bar() { return Value; }
}
...
var foo = new Foo { Value = 3 };
Func<int> closed = foo.Bar;
Func<Foo, int> open = Func.Open<Foo, int>(closed);
var foo2 = new Foo { Value = 9999 };
Console.WriteLine(open(foo));
Console.WriteLine(open(foo2));
// prints:
// 3
// 9999

Sasa.Func.VOpen, Sasa.Func.VOpenAction

These methods serve the same purpose as Sasa.Func.Open and Sasa.Func.OpenAction above, but they operate directly on value types (hence the V prefix). As before, the first parameter to a method is always a reference to the object being manipulated, but structs aren't reference types, so a method for struct T actually accepts a "ref T" as its first argument. Thus, open instance delegates that modify their struct argument must have a different signature, namely that of the Sasa.VAction and Sasa.VFunc delegates (each overloaded for one to four arguments), all of which take a "ref T" as the first argument.

Sasa.Func.VOpen creates open instance delegates that return values, ie. Sasa.VFunc, and Sasa.Func.VOpenAction creates open instance delegates that return void, ie. Sasa.VAction. Usage:

struct Foo
{
    public int Value { get; set; }
    public void Double() { Value *= 2; }
}
...
var foo = new Foo { Value = 3 };
Action closed = foo.Double;
VAction<Foo> open = Func.VOpenAction<Foo>(closed);
var foo2 = new Foo { Value = 444 };
open(ref foo);
open(ref foo2);
Console.WriteLine(foo.Value);
Console.WriteLine(foo2.Value);
// prints:
// 6
// 888

I'm not entirely satisfied with the naming of Func.Open, Func.VOpen, Func.OpenAction, and Func.VOpenAction, so I'm very open to suggestions. Part of the problem is that overload resolution does not take type constraints into account, so even though Open and VOpen have constraints T:class and T:struct respectively, they need different names or the compiler complains of ambiguous methods. We also seem to need different names for Open/OpenAction or there is another ambiguity as to whether we want a delegate that returns a value or one that returns void.

Sasa.Func.Operator

Sasa.Func.Operator creates a delegate for the operator provided:

var add = Func.Operator<Func<int, int, int>>(Operator.Add);
Console.WriteLine(add(1, 2));
// output: 
// 3

If the types involved are all primitives, and so don't define any operator methods, then a function is dynamically generated with the corresponding CIL instructions for the operator. Sasa.Func.Operator is the main function powering Sasa.Operator<T>, Sasa.Operators<T0, T1>, and Sasa.Operators<T0, T1, T2>.
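This makes generic arithmetic possible, which the CLR otherwise doesn't support directly. The following is only an illustration of how I'd expect it to be used — caching the delegate in a static field, much as Sasa.Operator<T> presumably does internally:

```csharp
static class Sum<T>
{
    // resolved once per instantiation of T; for primitives this is a
    // dynamically generated function using the 'add' CIL instruction
    static readonly Func<T, T, T> add =
        Func.Operator<Func<T, T, T>>(Operator.Add);

    public static T Of(IEnumerable<T> items)
    {
        T acc = default(T);
        foreach (var x in items) acc = add(acc, x);
        return acc;
    }
}
// Sum<int>.Of(...) and Sum<double>.Of(...) now share one implementation
```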

Sasa.Func.Fix

Sasa.Func.Fix is a set of overloaded methods to generate recursive lambdas. Delegates built from lambdas can't refer to themselves to make recursive calls. Sasa.Func.Fix addresses this by providing what's known as a fixpoint function. Usage:

// equivalent recursive function:
// int Sum(int x) { return x > 0 ? x + Sum(x-1) : 0; }
var f1 = Func.Fix<int, int>(
         f => x => x > 0 ? x + f(x - 1) : 0);
Console.WriteLine(f1(2));
Console.WriteLine(f1(5));
// prints
// 3
// 15

Sasa.Func.Coerce

The type language for delegates is somewhat limited compared to the type language for interfaces and classes. For instance, interface methods support first-class polymorphism, where delegates do not. Combined with the fact that delegates are nominal types in their own right, this causes a proliferation of delegate types that are identical in type signature, but differ in nominal type and so cannot be substituted for each other. For instance, System.Predicate<T> is equivalent to System.Func<T, bool>, but you cannot use a delegate of one type in a place where the other delegate type is expected.

The Sasa.Func.Coerce extension method allows you to coerce one delegate type into another equivalent type. Usage:

Predicate<int> isthreep = x => x == 3;
Func<int, bool> isthreef = isthreep.Coerce<Func<int, bool>>();
Console.WriteLine("{0}, {1}", isthreep(3), isthreef(3));
Console.WriteLine("{0}, {1}", isthreep(4), isthreef(4));
// prints:
// true, true
// false, false

Sasa.Func.Generate

The Sasa.Func.Generate method overloads are the real workhorses behind the scenes. These methods eliminate all the boilerplate in generating and debugging a DynamicMethod, and are statically typed in the delegate type to generate. The most useful overloads by far are the ones that accept a boolean "saveAssembly" parameter. If saveAssembly:true is passed in, then the method is generated in a newly created assembly that is written to disk. You can then run peverify on it to check for any verification errors in your dynamically generated code. A single change to this parameter can switch between debugging and production modes.

Func.Generate is used throughout Sasa, even in Sasa.Func to create the thunks to dispatch open instance delegates for virtual methods and generic interface methods. Usage:

Func<int, int> addone = Func.Generate<Func<int, int>>(
                         type: typeof(int),
                         methodName: typeof(int).Name + "_addone",
                         generator: il =>
{
    il.Emit(OpCodes.Ldarg_0);
    il.Emit(OpCodes.Ldc_I4_1);
    il.Emit(OpCodes.Add);
    il.Emit(OpCodes.Ret);
});
Console.WriteLine("{0}, {1}, {2}", addone(0), addone(9), addone(998));
// prints
// 1, 10, 999

Friday, March 15, 2013

Sasa.Parsing - An Overview

Continuing my series of posts describing the new features in Sasa v0.9.4, I'm going to cover the new Sasa.Parsing.dll assembly. As the name implies, the assembly provides abstractions to define grammars and build parsers.

Sasa.Parsing.Lexing is the namespace that provides abstractions for lexical analysis. Sasa.Parsing.Pratt is the namespace that provides an abstraction to specify a grammar that will be lexed and parsed by a Pratt parser, aka a top-down operator precedence parser.

Sasa.Parsing.Lexing

Understanding lexing in Sasa starts with a very simple lexing interface that is general enough to tokenize pretty much any sort of string:

/// <summary>
/// Abstract lexer construction interface.
/// </summary>
/// <typeparam name="T">The type of a built lexer.</typeparam>
/// <typeparam name="TLexerState">The lexer state.</typeparam>
public interface ILexerBasic<T, TLexerState>
    where TLexerState : ILexerState, new()
{
    /// <summary>
    /// Lex nothing.
    /// </summary>
    /// <returns>
    /// A lexer that matches nothing, but returns trivially true as
    /// matched.
    /// </returns>
    T Empty { get; }

    /// <summary>
    /// Lex a character terminal.
    /// </summary>
    /// <param name="character">The character to match.</param>
    /// <returns>
    /// A lexer that matches <paramref name="character"/>.
    /// </returns>
    T Terminal(char character);

    /// <summary>
    /// Choose amongst a set of possible lexes.
    /// </summary>
    /// <param name="lexers">The list of lexers to apply.</param>
    /// <param name="merge">
    /// A function that collapses the set of lexes.
    /// </param>
    /// <returns>
    /// A lexer that matches given any of <paramref name="lexers"/>.
    /// </returns>
    T Alternate(LexMerge<TLexerState> merge, params T[] lexers);

    /// <summary>
    /// Compose a series of lexers.
    /// </summary>
    /// <param name="lexers">The list of lexers that must match, in
    /// sequence.</param>
    /// <returns>
    /// A lexer that matches an ordered sequence of lexers.
    /// </returns>
    T Sequence(params T[] lexers);

    /// <summary>
    /// Construct a recursive lexer.
    /// </summary>
    /// <param name="lexer">A function that builds a recursive lexer
    ///  given the current lexer.</param>
    /// <returns>A recursive lexer.</returns>
    T Recursive(T lexer);

    /// <summary>
    /// Applies a tag to the lexer state if lexing succeeds.
    /// </summary>
    /// <param name="identifier">
    /// The identifier to associate with the lexing rule.
    /// </param>
    /// <param name="lexer">The lexer to wrap.</param>
    /// <returns>
    /// A lexer that sets the identifier on success.
    /// </returns>
    T Tag(string identifier, T lexer);
}

The basic interface defines the operations from which you can construct a lexer that can recognize individual characters, compose a fixed sequence of lexers, choose from a fixed set of alternate lexers, or recursively apply a lexer until it terminates. The "Tag" operation simply adds a descriptive name to the lexer for debugging purposes. This interface is sufficient to lex pretty much any sort of token stream you'd need to, but won't necessarily build the most efficient lexer for complex strings.
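For instance, using only these basic operations, a lexer for a single decimal digit is just an alternation over the ten character terminals — a sketch, assuming some ILexerBasic<T, TState> implementation 'x' and some merge function:

```csharp
// sketch: 'x' is an ILexerBasic<T, TState>, 'merge' some LexMerge<TState>
T digit = x.Alternate(merge,
          x.Terminal('0'), x.Terminal('1'), x.Terminal('2'),
          x.Terminal('3'), x.Terminal('4'), x.Terminal('5'),
          x.Terminal('6'), x.Terminal('7'), x.Terminal('8'),
          x.Terminal('9'));

// a fixed two-digit token is then just a sequence of digit lexers
T twoDigits = x.Tag("(two-digits)", x.Sequence(digit, digit));
```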

The Alternate operation additionally accepts a "merge" parameter of the following delegate type:

/// <summary>
/// A lex merging function used to process all alternates using
/// only the stack.
/// </summary>
/// <typeparam name="T">The type of lexer state.</typeparam>
/// <param name="root">The actual lexer state returned.</param>
/// <param name="original">
/// The original lexer state before applying alternates.
/// </param>
/// <param name="current">
/// The lexer state from the current alternate.
/// </param>
public delegate void LexMerge<T>(ref T root, T original, ref T current)
    where T : ILexerState;

The Alternate lexer processes a series of possible lexes, and then invokes the LexMerge function to choose among them. LexMerge can keep all the lexes and the parser would return a parse forest, or it can apply some set of rules to choose the best lex to use, eg. choose longest match, or enforce only a single match.
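As a rough sketch of what such a merge rule might look like — hedged, since I'm inventing a MatchLength member on the state type purely for illustration; the actual longest-match policy shipped with Sasa.Parsing surely differs in detail:

```csharp
// hypothetical: assumes the lexer state exposes a MatchLength property
static void LongestMatch<T>(ref T root, T original, ref T current)
    where T : ILexerState
{
    // keep whichever alternate consumed the most input
    if (current.MatchLength > root.MatchLength)
        root = current;
}
```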

We add efficiency to the basic lexing interface with the extended lexer interface:

/// <summary>
/// An extended lexer interface providing complex, performance-optimized
/// lexer combinators.
/// </summary>
/// <typeparam name="T">The type of the built lexer.</typeparam>
/// <typeparam name="TState">The lexer state.</typeparam>
public interface ILexer<T, TState> : ILexerBasic<T, TState>
    where TState : ILexerState, new()
{
    /// <summary>
    /// Lex a string terminal.
    /// </summary>
    /// <param name="x">The string literal to match.</param>
    /// <returns>A lexer that matches <paramref name="x"/>.</returns>
    /// <remarks>
    /// This is merely a performance optimization. The following code
    /// snippet demonstrates the equivalence:
    /// <code>
    /// Literal("foo") == ILexerBasic.Sequence(
    ///                   ILexerBasic.Terminal('f'), 
    ///                   ILexerBasic.Terminal('o'), 
    ///                   ILexerBasic.Terminal('o'))
    /// </code>
    /// </remarks>
    T Literal(string x);

    /// <summary>
    /// Lex a character range.
    /// </summary>
    /// <param name="lower">
    /// The inclusive lower bound on the character range.
    /// </param>
    /// <param name="upper">
    /// The inclusive upper bound on the character range.
    /// </param>
    /// <returns>A lexer that matches a character range.</returns>
    /// <remarks>
    /// This is functionally equivalent to the base lexer interface as 
    /// follows:
    /// <code>
    /// Range('A', 'z') == ILexerBasic.Alternate(
    ///                    ILexerBasic.Terminal('A'),
    ///                    ILexerBasic.Terminal('B'),
    ///                    ...,
    ///                    ILexerBasic.Terminal('z'));
    /// </code>
    /// </remarks>
    T Range(char lower, char upper);

    /// <summary>
    /// Match a character using a predicate.
    /// </summary>
    /// <param name="matches">The predicate to check.</param>
    /// <returns>A lexer that matches the given predicate.</returns>
    /// <remarks>
    /// Whatever predicate is provided can be matched by a composition
    /// of the base lexer combinators.
    /// </remarks>
    T Where(Predicate<char> matches);

    /// <summary>
    /// A lexer that repeatedly attempts to match an underlying lexer.
    /// </summary>
    /// <param name="lexer">The underlying lexer.</param>
    /// <returns>A lexer that matches one or more times.</returns>
    /// <remarks>
    /// This is equivalent to:
    /// <code>
    ///  OneOrMore(X) ==  ILexerBasic.Recursive(x =>
    ///                   ILexerBasic.Sequence(X, x))
    /// </code>
    /// </remarks>
    T OneOrMore(T lexer);

    /// <summary>
    /// A lexer that repeatedly attempts to match an underlying lexer.
    /// </summary>
    /// <param name="lexer">The underlying lexer.</param>
    /// <returns>A lexer that matches zero or more times.</returns>
    /// <remarks>
    /// This is equivalent to:
    /// <code>
    /// ZeroOrMore(X) == ILexerBasic.Recursive(x =>
    ///                  ILexerBasic.Alternate(ILexerBasic.Empty, X, x))
    /// </code>
    /// </remarks>
    T ZeroOrMore(T lexer);

    /// <summary>
    /// A lexer that matches any one of a number of characters.
    /// </summary>
    /// <param name="chars"></param>
    /// <returns></returns>
    /// <remarks>
    /// This is equivalent to:
    /// <code>
    /// Any(c0, c1, ... cN) == ILexerBasic.Alternate(
    ///                        ILexerBasic.Terminal(c0),
    ///                        ILexerBasic.Terminal(c1),
    ///                        ...,
    ///                        ILexerBasic.Terminal(cN))
    /// </code>
    /// </remarks>
    T Any(string chars);

    /// <summary>
    /// A lexer that succeeds whether or not it matches.
    /// </summary>
    /// <param name="lexer">The lexer to apply.</param>
    /// <returns>An optional lexer.</returns>
    /// <remarks>
    /// This is equivalent to:
    /// <code>
    /// Optional(X) == ILexerBasic.Alternate(ILexerBasic.Empty, X)
    /// </code>
    /// </remarks>
    T Optional(T lexer);

    /// <summary>
    /// Lex using a regular expression.
    /// </summary>
    /// <param name="format">The regex to use.</param>
    /// <param name="options">The regex options.</param>
    /// <returns>
    /// A lexer that scans using a regular expression.
    /// </returns>
    T Regex(string format, RegexOptions options = RegexOptions.None);
}

These extended lexers don't add any lexing power, since they can be implemented in terms of the basic lexer operations as the code comments show. However, the implementations were added to optimize performance.

Given any string, here are some examples of building lexers:

ILexer<T, TState> x = ...
var whitespace   = x.OneOrMore(x.Where(char.IsWhiteSpace));
var digits       = x.OneOrMore(x.Where(char.IsDigit));

// the following recognizes valid variable names in some programming
// language ie. variables must start with a letter, followed by a
// sequence of letters or numbers
var variableName = x.Sequence(x.Where(char.IsLetter),
                              x.ZeroOrMore(x.Where(char.IsLetterOrDigit)));

In general, the lexer takes a string as input and produces a series of outputs designating the tokens. For more detailed information, you can review ILexerState and the lexer implementation details, but for the most part you will never need to know these details since lexing is completely abstract when defining grammars.

Sasa.Parsing.Pratt

Having covered lexing and produced a stream of tokens, we can now move to parsing those tokens using a simple, yet efficient Pratt parser. Pratt parsing performs no backtracking and is technically Turing-complete, so you should be able to parse as complicated a token stream as you like. However, we can start simply with the canonical calculator example. Here's an interface that defines the semantics of simple arithmetic:

/// <summary>
/// The semantics of a math language.
/// </summary>
/// <typeparam name="T">The type of math values.</typeparam>
interface IMathSemantics<T>
{
    T Int(string lit);
    T Add(T lhs, T rhs);
    T Sub(T lhs, T rhs);
    T Mul(T lhs, T rhs);
    T Div(T lhs, T rhs);
    T Pow(T lhs, T rhs);
    T Neg(T arg);
    T Pos(T arg);
    T Fact(T arg);
}

This clearly defines simple arithmetic operations on an abstract number type T. Here's a simple implementation for T = int:

/// <summary>
/// A simple arithmetic interpreter.
/// </summary>
sealed class MathInterpreter : IMathSemantics<int>
{
    public int Int(string lit) { return int.Parse(lit); }
    public int Add(int lhs, int rhs) { return lhs + rhs; }
    public int Sub(int lhs, int rhs) { return lhs - rhs; }
    public int Mul(int lhs, int rhs) { return lhs * rhs; }
    public int Div(int lhs, int rhs) { return lhs / rhs; }
    public int Pow(int lhs, int rhs) { return (int)Math.Pow(lhs, rhs); }
    public int Neg(int arg) { return -arg; }
    public int Pos(int arg) { return arg; }
    public int Fact(int arg)
    {
        return arg == 0 || arg == 1 ? 1 : arg * Fact(arg - 1);
    }
}

There shouldn't be anything too surprising here, so let's proceed to defining the parser for this math language. We start defining a grammar by inheriting from Grammar<T>:

/// <summary>
/// The structure of abstract arithmetic parsers.
/// </summary>
/// <typeparam name="T">The type of element being parsed.</typeparam>
class MathGrammar<T> : Sasa.Parsing.Pratt.Grammar<T>

Grammar inherits from a concrete implementation of ILexer, and thus exposes all the lexing operations, and additionally some operations for declaring parsing rules. Here's the full math parser:

/// <summary>
/// The structure of abstract arithmetic parsers.
/// </summary>
/// <typeparam name="T">The type of element being parsed.</typeparam>
class MathGrammar<T> : Sasa.Parsing.Pratt.Grammar<T>
{
    public MathGrammar(IMathSemantics<T> math)
       : base(Lexers.LongestMatch)
    {
        Infix("+", 10, math.Add);   Infix("-", 10, math.Sub);
        Infix("*", 20, math.Mul);   Infix("/", 20, math.Div);
        InfixR("^", 30, math.Pow);  Postfix("!", 30, math.Fact);
        Prefix("-", 100, math.Neg); Prefix("+", 100, math.Pos);

        Group("(", ")", int.MaxValue);
        Match("(digit)", OneOrMore(Where(char.IsDigit)), 0, math.Int);
        Skip("(whitespace)", OneOrMore(Where(char.IsWhiteSpace)));
    }
}

The MathGrammar accepts a math interpreter as an argument, and while declaring various parsing rules, registers callbacks into the interpreter to process the parsed values. The base interface of Sasa.Parsing.Pratt.Grammar accepts a LexMerge function to choose among alternate lexes. Sasa parsing provides a number of choices, though longest match will probably meet the most common needs.
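Putting it all together, driving the parser looks something like the following. I'm assuming here that Grammar<T> exposes a Parse method taking the input string; the actual entry point may be named differently:

```csharp
var grammar = new MathGrammar<int>(new MathInterpreter());
// assumed entry point
Console.WriteLine(grammar.Parse("2 + 3 * 4"));   // 14: * binds tighter than +
Console.WriteLine(grammar.Parse("2 ^ 3 ^ 2"));   // 512: ^ is right-associative
Console.WriteLine(grammar.Parse("(2 + 3) * 4")); // 20: grouping wins
```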

Proceeding with operator declarations, here's the signature for the Infix operators:

/// <summary>
/// Left-associative infix symbol.
/// </summary>
/// <param name="op">The operator symbol.</param>
/// <param name="bindingPower">The binding power of the operator.</param>
/// <param name="selector">
/// The function transforming the infix token.
/// </param>
/// <returns>A left-associative operator symbol.</returns>
Rule<T> Infix(string op, int bindingPower, Func<T, T, T> selector)

Rule<T> is a parsing rule that encapsulates the lexer, the precedence of the operator, and the function to apply when this rule is triggered. The expanded declaration for the addition operator should now be clear:

Infix(op: "+", bindingPower: 10, selector: math.Add);

The "op" parameter is simply a shortcut: it declares the operator symbol to associate with this rule and creates a lexer that recognizes only that symbol. The parser triggers this rule when it recognizes a token stream where this symbol appears in infix position, i.e. where the symbol is surrounded by two valid parses.

The "bindingPower" parameter is the absolute precedence of the declared operator. You can see from the math grammar above that addition has lower precedence than multiplication and equal precedence to subtraction, just as we all learned in elementary arithmetic. Given the definition of Infix, Postfix and Prefix should be self-explanatory: they declare postfix and prefix operators, respectively. Infix is left-associative, and InfixR is simply its right-associative counterpart.

The Group rule is defined as follows:

/// <summary>
/// A grouping operator, typically parenthesis of some sort.
/// </summary>
/// <param name="bindingPower">The binding power of the operator.</param>
/// <param name="leftGrouping">The left grouping symbol.</param>
/// <param name="rightGrouping">The right grouping symbol.</param>
/// <returns>A grouping operator.</returns>
Rule<T> Group(string leftGrouping, string rightGrouping, int bindingPower)

It simply declares that a token stream beginning with "leftGrouping", followed by some other parse, and terminated by "rightGrouping", should trigger this rule with the given precedence. Brackets have the highest precedence of all symbols in arithmetic, so the math parser declares groupings with precedence int.MaxValue.

Grammar's Match rule is where all the parsing starts, since it's what recognizes and parses numbers. Here is the signature:

/// <summary>
/// A rule that matches a custom lex.
/// </summary>
/// <param name="id">The identifier for this symbol.</param>
/// <param name="lex">The lexer for this rule.</param>
/// <param name="bindingPower">Operator's left binding power.</param>
/// <param name="parse">
/// Parsing function taking a string to a <typeparamref name="T"/>.
/// </param>
/// <returns>Literal symbol.</returns>
Rule<T> Match(string id, TLexerState lex, int bindingPower,
                    Func<string, T> parse)

The "id" parameter is simply a unique identifier for this rule, similar to how tagging is used for identifying lex rules. The Match rule accepts an arbitrary lexer as a parameter so you can recognize any sort of token sequence as belonging to this rule. The "bindingPower" is the precedence of this rule, as before, and the "parse" parameter is where we actually take the string recognized by the lexer, and turn it into a real value produced by the parser.

The only values we're concerned with are numbers, so the declaration for parsing numbers should be clear:

Match("(digit)", OneOrMore(Where(char.IsDigit)), 0, math.Int);

This declares a rule called "(digit)" whose lexer recognizes one or more digit characters. The rule has a precedence of 0, the lowest of all the rules in the math grammar. When a number is recognized, the rule dispatches into the math interpreter, which turns the digit string into a value of type T, the type of values that interpreter operates on.

What's particularly noteworthy in this design is that the types of values produced by the parser are completely abstract, and so we don't even need to produce actual numbers for values. For instance, we can create an implementation of IMathSemantics that produces an abstract syntax tree describing the expression. Furthermore, we can actually extend the parser definition more or less arbitrarily to add new operations, new parseable values, etc.
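For instance, here's a sketch of a semantics that produces an S-expression string describing the syntax tree rather than a number. The IMathSemantics&lt;T&gt; declaration below is inferred from the operations MathGrammar invokes; the actual Sasa interface may differ in detail:

```csharp
// Inferred from the calls MathGrammar makes; the real Sasa
// interface may differ in its details.
interface IMathSemantics<T>
{
    T Int(string lit);
    T Add(T lhs, T rhs);
    T Sub(T lhs, T rhs);
    T Mul(T lhs, T rhs);
    T Div(T lhs, T rhs);
    T Pow(T lhs, T rhs);
    T Neg(T arg);
    T Pos(T arg);
    T Fact(T arg);
}

// Builds an S-expression string describing the parse instead of
// evaluating it -- a trivial "abstract syntax tree" as text.
class SexprSemantics : IMathSemantics<string>
{
    public string Int(string lit) => lit;
    public string Add(string l, string r) => $"(+ {l} {r})";
    public string Sub(string l, string r) => $"(- {l} {r})";
    public string Mul(string l, string r) => $"(* {l} {r})";
    public string Div(string l, string r) => $"(/ {l} {r})";
    public string Pow(string l, string r) => $"(^ {l} {r})";
    public string Neg(string a) => $"(- {a})";
    public string Pos(string a) => a;
    public string Fact(string a) => $"(! {a})";
}
```

Plugging this into MathGrammar&lt;string&gt; should render "1 + 2*3" as "(+ 1 (* 2 3))", reflecting the declared precedences.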

Extensible Parsers

Starting with IMathSemantics, we can easily extend the math language by inheriting from IMathSemantics and adding variable declarations and applications:

/// <summary>
/// The semantics of an equation language.
/// </summary>
/// <typeparam name="T">The type of equation values.</typeparam>
interface IEquationSemantics<T> : IMathSemantics<T>
{
    T Var(string name);
    T Let(T x, T value, T body);
}

Note that the type T must now be able to represent both values and bindings of names to values, so we can't implement the equation semantics with T = int as we could for plain math. We must add a notion of an environment into which bindings are injected. This is pretty straightforward. Here's the type we'll use to unify numbers and environments:

delegate Pair<Env<string, int>, int> Eval(Env<string, int> env);

So T will be a delegate that accepts an environment as a parameter, and produces a new environment and an integer value as a result. Here's the corresponding implementation of IEquationSemantics:

class EquationSemantics : IEquationSemantics<Eval>
{
    public Eval Var(string name)
    {
       return e => Tuples.Create(e.Bind(name, 0),
                      e.FirstOrDefault(x => x.Key == name).Value);
    }
    public Eval Int(string lit)
    {
        return e => Tuples.Create(e, int.Parse(lit));
    }
    public Eval Add(Eval lhs, Eval rhs)
    {
        return e => Tuples.Create(e, lhs(e).Second + rhs(e).Second);
    }
    public Eval Sub(Eval lhs, Eval rhs)
    {
        return e => Tuples.Create(e, lhs(e).Second - rhs(e).Second);
    }
    public Eval Mul(Eval lhs, Eval rhs)
    {
        return e => Tuples.Create(e, lhs(e).Second * rhs(e).Second);
    }
    public Eval Div(Eval lhs, Eval rhs)
    {
        return e => Tuples.Create(e, lhs(e).Second / rhs(e).Second);
    }
    public Eval Pow(Eval lhs, Eval rhs)
    {
        return e => Tuples.Create(e, (int)Math.Pow(lhs(e).Second,
                                        rhs(e).Second));
    }
    public Eval Neg(Eval arg)
    {
        return e => Tuples.Create(e, -arg(e).Second);
    }
    public Eval Pos(Eval arg)
    {
        return e => Tuples.Create(e, arg(e).Second);
    }
    int Fact(int arg)
    {
        return arg == 0 || arg == 1 ? 1 : arg * Fact(arg - 1);
    }
    public Eval Fact(Eval arg)
    {
        return e => Tuples.Create(e, Fact(arg(e).Second));
    }
    public Eval Let(Eval x, Eval value, Eval body)
    {
        return e =>
        {
            var name = x(e).First.First();
            var bound = value(e).Second;
            var eval = body(e.Bind(name.Key, bound));
            return Tuples.Create(e, eval.Second);
        };
    }
}

A little more involved than the simple math language, but most of that is due to passing around the environment. The original operations are all essentially the same, just operating on the tuple's second, integer value. Only the new Let/Var operations are markedly different: Let adds a new binding to the environment, and Var looks up the closest binding with the given name in the current environment.
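To see the environment-threading pattern in isolation, here's a self-contained sketch that evaluates let x = 3 in x + 1 by composing delegates directly. The Env type below is just a stand-in for Sasa's environment (an immutable linked list of bindings, with illustrative names), and Eval is simplified to return only the int, since this sketch never needs to return a modified environment:

```csharp
using System;

// Stand-in for Sasa's environment type: an immutable linked list
// of name/value bindings. Binding never mutates; it prepends.
sealed class Env
{
    public readonly string Name;
    public readonly int Value;
    public readonly Env Next;
    public Env(string name, int value, Env next)
    {
        Name = name; Value = value; Next = next;
    }

    // Returns a new environment; the original is unchanged.
    public Env Bind(string name, int value) => new Env(name, value, this);

    // Finds the closest (most recently bound) value for a name.
    public int Lookup(string name)
    {
        for (var e = this; e != null; e = e.Next)
            if (e.Name == name) return e.Value;
        throw new ArgumentException("unbound: " + name);
    }
}

// Simplified from the blog's Eval: returns just the int.
delegate int Eval(Env env);

static class Demo
{
    // let x = 3 in x + 1, written directly as Eval combinators.
    public static int Run()
    {
        Eval value = e => 3;
        Eval body  = e => e.Lookup("x") + 1;
        Eval let   = e => body(e.Bind("x", value(e)));
        return let(new Env("", 0, null));
    }
}
```

Because Lookup walks from the most recent node, newer bindings shadow older ones, which is exactly why Var finds the closest binding for a name.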

The parser for this extended language is actually quite trivial:

/// <summary>
/// A parser for math expressions with lexically-scoped variables.
/// </summary>
/// <typeparam name="T">The type of equation values.</typeparam>
class EquationGrammar<T> : MathGrammar<T>
{
    public EquationGrammar(IEquationSemantics<T> eq) : base(eq)
    {
        Match("(ident)", Sequence(Where(char.IsLetter),
              ZeroOrMore(Where(char.IsLetterOrDigit))), 0, eq.Var);
        TernaryPrefix("let", "=", "in", 90, eq.Let);
        Disambiguate("let", "in", "(ident)");
    }
}

Like the semantics, the equation grammar inherits the implementation of the simple math grammar and simply adds a few new rules. The Match rule recognizes variable names, which must start with a letter followed by an arbitrarily long sequence of letters and digits.

The TernaryPrefix rule identifies operators of the following form: PREFIX expression INFIX expression POSTFIX. This is again a shortcut for declaring a common operator shape. In this case, we use the let-binding syntax from F# and other ML-family languages, i.e. let x = 3 in x + 1.
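Although I won't show Sasa's actual definition here, TernaryPrefix can be understood as a NullDenotation built from the same primitives used elsewhere in this post (Symbol, Parse and Advance). The following is an illustrative sketch, not Sasa's real code:

    // A sketch of how TernaryPrefix might be defined; illustrative
    // only, not necessarily Sasa's actual implementation.
    Rule<T> TernaryPrefix(string prefix, string infix, string postfix,
                          int bindingPower, Func<T, T, T, T> selector)
    {
        var sym = Symbol(prefix, 0);  // no left denotation, so power 0
        Symbol(infix);
        Symbol(postfix);
        sym.NullDenotation = parser =>
        {
            var first = parser.Parse(bindingPower);   // e.g. the variable
            parser.Advance(infix);                    // consume "="
            var second = parser.Parse(bindingPower);  // the bound value
            parser.Advance(postfix);                  // consume "in"
            var third = parser.Parse(bindingPower);   // the body
            return selector(first, second, third);
        };
        return sym;
    }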

The Disambiguate declaration simplifies the common case where we only want a single valid parse. The keywords "let" and "in" also match the "(ident)" rule for variable names, so without it, some parses would be ambiguous. Since we want to produce a single parse for any given string, Disambiguate lets you specify the priority of the rules: here, the "let" operator rule dominates the identifier rule, so "let" will never be parsed as an identifier.

As with MathGrammar, the concrete type of the parse, T, is completely abstract, so we can produce values, or parse trees, or any other type that can satisfy the constraints of IEquationSemantics. This parsing design allows independent extensions of grammars and parsed values, the latter of which can include direct interpreters, compilers, or other data structures that satisfy the required semantics.

Turing-Complete Parsing

So far we've reviewed simple parses with a set of pre-defined parse structures, like infix, prefix and ternary operators. However, Pratt parsing is Turing-complete, so the parsing can be arbitrarily complex. Digging into this machinery will demonstrate how the above operator rules were created, and how you can define your own operator rules.

I will demonstrate with a simple example of parsing lists of numbers:

sealed class ListGrammar : Grammar<IEnumerable<int>>
{
    public ListGrammar()
        : base(Lexers.SingleMatch)
    {
        Match("(int)", OneOrMore(Where(char.IsDigit)), 0, Int);
        var rule = Rule(new Rule<IEnumerable<int>>("(list)")
        {
            Lex = Terminal('['),
            LeftBindingPower = 10,
            NullDenotation = parser =>
            {
                var list = new List<int>();
                for (var tok = parser.Token;
                     tok.Id == "(int)";
                     tok = parser.Token)
                {
                    list.AddRange(parser.Parse(0));
                }
                parser.Advance("]");
                return list;
            }
        });
        Symbol("]");
        Skip("(whitespace)", OneOrMore(Where(char.IsWhiteSpace)));
    }
    static IEnumerable<int> Int(string s)
    {
        yield return int.Parse(s);
    }
}

I haven't bothered separating out the semantics for numbers and lists into their own interface for this simple example. The basic idea of defining a parsing rule is to declare the symbols that should appear in the token stream, and then register either a NullDenotation or a LeftDenotation, depending on the type of rule it's supposed to be.

A NullDenotation is a rule that has no parsed values to its left. The rule simply recognizes its starting tokens, then takes the current parser as an argument and proceeds to manually process the token stream, ultimately returning a parsed value:

NullDenotation: Func<PrattParser<T>, T>

A simple example of a null denotation is a prefix operator like negation. Parses to the right of a negation operator exist, but none to its left.

A LeftDenotation is a rule that expects some parses to its left. The signature of a LeftDenotation is:

LeftDenotation: Func<Token<T>, PrattParser<T>, T, T>

A LeftDenotation accepts the token representing the operator, the current parser, and the parsed value immediately to the left. Infix and Postfix operators are simple examples of a LeftDenotation.

The operation of the Infix, Postfix and Prefix declarations previously covered should now be clear:

Rule<T> Infix(string op, int bindingPower, Func<T, T, T> selector)
{
    var sym = Symbol(op, bindingPower);
    sym.LeftDenotation = (tok, parser, left) =>
                         selector(left, parser.Parse(bindingPower));
    return sym;
}
Rule<T> Postfix(string op, int bindingPower, Func<T, T> selector)
{
    var sym = Symbol(op, bindingPower);
    sym.LeftDenotation = (tok, parser, left) => selector(left);
    return sym;
}
Rule<T> Prefix(string op, int bindingPower, Func<T, T> selector)
{
    // do not set binding power since there is no left denotation
    var sym = Symbol(op, 0);
    sym.NullDenotation = parser =>
                         selector(parser.Parse(bindingPower));
    return sym;
}

The parser.Parse(int) operation performs a parse bounded by the expected precedence of the next rule. This is what makes a Pratt parser a simple and efficient forward-only parser.
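To see concretely why passing a binding power into Parse is enough to get precedence right, here's a stripped-down, self-contained version of the core Pratt loop over single-digit tokens and the + and * operators. This is my own illustration, not Sasa's implementation:

```csharp
// A stripped-down Pratt parser core: just the essential loop that
// a call like Parse(rbp) performs.
class TinyPratt
{
    readonly string src;
    int pos;
    public TinyPratt(string src) { this.src = src; }

    static int BindingPower(char op) =>
        op == '+' ? 10 : op == '*' ? 20 : 0;

    // Parse keeps consuming infix operators as long as the next
    // operator binds more tightly than rbp, recursing with each
    // operator's own binding power to parse its right operand.
    public int Parse(int rbp)
    {
        int left = src[pos++] - '0';   // null denotation: a digit
        while (pos < src.Length && BindingPower(src[pos]) > rbp)
        {
            char op = src[pos++];      // left denotation: an infix op
            int right = Parse(BindingPower(op));
            left = op == '+' ? left + right : left * right;
        }
        return left;
    }
}
```

With these powers, new TinyPratt("1+2*3").Parse(0) yields 7, not 9; and since the loop only continues while the next operator binds strictly tighter, equal powers associate to the left. Right-associativity, as in InfixR, falls out of recursing with bindingPower - 1 instead.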

We can use any looping or testing constructs within NullDenotation and LeftDenotation, so the token streams that can be recognized are arbitrarily complex. The grammars definable using Sasa.Parsing are declarative, simple and quite efficient, so download and give it a whirl!