Sunday, December 5, 2010

Sasa v0.9.3 Released!

I recently realized that it's been over a year since I last put out a stable Sasa release. Sasa is in production use in a number of applications, but the stable releases on SourceForge have lagged somewhat, and a number of fixes and enhancements have been added since v0.9.2.

So I decided to simply exclude the experimental and broken abstractions and push out a new release so others could benefit from everything Sasa v0.9.3 has to offer.

The changelog contains the full list of changes, which are too numerous to cover here, so I'll list just a few of the highlights.

IL Rewriter


C# unfortunately forbids certain types from being used as generic type constraints, even though these constraints are available to CIL. For instance, the following is legal CIL but illegal in C#:
public void Foo<T>(T value)
    where T : Delegate
{
    ...
}
Sasa now provides a solution for this using its ilrewrite tool. Simply wrap the desired type in Sasa.TypeConstraint<T> within the constraint, and the rewriter will erase all references to TypeConstraint, leaving the desired constraint in place:
public void Foo<T>(T value)
    where T : TypeConstraint<Delegate>
{
    ...
}
The Sasa library itself makes pervasive use of these type constraints to provide generic versions of the static functions on System.Enum, thread-safe delegate add/remove, null-safe delegate calls, and more.
Simply call ilrewrite like so:
ilrewrite /verify /dll:[your dll] /[Debug | Release]

The /verify option runs peverify to ensure the rewrite produced verifiable IL. Pass /Debug if you're rewriting a debug build, and /Release if you're rewriting a release build. I have it set up as a Visual Studio post-build event, so I call it with /$(ConfigurationName) for this parameter.

Thread-safe and Null-Safe Event Handling


The CLR provides first-class functions in the form of delegates, but there are various problems that commonly creep up which Sasa.Events is designed to fix:

Invoking delegates is not null-safe


Before invoking a delegate, you must explicitly check whether the delegate is null. If the delegate is in a field instead of a local, you must first copy the field value to a local, or you leave yourself open to a concurrency bug where another thread may set the field to null between the time you check it and the time you call it (this is commonly known as a TOCTTOU bug, i.e. Time-Of-Check-To-Time-Of-Use).

This involves laborious and tedious code duplication that the C# compiler could have easily generated for us. The Sasa.Events.Raise overloads solve both of the above problems. Instead of:
var dlg = someDelegate;
if (dlg != null) dlg(x, y, z);

you can simply call:
someDelegate.Raise(x, y, z);
This is both null-safe and thread-safe.
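
To illustrate why an extension method can solve both problems at once, here's a minimal sketch. The names RaiseSketch and RaiseDemo are hypothetical and this is not Sasa's actual implementation; it only assumes that passing the delegate as an argument takes the same snapshot that the manual copy-to-local does, and that extension methods may be invoked on a null reference.

```csharp
using System;

// Hypothetical helper for illustration only; not Sasa's actual code.
public static class RaiseSketch
{
    // The delegate reference is copied into the parameter at the call site,
    // so the null check and the invocation both see the same snapshot,
    // even if another thread clears the original field in between.
    public static void Raise<T>(this Action<T> handler, T arg)
    {
        if (handler != null) handler(arg);
    }
}

public static class RaiseDemo
{
    public static int Run()
    {
        Action<int> onChanged = null;
        onChanged.Raise(1);               // no subscribers: does nothing, no exception

        int total = 0;
        onChanged += x => total += x;     // subscribe a handler that accumulates
        onChanged.Raise(41);              // handler runs once with 41
        return total;
    }
}
```

A real library would need one overload per delegate arity, which is presumably why Sasa exposes Raise as a family of overloads.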

Event add/remove is not thread-safe


Events are a pretty useful idiom common to .NET programs, but concurrency adds a number of hazards for which the C# designers provided less than satisfactory solutions.

For instance, declaring a publicly accessible event creates a hidden "lock object" that the add/remove handlers first lock before modifying the event property. This is not only wasteful in memory, it's also expensive in highly concurrent scenarios. Furthermore, this auto-locking behaviour is completely different for code residing inside the class as compared to code outside the class. Needless to say, these unnecessarily subtle semantics constantly surprise C# developers.

Enter Sasa.Events.Add/Remove. These overloads accept a ref to a delegate field, and perform an atomic compare and exchange on the field directly, eliminating the need for lock objects, and providing more scalable event registration/unregistration. Code that looked like this:
event Action fooEvent;
...
fooEvent += newHandler;
or like this:
event Action fooEvent;
...
lock (this) fooEvent += newHandler;
can now both be replaced by this:
Action fooEvent;
...
Events.Add(ref fooEvent, newHandler);
This code has less overhead than the standard event registration code currently generated by any C# compiler, in both concurrent and non-concurrent settings.
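
For readers curious what an atomic compare-and-exchange on a delegate field looks like, here's a minimal sketch specialised to Action. EventsSketch and EventsDemo are hypothetical names, and this is an assumption about the general technique rather than Sasa's actual source; it relies only on Delegate.Combine, Delegate.Remove and Interlocked.CompareExchange from the BCL.

```csharp
using System;
using System.Threading;

// Hypothetical illustration of lock-free event registration; not Sasa's code.
public static class EventsSketch
{
    public static void Add(ref Action field, Action handler)
    {
        Action seen, updated;
        do
        {
            seen = field;                                        // snapshot current subscribers
            updated = (Action)Delegate.Combine(seen, handler);   // build the new invocation list
        }
        // Publish only if the field still holds the snapshot; otherwise another
        // thread changed it first, so retry against the fresh value.
        while (!ReferenceEquals(Interlocked.CompareExchange(ref field, updated, seen), seen));
    }

    public static void Remove(ref Action field, Action handler)
    {
        Action seen, updated;
        do
        {
            seen = field;
            updated = (Action)Delegate.Remove(seen, handler);    // drops the last matching entry
        }
        while (!ReferenceEquals(Interlocked.CompareExchange(ref field, updated, seen), seen));
    }
}

public static class EventsDemo
{
    public static int Run()
    {
        Action changed = null;
        int calls = 0;
        Action handler = () => calls++;

        EventsSketch.Add(ref changed, handler);
        EventsSketch.Add(ref changed, handler);
        EventsSketch.Remove(ref changed, handler);   // one registration remains

        if (changed != null) changed();
        return calls;
    }
}
```

The loop uses reference equality to detect a successful exchange, since that is how CompareExchange itself compares reference types; the delegate == operator compares invocation lists and could misreport a failed exchange as a success.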

Safe, Statically-Typed and Blazingly-Fast Reflection


Reflection is incredibly useful, and incredibly dangerous. You are forced to work with your objects as untyped data which makes it difficult to write correct programs, and the compiler can't help you.

Most operations using reflection are functions operating over the structure of types. To make reflection safe, we only need a single reflective function that breaks apart an object into a stream of field values. The client then provides a stream processing function (the reflection function) that handles all the type cases that it might encounter.

Sasa.Dynamics is here to help. Type<T>.Reflect is a static reflection function for type T which breaks up instances of type T into its fields (use DynamicType.Reflect if you're not sure of the concrete type).

The client need only provide an implementation of IReflector, which defines a callback-style interface completely describing the CLR's primitive types and providing you with an efficient ref pointer to the field's value for get/set purposes, and a FieldInfo instance providing access to the field's metadata:
public interface IReflector
{
    void Bool(ref bool field, FieldInfo info);
    void Int16(ref short field, FieldInfo info);
    ...
    void Object<T>(ref T field, FieldInfo info);
}
The compiler ensures that you handle every case in IReflector. You handle non-primitive objects in IReflector.Object<T>, by recursively calling DynamicType.Reflect(field, this, fieldInfo).

Type<T> and DynamicType use lightweight code generation to implement a super-fast dispatch stub that invokes IReflector on each field of the object. These stubs are cached, so over time the overhead of reflection is near-zero. Contrast this with typical reflection overheads: not only is this technique safer, it's significantly faster as well.
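
To make the callback style concrete, here's a small self-contained sketch of the idea. IMiniReflector is a hypothetical, cut-down stand-in for IReflector with only three cases, and NaiveReflect dispatches using plain System.Reflection instead of generated stubs, so it shows the shape of the API but none of the performance benefits described above.

```csharp
using System;
using System.Reflection;

// Hypothetical, cut-down stand-in for IReflector: two typed cases plus a catch-all.
public interface IMiniReflector
{
    void Int32(ref int field, FieldInfo info);
    void String(ref string field, FieldInfo info);
    void Object(ref object field, FieldInfo info);
}

public static class NaiveReflect
{
    // Walks the instance fields of obj and hands each one to the matching
    // typed callback. Values are copied out and written back, because plain
    // reflection cannot hand out a real ref to the field.
    public static void Reflect(object obj, IMiniReflector reflector)
    {
        const BindingFlags flags =
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;
        foreach (FieldInfo info in obj.GetType().GetFields(flags))
        {
            object value = info.GetValue(obj);
            if (info.FieldType == typeof(int))
            {
                int v = (int)value;
                reflector.Int32(ref v, info);
                value = v;
            }
            else if (info.FieldType == typeof(string))
            {
                string s = (string)value;
                reflector.String(ref s, info);
                value = s;
            }
            else
            {
                reflector.Object(ref value, info);
            }
            info.SetValue(obj, value);   // write back whatever the callback left behind
        }
    }
}

// Example data type and reflector, both invented for this sketch.
public sealed class Point
{
    public int X = 1;
    public int Y = 2;
    public string Label = "p";
}

public sealed class Doubler : IMiniReflector
{
    public void Int32(ref int field, FieldInfo info) { field *= 2; }
    public void String(ref string field, FieldInfo info) { field = field + "!"; }
    public void Object(ref object field, FieldInfo info) { }
}
```

Calling NaiveReflect.Reflect(new Point(), new Doubler()) visits X, Y and Label through the typed callbacks, so the reflector never has to cast or inspect types itself.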

Extensible, Statically-Typed Turing-Complete Parsing


I covered the implementation of the Pratt parser before, and the interface has changed only a little since then. Pratt parsing is intrinsically Turing complete, so you can parse literally any grammar. The predefined combinators are for context-free grammars, but you can easily inject custom parsing functions.

What's more, each grammar you define is extensible in that you can inherit from and extend it in the way you would any other class. Here is a grammar from the unit tests for a simple calculator:
abstract class MathSemantics<T> : Grammar<T>
{
    public MathSemantics()
    {
        Infix("+", 10, Add);   Infix("-", 10, Sub);
        Infix("*", 20, Mul);   Infix("/", 20, Div);
        InfixR("^", 30, Pow);  Postfix("!", 30, Fact);
        Prefix("-", 100, Neg); Prefix("+", 100, Pos);

        Group("(", ")", int.MaxValue);
        Match("(digit)", char.IsDigit, 1, Int);
        SkipWhile(char.IsWhiteSpace);
    }

    protected abstract T Int(string lit);
    protected abstract T Add(T lhs, T rhs);
    protected abstract T Sub(T lhs, T rhs);
    protected abstract T Mul(T lhs, T rhs);
    protected abstract T Div(T lhs, T rhs);
    protected abstract T Pow(T lhs, T rhs);
    protected abstract T Neg(T arg);
    protected abstract T Pos(T arg);
    protected abstract T Fact(T arg);
}

The unit tests then contain an implementation of the grammar which is an interpreter:
sealed class MathInterpreter : MathSemantics<int>
{
    protected override int Int(string lit) { return int.Parse(lit); }
    protected override int Add(int lhs, int rhs) { return lhs + rhs; }
    protected override int Sub(int lhs, int rhs) { return lhs - rhs; }
    protected override int Mul(int lhs, int rhs) { return lhs * rhs; }
    protected override int Div(int lhs, int rhs) { return lhs / rhs; }
    protected override int Pow(int lhs, int rhs) { return (int)Math.Pow(lhs, rhs); }
    protected override int Neg(int arg) { return -arg; }
    protected override int Pos(int arg) { return arg; }
    protected override int Fact(int arg)
    {
        return arg == 0 || arg == 1 ? 1 : arg * Fact(arg - 1);
    }
}
Instead of interpreting directly, you could just as easily have created a parse tree.

The tests also contain an extended grammar that inherits from MathSemantics and adds lexically-scoped variables (see EquationParser at the link):

sealed class EquationParser : MathSemantics<Exp>
{
    public EquationParser()
    {
        Match("(ident)", char.IsLetter, 0, name => new Var { Name = name });
        TernaryPrefix("let", "=", "in", 90, Let);
    }
    ...
}
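
If you haven't seen Pratt parsing before, the numbers passed to Infix, InfixR and Prefix above are binding powers. Here's a tiny self-contained sketch of the core loop for an integer calculator, to show how those numbers drive precedence and associativity. MiniPratt is a hypothetical class written for this illustration; it does not use or reproduce Sasa's Grammar<T> API, and it omits the postfix and unary plus operators.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: a minimal Pratt-style calculator, unrelated to Sasa's Grammar<T>.
public sealed class MiniPratt
{
    // Binding power per infix operator, mirroring the numbers in the grammar above.
    static readonly Dictionary<char, int> Power = new Dictionary<char, int>
    {
        { '+', 10 }, { '-', 10 }, { '*', 20 }, { '/', 20 }, { '^', 30 }
    };

    readonly string text;
    int pos;

    MiniPratt(string text) { this.text = text; }

    public static int Eval(string text)
    {
        var parser = new MiniPratt(text);
        int result = parser.Parse(0);
        parser.SkipSpaces();
        if (parser.pos < text.Length) throw new FormatException("Unexpected input at position " + parser.pos);
        return result;
    }

    void SkipSpaces()
    {
        while (pos < text.Length && char.IsWhiteSpace(text[pos])) pos++;
    }

    // Parses an expression, consuming only operators that bind tighter than minPower.
    int Parse(int minPower)
    {
        int left = Prefix();
        while (true)
        {
            SkipSpaces();
            int power;
            if (pos >= text.Length || !Power.TryGetValue(text[pos], out power) || power <= minPower)
                return left;
            char op = text[pos++];
            // '^' is right-associative: recursing with power - 1 lets a following
            // '^' be absorbed by the right operand, so a ^ b ^ c is a ^ (b ^ c).
            int right = Parse(op == '^' ? power - 1 : power);
            left = Apply(op, left, right);
        }
    }

    // Handles the tokens that can start an expression: negation, groups, integers.
    int Prefix()
    {
        SkipSpaces();
        if (pos >= text.Length) throw new FormatException("Unexpected end of input");
        char c = text[pos];
        if (c == '-') { pos++; return -Parse(100); }
        if (c == '(')
        {
            pos++;
            int inner = Parse(0);
            SkipSpaces();
            if (pos >= text.Length || text[pos] != ')') throw new FormatException("Expected ')'");
            pos++;
            return inner;
        }
        int start = pos;
        while (pos < text.Length && char.IsDigit(text[pos])) pos++;
        if (start == pos) throw new FormatException("Expected a number at position " + pos);
        return int.Parse(text.Substring(start, pos - start));
    }

    static int Apply(char op, int lhs, int rhs)
    {
        switch (op)
        {
            case '+': return lhs + rhs;
            case '-': return lhs - rhs;
            case '*': return lhs * rhs;
            case '/': return lhs / rhs;
            default:  return (int)Math.Pow(lhs, rhs);   // only '^' remains
        }
    }
}
```

Because the loop stops as soon as it meets an operator with a binding power no greater than minPower, left-associative operators fall out naturally, and a declarative grammar only has to record a table of tokens, powers and semantic actions.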

MIME Parsing


Sasa contains a simple stand-alone assembly devoted to MIME types, file extensions, and functions mapping between the two (Sasa.Mime).

The Sasa.Net.Mail namespace in the Sasa.Net assembly contains functions for parsing instances of System.Net.Mail.MailMessage from strings, including attachments in every encoding I've come across in the past few years. This code has been in production use in an autonomous e-mail processing program which has processed tens of thousands of e-mails over many years, with very few bugs encountered.

It can also format MailMessage instances into string form suitable for transmission over texty Internet protocols.

Miscellaneous


The library is also better factored than before, and has numerous handy extensions to IEnumerable/LINQ, strings, numbers, and tuples, plus url-safe binary encoding, some purely functional collections, easy file system manipulation (first described here), Stream extensions (including stream-to-stream copying), endian-aware binary encoding, non-blocking futures, atomic exchange extensions, non-nullable types, lazy types, statically typed weak refs, code generation utilities (first described here), statistical functions, and much more.

The full docs for this release are available online.

Deprecated


Unfortunately, the efficient, compact binary serializers from the last release have been deprecated, and the replacements based on Sasa.Dynamics are not yet ready. The ASP.NET page class that is immune to CSRF and clickjacking attacks first released in v0.9 has been removed for now as well, since it depended on the compact binary serializer.

I have plenty of new developments in the pipeline too. 2010 saw many interesting safety enhancements added to Sasa, as outlined above, and 2011 will be an even more exciting year, I assure you!

Edit: the original download on SourceForge was missing the ilrewrite tool. That oversight has now been addressed.