Sunday, December 5, 2010

Sasa v0.9.3 Released!

I recently realized that it's been over a year since I last put out a stable Sasa release. Sasa is in production use in a number of applications, but the stable releases on Sourceforge have lagged somewhat, and a number of fixes and enhancements have been added since v0.9.2.

So I decided to simply exclude the experimental and broken abstractions and push out a new release so others could benefit from everything Sasa v0.9.3 has to offer.

The changelog contains a full list of changes, too numerous to count. I'll list here a few of the highlights.

IL Rewriter

C# unfortunately forbids certain types from being used as generic type constraints, even though these constraints are available to CIL. For instance, the following is legal CIL but illegal in C#:
public void Foo<T>(T value)
where T : Delegate
Sasa now provides a solution for this using its ilrewrite tool. The above simply uses the Sasa.TypeConstraint<T> in the type constraint and the rewriter will erase all references to TypeConstraint leaving the desired constraints in place:
public void Foo<T>(T value)
where T : TypeConstraint<Delegate>
The Sasa library itself makes pervasive use of these type constraints to provide generic versions of static functions on System.Enum, thread-safe delegates add/remove, null-safe delegate calls, and more.
Simply call ilrewrite like so:
ilrewrite /verify /dll:[your dll] /[Debug | Release]

The /verify option runs peverify to ensure the rewrite produced verifiable IL. Pass /Debug if you're rewriting a debug build, and /Release if you're rewriting a release build. I have it set up as a Visual Studio post-build event, so I call it with /$(ConfigurationName) for this parameter.

Thread-safe and Null-Safe Event Handling

The CLR provides first-class functions in the form of delegates, but there are various problems that commonly creep up which Sasa.Events is designed to fix:

Invoking delegates is not null-safe

Before invoking a delegate, you must explicitly check whether the delegate is null. If the delegate is in a field instead of a local, you must first copy the field value to a local or you leave yourself open to a concurrency bug, where another thread may make the field null between the time you checked and the time you call it (this is commonly known as a TOCTTOU bug, ie. Time-Of-Check-To-Time-Of-Use).

This involves laborious and tedious code duplication that the C# compiler could have easily generated for us. The Sasa.Events.Raise overloads solve both of the above problems. Instead of:
var dlg = someDelegate;
if (dlg != null) dlg(x, y, z);

you can simply call:
someDelegate.Raise(x, y, z);
This is null-safe, and thread-safe.

Event add/remove is not thread-safe

Events are a pretty useful idiom common to .NET programs, but concurrency adds a number of hazards for which the C# designers provided less than satisfactory solutions.

For instance, declaring a publicly accessible event creates a hidden "lock object" that the add/remove handlers first lock before modifying the event property. This is not only wasteful in memory, it's also expensive in highly concurrent scenarios. Furthermore, this auto-locking behaviour is completely different for code residing inside the class as compared to code outside the class. Needless to say, this unnecessarily subtle semantics was constantly surprising C# developers.

Enter Sasa.Events.Add/Remove. These overloads accept a ref to a delegate field, and perform an atomic compare and exchange on the field directly, eliminating the need for lock objects, and providing more scalable event registration/unregistration. Code that looked like this:
event Action fooEvent;
fooEvent += newHandler;
or like this:
event Action fooEvent;
lock (this) fooEvent += newHandler;
can now both be replaced by this:
Action fooEvent;
Events.Add(ref fooEvent, newHandler);
This code has less overhead than the standard event registration code currently generated by any C# compiler, in both concurrent and non-concurrent settings.

Safe, Statically-Typed and Blazingly-Fast Reflection

Reflection is incredibly useful, and incredibly dangerous. You are forced to work with your objects as untyped data which makes it difficult to write correct programs, and the compiler can't help you.

Most operations using reflection are functions operating over the structure of types. To make reflection safe, we only need a single reflective function that breaks apart an object into a stream of field values. The client then provides a stream processing function (the reflection function) that handles all the type cases that it might encounter.

Sasa.Dynamics is here to help. Type<T>.Reflect is a static reflection function for type T which breaks up instances of type T into its fields (use DynamicType.Reflect if you're not sure of the concrete type).

The client need only provide an implementation of IReflector, which defines a callback-style interface completely describing the CLR's primitive types and providing you with an efficient ref pointer to the field's value for get/set purposes, and a FieldInfo instance providing access to the field's metadata:
public interface IReflector
void Bool(ref bool field, FieldInfo info);
void Int16(ref short field, FieldInfo info);
void Object<T>(ref T field, FieldInfo info);
The compiler ensures that you handle every case in IReflector. You handle non-primitive objects in IReflector.Object<T>, by recursively calling DynamicType.Reflect(field, this, fieldInfo).

Type<T> and DynamicType use lightweight code generation to implement a super-fast dispatch stub that invokes IReflector on each field of the object. These stubs are cached, so over time the overhead of reflection is near-zero. Contrast to the typical reflection overheads, and not only is this technique safer, it's significantly faster as well.

Extensible, Statically-Typed Turing-Complete Parsing

I covered the implementation of the Pratt parser before, and the interface has changed only a little since then. Pratt parsing is intrinsically Turing complete, so you can parse literally any grammar. The predefined combinators are for context-free grammars, but you can easily inject custom parsing functions.

What's more, each grammar you define is extensible in that you can inherit from and extend it in the way you would any other class. Here is a grammar from the unit tests for a simple calculator:
abstract class MathSemantics<T> : Grammar<T>
public MathSemantics()
Infix("+", 10, Add); Infix("-", 10, Sub);
Infix("*", 20, Mul); Infix("/", 20, Div);
InfixR("^", 30, Pow); Postfix("!", 30, Fact);
Prefix("-", 100, Neg); Prefix("+", 100, Pos);

Group("(", ")", int.MaxValue);
Match("(digit)", char.IsDigit, 1, Int);

protected abstract T Int(string lit);
protected abstract T Add(T lhs, T rhs);
protected abstract T Sub(T lhs, T rhs);
protected abstract T Mul(T lhs, T rhs);
protected abstract T Div(T lhs, T rhs);
protected abstract T Pow(T lhs, T rhs);
protected abstract T Neg(T arg);
protected abstract T Pos(T arg);
protected abstract T Fact(T arg);

The unit tests then contain an implementation of the grammar which is an interpreter:
sealed class MathInterpreter : MathSemantics<int>
protected override int Int(string lit) { return int.Parse(lit); }
protected override int Add(int lhs, int rhs) { return lhs + rhs; }
protected override int Sub(int lhs, int rhs) { return lhs - rhs; }
protected override int Mul(int lhs, int rhs) { return lhs * rhs; }
protected override int Div(int lhs, int rhs) { return lhs / rhs; }
protected override int Pow(int lhs, int rhs) { return (int)Math.Pow(lhs, rhs); }
protected override int Neg(int arg) { return -arg; }
protected override int Pos(int arg) { return arg; }
protected override int Fact(int arg)
return arg == 0 || arg == 1 ? 1 : arg * Fact(arg - 1);
Instead of interpreting directly, you could just as easily have created a parse tree.

The tests also contain an extended grammar that inherits from MathSemantics and adds lexically-scoped variables (see EquationParser at the link):

sealed class EquationParser : MathSemantics<Exp>
public EquationParser()
Match("(ident)", char.IsLetter, 0, name => new Var { Name = name });
TernaryPrefix("let", "=", "in", 90, Let);

MIME Parsing

Sasa contains a simple stand-alone assembly devoted to MIME types, file extensions, and functions mapping between the two (Sasa.Mime).

The Sasa.Net.Mail namespace in the Sasa.Net assembly, contains functions for parsing instances of System.Net.Mail.MailMessage from strings, including attachments in every encoding I've come across in the past few years. This code has been in production use in an autonomous e-mail processing program which has processed tens of thousands of e-mails over many years, with very few bugs encountered.

It can also format MailMessage instances into string form suitable for transmission over texty Internet protocols.


The library is also better factored than before, and has numerous handy extensions to IEnumerable/LINQ, strings, numbers, tuples, endian encoding, url-safe binary encoding, some purely functional collections, easy file system manipulation (first described here), Stream extensions (including stream-to-stream copying), endian-aware binary encoding, non-blocking futures, atomic exchange extensions, non-nullable types, lazy types, statically typed weak refs, code generation utilities (first described here), statistical functions, and much more.

The full docs for this release are available online.


Unfortunately, the efficient, compact binary serializers from the last release have been deprecated, and the replacements based on Sasa.Dynamics are not yet ready. The ASP.NET page class that is immune to CSRF and clickjacking attacks first released in v0.9 has been removed for now as well, since it depended on the compact binary serializer.

I have plenty of new developments in the pipeline too. 2010 saw many interesting safety enhancement added to Sasa as outlined above, and 2011 will be an even more exciting year I assure you!

Edit: the original download on sourceforge was missing the ilrewrite tool. That oversight has now been addressed.


John Zabroski said...

Sasa is shaping up to be pretty amazing. You should consider writing an OOPSLA practitioner's report next year about it. I think it would be the first time in history that I've read a practitioner's report and didn't fall asleep.

Sandro Magi said...

Thanks for the kind words! I'll consider that when Sasa gets a little closer to 1.0. Some of these features don't enjoy the degree of test coverage I'd like since they were just fun hacks at the time.

I'm not even sure what abstractions people find interesting. The ones I described in this post are the most innovative or unusual for .NET libraries that I've encountered, but I haven't received much feedback beyond the MIME library.

I have demo projects in mind for parsing (a small programming language), and serialization will be a good demonstration of dynamics, but I don't have a good demo in mind for the event functions. A simple concurrent program would be nice, so if you have any thoughts please let me know!

John Zabroski said...

Maybe I'll contribute some documentation to Sasa, then.

I've got a couple of projects I am working on now as-is: (1) SQLPSX PowerShell project's monitoring API and dashboardd (2) an ORM for Haskell (3) Porting Google Refine to .NET [non-trivial] (4) Porting Gource from C++ and SDL to F# with the help of C# SDL .NET library.

I just finished another port recently from a WPF app to Silveright (with improved abstractions and code re-use). I don't know why, but huge ports have been a fun hobby for me lately.

If you have anything you want help with, just make a list.


John Zabroski said...

Actually, maybe I could start with a tutorial on the Enum stuff in Sasa. I did some research a few months ago on all the available tools for manipulating Enum's in .NET when I wanted to add an EnumFlagsCSharpRepresentationVisualizer for Visual Studio 2008, similar to the same feature available in VS 2005+ for C++. I kind of went overboard with the research since all I needed to do was a coercive cast, but I just got really curious with the design options.

Have you seen Jon Skeet's Enum library?

Sandro Magi said...

Re: unconstrained-melody, yes, some of the features I implemented were inspired by Skeet's blog posts. I just took another quick look at the source there, and there is some odd code there for the enums. He builds some dictionaries for descriptions, and returns IList instead of IEnumerable, so it's definitely not as efficient or minimal as the interface Sasa provides.

As for what Sasa is missing, I think just tutorials and tests, mainly for the concurrent abstractions really. I briefly looked into using MS's CHESS concurrency checker to more rigourously verify the futures and event extensions, but haven't had time to really dig into it. That's definitely an interesting project though if you're looking for something cool to do.

Each assembly has a TODO.txt file detailing some of the plans I have, or incomplete/unsatisfactory abstractions, with a rough timeline I'm trying to stick to. You might be interested in the IL rewriting extensions I plan to add, such as pattern matching inlining, and an analysis to ensure a struct's default constructor is never called (mainly to make NonNull more useful). But anything you'd like to do is welcome!

John Zabroski said...

Give me some time and space to think about the pattern matching tricks.

I'll just focus on writing tutorials for the existing stuff, since it will also push me to read more of the code.

With regards to IL Rewriting, I consider that to be a very big topic. Have you seen the CCI-Metadata project on Codeplex? It is written by the same dude @ MSFT who wrote ILMerge. He pretty much just wanted a less crappy ILMerge. He should've used F# for CCI-Metadata though. I imagine the code being simpler and executing faster.

Also, as my friend jokes, "Nemerle, C# 5.0, Today!"

Sandro Magi said...

Re: CCI-Metadata, thanks for the ref, it's new to me. I'm just using Mono.Cecil 2.0 for rewriting, which sounds similar though less ambitious.

I was excited about Nemerle for awhile, but it's really difficult to gauge its status, and the devs aren't very interested in developing a community.

Re: pattern matching, I wasn't looking at anything fancy, just to inline the compiler-generated delegates for matching on Either types. It certainly could get more interesting considering runtime tests and casts are more efficient than vtable dispatch, but that's further down the road.

Anyhow, let me know the tutorials go! Feel free to submit bugs, suggestions or feature requests.

John Zabroski said...

I see CCI-Metadata as fitting into the PHX (Phoenix) compiler stack...

Peter said...

I found a small bug in PrattParser.cs (Sasa 0.9.3); the line counts are not updated for skipped tokens (see lines 567-573).

For my purposes I also had to add a "CurrentToken" property to PrattParser ('var t' from the main loop), else I couldn't see a way to access the Nud for a token during evaluation of its Led ('Token' has already advanced). I hope that explanation makes sense. It appears to not be an issue in the examples I've seen due to the operators tending to be fixed strings. Perhaps I have missed something.

Finally, can I propose the following addition, which I found made my life alot easier when producing scanners for my parser:

protected static Scanner ScanRegex(string regex)
Regex r = new Regex(@"\G" + regex);
return (input, start) =>
Match m = r.Match(input, start);
return m.Success ? start + m.Length : start;

Sandro Magi said...

Hey Peter,

Re: line counts, I believe you're correct. I'll implement a fix tonight or tomorrow.

Re: CurrentToken, can you provide a simple example? Your mention of fixed operators suggests you might be extending the symbol table dynamically with user-defined operators. I haven't applied PrattParser in this domain yet, so it might indeed be missing something important.

Re: ScanRegex, that's eminently doable. I generally try to avoid using control strings, so I'd prefer to actually add combinators for the regex matching you've found useful, but that's certainly a useful interim solution. Thanks!

Sandro Magi said...

I've fixed the line count bug, and also found another bug with the Grammar.Match declaration, so thanks Peter!

I've added the Regex scanner as well, with an optional argument for RegexOptions.

I haven't done anything regarding the previous token, which Peter named "CurrentToken". I know which token this refers to, but I'd like to better understand the specific circumstances where it's needed before making this change.

Assuming it's needed, I think the best solution would be to extend Token with a property designating the previous Token, so you can look back an arbitrary number of tokens.

Peter said...

Hi Sandro, thanks for your response.

The language I was parsing is GML, from the ICFP 2000 ray tracer task. I believe that falls into the category of "dynamic, functional programming language", which according to Doug Crockford's article should be easy with a Pratt parser.

It's really just lists of tokens (including nested lists for functions, arrays). I ended up adding a Led to every symbol:

protected Symbol ListSymbol(string id)
var lex = Symbol(id, 50);
lex.Led =
(parser, left) =>
list(left, parser.CurrentToken.Nud(parser));
return lex;

"list" is essentially 'cons' here. Notice though that I need to parse the CurrentToken, and because I need the match string, essentially, I have to call Token.Nud.

(By the way, the parser worked fine with those minor changes, and I went on to get a working ray tracer, so thanks!)

I expect I'm missing something obvious, as I'm new to Pratt parsers. Possibly, there could just be an alternate Parse method for returning a list of tokens (where there's a sort of default Led of list append).

I haven't needed this, but don't some languages allow you to apply a function f in infix position by writing 'f or `f or something? e.g. "1 'plus 2" Perhaps that would be an interesting case to prototype?

Incidentally, I think the \G I was prepending to the regex expression is important, otherwise the regex may match past the start location. It appears to be missing from the SVN version.

Sandro Magi said...

I added a few more convenience declarations to handle lists of values, and I committed a trivial list grammar to the test suite. It's basically just integer values and lists of integers.

I used a temporary hack to check token membership, but the essence of list parsing is there. I don't specify a Led anywhere, just a Nud on the list symbol, and no dependency on previous tokens. I basically just took the implementation of the Group declaration, and instead of just returning a single value between the open and close delimiters, I process a list of values.

It's not purely functional like your version, but the Pratt parsing algorithm is inherently imperative, so no great loss there IMO. I think a purely functional version is possible, it's just too late to think about it now. :-)

Re: regular expression and \G, I don't use regexes much, and I couldn't find much documentation on \G. Can you provide a simple test case that I can add to the test suite?

Peter said...

About the \G in RegEx strings, see e.g.

In short, it ensures the scanner will only match at the 'start' position given.

Given the current Grammer.RegEx method, a test would be something like:

Scanner s = RegEx("abc");
Assert(s("123abc", 2) == 2); // no match
Assert(s("123abc", 3) == 6); // match

I think the first assertion would fail as the code is right now. I suppose alternatively it could be left to the caller to add the @"\G" themselves, but that seems like an avoidable source of bugs.

Sandro Magi said...

Thanks for the references Peter, I just couldn't seem to find any discussion of \G. Why Match would take an index but seemingly ignore the parameter entirely unless you specify a cryptic regex option is beyond me.

I've committed the addition to Sasa.