Skip to main content


Showing posts from 2009

Purely Functional Collections vs. Mutable Collections

I've been experimenting with some purely functional collections, also known as persistent collections, and began to wonder precisely how much overhead they incur as compared to the standard framework collections on a VM platform such as .NET.

My open source Sasa library has long had a persistent stack type, considered a list in functional languages, and I was considering adding some additional persistent collections. I therefore wanted a better understanding of the various costs involved.

I had also wanted to test an interesting design alternative to standard class-based collections. One of the consistent nuisances on widely deployed VM platforms is the widespread presence of null values. C#'s extension methods somewhat mitigate this problem since you can call extension methods on null values and properly handle it, but calling instance methods on a null value throws a NullReferenceException. Therefore, you cannot invoke ToString or Equals without first checking for null.

This is…

Easy File System Path Manipulation in C#

I came across this type-safe module for handling file paths in the Haskell subreddit this week, and thought it looked kind of neat. Handling paths as strings, even with System.IO.Path always bugged me.

So I created a close C# equivalent of the Haskell type and added it to the Sasa library, to be available in the upcoming v0.9.3 release:
public struct FsPath : IEnumerable<string>, IEnumerable
public FsPath(IEnumerable<string> parts);
public FsPath(string path);

public static FsPath operator /(FsPath path1, FsPath path2);
public static FsPath operator /(FsPath path, IEnumerable<string> parts);
public static FsPath operator /(FsPath path, string part);
public static FsPath operator /(FsPath path, string[] parts);
public static FsPath operator /(IEnumerable<string> parts, FsPath path);
public static FsPath operator /(string part, FsPath path);
public static FsPath operator /(string[] parts, FsPath path);
public static implicit operator FsPath(string path);

public Fs…

Extensible, Statically Typed Pratt Parser in C#

I just completed a statically typed Pratt-style single-state extensible lexer+parser, otherwise known as a top-down operator precedence parser, for the Sasa library. The implementation is available in the Sasa.Parsing dll, under Sasa.Parsing.Pratt. Two simple arithmetic calculators are available in the unit tests.

This implementation is novel in two ways:
Aside from an alleged implementation in Ada, this is the only statically typed Pratt parser I'm aware of.
Pratt parsers typically require a pre-tokenized input before parsing semantic tokens, but I've eliminated this step by using the symbol definitions to drive a longest-match priority-based lexer.
Symbols by default match themselves, but you can optionally provide a scanning function used to match arbitrary string patterns. The symbol selected for the current position of the input is the symbol that matches the longest substring. If two symbols match equally, then the symbol with higher precedence is selected.

The design was he…

Abstracting over Type Constructors using Dynamics in C#

I've written quite a few times about my experiments with the CLR type system [1, 2, 3]. After much exploration and reflection, I had devised a pretty good approach to encoding ML-style modules and abstracting over type constructors in C#.

A recent question on Stack Overflow made me realize that I never actually explained this technique in plain English.

The best encoding of ML modules and type constructor polymorphism requires the use of partly safe casting.
An ML signature maps to a C# interface with a generic type parameter called a "brand". The brand names the class that implements the interface, ie. the module implementation.
An ML module maps to a C# class. If the module implements a signature, then it implements the corresponding interface and specifies itself as the signature's brand.
Since classes and interfaces are first-class values, an ML functor also maps to a class.
An ML type component maps to an abstract class that shares the same brand as the module. This e…

Sasa v0.9.2 Released

The newest Sasa release contains a number of bugfixes and increased compliance with MIME standards, including proper MIME MailMessage subject decoding. Perhaps of wider interest to the .NET community, there are a number of extension methods for thread safe and null-safe event raising; safe event raising is a major bane of the .NET developer's existence.

There are specific overloads for all System.Action delegate types. A general RaiseAny extension method is also provided that isn't very efficient, but which should match the signature of any event in the class libraries. More overloads can be provided for more efficient execution if needed.

Here is the complete list of changes from the changelog:
= v0.9.2 =

* fixed bug where quoted printable encoding failed when = was the last
* added Pop3Session.Reset method.
* compact serializer no longer uses stream positions to track cached
objects, so non-indexable streams, like DeflateStream, are now
* ISerializing interfac…

SlimTimer Timesheet Processing Utility

I use the very respectable SlimTimer to help me track my hours. Unfortunately, while they display a consistent total across all reports, the entries in each report do not necessarily add up to that total due to the fractional time units and the rounding involved.

Unfortunately, the accounting department tends to frown upon inconsistencies like this, no matter the reason. My process thus far has been to simply export a full timesheet with the report settings specifying time units as precisely as possible, and then performing the sum myself on the resulting chart.

This got a bit tedious, so I wrote a program to compile the necessary tallies over however many timesheet files I wanted to process. The source and binaries are available for download. Simply drag and drop any number of timesheets generated from SlimTimer, and the utility will generate a new csv file for each timesheet, with an extension "-out.csv".

The format of the output csv file is formatted for my invoice structure…

Mobile Code in C# via Finally Tagless Interpreters

Awhile back I described an idea to transparently execute code server-side or client-side given the same program. I've finally gotten around to implementing this using my encoding for type constructor polymorphism in C#.

Here is an example you can run from the original paper showcasing exponentiation which can execute transparently either server-side or client-side. The server-side code is intentionally limited to bases less than 100 and exponents less than 20.

Here is some simplified code that defines an exponentiation function:
void Build<B, R>(B _)
where B : ISymantics<B>, new()
var e = _.Lambda<int, Func<int, int>>(
x => _.Fix<int, int>(self => _.Lambda<int, int>(n =>
_.If(_.Lte(n, _.Int(0)), () => _.Int(1),
() => _.Mul(x, _.Apply(self, _.Add(n, _.Int(-1))))))));
The parameter "_" is the tagless interpeter and it's type is ISymantics<B>,…

Sasa v0.9 Released

Sasa v0.9 has been released. See the changelog for a detailed description of the changes. Here is the original release description for Sasa v0.8. This post will describe only changes from v0.8.

Backwards-incompatible changes:
Renamed Sasa.Collections.List to Sasa.Collections.Seq to avoid clashes with System.Collections.Generic.List
Restructured the list operators to better support chaining
Useful additions include:
Sasa.Weak<T> which wraps WeakReference
Additional string processing functions, like StringExt.SliceEquals
Array-processing combinators under Sasa.Collections.Arrays (Slice, Dup, etc.)
Stream functions under Sasa.IO (CopyTo, ToArray)
Support for MIME decoding and encoding for MailMessage parsing
Better conformance to RFCs for Pop3Client and MailMessage parsing
Concurrency bugfix in Sasa.Lazy.
MIME MailMessage parsing and the Pop3Client are already in use in production code, and conformance appears adequate after hundreds of processed messages.

Experimental Additions
There a…

Garbage Collection Representations Continued II

The tagged pointer representation described in Xavier Leroy's ZINC paper is compelling in its simplicity. Most data manipulated in typical programs is integer or pointer-based in some way, so using a 1-bit tagged representation allows unboxed integers, resulting in an efficient representation of the most common data types.

I've never been satisfied with the size header in heap-allocated types though, and the two bits reserved for GC pretty much forces you to use some mark-sweep variant. Integrating something like age-oriented collection with reference counting for the old generation would require an entirely new word for every block of memory. This is a prohibitive cost in functional programs.

Removing the Size Header
Reading about BIBOP techniques used in the Streamflow allocator gave me the idea of using page masking to determine the block size.

As in Streamflow, consider an allocator where each page obtained from the OS was partitioned into a list of equally-sized blocks. Each …

Sasa Reborn!

My .NET Sasa library has fallen by the wayside as I experimented with translating various functional idioms in my FP# library. Reading up on what a few other generic class libraries have been experimenting with, like Mono Rocks, spurred me to putting those experiments to use and updating Sasa. I significantly simplified a lot of the code, documented every class and method, and generalized as much as possible. The license is LGPL v2, and you can download the source via svn:
svn co sasa
Sasa Core v0.8
A set of useful extensions to core System classes and some useful classes for high assurance development.
Named tuple types: Pair, Triple, Quad.
Either types, representing one of many possible values. There are Either types for 2, 3, and 4 parameters, mimicking the Pair, Triple, and Quad structure. Tuples are "product" types, while Either is a "sum" type, and products and sums are duals. Since products are useful, I fig…

The cost of type tests and casts in C#

Awhile back I ran some tests comparing the dispatching performance of runtime tests+casts against double dispatch. Turned out runtime type tests and casting were noticeably faster than dispatching, probably because they avoid more pipeline stalls.

Unfortunately, there is a "common wisdom" in the .NET world that an "is" test followed by an "as" cast is performing two casts, and in fact one should simply perform the "as" cast then check the result against null:
// prefer this form
string a = o as string;
if (a != null)

// to this form:
if (o is string)
string a = o as string;
In fact, that's not the case, as any compiler worth its salt will coalesce the two tests into a single cast and branch operation. I took the tests from the above dispatching and altered them to perform the cast-and-null check, then I ran the tests again with the original is-then-as form. The latter form was about 6% faster on ev…

Embedded Stack Language for .NET - Redux

Awhile ago, I had posted about an embedding of a stack language in C#. The type signatures of the functions and the stack object encoded the consumption and production of stack values, so if your program compiled, it ran correctly.

Unfortunately, the prior structure had a safety problem when generating code which I noted, but didn't have time to address.

The new structure provided below does not have the safety problem, and any functions that compile are guaranteed to execute correctly. I have also altered the style to emphasize the row variable representing the "rest of the record" which the operation knows nothing about. The row variable is denoted by "_".

This is still a fairly limited embedding, but I have added a few convenience functions, and may yet add more. Here is a sample program:
var d = new DynamicMethod("test", typeof(void), null);
var s =
1.Load() // load constant: { int }
.Int(2) // load constant: { in…