Sunday, April 20, 2014

Immutable Sasa.Collections.Tree vs. System.Collections.Dictionary vs. C5 HashDictionary

I've previously posted about Sasa's hash-array mapped trie, but I never posted any benchmarks. I recently came across this post on Stackoverflow which provided a decent basic benchmark between .NET's default Dictionary<TKey, TValue>, the C5 collection's hash dictionary, F#'s immutable map, and .NET's new immutable collections.

I slightly modified the file to remove the bench against the F# map and the new immutable collections since I'm still using VS 2010, and I added a simple warmup phase to ensure the methods have all been JIT compiled and the GC run to avoid introducing noise:

static void Warmup()
{
    var x = Tree.Make<string, object>();
    var y = new C5.HashDictionary<string, object>();
    var z = new Dictionary<string, object>();
    z.Add("foo", "bar");
    for (var i = 0; i < 100; ++i)
    {
        x = x.Add("foo" + i, "bar");
        y.Add("foo" + i, "bar");
        z.Add("foo" + i, "bar");
        var tmp1 = x["foo" + i];
        var tmp2 = y["foo" + i];
        var tmp3 = z["foo" + i];
    }
    x = default(Tree<string, object>);
    y = null;
    z = null;
    GC.Collect();
}

The results are still somewhat representative. This is a sample of an average output, where "Imm" is Sasa's immutable HAMT:

# - 100
SCGD -          0 MS -         25 Ticks
C5   -          0 MS -        887 Ticks
Imm  -          0 MS -        387 Ticks

# - 1000
SCGD -          0 MS -        257 Ticks
C5   -          0 MS -        294 Ticks
Imm  -          0 MS -        368 Ticks

# - 10000
SCGD -          1 MS -       4084 Ticks
C5   -          1 MS -       5182 Ticks
Imm  -          1 MS -       5436 Ticks

# - 100000
SCGD -         28 MS -      85742 Ticks
C5   -         32 MS -      99280 Ticks
Imm  -         32 MS -      97720 Ticks

Observations:

  1. C5's standard deviation was somewhat wider than both Sasa's HAMT and SCGD, so it's performance seems slightly less predictable
  2. Sasa's immutable HAMT appears to perform within 5% of the mutable C5 collection at all collection sizes
  3. Sasa's immutable HAMT appears to perform within 15% of the mutable SCGD for large collections where the hash table with higher load factors
  4. Small collections requiring a small load factor clearly advantage the mutable SCGD by up to an order of magnitude, an advantage not shared by C5 for some reason (possibly they maintain a higher load factor)
  5. C5's terrible performance on very small collections of 100 items was consistent on every test run, again possibly because they maintain a high load factor before resizing
  6. Sasa's HAMT takes just as much time to load 1000 items as it takes to load 100 items; this was consistent across every test run, and it's not clear why

Finally, while not exactly apples-to-apples, Sasa's HAMT is easily 3-4× faster than F#'s map given the numbers cited in the above Stackoverflow post. F# still has an advantage for very small collections though. Sasa's HAMT also appears to be at least 2× faster than the new immutable collections.

Also keep in mind that this benchmark only tests lookup performance. F#'s map would have an advantage over Sasa's HAMT in load performance because the HAMT does not yet include a "bulk-load" operation, which the F# map does appear to support.

Tuesday, April 1, 2014

A Truly Slim Read/Write Lock in C#

It's pretty well known that the CLR's ReaderWriterLock and ReaderWriterLockSlim have unappealing performance characteristics. Each class also encapsulates signficant state, which precludes its use in fine-grained concurrency across large collections of objects.

Enter Sasa.Concurrency.RWLock in the core Sasa assembly. This is the most lightweight R/W lock I could come up with, particularly in terms of resources used. It's a struct that encapsulates a simple integer that stores the number of readers and a flag indicating whether a writer is active.

The interface is similar to ReaderWriterLockSlim, although there are a few differences which are needed to keep the encapsulated state so small:

public struct RWLock
{
  // this field is the only state needed by RWLock 
  private int flags;

  public void EnterReadLock();
  public void ExitReadLock();
  public bool TryEnterReadLock();

  public void EnterWriteLock(object sync);
  public bool TryEnterWriteLock(object sync);
  public void ExitWriteLock(object sync);
}

Conceptually, EnterWriteLock calls Monitor.Enter(sync), which ensures that only a single writer acquires the write lock. It then sets the write bit in the "flags" state, and loops yielding its time slice until all read locks are released.

EnterReadLock also loops yielding its time slice until the write flag is cleared, and then it uses Interlocked.Increment to acquire a read lock, and Interlocked.Decrement to release the read lock.

The TryEnterReadLock and TryEnterWriteLock provide non-blocking semantics, so there is no looping. If the lock on 'sync' cannot be acquired, or the write flag is set, TryEnterWriteLock and TryEnterReadLock respectively return false immediately. They never block or loop under any circumstances.

The RWLock implementation is about 150 lines of heavily commented code, so it's easily digestible for anyone whose interested in the specifics. There are also some rules to abide by when using RWLock:

  1. The same 'sync' object must be passed to all write lock calls on a given RWLock. Obviously if you use a different object, more than one writer can proceed. Different objects can be used for different RWLocks of course.
  2. Recursive write locks are forbidden and will throw LockRecursionException. Recursive read locks are permitted.
  3. You cannot acquire a read lock inside a write lock, or a write lock inside a read lock. If you do, your program will immediately deadlock.

Unlike the base class libraries, none of my concurrency abstractions accept timeout parameters. Timeouts hide concurrency bugs and introduce pervasive non-determinism, which is partly why concurrent programs are traditionally hard to debug. Timeouts should be rare, and specified separately at a higher level than these low-level concurrency primitives.

Friday, February 14, 2014

Clavis 1.0.0-alpha2 Released

Stan Drapkin, of SecurityDriven.NET fame, was nice enough to provide a little feedback on the Clavis implementation. He pointed out a possible issue with parameter names not being properly URL-encoded, which this release fixes. I also applied a few minor optimizations so generating URLs should be a little faster. I've been using Clavis in production for a couple of years now, so it's fairly stable and user-friendly.

The Clavis wiki and issue tracker are available here.

Monday, January 27, 2014

Clavis Rebooted: Secure, Type-Safe URLs for ASP.NET

A few years ago, I wrote about a web security microframework for ASP.NET which provided a few primitives for secure parameter-passing and navigation. I've just released a public alpha on Nuget for anyone who's willing to try it.

The previous article covered the theoretic foundation of Clavis well enough, but it has undergone a few small revisions to make it easier to use and integrate more seamlessly with ASP.NET. This post will serve as an end-user introduction to Clavis, the rationale behind the design decisions, and the benefits it provides. As a brief summary to whet your appetite, here are the advantages that the Clavis library provides for an otherwise standard ASP.NET web forms or MVC project:

  • By default, URLs are derived from types, so the compiler ensures that every page that will be displayed actually exists. The default URL generated can be overridden via an attribute.
  • Declarative specification of the types and number of parameters a page accepts, which the compiler checks for you. Any type can appear in this specification.
  • Query string parameter names are managed for you, so you rarely need to mess about with parameter strings. In fact, parameter strings never appear anywhere unless you want to override the default parameter name.
  • Taint-checking by declarative specification of unprotected page parameters, which the compiler ensures you handle properly.
  • Incremental integration into existing projects, so adopting Clavis is not an all-or-nothing proposition.

If any of this interests you, please read on.

Overview

Query Parameters: The Problems

The web provides a few mechanisms to pass data to servers. The most commonly used are probably query parameters, like http://google.com/search?q=http+query+string. This is a fairly straightforward means of providing the server with named parameters carrying data. You can view an HTTP request as a function call that returns some content, and the query parameters as the named arguments for that function.

This mental model actually works pretty well. An HTTP server generates content with embedded function calls that the client's browser invokes to access more content. Unfortunately, the limitations of query parameters become immediately obvious. Because function invocations are transparent, malicious clients can easily alter server-specified parameter values. Sometimes this behaviour is allowed, like the previous google query example, but there are many scenarios where we don't want clients to be able to make such changes. For instance, we often want to specify query parameters that happen to be easily guessable data layer identifiers for objects, but because these parameters can be changed and are easily guessable, clients can easily obtain access to data they shouldn't.

Unfortunately, there's no simple means of preventing clients from changing such parameters, which means raw query parameters can't be used to carry trusted content. Cookies and sessions were then invented to address some of the limitations of query parameters, but they carry their own set of problems.

Implicit Sessions: The Problems

Fundamentally, most site problems with scaling, usability and security can be traced back to the implicit sessions that ASP.NET creates for you.

The security problems are well known: the session encourages a style of server-side parameter passing that immediately opens you up to cross-site request forgery attacks (CSRF). This is typically addressed by adding a new security mechanism to close this hole, but this is arguably the wrong approach which is endemic to some security practices. Instead, don't "add security", remove insecurity.

The scaling problem is simply due to the fact that the session itself can't be part of the presentation layer since it's stateful, but it also doesn't live in the data layer and so doesn't benefit from the robustness and replication of that layer either. In order to scale sessions, they require all the same mechanisms of a data layer which seemingly defeats the purpose of separating it from the data layer to begin with. However, session state is typically not placed in the data layer because it's not part of the domain model. Session state logically belongs to the navigation/interaction component of the presentation layer, so most developers don't want to "pollute" their domain model.

The usability problems are simple: sessions are often stored in cookies which are shared across different browser instances. It's thus far too easy to introduce surprising behaviour for users that view multiple pages of your site at once. The more state you store in the session, the more likely this will occur. For instance, it seems pretty common to store an object instance that's being "edited" in the session, and I was guilty of this too when I first started using ASP.NET because it's so convenient. The immediate implication is that a user can't edit two instances of the same class type at the same time in the same browser. Furthermore, a server upgrade or restart would reset these sessions so that users lose all of their changes.

This is often addressed by out-of-process sessions, but this introduces a whole new set of problems due to serialization. For instance, if the object being edited is one that was upgraded in an incompatible way, deserialization won't succeed so the user can't proceed where they left off. Also, the object could contain unserializable state. For instance, NHibernate lazy references and collections contain an embedded reference to a particular SQL connection. This connection can't be sent out of process, and whatever object instance is deserialized on a subsequent request can't be easily connected to the new SQL connection.

The same serialization and state lifecycle problems occur for sessions stored in an SQL database, although at least these benefit from data layer replication.

The Solution

While the prospects may seem rather bleak at this point, we need only revisit the options to see if there's a simpler alternative that addresses the requirements. For reasons explained in my previous post, the right way to send this sort of data to the server are query parameters, and not cookies and sessions which introduce far too many complications of their own. We only need some standard mechanism to ensure that certain query parameter values can't be changed.

In fact, a simple mechanism for preventing message tampering is already known: the HMAC. So basically, we just need to include an HMAC of the protected query parameters in every URL, and the server can easily ensure that those query parameters are unchanged on all subsequent requests.

The URL must have some standard form so that this checking can be automated by the framework, and in Clavis this URL takes the form of an additional query parameter named "clavis". A site URL that previously looked like:

http://host.com/foo?param1=bar

where 'param1' should be protected from tampering, will now look something like:

http://host.com/foo?-param1=bar&clavis=ab52H_6jKVeyH4h8

You can see above that Clavis prefixes the query parameter with a '-' character in order to distinguish the protected from unprotected parameters. Clavis currently utilizes RIPEMD-160, with the upper 80 bits folded into the lower 80 bits to generate the "clavis" parameter. The 80 bits are encoded using a URL-safe base64 encoding. That makes the clavis parameter 16 characters, which is sufficiently short to be human-readable, buit sufficiently unguessable for most purposes. Future updates may make the the choice of HMAC and parameter length configurable.

A web app using Clavis need only call Continuation.Validate() prior to processing the request, and if no exception is thrown then the protected parameters were unchanged. As long as all application state between pages is passed via protected query parameters, this also makes your app automatically immune to CSRF.

In reality, your app will be immune to CSRF as long as state that identifies the user is protected in this manner. Even if attackers can guess the user identifier, they can't generate a valid URL because they don't know the server-side private key used in the HMAC, and so CSRF is impossible.

Data Types

Logically, there are 3 types of values passed as query parameters: basic data values (string, int, etc.), basic data values that are proxies for server-side objects (like database keys), and lists of either of the previous two types. These are precisely the basic data types provided by Clavis. Basic data values are simply the primitive CLR types that implement IConvertible. For instance, here is a page accepting an int, a string and a decimal:

public class Foo : IContinuation<int, string, decimal>
{
  ...
}

If you don't know what a continuation is, don't worry about the specifics. You just need to understand that a page specification is declared via IContinuation<...> with the parameter types filled in. By default, all parameter types specified in an IContinuation<...> declaration are assumed to be protected. If we want to allow the client to change the decimal value, then we simply wrap it in Unsafe<T> like so:

public class Foo : IContinuation<int, string, Unsafe<decimal>>
{
  ...
}

The Unsafe<T> wrapper declares the value to be unprotected, and so it will not be included in the HMAC. Continuations containing no protected parameters, ie. all unsafe parameters or no parameters, will not generate a "clavis" HMAC parameter, so you can always generate semantically meaningful URLs when you really need them. This also makes it easy to support forms submitted via the HTTP GET method.

Suppose we want Foo to accept a list of strings. The declaration will now look like this:

public class Foo : IContinuation<int, IEnumerable<string>, Unsafe<decimal>>
{
  ...
}

List parameters generate multiple entries in the query string, as is the standard approach with URLs. For instance, one possible URL for the above Foo might be:

~/Foo?-Int32=0&-String=foo&-String=bar&-String=hello+world&Decimal=3.14

For an explanation of how the parameter names are generated, see the section below on URL generation. You can also nest list and unsafe declarations, so an unsafe list of string parameters will look like:

public class Foo : IContinuation<int, Unsafe<IEnumerable<string>>, Unsafe<decimal>>
{
  ...
}

Finally, Foo can also implement multiple continuation types:

public class Foo : IContinuation<int, Unsafe<IEnumerable<string>>, Unsafe<decimal>>
                 , IContinuation<DateTime, Unsafe<char>>
{
  ...
}

Note that the above continuation parameters can specify any type, since this declaration is supposed to be a logical specification of the parameters Foo accepts. For instance, here's a Foo that accepts InventoryItem and Customer objects:

public class Foo : IContinuation<InventoryItem, Customer>
{
  ...
}

The actual representation of those parameters when generating URLs is specified separately.

The parameters that are not IConvertible can also exploit proxy values, which are basic IConvertible values that represent the server-side non-IConvertible object, like InventoryItem above. This proxy type is Clavis.Id<TProxy, TType>, which is basically a logical identifier of type TProxy that designates an object instance of type TType. For instance, a Customer with an integer identifier 1234 has an Id<int, Customer> = 1234.

The Id<TProxy, TType> type shouldn't appear in a continuation specification, although you can if you really want to. It's typically used only inside the continuation when accepting/parsing URL parameters for non-IConvertible types. See below for more details.

In summary, any type can be used in a continuation declaration, and the special types Unsafe<T> and IEnumerable<T> hold special meaning in Clavis.

URL Generation

Clavis autogenerates URLs by convention based on the fully qualified type name. The translation is simply:

Some.Namespace.SomeClass+Inner => /Some/Namespace/SomeClass+Inner?

The assembly is not included because Clavis assumes that an external component actually resolves paths to concrete instances, which is consistent with standard ASP.NET conventions.

You generate a URL by calling the Continuation.ToUrl() overloads with the required parameters:

Continuation.ToUrl<Foo, InventoryItem, Customer>(
  item.AsParam(item.Id),
  customer.AsParam(customer.Id));

Alternately, you could use Continuation.Params().ToUrl(), which exploits a little more of C#'s type inference and so is perhaps more convenient:

Continuation.Params(item.AsParam(item.Id),customer.AsParam(customer.Id))
            .ToUrl<Foo>();

Note how InventoryItem and Customer must specify an IConvertible type as a proxy/representation for the query param, because they are not IConvertible themselves. Parameters ultimately all reduce to IConvertible primitive types, and you will receive a compile-time error if you try to specify a non-IConvertible type as a representation.

By default, Clavis generates the URL parameter names based on the class name, so the above two URLs will look like:

~/Foo?-InventoryItem=123&-Customer=456&clavis=asd1gh823mP

Each Param.AsParam() overload accepts an optional "key" parameter to override the parameter name, but this requires ensuring that the same name is used everywhere which can become rather inconvenient. It's much more convenient to just let Clavis handle naming for you wherever possible, and you can control parameter naming via class names which are checked by the compiler.

Processing Parameters

Now that we know how to specify continuations and their parameters, and we know how to create continuation URLs, let's see how URL parameters are actually processed inside a continuation. Clavis provides some overloaded static methods for extracting parameters, which are of the form Param.TryParseX<T>(out T value), where X is the index of the parameter in the continuation specification. Consider the following continuation:

public class Counter: System.Web.Page, IContinuation<int>
{
  int counter;
  ...
}

In the page's OnInit method, we can extract the integer parameter like so:

public class Counter: System.Web.Page, IContinuation<int>
{
  int counter;
  override protected void OnInit(EventArgs e)
  {
    if (this.TryParse0(out counter))
      this.lblMessage.Text = "Current: " + counter;
    else
      this.lblMessage.Text = "New counter: 0";

    base.OnInit(e);
  }
}

TryParse0 means "parse parameter 0 of the continuation specification", which in this case is of type Int32. TryParseX returns true if the query parameter exists and was successfully parsed, else it returns false. If we were trying to process the Nth continuation parameter, then we would call TryParseN.

As you can see, Clavis handles all the tedious IConvertible parameter parsing for you. In the case of non-IConvertible types, we utilize Id<TProxy, TType>:

public class CustomerDetails : Page, IContinuation<Customer>
{
  Customer cust;
  override protected void OnInit(EventArgs e)
  {
    Id<int, Customer> custId;
    if (this.TryParse0(out custId))
      cust = SomeDb.Customers.Single(x => x.Id == custId.Key);
    else
      throw new InvalidOperationException("No customer specified!");

    base.OnInit(e);
  }
}

Note again that Clavis handles the tedious parsing of the proxy value for you, and leaves it up to you to load the instance via Linq, NHibernate, etc. Sometimes parsing failure can be treated as an error like above, but when pages implement multiple continuation types, it's not necessarily an error:

public class Generic : Page, IContinuation<Customer>, IContinuation<Product>
{
  Customer cust;
  Product prod;
  override protected void OnInit(EventArgs e)
  {
    Id<int, Customer> custId;
    if (this.TryParse0(out custId))
      cust = SomeDb.Customers.Single(x => x.Id == custId.Key);

    Id<int, Product> prodId;
    if (this.TryParse0(out custId))
      prod = SomeDb.Products.Single(x => x.Id == prodId.Key);

    // NOTE: could optionally throw error here
    //if (cust == null && prod == null) throw new ArgumentNullException("prod or cust");

    base.OnInit(e);
  }
}

Clavis can even handle nesting, like parsing unsafe lists of object identifiers:

public class Generic: Page,IContinuation<Unsafe<IEnumerable<Customer>>>
{
  IEnumerable<Customer> customers;
  override protected void OnInit(EventArgs e)
  {
    Unsafe<IEnumerable<Id<int, Customer>>> custIds;
    if (this.TryParse0(out custIds))
    {
      var ids = custIds.Value.Select(x => x.Key);
      customers = SomeDb.Customers.Where(x => ids.Contains(x.Id));
    }
    base.OnInit(e);
  }
}

Conclusion

ASP.NET isn't the ideal web framework, but I've used Clavis in a few projects to automate some of the tedium and eliminate some common errors and security holes in such applications. Because URLs are derived from types, the compiler ensures that programs have no dangling page references. Furthermore, because unsafe/unprotected parameters that the client can change are identified by a distinct type, the compiler also ensures that you're aware of every point a potentially unsafe value is used.

Finally, you can declare a high-level specification of the types a page accepts as parameters, and the compiler ensures that any pages that redirect to this page call it with the appropriate number of parameters, with the correct types, and it automatically generates the parameter names for you so you don't have to mess about with strings.

All of this is achieved in a piecemeal fashion, so you don't have to adopt Clavis within an entire project all at once. You can instead just convert one page at a time, updating all pages that redirect to it.

Full API documentation is available online, or as a downloadable .CHM file. The latest Clavis version can be downloaded here, or via Nuget.

For support, see the Clavis Trac server where all development can be tracked. The ticket system is open for bug reports, requests or questions.

Saturday, December 21, 2013

Sasa v0.12.0 Released

Just a quick announcement that Sasa v0.12.0 was just released. You can obtain the individual assemblies via Nuget, or the whole set from Sourceforge. The docs are available as a CHM file on sourceforge, or online here.

Overview

This release includes a few fixes, the most prominent of which are in the HTML parser and the parsing of MIME linked resources.

Note that since v0.11.0, some extension methods on MailMessage have been deprecated due to Microsoft's recommended usage guidelines. The headers those extension methods accessed are overwritten whenever a MailMessage is sent via SmtpClient, so it's better not to rely on them.

A new feature is the sasametal utility, which is basically a wrapper around sqlmetal that normalizes some of the bizarre property names from sqlmetal into more CLR friendly names. A forthcoming blog post will cover the use of this tool.

Other new features include first-class references, and first-class slots. These are currently in the core Sasa assembly, but may be moved to a satellite assembly in the future if they don't see enough use.

Other changes cover backend work that simply expands the power of pre-existing features, like a more efficient and reusable PIC dispatch, now supporting up to 16 type arguments. There's also a new assembly, Sasa.Partial, which provides partial application overloads for System.Func and System.Action, up to 10 arguments.

Changelog

 * added Strings.RemoveLast extension which is a convenient way of removing
   characters from the end of a StringBuilder.
 * made MailMessage parsing stricter to conform to Microsoft's usage
   recommendations for MailMessage
 * deprecated certain header parsers, like ContentType() and
   ContentTransferEncoding(), since .NET strips these when sending mail anyway
 * HTML views are no longer unpacked into MailMessage.Body since docs say this
   property is reserved for plain/text
 * QuotedPrintable decoding is now more permissive to encoding errors
 * merged sasametal tool that normalizes the output of sqlmetal
 * ilrewrite now only outputs pdb file if an input pdb was available
 * adapted MailMessage parsing code to .NET 4.0 (mainly ReplyToList)
 * added text encoding extension methods to ContentType
 * added extension method for filtering a sequence of attachments by media type 
 * added extension method for extracting attachment data as a string using the
   encapsulated encoding
 * added extension method to overwrite an attachment's data using string
   content using the encapsulated encoding
 * Strings.SliceEquals now has an overload accepting a StringComparison
   parameter to customize the comparison type performed
 * Tokenizer now takes an optional parameter for the type of string
   comparisons
 * HTML parser now performs case-insensitive comparisons
 * now correctly parsing alternate view linked resources
 * added implementations for first-class references to all core CLR types
 * added implementations for first-class slots to all core CLR types
 * PIC now based on a concurrency-safe map that looks up a tuple of System.Type
 * PIC and CLR expression compiler can now efficiently dispatch up to 16 params
 * Sasa.Dynamics simplified by switch to new PIC
 * added .NET 3.5-specific overloads of System.Func and System.Action for up to
   16 params
 * added Sasa.Partial assembly, which provides partial application overloads
   for System.Func and System.Action for up to 10 params
 * fixed a couple of bugs in Sasa.Dynamics codegen

Monday, November 18, 2013

First-Class Slots for .NET

Yesterday, I posted about the new extension of IRef<T> to arbitrary reference semantics for the CLR, including referencing inner fields, properties, and array slots. First-class references make it simple to operate on specific mutable data without caring about the underlying type of that data.

I just pushed another abstraction that handles a related, but different case: first-class slots.

ISlot<T>

An object "slot" is a value that designates a mutable location of a specific class of values, not a mutable location of a specific instance like first-class references. Where first-class references hide the underlying object type, slots expose the object type and allow you to mutate the slots of multiple objects at once, as long as they are subtypes of the slot's object type. Here's the declaration of ISlot<T>:

public interface ISlot<TObj, T>
{
    T Get(TObj obj);
    T Set(TObj obj, T value);
}

As you can see, the object we're manipulating is passed in while operating on it, so an ISlot is actually a direct descriptor that can access members of any instance of a subtype of TObj. If you're wondering when you would ever need this, rest assured that there are applications that need to write algorithms of this type. For instance, consider an object-relational mapper (ORM), which uses reflection to extract the members that need to be get/set on object flush/load from a database. Essentially, the ORM is reflecting over all slots of an object being flushed or loaded from a database, but it does so in a manner that isn't very reusable, and the object hydration code is coupled tightly to the database access code as a result.

Reifying slots as a distinct, first-class abstraction makes them independently testable, and the reflection code and database access code is now very loosely coupled. An ORM is but one example of an application that makes use of generic object slots.

Similar to first-class references, the Slot class exposes some static constructors for creating slots:

public abstract class Slot
{
  public static Member<TObj, T> Create<TObj, T>(Func<TObj, T> get, Action<TObj, T> set);
  public static Array<T> Create<T>(int index);
  public static Member<TObj, T> Create<TObj, T>(MemberInfo member)
  public static Member<TObj, T> Create<TObj, T>(Expression<Func<TObj, T>> member)

  public sealed class Array<T> : Slot, ISlot<T[], T> { ... }
  public sealed class Member<TObj, T> : Slot, ISlot<TObj, T> { ... }
}

These operations are exactly what you saw in the last article, where you can create slots from object members and array indices.

It might seem at first that slots are strict generalizations of first-class references, but this is deceptive. It's true that any algorithm you'd could write using references could be rewritten to use slots, but the number of type parameters and value parameters could increase non-linearly to the point where it's unwieldy and could easily obfuscate the underlying algorithm.

Limitations

There is one limitation at the moment. Specifically, value types need to be passed by reference during update in order for assignment to be visible to callers of ISlot.Set, but this isn't currently possible given the interface. As a result, there's currently a type constraint on TObj to restrict it to reference types.

A simple solution would simply be to return TObj from ISlot.Set, so the calling context can simply overwrite its own local value with the one modified by the slot. Another possibility is to make the TObj parameter to ISlot.Set a by-ref parameter. I'm considering these and a few other options, and Sasa's v0.12.0 release will probably contain the final solution.

Friday, November 15, 2013

First-Class References for .NET

References in C# are second-class citizens, which is to say, that you cannot pass them around as values. They can appear in function parameter position, but that's it:

public static void DoFoo(ref int value)
{
    Action captureRef = () => value = 3; // ERROR: ref can't escape scope
}

This makes it easy to verify their safety statically, and makes them very efficient, but it somewhat limits their expressiveness.

First-class citizens can be passed around as values, and otherwise used in any way that you'd use any other object. The above capture would work, for instance. To make references first-class citizens means constructing an object that exposes get and set operations, and that can reference the internals of any .NET type.

IRef<T>

Sasa has had the IRef<T> type for quite some time, but its full potential as a first-class reference hasn't been realized. There was only a single implementation, and that was a simple mutable slot as found in ML. I've just pushed a series of changes that effectively expands the usefulness of IRef.

You can now create a reference to an object property, a field, and a reference into an array:

public class Ref
{
    /// <summary>
    /// First-class reference indexing into an array.
    /// </summary>
    /// <typeparam name="T"></typeparam>
    public class Array<T> : Ref, IRef<T>
    ...

    /// <summary>
    /// First-class reference into an object property.
    /// </summary>
    /// <typeparam name="T"></typeparam>
    public class Member<T> : Ref, IRef<T>
    ...
}

The static Ref class contains a series of static, overloaded Create methods to construct instances of the appropriate IRef type. The signatures look like this:

public class Ref
{
    public static Ref<T> Create<T>(T value);
    public static Ref<T> Create<T>();
    public static Array<T> Create<T>(T[] array, int index);
    public static Member<T> Create<T>(Func<T> get, Action<T> set);
    public static IRef<T> Create<TObject, T>(TObject obj, MemberInfo member);
    public static IRef<T> Create<TObject, T>(TObject obj, FieldInfo field);
    public static Member<T> Create<T>(object obj, PropertyInfo property);
    public static IRef<T> Create<TObject, T>(TObject obj, Expression<Func<TObject, T>> member);
}

The first two overloads were the pre-existing constructors for slots as found in MLs, like OCaml. The third overload creates a reference that indexes into the given array. The fourth overload taking two delegates creates a ref to object members, like properties and fields. For properties these delegates are precisely the get/set methods of the property, so there's no additional overhead to using them.

The overloads taking the reflection members should be obvious as creating references to those members. The last overload is simply a convenient means of specifying the member you wish to access in a type-safe manner using LINQ expressions. For instance, here's an example from the test suite:

class Foo
{
    public int Bar { get; set; }
}

[Fact]
static void TestClassRef()
{
    var f = new Foo { Bar = 3 };
    var fref = Ref.Create(f, x => x.Bar);
    Assert.Equal(f.Bar, fref.Value);
    fref.Value = 99;
    Assert.Equal(99, f.Bar);
    Assert.Equal(f.Bar, fref.Value);
}

The reference variable fref is constructed via a simple LINQ expression at compile-time, instead of via error-prone runtime reflection. If Foo.Bar were a field instead of a property, the above code would be unchanged. The above code is for instance members. A static member looks a little different because the reference to provide is null:

class FieldObj
{
    public static int Foo = 3;
}

[Fact]
static void TestStaticFieldRef()
{
    var fref = Ref.Create(default(FieldObj), x => FieldObj.Foo);
    Assert.Equal(3, fref.Value);
    fref.Value = 99;
    Assert.Equal(99, fref.Value);
}

I use default(FieldObj) so type inference resolves the type parameters to Ref.Create, but you can use null and specify the type parameter manually if you like.

First-class references enable you to decouple programs using simple get/set semantics from the underlying representation being modified. The same algorithm exploiting IRef can transparently manipulate arrays, instance fields and properties, or static fields and properties.

These new ref types will be in the upcoming Sasa v0.12.0 release, or you can grab the source from Sourceforge and play with them now!