Skip to main content

Reflection, Attributes and Parameterization

I used to be a big fan of reflection, and C#'s attributes also looked like a significant enhancement. Attributes provide a declarative way to attach metadata to fields, methods, and classes, and this metadata is often used during reflection.

The more I learned about functional programming, type systems, and so on, the more I came to realize that reflection isn't all it's cracked up to be. Consider .NET serialization. You can annotate fields you don't want serialized with the attribute [field:NonSerialized].

However, metadata is just data, and every usage of attributes can be replaced with a pair of interfaces. Using [field:NonSerialized] as an example, we can translate this class:
class Foo {
[field:NonSerialized]
object bar;
}

Into one like this:
// these two interfaces take the place of a NonSerializableAttribute declaration
interface INonSerialized {
void Field<T>(ref T field);
}
interface IUnserializableMembers {
void Unserializable(INonSerialized s);
}
class Foo : IUnserializableMembers {
object bar;
void Unserializabe(INonSerializable s) {
s.Field(ref bar);
}
}

Essentially, we are replacing reflection and metadata with parameterization, which is always safer. This structure is also much more efficient than reflection and attribute checking.

Consider another example, the pinnacle of reflection, O/R mapping. Mapping declarations are often specified in separate files and even in a whole other language (often XML or attributes), which means they don't benefit from static typing. However, using the translation from above, we can obtain the following strongly type-checked mapping specification:
// specifies the relationships of objects fields to table fields
interface IRelation
{
// 'name' specifies the field name in the table
void Key<T>(ref T id, string name);
void Field<T>(ref T value, string name);
void Foreign<T>(ref T fk, string name);
void ForeignInverse<T>(ref T fk, string foreignName);
void List<T>(ref IList<T> list);
void List<T>(ref IList<T> list, string orderBy);
void Map<K, T>(ref IDictionary<K, T> dict);
}
// declares an object as having a mapping to an underlying table
interface IPersistent
{
void Map(IRelation f);
}
// how to use the above two interfaces
class Bar: IPersistent
{
int id;
string foo;
public void Map(IRelation f)
{
f.Key(ref id, "Id");
f.Field(ref foo, "Foo");
}
public string Foo
{
get { return foo; }
}
}

There are in general two implementors of IRelation: hydration, when the object is loaded from the database and the object's fields are populated, and write-back, when the object is being written back to the database. The IRelation interface is general enough to support both use-cases because IRelation accepts references to the object's fields.

This specification of the mappings is more concise than XML mappings, and is strongly type-checked at compile-time. The disadvantage is obviously that the domain objects are exposed to mapping, but this would be the case with attributes anyway.

Using XML allows one to cleanly separate mappings from domain objects, but I'm coming to believe that this isn't necessarily a good thing. I think it's more important to ensure that the mapping specification is concise and declarative.

Ultimately, any use of attributes in C# for reflection purposes can be replaced by the use of a pair of interfaces without losing the declarative benefits.

Comments

Emir Uner said…
Hi, can you give a code example for a hypotethical code for using
the first use case -- a serializer using your proposal instead of
reflection+attributes.

An example reflection+attribute using version may be:

void serialize(Object o) {
foreach(var field in o.GetType().GetFields()) {
if(field.Attributes.NonSerialized) {
// Do not serialize
} else {
// Serialize
}
}
}
Sandro Magi said…
There are two ways to provide serialize via interfaces: via positive or negative information.

The NonSerialized attribute is negative information, and I was confusing it with the implementation for positive information, so here's the actual interface:

interface INonSerialized {
void Field(string fieldName);
}

A serializer would still use reflection, or some other means to access the object internals and would maintain the set of fields it should skip:

class Serializer : INonSerialized {
ISet skip = new Set();// any set semantics will do

void Serialize(object obj, Stream s) {
if (obj is IUnserializableMembers) {
(obj as IUnserializableMembers).Unserializable(this);
}
foreach(var field in o.GetType().GetFields()) {
if (skip.Contains(field.Name)) {
// do not serialize
} else { ... }
}
}

void Field(string fieldName) {
skip.Add(fieldName);
}
}

I don't like this implementation much, and I prefer the converse:

interface ISerializer {
void Version(string version);
void Field<T>(ref T value);
void Deleted<T>(string version);
}
interface ISerializable {
void Serialize(ISerializer s);
}

This is a minimal serialization interface that I think can replace 90% of the standard serializer, including versioning, and yet produce binaries that are more compact. Example usage:

class Test : ISerializable {
int i;
string name;
...
public void Serialize(ISerializer s) {
s.Version("1.0.0");
s.Field(ref i);
s.Field(ref name);
}
}

ISerializable implementors must ensure that they do not reorder Field calls. New fields must go at the end. Any Field calls to be deleted should be replaced with a Deleted call, along with the version number at which the field was deleted.

Pros:
1. Runs at full speed, ie. no reflection penalty.
2. Statically typed.

Cons:
1. Client must implement the interface (though compilers can easily derive the necessary function).
2. Slightly higher implementation burden.

It's an interesting alternative design at the very least.
Sandro Magi said…
Another Pro of the alternate serialization design is the clearer versioning semantics; .NETs versioning is rather complicated by comparison.

Popular posts from this blog

async.h - asynchronous, stackless subroutines in C

The async/await idiom is becoming increasingly popular. The first widely used language to include it was C#, and it has now spread into JavaScript and Rust. Now C/C++ programmers don't have to feel left out, because async.h is a header-only library that brings async/await to C! Features: It's 100% portable C. It requires very little state (2 bytes). It's not dependent on an OS. It's a bit simpler to understand than protothreads because the async state is caller-saved rather than callee-saved. #include "async.h" struct async pt; struct timer timer; async example(struct async *pt) { async_begin(pt); while(1) { if(initiate_io()) { timer_start(&timer); await(io_completed() || timer_expired(&timer)); read_data(); } } async_end; } This library is basically a modified version of the idioms found in the Protothreads library by Adam Dunkels, so it's not truly ground bre...

Easy Automatic Differentiation in C#

I've recently been researching optimization and automatic differentiation (AD) , and decided to take a crack at distilling its essence in C#. Note that automatic differentiation (AD) is different than numerical differentiation . Math.NET already provides excellent support for numerical differentiation . C# doesn't seem to have many options for automatic differentiation, consisting mainly of an F# library with an interop layer, or paid libraries . Neither of these are suitable for learning how AD works. So here's a simple C# implementation of AD that relies on only two things: C#'s operator overloading, and arrays to represent the derivatives, which I think makes it pretty easy to understand. It's not particularly efficient, but it's simple! See the "Optimizations" section at the end if you want a very efficient specialization of this technique. What is Automatic Differentiation? Simply put, automatic differentiation is a technique for calcu...

Easy Reverse Mode Automatic Differentiation in C#

Continuing from my last post on implementing forward-mode automatic differentiation (AD) using C# operator overloading , this is just a quick follow-up showing how easy reverse mode is to achieve, and why it's important. Why Reverse Mode Automatic Differentiation? As explained in the last post, the vector representation of forward-mode AD can compute the derivatives of all parameter simultaneously, but it does so with considerable space cost: each operation creates a vector computing the derivative of each parameter. So N parameters with M operations would allocation O(N*M) space. It turns out, this is unnecessary! Reverse mode AD allocates only O(N+M) space to compute the derivatives of N parameters across M operations. In general, forward mode AD is best suited to differentiating functions of type: R → R N That is, functions of 1 parameter that compute multiple outputs. Reverse mode AD is suited to the dual scenario: R N → R That is, functions of many parameters t...