
Easy Automatic Differentiation in C#

I've recently been researching optimization and automatic differentiation (AD), and decided to take a crack at distilling its essence in C#.

Note that automatic differentiation is different from numerical differentiation. Math.NET already provides excellent support for numerical differentiation.

C# doesn't seem to have many options for automatic differentiation: the choices are mainly an F# library with an interop layer, or paid libraries. Neither of these is well suited to learning how AD works.

So here's a simple C# implementation of AD that relies on only two things: C#'s operator overloading, and arrays to represent the derivatives, which I think makes it pretty easy to understand. It's not particularly efficient, but it's simple! See the "Optimizations" section at the end if you want a very efficient specialization of this technique.

What is Automatic Differentiation?

Simply put, automatic differentiation is a technique for calculating the derivative of a numerical function with roughly constant time and space overhead relative to computing the function normally.

AD is typically preferred over numerical differentiation because the latter introduces imprecision, both from the approximations employed and from the limits of floating point precision inherent to our number types. AD is also often preferred to symbolic differentiation, as the latter can sometimes yield large and awkward terms that result in inefficient execution times, without any appreciable increase in precision.
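To make the imprecision of numerical differentiation concrete, here's a small illustrative snippet (names and step size are my own choices, not from any library) comparing a central finite difference against the exact derivative of f(x) = x³:

```csharp
using System;

// Illustrative only: finite differences approximate f'(x) and carry
// truncation plus floating-point error, which AD avoids entirely.
class Compare
{
    static double F(double x) => x * x * x;   // f'(x) = 3x^2

    static void Main()
    {
        double x = 2.0, h = 1e-6;
        // central difference approximation of f'(2)
        double numeric = (F(x + h) - F(x - h)) / (2 * h);
        double exact = 3 * x * x;             // exactly 12
        Console.WriteLine($"numeric ~= {numeric}, exact = {exact}");
    }
}
```

The numeric estimate lands close to 12 but not exactly on it, and shrinking `h` further eventually makes things worse as floating point cancellation takes over.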

Differentiation is critical to all sorts of optimization problems in many domains, so this is quite a useful technique!

Automatic Differentiation Basics

The most straightforward implementation of AD is based on dual numbers, where each regular number is augmented with an extra term corresponding to its derivative:

real number x   =(dual number transform)=>   x + x'*ϵ

Arithmetic and other mathematical functions then have translations to operating on these extended number types as follows:

Operator             Translated
<x, x'> + <y, y'>    <x + y, x' + y'>
<x, x'> - <y, y'>    <x - y, x' - y'>
<x, x'> * <y, y'>    <x*y, x'*y + x*y'>
<x, x'> / <y, y'>    <x/y, (x'*y - x*y')/y²>
<x, x'>^k            <x^k, k*x^(k-1)*x'>
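As a quick sanity check, here's the table applied by hand (a hypothetical snippet using plain tuples, not any library type) to f(x) = x·x + 2·x at x = 3, where the true derivative is f'(x) = 2x + 2 = 8:

```csharp
using System;

// Apply the dual-number rules by hand to f(x) = x*x + 2*x at x = 3.
var x = (v: 3.0, d: 1.0);                            // seed: dx/dx = 1
var sq = (v: x.v * x.v, d: x.d * x.v + x.v * x.d);   // product rule: <9, 6>
var lin = (v: 2 * x.v, d: 2 * x.d);                  // constant multiple: <6, 2>
var y = (v: sq.v + lin.v, d: sq.d + lin.d);          // sum rule: <15, 8>
Console.WriteLine($"f(3) = {y.v}, f'(3) = {y.d}");   // f'(3) = 2*3 + 2 = 8
```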

See the Wikipedia page on dual numbers for more transformations, like the standard trig functions.

Dual Numbers from Operator Overloading

Invoking a function "f" with dual numbers operates like this, in math notation:

f(x0 + ϵ0, x1 + ϵ1, x2 + ϵ2)

Each parameter carries an extra 'ϵ' value corresponding to its derivative, and this extra value is distinct from the 'ϵ' values of all the other parameters. However, as you can see in the translation table, these derivatives interact with one another in some operators, so a general number type carries a vector of derivative coefficients, one slot per parameter. Here's the basic number type:

public readonly struct Number
{
    public readonly double Magnitude;
    public readonly double[] Derivatives;

    internal Number(double m, params double[] d)
    {
        this.Magnitude = m;
        this.Derivatives = d;
    }
}

A differentiable function of 3 parameters would have this signature:

Number Function(Number x0, Number x1, Number x2)

Internally, differentiation invokes the function like this:

public static Number DifferentiateAt(
    double x0, double x1, double x2, Func<Number, Number, Number, Number> func) =>
    func(new Number(x0, 1, 0, 0),
         new Number(x1, 0, 1, 0),
         new Number(x2, 0, 0, 1));

Each parameter is initialized with zeroes everywhere except for a 1 in the derivative slot corresponding to its position in the argument list. This is the necessary starting configuration for automatic differentiation in order to compute the derivatives for any of the parameters.

Operators are now pretty straightforward, corresponding to operations on the magnitude and pairwise operations on each index of the array:

public static Number operator +(Number lhs, Number rhs) =>
    new Number(lhs.Magnitude + rhs.Magnitude,
               lhs.Derivatives + rhs.Derivatives);

public static Number operator *(Number lhs, Number rhs) =>
    new Number(lhs.Magnitude * rhs.Magnitude,
               lhs.Derivatives * rhs.Magnitude + rhs.Derivatives * lhs.Magnitude);

Obviously you can't add or multiply two double[] like I've shown here, but the actual implementation hides the array behind a Derivatives type that exposes the arithmetic operators:

public static Derivatives operator +(Derivatives lhs, Derivatives rhs)
{
    var v = new double[lhs.vector.Length];
    for (int i = 0; i < lhs.vector.Length; ++i)
        v[i] = lhs.vector[i] + rhs.vector[i];
    return new Derivatives(v);
}

public static Derivatives operator *(Derivatives lhs, Derivatives rhs)
{
    var v = new double[lhs.vector.Length];
    for (int i = 0; i < lhs.vector.Length; ++i)
        v[i] = lhs.vector[i] * rhs.vector[i];
    return new Derivatives(v);
}
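Note that the Number '*' operator above also multiplies a Derivatives by a plain double. A scalar-multiply overload fills that gap; here's a self-contained sketch (field and type names are assumed, mirroring the article's style):

```csharp
using System;

// Minimal Derivatives type showing the scalar-multiply overload the
// Number '*' operator needs (Derivatives * double).
readonly struct Derivatives
{
    internal readonly double[] vector;
    public Derivatives(double[] v) { vector = v; }

    public static Derivatives operator *(Derivatives lhs, double scalar)
    {
        var v = new double[lhs.vector.Length];
        for (int i = 0; i < lhs.vector.Length; ++i)
            v[i] = lhs.vector[i] * scalar;
        return new Derivatives(v);
    }
}

class ScalarDemo
{
    static void Main()
    {
        // scale the seed vector [1, 0] by 3
        var d = new Derivatives(new[] { 1.0, 0.0 }) * 3.0;
        Console.WriteLine(string.Join(",", d.vector));  // 3,0
    }
}
```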

Once you have your return value of type Number, you can access the derivatives of each parameter by its index:

var y = Calculus.DifferentiateAt(x0, x1, x2, function);
Console.WriteLine("x0' = " + y.Derivatives[0]);
Console.WriteLine("x1' = " + y.Derivatives[1]);
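Putting the pieces together, here's a minimal self-contained sketch of the whole technique (illustrative names, not the article's exact library), differentiating f(x0, x1) = x0·x1 + x0 at (3, 4):

```csharp
using System;

// A dual-number type carrying a derivative vector, with just enough
// operators to differentiate f(x0, x1) = x0*x1 + x0.
readonly struct Dual
{
    public readonly double Magnitude;
    public readonly double[] Derivatives;
    public Dual(double m, params double[] d) { Magnitude = m; Derivatives = d; }

    public static Dual operator +(Dual a, Dual b)
    {
        var d = new double[a.Derivatives.Length];
        for (int i = 0; i < d.Length; ++i)            // sum rule
            d[i] = a.Derivatives[i] + b.Derivatives[i];
        return new Dual(a.Magnitude + b.Magnitude, d);
    }

    public static Dual operator *(Dual a, Dual b)
    {
        var d = new double[a.Derivatives.Length];
        for (int i = 0; i < d.Length; ++i)            // product rule
            d[i] = a.Derivatives[i] * b.Magnitude + b.Derivatives[i] * a.Magnitude;
        return new Dual(a.Magnitude * b.Magnitude, d);
    }
}

class Demo
{
    static Dual F(Dual x0, Dual x1) => x0 * x1 + x0;

    static void Main()
    {
        // seed: each parameter gets a 1 in its own derivative slot
        var y = F(new Dual(3, 1, 0), new Dual(4, 0, 1));
        Console.WriteLine(y.Magnitude);       // 15 = 3*4 + 3
        Console.WriteLine(y.Derivatives[0]);  // df/dx0 = x1 + 1 = 5
        Console.WriteLine(y.Derivatives[1]);  // df/dx1 = x0 = 3
    }
}
```

Both partial derivatives fall out of a single evaluation of the function, which is the whole trick.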

Hopefully, that's clear enough for most people to understand the magic behind automatic differentiation. It's an incredibly useful tool that I'm only now exploring, and it deserves to be more widely known!


Optimizations

Most presentations of automatic differentiation show examples where you differentiate a function with respect to only a single parameter, but the technique described above computes every derivative simultaneously. That's more general, but you typically don't need all of the derivatives, which makes this technique a little wasteful.

So as a first optimization, review the starting configuration for automatic differentiation described above and consider what happens when you're interested in the derivative of x0 only:

public static Number Differentiate_X0(
    double x0, double x1, double x2,
    Func<Number, Number, Number, Number> func) =>
    func(new Number(x0, 1, 0, 0),
         new Number(x1, 0, 0, 0),
         new Number(x2, 0, 0, 0));

When you only want one of the derivatives, the ϵ coefficient of all other parameters would be zero, and all of those array slots filled with zeroes would stay zero throughout the whole computation. So toss them out!

Create a specialized Number type that doesn't incur any array allocations at all by replacing Derivatives with a single System.Double corresponding to the one parameter being differentiated. That parameter gets a 1 as the extra term when differentiating, while the rest all start with 0.

So while you can only differentiate with respect to one variable at a time with this more specialized Number type, you only need to carry around an extra double for each step in the calculation. This would be very space and time efficient!
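Here's a hypothetical sketch of that specialized type (names are my own): one extra double instead of an array, differentiating with respect to a single chosen parameter.

```csharp
using System;

// Allocation-free dual number: a single derivative slot instead of a
// vector, for differentiating with respect to one parameter at a time.
readonly struct Dual1
{
    public readonly double Magnitude, Derivative;
    public Dual1(double m, double d) { Magnitude = m; Derivative = d; }

    public static Dual1 operator +(Dual1 a, Dual1 b) =>
        new Dual1(a.Magnitude + b.Magnitude, a.Derivative + b.Derivative);

    public static Dual1 operator *(Dual1 a, Dual1 b) =>   // product rule
        new Dual1(a.Magnitude * b.Magnitude,
                  a.Derivative * b.Magnitude + b.Derivative * a.Magnitude);
}

class Demo1
{
    static void Main()
    {
        // d/dx0 of f(x0, x1) = x0*x1 at (3, 4): seed x0 with 1, x1 with 0
        var y = new Dual1(3, 1) * new Dual1(4, 0);
        Console.WriteLine(y.Derivative);  // 4
    }
}
```

To get all the derivatives with this type you'd re-run the function once per parameter, trading a little repeated work for zero heap allocation per operation.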

