CIL Verification and Safety

I've lamented here and elsewhere some unfortunate inconveniences and asymmetries in the CLR -- for example, we have nullable structs but lack non-nullable reference types, an issue I address in my Sasa class library.

I've recently completed some Sasa abstractions for safe reflection, and an IL rewriter based on Mono.Cecil which allows C# source code to specify type constraints that are supported by the CLR but unnecessarily restricted in C#. In the process, I came across another unjustified decision regarding verification: the jmp instruction.

The jmp instruction strikes me as potentially incredibly useful for alternative dispatch techniques, and yet I recently discovered that it's classified as unverifiable. This seems very odd, since the instruction is fully statically typed, and I can't think of a way its use could corrupt the VM.

In short, the instruction transfers control to a named method whose signature exactly matches the current method's, provided the evaluation stack is empty and you are not currently inside a try, filter, catch, or finally block (see Partition III, section 3.37 of the ECMA-335 specification).
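To make that description concrete, here is a minimal sketch in ilasm syntax. The assembly, class, and method names (Demo, Entry, Target) are hypothetical and chosen purely for illustration; the only thing taken from the specification is the shape of the jmp itself.

```cil
// Hypothetical names throughout; an illustrative sketch, not code from Sasa.
.assembly extern mscorlib {}
.assembly Demo {}

.class public auto ansi Demo extends [mscorlib]System.Object
{
    // The jump target: same return type and parameter list as Entry.
    .method public static int32 Target(int32 x) cil managed
    {
        .maxstack 2
        ldarg.0
        ldc.i4.1
        add
        ret
    }

    .method public static int32 Entry(int32 x) cil managed
    {
        .maxstack 1
        // The evaluation stack is empty and we are outside any protected
        // block, so control transfers to Target with Entry's current arguments.
        jmp int32 Demo::Target(int32)
    }
}
```

Note that Entry never pushes its argument before the jmp: the instruction forwards the current arguments itself, which is why the evaluation stack must be empty at that point.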

This seems eminently verifiable given a simple control-flow analysis, an analysis which the verifier already performs to verify control-flow safety of some other verifiable instructions. If anyone can shed some light on this I would appreciate it.

Comments

Rodrigo Kumpera said…
You forgot to mention an additional restriction required to make jmp verifiable: no managed pointers may be passed as arguments.

It would otherwise require much more sophisticated verification rules.

Besides possible historical artifacts and mistakes, one issue is that some implementation specific constraints could make it impossible to implement jmp. And unlike the tail prefix, it doesn't allow for conditional execution.

Honestly, I never got the need for jmp. A tail call can do the exact same thing and be JIT'd to the same code sequence.
Sandro Magi said…
I don't think I understand your objection. What do you consider a "managed pointer"?

If you just meant an ordinary reference, then I disagree that this is unsafe.

If you meant an actual unsafe pointer, well we're already handling unsafe types so verification isn't an issue.

I agree that general tail calls are better, but MS is kind of stubborn that tail calls remain inefficient. JMP just seemed like a sort of middle ground that might be useful in lieu of tail calls for some scenarios.

Finally, I disagree that it doesn't allow for conditional execution. All that's required is that the evaluation stack is empty and you're not in a protected block. This should be more or less valid:
IL_001: ldnull
IL_002: brtrue IL_004
IL_003: jmp void Target1(void)
IL_004: jmp void Target2(void)
Rodrigo Kumpera said…
A managed pointer is ECMA-335 parlance for a byref argument/local.

Tail calls/jmp to methods with parameters of that kind are not verifiable without complex dataflow analysis or significant restrictions.

I did use a very bad wording for 'conditional execution' of tail calls. I meant that a CLI implementation is not required to obey a tail prefix, while it must for jmp.
Sandro Magi said…
Right, passing a byref to a local as an argument should be prevented. It's pretty trivial to do this conservatively, i.e. detect when there's both a jmp and a starg of a local's address in the same function.

A control-flow analysis to detect whether the starg and the jmp are ever on the same execution path is just a nice refinement.
Qwertie said…
I agree, not only would a jmp be useful for exotic dispatch, but often (in performance-sensitive code) I write two versions of a method, a "public" version that verifies the arguments and a "protected" one that assumes the arguments are valid:

public void RemoveAt(int i)
{ /* verify the index, then call RemoveAtCore(i) */ }
protected void RemoveAtCore(int i)
{ /* remove the item without checking the index */ }
This allows code inside the class to bypass the index check if the caller knows it's safe. A tail call helps, but a jmp might be more efficient since you know the JIT won't duplicate the argument(s).

It is irritating also that tail calls cannot be guaranteed to work, so self-recursive functions can cause stack overflow in cases where the equivalent code in Haskell couldn't.

Anyway, surely the verifier should at least allow jmp in cases where there are no "ref" args (given that it would be so little work to implement.)
