Most abstractions of interest have a natural dual, for instance, IEnumerable and IObservable, induction and co-induction, algebra and co-algebra, objects and algebraic data types, message-passing and pattern matching, etc.
Programs are simpler and more concise when written with the proper abstraction, whether that is some abstraction X or its dual. For instance, reactive programs written against the pull-based semantics of IEnumerable are far more unwieldy than those written against the natural push-based semantics of IObservable. As a further example, symbolic problems are more naturally expressed via pattern matching than via message passing.
This implies that any system is most useful when we ensure that every abstraction provided is accompanied by its dual. This also applies to virtual machine instruction sets like CIL, as I recently discovered while refining my safe reflection abstraction for the CLR.
A CIL instruction stream embeds some necessary metadata required for the CLR's correct operation. Usually this metadata is type information of some sort, such as the type of a local or the name and/or signature of a target method. For instance, here's a Hello World! example:
ldstr "Hello World!"
call void [mscorlib]System.Console::WriteLine(string)
Unfortunately, the CIL instruction set suffers from some asymmetries which make some types of programming difficult.
For example, the ldtoken instruction takes an embedded metadata token and pushes the corresponding runtime type handle onto the evaluation stack; this is the instruction the C# compiler emits for the typeof() operator.
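To make that concrete, here is a minimal C# illustration; the IL in the comment is roughly what the C# compiler emits for typeof():

// typeof(string) below compiles to roughly this IL:
//   ldtoken    [mscorlib]System.String
//   call       class [mscorlib]System.Type
//              [mscorlib]System.Type::GetTypeFromHandle(valuetype [mscorlib]System.RuntimeTypeHandle)
// ldtoken pushes a RuntimeTypeHandle; GetTypeFromHandle turns it into a System.Type.
System.Type t = typeof(string);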
While this operation is useful, we sometimes want its dual: we want the metadata used in a subsequent instruction to depend on the object or type handle at the top of the evaluation stack. The CLR does support a related operation, namely virtual dispatch, which depends on the concrete type, but dispatch is not general enough to cover all of these scenarios because the vtable is immutable.
Consider a scenario where you have an untyped object, say a reference to System.Object "obj", and you want to call into some generic code, like a method Foo&lt;T&gt;(T value), but pass the concrete type of obj for T instead of System.Object. Currently, you must go through a laborious process: call GetType() on the object to obtain its type handle, obtain the method handle via reflection or some clever CIL hackery, call MethodInfo.MakeGenericMethod to instantiate the type argument on Foo&lt;T&gt;, and finally perform a dynamic invocation via reflection, or allocate a custom delegate of type Action&lt;T&gt; and perform a virtual call, even though the call is statically known.
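Here is a rough sketch of that dance in C#, with a hypothetical Foo&lt;T&gt; defined alongside it; the exact reflection calls may vary, but the shape of the code is the point:

using System;
using System.Reflection;

static class Example
{
    // Hypothetical, fully polymorphic target method standing in for Foo<T>.
    public static void Foo<T>(T value)
    {
        Console.WriteLine(typeof(T));
    }

    public static void CallFooWithRuntimeType(object obj)
    {
        Type concrete = obj.GetType();                         // 1. recover the concrete runtime type
        MethodInfo open = typeof(Example).GetMethod("Foo");    // 2. find the open generic method
        MethodInfo closed = open.MakeGenericMethod(concrete);  // 3. instantiate T at the concrete type
        closed.Invoke(null, new[] { obj });                    // 4. dynamic invocation via reflection
    }
}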
Each step of this process is expensive, and it makes typeful programming on the CLR painful when working on the boundary between typed and untyped code. Many reflection problems, like serialization, become simpler once we're dealing with fully typeful code.
Consider a dual instruction to ldtoken, call it "bind", which takes a type handle, such as one obtained via GetType(), and binds it into the executing instruction stream as the type argument for the subsequent generic call to Foo&lt;T&gt;. This instruction could be easily and efficiently supported by any VM. Some restrictions are clearly needed for it to remain verifiable, namely that the type constraints required by the target method are a subset of the type constraints of the statically known type, but the verifier already performs this analysis, and all of the information needed is available at the call site.
Fully polymorphic functions like Foo&lt;T&gt; trivially satisfy this condition since they have no constraints whatsoever. Serialize&lt;T&gt;/Deserialize&lt;T&gt; operations are duals, and in fact exhibit exactly the same sort of fully polymorphic type signature as Foo&lt;T&gt;.
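For illustration, such operations typically look something like this (hypothetical signatures, not taken from any particular library), with no constraints on T:

public static class Serializer
{
    // No constraints on T: binding T to any concrete runtime type is safe here.
    public static void Serialize<T>(System.IO.Stream output, T value) { /* ... */ }
    public static T Deserialize<T>(System.IO.Stream input) { /* ... */ return default(T); }
}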
There are many more programs that exhibit this structure, but they are unnecessarily difficult to write due to these asymmetries in CIL. This unfortunately requires developers to write a lot of ad-hoc code to get the results they want, and more code results in more bugs, more time, and more headaches.
Comments
To make it happen one has to take care of a lot of non-trivial things like:
Value type semantics: the object must be unboxed before being passed to the called method.
Constraints: these require a non-trivial number of checks at the invocation site.
Interface calls: these are really tricky if you have variance and/or multiple implementations.
What you're asking for is basically something like the invokedynamic instruction of Java 7, which is an order of magnitude easier to implement on top of Java's trivial type system.
There's no reason you can't just use something like the DLR's inline caching, which performs well and could perform even better with some small help from the runtime.
I don't see how constraints are a problem. The method call is already verified to succeed with whatever supertype is statically known at the call site. I'm just talking about binding the type variable T to the concrete type for the subsequent call.
invokedynamic would do it, but it's probably overkill. There are many possible solutions though, which ultimately all boil down to a "typecase", aka type-indexed functions. There are many compilation techniques for type-indexed functions, and in fact the CLR already has one form with its type-passing. The problem is that type-passing is not pervasive but selective, so the loss of type information means this instruction is required.
Allowing vtable extensions at runtime is another way, though it sounds the most difficult to me. Then again, if you have a PIC (polymorphic inline cache), this approach becomes feasible too.
This can be implemented entirely with the existing CLR machinery. There are various optimizations to make this even more efficient.
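One well-known way to do it today is sketched below; this is only an illustration of the per-type caching idea, not the DLR's actual implementation, and Foo&lt;T&gt; is again a hypothetical target. The MakeGenericMethod/CreateDelegate cost is paid once per concrete type, after which each call is a dictionary lookup plus a delegate invocation.

using System;
using System.Collections.Concurrent;
using System.Reflection;

static class FooCache
{
    // Hypothetical, fully polymorphic target, standing in for Foo<T> in the post.
    public static void Foo<T>(T value)
    {
        Console.WriteLine(typeof(T));
    }

    // Thunk that casts the untyped argument and forwards to the generic target.
    static void Forward<T>(object value)
    {
        Foo<T>((T)value);
    }

    static readonly MethodInfo forwardDef =
        typeof(FooCache).GetMethod("Forward", BindingFlags.NonPublic | BindingFlags.Static);

    // One cache entry per concrete runtime type.
    static readonly ConcurrentDictionary<Type, Action<object>> cache =
        new ConcurrentDictionary<Type, Action<object>>();

    public static void Call(object obj)
    {
        Action<object> thunk = cache.GetOrAdd(obj.GetType(), t =>
            (Action<object>)Delegate.CreateDelegate(
                typeof(Action<object>), forwardDef.MakeGenericMethod(t)));
        thunk(obj);   // cheap delegate call on every subsequent invocation for this type
    }
}

FooCache.Call(someObject) then reaches Foo&lt;T&gt; at someObject's concrete type without a per-call MakeGenericMethod or reflection Invoke, though it still falls short of what a native "bind" instruction could do.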
A key feature of my idea would be a preference for compile-time metaprogramming over run-time metaprogramming, because it seems like 75% of the things we use reflection and dynamic methods for could be done at compile time instead, lowering the run-time cost to zero. The remaining 25% could be accelerated too, by offering better-designed reflection APIs.
For a long-established safe systems language, we have Ada. It even supports some limited compile-time metaprogramming via ASIS.
Regarding "index 0", all methods are dispatched via a unique index in a table of function pointers (vtable). Rodrigo was implying that the pattern I was after was difficult to implement on the CLR. I suggested that it could be easily implemented if the runtime reserved index 0 in every vtable to implement a double dispatch.
Imagine a system that could theoretically support nearly all programming scenarios, still allow garbage collection, reflection, easy remoting, and run-time code generation, and also support multiple programming languages interoperably.
I think what's needed is a set of standards for language interoperability, reflection, JIT compilation, GC and standard libraries that allows the FOSS community to develop the pieces independently, and does not mandate every feature for every situation (e.g. one should be able to substitute simple reference counting for GC in environments where GC is infeasible or unwanted, and reflection support could vary by language and platform).
I wish, basically, that .NET could be changed from a cathedral to a bazaar [Eric Raymond's concept] in order to stimulate development in all directions, without losing high-level interoperability between programming languages. So the key features would be cross-language interfaces, cross-language reflection, a mechanism for languages to opt-in to a compacting GC, and a basic standard library that provides standard ways to share data efficiently across languages. An intermediate language and JIT would also be needed for running code in a sandboxed environment, or simply to allow multi-platform executables.
The cross-language part is key, since there are so many new languages being created that can't get traction because they can't interoperate with other languages beyond the most basic level (namely, by exporting and importing C-style functions). I mean, that's why I don't even consider using Haskell. I've written all this great C# code, and I just can't mix it with any of the non-.NET languages.
I'm convinced my idea is possible in principle, but maybe it'll never actually happen.
That would be difficult to achieve all in one package, since JITs are typically rather memory hungry. I think something like the CLR's ahead-of-time compilation, performed at application install time, is the best bet; you can do this with LLVM as well.