Writing IL according to the specification

I deal with IL quite a bit, whether that means writing IL code directly or writing C# code that simulates running IL instructions according to the spec.

To write IL, of course, you need a basic knowledge of IL instructions, the evaluation stack, and a few other things.

But there is really no need to write a large method from scratch, because you can always write it in C# and use a decompiler to see how it looks in IL.

So if, for example, you need to emit some code (with Reflection.Emit, Cecil, or any other framework), you can write C# code that does what the method should do and then emit the IL that the decompiler shows.
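As a minimal sketch of that workflow, the following uses `System.Reflection.Emit.DynamicMethod` to emit an `isinst` / `ldnull` / `cgt.un` sequence by hand. The names `IMarker`, `Marked`, and `EmitSketch` are hypothetical stand-ins and not part of the original examples. The argument here is already an `object`, so no `box` is emitted.

```csharp
using System;
using System.Reflection.Emit;

public interface IMarker { }              // hypothetical stand-in for MyInterface
public sealed class Marked : IMarker { }  // hypothetical implementing type

public static class EmitSketch
{
    // Builds a delegate whose body is:
    //   ldarg.0; isinst IMarker; ldnull; cgt.un; ret
    public static Func<object, bool> BuildIsCheck()
    {
        var method = new DynamicMethod("IsMarker", typeof(bool), new[] { typeof(object) });
        ILGenerator il = method.GetILGenerator();

        il.Emit(OpCodes.Ldarg_0);                  // push the argument
        il.Emit(OpCodes.Isinst, typeof(IMarker));  // same reference or null
        il.Emit(OpCodes.Ldnull);                   // push null to compare against
        il.Emit(OpCodes.Cgt_Un);                   // 1 if the isinst result is not null
        il.Emit(OpCodes.Ret);

        return (Func<object, bool>)method.CreateDelegate(typeof(Func<object, bool>));
    }
}
```

Calling `EmitSketch.BuildIsCheck()` once and reusing the returned delegate avoids re-emitting the method on every call.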

On the other hand, sometimes I do not have to write the IL (because it can be copied from an existing method), but I do have to write code that simulates how that IL will execute according to the spec (e.g. if you are writing an interpreter or asserting on generated IL).

And this requires deep knowledge, not only of IL but also of the CLR and the C# spec.

What does it mean to write it according to the spec?

Assume we have these types (all examples are on x64 in release mode):

interface MyInterface{}
class MyGenericObject<T>{}

And there’s the following code: (1)

static void IsInterfaceGeneric(T t)
{
    System.Console.WriteLine(t is MyInterface);
}

Pretty simple: we get a value of a generic type parameter and check it against another type.
If you just want to write this piece of code in IL, it’s very simple:

IL_0000: ldarg.0
IL_0001: box !T 
IL_0006: isinst MyInterface
IL_000b: ldnull
IL_000c: cgt.un
IL_000e: call void [System.Console]System.Console::WriteLine(bool)
IL_0013: ret

Why the box? Because isinst operates on an object reference, so if the value is of a generic type parameter (or a value type), it is boxed first.

Let’s see another example: (2)

private static void AsInterfaceGeneric(T t)
{
    System.Console.WriteLine(t as MyInterface);
}

We get a value of a generic type parameter and treat it as another type. The code is almost identical, and all of us see is and as in C# every day. So what about the IL?

IL_0000: ldarg.0
IL_0001: box !T
IL_0006: isinst MyInterface
IL_000b: call void [System.Console]System.Console::WriteLine(object)
IL_0010: ret

As you can see, it is two instructions shorter, although both are translated to the isinst instruction. And if you are trying to match exactly what the C# compiler generates, it’s important to know the differences.

The isinst instruction checks an object against a type token. In the is example, we want to know whether that check succeeded. In the as example, we don’t care about a success flag, only about the value isinst returns (null or the target object).
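For someone simulating these instructions in C#, the difference between the two lowerings can be sketched roughly as below. This is an illustrative model only, not how the CLR implements it; `IsInstSketch` and its members are hypothetical names, and `Type.IsInstanceOfType` stands in for the runtime type check.

```csharp
using System;

public static class IsInstSketch
{
    // Models isinst: the same reference if the check succeeds, otherwise null.
    // A null input always produces null.
    public static object IsInst(object obj, Type target)
    {
        return obj != null && target.IsInstanceOfType(obj) ? obj : null;
    }

    // C# `as`: the isinst result is used directly.
    public static object As(object obj, Type target)
    {
        return IsInst(obj, target);
    }

    // C# `is`: isinst followed by ldnull / cgt.un, i.e. "is the result non-null?".
    public static bool Is(object obj, Type target)
    {
        return IsInst(obj, target) != null;
    }
}
```

Keeping `IsInst` as the single primitive mirrors the IL: `as` and `is` differ only in what happens to the value left on the stack.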

Now let’s see another very similar code: (3)

private static void CastInterfaceGeneric(T t)
{
    System.Console.WriteLine((MyInterface)t);
}

What will the difference be? Maybe you already know that the difference between as and a direct cast is whether the runtime throws an exception in case of failure.

IL_0000: ldarg.0
IL_0001: box !T
IL_0006: castclass MyInterface
IL_000b: call void [System.Console]System.Console::WriteLine(object)
IL_0010: ret

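As you may have guessed, the IL is exactly the same, except that isinst is replaced by castclass, which throws an InvalidCastException on failure instead of returning null.

Now let’s make it a little more complicated by using constraints

What will each example look like if the types were written like this: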
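A simulation of castclass for reference types could look roughly like the sketch below. `CastClassSketch` is a hypothetical name, and the model deliberately ignores unboxing scenarios: a null reference passes through unchanged, and a failed check throws instead of yielding null.

```csharp
using System;

public static class CastClassSketch
{
    // Models castclass for object references:
    //   null        -> null (no exception)
    //   compatible  -> the same reference
    //   otherwise   -> InvalidCastException
    public static object CastClass(object obj, Type target)
    {
        if (obj == null)
        {
            return null;
        }

        if (target.IsInstanceOfType(obj))
        {
            return obj;
        }

        throw new InvalidCastException(
            "Cannot cast " + obj.GetType().FullName + " to " + target.FullName);
    }
}
```

Note that a null input succeeds here, which is the opposite of what the `is` lowering reports for null.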

interface MyInterface{}
class MyGenericObject<T> where T : MyInterface {}

Now the generic type has a constraint. Does it change the generated IL?

Let’s start with the 1st example.

IL_0000: ldarg.0
IL_0001: box !T
IL_0006: ldnull
IL_0007: cgt.un
IL_0009: call void [System.Console]System.Console::WriteLine(bool)
IL_000e: ret

Note that the isinst instruction has been removed.

And for the 2nd and 3rd examples:

IL_0000: ldarg.0
IL_0001: box !T
IL_0006: call void [System.Console]System.Console::WriteLine(object)
IL_000b: ret

Yes, they have exactly the same IL, even though as was previously translated to isinst and the direct cast to castclass.

The reason for this change is that we added the constraint on the generic type. This lets the compiler optimize the output, because it now knows that T must implement MyInterface, so there is no need to cast (castclass) or check (isinst).

So why does the 1st example still have a check at all (ldnull and cgt.un), even though the isinst is gone?

This is because, as I wrote, the is C# keyword returns not the object itself but the result of the check, and we need that result.

And more complicated when considering optimizations

But this raises another question: if the compiler knows this and takes it into account when generating code, why doesn’t it go one step further, remove ldnull and cgt.un, and simply push 1 onto the stack?

And again this brings us back to the specification: the is operator must evaluate to false when the operand is null (and isinst likewise yields null for a null reference). So we can’t remove this check, because we must still handle the case where t is null.

Makes sense. So what will happen here?
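The null case is easy to observe from plain C#. The sketch below uses hypothetical types (`IShape`, `Circle`, `ConstrainedCheck<T>`) in place of the ones above; the constraint guarantees the static type, yet a null argument still has to produce false.

```csharp
using System;

public interface IShape { }              // hypothetical stand-in for MyInterface
public sealed class Circle : IShape { }  // hypothetical implementing class

public static class ConstrainedCheck<T> where T : IShape
{
    // The constraint makes the type test redundant,
    // but the result still depends on whether t is null.
    public static bool IsShape(T t)
    {
        return t is IShape;
    }
}
```

With these definitions, `ConstrainedCheck<Circle>.IsShape(new Circle())` evaluates to true, while `ConstrainedCheck<Circle>.IsShape(null)` evaluates to false.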

static void IsWithNotNullObject()
{
    System.Console.WriteLine(new MyObject() is MyInterface);
}

In this case, the compiler can see that the object is not null, so there is no need to check, right?

But as we can see the check still exists in the IL:

IL_0000: newobj instance void MyObject::.ctor()
IL_0005: ldnull
IL_0006: cgt.un
IL_0008: call void [System.Console]System.Console::WriteLine(bool)
IL_000d: ret

But in this case, if we look at the machine code, we will see the difference:

In the “maybe null” example it will look like this:

L0000: test rdx, rdx
L0003: setne cl

And in the “not null” example, it will look like this:

L0004: mov ecx, 1

A bit more with constraints

Now that we have seen the optimization that can be obtained with generic constraints, let’s see another constraint example.

class MyGenericObject<T> where T : class, MyInterface
{
    void IsInterfaceGeneric(T t)
    {
        System.Console.WriteLine(t is MyInterface);
    }
}

Now the compiler knows that T must be a reference type, so no box is needed?

IL_0000: ldarg.1
IL_0001: box !T
IL_0006: ldnull
IL_0007: cgt.un
IL_0009: call void [System.Console]System.Console::WriteLine(bool)
IL_000e: ret

No, the box is still there.

So I’m sure that this example will not change anything, right?

struct MyGenericObject<T> where T : struct, MyInterface
{
    void IsInterfaceGeneric(T t)
    {
        System.Console.WriteLine(t is MyInterface);
    }
}

It is exactly the same, but with struct instead of class.

And the answer is…

IL_0000: ldc.i4.1
IL_0001: call void [System.Console]System.Console::WriteLine(bool)
IL_0006: ret

What???

Yes, now the compiler can be smart enough because T can’t be null – it is a value type – and it must be `MyInterface` because it is constrained, so we can just push 1 to the stack!
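The observable behavior can be checked from C# with a small sketch like the one below. `ITag`, `Tagged`, and `StructCheck<T>` are hypothetical names used only for illustration. Even the default value of the struct passes the check, since a value type is never null.

```csharp
using System;

public interface ITag { }            // hypothetical stand-in for MyInterface
public struct Tagged : ITag { }      // hypothetical value type implementing it

public static class StructCheck<T> where T : struct, ITag
{
    // T is a non-nullable value type that implements ITag,
    // so this test cannot fail for any argument.
    public static bool IsTag(T t)
    {
        return t is ITag;
    }
}
```

The compiler may report a warning that the expression is always of the provided type, which is the same fact the optimization relies on.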

Summary

If we need to run code the way the C# compiler generates it (or the way the CLR runs it), we must know not just how to write it in IL, but which IL to write in each case and how that IL runs in each case.

The point I wanted to make is that writing IL is one thing (you can copy and paste), but writing an IL interpreter, even in C# code, that is compatible with the spec is another thing, because you need to know every corner of the spec.

Here is the spec for isinst and castclass if you are interested (or read it from the source).

isinst

castclass