Speed of Managed Code – Act II

If you read much of what I write, you know that performance is something I like to always keep in mind, and I’m also a big fan of quantitative analysis.  It all started ages ago with a look at the real time capabilities of Windows CE, but most recently I’ve been looking at managed code.  This entry is the second in a series of looks at performance of managed code, specifically the .NET Compact Framework.  For the first entry, see this blog entry (which became this MSDN article).


Act II – Method calls are Expensive (or Stay in the Shallow End of the Stack!)


As with many of my diatribes into technical problems, this all started with a newsgroup post.  SOmeone posted that the had done a Towers of Hanoi implementation in VB.NET, C# and C++ and that the C# implementation was faster than VB, but got slower in CF 2.0.  They also said that native code was much faster than both (I think he may have said an order of magnitude, but don’t quote me on that).


Well he didn’t post any code so we could reproduce his results, and I never take anyone on their word on something like this, so I decided to put it to the test.  Before going any further, you may need some background information on what the Towers of Hanoi problem is.  You’re probably familiar with it – you just didn’t know what it was called.  Go read this Wikipedia entry and come back.


I decide to do a recursive solution, so the meat of the problem is that we have an exponential number of method calls, making the call stack very, very deep.  Just from a memory point of view this is a bad idea (see my MEDC presentation on memory management if you want to know more on the whys of that).  But you’ll also see that the expense of method calls in the CF (see Act I) really bites you here as it bites hard.


First, let’s look at the code.  In C#, it looks like this:


public class Hanoi
{
  private int[] pegs = new int[3];

  public Hanoi(int totalDisks)
  {
    // start with all disks on peg 0
    pegs[0] = totalDisks;
    pegs[1] = 0;
    pegs[2] = 0;
  }

  public int Solve()
  {
    int et = Environment.TickCount;
    Move(0, 2, 1, pegs[0]);
    return Environment.TickCount et;
  }

  private void Move(int fromPeg, int toPeg, int intermediatePeg, int disks)
  {
    if(disks == 0) return;

    // move all but one disk to the intermediate peg
    Move(fromPeg, intermediatePeg, toPeg, disks 1);

    // move the last remaining disk to the destination – no need for the intermediate
    pegs[fromPeg] -= 1;
    pegs[toPeg] += 1;

    // now move all but one off the intermediate peg to the destination peg
    Move(intermediatePeg, toPeg, fromPeg, disks 1);
  }
}


You can see that Move calls itself twice.  Now try tracing the code in your head if I call Solve when totalDisks is 30.  Ugly.


Now lets look at the C++ implementation.


class Hanoi
{
    private:
        int pegs[3];
        void Move(int fromPeg, int toPeg, int intermediatePeg, int disks);

    public:
        Hanoi(int totalDisks);
        int Solve();
};

Hanoi::Hanoi(int totalDisks)
{
    // start with all disks on peg 0
    pegs[0] = totalDisks;
    pegs[1] = 0;
    pegs[2] = 0;
}

void Hanoi::Move(int fromPeg, int toPeg, int intermediatePeg, int disks)
{
    if(disks == 0) return;

    // move all but one disk to the intermediate peg
    Move(fromPeg, intermediatePeg, toPeg, disks 1);

    // move the last remaining disk to the destination – no need for the intermediate
    pegs[fromPeg] -= 1;
    pegs[toPeg] += 1;

    // now move all but one off the intermediate peg to the destination peg
    Move(intermediatePeg, toPeg, fromPeg, disks 1);
}

int Hanoi::Solve()
{
    int et = GetTickCount();
    Move(0, 2, 1, pegs[0]);
    return GetTickCount() et;
}


You can see that it’s really not much different – in fact it’s really, really close.  So what do we see for results when we run these? Look at the table below (my device was an Axim X30 with a PXA270 processor)






































































disks C# – CF 2.0 C++ Diff % improvement
20 744 461 283 38%
21 1518 1148 370 24%
22 2997 2068 929 31%
23 5964 4006 1958 33%
24 11892 7646 4246 36%
25 23942 14875 9067 38%
26 47945 29423 18522 39%
27 95674 58496 37178 39%
28 190917 116837 74080 39%


The C++ version performed nearly 40% better.  Of course the C# JIT compiler is running on a mobile device, with presumably much less power than a desktop machine, so the JITter is built to optimize for compile speed, not execution speed.  To try to level the playing field, I compiled the C++ version in Debug mode to turn off all compiler optimizations.  I can’t actually see what the CF JITter creates for assembly, so we can’t be sure if the resulting code is the same, but that’s the nearest I can come.


So we know that the implementation code is near identical.  We also know from Act I that identical code in a single method has no performance difference between native and managed code.  We also know from Act I that method calls are quite expensive.  A reasonable conclusion then is that the performance degradation we’re seeing here is nearly all in the cost of method calls.  Does this mean that managed code performance is terrible?  The answer is “it can be if you don’t fully understand what the CLR is doing.” The lesson learned here then is to either refactor the algorithm to keep your call stack short (there are non-recursive solutions to this problem – maybe another day I’ll test that), or put heavily recursive stuff into a native library and P/Invoke to it.


If you want to try these out on your own device, you can download the source code here (post your results in the comments if you’d like).  The source actually points out another lesson, this time in UI development speed. The C# version has a nice UI that I put together in about 10 minutes.  Writing a nice UI would have taken quite a bit longer in C++ so I didn’t even bother. With the C++ version you have to get the results from by running in eVC or Studio and setting a breakpoint in the calling loop and writing down the number after each iteration.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s