Working around pre-SP3 CF 1.0 shutdown hangs

A couple of us at OpenNETCF have been heads-down coding on a project for the last couple of months.  It’s got some interesting aspects, and I’ll be posting lessons learned over the coming weeks.  One of the biggest problems is that the deployment device for testing is the HP6315, which I’ll go on record as saying sucks.  Working around its flakiness is a real pain.  Another problem is the customer’s desire to run on the device out of the box to minimize deployment effort.  That means running with the version of the CF in ROM (CF 1.0 SP2 in this case).

The last few bugs in the project’s bug database all say the same thing: the app can’t restart once it’s been closed.  The first suspect is obviously that the app didn’t really shut down all the way, and the CF is preventing new instances from running.

First step is to repro.  It’s odd because we’ve been developing for a couple of months and neither of us has seen this.  We run through the reported repro steps from the customer’s functional testing and can’t get it to fail.  The customer can repro it every time, and on multiple devices.  It then dawned on me to try a hard-reset, clean device.  Blam!  Sure enough, it failed.

So what’s different?  Well, in development we use Studio, which pushes down the latest CF (SP3) the first time we deploy, and then we forget about it.  So SP3 evidently fixes something that is broken in SP2.  Of course the requirement is to use what’s in ROM, so it’s time to debug a little.

Next step is to use Remote Process Viewer and confirm that the process is indeed still running before digging into the code.

Well, Remote Process Viewer shows that the second instance did in fact launch, so now we have 2 instances.  Further, tapping the icon starts more instances.  If you don’t understand the implications of this, let me explain a little.  The CF (on PPC devices) enforces singleton app behavior (a bad idea IMHO, but that’s a side topic).  If we’re getting multiple instances, then this “enforcement” is failing.  So not only is the CF not shutting the first instance down, but the second is being prevented from fully coming up (maybe the assembly is locked in some way?).

Well, we added a little debug code and found that Application.Run is exiting and Main is running out its course.  Again, this points at a zombie thread, but that doesn’t explain the multiple instance issue.
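The debug code itself isn’t shown here.  As a rough sketch of that kind of tracing, assuming a hypothetical ExitTrace helper and log file name (neither comes from the project), something like the following would do.  It appends one line per checkpoint and closes the file every time, so the log stays readable even if the process never goes away.

```csharp
using System;
using System.IO;

class ExitTrace
{
    // Hypothetical log location; any writable path on the device would do.
    public static string LogPath = "exit-trace.log";

    public static void Write(string message)
    {
        // Open in append mode and close on every call so each line
        // is on disk even if the process hangs right afterwards.
        using (StreamWriter writer = new StreamWriter(LogPath, true))
        {
            writer.WriteLine(DateTime.Now.ToString("HH:mm:ss") + " " + message);
        }
    }
}

class TracedStartup
{
    // Stand-in for the body of the real Main.
    public static void Run()
    {
        ExitTrace.Write("Main entered");
        // Application.Run(new Main());  // the real app blocks here until the form closes
        ExitTrace.Write("Application.Run returned");
        ExitTrace.Write("Main returning");
    }
}
```

If all three lines show up in the log and the process is still listed in Remote Process Viewer, the managed code has finished and whatever is keeping the process alive is happening after Main returns.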

I spent a couple of hours walking our code and a 3rd party library, trying to find a zombie thread or anything else that might be doing this, to no avail.  Then it dawned on me: the app is closing.  We don’t need any resources, we just want to end.  Well, let’s just be heavy-handed.

I looked up ExitProcess in the platform headers, and it’s an inline that calls TerminateProcess with GetCurrentProcess for the handle.  Digging further, you find that GetCurrentProcess() is really just the constant 0x42 (on ARM anyway).  So this is what I ended up with (well, trimmed down anyway):

static void Main()
{
    Main main = new Main();

    // (trimmed) The app runs as usual; the check below executes once Application.Run returns.
    Application.Run(main);

    if((System.Environment.Version.Major == 1) && (System.Environment.Version.Build < 4292))
    {
        // NOTE:
        // This is a heavy-handed workaround for a bug this app exposes in pre-SP3 CF 1.0 devices
        // For those interested, 0x42 is the expansion of GetCurrentProcess() from the platform SDK headers
        // and ExitProcess is simply an inline calling TerminateProcess with GetCurrentProcess().
        TerminateProcess(0x42, 0);
    }
}

// Requires: using System.Runtime.InteropServices;
[DllImport("coredll.dll", SetLastError=true)]
private static extern int TerminateProcess(uint handle, int uExitCode);
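If you want to keep the version gate out of Main, it can be pulled into a small helper.  This is only a sketch: the ExitWorkaround class and NeedsForcedExit method are names I made up for illustration, and the only real number in it is the 4292 build threshold from the check above.

```csharp
using System;

class ExitWorkaround
{
    // First CF 1.0 build that does not need the workaround (SP3),
    // taken from the version check in Main.
    const int FirstFixedBuild = 4292;

    // True when running on a CF 1.0 runtime older than SP3.
    public static bool NeedsForcedExit(Version runtime)
    {
        return runtime.Major == 1 && runtime.Build < FirstFixedBuild;
    }
}
```

Main would then call ExitWorkaround.NeedsForcedExit(System.Environment.Version) right before TerminateProcess.  Taking the Version as a parameter means the gate can be checked on the desktop without a hard-reset device in hand.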

Long live the kludge! 
