Returning strings from C functions

Over the years I’ve noticed that questions always come in groups.  When I worked at Applied Data Systems, when a support question came in we always went the extra mile in writing up an answer because invariably we’d get at least 2 more questions on the same topic within a week.


I still see the same thing today.  Last week a friend who is pretty new to C development asked me about returning a string from a function and I went through and explained the hows and whys.  Then today I saw the same question in a newsgroup.  Rather than just provide a short answer, I figure I might as well head off the repeats that are sure to come up in the coming weeks by providing a reasonable blog entry on it so I can just post a link.


So, how do we return a string from a C or C++ function anyway?  Well the short answer is “you don’t.”  You can, but it’s rare that you ever should.  The two exceptions that come to mind are if you have an accessor that returns a globally allocated value, like a constant, or if the return is simply a copy or internal location of a parameter to the same function (strstr is an example).  Let’s look at the reason why this is the case.
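Before getting to the reason, here is roughly what those two exceptions can look like.  This is only a sketch in portable C (plain char instead of TCHAR), and both function names are made up for illustration.  In neither case does the caller end up owning a new buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Exception 1: an accessor for a value with static storage duration.
   The string literal lives for the whole program, so nobody frees it.
   get_product_name is a hypothetical name. */
const char *get_product_name(void)
{
    return "Example Product";
}

/* Exception 2: the result points into memory the caller already owns.
   Like strstr, it returns a location inside the parameter (or NULL),
   so ownership never changes hands. skip_prefix is also hypothetical. */
const char *skip_prefix(const char *text, const char *prefix)
{
    assert(text != NULL && prefix != NULL);

    size_t len = strlen(prefix);
    return (strncmp(text, prefix, len) == 0) ? text + len : NULL;
}
```

Anything that falls outside these two shapes runs into the ownership question.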


Let’s create a function to get a string:



TCHAR *GetString()
{
}


Now the question here is memory ownership.  For the function to return some string value, a buffer must be allocated to hold that value.  So let’s add that:



TCHAR *GetString()
{
    // LocalAlloc takes a byte count, so scale by sizeof(TCHAR)
    TCHAR *value = (TCHAR*)LocalAlloc(LPTR, MAX_PATH * sizeof(TCHAR));
    if(value != NULL)
        _tcscpy(value, _T("Hello memory leak"));
    return value;
}


So now we allocate a buffer and return it.  This will work fine, but the caller then must know that after they use the string they must call LocalFree or they’ll have a memory leak.  Now if you’re tempted to say “yes, but I’ll remember that” or “yes, but it’s in an internal library that I’ll use LocalFree in and it will never change” then you haven’t been developing long.  Rule #1 is that code will *always* be changed, or copied into another project.  Rule #2 is that it is almost always someone else who will do it.  Even if you’re the one that does it, believe me, you won’t remember this 2 years down the road until you’ve burned 2 weeks trying to find the memory leak that a high-profile customer is complaining about and that management has made priority 1 for the entire team.  It happens.  Do not be tempted to do this.  Ever.
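To make the caller’s burden concrete, here is a small portable sketch of the same shape.  It assumes malloc and free as stand-ins for LocalAlloc and LocalFree, and the function names are hypothetical.  The point is the single free call that every caller has to remember.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Portable stand-in for GetString: the callee allocates and hands
   ownership to the caller. get_string_owned is a hypothetical name. */
char *get_string_owned(void)
{
    static const char text[] = "Hello memory leak";
    char *value = malloc(sizeof text);      /* sizeof counts the '\0' */

    if (value != NULL)
        memcpy(value, text, sizeof text);
    return value;                           /* the caller now owns this */
}

/* What every caller has to get right, every time.
   Returns the length of the string it received. */
size_t use_string_once(void)
{
    char *s = get_string_owned();
    assert(s != NULL);      /* sketch only: real code should handle failure */

    size_t length = strlen(s);
    free(s);                /* drop this line and each call leaks a buffer */
    return length;
}
```

Nothing in the signature of get_string_owned says the result has to be freed, or with which function, and that is exactly the problem.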


Ok, so we all agree that returning a string is bad (nod your head – yes you agree).  So how do we do it?  Well we know that to prevent a leak, the caller needs to do the allocation, so can’t we just pass in the buffer as a pointer?  How about this?



void GetString(TCHAR *value)
{
    _tcscpy(value, _T("Hello overrun"));
}


The value I used should be a clue as to what the problem here is.  Let’s look at a use case.



TCHAR *myValue = (TCHAR*)LocalAlloc(LPTR, MAX_PATH * sizeof(TCHAR));
GetString(myValue);


Will this work?  Sure, in this exact case it will.  But what if GetString’s value is larger than MAX_PATH?  What if the caller allocated a smaller buffer, or didn’t allocate one at all?  Well you’ll get a buffer overrun.  If you’re lucky this will manifest as a first chance exception or something that blows up spectacularly and is easy to find.  If you’re not lucky (and if you’re against a tight deadline, you won’t be) it will cause unexpected and non-reproducible behavior that is a real bitch to debug (on a Palm I’ve seen this cause execution to jump right to another application with no warning – try debugging that). 


So this could be even worse than our original FUBAR (if you’re unfamiliar I’ll let you look it up) code.  So what’s the right way to do this?  Well a good indication is how the Win32 APIs work.  Look at something like RegQueryValue.  It takes a buffer pointer and a pointer to the buffer’s size.  And if the function is really nice, it will tell the caller how big the buffer should be when the one passed in is too small.  Let’s look at an example.



BOOL ReturnString3(TCHAR *myString, DWORD *size)
{
    DWORD requiredSize = 0;
    const TCHAR *buffer = _T("Hello nice function");

    // determine how big the buffer must be, in bytes,
    // including the null terminator
    requiredSize = (DWORD)((_tcslen(buffer) + 1) * sizeof(TCHAR));

    // on input, *size is the size of the caller's buffer in bytes
    if(myString == NULL || *size < requiredSize)
    {
        *size = requiredSize;
        SetLastError(ERROR_INSUFFICIENT_BUFFER);
        return FALSE;
    }

    _tcscpy(myString, buffer);
    *size = requiredSize;
    return TRUE;
}


Here you see that we first determine how big the buffer should be and then we check the incoming buffer against that number.  If it fails, we copy the required size to the size parameter, set a meaningful error, and return FALSE so the caller knows we failed (we can’t force them to read the return, but if they don’t that’s just poor practice).


If the buffer is big enough, we copy in the value, set the actual size (in case they want it for something) and return TRUE.


A use case would look like this:



TCHAR *getString = NULL;
DWORD size = 0;

// this is supposed to fail – it gets the size
ReturnString3(getString, &size);

// now allocate a buffer
getString = (TCHAR*)LocalAlloc(LPTR, size);

// and get the data
BOOL success = ReturnString3(getString, &size);
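For anyone who wants to try the pattern outside of Win32, here is a rough portable analogue.  It is a sketch under some assumptions: char and malloc stand in for TCHAR and LocalAlloc, the names return_string and demo_two_calls are made up, and the size is treated as a capacity on the way in and a required or used count (terminator included) on the way out.  It also checks every result, which the short use case above skips.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical portable analogue of ReturnString3. On entry *size is the
   capacity of buffer in chars; on return it holds the number of chars
   required (on failure) or written (on success), '\0' included. */
bool return_string(char *buffer, size_t *size)
{
    static const char text[] = "Hello nice function";
    const size_t required = sizeof text;    /* counts the '\0' */

    assert(size != NULL);
    if (buffer == NULL || *size < required) {
        *size = required;                   /* tell the caller what to allocate */
        return false;
    }
    memcpy(buffer, text, required);
    *size = required;
    return true;
}

/* Two-call usage with every result checked. Returns 0 on success. */
int demo_two_calls(void)
{
    size_t size = 0;

    /* first call is expected to fail and report the required size */
    if (return_string(NULL, &size))
        return 1;

    char *s = malloc(size);
    if (s == NULL)
        return 2;

    int result = 3;
    if (return_string(s, &size) && strcmp(s, "Hello nice function") == 0)
        result = 0;

    free(s);    /* the caller allocated it, so the caller frees it */
    return result;
}
```

The allocation and the free now sit in the same function, so there is no hidden ownership for anyone to forget.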


So now you’ve seen the good, the bad and the ugly on how to return a string from a C function.  I have no illusions that me posting this will stop people from asking, even if it ends up as the number 1 result for Google.  Let’s face it, you have to know enough to do the search for it to be of any use.  But hopefully, those who are in the industry because they like to learn and don’t like to write crap code can either find it or be pointed to it and make good use of it.
