From C to Shining C++

I’ve had some extra time to do a bit of code cleanup at work, and I decided a good refactoring of string usage was way overdue. Although this code has been C++ for a long time, the original authors were either allergic to the STL or at least uncomfortable with it and hardly ever used it anywhere; this was originally written a good 15-20 years ago, so maybe it was still just too ‘new’ to them.

In any case, it’s still riddled with old C-style strings. They aren’t inherently bad, but it does tend to lead to sloppy code like:

char tempMsg[256];
setMsgHeader(tempMsg);
sprintf(tempMsg + strlen(tempMsg),
        "blah blah (%s) blah...", foo);

That’s potentially unsafe, so it really should use snprintf instead:

char tempMsg[256];
setMsgHeader(tempMsg);
snprintf(tempMsg + strlen(tempMsg),
         sizeof(tempMsg) - strlen(tempMsg),
         "blah blah (%s) blah...", foo);

Ugh, that’s ugly. Even after making it safer, it’s still imposing limits and using awkward idioms prone to error (quick, is there a fencepost error in the strlen calculations?). So I’ve been trying to convert a lot of this code to use std::string instead, like:

std::string tempMsg = getMsgHeader();
tempMsg += strprintf("blah blah (%s) blah...", foo); 

which I feel is far clearer, and is also length-safe (strprintf is my own helper function which acts like sprintf but returns an std::string).

It’s usually fairly straightforward to take a C string and convert it to the appropriate C++ equivalent. The main difficulty in a codebase like this is that strings don’t exist in isolation. Strings interact with other strings, get passed to functions, and get returned by functions, so anytime you change a string, it sets off a chain reaction of other things that need to be changed or at least accommodated. The string you just changed to an std::string is also an output parameter of the function? Welp, now the calling function needs to be converted to use std::string. Going to use .c_str() to call a function that takes plain C strings? Oops, it takes just ‘char *’ and now you gotta make it const-correct…

To try and keep things manageable, I’ve come up with the following guidelines for my conversion:

Don’t try and convert every string at once; triage them.

A lot of strings will be short and set once, used, and then discarded, and they’re not really so much of a risk. Instead, focus on the more complex strings that are dynamically assembled, and those that are returned to a caller. These are the cases where you really need to be wary of length limits with C strings, and you’ll gain a lot of safety by switching to std::string.

Functions that take an existing ‘const char *’ string pointer and only work with that can be considered lower-priority. Since they’re not altering the string, they’re not going to be the source of any harm, and changing them could risk introducing regressions.

Use wrapper functions to ‘autoconvert’ strings.

If you have functions that already take ‘const char *’ strings as parameters and work perfectly fine, you don’t necessarily need to convert the whole thing right away, but it can help to have a wrapper function that takes an std::string and converts it, so that you can use std::strings in the callers.

void MyFunc(const char *str)
{
    ...
}

void MyFunc(const std::string& str)
{
    MyFunc(str.c_str());
}

Now functions that call MyFunc can be converted to use std::string internally and still call MyFunc without having to convert MyFunc as well, or having to pepper the caller with .c_str() calls.

This can get tricky if you have functions that take multiple string parameters and you still want to be able to mix C-style and std::string as parameters. If it’s only a couple of parameters you can just write all possible combinations as separate wrappers, but beyond that it gets unwieldy. In that case you could use template functions and overloading to convert each parameter.

void MyFunc(const char *foo1, const char *foo2, const char *foo3, const char *foo4)
{
    printf("%s %s %s %s\n", foo1, foo2, foo3, foo4);
}

const char *ToC(const char *x) { return x; }
const char *ToC(const std::string& x) { return x.c_str(); }

template<typename S1, typename S2, typename S3, typename S4>
void MyFunc(S1 foo1, S2 foo2, S3 foo3, S4 foo4)
{
    MyFunc(ToC(foo1), ToC(foo2), ToC(foo3), ToC(foo4));
}

int main()
{
    MyFunc("foo", "bar", "baz", "blah");
    MyFunc("aaa", std::string("fdsfsd"), "1234", "zzzz");
    MyFunc(std::string("111"), "222", std::string("333"), std::string("444"));
    return 0;
}

It’s still kind of awkward since it needs helper overload functions polluting the namespace (I’d prefer lambdas or something nested function-ish, but I can’t see a good way to do it that isn’t ugly and difficult to reuse), but it avoids having to write 2^N separate wrappers.

Don’t get too fancy while converting

std::string will let you be more flexible when it comes to string manipulation, and you’ll probably no longer need some temporary string buffers, some string assembly steps can be combined, sanity checks can be removed, other string operations can be done more efficiently in a different way, etc. But if you try and make these changes at the same time you’re converting the string type, you could risk messing up the logic in a non-obvious way.

Make the first pass at string conversion just a straight mechanical one-to-one conversion of equivalent functionality. Once that’s done and working, cleaning up and optimizing the logic can be done in a second pass, where it’ll be clearer if the changes maintain the intent of the code without being polluted by all the type change details as well.

There are some other caveats to watch out for, too:

Beware of temporaries.

Wherever you still need C-style strings, remember that a returned .c_str() pointer only remains valid as long as the std::string still exists and is unchanged, so beware of using it on temporary strings.

const char *s = (boost::format("%s: %s") % str1 % str2).str().c_str();
SomeFunc(s); // WRONG, 's' is no longer valid

However, the following would be safe, since the temporary string still exists until the whole statement is finished, including the function call:

SomeFunc((boost::format("%s: %s") % str1 % str2).str().c_str());

Of course, most people already knew this, but the odds of inadvertently introducing an error like this goes way up when you’re changing a whole bunch of strings at once, so it’s worth keeping in mind.

Variadic functions are annoying

Unfortunately, ye olde variadic functions don’t work well with std::string. It’s most often used for printf-style functions, but an std::string doesn’t automatically convert to a C-style string when passed as a variadic parameter, so you have to remember to use .c_str() whenever passing in an std::string.

There are non-variadic alternatives like boost::format that you might want to prefer over printf-style formatters, but if you’re stuck with printf-style functions for now, make sure you enable compiler warnings about mismatched printf parameters and set function attributes like GCC’s __attribute__ ((format …)) on your own printf-style functions.

If you have boost::format available, and a C++17 compiler, but don’t want to convert a bajillion printf-style parameter lists, you could use a wrapper like this with variadic templates and fold expressions:

template<class... Args>
std::string strprintf(const char *format, Args&&... args)
{
    return (boost::format(format) % ... % std::forward<Args>(args)).str();
}

With this you can then safely pass in either C-style or std::string values freely. Unfortunately then you can’t use printf-format-checking warnings, but boost::format will be able to do a lot more type deduction anyway.

Performance is a concern…or is it?

C-style strings have pretty minimal overhead, and with std::strings you’ve now got extra object memory overhead, reallocations, more temporary objects, generated template functions, etc. that might cause some performance loss. Sticking with C-style strings might be better if you’re concerned about performance.

But…with modern optimizers and move constructors, return value copy elision, small string optimizations in the runtime, etc., the performance penalty might not be as bad as you think. Unless these strings are known to be performance-critical, I think the safety and usability value of std::strings still outweighs any slight performance loss. In the end, only profiling can really tell you how bad it is and it’s up to you to decide.

Leave a Reply

Your email address will not be published. Required fields are marked *