Wednesday, August 26, 2009

Const and Containment, or Do as the Vectors Do

What does it mean to be const-correct for an accessor? It's not uncommon to see an accessor method declared const that returns a mutable object stored in a member container:
class World
{
public:
    GameObject * getGameObjectAt(int index) const
    {
        return m_gameObjects[index];
    }
private:
    std::vector  m_gameObjects;
};
For the duration of the call to getGameObjectAt(), the class remains bit-wise const. Nothing is actually changed in the call. The same rationale is applied to the container, returning the element doesn't actually change the container for the lifetime of the call. The caller is getting a copy of the pointer in the container. The copy, however, is a non-const pointer, meaning the value pointed to can be changed! This can't happen with iterators or references.

What happens if GameObject contains methods that mutate World?
class GameObject
{
public:
    void doCombat()
    {
        // ... do combat-y things
        if(target->isDead())
        {
            m_world->removeGameObject(target);
        }
    }
};
This brief, contrived example keeps it pretty clear to a programmer that getting a GameObject from World and invoking doCombat() may change the World object and its m_gameObjects container. In the real-world, game code is rarely so brief or simple. Dozens (or more) programmers are working with volumes of code with very complex and subtle interactions.

A pretty common response from some programmers is "don't do that". This isn't helpful if the author of the code isn't around to berate the offending programmer for shooting himself in the foot. A good API will communicate "don't do that" through the interface and let the compiler yell at people for doing the wrong thing.

Taking the extra step to cross the line between bit-wise const-correct code and logically const code is how this is accomplished. Do as the vectors do when accessing aggregate data stored in containers. Use const overloading for accessors.
template
class Container
{
public:
    const Element & operator[] (size_t index) const;
    Element & operator[] (size_t index);
};
A logically const World class would look like:
class World
{
public:
    const GameObject * getGameObjectAt(int index) const
    {
        return m_gameObjects[index];
    }
    GameObject * getGameObjectAt(int index)
    {
        return m_gameObjects[index];
    }
private:
    std::vector  m_gameObjects;
};
While providing a const and non-const interface seems redundant and verbose, it does enforce the World class designer's intent that users of World not change it in a const context. If they try it, the compiler will effectively say "don't do that", and mis-use becomes more difficult (though not impossible with const_cast or C-Style casting). At that point, however, the user of a const World object has to jump through some hoops and explicitly say "I know what I am doing here, what I am doing is safe and correct!"

Const-correct code is safer code, but don't stop at bit-wise const to pacify a squawking compiler, take it all the way to logically const correct code to turn the pesky compiler on programmers about to do something unsafe.

Sunday, August 23, 2009

Heisenbugs, or Why const-correctness is more than pedantic esthetics

Heisenbugs, or Why const-correctness is more than pedantic esthetics

I recall, on more than one project, making the mistake of writing a const-correct serialization system, only to find that 90% of the code in the game wasn't const-correct already. It can be a long trip down that rabbit hole.

For one title in particular, the game needed to save world state. The "World" code was ported from a previous title, and worked pretty well for the game in progress.

void Game::save(const World & w) const
{
    FileStream fs;
    
    const Player * p = w.getPlayer();
    if(p)
    {
        fs << *p;
    }

    int objectCount = w.getObjectCount();
    for(i = 0; i < objectCount; ++i)
    {
        GameObject * o = w.getObjectAt(i);
        if(o)
        {
            fs << *o;
        }
    }

    // etc...
}

It seemed like straightforward logic that shouldn't take long to implement and test. Of course, it didn't compile. I'll contrive some of the World interface code to protect the guilty:
class World
{
public:
    Player * getPlayer()
    {
        return m_player;
    }

    GameObject * getObjectAt(int index)
    {
        return m_objects[index];
    }

private:
    Player * m_player;
    CustomFastArray m_objects;
};

Well, the problem here is obvious, Game::save() is passed a const reference to a World object, but the accessors for Player and Objects do not provide a const guarantee that retrieving them won't change the state of the World object. Decorating the accessors seems like a good place to start.
class World
{
public:
    const Player * getPlayer() const
    {
        return m_player;
    }

    const GameObject * getObjectAt(int index) const
    {
        return m_objects[index];
    }

private:
    Player * m_player;
    CustomFastArray m_objects;
};
After this change, all hell breaks loose in the compiler output. About 25% of Player and 5% of GameObject implement const correct accessors that are used in the << and >> operators for those types. Another 30% of the code use those accessors to actually change the objects returned. What started as a 2 or 3 hour task has balooned into 2 or 3 days of work correcting all of the code. The team lead makes an executive decision, "Yes, it's nice to have const correct code, but so much of what is in place already is incorrect that it requires too much time to fix."

Time is something there is never enough of on the schedule when making a game. Proponents of writing const-correct code are lambasted as pedants that want const for const's sake. There may be some truth to that, but the decision was made to simply pile on more const-incorrect code to a system that was already "too big to fix".

The day of reckoning arrives when QA finds an obscure save/load bug. They would play for a little while, save the game, play some more, the load the saved game. The bug was that the loaded game wasn't the same as the game that was saved. After 4 days and a trail of tracking down several layers of cause-and-effect that is too long to describe in this article, the problem turned out that the World object was a Heisenberg freak-show. The act of observing the world, namely Game::save(), was changing it!

That project never did receive const-correct treatment, and there were more Heisenbugs in the system. Overall, the cost to track down the bugs was greater than the cost to simply refactor the existing code and use the compiler to eliminate a whole class bug that is difficult to find and fix.

Const-correctness is more than esthetics for computer science wankers. If methods are written with strong const guarantees, new code that *does* try to mutate an object won't survive past the first attempt to compile it. The class design is preserved not by convention, but by the rules of the language.

The lesson to be learned here is: it always takes less time to fix code to let the compiler find bugs than it does to hope problems are caught at runtime during QA, or worse, after the game ships!

Friday, August 21, 2009

Prefer Pass-by-Reference to Passing Naked Pointers as Parameters

While bug fixing, I often run across innocent looking code like:
int Warrior::getDamageAmount(const Weapon * w) const
{
    return w->getDamageAmount() + m_damageMultiplier;
}

It's functional, and a lot of game code looks similar to this snippet. A straightforward update loop might look something like:
void Player::update()
{
    Weapon * w = NULL;
    // lots and lots of code for weapon selection follows
    // ...
    // ...

    // m_playerClass may be Warrior
    int damage = m_playerClass->getDamageAmount(w);
}

Now, at some point, someone re-factors the code a little, and a seldom used path leaves w = NULL. The game crashes in code written by the author of class Warrior::getDamageAmount() when the NULL pointer is dereferenced to get the method. The Warrior class author sees the problem (NULL was passed) fixes the offending code the caller wrote and adds an assert since there's no valid thing to do when a NULL parameter is passed.
int Warrior::getDamageAmount(const Weapon * w) const
{
    assert(w != NULL);
    return w->getDamageAmount() + m_damageMultiplier;
}

While this is perfectly legal code, it could still crash in release builds. One solution (and one that safe, slower game code employs) is to check the parameters before using them and trying to do something reasonable.

int Warrior::getDamageAmount(const Weapon * w) const
{
    int result = -1;
    assert(w != NULL);
    if(w != NULL)
        result w->getDamageAmount() + m_damageMultiplier;
    return result;
}

The return value here probably isn’t reasonable. It’s a compromise: don’t crash, but live with unexpected behavior that may or may not crash the game later. Applying -1 damage effectively turns a Warrior from a damage class to a healing class! “Why is my BFG 9000 healing when it’s fired?!”

There may be a better way to communicate to the caller what’s expected when they use getDamageAmount(). If the Weapon parameter is a reference instead of a pointer, it’s implied that it can’t be NULL. If it *never* makes sense to pass NULL to a method, a parameter passed by reference is the better choice.
int Weapon::getDamageAmount(const Weapon & w) const
{
    return w.getDamageAmount() + m_damageMultiplier;
}

Now when it’s called as:
m_playerClass->getDamageAmount(*w);

and a caller attempts to dereference a NULL pointer, the crash will happen at the call site, not inside Warrior::getDamageAmount(). In other words, the programmer that made the bug immediately understands that they’ve done something inappropriate.

It’s much easier to accidentally pass a NULL pointer to a method/function that accepts a pointer. It implies that NULL is OK and may even be part of the expected behavior.

Programmers shouldn’t be expected to understand the inner workings of someone else’s code. In some cases, that implementation is opaque because the source might not be available. Most programmers *do* realize that dereferencing a NULL pointer is bad, and will do the right thing before writing incorrect code.

Using pass-by-reference provides the same performance as the first example that passes an unchecked pointer along to the method, but also provides the safety that slower assert/if/conditional checks provide in naked pointer-based code. Often, there is no right way to handle NULL and the program can't reliably recover in those cases.

If it never makes sense to have a NULL pointer as a parameter to a method, then don't use a pointer parameter, use a reference parameter instead.