Saturday, September 26, 2009

STL Tips (Maybe some Tricks Later)

Like most articles here, this one will start with a shallow contrivance to make a point. A collection of Foo objects, perhaps in an STL vector.

namespace
{
    std::vector<Foo *> fooCollection;
}
Now, let's say a Foo pointer can appear one or more times in fooCollection. Furthermore, let's also assume that when a Foo is destroyed, it should probably not exist in the collection anymore.

Foo::~Foo()
{
    std::vector<Foo *>::iterator i;
    for(i = fooCollection.begin(); i != fooCollection.end(); ++i)
    {
        Foo * target = *i;
        if(target == this)
        {
            fooCollection.erase(i);
        }
    }
}

Pretty straightforward -- except once the code is run, awful things will happen. Iterators aren't necessarily valid after the collection has been modified.

Foo::~Foo()
{
    std::vector<Foo *>::iterator i;
    for(i = fooCollection.begin(); i != fooCollection.end(); ++i)
    {
        Foo * target = *i;
        if(target == this)
        {
            fooCollection.erase(i);
            break;
        }
    }
}

Ok, so after Foo is found in the collection, erase it, then break. The program no longer explodes when it is run. Of course, it is also no longer correct. Remember, the collection could contain the same Foo pointer more than once!

Most programmers familiar with the STL have already been cringing at the first two examples. std::vector<T>::erase() returns an iterator to the next element. To write code that will work properly with a vector after it has been modified, this value needs to be used. Incrementing an iterator while modifying a container is bad mojo.

Foo::~Foo()
{
    std::vector<Foo *>::iterator i;
    for(i = fooCollection.begin(); i != fooCollection.end();)
    {
        Foo * target = *i;
        if(target == this)
        {
            i = fooCollection.erase(i);
        }
        else
        {
          ++i;
        }
    }
}

Alright, that's the right way to remove several elements from a vector from within a for loop. By now, STL antagonists are surely chuckling at the rest of the C++ pedants! "What a pain in the ass!". They would be right. That's probably the worst way to solve the problem.

There's hope! STL provides several algorighms that operate on containers (also known as modifying algorithms. We'll remove all of that icky destructor code, replace it with a single algorithm call and improve encapsulation altogether!

Foo::~Foo()
{
    foo.erase(std::remove(fooCollection.begin(), fooCollection.end(), this), fooCollection.end());
}

Tada! A 90% reduction in code (potentially a 90% reduction in bugs) and it does the right thing. But wait! There's more! Because the remove algorithm doesn't necssarily operate with vectors exclusively, fooCollection can be refactored more easily to be something other than a vector. If it were a list or something else, the Foo destructor would continue to work unchanged!

Using what that the STL has to offer can drastically improve code quality. In addition to generic containers, the standard also defines adaptors, function objects and other support algorithms that work well with containers. Using only adapters supplied with the STL, std::remove()'s functionality can also be expressed as:

Foo::~Foo()
{
    std::remove_if(fooCollection.begin(), fooCollection.end(), std::bind2nd(equal_to<Foo *>(), this));
}

In another article, I'll talk a bit about function objects (functors) and some other STL tricks to help separate container-specific code from general game programming and to be more expressive with algorithms.
In the mean time, here are some references to useful and interesting algorithms, adapters and functors, some are straightforward, some are mind-bending in the ways thay can be expressed and used:

Wednesday, September 23, 2009

Embrace the Standard Template Library

Once upon a time, while working for a Studio that has gone far, far away, a C++ programmer joined a team of industry veterans. While writing a new system from the ground up, the lead programmer noticed something odd about what was written. In the new code were oddities like std::map and std::vector. This would not do!

"Why aren't you using the container classes that are already used in the game?" asked the lead.

"Well, this new system needs to detect mutable operations on containers and contained elements. The existing containers aren't const-correct. They also have quite a few bugs," replied the new initiate.

"That's unlikely, that code has been in use for almost 10 years now. We've shipped several titles with that code!" countered the lead. "You should be using the existing containers instead. It's impossible to parse STL error output. It's too slow for video games. We have already standardized on our own containers. Besides, we've also purchased this other technology for over $300,000.00 and they also use their own container types."

Okay, this isn't a fable or a myth or fiction. It happens. It happens more often with old codgers like your's truly that are trying to be very pragmatic about their priorities. It is often better to go with what you know than the latest, shiny new thing if it means you'll have some predictability in the schedule. Sometimes it is better to hold your tongue and pick your battles. In this case, the lead correctly noted that:

  • The studio had several successes with the code already in use, and didn't want to risk introducing something unfamiliar to a team of (mostly) C programmers that just happened to be using C++
  • The rest of the code extensively uses home-rolled container types, so interfacing with other, standardized, container types could prove to be problematic
  • The team invested heavily in third-party technology to get a head start on development. This third-party tech used it's own container types

There are a number of problems with those arguments, however:

  • By investing in third-party technology that provided another container library, a conflict had already been introduced. The home-rolled code could not interface easily with this third party technology.
  • If the code was so stable, why was it, after ten years, there were still bugs in the containers? Because the programming team (few of whom had any responsibility for the container code) learned over time to work around problems by convention.
  • Dealing with compiler error output for code that uses only standard C++ and C++ standard libraries is hardly an excuse to avoid it. The output is usually very explicit about what the problem is, even if the compiler vendor's implementation of the code itself wouldn't win any beauty contests. Anyone that spends any time in standard C++ can grok compiler output, which usually also leads to better programming practices (like const-correct coding). It's no more difficult to parse than the first time a programmer encounters pointers and does something that upsets the compiler.
  • The performance characteristics of the STL are strictly defined in the standard. "Slow" is a myth. If the wrong algorithm is used to solve a problem, it will be slow. Most programmers have had enough school and/or experience to understand what O(N^2) means compared to O(N). The standard spells this out. If a programmer is going to use C++, getting to know the language, including its strengths, weaknesses and proper application is a requirement.
  • Using the standard C++ library would actually reduce the risk to the project. The STL started life in 1980 as part of ADA, was introduced to C++ in 1994 (long before standardization), and has been thoroughly vetted by a LOT of programmers involved in the standardization process. The vendor implementations have had good coverage since the 1990's and are reasonably bug-free at this point. A library that has almost 30 years of design, hundreds of thousands of implementations use it, and is a formal part of ISO/ANSI standards for the C++ language is a pretty safe bet for use on a project compared to a home-rolled solution that has had coverage by 3 programs.

Were these reason enough to bite the bullet and embrace the STL? Sometimes logic and politics don't mix. In this case, no. Sewing cynicism or fighting a team of C programmers to move into the 21st century (or at least the last 20 years) would have been counterproductive. Game development is a a creative process that requires optimism, motivated programmers and agility. In this case, the psychological risk of demoralizing the team outweighed the real technical benefits of re-using standardized code.

Oddly enough, the rationale for using the pre-existing code was "re-use". In reality, fear of the unknown risks and NIH (Not Invented Here) played a pretty big role.

If the team consists of programmers that are generally open to new ways of working, have a generally optimistic attitude about learning new techniques, and the initial learning curve (and resulting productivity boost) outweigh risks to the schedule, it is probably a good idea to embrace the STL. This is easier early in projects than later, as some pre-existing code may need to be refactored. The time eventually saved in last-minute bug-fixing and "optimizations" that limit functionality for the next generation title is almost always worth it!

Some useful references on the STL:

Sunday, September 6, 2009

Singletons, Static Classes, Namespaces?

Global Objects Are Evil (?)
It is a widely held opinion that global data in code is evil. This belief dates back to the days of C, and is no less true today for C++ programmers. Why is it evil? The standard input, output and error file descriptors are global. In fact, there are many examples of global data in the C runtime library and the standard C++ library (e.g. iostreams). If they are so awful, why are they first-class citizens in these languages? Maybe not all globals are evil, but practical experience of battle-hardened programmers shows that they are usually evil in application code.

So what are the perils of globals?

To start, the standards, implementations and platform details are too vague to support static global data in anything more than the most trivial applications. Video games are the furthest thing from trivial programs.
  • Object lifetime is difficult to manage. C and C++ have well defined behavior for static initialization and destruction through atexit(), but dynamically linked libraries and other application-specific code can unwittingly violate the complex interactions of several objects in an application. The last static Renderer destroyed might not be the one that was first created. It certainly won't have the same object/game state associate with it when it is accessed on its second incarnation!
  • Some platforms may actually produce more than a single instance of a static object depending on the context that references it. Windows platforms that delay-load DLLs is one (of many) examples where static objects may be instanced and initialized more than once.
  • Nothing in either the C or C++ standard can define the behavior of multiple threads attempting "first access" to static global data.

Globals + Project Schedules Break Encapsulation and Reusability.

Because games are such complex, expensive and time consuming endeavors, the realities of budgets and project timelines create even greater havoc for static global data. Most game studios ship more than a single title before evaporating. After a product has shipped, and as the new one ramps up, there is pressure to recover the time and effort spent creating technology for the previous product. The programmers involved with shipping the last title often champion the technology they have spent the last 1-3 years developing and encourage the new team to hit the ground running for the next game with their "proven" technology. For the most part, the bits that are tucked furthest away from the game code they produced is ripe for re-use. The untold part of the story are the global objects that were polluted with game-specific code for an E3 demo, or a milestone review, or during the last 90 days of the project where programmers on a death march just wanted to ship the damn thing and take some vacation.

In game development, if the deadline looms, and a shortcut is available to ship a successful title, it will be taken. Global data is the most convenient shortcut to get the most important job done. Writing proper scaffolding between a Renderer and a GoblinObjectManager could take days when a programmer is left only with a few critical hours to ensure the success or failure of the project.

When the next team picks up the code and tries to use a design (that has nothing to do with Goblins or GoblinObjects), they find the global rendering object has been polluted with piles of code that has less to do with rendering in general, but everything to do with shipping the previous title and thus ensuring the studio still exists to employ them. This was the right decision to make given the circumstances and code they had to work with! Without the business requirements to meet, there would be nothing to sell and no reason to write the code in the first place! Those programmers that got the job done are the guys that ensured everyone else has a job to do for the next project. Always remember that "end of project" priorities are nothing like "beginning of project" priorities. They are often at oposite ends of the programming spectrum.

The new team, however, faces months of refactoring code and extricating the E3 demo garbage and last minute additions before they can really get started on finding the "fun" in their title. It would be cleaner and easier to simply start from scratch, right? If you are a game developer and can name this tune, raise your hand now!

C programmers in other disciplines already arrived at this conclusion on their own projects. C++ game developers that have lived through several product life-cycles have experienced this first hand. This is why there is a wide-spread opinion that globals are evil.
  1. The standard doesn't sufficiently address common usage of global data in complex applications.
  2. It is very difficult to manage initialization in multithreaded environments.
  3. Object lifetime is to easy to violate producing unexpected results.
  4. Globals make it too easy to break encapsulation, thus wasting time investments spanning years and preventing code re-use (the mantra of good OO design).

Coding "by convention" can solve one ore two of these problems. Convention never solves all of these problems, even on the most disciplined technical teams. It is always better to use features of the language to communicate the intent of technical design. Documents and convetions rarely solve the problems sufficiently.

If it can be agreed that the use of global static data is almost always a bad idea, how can single-instanced, global access to important application data be achieved? That is, afterall, why anyone would even ask whether a Static Class, Singleton, or Namespace would be appropriate to solve a problem, right?

Singletons and Static Classes are global data. They exhibit the very problems just discussed (and many more). If readers disagree, comment on the post and we can explore it further in future treatments :) If it is agreed that static classes and singletons represent global data, read on to the next section.

Just Don't Do It
C and C++ programmers worth their salt will first look for solutions that don't require globals at all. Many window toolkits and other complex systems simply pass contexts around and use parent-child relationships to form a well defined, acyclic dependency graph of objects. These systems demonstrate clean design and lend themselves to re-use more easily than those that lean on the crutch of access to global data. In C, it was often considered good design to pass opaque pointers to context structures through client code. While this may have bloated intermediate interfaces, the coupling between context data and higher level systems was properly communicated.

One problem (of many) with this approach is that any system built around passing a monolithic context could not easily be re-used or re-factored if the context were broken into smaller component parts. If, for example, a Renderer were to separate the notion of drawing meshes and defining camera space, access to the camera space could not be separated while still supporting the older, monolithic code, even if most of the code never bothered with the camera.

An Extendable, Global Interface (Or Do as the Runtime Does)
There are some solutions to the multi-threading problems of globals (double-locked singleton initialization). There are some problems solved to deal with broken encapsulation (just don't use globals). There are solutions to object lifetime issues (explicit Install and Remove methods for globals). What is really needed is an approach that doesn't expose global data, can manage data initialization through an interface, do it in a thread safe way, and allow game programmers to extend functionality without polluting the core interface or implementation!

C++ introduced Namepsaces, which a lot of old-school C programmers regarded as a "hack". Namespaces, however, provide the path to the solution to all of these problems.

Static classes and Singletons are static globals. I would hope it has already been demonstrated that static global data is the choice of last resort. Even if the initialization, lifetime and threading technical problems are solved in a static or singleton implementation, the fact remains that they break encapsulation in practice when they are polluted with application-specific code required to ship the product. Even the most conscientious programmers have no alternative, because static classes and singletons do not provide partial closure semantics -- e.g. extending the base functionality through new client code that is separate from the original implementation.

The brilliance of namespaces in C++ is that the can provide special closures.
Renderer.h
namespace Renderer
{
    void drawMesh(const Mesh & mesh);
}
GoblinRenderer.h
namespace Renderer
{
    void drawGoblin(const Goblin & goblin);
}

The Renderer interface has been extended deep within the game, and available only to the game to provide a rendering interface that makes sense for the game. The core Renderer interface escapes un-polluted with the game-specific rendering requirements. The renderer has been extended beyond it's original interface closure into the game. When the team evaluates the renderer for re-use apart from the game, the core code is not decorated with game-specific interfaces to support implementations required to ship the product!

Namespaces aren't a panacea. Programmers need to remember that the basic technical complications with object lifetime and multithreading issues need to be solved for ALL globally accessible objects, even if they are accessed through a namespace.

If a namespace is used, it is preferable to solve the object lifetime problem first, using Namespace::Install() and Namespace::Remove() explicitly in the code, usually after main() has been reached in execution. In the Install() and Remove() namespaced functions, the double-locked checks for dealing with global data should be honored. These two approaches solve the technical issues surrounding the "singleton problem". The fact that they are implemented in terms of a namespace instead of a monolithic object address the very real, pragmatic issues facing game developers on every project.

If the code requires global access to something, first attempt to design it in terms of a namespace. If that is not possible, abandon all hope!