Friday, January 22, 2010

A Brief Refresher on Member Initialization

So, I forgot something that is maybe a little important, especially when it comes to multithreaded code.
struct MakeThreads
{
    MakeThreads() :
    _thread(threadFunc, this)
    , _done(false)
    {
    }
    ~MakeThreads()
    {
        _done = true;
        _thread.join();
    }

    void threadFunc()
    {
        while(!_done)
        {
            // do stuff
        }
    }

    Thread  _thread;
    bool    _done;
};

MakeThreads letsScrewUp;

Now, my old, single-threaded C self wants to put the smaller members later in the struct/class declaration to keep data packing efficient. My new, multi-threaded C++ self hasn't formed the proper habits to prevent this sort of bug. How I wish I had lint here to keep rapping my knuckles and improving my coding habits!

Just a reminder: the member initializer list in the constructor does NOT dictate the order in which members are initialized. The C++ standard mandates that members are initialized in the order they are declared in the class, no matter how the initializer list is written. As Scott Meyers (and g++ with the proper warnings enabled) reminds us: the member initializer list should always match the order of member declaration in a class!

I made this mistake recently. For those who haven't already caught it: _done is initialized AFTER _thread. The newly-started thread queries _done, which may or may not have been initialized at that point.

This is one of those scenarios where the code does what you expect most of the time but rarely all of the time!
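
The fix is mechanical: declare _done before _thread (and keep the initializer list in the same order), so the flag is guaranteed to have a value before the thread that reads it ever starts. A minimal sketch, assuming the same hypothetical Thread type as above:

struct MakeThreads
{
    MakeThreads() :
    _done(false)                 // initialized first: declared first
    , _thread(threadFunc, this)  // by now _done is safely false
    {
    }
    ~MakeThreads()
    {
        _done = true;
        _thread.join();
    }

    void threadFunc()
    {
        while(!_done)
        {
            // do stuff
        }
    }

    bool    _done;    // declared before _thread, so initialized before it
    Thread  _thread;
};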

Like any language, some of the finer points of C++ are easy to forget unless you find opportunities to practice them often. For those itches that aren't scratched on your day job, I highly recommend busting your own chops in your spare time for remedial work :)

Notes on Scaling a Multiplayer Game Server

Well, let's start with some basics. The Client/Server model (for video games) consists of a game server, which is authoritative for some or all of the game state. The clients produce messages from player input, and consume state updates as dictated by the server. The client side, as far as the network is concerned, is pretty straightforward:

  • Connect to the server
  • Handle incoming messages to update the world
  • Send local player input to the server
  • Clean up when the server disconnects

There are a few things that can be done to streamline the client network code, such as putting Network I/O in a separate thread. This isn't done to make the game render faster, but to prevent the network from timing out when the client is busy loading some massive data set. The game client will never spend significant memory or CPU time dealing with the network and its messages. Programmers can play it pretty fast and loose in this regard. If the client can pass packets around, the job pretty much consists of keeping the game state consistent with what the server is telling the client.
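
A minimal sketch of that split (the ClientNetwork, Socket, Packet and queue types here are hypothetical stand-ins, not any particular library):

// network thread: keeps the connection alive no matter what the game thread is doing
void ClientNetwork::networkThread()
{
    while(_connected)
    {
        Packet p;
        if(_socket.receive(p))       // blocks here, not in the game loop
            _incoming.push_back(p);  // thread-safe queue
    }
}

// game thread: drains the queue once per frame, whenever it gets around to it
void ClientNetwork::pumpMessages()
{
    Packet p;
    while(_incoming.try_pop_front(p))
        _game->handleMessage(p);
}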

For servers that don't have to scale beyond a handful of connections, the same paradigms hold true. Pumping the network and tossing packets isn't much work compared to the heavy lifting the game simulation has to do. There is one problem that frequently plagues the server: O(n^2) (order 'n' squared, for those not familiar with Big-O notation) operations. Every FPS I've worked on was inherently O(n^2) in its messaging. Basically, this happens when every client causes some update that generates traffic to every other client connected to the server.

In the great tradition of examples contrived and simplified to illustrate a point, let's assume that game clients are authoritative for player position. To further complicate the problem (and the math), we'll throw the update rate into the mix: assume each client updates the network every 50ms (20Hz, twice the rate of the original Quake).

  • Each client sends 20 position updates each second.
  • A server with 2 clients will send 40 updates each second: client 1's 20 updates become 20 update messages to client 2, and client 2's become 20 update messages to client 1.
  • A server with 3 clients will send 120 updates each second: 20 updates from client 1 trigger 40 update messages to clients 2 and 3, and likewise for clients 2 and 3.
  • A server with 4 clients will send 240 updates each second: 20 updates from client 1 trigger 60 update messages to clients 2, 3 and 4. (etc...)

There is a pattern here: with n clients, the server sends 20 × n × (n - 1) updates each second. With only 4 clients running at 20Hz, the server is already tossing a packet roughly every 4ms. That is something most hardware can handle, but game servers need to handle dozens, hundreds or thousands of players. Oh, and they need to run a game simulation too. This model doesn't hold up well in the face of those numbers.
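
To make the pattern concrete, here's a throwaway sketch (assuming the 20Hz rate from above) that prints the per-second message count as the player count doubles:

#include <cstdio>

int main()
{
    const int rate = 20; // position updates per client per second
    for(int n = 2; n <= 128; n *= 2)
    {
        // each of n clients triggers an update to the other n - 1 clients
        printf("%4d clients -> %7d messages/sec\n", n, rate * n * (n - 1));
    }
    return 0;
}

At 128 clients that's already over 325,000 messages per second, before the simulation does any real work.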

Most game programmers are painfully aware of these scaling issues. They employ a number of techniques to rein in the traffic requirements. Reducing the update frequency to other clients based on distance is one (of many) ways to bend the scalability equation. MMOs spread the load across dozens of servers in a cluster that makes up a single game "shard". Since this article is about scaling, and about the performance of network code, it won't focus on those techniques, but instead look at how scale affects the overall performance of net code.

Scale comes in many different flavors. Web servers need to deal with thousands of concurrent, isolated, short-lived connections. Chat servers handle thousands of concurrent, long-lived point-to-point connections. MMO servers must support hundreds or thousands of long-lived, point-to-multipoint connections. FPS servers deal with dozens of long-lived, point-to-all-point connections. The first common-sense reaction to scaling issues for a new server programmer is "well, Google manages to handle millions of clients with no problem, we'll use Web technology since this is already a solved problem."

The problems for game servers are primarily matters of pushing state updates at a rate that is proportional to the number of players that cause the updates, and the number of players that must receive those updates. Game servers are nothing like web servers, unless the game is designed to treat players as disconnected entities that have no effect on the state of the world that the rest of the players participate in.

Consider Ironforge in World of Warcraft. At any point in time, there are hundreds of players nearby. It is one of the worst performing scenarios in multiplayer gaming. Everyone is running around in close quarters. In MMO parlance, it's a flashpoint. What is the server network code doing?

  1. The server receives a position update from a player.
  2. The server determines that 165 players in the immediate vicinity need to receive that update.
  3. The server sends 165 net messages. (The other 165 players are ALSO running around, creating messages. Do the math again: 166 players, each fanning out to 165 others at 20Hz, is over half a million messages per second to keep this state consistent for the game clients!)

Ok, code time:

void Connection::send(void * data, size_t length)
{
    // blocks while the OS copies the data to a net buffer. 
    // If the kernel buffer is full, blocks the entire time the 
    // remote is acknowledging it accepted data from the other 
    // 164 messages it was sent
    _socket->send(data, length); 
}

That's what client code on the server might do. An evolution of this model avoids blocking in the kernel while a socket's send buffer is full.

void Connection::send(void * data, size_t length)
{
    // don't block the ENTIRE game sim for one slow-assed client
    // that isn't emptying the kernel socket buffer fast enough
    // assume _sendQueue is thread safe
    _sendQueue.push_back(new Message(data, length));
}

// in write thread, grab messages off the queue
void Connection::networkThread()
{
    while(_connected)
    {
        Message * m = _sendQueue.pop_front(); // locks queue, blocks until a message is available, removes front element
        _socket->send(m->data(), m->length());
        delete m;
    }
}

That's an improvement. Many client programmers will tell you that after they optimized the snot out of their bleeding-edge rendering system, they had to start looking at allocations and moving memory around.

Allocating memory isn't doing work, it's making room to do work. Moving memory around isn't doing work, it's putting it someplace convenient to do work later. In our scenario, tossing 30,000 packets per second means there is a lot of work to do. Making 30,000 allocations per second and 30,000 deep copies per second will soon show up on the profile of an active server (though it will NEVER show up on the profile of a pre-production server that never has more than 10 or 20 people connected). Lesson to learn here: Play-testing for months with a few dozen users will never prepare code for what happens when thousands of users start beating the snot out of it.

One more word about allocations: the biggest risk for a server that needs to scale is memory fragmentation. Allocating and freeing tens of thousands of variable length buffers each second wreaks havoc on a server. It's not uncommon for a server to fall on its face because it cannot allocate memory. This can happen before it stalls trying to send/receive/move packets. I wouldn't bet on that race, but it is something that kept me awake some nights while trying to find a good solution. An allocation failure can happen when the server has 1GB of free memory for the process, but doesn't have 1k of free CONTIGUOUS memory to give the application when it needs it!
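
One common countermeasure (sketched below, with made-up pool and block sizes, not the layout of any particular server) is to carve fixed-size blocks out of a single up-front allocation and recycle them through a free list. Buffers never return to the heap, so the heap never fragments:

#include <cstddef>
#include <vector>

class BufferPool
{
public:
    static const size_t BlockSize = 1024;      // assumed max message size

    explicit BufferPool(size_t count)
    : _storage(count * BlockSize)              // ONE allocation, up front
    {
        for(size_t i = 0; i < count; ++i)
            _free.push_back(&_storage[i * BlockSize]);
    }

    char * acquire()                           // O(1), no heap traffic
    {
        if(_free.empty())
            return 0;                          // pool exhausted; caller decides what to do
        char * block = _free.back();
        _free.pop_back();
        return block;
    }

    void release(char * block)                 // O(1), back onto the free list
    {
        _free.push_back(block);
    }

private:
    std::vector<char>   _storage;              // never resized after construction
    std::vector<char *> _free;                 // locking omitted for brevity
};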

Reducing the allocations (that don't do work) and deep memory copies (that also don't do work) provides the greatest improvement for network code. Reducing the number of messages sent is the job of game code and game design. Network code can't fix an insane design, but it can try to accommodate a reasonable one so it can scale well.

void Game::updatePlayersNearby()
{
    MessageBuffer & buffer = _connectionManager.getBuffer(); // pre-allocated
    buffer << _newStateDeltas;
    _outstandingMessages.insert(&buffer);
    foreach(Connection * player, _nearbyPlayers)
    {
        player->send(buffer); // every connection shares the SAME buffer
    }
}

void Game::updateComplete(MessageBuffer & buffer)
{
    if(std::find(_outstandingMessages.begin(), _outstandingMessages.end(), &buffer) != _outstandingMessages.end())
    {
        _outstandingMessages.erase(&buffer);    // no longer outstanding...
        _connectionManager.freeBuffer(buffer);  // ...so return it to the pool
    }
}

There are a LOT of assumptions made in that code that I hope are obvious. The principal point is that a server should multicast or broadcast to multiple connections without allocating new memory and without copying that memory for each connection. A single, pre-allocated message buffer is requested. That same buffer is shared for sending to ALL connections that need the data, and that buffer is pegged until they are all finished. The load on memory and the CPU for non-work is eliminated.
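
One way to implement that "pegging" (a sketch under assumptions, not the way any particular engine does it) is a pending-send count on the buffer: bump it once per destination before sending, drop it as each send completes, and recycle the buffer when the count reaches zero.

class MessageBuffer
{
public:
    MessageBuffer() : _pending(0) {}
    void setPendingSends(int n) { _pending = n; }
    int  releasePendingSend()   { return --_pending; }
private:
    int _pending; // must be atomic if sends complete on multiple threads
    // ... data, length, etc.
};

void Game::sendToAll(MessageBuffer & buffer, std::vector<Connection *> & players)
{
    buffer.setPendingSends((int)players.size()); // pegged until every send completes
    for(size_t i = 0; i < players.size(); ++i)
        players[i]->send(buffer);                // no copies: all sends share the buffer
}

// called by the network layer as each connection finishes with the buffer
void Game::onSendComplete(MessageBuffer & buffer)
{
    if(buffer.releasePendingSend() == 0)         // last one out...
        _connectionManager.freeBuffer(buffer);   // ...recycles it, never free()'s it
}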

This isn't a panacea for scaling multiplayer game servers. There are MANY other issues involved with scaling a server well. In my personal experience, these are the issues most relevant to scaling the low level network code itself. This helps to address some of the problems that are most often experienced with game server technology. Always consider the traffic characteristics:

  • How many simultaneous connections will there be?
  • What are network side-effects of a connection sending a message to the server? (Send 1 or send N packets?)
  • How long will the connections last?

Take those few questions into consideration, and also think about how much non-work the server can afford in the interests of performance. Non-work is a time sink: it contributes to lag and drags down the overall scalability of the server.

Lastly, not all multiplayer games are games that need to scale. There's no need to go overboard with Overlapped I/O, shared buffers and other paradigms that complicate game code if the game doesn't face the scaling problems these techniques are designed to solve.

Until next time ...

Have Fun!