Saturday, August 1, 2015

"T"-shaped developers

What is a T-Shaped Developer?

The "T"-shaped developer is an agile term for a maker that has very broad knowledge and can help a team to get stuff done regardless of the task, yet has very deep knowledge in a particular discipline. Non-agile projects and companies tend to pigeon-hole programmers and reward them on the depth of their knowledge of a particular subject.

This creates fragility in an organization. When "that one programmer that knows that one thing and performs that one function" is no longer available, the project is stuck and cannot move forward. An agile company is filled with lots of T-shaped generalists that can work anywhere to get stuff done.

This is not a "Jack of all trades, master of none"; it is a "Jack of all trades and master of one".

I was recently hired into a team as the "back-end guy" to do the "easy" work of building a business and service platform for a game. As a T-shaped developer, I had the opportunity to build an audio engine and provide music and sound to the game heading into the green-light for the game. I also took on UI and systems work for the game. There were not enough UI and Audio experts available to finish features, so I was quite happy to step in and do the work. If I could only contribute as "the back-end guy" on the team, I would have been twiddling my thumbs or, worse, writing a ton of code apart from the game as busy work that might not fit with the vision for the game.

The Generalist Programmer role does not mesh well with production and administrative management, however. A generalist does not slot into a hiring plan or career model. They contribute everywhere and it is hard to judge their progress because they span such a broad spectrum of work required to ship a game. This is a shame, because promoting the generalist, T-Shaped developer unlocks and removes blocking tasks on a project.

As a Technical Director, engineering advocate and hiring manager, I desperately sought and promoted the generalist, much to the consternation of the rest of the organization. Programmers that want to switch and expand their roles are often treated as risks -- because, in the short term, they might be less efficient against the schedule and lack the experience and wisdom to get it right the first time. Taking the long view, moving programmers around means the organization as a whole becomes much stronger. Those risks are easily mitigated by pairing neophytes up with the wizened grey-beards and running training courses. Sure, you lose an hour or two each week in training, but the project gains another expert that can help the business adapt to constantly changing requirements.

The T-Shaped developer on a team provides more value than can be quantified. They might do artwork, game AI, networking, build shaders, optimize a rendering pipeline or any number of specialized tasks that simply would never be completed when human beings are treated as specialists that can only perform a single function.

When building a Scrum team, or a new business, go for the T-Shaped developers first. They can solve the surprise changes and complexity of creating something new when requirements drive developers out of their "expert" zone.

Friday, March 7, 2014

I Do not have Time to Write Tests!

On more than one occasion during conversations around automating tests, I have heard a programmer say "I don't have time to write tests!"

Why?

1) Any programmer that submits code without compiling the source file, please leave a comment.


2) Okay, of the programmers that at least compile their source, do they build the entire project locally? If not, leave a comment.
It is ok. You aren't part of the team until you break the build for everyone...

3) If you are building for more than one platform, do you compile and link on all platforms before submitting a change? If not, leave a comment.
Screw those guys doing the PS3 port. They are jerks!

4) Of the programmers that compile, link and build for all platforms, how many run their changes under a debugger to prove (at least to themselves) it works? (Leave a comment)

5) Of the programmers that compile, link, build for all platforms and test locally, how many have found that their features have been wrecked by changes from other developers (artists, designers, programmers)? Comments should be full of this.

To illustrate the point, every project I have worked on has seen violations of all 5 scenarios. I will confess that I am also guilty of some or all of them. It happens and we learn :)

How much time, as the complexity of the project increases each iteration, is spent dealing with feature regression, technical debt and system fragility?

When a live service, earning millions of dollars each iteration, goes tits over an untested feature, leads to a panicked rush to fix it now now now!!! and then leads to a cascade of fail, is it time to consider, as my angry sage and friend Jonathan Tanner said: "slowing down to move faster?"

Most programmers that have been in the business for a while suffer under a completely different regime: they know what to do, but pointy-haired bosses make doing the right thing impossible! 

This majority of engineers should take this little bit of wisdom to heart: producers do not produce code. Business managers have no business to sell without your work. They regard you as freaking wizards. Be honest with them about the costs of doing a bad job.

Do you have time to write tests? Can you make time to help your fellow developers ensure they do not accidentally break what you have created?

For a taste of a solid test framework, check this link: http://logicle-cplusplus.blogspot.sg/2014/03/writing-great-user-stories.html

Wednesday, March 5, 2014

Writing Great User Stories


A few years ago, I was introduced to Behavior-Driven Development (BDD), RSpec and shortly afterward, Cucumber. Having advocated for Test-Driven Development (TDD), I was curious and cautiously optimistic.

Working with Cucumber (http://cukes.info) has been great. It is feature specification, documentation and testing wrapped up in a single workflow. It starts with a feature file written in plain language that drives the rest of the test framework. It is a close cousin to TDD, in that the tests are written first, then the implementation follows until tests pass. The difference between BDD and TDD: BDD tools like Cucumber use behavioral/business specifications as the starting point, not an engineering-authored suite of tests.

Feature files usually start with what reads like a typical scrum user story, followed by several scenarios that clearly define the feature's behavior. The scenarios can be used as acceptance criteria in addition to documentation and driving tests. Gherkin (the DSL that drives Cucumber), like any language or technology, has established current best practices. Gherkin doesn't really care how a feature is described, but scrum-like user stories are typical.

One of the guidelines for writing great feature descriptions is to apply an adapted version of the Six Sigma 5-Whys method.

There is a great example of this approach on the Cucumber wiki.
[5:08pm] Luis_Byclosure: I'm having problems applying the "5 Why" rule, to the feature 
                         "login" (imagine an application like youtube)
[5:08pm] Luis_Byclosure: how do you explain the business value of the feature "login"?
[5:09pm] Luis_Byclosure: In order to be recognized among other people, I want to login 
                         in the application (?)
[5:09pm] Luis_Byclosure: why do I want to be recognized among other people?
[5:11pm] aslakhellesoy:  Why do people have to log in?
[5:12pm] Luis_Byclosure: I dunno... why? 
[5:12pm] aslakhellesoy:  I'm asking you
[5:13pm] aslakhellesoy:  Why have you decided login is needed?
[5:13pm] Luis_Byclosure: identify users
[5:14pm] aslakhellesoy:  Why do you have to identify users?
[5:14pm] Luis_Byclosure: maybe because people like to know who is 
                         publishing what
[5:15pm] aslakhellesoy:  Why would anyone want to know who's publishing what?
[5:17pm] Luis_Byclosure: because if people feel that that content belongs 
                         to someone, then the content is trustworthy
[5:17pm] aslakhellesoy:  Why does content have to appear trustworthy?
[5:20pm] Luis_Byclosure: Trustworthy makes people interested in the content and 
                         consequently in the website
[5:20pm] Luis_Byclosure: Why do I want to get people interested in the website?
[5:20pm] aslakhellesoy:  :-)
[5:21pm] aslakhellesoy:  Are you selling something there? Or is it just for fun?
[5:21pm] Luis_Byclosure: Because more traffic means more money in ads
[5:21pm] aslakhellesoy:  There you go!
[5:22pm] Luis_Byclosure: Why do I want to get more money in ads? Because I want to increase 
                         de revenues.
[5:22pm] Luis_Byclosure: And this is the end, right?
[5:23pm] aslakhellesoy:  In order to drive more people to the website and earn more admoney, 
                         authors should have to login, 
                         so that the content can be displayed with the author and appear 
                         more trustworthy.
[5:23pm] aslakhellesoy:  Does that make any sense?
[5:25pm] Luis_Byclosure: Yes, I think so
[5:26pm] aslakhellesoy:  It's easier when you have someone clueless (like me) to ask the 
                         stupid why questions
[5:26pm] aslakhellesoy:  Now I know why you want login 
[5:26pm] Luis_Byclosure: but it is difficult to find the reason for everything
[5:26pm] aslakhellesoy:  And if I was the customer I am in better shape to prioritise this 
                         feature among others
[5:29pm] Luis_Byclosure: true!

This is equally valuable for user stories in a typical product backlog. A well-groomed backlog should have stories that communicate the value of each item for easy prioritization.

On some scrums, as the product owner, I have been pairing user stories with scenarios as acceptance criteria. Each story is groomed a bit with the team to include scenarios, so the criteria for the features are clearly defined, testable, can be automated and can be communicated to business managers, production teams, artists, designers, testers and other engineers. The output of product backlog grooming (for upcoming items) is a set of Cucumber feature files.

As stated previously in this blog, the "business value" for game developers is making fun. It is all too easy to forget that when talking about monetization, operating expenses, development budgets, etc... Yes, we are in business to make money, but we do so by offering a value proposition that is unique -- we sell fun and delight to customers. Everything else we do is pointless if we have nothing of value to offer.


Applying 5-Whys to user stories should focus on whether the feature satisfies Minimum Marketable Features (MMF). One addition to the criteria of problems we need to address with a feature is "will this make the game more fun?"

Friday, January 22, 2010

A Brief Refresher on Member Initialization

So, I forgot something that is maybe a little important, especially when it comes to multithreaded code.
struct MakeThreads
{
    MakeThreads() :
    _thread(threadFunc, this)
    , _done(false)
    {
    }
    ~MakeThreads()
    {
        _done = true;
        _thread.join();
    }

    void threadFunc()
    {
        while(!_done)
        {
            // do stuff
        }
    }

    Thread  _thread;
    bool    _done;
};

MakeThreads letsScrewUp;

Now, my old, single-threaded, C self wants to put the smaller members later in the struct/class declaration to ensure data packing is efficient. My new, multi-threaded C++ self hasn't formed proper habits to prevent this sort of bug. How I wish I had lint here to continue to rap my knuckles and improve my coding habits!

Just a reminder: the order of the member initializer list in the constructor does NOT dictate the order in which members are initialized. C++ always initializes members in the order they are declared in the class, whatever order the initializer list uses. As Scott Meyers (and g++ with -Wreorder, which -Wall enables) warns us: the member initializer list should always match the order of member declaration in a class!

I made this mistake recently. For those that haven't already caught it, _done is initialized AFTER the thread is initialized. The thread queries _done (which may or may not be initialized at this point.)

This is one of those scenarios where the code does what you expect most of the time but rarely all of the time!
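For comparison, here is a minimal sketch of one way to avoid the problem. It assumes std::thread and std::atomic<bool> in place of the Thread class and plain bool above, and it adds a hypothetical ticks() counter purely so the worker's progress can be observed; none of those substitutions come from the original code. The flag is declared before the thread, so it is initialized before the thread can read it, and the initializer list is written in declaration order.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct MakeThreads
{
    MakeThreads()
        : _done(false)                            // declared first, so initialized first
        , _ticks(0)
        , _thread(&MakeThreads::threadFunc, this) // declared last, so started last
    {
    }

    ~MakeThreads()
    {
        _done = true;               // tell the worker to stop
        assert(_thread.joinable());
        _thread.join();             // wait for it before members are destroyed
    }

    void threadFunc()
    {
        while (!_done)
        {
            ++_ticks;                   // stand-in for "do stuff"
            std::this_thread::yield();
        }
    }

    // Hypothetical accessor, added only so the sketch is observable.
    unsigned long ticks() const { return _ticks; }

    std::atomic<bool>          _done;
    std::atomic<unsigned long> _ticks;
    std::thread                _thread; // must stay the last member
};
```

The atomic flag is a separate fix from the ordering one: it makes the cross-thread read of _done well defined, which a plain bool does not guarantee.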

Like any language, some of the finer points of C++ are easy to forget unless you find opportunities to practice them often. For those itches that aren't scratched on your day job, I highly recommend busting your own chops in your spare time for remedial work :)

Notes on Scaling a Multiplayer Game Server

Well, let's start with some basics. The Client/Server model (for video games) consists of a game server, which is authoritative for some or all of the game state. The clients produce messages from player input, and consume state updates as dictated by the server. The client side, as far as the network is concerned, is pretty straightforward:

  • Connect to the server
  • Handle incoming messages to update the world
  • Send local player input to the server
  • Clean up when the server disconnects

There are a few things that can be done to streamline the client network code, such as putting Network I/O in a separate thread. This isn't done to make the game render faster, but to prevent the network from timing out when the client is busy loading some massive data set. The game client will never spend significant memory or CPU time dealing with the network and its messages. Programmers can play it pretty fast and loose in this regard. If the client can pass packets around, the job pretty much consists of keeping the game state consistent with what the server is telling the client.

For servers that don't have to scale beyond a handful of connections, the same paradigms hold true. Pumping the network and tossing packets isn't much work compared to the heavy lifting the game simulation has to do. There is one problem that frequently plagues the server: O(n^2) (order 'n' squared, for those not familiar with Big-O notation) operations. Every FPS I've worked on was inherently O(n^2) in its messaging. Basically, this happens when every client causes some update that generates traffic to every other client connected to the server.

In the great tradition of examples contrived and simplified to illustrate a point, let's assume that game clients are authoritative for player position. To further complicate the problem (and the math), we'll throw the update rate into the mix, so further assume that they update the network every 50ms (20Hz, twice as fast as the original Quake).

  • Each client sends 20 position updates each second.
  • A server with 2 clients will send 40 updates each second. 20 updates from client 1 trigger 20 update messages to client 2, and 20 updates from client 2 trigger 20 update messages to client 1.
  • A server with 3 clients will send 120 updates each second. 20 updates from client 1 trigger a total of 40 update messages to clients 2 and 3. 20 updates from client 2 trigger 40 update messages to clients 1 and 3. Client 3 triggers 40 update messages to clients 1 and 2.
  • A server with 4 clients will send 240 updates each second. 20 updates from client 1 trigger 60 update messages to clients 2, 3 and 4. (etc...)

There seems to be a pattern here: n clients at 20Hz cost the server 20 × n × (n - 1) outgoing messages each second. With only 4 clients, running at 20Hz, the server needs to toss a packet roughly every 4ms. This is something most hardware can handle, but game servers need to handle dozens, hundreds or thousands of players. Oh, and they also need to run a game simulation. This model doesn't hold up well in the face of those numbers.
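The arithmetic behind that pattern can be sketched in a few lines. This assumes, as the contrived example does, that every update from one client is relayed to every other client; the function name and the range guard are hypothetical, added only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Outgoing server messages per second when each of `clients` players sends
// `updatesPerSecond` updates and each update is relayed to every other player.
std::uint64_t relayedMessagesPerSecond(std::uint64_t clients,
                                       std::uint64_t updatesPerSecond)
{
    // Keep the sketch well inside 64-bit range; real code would pick its own limits.
    assert(clients <= 1000000 && updatesPerSecond <= 1000);

    if (clients == 0)
    {
        return 0;
    }
    // Each of the n clients fans out to the other (n - 1) clients.
    return updatesPerSecond * clients * (clients - 1);
}
```

At 20Hz this gives 40 messages per second for 2 clients and 120 for 3, matching the first bullets above, and it climbs to 80,640 per second for 64 clients.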

Most game programmers are painfully aware of these scaling issues. They employ a number of techniques to rein in the traffic requirements. Reducing the update frequency to other clients based on distance is one (of many) ways to affect the scalability equation. MMOs spread the load across dozens of servers in a cluster that consists of a single game "shard". Since this article is about scaling, and about the performance of network code, it won't focus on those techniques, but instead look at how scale affects overall performance of net code.

Scale comes in too many different flavors. Web servers need to deal with thousands of concurrent, isolated, short-lived connections. Chat servers handle thousands of concurrent, long-lived point-to-point connections. MMO servers must support hundreds or thousands of long-lived, point-to-multipoint connections. FPS servers deal with dozens of long-lived, point-to-all-point connections. The first common-sense reaction to scaling issues for a new server programmer is "well, Google manages to handle millions of clients with no problem, we'll use Web technology since this is already a solved problem."

The problems for game servers are primarily matters of pushing state updates at a rate that is proportional to the number of players that cause the updates, and the number of players that must receive those updates. Game servers are nothing like web servers, unless the game is designed to treat players as disconnected entities that have no effect on the state of the world that the rest of the players participate in.

Consider Ironforge in World of Warcraft. At any point in time, there are hundreds of players nearby. It is one of the worst performing scenarios in multiplayer gaming. Everyone is running around in close quarters. In MMO parlance, it's a flashpoint. What is the server network code doing?

  1. The server receives a position update from a player.
  2. The server determines that 165 players in the immediate vicinity need to receive that update.
  3. The server sends 165 net messages. (The other 165 players are ALSO running around, creating messages. Do the math again: there are thousands and thousands of messages required to keep this state consistent for the game clients!)

Ok, code time:

void Connection::send(void * data, size_t length)
{
    // blocks while the OS copies the data to a net buffer. 
    // If the kernel buffer is full, blocks the entire time the 
    // remote is acknowledging it accepted data from the other 
    // 164 messages it was sent
    _socket->send(data, length); 
}

That's what the per-client connection code on the server might do. An evolution of this model avoids blocking the game simulation while the kernel's send buffers are full.

void Connection::send(void * data, size_t length)
{
    // don't block the ENTIRE game sim for one slow-assed client
    // that isn't emptying the kernel socket buffer fast enough
    // assume _sendQueue is thread safe
    _sendQueue.push_back(new Message(data, length));
}

// in write thread, grab messages off the queue
void Connection::networkThread()
{
    while(_connected)
    {
        Message * m = _sendQueue.pop_front(); // locks queue, removes front element
        _socket->send(m->data(), m->length());
        delete m;
    }
}
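The snippet above assumes _sendQueue is thread safe. As a rough sketch of what such a queue could look like, here is a hypothetical SendQueue built on std::mutex and std::condition_variable (C++17, for std::optional). The class name, the byte-vector Message type and the close() call are all assumptions made for illustration; they are not part of the original server code.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the _sendQueue assumed above.
class SendQueue
{
public:
    using Message = std::vector<unsigned char>;

    // Called from the game thread; never waits on the socket.
    void push_back(Message m)
    {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            assert(!_closed); // queueing after close() is a programming error
            _messages.push_back(std::move(m));
        }
        _ready.notify_one();
    }

    // Called from the network thread; blocks until a message is available
    // or the queue has been closed. Returns std::nullopt once closed and drained.
    std::optional<Message> pop_front()
    {
        std::unique_lock<std::mutex> lock(_mutex);
        _ready.wait(lock, [this] { return _closed || !_messages.empty(); });
        if (_messages.empty())
        {
            return std::nullopt;
        }
        Message m = std::move(_messages.front());
        _messages.pop_front();
        return m;
    }

    // Wakes the network thread so it can drain the queue and exit.
    void close()
    {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _closed = true;
        }
        _ready.notify_all();
    }

private:
    std::mutex              _mutex;
    std::condition_variable _ready;
    std::deque<Message>     _messages;
    bool                    _closed = false;
};
```

With a queue like this, the network thread's loop becomes "pop until std::nullopt", so a disconnect can wake a thread that is waiting on an empty queue.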

That's an improvement. Many client programmers will tell you that after they have optimized the snot out of their bleeding edge rendering system they had to start looking at allocations and moving memory around.

Allocating memory isn't doing work, it's making room to do work. Moving memory around isn't doing work, it's putting it someplace convenient to do work later. In our scenario, tossing 30,000 packets per second means there is a lot of work to do. Making 30,000 allocations per second and 30,000 deep copies per second will soon show up on the profile of an active server (though it will NEVER show up on the profile of a pre-production server that never has more than 10 or 20 people connected). Lesson to learn here: Play-testing for months with a few dozen users will never prepare code for what happens when thousands of users start beating the snot out of it.

One more word about allocations: the biggest risk for a server that needs to scale is memory fragmentation. Allocating and freeing tens of thousands of variable length buffers each second wreaks havoc on a server. It's not uncommon for a server to fall on its face because it cannot allocate memory. This can happen before it stalls trying to send/receive/move packets. I wouldn't bet on that race, but it is something that kept me awake some nights while trying to find a good solution. An allocation failure can happen when the server has 1GB of free memory for the process, but doesn't have 1k of free CONTIGUOUS memory to give the application when it needs it!

Reducing the allocations (that don't do work) and deep memory copies (that also don't do work) provides the greatest improvement for network code. Reducing the number of messages sent is the job of game code and game design. Network code can't fix an insane design, but it can try to accommodate a reasonable one so it can scale well.

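void Game::updatePlayersNearby()
{
    // request one pre-allocated buffer and fill it once
    MessageBuffer & buffer = _connectionManager.getBuffer();
    buffer << _newStateDeltas;
    _outstandingMessages.insert(&buffer);
    // every nearby connection sends from the same buffer; no per-connection copy
    for(Connection * player : _nearbyPlayers)
    {
        player->send(buffer);
    }
}

// assumed to be called once every connection has finished sending the buffer
void Game::updateComplete(MessageBuffer & buffer)
{
    if(std::find(_outstandingMessages.begin(), _outstandingMessages.end(), &buffer) != _outstandingMessages.end())
    {
        _outstandingMessages.erase(&buffer); // assumes a set-like container
        _connectionManager.freeBuffer(buffer);
    }
}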

There are a LOT of assumptions made in that code that I hope are obvious. The principal point is that a server should multicast or broadcast to multiple connections without allocating new memory and without copying that memory for each connection. A single, pre-allocated message buffer is requested. That same buffer is shared for sending to ALL connections that need the data, and that buffer is pegged until they are all finished. The load on memory and the CPU for non-work is eliminated.
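To make "pegged until they are all finished" concrete, here is a minimal single-threaded sketch of a pre-allocated buffer pool that counts pending sends. BufferPool, SharedBuffer and every member name are hypothetical; this is not the connection manager from the code above, and a real server would need locking or atomics around the counters.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical buffer shared by every connection that needs the same data.
struct SharedBuffer
{
    std::vector<unsigned char> bytes;
    std::size_t pendingSends = 0; // connections that have not finished sending
    bool inUse = false;
};

class BufferPool
{
public:
    // All allocation happens here, up front.
    BufferPool(std::size_t count, std::size_t capacity)
        : _buffers(count)
    {
        for (SharedBuffer & b : _buffers)
        {
            b.bytes.reserve(capacity);
        }
    }

    // Pegs a free buffer for `recipients` connections.
    // Returns nullptr if there are no recipients or every buffer is busy.
    SharedBuffer * acquire(std::size_t recipients)
    {
        if (recipients == 0)
        {
            return nullptr;
        }
        for (SharedBuffer & b : _buffers)
        {
            if (!b.inUse)
            {
                b.inUse = true;
                b.pendingSends = recipients;
                b.bytes.clear(); // drops old contents, keeps the reserved capacity
                return &b;
            }
        }
        return nullptr;
    }

    // Each connection calls this once when its send has finished.
    void sendComplete(SharedBuffer & b)
    {
        assert(b.inUse && b.pendingSends > 0);
        if (--b.pendingSends == 0)
        {
            b.inUse = false; // last sender releases the buffer back to the pool
        }
    }

    std::size_t available() const
    {
        std::size_t freeCount = 0;
        for (const SharedBuffer & b : _buffers)
        {
            if (!b.inUse)
            {
                ++freeCount;
            }
        }
        return freeCount;
    }

private:
    std::vector<SharedBuffer> _buffers;
};
```

The linear scan in acquire() is only there to keep the sketch short; a free list would do the same job without walking the pool.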

This isn't a panacea for scaling multiplayer game servers. There are MANY other issues involved with scaling a server well. In my personal experience, these are the issues most relevant to scaling the low level network code itself. This helps to address some of the problems that are most often experienced with game server technology. Always consider the traffic characteristics:

  • How many simultaneous connections will there be?
  • What are network side-effects of a connection sending a message to the server? (Send 1 or send N packets?)
  • How long will the connections last?

Take those few questions into consideration, and also think about how much non-work the server should sacrifice in the interests of performance. Sometimes non-work is a time sink that contributes to lag and hurts the overall scalability of the server.

Lastly, not all multiplayer games are games that need to scale. There's no need to go overboard with Overlapped I/O, shared buffers and other paradigms that complicate game code if the game doesn't have the scaling problems these techniques are designed to solve.

Until next time ...

Have Fun!

Monday, December 7, 2009

Nails

I picked up the new Motorola "Droid" phone. I'm a big fan of Linux and this has to be one of the most useful devices I've purchased in recent memory. Of course, my first thought was how I might write some code for the Android platform.

Google's SDK is very Java-oriented. I've done some development in Java, long ago, and didn't care for the experience. My last gig at Microsoft required a lot of C# work, which brought back the bad-old-days of Java programming. Fortunately, Google also provides an NDK for native development. I guess C++ is a programmer comfort zone to me.

After a bit of hackery, I was running a simple OpenGL ES app. It was cute. The process was relatively painless, and I was thrilled to have my bits shuffled by my Droid's ARM-7. After stewing on it for a while, I began to wonder if maybe I was being too harsh on Java/C# and other languages for game development. Most of the work I've done over the past decade has been on large, expensive productions -- not something you build on today's mobile platforms. I was guilty of using the C++ hammer in my toolbox to the exclusion of everything else.

So, I've decided to put together a Flash-based game with an eye to porting it to Java and C# for other platforms (Android/XNA, perhaps). All of the targets support blasting bits around to a display device or bitmap, so I'm going old-school. I forgot how much fun it is to watch a simple game evolve over the course of a few hours and a few hundred lines of code.

If the game reaches a playable state and would otherwise be destined to rot in my bit-locker, maybe I'll post a walkthrough here on the blog. It may not be C++, but it is game development, and not every nail is a multi-million dollar production requiring a C++ hammer.

Monday, October 26, 2009

Fun In A Box

So, I've been wandering aimlessly through my "stack of stuff" to play. I revisited Mass Effect, and it scratched a few gaming itches I've had for a while now. I'm embarrassed to have let it collect dust for so long. I've also played Stardock's "Sins of a Solar Empire", which is a fine 4X indie title and well worth the money I spent on it.

In fact, I was so impressed with Sins, that I figured I would give Galactic Civ II a try, what with Stardock earning some repute with Sins, and the 9/10 scores Galactic Civ II earned on some review sites.

It was after playing several turns of Galactic Civ that I realized that "Fun" must come in many complex flavors. I was far from hooked and was a little perplexed at the critical acclaim the game received. I'm sure I'll be giving it some more time to find the fun in the game.

I think this notion, "finding the fun", is where great games lose so much of their potential audience. Gamers and game review sites lament the great titles that never found success. How many of those titles required players to "find the fun"? We are video game developers. We SELL fun. In a Box. For Money. Would you buy a car from a salesman that simply tossed you some keys and told you to go find the car? Maybe you'll find a Citation, maybe you'll find a Tesla Roadster, maybe you'll tell the salesman to stick those keys where nobody will find them....

Game design wonks and academics argue constantly over what defines "fun" for video games. Sadly, there are far too many shops that don't even bother to take any metrics about how much "fun" their games provide. You don't need a formal definition to pursue a "fun-formula" and churn out a "AAA title".

Sit people down and ask them to play it. Ask them, "are you having fun right now?" If the answer is "no", then the odds are, they aren't having fun! That raises a more difficult question: Why aren't they having fun? Again, academics and design wonks spend a LOT of time debating this issue as well. Oddly, the same approach to a solution is so rarely pursued it makes me want to cry.

Sit people down and ask them to play the game. When they aren't having fun, ask them "Why?" Answers usually describe emotional states: "I'm frustrated," "I'm confused," or "I'm bored."

See a pattern here? Actually play-test the game and get some feedback. "Why are you frustrated?" "Why are you confused?" "Why are you bored?"

Play-testing is so underutilized (in my experience anyway) that I'm not surprised that I feel like I'm wandering through a gaming wasteland trying to cherry-pick some entertainment for the $40 or $60 I gamble on one title or another. Most games only get coverage from QA departments, and not even the developers actually spend much time playing them! QA is essential to ensuring a title ships with few bugs, but you have to remember, these are the people that are playing the same content, over and over for months (sometimes years) on end, just praying for a build that doesn't crash or will run at a decent frame rate.

Play-testing rule #1 : Eat your own dog food. Developers (artists, designers, programmers) MUST play-test the game at least once a week. These people are responsible for their work. To understand any other play-test feedback, they need to be in the trenches with the players and QA. They are already in the trenches for development!

Play-testing rule #2 : Each milestone/sprint (whatever your methodology is), get some fresh blood in to play the game. These are the people that will be spending their money for the fun you are trying to package up and sell them. If they aren't having fun, you aren't selling what they want!

Play-testing rule #3 : Only listen. Never talk back to the testers. Telling them "Oh, that's fixed in the next build" or "you just have to play through this part to get to the cool stuff" won't improve the quality of the feedback. Whoever is doing the listening can filter the feedback and prioritize accordingly.

Play-testing rule #4 : take the feedback seriously. If all of the developers are hard-core gamers but want the rest of the population to enjoy what they made, they have to understand that not everyone is willing to go "find the fun" in some epic search through the game.



Fun.


In a Box.


For Money.