The Know what to Build articles discuss how to identify our multiplayer game’s server architecture early in the project. This reveals technical risks and key engineering questions before we’ve invested much development time.
In a recent post I described how to assess the impacts of our game’s key design features on the server architecture. In this post I’ll describe a way to develop a high level estimate of the amount of work our server will have to do, to help us know what to build.
The Server Loop as a Model
Developing the architecture for our game server requires making trade-offs. Knowing how to do this requires a general understanding of the operations the server will perform, and the amount of work required.
We’ll use the server loop as a conceptual model for this. The server loop is a variant of the classic game loop pattern, applied to server-side functionality. Because this is just a model, we’ll ignore implementation details such as fixed or variable step sizes. Also, for now we’ll assume our server is single-threaded and runs on a single core machine.
In its simplest form, pseudocode for a server loop looks something like this:
While the game is running: Receive player input from clients. Update game state. Send game state changes to clients.
That version of the model involves a lot of hand-waving, so let’s make it more specific:
While the game is running: Handle client connection/disconnection. Decode encrypted and compressed network messages. Perform player character movement. Perform player character actions. Update other game objects. Update physics simulation. Send state changes to clients. Encode encrypted and compressed network messages.
This version gives us a better idea of what happens during the “update game state” step from the first example.
Keep in mind that our server loop is just a mental model. It doesn’t represent our eventual implementation. It identifies the main things our server will do each iteration, or frame, of the server loop. These operations aren’t guaranteed to become top-level functions in our real server loop, although some indeed may.
The server loop above executes operations that might exist in a multiplayer first person shooter game. Other game types will do different things in their server loops. For example, an MMORPG server loop will probably add AI and NPC operations, as well as operations for crafting, trade, and other game systems.
Time as a Proxy for Work
Once we know what happens in the server loop, we need to estimate the amount of work each server frame will perform. For our model we just need a high-level abstraction, since we’re not thinking about implementation now. Here, it’s handy to use time as a proxy for work.
One reason time works for this simply that it’s a familiar concept. More relevant is that time — specifically latency or response time — is how players will judge our game’s performance. Also, we’ll use wall time for this instead of CPU time. This is okay because wall time is closely related to perceived latency. Finally, in our model, wall time does correlate with CPU time, because our model stipulates a single-threaded process running on a single core machine.
Theoretical Function for Work
Our server loop executes continuously for as long as the game runs. For a given operation, the server performs an amount of work that we can represent as a function:
where: is work performed (wall time). is size of the data set being operated on. is wall time to execute operation. is probability that operation will execute. is complexity of the operation performed.
Let’s take a look at these inputs:
- Size of Data Set: The number of data items against which the operation will execute, once per item.
- Time to Execute the Operation: Estimated time to execute the operation once.
- Probability that Operation will Execute: In a given server frame, each operation will probably not execute against every element in its data set. We’ll simulate this effect by assigning a probability to the operation.
- Complexity of Operation: This is the Big-O measure of algorithmic complexity of the operation.
The total work done in each server frame is the sum of the work of all the operations in the frame:
A More Practical Function
The theory is fine, but for it to be useful, we should simplify our function.
Let’s start by factoring out the complexity into a separate function expressed in terms of N:
Let’s assume linear complexity (O(N)) in the common cases where we don’t know the actual complexity:
Now we can simplify our theoretical function to this:
And, for those rare cases where we do know the specific Big-O complexity:
where: is one of O(N), O(logN), O(NlogN), ...
Applying the Function to the Model
Now we can exercise our server loop model using our simplified function and some reasonable input values. Figuring out the right inputs to use is a cross between a black art and a crap shoot.
Estimating the execution time T is the hardest. In some cases we can write test code and measure its execution time. In others, we’ll have to make educated guesses. We can improve these as we learn more over time, so iteration is important here.
Estimating probability P is also a guess. However, here we can probably get some help from our game design. Also, a bit of deductive reasoning helps. For example, client connections will happen far less often than most other operations. If that weren’t the case, there’d be no reason to play. Again, iterate.
If we have trouble with our estimates, we can min/max the inputs to come up with some best/worst case scenarios, which is still better than a SWAG.
Often, just going through the exercise is worth more than the result. It forces us to decompose the problem and think through some important concepts. This reveals engineering challenges and risks, an important goal in developing our architecture.
The table below attempts to estimate the average server frame time of our FPS server loop model. Each row represents an operation in the loop. The leftmost column describes the operation. The middle columns contain the inputs to our work function: N, T, P and C. The rightmost column holds the result, W, representing work done for that operation. The bottom row contains the sum of the work for all operations, expressed as the average server frame time.
- The game is a 64-player FPS with PvP combat only (i.e. no AI or NPCs).
- Our theoretical server loop runs in a single-threaded process on a single-core machine.
- N is the size of the data set for the operation. This varies, but is often related to number of concurrent players.
- T is the estimated wall time to complete the operation. Values are short (0.1 ms), medium (1.0 ms), long (10 ms).
- P is the probability that the operation will run during a given server frame. Values are low (1%), medium (50%), high (80%).
- C is the complexity factor, or the appropriate Big-O function applied to N. This defaults to O(N) unless we know otherwise.
- Footnote annotations in specific cells identify other assumptions about the contained values.
|Operation||Data Set (N)||Time (T)||Probability (P)||Complexity (C)||Work (W)|
|Server Frame Time (MS)||266.29|
|Handle client connection/disconnection||Active players||64||long||10||low||1%||O(N)||64.0||6.40|
|Decode encrypted and compressed network messages.||Active players||64||medium||1||high||80%||O(N)||64.0||51.20|
|Handle player events from clients||Active players * events per player||64||medium||1||high||80%||O(N)||64.0||51.20|
|Perform player character movement||Active players * movement events per player||64||medium||1||high||80%||O(N)||64.0||51.20|
|Perform player character actions||Active players * action events per player||64||short||0.1||medium||50%||O(N)||64.0||3.20|
|Update game objects state||Modified game objects||841)64 player characters + 20 game objects||medium||1||low||1%||O(N)||84.00||42.00|
|Update physics simulation||Modified physics objects||842)64 player characters + 20 game objects||medium||1||medium||50%||O(logN)3)Assume binary search of spatial tree/graph.||6.39||3.20|
|Send state changes to clients||Active players * modified game and physics objects||107524)64 clients * (84 modified physics objects + 84 modified game objects) ||medium||1||medium||50%||O(logN)5)Assume binary search of spatial tree/graph.||13.39||6.70|
|Encode encrypted and compressed network messages.||Active players||64||medium||1||high||80%||O(N)||64.0||51.20|
What the Model Tells Us
In a nutshell, it looks like our theoretical server loop model would run at around 270 milliseconds per frame, or about 3.7 frames per second. Yikes, that seems slow, doesn’t it?
But, remember our single-threaded, single-core assumption. We made that assumption to simplify our model and the math behind it. Now, let’s imagine running the loop in a multi-threaded server on a 4-core machine. This, in theory, cuts our frame time by 4 to about 68 milliseconds per frame, or 15 frames/second. We can double that again to 30 frames/second on an 8-core commercial rack-mounted machine. Server loops typically run at a slower frequency than those of their clients, so this frame rate approaches something reasonable for running a game server.
Now we know that our FPS game must run on a beefy multi-core box (and do a great job of multithreading) to handle the load of 64 concurrent players. This is reasonable, as most modern FPS games that support 64 or more players tend to be hosted on commercial game servers.
When to Use the Model
The purpose of our server loop model is to help identify what types of things a game server will do, and the work required. This helps us understand how to allocate computing resources to support the game server’s operation.
I used an FPS server loop for our example in this post because it’s simple and illustrates the main concept fairly well. This is probably overkill for most FPS game servers, though. FPS game development is common and fairly well understood. Traditional FPS games tend to use a Monolithic Architecture for simplicity, and because players often want to host their own servers. This limits the architectural choices FPS developers can make, so our model can’t add much value here.
The model should be more useful when applied to the domain of MMO server architecture. To be worthy of the massive adjective, MMO servers must scale to support thousands of concurrent players. They also engage players with deeper play experiences involving many interconnected game systems, virtual economies, and social interactions. This richness requires distributing the work across multiple server processes hosted on multiple computers. The server loop model is a tool that helps us know how to do this.
In the next Know what to Build article, I’ll expand on our model and apply it to the problem domain of MMO servers. We’ll use it to explore the impacts of greater scale and complexity on our architecture. What we learn from that will help us understand how to distribute game functionality across multiple processes and machines.
What do you think about this approach? What techniques, if any, do you use to get your mind around what your game server will do, and whether your design can handle it? I’d love to hear about any improvements or alternatives you might have to suggest. Of course, questions are great too! Please write a comment using the feedback form.
Thanks once again for spending time with me at Engines of Delight!