I was researching stateful distributed systems for an article today and found a post on High Scalability that was a perfect fit. It’s an unofficial transcript of a Strange Loop 2015 talk by Caitie McCaffrey, Tech Lead for Observability at Twitter. I read the transcript first, then felt compelled to watch the presentation.
Caitie talks about solving problems that have challenged MMO developers for years. I’ve often thought we were alone in having to deal with massive state at massive scale, because so little had been written about the problems we faced. Over the years I’ve found a lot of information on scale, load, and concurrency in the web space, yet very little of it seemed relevant to the stateful systems we build for our MMOs.
Caitie’s presentation changes that. She shares insights on using “sticky” connections and stateful servers to expand the options for building highly available distributed systems. Her solutions for dynamic cluster management and work distribution help stateful systems balance load more efficiently and recover from failures more effectively. She grounds these concepts in case studies from Twitter, Uber, and, most relevant for us, Halo 4’s use of the Orleans actor framework. She closes with practical advice and cautions for implementors, chief among them: don’t reinvent the wheel.
Thanks to the High Scalability blog for the transcript, and to Caitie for her great presentation. I highly recommend checking out both.
Source (presentation): “Building Scalable Stateful Services” by Caitie McCaffrey (slides)