WildStar Wednesday: Stress Test Postmortem
By Loic "Atreid" Claveau - 05 June, 2013
So, for our first stress test we asked you to break things, and wow – did you ever. This event, which we are now officially calling “Stress Test Part One: The Stressening”, uncovered some extremely important data for us. The good news? We’ve got lots of information to work with. Gigs upon gigs of data to parse, crunch, evaluate and replicate.
Given that today we are running Stress Test Part II: The Stress Continues, we wanted to share an inside look at what happened the last time we took a ride down this particular rabbit hole.
Here are some key discoveries:
- For some reason, one instance of the Crimson Isle had as many as 400 players placed on it. This is bad, because that instance is meant to cap at 20 players (we instance our starting zones for a better new-player experience). But it’s good, because we’ve already fixed that bug.
- Sentinel, which is our server monitoring tool, was misrepresenting the names of some players, making it hard for us to see who you actually were while you were logged in. That’s bad, but it’s also good because we were able to diagnose that issue, and a fix is in the works.
- And the big issue: We are capping out at 10 Mb a second out of our gateway machine. The gateway machine is so named because it is the gateway between you and all of our servers. With the information we gathered on Friday, we will be running forensic tests until we can identify whether this is a hardware or software issue. Then, we fix it.
Craig Turner, our Live Producer, was the ringmaster for this important milestone in our development project. Here’s what he had to say:
Within 2 minutes of flipping the switch to turn the stress test on, I was literally running through the hallways at the office. There was not going to be a slow ramp up of users, oh no. We were at critical mass almost immediately. It was awesome.
What wasn’t awesome is that we broke in a way we weren’t prepared for. We had some contingency plans in place to make things playable should we have fallen over in the spots we thought were fragile but, as tends to happen with new systems, we didn’t know what we didn’t know. The great thing is that this is exactly what the Stress Test was intended to accomplish. We needed to see what was going to break in new and interesting ways so that we can prevent those problems in the future.
I’m looking forward to the additional Stress Test windows so that we can break new things! Ultimately, though, I’m looking forward to the time that the Community Team once again forgets my name. When that happens, it’ll be because stability isn’t an issue and everything is running along smoothly. The only way we get to that point, though, is with your help and tolerance in participating in more of these tests.
David “Scooter” Bass, our Senior Community Manager, was at the communications helm, making sure everyone was updated on what was going on. Here’s his take on the WildStar stress test:
For our first stress test, the Community team had a few commitments we made to ourselves (and to the stress testers):
- Setting proper expectations. Too many gamers are used to the usual "STRESS TEST WEEKEND WOOOOOO!" hype, and the problem was that we needed to do an actual stress test… one where we kept adding more and more players until something broke so that we could see where our weaknesses were. We did a lot of communication on this beforehand, and it seemed to work pretty well… the majority of active stress testers were well aware that things were not going to work, and weren't too bummed out when they didn't. I'm sure the promise of a future beta invite to those who put up with the unpleasantness helped too…
- Updating the community regularly. This is tough when you've got people in three offices across the country tracking issues, but we wanted to commit, at the very least, to posting every half-hour (if not more often) with an update. Most of the time, these were pretty basic updates of "Still no updates, but people are looking into it!" But that commitment to constantly staying in communication with people, especially during what proved to be a frustrating gameplay experience, is what we feel really creates strong communication between players and the team.
- Making things fun. When you've got an unplayable game and 15,000 people waiting to play it… what do you do? Turns out our answer to that is "go a little nuts." While the Community, Customer Service, and Live teams were waiting to hear back from our engineers and server programmers about what they'd found, we may have gone a little crazy. COMMUNITY SECRET: We like to call Craig's two videos "A Descent Into Madness."
During the actual test, we set up an impromptu "War Room" (which was really just me moving over to sit near our Live Producer Craig Turner and our Game Support Manager David Crossley so we could all yell things out at each other). It's amazing how being in the same room as people helps improve communication a thousand-fold, and it allowed us to respond to Support tickets faster, post on the forums more often, and get more immediate gameplay feedback across all three disciplines. At one point, fans were asking us to set up a webcam, but sadly it would've shown an empty room as we ran around getting more information.
As I'm sure Craig will attest, we obtained some invaluable data from the stress test that we never could have found otherwise, so even though we are sorry most people didn't get a chance to play, we're excited to run something like this again in the future!
So in closing, thank you. Thank you for suffering through this, thank you for providing us with invaluable data, and thank you to everyone who has been so supportive of both our efforts and their stress test compatriots.
Remember, there will be plenty of opportunities to participate in our stress test events, as well as the longer-term WildStar beta. So keep the faith!
See you on Nexus!