Bungie explain Destiny 2 outage and character rollback, and how they’ll stop it happening again

For the second time in two weeks, Destiny 2 updates have led to players losing currency and materials, forcing Bungie to take the game offline while they fix server issues and roll back characters to a point shortly before the update. With the two incidents so close together and showing identical inventory issues, Bungie have explained what went wrong in the first place, how they’ve gone about fixing it, and how they’re going to prevent it from happening again.

The most recent instance came last night on 11th February after the launch of Hotfix, a recurrence of the same issue that occurred with Hotfix 2.7.1 on 28th January. Unfortunately, the issue goes quite a bit deeper and further back than that.

It started with a bug in quest log sorting that was reported and fixed last year. Quests are treated like inventory items on the game’s backend, with every item carrying a timestamp to help track and sort it, and these timestamps are cleaned up every time you log in to the game, alongside checks against the current balance of the game. At the time, Bungie removed the timestamp resets for quests, but this had an unintended consequence for stacked currencies and materials, which Bungie initially wrote off as an internal debug issue. The change was pushed out to users in 2.7.1, and the resulting problem was quickly fixed with a server-side hotfix and restart.

But this went hand in hand with Bungie adding additional WorldServers to the game for Shadowkeep’s launch. Unfortunately, a small number of these would crash on startup, and Bungie’s workaround was simply to restart the server manually, seeing no real effects for players. That bit them in the ass this time, as that handful of WorldServers crashed again and were manually restarted, but did not apply the previous character data corruption hotfix, and started causing the issue once more. They even managed to skip the verification systems meant to spot the version mismatch, which Bungie thought was impossible. Seriously, this is how we get SkyNet, guys.

Though it affected a smaller number of servers and characters, Bungie decided the best course of action was to perform another character rollback from the backups taken prior to rolling out the hotfix.

Bungie have recovered quickly both times and with no meaningful impact on players, but clearly this can’t keep happening, so they’re working to fix the root causes and putting new systems in place to prevent a repeat:

  1. We have added further safeguards to our process for “hot-patching” our servers to ensure that they cannot start with an unexpected version. This change is in place as we spin up the game today.
  2. We have fixed the issue that caused a small fraction of WorldServers to crash on startup. This fix will be deployed with Season 10.
  3. The permanent fix for character corruption will be rolled into the next update as an executable change, removing the need for the configuration override. (Unfortunately the Hotfix was too far along to benefit from this).
  4. Looking ahead, we are investigating ways to speed up our rollback and recovery mechanisms.
  5. In a future release, we will address the issue that can cause servers to skip loading configuration data.
  6. We will also add more protections to the login-account clean-up code, to help prevent future bugs from being introduced into such a critical area.
  7. We are updating our development methodologies to catch issues like this earlier in the release pipeline.
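For the technically curious, the first and fifth points boil down to making a server "fail closed": if its configuration didn't load, or isn't the version it should be, it refuses to start rather than quietly running without the fix. Bungie haven't shared any code, so the following is only a rough Python sketch of that general idea. Every name in it (`ServerConfig`, `load_config`, `start_world_server`, the version strings) is made up for illustration and has nothing to do with how Destiny 2's servers are actually built.

```python
from dataclasses import dataclass


class VersionMismatchError(RuntimeError):
    """Raised when a server would start with an unexpected configuration."""


@dataclass(frozen=True)
class ServerConfig:
    # Hypothetical fields; the real configuration format is not public.
    version: str
    overrides: dict


def load_config(raw):
    """Return a ServerConfig, or None if the configuration failed to load."""
    if raw is None:
        return None
    return ServerConfig(version=raw["version"], overrides=raw.get("overrides", {}))


def start_world_server(raw_config, expected_version):
    """Start only if the configuration loaded and matches the expected version."""
    config = load_config(raw_config)
    if config is None:
        # Fail closed: a skipped configuration load must not fall back to defaults.
        raise VersionMismatchError("configuration was not loaded")
    if config.version != expected_version:
        raise VersionMismatchError(
            f"expected {expected_version}, got {config.version}"
        )
    return f"started with config {config.version}"
```

In this sketch, a manually restarted server that skipped its configuration load would raise an error instead of coming up without the hotfix, which is the kind of outcome the new safeguards are meant to guarantee.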

The first step to this is Hotfix, which is planned to be pushed live tomorrow on 13th February. This will also fix the infinite Dawnblade issue.

Source: Bungie
