A/B Tests and Experiments in Games - The LiveOps Standard

A/B tests are useful tools in the LiveOps arsenal. If properly configured, they allow developers to iterate quickly on design ideas in a live game with low risk. Applying the scientific method helps keep the data and results clean and trustworthy.

What are A/B tests?

An A/B test comes from the scientific method: you run an experiment across two groups to confirm or reject a hypothesis. Let’s say you have a farming game that currently gives players strawberries as the first fruit they plant, and you want to see whether players prefer strawberries or blueberries. Your hypothesis might be “Players will plant more blueberries than strawberries as their first plant”. You run a test where some players get strawberries and others get blueberries, then review the data afterwards to see whether the results confirm or reject your hypothesis.

Groups

In the above example we have two groups, strawberries and blueberries; let’s call them group A and group B. When creating the two groups, it’s important to consider what question we want answered in the end. Too many variables can call your resulting data into question. Did players plant more blueberries because they wanted them, because the blueberries were bigger, or because they had a different yield? In our example we just want to know whether people like strawberries or blueberries more, so we will make sure both fruits have the same balance output, size, and so on.

The A group is often referred to as the control: this is what we currently have in the game. The B group is usually referred to as the variant: this is the new change we are testing. Now that we understand how groups are set up, we can discuss how we assign players to each group.

Group Assignment

When we assign players to groups we can split them however we want. In some cases, discussed in future articles, you might want only 10% of players to see the B group. This acts as a firewall against risky changes you are introducing to the game, a way to pull the plug if things go south quickly. In almost every case, though, you want an even split between your groups: 50% in each of the two groups.

A simple way to do this is to set an assignment variable to 0.5 and generate a random number between 0 and 1 for each player: if the number is below the variable, the player goes into the A group, otherwise into the B group. In most cases this satisfies our group assignment, but there are some big caveats to be aware of. In particular, make sure you assign each player to a group only once; you don’t want players to switch groups when they reload the game.
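The random-number approach described above can be sketched in a few lines. This is a minimal illustration, not a production implementation: the `GroupAssigner` class, its `share_a` parameter, and the in-memory dictionary standing in for persistent player storage are all assumptions made for the example.

```python
import random


class GroupAssigner:
    """Assigns each player to group A or B once and remembers the result."""

    def __init__(self, share_a=0.5, rng=None):
        # share_a is the assignment variable: 0.5 means an even split.
        self.share_a = share_a
        self.rng = rng if rng is not None else random.Random()
        # Hypothetical stand-in for wherever the game stores player data.
        self.assignments = {}

    def get_group(self, player_id):
        # Only roll the dice the first time we see this player, so a
        # reload never moves them to the other group.
        if player_id not in self.assignments:
            roll = self.rng.random()  # uniform in [0.0, 1.0)
            self.assignments[player_id] = "A" if roll < self.share_a else "B"
        return self.assignments[player_id]


assigner = GroupAssigner(share_a=0.5)
group = assigner.get_group("player-123")
# Asking again returns the stored group instead of re-rolling.
assert assigner.get_group("player-123") == group
```

In a real game the assignment would need to be saved with the player's profile or on the server so that it survives restarts; the dictionary here only lasts as long as the process.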

When to Assign

There isn’t a hard rule for this; you just need to be aware of your assignment timing. Generally speaking, you want to assign just before players encounter the variants for the first time.

For example:

If you are testing a change in an event, assign groups at the first event launch.
If you are testing the example above and the planting happens early, you might want to assign at first launch.
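One way to express this timing in code is to tie assignment to a trigger event, so a player is only placed in a group when they reach the moment just before the variant appears. The sketch below assumes hypothetical event names (`first_launch`, `event_first_open`) and a hypothetical `Experiment` class; a real game would use its own event system and persistent storage.

```python
import random


class Experiment:
    """Assigns a player to a group the first time a trigger event fires."""

    def __init__(self, name, assign_at, rng=None):
        self.name = name
        # assign_at: the game moment just before players see the variant.
        self.assign_at = assign_at
        self.rng = rng if rng is not None else random.Random()
        self.groups = {}  # stand-in for persistent storage

    def on_game_event(self, player_id, event_name):
        """Return the player's group, or None if they are not assigned yet."""
        if event_name == self.assign_at and player_id not in self.groups:
            self.groups[player_id] = "A" if self.rng.random() < 0.5 else "B"
        return self.groups.get(player_id)


# The fruit test changes something players meet early, so assign at first launch.
fruit_test = Experiment("first_fruit", assign_at="first_launch")
# The event test only concerns players who actually open the event.
event_test = Experiment("event_reward", assign_at="event_first_open")

fruit_test.on_game_event("player-1", "first_launch")
event_test.on_game_event("player-1", "first_launch")
# player-1 is in the fruit test but has not reached the event test yet.
assert "player-1" in fruit_test.groups
assert "player-1" not in event_test.groups
```

Assigning late like this keeps players who never reach the changed content out of the test data.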

Significance

Note: This is a deep topic that will be covered in future articles

The smaller the difference between the variant and the control, the larger the population you need in order to have confidence in the data and the result. Let’s imagine you have 10 people come to plant fruits, 5 in each group. From that you see that 6 people planted blueberries and 4 planted strawberries. How confident are you in the result? It’s too close for such a small number; we need more players to rule out the possibility of random chance. Now let’s imagine that, with the same group, 9 players planted blueberries and only 1 planted strawberries. Now we have a lot more confidence that blueberries are the winner: the group size remained the same but the gap was much larger.
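To put rough numbers on that intuition, here is a small sketch that asks: if players truly had no preference, how often would a result at least this lopsided show up by luck? It reads the example as 10 players each planting one fruit with a 50/50 chance, which is a simplifying assumption for illustration, and the function name `chance_of_at_least` is made up for this sketch.

```python
from math import comb


def chance_of_at_least(k, n, p=0.5):
    """Probability of k or more 'blueberry' outcomes in n tries by pure chance."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 6 of 10 planted blueberries: this happens by luck more than a third of the time.
print(round(chance_of_at_least(6, 10), 3))  # 0.377

# 9 of 10 planted blueberries: luck alone rarely produces this.
print(round(chance_of_at_least(9, 10), 3))  # 0.011
```

The same 10 players give a very different level of confidence depending on how large the gap is, which is the point of the example above.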

Luckily there is a statistical formula for calculating this, and even luckier, there are calculators out there for this.

https://abtestguide.com/calc

Keep in mind that if the result does not reach your confidence threshold, the difference you saw could be down to random chance. The test can still be usable, but treat its conclusions with caution.
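For readers who want to see what such a calculation can look like, one common formula for comparing two conversion rates is the two-proportion z-test. The article does not say which test any particular calculator uses, so treat this as an illustrative sketch; the function name and the player counts in the usage lines are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference between two rates.

    conv_a / conv_b: players in each group who did the action being measured.
    n_a / n_b: total players in each group (assumed to be greater than zero).
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the assumption that both groups behave the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 0.0, 1.0  # no variation at all, so nothing to distinguish
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical numbers: 400 of 1000 control players planted, 460 of 1000 variant players.
z, p = two_proportion_z_test(400, 1000, 460, 1000)
print(f"z={z:.2f}, p={p:.4f}")
# A common convention is to call the result significant when p is below 0.05.
print("significant" if p < 0.05 else "could be random chance")
```

The 0.05 threshold is a convention, not a rule from this article; pick the confidence level before the test starts and stick to it.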

Data Dashboard Setup

Your dashboard should monitor the following data points in real time.

Assignment Percentage

As players come to the assignment point in the game, keep an eye on the group assignments in your data analysis tools. The actual split should match the split you configured, so add it to your data dashboard for the test.
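A dashboard tool will normally compute this for you, but the underlying check is simple. The sketch below assumes the assignments arrive as a plain list of group labels and uses a made-up tolerance of two percentage points; both are assumptions for illustration.

```python
from collections import Counter


def assignment_split(assignments):
    """Return the share of players in each group, e.g. {'A': 0.5, 'B': 0.5}."""
    counts = Counter(assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {group: count / total for group, count in counts.items()}


def split_looks_off(split, expected, tolerance=0.02):
    """True if any group's share differs from the expected share by more than tolerance."""
    return any(
        abs(split.get(group, 0.0) - share) > tolerance
        for group, share in expected.items()
    )


observed = assignment_split(["A", "B", "A", "B", "B", "A", "A", "B"])
print(observed)
print(split_looks_off(observed, {"A": 0.5, "B": 0.5}))
```

With very few players the split will naturally wobble, so a check like this is most useful once a reasonable number of players have been assigned.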

Custom Actions Between Groups

In our example we wanted to see how many people plant from each group. In this case, you would make a chart to show the number of fruits planted for each group. This is your key data point because it ties back to your hypothesis.
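The data behind such a chart could be aggregated along these lines. The event shape used here (dictionaries with `player_id`, `group`, and `action` keys, and an action called `plant_fruit`) is a hypothetical format chosen for the sketch, not something the game or any analytics tool is known to produce.

```python
from collections import defaultdict


def plants_by_group(events):
    """Count fruits planted and distinct planters for each test group."""
    plants = defaultdict(int)
    planters = defaultdict(set)
    for event in events:
        if event["action"] != "plant_fruit":
            continue  # ignore everything that is not a planting action
        plants[event["group"]] += 1
        planters[event["group"]].add(event["player_id"])
    return {
        group: {"plants": plants[group], "unique_planters": len(planters[group])}
        for group in plants
    }


events = [
    {"player_id": "p1", "group": "A", "action": "plant_fruit"},
    {"player_id": "p1", "group": "A", "action": "plant_fruit"},
    {"player_id": "p2", "group": "B", "action": "plant_fruit"},
    {"player_id": "p3", "group": "B", "action": "open_shop"},
]
print(plants_by_group(events))
```

Tracking distinct planters alongside total plants helps show whether a difference comes from many players or from a few very active ones.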

Even though we might have our answer here, we need some checks and balances to make sure our results don’t have knock-on effects in our game.

Retention and Engagement Between Groups

It could be that players initially like blueberries but that strawberry planters have higher retention. We don’t know why that is yet but it gives us something to look into after the test.

Revenue By Group

It’s possible that blueberries have led to higher revenue in the game. As with retention above, our goal is not to explain why here but to look into it in future tests.
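Both of these secondary checks, retention and revenue, come down to the same per-group summary. A minimal sketch, assuming each player record carries a group label, a flag for whether they came back the next day, and their total spend (the field names `returned_day_1` and `revenue` are invented for this example):

```python
def guardrail_metrics(players):
    """Summarise day-1 retention and revenue per player for each group."""
    totals = {}
    for player in players:
        t = totals.setdefault(
            player["group"], {"players": 0, "retained": 0, "revenue": 0.0}
        )
        t["players"] += 1
        t["retained"] += 1 if player["returned_day_1"] else 0
        t["revenue"] += player["revenue"]
    return {
        group: {
            "players": t["players"],
            "day1_retention": t["retained"] / t["players"],
            "revenue_per_player": t["revenue"] / t["players"],
        }
        for group, t in totals.items()
    }


players = [
    {"group": "A", "returned_day_1": True, "revenue": 0.0},
    {"group": "A", "returned_day_1": True, "revenue": 4.0},
    {"group": "B", "returned_day_1": False, "revenue": 2.0},
    {"group": "B", "returned_day_1": True, "revenue": 0.0},
]
print(guardrail_metrics(players))
```

These numbers are not the answer to the hypothesis; they are there to flag side effects worth investigating in a follow-up test.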

Common Pitfalls

Common A/B Testing Mistakes

Changing too many variables at once
Assigning players inconsistently
Ending tests too early
Running tests that don’t reach enough players
Ignoring secondary effects (like retention or monetization changes)
Forgetting to document results for future reference

Post Test Data Review and Summary

Once the test has ended you want to review any potential issues that may have come up in your social media or support channels. Are any of these issues ones that could affect the way you view the data?

Going back to our data dashboards, we want to check a couple of things:

Was the hypothesis confirmed by our data? The answer is probably in the custom chart we built.
Was our assignment split what we expected?
Was any other data like retention or revenue affected positively or negatively in our test?

Based on what our data tells us, along with our confidence calculation, what are the outcomes of the test? I like to add this summary to the data dashboard so we have something to come back to in the future. What are the next steps: do we want to apply the changes, or make some changes and run the test again?

A/B tests aren’t just for big studios with massive data teams. With careful planning and clear hypotheses, small teams can build a steady rhythm of experimentation that improves retention, monetization, and player satisfaction.