Todd Maddox reports on why A/B tests are sub-optimal in building and optimizing immersive training and performance tools.
I have written extensively on the potential of virtual reality (VR) and augmented reality (AR) to revolutionize training and performance in healthcare, manufacturing, and corporate learning, to name a few. Learning is about the experience, and VR and AR are grounded in experiential learning. VR offers a virtual experience and AR overlays computer-generated assets onto the real world to drive learning and performance.
Unfortunately, too often the primary focus is on the technology, with much less focus on optimizing the interface and the experience for the user (the UI/UX). Although computing power, specialized graphics, controllers, and the like are important, their value is diminished if they don’t engage the user’s brain in a way that is effective at achieving the desired training and performance goal. Too often the technological “wow” factor dominates when what is more important is how the user’s brain is engaged to achieve a set goal.
To optimize the UI/UX we must incorporate what is known about the brain and how it processes information, then apply the scientific method to test hypotheses and to identify the optimal solution. For example, one might test the hypothesis that a hands-free AR training solution leads to faster task completion than a hand-held AR training solution. To test this hypothesis, suppose that 10 workers completed the task with the hands-free AR solution, and 10 other workers completed the task with the hand-held AR training solution.
Under the assumption that the only difference between the two groups of workers is the type of training device, the time to completion would be recorded for each of the 20 workers, and a simple statistical test (a Student's t-test) would be conducted to compare the time to completion between the two groups. Although a detailed treatment of the probability theory underlying statistical reasoning is beyond the scope of this report, suffice it to say that the outcome of this statistical test allows one to claim (or not claim) beyond a "statistically reasonable doubt" that the hands-free AR solution leads to faster time to completion than the hand-held AR solution.
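The comparison above can be sketched in a few lines of Python. The completion times below are invented purely for illustration; the t statistic is computed with the classic pooled-variance formula and compared against the two-sided 0.05 critical value for 18 degrees of freedom (about 2.101).

```python
import math

# Hypothetical completion times in minutes; all values are invented
# purely for illustration.
hands_free = [21, 23, 20, 22, 24, 21, 23, 22, 20, 24]  # 10 workers
hand_held  = [26, 28, 25, 27, 29, 26, 30, 27, 28, 26]  # 10 workers

def students_t(a, b):
    """Classic pooled-variance (equal-variance) Student's t statistic."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    var_a = sum((x - ma) ** 2 for x in a) / (na - 1)
    var_b = sum((x - mb) ** 2 for x in b) / (nb - 1)
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    return (ma - mb) / math.sqrt(pooled * (1 / na + 1 / nb))

t = students_t(hands_free, hand_held)
# With 18 degrees of freedom, |t| > 2.101 is significant at the
# two-sided 0.05 level, so we can reject the hypothesis that the
# two groups have the same mean time to completion.
print(f"t = {t:.2f}, significant at 0.05: {abs(t) > 2.101}")
```

In practice one would use a statistics package rather than hand-rolled formulas, but the logic is exactly this: one factor, two groups, one hypothesis.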
The experiment described above is referred to as an A/B test in the corporate sector. It is the simplest type of experiment that can be conducted, but it is extremely powerful and is used extensively. If you want to determine whether a new drug has the desired effect, you might conduct an A/B test of the drug against a placebo. If you want to determine if your new technology or product is superior to the existing gold standard, you might conduct an A/B test. An A/B test allows you to test the hypothesis that one group is different from another on some outcome measure, such as time to completion.
I spent 25 years teaching statistics and experimental design to university students, and during that time, I conducted hundreds of experiments in my laboratory. I can count on one hand the number of A/B tests that I conducted.
Why so few?
Although there is nothing wrong with an A/B test, factorial designs provide more value at no additional cost. In an A/B test there is one factor. In the example above, the factor would be the type of AR training device (hands-free vs. hand-held). Suppose a second factor was included that represented time pressure. Specifically, suppose that 5 of the workers using the hands-free device and 5 of the workers using the hand-held device were simply told to complete the task, whereas the other 5 from each group were told that they must complete the task within 15 minutes when it normally takes about 25 minutes. This factorial design has four groups of workers: hands-free/no time pressure, hands-free/time pressure, hand-held/no time pressure, and hand-held/time pressure. More importantly, it allows one to test three hypotheses.
- Hypothesis 1: Does a hands-free device speed time to completion?
- Hypothesis 2: Does time pressure speed or slow time to completion?
- Hypothesis 3: Does the presence or absence of time pressure affect time to completion differently for hands-free vs. hand-held AR training tools?
In the simple A/B test example, 20 workers completed the task, and Hypothesis 1 was tested. In the factorial design example, 20 workers completed the task and all three hypotheses could be tested. This is the power of a factorial design over an A/B test, and this is one reason why the overwhelming majority of experimental science, especially behavioral science, relies exclusively on factorial designs. A second reason follows from Hypothesis 3.
Hypothesis 3 is especially interesting and addresses the possibility that there is an “interaction” between time pressure and the nature of the device on time to completion. I would predict that a hands-free device is much less susceptible to the deleterious effects of time pressure than a hand-held device. Of course, this is an empirical question.
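To make this concrete, here is a minimal pure-Python sketch of a two-way ANOVA for the 2x2 design described above. All completion times are invented for illustration, and they are deliberately constructed so that time pressure hurts the hand-held group far more than the hands-free group, i.e., the predicted interaction.

```python
# Hypothetical completion times (minutes), 5 workers per cell;
# all values are invented for illustration only.
cells = {
    ("hands-free", "no pressure"): [21, 22, 20, 23, 21],
    ("hands-free", "pressure"):    [22, 23, 21, 24, 22],
    ("hand-held",  "no pressure"): [25, 26, 24, 27, 25],
    ("hand-held",  "pressure"):    [30, 32, 29, 33, 31],
}
n = 5                                       # workers per cell (balanced design)

def mean(xs):
    return sum(xs) / len(xs)

grand = mean([x for v in cells.values() for x in v])

def marginal_mean(index, level):
    # mean across all workers at one level of one factor
    return mean([x for k, v in cells.items() if k[index] == level for x in v])

# Sums of squares for a balanced two-factor design
ss_device = 2 * n * sum((marginal_mean(0, lv) - grand) ** 2
                        for lv in ("hands-free", "hand-held"))
ss_pressure = 2 * n * sum((marginal_mean(1, lv) - grand) ** 2
                          for lv in ("no pressure", "pressure"))
ss_cells = n * sum((mean(v) - grand) ** 2 for v in cells.values())
ss_interaction = ss_cells - ss_device - ss_pressure       # Hypothesis 3
ss_error = sum((x - mean(v)) ** 2 for v in cells.values() for x in v)

ms_error = ss_error / 16                    # 20 workers - 4 cell means = 16 df
for name, ss in [("device (H1)", ss_device),
                 ("pressure (H2)", ss_pressure),
                 ("interaction (H3)", ss_interaction)]:
    f = ss / ms_error                       # each effect has 1 df in a 2x2
    # the F(1, 16) critical value at the 0.05 level is about 4.49
    print(f"{name:18s} F = {f:6.2f}  significant: {f > 4.49}")
```

With these invented numbers all three effects are significant, including the interaction: the same 20 workers that a simple A/B test would spend on one hypothesis here answer three.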
Interactions abound in the real world, especially in behavioral science. We have all heard of drug interactions. For example, some research suggests that a glass or two of wine can reduce stress, and a small dose of Xanax can reduce anxiety, but combining alcohol and Xanax can be fatal. That is a drug interaction: the combined effect of alcohol and Xanax is not simply the sum of the independent effects. Rather, the effect is interactive, large, and potentially fatal.
Anytime a human is involved and is physically processing something like a drug, or is mentally processing and engaging with technology, interactions are likely. This is the second reason why factorial designs are so popular. You will miss important findings if you focus exclusively on A/B tests.
The human brain is composed of multiple learning and performance systems. These include the cognitive learning and performance system, which relies on the prefrontal cortex and is constrained by working memory and attentional limitations. The behavioral skills learning and performance system relies on the striatum and learns stimulus-response associations via incremental, dopamine-mediated reward and punishment learning. Interestingly, this system is not constrained by working memory and attention. In fact, "overthinking" hurts behavioral learning.
The emotional learning and performance system in the brain relies on the amygdala and other limbic structures. This addresses issues of stress, pressure, and situational awareness broadly speaking. Finally, the experiential learning and performance system in the brain represents the sensory and perceptual aspects of the environment and relies on the occipital, temporal, and parietal lobes.
Each system has its own unique processing characteristics, and thus each system is optimized with different technologies and processes. For example, the critical bottleneck within the cognitive system is cognitive load. One almost always wants to reduce cognitive load. This means that the what, where, and when of VR or AR assets must be examined to determine the optimal combination to achieve a given goal. The best way to do this is with a factorial design.
The what, where, and when are each a separate factor that must be combined and studied factorially to optimize performance. The nature and timing of corrective feedback in the behavioral system must be examined with a factorial design, and of course the emotional and motivational aspects (stress, pressure, engagement) as well as the detailed sensory and perceptual processes must be examined.
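As a simple illustration, a full factorial design crosses every level of every factor. The factor names and levels below are hypothetical, chosen only to show how the what, where, and when of AR assets combine into experimental conditions.

```python
import itertools

# Hypothetical levels for three asset factors; names invented for illustration.
what  = ["text label", "3D arrow"]
where = ["center of view", "periphery"]
when  = ["before the step", "during the step"]

# A full factorial design crosses every level of every factor: 2 x 2 x 2 = 8
conditions = list(itertools.product(what, where, when))
print(len(conditions), "conditions")
for c in conditions:
    print(c)
```

Each of the eight conditions would be tested with its own group of users, allowing every main effect and every interaction among the three factors to be estimated from a single experiment.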
The basic science literature is full of fascinating results in each of these domains that can guide VR and AR product optimization, but in the end, one has to rely on science and factorial designs to "get it right". The wealth of data that can be extracted from VR and AR solutions, and the ease of manipulating relevant factors (the what, where, and when of assets, or the nature and timing of the feedback), make factorial designs straightforward to conduct.
VR and AR training and performance tools have already delivered significant ROI relative to traditional training approaches. If you have a basic understanding of the breadth of neural engagement these technologies achieve, this ROI is not surprising. Even so, these initial ROI benchmarks are only a starting point. Much more ROI awaits those organizations that are ready to conduct the factorial design experiments necessary to find the "sweet spots" in the UI/UX (and there are many) that will reveal the true value of these technologies.
These immersive training and performance technologies are still at an early stage. They are reaping benefits for their users because of the way they engage the brain relative to traditional approaches. As with any young science or technology, the early wins are often large, but more wins will follow if solid research and development is conducted that leverages what we know about the brain, applies the scientific method, and uses factorial designs as a window onto the interactions that optimize the UI/UX.