YANSS 158 – The science behind why we find A/B testing icky when it comes to policies, practices, medicine, and social media

In 1835, at a tavern in Bavaria, a group of 120 people met to drink from a randomized assortment of glass vials.

Before shuffling them, they divided the vials into two sets. One contained distilled water from a recent snowfall and the other a solution made by collecting 100 drops of that water and dropping into the pool a grain of salt, and then diluting a drop of the result into another 100 drops, again and again, 30 times in all.

They did this to test out a new idea in medicine called homeopathy, but it was the way they did it that changed things forever. By testing options A and B at the same time, but without telling sick people which option they would be getting, they not only debunked a questionable medical practice, they invented modern science and medicine.

About 200 years later a company in California tried something similar. A group of 700,000 people gathered inside a virtual tavern to share news and photos and stories both happy and sad. The company then used some trickery so that some people randomly encountered more happy things and others more sad things.

They did this to test out a new idea in networking called emotional contagion, but it was the way they did it that changed how many people felt about gathering online. By testing options A and B at the same time, but without telling people which option they would be getting, they not only learned if a computer program could make its users more happy or more sad, they created a backlash that resulted in a large-scale, world-wide panic.

Though we always learn something new when we perform an A/B test, we don’t always support the pursuit of that knowledge, which is strange, because without A/B testing we have to live with whatever option the world delivers to us, be it through chance or design. Should we use cancer drug A or B? Should we try gun control policy A or B? Should we try education technique A or B? It seems like our reaction to these questions would be to support testing A on half the people, B on the other, and then to look at which one works best and go with that moving forward, but as you will learn in this episode of the You Are Not So Smart Podcast, new research shows that a significant portion of the public does not feel this way, enough to cause doctors and lawmakers and educators to avoid A/B testing altogether.

Back at the tavern, they called the (barely) salt water in the vials a C30-solution because it was made using homeopathic techniques that rely on the C scale. C is short for centesimal, which means a division into hundredths. The scale was named and created by the inventor of homeopathy, Samuel Hahnemann, who believed that if something caused an illness, then giving that thing to someone who was already sick would cure them. This hair-of-the-dog concept is, of course, not true, but people believed it was sound medical advice going back to the Greeks and beyond.

In the late 1700s, the time when Hahnemann became a physician, this idea had come under some scrutiny, because it killed people. Got a tummy ache? Here, take this concoction of fermented oatmeal, snake venom, and rancid meat so your stomach aches even more. It will cause vomiting and diarrhea and blood yawns, but once you are done writhing in agony, you’ll be all better. Unless you die. But if you do die, well, hey, it’s 1781. We tried our best.

Hahnemann thought that, sure, this ancient idea of similia similibus curantur, or “what makes a man ill also cures him,” caused too much harm. His innovation? Dilute the bad thing until it doesn’t do anything bad. And with that, he had invented homeopathy, which roughly translates to “kind-of like suffering, but not.”

Homeopathy became very popular in the we-have-no-idea-what-we-are-doing era of medicine. Why? Because doctors didn’t wash their hands, prescribed cocaine to children for toothaches, and used a bevy of techniques that today would land you a prison sentence even if the patient survived. In a medical environment where doing nothing at all would actually be the better option, homeopathy did indeed save lives. So, based on those results, not knowing any better, the first homeopathic schools opened in 1835, and by the turn of the century there were dozens of colleges and more than 15,000 homeopaths, each accepting cash that was covered with more molecules of medical value than could be found in the water they peddled in exchange.

Although homeopathy had its heyday, and although it is still practiced today by people who like their medicine based on ideas that predate the discovery of vitamins and vaccines, it was almost immediately criticized and ridiculed by the medical community of the 1800s.

It’s easy to see why. If you dilute something to one part in one hundred, like a single grain of salt, you get a C1-dilution. But Hahnemann said you needed to dilute things to C30, and at that point you would have to give out more than a billion doses per second to the entire human race for the length of time it took the Earth to cool into a livable planet before a single person would likely receive a single molecule of whatever it was that you diluted.
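To get a feel for the numbers, here is a rough sketch of the dilution arithmetic. It assumes each C step is a 1-in-100 dilution, as the centesimal scale implies, and, purely for illustration, that you start with one mole of the substance (about 6 × 10^23 molecules). The function names are made up for this example.

```python
from fractions import Fraction

# Avogadro's number: molecules in one mole (exact SI definition).
AVOGADRO = 602_214_076_000_000_000_000_000


def dilution_factor(c_level: int) -> int:
    """Total dilution after c_level centesimal (1-in-100) steps."""
    return 100 ** c_level


def expected_molecules(start_molecules: int, c_level: int) -> Fraction:
    """Average number of original molecules left in the diluted sample."""
    return Fraction(start_molecules, dilution_factor(c_level))


def first_level_below_one_molecule(start_molecules: int) -> int:
    """Smallest C level at which less than one molecule is expected to remain."""
    level = 0
    while expected_molecules(start_molecules, level) >= 1:
        level += 1
    return level


if __name__ == "__main__":
    # C30 is a dilution of one part in 10**60.
    print(dilution_factor(30) == 10 ** 60)
    # Expected molecules left at C30 when starting from a full mole (assumed).
    print(float(expected_molecules(AVOGADRO, 30)))
    # Under that assumption, the last molecule is probably gone by this step.
    print(first_level_below_one_molecule(AVOGADRO))
```

Under the one-mole assumption, the expected count drops below a single molecule at C12, which leaves another 18 rounds of diluting water with water.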

And it is this fact that led 120 people to meet in a tavern in Bavaria in 1835 at an event organized by “a society of truth-loving men” living in Nuremberg. They were led by Friedrich Wilhelm von Hoven, who was the head of local hospitals there, and who was also not a big fan of homeopathy. He had written a scathing review of the practice, explaining that homeopathic remedies had zero effect on people’s health, and added that whatever it was that people experienced would be the same thing they would experience without taking anything at all. All they had was a belief that they had received a cure, he wrote, and apparently, sometimes, that was all a patient needed. Johann Jacob Reuter, a popular, local homeopath, objected to all this, and so to settle the dispute the town decided to put his conceptions to the test.

More than 100 citizens, some with illnesses and some with just curiosity, met the doctors and the truth-loving men at the tavern. The experimenters numbered 100 vials, split them into two lots, filled half with snow water and the other with the C30 homeopathic solution created by Reuter, and shuffled them. The vials were then distributed, but no one knew who got what. Independent observers recorded which vials had which solution, who had received which, and then sealed and protected the information so the recipients and the doctors couldn’t affect the results with their beliefs. Three weeks later, everyone gathered again. First, people reported whether they had experienced anything. Then the sealed information was opened, and everyone learned what everyone drank. Only eight people reported any effects, and among them, half had taken the plain water and half the solution. The truth-loving men concluded homeopathy was bunk.
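The procedure at the tavern maps neatly onto a few lines of code. What follows is a minimal sketch, not a reconstruction of the historical record: the function names are invented, and the vial numbers in the usage example are picked only to mirror the reported four-and-four split.

```python
import random


def assign_vials(n_vials=100, seed=None):
    """Fill half the vials with water and half with C30, then shuffle.

    Returns the sealed key: a mapping from vial number to contents.
    Nobody handing out or drinking the vials gets to see it.
    """
    rng = random.Random(seed)
    half = n_vials // 2
    contents = ["water"] * half + ["C30"] * (n_vials - half)
    rng.shuffle(contents)
    return {number: liquid for number, liquid in enumerate(contents, start=1)}


def unblind(sealed_key, vials_with_reported_effects):
    """Open the sealed key and count reported effects per group."""
    counts = {"water": 0, "C30": 0}
    for vial in vials_with_reported_effects:
        counts[sealed_key[vial]] += 1
    return counts


if __name__ == "__main__":
    key = assign_vials(100, seed=1835)
    # Illustrative only: pick four water drinkers and four C30 drinkers
    # as the eight people who said they felt something.
    water_vials = [v for v, liquid in key.items() if liquid == "water"][:4]
    c30_vials = [v for v, liquid in key.items() if liquid == "C30"][:4]
    print(unblind(key, water_vials + c30_vials))
```

The important design choice is the order of operations: reports are collected first, and only then is the key opened, so no one's expectations can leak into the results.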

As the Journal of the Royal Society of Medicine later reported, “The organizers concluded that the symptoms or changes which the homeopaths claimed to observe as an effect of their medicines were the fruit of imagination, self-deception and preconceived opinion.”

And with that, as noted earlier, in many ways, modern medicine was born. They had unknowingly invented the double-blind trial, randomized control groups, randomization, placebos, and many other aspects of experimental design that we use to this day. And once we figured out that we could do this, we were able to put all sorts of things to the test: medical procedures, drugs, education techniques, financial decisions, public policies, folk remedies, martial arts techniques and farming practices, and on and on and on. Instead of making a choice between A or B and living with it, or deciding to stick with tradition or choose a new path, or trying a new medicine or doing nothing at all, we could try out both choices to see which led to the preferred outcome, and then make an evidence-based judgment about how to approach the problem we were trying to solve. Once we did this, a whole lot of things that seemed to work turned out to have no effect, or to have a worse effect than their alternatives. A whole lot of practices and policies turned out to be based on superstition or wishful thinking or to be politically motivated or in some other way emotionally motivated.

So you might think that, in general, as an idea, as a practice, the A/B test would be beloved, supported, and encouraged as a way to test out policies and practices and drugs and treatments, which brings us to that company in California 200 years later.

In 2014, Facebook wanted to know if Facebook was making people sad. There was a lot of conjecture at the time that it was. After all, this was, on one hand, a highlight reel of the lives of friends and family, showing only the best-looking selfies, and best aspects of their lives. On the other hand, people were sharing a lot of outrage and political discontent, arguing about the issues of the day, and passing around memes that were often more infuriating than funny. Of all the strange changes to our daily lives and social interactions that came with widespread adoption of the platform, people wondered if it was injecting us with depression.

Facebook asked researchers at Cornell to look into it, and they did what 200 years of science said they should do. They created an A/B test, or what science calls a randomized experiment, or what medicine calls a randomized controlled trial. They set aside 700,000 users, and for one week Group A saw more positive items than usual, Group B saw more negative items than usual, and a control saw what everyone else outside the experiment was seeing.
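For readers who have never seen how users get sorted into groups like these, here is one common approach, sketched under loud assumptions. This is not Facebook's or the researchers' actual method; hash-based bucketing is simply a widely used way to assign people to arms, and the arm names, the experiment label, and the equal three-way split are all hypothetical.

```python
import hashlib
from collections import Counter

# Hypothetical arm names matching the three groups described above.
ARMS = ("more_positive", "more_negative", "control")


def assign_arm(user_id: str, experiment: str = "feed_emotion_test") -> str:
    """Map a user to an arm by hashing the experiment label and user ID.

    The same user always lands in the same arm, and the hash spreads
    users roughly evenly across arms without storing any assignments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(ARMS)
    return ARMS[bucket]


if __name__ == "__main__":
    # Made-up user IDs, just to show the split comes out roughly even.
    counts = Counter(assign_arm(f"user-{i}") for i in range(30_000))
    print(counts)
```

Because the assignment depends only on the ID and the experiment label, a user sees a consistent version for the whole test, and changing the label reshuffles everyone for the next one.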

In the end, the researchers concluded that Facebook did affect our emotions like so many of us thought it might. The effects were small, but real: if people saw more negative news, it made them more sad than a control. Same for positive news. But the strongest emotions emerged when the study was revealed to the public, because the public felt angry, creeped-out, and in some new, Black-Mirror-kind-of-way, violated.

The Atlantic called it “Facebook’s Secret Mood Manipulation Experiment.” The headline in the New York Times read, “Facebook Tinkers with Users’ Emotions.” There were protests, and there were calls from lawyers, activists, and lawmakers to investigate. People called it intrusive, spooky, and scandalous. Facebook defended itself, and then everyone argued over what we should do about all this. As The Guardian reported, “Clay Johnson, the co-founder of Blue State Digital, the firm that built and managed Barack Obama’s online campaign for the presidency in 2008, said: ‘The Facebook ‘transmission of anger’ experiment is terrifying.’ He asked: ‘Could the CIA incite revolution in Sudan by pressuring Facebook to promote discontent? Should that be legal? Could Mark Zuckerberg swing an election by promoting Upworthy [a website aggregating viral content] posts two weeks beforehand? Should that be legal?’”

Bioethics researcher Michelle Meyer and her colleagues started to wonder why people reacted so poorly, considering that Facebook and other institutions, including hospitals, governments, and schools, are always testing their products, policies, and practices, just not in an A/B experimental design. Instead, they usually just change how things work, what drugs we take, or how their products function, and then we go on living, taking what is offered. As she put it, Facebook had already manipulated our behavior by inventing Facebook, and then again and again each time it updated how Facebook works. But as long as the changes were universal, we didn’t freak out. Why is it, Meyer and her team asked, that making those changes random causes a backlash?

It’s an important question because, as Meyer and her team point out, “A/B tests have long been the ‘gold standard’ for evaluating drugs and other medical interventions and are increasingly used to evaluate business products and services, government programs, education and health policies, and global aid.”

In this episode you will learn what they found, and why they concluded that, “rigorously evaluating policies or treatments via pragmatic randomized trials may provoke greater objection than simply implementing those same policies or treatments untested.” In other words, we would rather live with an untested option A, or an untested option B, than live in a world where A and B are being tested at the same time. Meyer and her team call this the A/B effect, and you will learn all about it in this episode.
