Sampling Bias

We’ve all encountered research studies in one way or another. Everything from commercials claiming positive customer feedback to medications prescribed for different conditions has undergone some sort of research. Perhaps you’ve even taken part in a research study. But not all studies are equal in accuracy and completeness of information, nor are all methods of gathering information for a study equally accurate.

When a scientist wants to study something, the first step is to design a test. The next is to identify the group of animals or people to be studied. This group, called a population, is made up of individuals who are affected by the thing being studied or are likely to be involved in it somehow. Since most populations are far too large to test each individual, most studies select a random sample of individuals from the population. The idea is that the smaller sample will more or less represent the larger whole. However, problems arise when the investigators run into sampling bias.

Sampling bias is described as “a sample of a group that does not equally represent the members of that group.”[1] That basically means that some kinds of individuals were more likely to be sampled than others, resulting in misleading information that doesn’t accurately represent the whole population. For example, if researchers wanted to study the effect of a certain food item on the development of heart disease, but mostly sampled people with unhealthy lifestyles, then the results would be skewed, because heart disease in that sample could stem from lifestyle rather than from the food item. Bias of this kind can arise by chance, in ways that cannot be predicted or prevented, or it can result from faulty study planning. Either way, it gives us the wrong answer, and a great deal of planning for population studies involves minimizing this bias.

Now let’s look at this in a real-life setting. Most of us have heard of rabies and know that raccoons, bats, skunks, and foxes might give it to us if they bite us. Beyond that, few people other than wildlife or medical specialists learn the details of this disease. For one thing, more than 90% of the rabies cases reported in the United States since 1980 have been in wildlife, not humans or pets.[2] Rabies actually accounts for very few human deaths, and is the second-rarest disease after polio.[3]

https://www.cdc.gov/rabies/exposure/animals/wildlife_reservoirs.html

This isn’t to say that rabies shouldn’t be taken seriously. It is a deadly disease, and we need to be cautious around wild animals so that neither we nor our pets become infected. There are ongoing projects attempting to eradicate rabies from wildlife so that it’s no longer a threat. To aid in such an endeavor, researchers have studied the percentage of the wild animal population that is infected with the disease. However, the only test that tells us with certainty whether an animal has rabies can be performed only after death.[4] As you may imagine, this can result in a fair amount of sampling bias.

If we get our data primarily from animals that were submitted for testing because rabies was suspected, it will seem as though a huge percentage of the population is infected. The Centers for Disease Control and Prevention (CDC) monitors the number of rabies cases reported, and stated: “For the present report, percentages of rabid animals were calculated on the basis of total numbers of animals tested. These percentages are likely not reliable indicators of the true incidence of rabies within animal populations because most animals submitted for testing were selected on the basis of abnormal behavior or visible illness or were involved in a potential exposure incident, biasing the sample submitted for testing.”[5] This means that, since most of the animals tested were abnormal in some way, we only know how many of the abnormal animals have rabies, not how many in the entire population have rabies.
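To make the effect concrete, here is a minimal simulation sketch in Python. Every number in it is an assumption invented for illustration, not real surveillance data: a hypothetical population of 100,000 animals, a true infection rate of 1%, and made-up chances that a rabid or a healthy animal ends up being submitted for testing. The function name `simulate_submission_bias` is likewise hypothetical.

```python
import random


def simulate_submission_bias(population_size=100_000,
                             true_prevalence=0.01,
                             p_submit_if_rabid=0.30,
                             p_submit_if_healthy=0.002,
                             seed=0):
    """Compare the true infection rate with the rate seen among tested animals.

    All default values are illustrative assumptions, not measured figures.
    """
    rng = random.Random(seed)
    rabid_total = 0
    submitted = 0
    submitted_rabid = 0

    for _ in range(population_size):
        is_rabid = rng.random() < true_prevalence
        rabid_total += int(is_rabid)

        # Sick-looking animals are far more likely to be noticed and submitted.
        p_submit = p_submit_if_rabid if is_rabid else p_submit_if_healthy
        if rng.random() < p_submit:
            submitted += 1
            submitted_rabid += int(is_rabid)

    return {
        "true_prevalence": rabid_total / population_size,
        "positivity_among_submitted": submitted_rabid / submitted,
        "animals_submitted": submitted,
    }


result = simulate_submission_bias()
print(result)
```

Under these assumed numbers, the share of positive tests among submitted animals comes out many times higher than the true infection rate in the simulated population, even though nothing about the test itself is wrong. Only the way animals were selected for testing differs.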

The best way to reduce biased sampling would be to proactively catch and test animals in their natural habitat. This would increase the chances of getting a sample as close to the actual population as possible. This would still pose risks of bias, however. If traps were set out, weak, sick, very young, or very old animals could be more likely to be caught, which would affect the results. There would have to be a method of catching representatives from every group of the population (young, old, healthy, sick, weak, strong) and of catching them in the proportion in which they exist in the population.
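Catching each group in proportion to its share of the population is often called proportional stratified sampling. The sketch below shows the arithmetic only. It assumes, purely for illustration, that the share of each group is already known, which is rarely true for wild animals. The group names, the shares, and the function `proportional_allocation` are all hypothetical.

```python
def proportional_allocation(population_shares, sample_size):
    """Split a total sample size across groups in proportion to their shares.

    Uses the largest-remainder method so the group quotas always add up
    to exactly sample_size.
    """
    raw = {group: share * sample_size
           for group, share in population_shares.items()}
    counts = {group: int(value) for group, value in raw.items()}

    # Hand out any animals lost to rounding, largest fractional part first.
    leftover = sample_size - sum(counts.values())
    by_remainder = sorted(raw, key=lambda g: raw[g] - counts[g], reverse=True)
    for group in by_remainder[:leftover]:
        counts[group] += 1
    return counts


# Hypothetical shares of each group in the wild population (must sum to 1).
shares = {
    "young_healthy": 0.25,
    "adult_healthy": 0.55,
    "old_healthy": 0.10,
    "sick_or_weak": 0.10,
}

quota = proportional_allocation(shares, 200)
print(quota)
```

A trapping plan would then aim to fill each quota rather than simply testing whatever walks into a trap first. The hard part in practice is the input, not the arithmetic: the shares themselves have to be estimated somehow.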

There are certain biases that can never be completely removed. Take this study as an example. What percentage of the wild raccoon population is sick? How many are weak, and how many are strong? We can’t know this for sure without sampling the entire population, which is impossible. No matter what we do, there will always be some unknowns, and thus our results will always be a little off from the true number. However, the larger the sample size and the more meticulous the study design, the closer our results will be to the actual number.
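The effect of sample size can be sketched with a small simulation. It assumes a made-up true infection rate of 10% and draws truly random samples of different sizes, then reports how far the sample estimates land from the true rate on average. The rate, the sample sizes, and the function name `mean_abs_error` are illustrative assumptions.

```python
import random
import statistics


def mean_abs_error(true_rate, sample_size, trials=200, seed=1):
    """Average distance between the sample estimate and the true rate."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        # Draw one random sample and count how many animals are infected.
        hits = sum(rng.random() < true_rate for _ in range(sample_size))
        errors.append(abs(hits / sample_size - true_rate))
    return statistics.mean(errors)


TRUE_RATE = 0.10  # assumed value, for illustration only

for n in (50, 500, 5000):
    print(n, round(mean_abs_error(TRUE_RATE, n), 4))
```

The average error shrinks as the sample grows, but only because these samples are drawn at random. A larger sample reduces random error. It does not correct a sample that was selected in a biased way, which is why careful study design matters alongside sample size.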


[1] Sample bias. (n.d.) Medical Dictionary. (2009). Retrieved March 17 2019 from https://medical-dictionary.thefreedictionary.com/Sample+bias

[2] https://avmajournals.avma.org/doi/pdfplus/10.2460/javma.248.7.777

[3] https://cpw.state.co.us/learn/Pages/LivingwithWildlifeBatsRabies.aspx

[4] https://www.cdc.gov/rabies/diagnosis/animals-humans.html

[5] https://avmajournals.avma.org/doi/pdfplus/10.2460/javma.248.7.777
