Section 1: Chapter 7:
USMLE Biostatistics: Confounding Bias & Methods to Prevent It
The different kinds of research bias are a favorite of the USMLE. Learn everything you need to know about the confounding bias here.
Consider a study that aims to find out whether variable A is causally associated with variable D (in other words, whether the presence of variable A leads to variable D). In reality, variable A does not cause variable D (they are not causally associated), yet the study finds that a causal association exists. How could this have occurred? It occurred because there was a third variable, variable B, which was causally associated with variable D. For some reason, variables A and B happened to occur together, or coexist. Since the researchers were not aware of variable B, they incorrectly assumed that variable A was causing variable D when in reality it wasn’t. In this example, variable B was a confounder in the association between variables A and D. This is an example of confounding bias.
Confounding bias is a favorite of the USMLE biostatistics portion. You should be prepared to identify examples of confounding bias when given a question stem. As we said in the previous chapters, there are three main “families” of bias: selection bias, measurement/information bias, and confounding bias. Confounding bias is a systematic error, distinct from the other two families. In the classic scenario, there is no true association between the exposure and the outcome, but a variable other than the exposure is associated with the outcome and creates a spurious (or false) association. Confounding can be hard to conceptualize at first because it involves the interplay between multiple variables and, on top of that, it is easily confused with a similar-sounding but distinct topic called effect modification. Let’s break down confounding bias in order to make it easier to understand.
Let’s start with a phrase which you are likely to have heard before: “The results or findings were confounded by blank….” This phrase is typically used to say that the results or outcome of something were confused or made uncertain (confounded) by something else. For example, “the results of the fasting glucose test were confounded by the fact that the patient was taking high doses of glucocorticoids.” You probably already know that glucocorticoids significantly alter serum glucose levels; therefore, they likely influenced the test result. In other words, glucocorticoid use, by virtue of its influence on glucose levels, “confused” or made uncertain the results of the test. This fits the origin of the word “confounding,” which comes from the Latin word confundere, meaning “to confuse.” If you can understand this, then you can understand confounding bias and how to identify it.
Okay, now let’s take a deep dive into what confounding bias is. Confounding bias involves three variables and the relationship between these variables. These variables are:
- The independent variable or exposure variable
- The dependent variable or outcome variable
- The extraneous variable or confounding variable
We have talked about the first two variables in earlier chapters. What is an extraneous variable? An extraneous variable is a variable which is neither the exposure variable nor the outcome variable. It is an “extra” variable. This extra variable is often referred to as a third variable because it sits outside the first two types of variables (that is, the exposure and outcome variables). A confounding variable is one such third variable. So whenever we talk about confounding bias, you should know that we are referring to three variables (exposure variable, outcome variable, and confounding variable) and the relationship between these variables, period.
Okay, so now that we know that confounding bias involves three different kinds of variables, let’s talk about the relationship between them. There are three rules regarding the relationship between the variables which must be fulfilled in order for confounding bias to exist. Do not get overwhelmed; we will explain each rule as we go.
- The confounding variable must be causally associated with the outcome variable. In other words, the confounding variable must lead to the outcome variable. This is very important because if the confounding variable is not associated with the outcome variable in any way, then it will have zero influence over the results of the study. Confounding bias cannot occur under these circumstances.
- The confounding variable must be related to the exposure variable in some way other than being caused by it. In other words, the confounding variable and exposure variable must occur together, but the exposure variable must not cause the confounding variable. Let’s conceptualize this with an example. In the relationship between drinking coffee and the development of lung cancer, smoking is a confounder. If you were to conduct a study looking at the relationship between drinking coffee and the risk of lung cancer, you may find that drinking coffee is associated with an increased risk of lung cancer, provided that a large share of the coffee drinkers in your study population also smoke. In reality, coffee is not associated with an increased risk of lung cancer; smoking definitely is, and since smoking and drinking coffee “tend to occur together,” you may incorrectly conclude that it is the coffee causing the lung cancer. Going back to our second rule, why can’t the exposure variable cause the confounding variable? Let’s put it in terms of our example. If drinking coffee (the exposure variable) led to smoking (the confounding variable), then it could be argued that the exposure is associated with the outcome (lung cancer) via an intermediate variable (smoking). In other words, you could say that drinking coffee leads to smoking, which in turn leads to lung cancer. You certainly know that this is simply not the case. Drinking coffee does not lead to smoking. This is why the second rule is important. If the exposure variable leads to the confounding variable and the confounding variable leads to the outcome variable (rule #1), then the confounding variable is not a true confounder, but rather an intermediate variable between the exposure and outcome.
- The confounding variable must be unequally distributed between the exposure group and control group. In other words, the frequency of the confounding variable must differ between the control group and the experimental group. If the frequency of the confounding variable is equal between study groups, then its effects will cancel out and there will be no influence on the study results. Let’s use our previous example to conceptualize this. Imagine you design a study to analyze the relationship between drinking coffee and lung cancer risk. In order to do this, you recruit 100 people who drink coffee and 100 people who do not drink coffee and then follow them up to see which group develops more lung cancer. Suppose each group has 50 individuals who smoke; in other words, the frequency of the confounding variable is equal between the control and experimental groups. Assuming that 50% of the smokers develop cancer by the end of the follow-up period and, for simplicity, that none of the nonsmokers do, your results will be 25 cases of lung cancer out of 100 in both the experimental and control group, so relative risk = 0.25/0.25 = 1. A relative risk of 1 indicates no difference in risk between groups and therefore no association. So when the frequency of the confounding variable is equal between the experimental and control groups, its effects tend to cancel out and have little to no influence on the results. For confounding bias to occur, the frequency of the confounding variable must differ between the experimental and control groups. In our previous example, since smoking and drinking coffee tend to occur together, the coffee-drinking group will tend to have a higher frequency of the confounding variable (smoking). This is often the case in research.
The exposure variable and confounding variable tend to coexist, which leads to situations in which the experimental group has a higher frequency of the confounding variable, and this introduces confounding bias into the study.
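To make the third rule concrete, the coffee and lung cancer arithmetic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: smokers have a 50% risk of lung cancer (as in the example above), nonsmokers are assumed to have zero risk (which is what the 25-out-of-100 figure implies), and coffee itself has no effect. The 80/20 split of smokers in the second scenario is a hypothetical number chosen for illustration, and the function names are made up for this sketch.

```python
def expected_cases(n_smokers, n_nonsmokers, risk_smoker=0.5, risk_nonsmoker=0.0):
    """Expected lung cancer cases in a group.

    Assumption: risk depends only on smoking status, never on coffee.
    risk_nonsmoker=0.0 is a simplification implied by the worked example.
    """
    return n_smokers * risk_smoker + n_nonsmokers * risk_nonsmoker


def relative_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)


# Scenario 1: smoking is equally distributed (50 smokers in each group of 100).
balanced_rr = relative_risk(
    expected_cases(50, 50), 100,  # coffee drinkers
    expected_cases(50, 50), 100,  # non-coffee drinkers
)

# Scenario 2 (hypothetical numbers): smoking clusters with coffee drinking,
# 80 smokers among coffee drinkers versus 20 among non-coffee drinkers.
unbalanced_rr = relative_risk(
    expected_cases(80, 20), 100,  # coffee drinkers
    expected_cases(20, 80), 100,  # non-coffee drinkers
)

print(f"Equal distribution of smokers:   RR = {balanced_rr:.2f}")
print(f"Unequal distribution of smokers: RR = {unbalanced_rr:.2f}")
```

In the first scenario both groups have 25 expected cases, so the relative risk is 1, matching the calculation above. In the second scenario the coffee group has 40 expected cases against 10 in the comparison group, so coffee appears to quadruple the risk even though, by construction, it has no effect at all. The entire apparent association comes from the unequal distribution of smokers.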