Wednesday, April 25, 2018

V is for (Meta-Analysis) Variance

Variance in Meta-Analysis For the letter E, I introduced the metafor package to compute effect sizes. That is, you provide a data frame with the study information and the data needed to compute the effect size(s), and metafor does that for you. But what I didn't mention is that metafor also gives you additional information that will become very important when conducting an actual meta-analysis - and that information will be the focus of today's and tomorrow's posts.

Today, I'll talk about the concept of variance in meta-analysis and discuss how metafor gives you that information. But first, we should set up our data frames with study information and data. I'll be reusing the two data frames I created for the E post - one to compute standardized mean difference (Cohen's d) and one to compute log odds ratio.

smd_meta<-data.frame(
  id = c("005","005","029","031","038","041","041","058","058","067","067"),
  study = c(1,2,3,1,1,1,2,1,2,1,2),
  author_year = c("Ruva 2007","Ruva 2007","Chrzanowski 2006","Studebaker 2000",
                  "Ruva 2008","Bradshaw 2007","Bradshaw 2007","Wilson 1998",
                  "Wilson 1998","Locatelli 2011","Locatelli 2011"),
  n1 = c(138,140,144,21,54,78,92,31,29,90,181),
  n2 = c(138,142,234,21,52,20,18,15,13,29,53),
  m1 = c(5.29,5.05,1.97,5.95,5.07,6.22,5.47,6.13,5.69,4.81,4.83),
  m2 = c(4.08,3.89,2.45,3.67,3.96,5.75,4.89,3.80,3.61,4.61,4.51),
  sd1 = c(1.65,1.50,1.08,1.02,1.65,2.53,2.31,2.51,2.51,1.20,1.19),
  sd2 = c(1.67,1.61,1.22,1.20,1.76,2.17,2.59,2.68,2.78,1.39,1.34)
)

or_meta<-data.frame(
  id = c("001","003","005","005","011","016","025","025","035","039","045","064","064"),
  study = c(1,5,1,2,1,1,1,2,1,1,1,1,2),
  author_year = c("Bruschke 1999","Finkelstein 1995","Ruva 2007","Ruva 2007",
                  "Freedman 1996","Keelen 1979","Davis 1986","Davis 1986",
                  "Padawer-Singer 1974","Eimermann 1971","Jacquin 2001",
                  "Ruva 2006","Ruva 2006"),
  tg = c(58,26,67,90,36,37,17,17,47,15,133,68,53),
  cg = c(49,39,22,50,12,33,19,17,33,11,207,29,44),
  tn = c(72,60,138,140,99,120,60,55,60,40,136,87,74),
  cn = c(62,90,138,142,54,120,52,57,60,44,228,83,73)
)

Now we'll rerun the code from that previous post that we used to generate effect sizes. We reference the original data frame when we run this code if we want the computed variables to be appended to the existing data frame.

library(metafor)
## Loading required package: Matrix
## Loading 'metafor' package (version 2.0-0). For an overview 
## and introduction to the package please type: help(metafor).
smd_meta <- escalc(measure="SMD", m1i=m1, m2i=m2, sd1i=sd1, sd2i=sd2, n1i=n1,
                   n2i=n2,data=smd_meta)
or_meta <- escalc(measure="OR", ai=tg, bi=(tn-tg), ci=cg, di=(cn-cg),
                  data=or_meta)

Let's take a quick look at the updated data frame, starting with smd_meta.

smd_meta
##     id study      author_year  n1  n2   m1   m2  sd1  sd2      yi     vi
## 1  005     1        Ruva 2007 138 138 5.29 4.08 1.65 1.67  0.7269 0.0154
## 2  005     2        Ruva 2007 140 142 5.05 3.89 1.50 1.61  0.7433 0.0152
## 3  029     3 Chrzanowski 2006 144 234 1.97 2.45 1.08 1.22 -0.4099 0.0114
## 4  031     1  Studebaker 2000  21  21 5.95 3.67 1.02 1.20  2.0087 0.1433
## 5  038     1        Ruva 2008  54  52 5.07 3.96 1.65 1.76  0.6464 0.0397
## 6  041     1    Bradshaw 2007  78  20 6.22 5.75 2.53 2.17  0.1893 0.0630
## 7  041     2    Bradshaw 2007  92  18 5.47 4.89 2.31 2.59  0.2444 0.0667
## 8  058     1      Wilson 1998  31  15 6.13 3.80 2.51 2.68  0.8927 0.1076
## 9  058     2      Wilson 1998  29  13 5.69 3.61 2.51 2.78  0.7867 0.1188
## 10 067     1   Locatelli 2011  90  29 4.81 4.61 1.20 1.39  0.1592 0.0457
## 11 067     2   Locatelli 2011 181  53 4.83 4.51 1.19 1.34  0.2603 0.0245

You'll notice that, in addition to the information from the data frame creation code above, I have two new variables: yi and vi. yi refers to the effect size for that study. vi is the variance associated with the effect size for that study. This information is important for computing the study weight, which I'll talk about tomorrow, as well as generating confidence intervals around the overall effect size, which I'll talk about Sunday. But you might be wondering where this variance comes from. For that, stand back - I'm about to use math.

Meta-analysis uses weights based on sample size. The reason for that is because meta-analysis is attempting to take information from multiple studies on a topic and estimate the underlying effect size that these studies are trying to estimate. We weight on sample size because studies done on more people are expected to be better able to estimate the true population value, so larger studies get more weight. On the flip side, larger studies are expected to have more precision in their estimate, so larger studies have smaller variance. In meta-analysis, the variance is estimated based on sample size. For standardized mean difference, the variance is computed as:

vi = ((n1+n2)/(n1*n2)) + (yi2/(2*(n1+n2)))

So for the first effect size, which is 0.7269, we can recreate the computed variance (within some rounding error):

((138+138)/(138*138)) + (0.7269/(2*(138+138)))
## [1] 0.0158096

That calculation is done for each effect size, and fortunately, metafor will do it for you. And since sample size is part of the calculation, you'll note that variances are larger for smaller studies and smaller for larger studies.

library(ggplot2)
ggplot(smd_meta, aes(x=vi, y=(n1+n2))) + geom_point() +
  geom_text(aes(label=id),hjust=0, vjust=0) +
  scale_x_continuous(breaks=seq(0,0.15,0.01)) +
  scale_y_continuous(breaks=seq(0,400,25)) +
  labs(x = "Variance", y = "Total Sample Size") + theme_bw()


I did the same calculations for the or_meta data frame. Variance for odds ratio is calculated based on the sample size in each of the four cells (treatment-outcome1 (a), treatment-outcome2 (b), control-outcome1 (c), and control-outcome2 (d)):

vi = (1/na) + (1/nb) + (1/nc) + (1/nd)

So again, larger studies will have smaller variances, though imbalance among those four cells (for instance, having a lot more people in one group than another) will inflate variance somewhat. We can see that for study 045, which has a lot more people in the control group than treatment group.

or_meta
##     id study         author_year  tg  cg  tn  cn      yi     vi
## 1  001     1       Bruschke 1999  58  49  72  62  0.0945 0.1860
## 2  003     5    Finkelstein 1995  26  39  60  90  0.0000 0.1131
## 3  005     1           Ruva 2007  67  22 138 138  1.6046 0.0831
## 4  005     2           Ruva 2007  90  50 140 142  1.1976 0.0620
## 5  011     1       Freedman 1996  36  12  99  54  0.6931 0.1508
## 6  016     1         Keelen 1979  37  33 120 120  0.1615 0.0809
## 7  025     1          Davis 1986  17  19  60  52 -0.3759 0.1650
## 8  025     2          Davis 1986  17  17  55  57  0.0513 0.1690
## 9  035     1 Padawer-Singer 1974  47  33  60  60  1.0845 0.1655
## 10 039     1      Eimermann 1971  15  11  40  44  0.5878 0.2279
## 11 045     1        Jacquin 2001 133 207 136 228  1.5035 0.3933
## 12 064     1           Ruva 2006  68  29  87  83  1.8968 0.1203
## 13 064     2           Ruva 2006  53  44  74  73  0.5089 0.1237
ggplot(or_meta, aes(x=vi, y=(tn+cn))) + geom_point() +
  geom_text(aes(label=id),hjust=0, vjust=0) +
  scale_x_continuous(breaks=seq(0,0.4,0.05)) +
  scale_y_continuous(breaks=seq(0,375,25)) +
  labs(x = "Variance", y = "Total Sample Size") + theme_bw()


Tune in tomorrow, when I'll talk about weights for meta-analysis, which is based on this variance calculation!

1 comment:

  1. Great post. Thank you. :)
    Do you happen to have any reference for the variance formula: vi = ((n1+n2)/(n1*n2)) + (yi2/(2*(n1+n2)))?

    ReplyDelete