
Sunday, June 24, 2018

Statistics Sunday: Converting Between Effect Sizes for Meta-Analysis

I'm currently working on my promised video on mixed effects meta-analysis and was planning to cover this particular topic - converting between effect sizes - in that video. But I decided to write it up as a separate post that I can reference in the video, which I hope to post next week.

As a brief refresher, meta-analysis aims to estimate the true effect (or effects) in an area of study by combining findings from multiple studies on that topic. Effect sizes - the most frequently used being Cohen's d, Pearson's r, and log odds ratio - are estimated from information in study reports and presentations. There's a lot of variation in how clearly reports describe their findings and in what information they provide for estimating a study's overall effect. But when you conduct the meta-analysis, whether using fixed, random, or mixed effects analysis, all studies need to be expressed in a single effect size metric. That means that, sometimes, studies will give you a different type of effect size than the one you plan to use. Fortunately, there are ways to convert between effect sizes and to use different types of statistical information to generate your estimates.

First up, converting between those key effect sizes. In the meta-analysis I performed in grad school, I examined the effect of pretrial publicity on guilt. Guilt was frequently operationalized in two ways in those studies: as a guilty/not guilty verdict or as a continuous guilt rating. For those outcomes, we would likely use, respectively, log odds ratio and Cohen's d. The escalc function in the metafor package can compute the log odds ratio from guilty/not guilty counts, and Cohen's d from the means and standard deviations of the guilt ratings. But studies may present different kinds of information in their results, so you won't always be able to compute those effect sizes directly.
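If a study does report those raw numbers, escalc handles the computation directly. Here's a minimal sketch with made-up counts, means, and standard deviations, just to show the two calls (the data frame and column names are mine):

library(metafor)

#Hypothetical guilty/not guilty counts for publicity and control groups
verdict_counts <- data.frame(guilty_t = 67, notguilty_t = 71,
                             guilty_c = 22, notguilty_c = 116)
escalc(measure="OR", ai=guilty_t, bi=notguilty_t, ci=guilty_c, di=notguilty_c,
       data=verdict_counts)

#Hypothetical means and standard deviations for a continuous guilt rating
guilt_ratings <- data.frame(m1 = 5.2, sd1 = 1.6, n1 = 50,
                            m2 = 4.1, sd2 = 1.7, n2 = 50)
escalc(measure="SMD", m1i=m1, sd1i=sd1, n1i=n1, m2i=m2, sd2i=sd2, n2i=n2,
       data=guilt_ratings)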

For instance, a study using verdict may present a chi-square and one of its effect sizes, Cramer's V, which is very similar to a correlation coefficient. How can I convert that into log odds ratio?

To convert from one effect size to another, you need to follow a prescribed path, shown in the diagram below. The diagram tells you which effect sizes you can convert between directly: you can convert directly between log odds ratio and Cohen's d, and between Cohen's d and Pearson's r. If you want to convert between Pearson's r and log odds ratio, you first need to convert to Cohen's d. The same goes for variance - compute it in the native effect size metric, then convert it to the new metric.


Let's start by setting up functions that will convert between our effect sizes for us, beginning with Cohen's d and log odds ratio. Then we'll demonstrate with some real data.

#Convert log odds ratio to d
ltod <- function(lor) {
  d = lor * (sqrt(3)/pi)
  return(d)
}
vltovd <- function(vl) {
  vd = vl * (3/pi^2)
  return(vd)
}

#Convert d to log odds ratio
dtol <- function(d) {
  lor = d*(pi/sqrt(3))
  return(lor)
}
vdtovl <- function(vd) {
  vl = vd*(pi^2/3)
  return(vl)
}

You'll notice a mathematical symmetry in these equations - the numerators and denominators switch between them. Now let's set up functions to convert between r and d. These are slightly more complex and require a few additional arguments. For instance, converting the variance of r to the variance of d requires both the variance of r and r itself, and converting from d to r requires the group sample sizes, referred to as n1 and n2.

#Convert r to d
rtod <- function(r) {
  d = (2*r)/(sqrt(1-r^2))
  return(d)
}
vrtovd <- function(vr,r) {
  vd = (4*vr)/(1-r^2)^3
  return(vd)
}

#Convert d to r
dtor <- function(n1,n2,d) {
  a = (n1+n2)^2/(n1*n2)
  r = d/(sqrt(d^2+a))
  return(r)
}
vdtovr <- function(n1,n2,vd,d) {
  a = (n1+n2)^2/(n1*n2)
  vr = a^2*vd/(d^2+a)^3
  return(vr)
}

Remember that the metafor package can compute effect sizes and variances for you, so you might want to run escalc on the native effect sizes so that you have the estimates and variances these functions need. But if you ever find yourself having to compute those variances by hand, here are the equations, which we'll use in the next step.

vard <- function(n1,n2,d) {
  vd = ((n1+n2)/(n1*n2)) + (d^2/(2*(n1+n2)))
  return(vd)
}

varr <- function(r,n) {
  vr = (1-r^2)^2/(n-1)
  return(vr)
}

varlor <- function(a,b,c,d) {
  vl = (1/a)+(1/b)+(1/c)+(1/d)
  return(vl)
}

One of the studies I included in my meta-analysis gave Cramer's V. It had a sample size of 42, with 21 people in each group. I'd like to convert that effect size to log odds ratio. Here's how I could do it.

cramerv <- 0.67
studyd <- rtod(cramerv)
studyvr <- varr(cramerv,42)
studyvd <- vrtovd(studyvr,cramerv)
dtol(studyd)
## [1] 3.274001
vdtovl(studyvd)
## [1] 0.5824038

I can now include this study in my meta-analysis of log odds ratios.
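If you're curious what that looks like in practice, here's a minimal sketch. It assumes you already have a data frame of log odds ratios produced by escalc (I'm calling it lor_meta, with columns author_year, yi, and vi; the study label below is made up):

new_study <- data.frame(author_year = "Example 2005",
                        yi = dtol(studyd),
                        vi = vdtovl(studyvd))
lor_meta <- rbind(lor_meta[, c("author_year", "yi", "vi")], new_study)
rma(yi, vi, method="FE", data=lor_meta)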

What if my study gives different information? For instance, it might have given me a chi-square or a t-value. This online effect size calculator, created by David Wilson, coauthor of Practical Meta-Analysis, can compute effect sizes for you from many different types of information. In fact, spoiler alert: I used an earlier version of this calculator extensively for my meta-analysis. Note that this calculator returns odds ratios, so you'll need to convert those values into a log odds ratio.
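That last conversion is just a natural log. For example, if the calculator reports an odds ratio of 2.5 (a made-up value), the corresponding log odds ratio is:

log(2.5)
## [1] 0.9162907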

Wednesday, May 2, 2018

Statistical Sins: Is Your Classification Model Any Good?

April A to Z is complete! We now return to your regularly scheduled statistics blog posts! Today, I want to talk about an issue I touched on during A to Z: using regression to predict values and checking how well your model is doing.

Specifically, I talked a couple of times about binomial regression (here and here), which is used to predict (read: recreate with a set of variables significantly related to it) a binary outcome. The data example I used involved my dissertation data, and the binary outcome was verdict: guilty or not guilty. A regression model returns coefficients describing the linear combination of predictor variables that best reproduces the outcome, and it flags whether each predictor was significantly related to the outcome. But a big question you may be asking of your binomial model is: how well does it predict the outcome? Specifically, how can you examine whether your regression model is correctly classifying cases?

We'll start by loading/setting up the data and rerunning the binomial regression with interactions.
dissertation<-read.delim("dissertation_data.txt",header=TRUE)
dissertation<-dissertation[,1:44]
predictors<-c("obguilt","reasdoubt","bettertolet","libertyvorder",
              "jurevidence","guilt")
dissertation<-subset(dissertation, !is.na(libertyvorder))

dissertation[45:50]<-lapply(dissertation[predictors], function(x) {
  y<-scale(x, center=TRUE, scale=TRUE)
  }
)

pred_int<-'verdict ~ obguilt.1 + reasdoubt.1 + bettertolet.1 + libertyvorder.1 + 
                  jurevidence.1 + guilt.1 + obguilt.1*guilt.1 + reasdoubt.1*guilt.1 +
                  bettertolet.1*guilt.1 + libertyvorder.1*guilt.1 + jurevidence.1*guilt.1'
model<-glm(pred_int, family="binomial", data=dissertation)
summary(model)
## 
## Call:
## glm(formula = pred_int, family = "binomial", data = dissertation)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.6101  -0.5432  -0.1289   0.6422   2.2805  
## 
## Coefficients:
##                         Estimate Std. Error z value Pr(>|z|)    
## (Intercept)             -0.47994    0.16264  -2.951  0.00317 ** 
## obguilt.1                0.25161    0.16158   1.557  0.11942    
## reasdoubt.1             -0.09230    0.20037  -0.461  0.64507    
## bettertolet.1           -0.22484    0.20340  -1.105  0.26899    
## libertyvorder.1          0.05825    0.21517   0.271  0.78660    
## jurevidence.1            0.07252    0.19376   0.374  0.70819    
## guilt.1                  2.31003    0.26867   8.598  < 2e-16 ***
## obguilt.1:guilt.1        0.14058    0.23411   0.600  0.54818    
## reasdoubt.1:guilt.1     -0.61724    0.29693  -2.079  0.03764 *  
## bettertolet.1:guilt.1    0.02579    0.30123   0.086  0.93178    
## libertyvorder.1:guilt.1 -0.27492    0.29355  -0.937  0.34899    
## jurevidence.1:guilt.1    0.27601    0.36181   0.763  0.44555    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 490.08  on 354  degrees of freedom
## Residual deviance: 300.66  on 343  degrees of freedom
## AIC: 324.66
## 
## Number of Fisher Scoring iterations: 6

The predict function, which I introduced here, can also be used for the binomial model. Let's have R generate predicted scores for everyone in the dissertation sample:

dissertation$predver<-predict(model)
dissertation$predver
##   [1]  0.3907097456 -4.1351129605  2.1820478279 -2.8768390246  2.5804618523
##   [6]  0.4244692909  2.3065468369 -2.7853434926  0.3504760502 -0.2747339639
##  [11] -1.8506160725 -0.6956240161 -4.7860574839 -0.3875950731 -2.4955679446
##  [16] -0.3941516951 -4.5831011509  1.6185480937  0.4971923298  4.1581842900
##  [21] -0.6320531052 -4.8447046319 -2.3974890696  1.8566258698  0.0360685822
##  [26]  2.2151040131  2.3477149003 -2.4493726369 -0.2253481404 -4.8899805287
##  [31]  1.7789459288 -0.0978703861 -3.5541042186 -3.6009218603  0.1568318789
##  [36]  3.7866003489 -0.6371816898 -0.7047761441 -0.7529742376 -0.0302759317
##  [41] -0.1108055330  1.9751810033  0.2373614802  0.0424471071 -0.4018757856
##  [46]  0.0530272726 -1.0763759980  0.0099577637  0.3128581222  1.4806679691
##  [51] -1.7468626219  0.2998282372 -3.6359162016 -2.2200774510  0.3192366472
##  [56]  3.0103216033 -2.0625775984 -6.0179845235  2.0300503627  2.3676828409
##  [61] -2.8971753746 -3.2131490026  2.1349358889  3.0215336139  1.2436192890
##  [66]  0.2885535375  0.2141821004  1.9480686936  0.0438751446 -1.9368013875
##  [71]  0.2931258287  0.5319938265  0.0177643261  3.3724920900  0.0332949791
##  [76]  2.5935500970  0.7571810150  0.7131757400  2.5411073339  2.8499853550
##  [81]  2.8063291084 -0.4500738791  1.4700679077 -0.8659309719  0.0870492258
##  [86]  0.5728074322  0.1476797509  2.4697257261  2.5935500970 -2.2200774510
##  [91] -0.0941827753  1.3708676633  1.4345235392 -0.2407209578  2.4662700339
##  [96] -1.9687731888 -6.7412580522 -0.0006224018 -4.4132951092 -2.8543032695
## [101]  1.2295635352  2.8194173530  0.1215689324 -3.8258079371  1.8959803882
## [106] -4.5578801595  2.3754402614  0.0826808026  1.5112359711 -3.5402060466
## [111]  0.2556657363  0.7054183194  1.4675797244 -2.3974890696  2.6955929822
## [116] -0.3123518919 -4.8431862346 -2.0132721372  0.4673405434 -2.3053405270
## [121]  1.9498822386 -0.5164183930 -1.8277820872 -0.0134750769 -2.3013547136
## [126] -0.2498730859 -4.4281010683 -0.0134750769 -0.2604532514  0.1476797509
## [131] -2.3392939519 -2.0625775984 -3.5541042186  1.5087477879 -4.6453051124
## [136]  2.0616474606 -3.2691362859 -7.3752231145 -1.6666447439  1.0532964013
## [141] -2.0625775984 -0.3355312717  2.2481601983 -2.2200774510 -4.3276959075
## [146]  0.8685972087 -0.7727065311  1.7511589809 -0.4774548995  0.0008056357
## [151]  1.7022334970 -0.4202625135 -0.2902646169  2.4409712692  0.0008056357
## [156]  0.0008056357 -3.6009218603 -0.8567788439 -0.4528474822  0.3517462520
## [161]  0.1307210605 -3.7843118182 -2.8419024763 -3.5191098774 -0.1460684795
## [166]  1.8809888141  2.8194173530 -2.4656469123  1.0589888029  0.1659840070
## [171]  1.4345235392  2.3676828409  1.5749534339 -0.1681557545  2.6406620359
## [176]  0.1476797509 -2.2135177411  1.9168260534 -3.4993205379  0.4557086940
## [181] -3.8136089417 -0.1121510987 -3.9772095600  1.3849234171  0.3504760502
## [186]  2.3807710856 -3.0667307601  2.3040586537  1.7599138086 -0.2083894500
## [191]  0.6844579761 -0.3552635652 -1.9459392035 -0.6075281598 -2.1663310490
## [196]  2.3676828409 -1.9205271122 -2.2334295071 -4.4265826710 -1.0117771483
## [201] -0.0161530548 -0.3072233074 -0.0161530548 -0.7451676752 -7.0351269313
## [206]  2.6406620359 -3.7523234832 -0.2498730859  2.0222929422  3.2886316225
## [211] -1.6221457956  2.4749949634  1.7570711677  0.0904873650 -4.7332807307
## [216]  0.1568318789 -0.0302759317  0.5127229828  1.3097316594 -6.9309218514
## [221]  0.0515992352 -0.4514194447 -0.2253481404 -4.7652690656 -0.4279866041
## [226] -4.4136563866 -3.7618312672  0.0156676181 -0.2590252139  2.6076058507
## [231]  1.6420333133 -3.9985172969 -6.2076483227  0.1632104039  0.1829426974
## [236] -4.7652690656 -4.4212844958  1.6001906117  0.8579971472 -3.8699110198
## [241]  0.3022779567 -0.1679979189  1.9421248181  0.6592738895  1.6132788564
## [246] -0.0366544567 -3.4818233673 -3.9422152187 -0.3473613776  0.4321933815
## [251]  0.7480288869 -0.2498730859 -1.9861068488 -2.2297920164 -0.7621263656
## [256]  1.2966434147  0.1632104039  0.2048721368  1.7789459288  0.4926393080
## [261]  0.4096285430 -1.7794744955 -2.5822853071  2.0413250624 -6.6574350219
## [266] -0.1277642235 -2.1972434657 -2.5075677545 -0.4482774141 -0.6943740757
## [271] -0.7821891015  6.3289445390  0.1568318789  0.1165981835  1.4781797859
## [276] -4.2287015488 -3.6157278195 -0.1511970641 -0.7047761441  2.0935344484
## [281] -3.8258079371 -4.4231102471  1.3097316594  3.4081542651 -0.4996175382
## [286] -2.0534397824  0.9783975145 -2.2562634924  3.7196170683  1.1110084017
## [291]  2.1661785291 -4.2138955896  1.9421248181  2.3065468369 -0.7139282722
## [296] -4.1431023472 -2.0854115837  2.9389399956  1.7711269214 -0.0302759317
## [301] -2.6458711124  0.5856241187 -0.1199576611  1.8566258698 -2.2383553905
## [306]  2.3807710856 -0.2838860920  3.1176953128  2.8499853550  2.8063291084
## [311]  0.0034011417 -0.4683781352 -3.0377484314 -1.3833686805  1.7764577456
## [316]  1.7842151661  3.4081542651  0.1165981835 -4.6988069009 -2.6013721641
## [321]  2.0616474606 -0.2498730859 -4.2207121622  4.1705330009  5.2103776377
## [326] -4.5406977837 -1.5080855068 -2.5232652805 -5.7259789038  2.5211393933
## [331] -0.3487069432 -2.5035573312 -2.2764097339 -5.8364854607 -1.8694684539
## [336]  1.3402996614  0.5728074322  0.3663267540 -0.1603491921 -2.1690805453
## [341] -1.4105339689  3.0768201201 -5.1065624241 -4.5966850670 -4.5498907729
## [346] -1.3078399029 -1.0882592824  0.3128581222 -0.3644156933  0.3100845191
## [351]  2.4774831467 -1.0763759980  2.2151040131 -0.0952748801 -4.6864864366

Now, remember that the outcome variable is not guilty (0) and guilty (1), so you might be wondering - what's with these predicted values? Why aren't they 0 or 1?

Binomial regression is used for nonlinear outcomes - and since the outcome here is 0/1, it's nonlinear. But binomial regression is part of the generalized linear model family, which extends the familiar linear model. So how do we apply a linear model to a nonlinear outcome? Answer: by transforming scores. Specifically, the model works with the log odds (the logit) of the outcome; that transformation makes the outcome behave roughly linearly and symmetrically. The predicted values, then, are also log odds. We can see their shape by sorting them and plotting them:

ordvalues<-dissertation[order(dissertation$predver),]
ordvalues<-ordvalues[,51]
ordvalues<-data.frame(1:355,ordvalues)
colnames(ordvalues)<-c("number","predver")
library(ggplot2)
ggplot(data=ordvalues, aes(number,predver))+geom_smooth()
## `geom_smooth()` using method = 'loess'

Log odds are fine for estimation, but when trying to understand how well your model is predicting, we want to convert them into a metric that's easier to interpret on its own and to compare with the observed values. We can convert them into probabilities with the following equation:

dissertation$verdict_predicted<-exp(predict(model))/(1+exp(predict(model)))
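As a side note, predict for a glm object can also return probabilities directly if you set its type argument, so the line above should be equivalent to:

dissertation$verdict_predicted<-predict(model, type="response")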

This gives us a value ranging from 0 to 1, which is the probability that a particular person will select guilty. We can use this value in different ways to see how well our model is doing. Typically, we'll divide at the 50% mark, so anyone with a probability of 0.5 or greater is predicted to select guilty, and anyone with a probability less than 0.5 would be predicted to select not guilty. We then compare this new variable with the observed results to see how well the model did.

dissertation$vpred_rounded<-round(dissertation$verdict_predicted,digits=0)
library(expss)
## Warning: package 'expss' was built under R version 3.4.4
dissertation<- apply_labels(dissertation,
                      verdict = "Actual Verdict",
                      verdict = c("Not Guilty" = 0,
                                        "Guilty" = 1),
                      vpred_rounded = "Predicted Verdict",
                      vpred_rounded = c("Not Guilty" = 0,
                                        "Guilty" = 1)
)
cro(dissertation$verdict,list(dissertation$vpred_rounded, total()))
                  Predicted Verdict          #Total
                  Not Guilty    Guilty
 Actual Verdict
   Not Guilty         152          39           191
   Guilty              35         129           164
   #Total cases       187         168           355
A table of actual versus predicted categories like this is known as a confusion matrix. One thing we can compute from it is accuracy - how well the model did at correctly categorizing cases - which we get by adding together the number of people with both observed and predicted not guilty and the number with both observed and predicted guilty, then dividing that sum by the total.

accuracy<-(152+129)/355
accuracy
## [1] 0.7915493

Our model correctly classified 79% of the cases. However, this is not the only way we can determine how well our model did. There are a variety of derivations you can make from the confusion matrix. But two you should definitely include when doing this kind of analysis are sensitivity and specificity. Sensitivity refers to the true positive rate, and specificity refers to the true negative rate.

When you're working with confusion matrices, you're often trying to diagnose or identify some condition, one that may be deemed positive or present, and the other that may be deemed negative or absent. These derivations are important because they look at how well your model identifies these different states. For instance, if most of my cases selected not guilty, I could get a high accuracy rate by simply predicting that everyone will select not guilty. But then my model lacks sensitivity - it only identifies negative cases (not guilty) and fails to identify any positive cases (guilty). If I were dealing with something even higher stakes, like whether a test result indicates the presence of a condition, I want to make certain my classification is sensitive to those positive cases. And vice versa, I could keep from missing any positive cases by just classifying everyone as positive, but then my model lacks specificity and I may subject people to treatment they don't need (and that could be harmful).

Just like accuracy, sensitivity and specificity are easy to calculate. As I said above, I'll consider not guilty to be negative and guilty to be positive. Sensitivity is simply the number of true positives (observed and predicted guilty) divided by the sum of true positives and false negatives (people who selected guilty but were classified as not guilty).

sensitivity<-129/164
sensitivity
## [1] 0.7865854

And specificity is the number of true negatives (observed and predicted not guilty) divided by the sum of true negatives and false positives (people who selected not guilty but were classified as guilty).

specificity<-152/191
specificity
## [1] 0.7958115

So the model correctly classifies 79% of the positive cases and 80% of the negative cases. The model could be improved, but it's functioning equally well across positive and negative cases, which is good.

It should be pointed out that you can select any cutpoint you want for your probability variable. That is, if I want to be very conservative in identifying positive cases, I might want there to be a higher probability that it is a positive case before I classify it as such - perhaps I want to use a cutpoint like 75%. I can easily do that.

dissertation$vpred2[dissertation$verdict_predicted < 0.75]<-0
dissertation$vpred2[dissertation$verdict_predicted >= 0.75]<-1
dissertation<- apply_labels(dissertation,
                      vpred2 = "Predicted Verdict (0.75 cut)",
                      vpred2 = c("Not Guilty" = 0,
                                        "Guilty" = 1)
)
cro(dissertation$verdict,list(dissertation$vpred2, total()))
                  Predicted Verdict (0.75 cut)    #Total
                  Not Guilty    Guilty
 Actual Verdict
   Not Guilty         177          14               191
   Guilty              80          84               164
   #Total cases       257          98               355
accuracy2<-(177+84)/355
sensitivity2<-84/164
specificity2<-177/191
accuracy2
## [1] 0.7352113
sensitivity2
## [1] 0.5121951
specificity2
## [1] 0.9267016

Changing the cut score improves specificity but at the cost of sensitivity, which makes sense, because our model was predicting equally well (or poorly, depending on how you look at it) across positives and negatives. In this case, a different cut score won't improve our model. We would need to go back and see if there are better variables to use for prediction. And to keep us from fishing around in our data, we'd probably want to use a training and testing set for such exploratory analysis.
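If you plan to try several cut scores, it can help to wrap the three calculations into a small helper function instead of recoding by hand each time. Here's a minimal sketch using the variables created above (the function name and structure are mine):

class_metrics<-function(probs, observed, cut=0.5) {
  predicted<-ifelse(probs>=cut, 1, 0)
  tp<-sum(predicted==1 & observed==1)
  tn<-sum(predicted==0 & observed==0)
  fp<-sum(predicted==1 & observed==0)
  fn<-sum(predicted==0 & observed==1)
  c(accuracy = (tp+tn)/length(observed),
    sensitivity = tp/(tp+fn),
    specificity = tn/(tn+fp))
}

class_metrics(dissertation$verdict_predicted, dissertation$verdict, cut=0.75)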

Monday, April 30, 2018

Z is for Z-Scores and Standardizing

Last April, I wrapped up the A to Z of Statistics with a post about Z-scores. It seems only fitting that I'm wrapping up this April A to Z with the same topic. Z-scores are frequently used, sometimes without you even realizing it. When you take your child to the doctor and they say he's at the xth percentile on height, or when you take a standardized test and are told you scored in the yth percentile, those values are derived from Z-scores. You create a Z-score by subtracting the population mean from a value and dividing that result by the population standard deviation.

Of course, we often standardize variables in statistics, and the results are similar to Z-scores (though technically not the same if the mean and standard deviation aren't population values). In fact, when I demonstrated the GLM function earlier this month, I skipped a very important step when conducting an analysis with interactions. I should have standardized my continuous predictors first, which means subtracting the variable's mean and dividing by its standard deviation, creating a new variable with a mean of 0 and a standard deviation of 1 (just like the standard normal distribution).

Let's revisit that GLM analysis. I was predicting verdict (guilty, not guilty) with binomial regression. I did one analysis where I used a handful of attitude items and the participant's guilt rating, and a second analysis where I created interactions between each attitude item and the guilt rating. The purpose was to see if an attitude impacts the threshold - how high the guilt rating needed to be before a participant selected "guilty". When working with interactions, the individual variables are highly correlated with the interaction variables based on them, so we solve that problem, and make our analysis and output a bit cleaner, by centering our variables and using those centered values to create interactions.

I'll go ahead and load my data. Since I know I have some missing values - which caused an error when I tried to work with predicted values and residuals on Saturday - I'll also identify and handle that case (or cases).

dissertation<-read.delim("dissertation_data.txt",header=TRUE)
predictors<-c("obguilt","reasdoubt","bettertolet","libertyvorder",
              "jurevidence","guilt")
library(psych)
## Warning: package 'psych' was built under R version 3.4.4
describe(dissertation[predictors])
##               vars   n mean   sd median trimmed  mad min max range  skew
## obguilt          1 356 3.50 0.89      4    3.52 0.00   1   5     4 -0.50
## reasdoubt        2 356 2.59 1.51      2    2.68 1.48  -9   5    14 -3.63
## bettertolet      3 356 2.36 1.50      2    2.38 1.48  -9   5    14 -3.41
## libertyvorder    4 355 2.74 1.31      3    2.77 1.48  -9   5    14 -3.89
## jurevidence      5 356 2.54 1.63      2    2.66 1.48  -9   5    14 -3.76
## guilt            6 356 4.80 1.21      5    4.90 1.48   2   7     5 -0.59
##               kurtosis   se
## obguilt          -0.55 0.05
## reasdoubt        26.92 0.08
## bettertolet      25.47 0.08
## libertyvorder    34.58 0.07
## jurevidence      25.39 0.09
## guilt            -0.54 0.06
dissertation<-subset(dissertation, !is.na(libertyvorder))

R has a built-in function that will do this for you: scale. The scale function has three main arguments - the variable or variables to be scaled, and whether you want those variables centered (subtracting the mean) and/or scaled (dividing by the standard deviation). For regression with interactions, we want to both center and scale. For instance, to center and scale the guilt rating:

dissertation$guilt_c<-scale(dissertation$guilt, center=TRUE, scale=TRUE)
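Under the hood, that's the same thing as subtracting the variable's mean and dividing by its standard deviation (scale uses the sample mean and standard deviation by default), which you can verify by computing it by hand:

guilt_manual<-(dissertation$guilt-mean(dissertation$guilt))/sd(dissertation$guilt)
head(cbind(dissertation$guilt_c, guilt_manual))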

I have a set of predictors I want to do this to, so I want to apply a function across those specific columns:

dissertation[46:51]<-lapply(dissertation[predictors], function(x) {
  y<-scale(x, center=TRUE, scale=TRUE)
  }
)

Now, let's rerun that binomial regression, this time using the centered variables in the model.

pred_int<-'verdict ~ obguilt.1 + reasdoubt.1 + bettertolet.1 + libertyvorder.1 + 
                  jurevidence.1 + guilt.1 + obguilt.1*guilt.1 + reasdoubt.1*guilt.1 +
                  bettertolet.1*guilt.1 + libertyvorder.1*guilt.1 + jurevidence.1*guilt.1'
model2<-glm(pred_int, family="binomial", data=dissertation)
summary(model2)
## 
## Call:
## glm(formula = pred_int, family = "binomial", data = dissertation)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.6101  -0.5432  -0.1289   0.6422   2.2805  
## 
## Coefficients:
##                         Estimate Std. Error z value Pr(>|z|)    
## (Intercept)             -0.47994    0.16264  -2.951  0.00317 ** 
## obguilt.1                0.25161    0.16158   1.557  0.11942    
## reasdoubt.1             -0.09230    0.20037  -0.461  0.64507    
## bettertolet.1           -0.22484    0.20340  -1.105  0.26899    
## libertyvorder.1          0.05825    0.21517   0.271  0.78660    
## jurevidence.1            0.07252    0.19376   0.374  0.70819    
## guilt.1                  2.31003    0.26867   8.598  < 2e-16 ***
## obguilt.1:guilt.1        0.14058    0.23411   0.600  0.54818    
## reasdoubt.1:guilt.1     -0.61724    0.29693  -2.079  0.03764 *  
## bettertolet.1:guilt.1    0.02579    0.30123   0.086  0.93178    
## libertyvorder.1:guilt.1 -0.27492    0.29355  -0.937  0.34899    
## jurevidence.1:guilt.1    0.27601    0.36181   0.763  0.44555    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 490.08  on 354  degrees of freedom
## Residual deviance: 300.66  on 343  degrees of freedom
## AIC: 324.66
## 
## Number of Fisher Scoring iterations: 6

The results are essentially the same; the constant and slopes of the predictor variables are different, but the variables that were significant before still are. So standardizing doesn't change the substance of the results, but it is generally recommended because it makes them easier to interpret. The variables are centered around the mean, so negative numbers are below the mean and positive numbers are above the mean.

Hard to believe A to Z is over! Don't worry, I'm going to keep blogging about statistics, R, and whatever strikes my fancy. I almost kept this post going by applying the prediction work from yesterday to the binomial model, but decided that would make for a fun future post. And I'll probably sprinkle in posts in the near future on topics I didn't have room for this month or that I promised to write a future post on. Thanks for reading and hope you keep stopping by, even though April A to Z is officially over!

Sunday, April 29, 2018

Statistics Sunday: Conducting Meta-Analysis in R

Here it is, everyone! The promised 4th post on meta-analysis, and my second video for Deeply Trivial! In this video, I walk through conducting a basic meta-analysis, both fixed and random effects, in the metafor package:



See my previous posts on meta-analysis for more information. You can access the code I used in the video here, as well as code to do a similar analysis with the or_meta dataset here.

Saturday, April 28, 2018

Y is for Ys, Y-hats, and Residuals

When working with a prediction model, like a linear regression, there are a few Ys you need to concern yourself with: the ys (the observed outcome variable), the y-hats (the predicted outcomes based on the equation), and the residuals (y minus y-hat). Today, I'll dig into the different flavors of y and how you might work with them when conducting linear regression.

As my data example, I'll use my dissertation dataset and the linear model I introduced in the GLM post.

dissertation<-read.delim("dissertation_data.txt",header=TRUE)
guilt_lm_full<-lm(guilt ~ illev + circumst + deftest + policepower +
                    suspicious + overzealous + upstanding,
                  data=dissertation)
options(scipen = 999)
summary(guilt_lm_full)
## 
## Call:
## lm(formula = guilt ~ illev + circumst + deftest + policepower + 
##     suspicious + overzealous + upstanding, data = dissertation)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.0357 -0.7452  0.1828  0.9706  2.5013 
## 
## Coefficients:
##             Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)  4.16081    0.38966  10.678 < 0.0000000000000002 ***
## illev        0.11111    0.05816   1.911              0.05689 .  
## circumst    -0.08779    0.06708  -1.309              0.19147    
## deftest     -0.02020    0.05834  -0.346              0.72942    
## policepower  0.02828    0.06058   0.467              0.64090    
## suspicious   0.17286    0.06072   2.847              0.00468 ** 
## overzealous -0.03298    0.04792  -0.688              0.49176    
## upstanding   0.08941    0.05374   1.664              0.09706 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.169 on 347 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.07647, Adjusted R-squared:  0.05784 
## F-statistic: 4.105 on 7 and 347 DF,  p-value: 0.0002387

In this model, the outcome variable is a guilt rating, ranging from 1 to 7. This is our y, which our regression model is trying to recreate through a linear relationship between our x's and our y. Using the coefficients in the output, we could compute the predicted y (y-hat) - the score the model expects a person to have, given their values on the predictors. Fortunately, R has a built-in function that will compute y-hat for a dataset: predict. This function requires two arguments: a regression model and the dataset to use to predict values. Let's have R predict values for the dissertation dataset and add them on as a new variable.

dissertation$predicted<-predict(guilt_lm_full, dissertation)
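If you want to convince yourself that these values really are just the intercept plus each coefficient times its predictor, you can rebuild them from the model's design matrix. A quick check (this covers only the cases actually used in the model, so the row with a missing value is dropped):

#Design matrix (intercept column plus predictors) times the coefficient vector
manual_yhat<-model.matrix(guilt_lm_full) %*% coef(guilt_lm_full)
head(manual_yhat)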

In this application, we don't care as much about the predicted values - we will later on in this post - but we probably do care about the residuals: the difference between the observed value and the predicted value. This gives us an idea of how our model is doing and whether it fits reasonably well. It can also tell us if the model falls apart at certain values or ranges of values.

In the residuals post, I showed that you can easily request residuals from the model. As we did with predicted, let's create a new variable in the dataset that contains our residuals.

dissertation$residual<-resid(guilt_lm_full)
## Error in `$<-.data.frame`(`*tmp*`, residual, value = structure(c(0.0326393185592984, : replacement has 355 rows, data has 356

Ruh-roh, we got an error. Our dataset contains 356 observations, but we only have 355 residuals. This is because someone has a missing value on one of the variables in the regression model and was dropped from the analysis. There are a variety of ways we could find out which case is missing a value, but since I'm only working with a handful of variables, I'll just run descriptives and look for the variable with only 355 values.

library(psych)
## Warning: package 'psych' was built under R version 3.4.4
describe(dissertation[c(13,15,18,21,27,29,31,44)])
##             vars   n mean   sd median trimmed  mad min max range  skew
## illev          1 356 2.98 1.13      3    3.02 1.48   1   5     4 -0.17
## circumst       2 356 2.99 0.95      3    2.97 1.48   1   5     4  0.13
## deftest        3 356 3.39 1.46      4    3.57 0.00  -9   5    14 -5.25
## policepower    4 355 3.86 1.41      4    4.02 0.00  -9   5    14 -6.40
## suspicious     5 356 2.09 1.14      2    2.01 1.48  -9   5    14 -1.97
## overzealous    6 356 3.34 1.34      4    3.41 1.48  -9   5    14 -4.49
## upstanding     7 356 3.09 1.29      3    3.11 1.48  -9   5    14 -2.31
## guilt          8 356 4.80 1.21      5    4.90 1.48   2   7     5 -0.59
##             kurtosis   se
## illev          -1.04 0.06
## circumst       -0.51 0.05
## deftest        40.74 0.08
## policepower    55.05 0.08
## suspicious     23.52 0.06
## overzealous    38.44 0.07
## upstanding     19.66 0.07
## guilt          -0.54 0.06
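Another quick way to find the offending variable, rather than scanning the describe output, is to count the missing values directly:

colSums(is.na(dissertation[c(13,15,18,21,27,29,31,44)]))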

The variable policepower is the culprit. I can drop that missing value then rerun the residual code.

dissertation<-subset(dissertation, !is.na(policepower))
dissertation$residual<-resid(guilt_lm_full)

Now I can plot my observed values and residuals.

library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
qplot(guilt,residual,data=dissertation)

We want our residuals to fall around 0, which is only happening for guilt ratings near the midpoint of the scale. This suggests that ordinary least squares regression may not be the best fit for the data, as the model shows a tendency to overpredict (negative residual) guilt rating for people with lower observed ratings and underpredict (positive residual) for people with higher observed ratings.

But, as I often do on this blog for the sake of demonstration, let's pretend the model is doing well. One way we could use a regression model is to predict scores in a new sample. For instance, there are multiple rumors that different graduate schools have prediction equations they use to predict a candidate's anticipated graduate school GPA, based on a combination of factors asked about in the application packet, to determine if a person is grad school-ready (and ultimately, to decide if they should be admitted). Schools generally won't confirm they do this, nor would they ever share the prediction equation, should such an equation exist. But this is one way regression results from a training sample could be used to make decisions on a testing sample. So let's do that.

Unfortunately, I don't have a second dissertation dataset lying around that I could apply this equation to, but I can take a note from the data science playbook and randomly divide my sample into training and testing datasets. I use the training dataset to generate my equation, and I use the testing dataset to apply my equation and predict values. Since I have outcome data in the testing dataset too, I can see how well my model did. Once I have a well-performing model, I could then apply it to new data - maybe to predict how highly you'll rate a book or movie, to generate recommendations, or even to decide whether I should let you into the super-elite Monstersori school I want to create.

First, I'll split my dataset in half.

smp_size <- floor(0.50 * nrow(dissertation))

set.seed(42)
train_ind <- sample(seq_len(nrow(dissertation)), size = smp_size)

train <- dissertation[train_ind, ]
test <- dissertation[-train_ind, ]

Now I have a train dataset, with 177 observations, and a test dataset with 178. I can rerun the same linear model as before, this time only using the training data.

guilt_lm_train<-lm(guilt ~ illev + circumst + deftest + policepower +
               suspicious + overzealous + upstanding,
             data=train)
summary(guilt_lm_train)
## 
## Call:
## lm(formula = guilt ~ illev + circumst + deftest + policepower + 
##     suspicious + overzealous + upstanding, data = train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.9420 -0.8359  0.1641  0.9371  2.3151 
## 
## Coefficients:
##             Estimate Std. Error t value       Pr(>|t|)    
## (Intercept)  5.28874    0.77150   6.855 0.000000000128 ***
## illev        0.08866    0.08485   1.045        0.29759    
## circumst    -0.13018    0.09917  -1.313        0.19109    
## deftest     -0.25726    0.10699  -2.405        0.01727 *  
## policepower  0.01758    0.12316   0.143        0.88665    
## suspicious   0.25716    0.08857   2.903        0.00419 ** 
## overzealous -0.11683    0.08240  -1.418        0.15807    
## upstanding   0.10371    0.07574   1.369        0.17273    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.194 on 169 degrees of freedom
## Multiple R-squared:  0.1265, Adjusted R-squared:  0.09027 
## F-statistic: 3.495 on 7 and 169 DF,  p-value: 0.001586

I can use the predict function to predict scores for the testing dataset. Remember, all this function needs is the linear model and a dataset to use for prediction - and it can be any dataset, as long as it contains the variables used in the model.

test$predicted2<-predict(guilt_lm_train, test)

The original predicted value (from when I was working with the full dataset) is still in this set. I could have replaced values by using the same variable name, but just for fun, decided to keep those values and create a second prediction variable.

Because we have observed (guilt) and predicted (predicted2) values for our testing dataset, let's see how well our model did by creating a new residual variable, residual2. We can't use the resid function, because we didn't have R perform a linear regression on the testing dataset, but we can easily generate this variable by subtracting the predicted score from the observed score. Then we can once again plot our observed and residual values.

test$residual2<-test$guilt-test$predicted2
qplot(guilt,residual2,data=test)
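If you'd like a single number summarizing how far off the test-set predictions are, the root mean squared error of these residuals (smaller is better) and the correlation between observed and predicted values are quick options:

sqrt(mean(test$residual2^2))
cor(test$guilt, test$predicted2)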

We're still seeing similar issues with the residuals as we did for the full dataset. If we wanted to actually apply our linear model, we'd want to do more research and pilot work to get the best equation we can. As with many things in statistics, the process is heavily determined by what you plan to do with the results. If you just want to report which variables have a strong linear relationship with an outcome, you might be fine with these regression results. If you want to build an equation that predicts an outcome with a minimum of error, you'd want to do more exploratory work, pulling in new variables and dropping low-performing ones. Many of the predictors in the model were not significant, so we could start model-building from those results. We may need to build multiple training datasets, to ensure we aren't just picking up chance relationships. And for much larger applications, such as the recommendation systems on services like Amazon and Netflix, machine learning may be a better, more powerful approach.

One more A to Z post left!

Thursday, April 26, 2018

W is for (Meta-Analysis) Weights

Yesterday, I talked about the variance calculation for meta-analysis, which is based on sample size - the exact calculation varies by the type of effect size you're using. It's a good idea to inspect those variance calculations, because there are many places where the numbers going into a meta-analysis can be incorrect.

One place is when extracting that information from the studies themselves, which is why having more than one person extract information and calculating some sort of interrater reliability is a good idea. Another place is at data entry. You should always double-check those numbers, either with a visual examination of the dataset or, my preference especially for large meta-analyses, through data visualization, as I did yesterday with ggplot. Values that don't show an inverse relationship between sample size and variance should be checked. This was the case with study 045 in the or_meta file. A visual check, and knowledge of the formula for variance, confirmed that this variance was correct; it was large because there were a lot more people in the control group than treatment group.

Once we're satisfied that our numbers are correct, we can move forward with the meta-analysis. Part of that calculation involves study weights. Weights and variance have an inverse relationship; in fact, one value can be used to compute the other:

wi = 1/vi
vi = 1/wi

When you take the inverse of a small number, such as a small variance, you get a large number - a large weight - and when you take the inverse of a large number, you get a small one. metafor calculates those weights for you once you conduct the actual meta-analysis with the rma function. So let's do that. First, I'll load my data frames and calculate effect sizes and variances.

smd_meta<-data.frame(
  id = c("005","005","029","031","038","041","041","058","058","067","067"),
  study = c(1,2,3,1,1,1,2,1,2,1,2),
  author_year = c("Ruva 2007","Ruva 2007","Chrzanowski 2006","Studebaker 2000",
                  "Ruva 2008","Bradshaw 2007","Bradshaw 2007","Wilson 1998",
                  "Wilson 1998","Locatelli 2011","Locatelli 2011"),
  n1 = c(138,140,144,21,54,78,92,31,29,90,181),
  n2 = c(138,142,234,21,52,20,18,15,13,29,53),
  m1 = c(5.29,5.05,1.97,5.95,5.07,6.22,5.47,6.13,5.69,4.81,4.83),
  m2 = c(4.08,3.89,2.45,3.67,3.96,5.75,4.89,3.80,3.61,4.61,4.51),
  sd1 = c(1.65,1.50,1.08,1.02,1.65,2.53,2.31,2.51,2.51,1.20,1.19),
  sd2 = c(1.67,1.61,1.22,1.20,1.76,2.17,2.59,2.68,2.78,1.39,1.34)
)

or_meta<-data.frame(
  id = c("001","003","005","005","011","016","025","025","035","039","045","064","064"),
  study = c(1,5,1,2,1,1,1,2,1,1,1,1,2),
  author_year = c("Bruschke 1999","Finkelstein 1995","Ruva 2007","Ruva 2007",
                  "Freedman 1996","Keelen 1979","Davis 1986","Davis 1986",
                  "Padawer-Singer 1974","Eimermann 1971","Jacquin 2001",
                  "Ruva 2006","Ruva 2006"),
  tg = c(58,26,67,90,36,37,17,17,47,15,133,68,53),
  cg = c(49,39,22,50,12,33,19,17,33,11,207,29,44),
  tn = c(72,60,138,140,99,120,60,55,60,40,136,87,74),
  cn = c(62,90,138,142,54,120,52,57,60,44,228,83,73)
)

library(metafor)
## Loading required package: Matrix
## Loading 'metafor' package (version 2.0-0). For an overview 
## and introduction to the package please type: help(metafor).
smd_meta <- escalc(measure="SMD", m1i=m1, m2i=m2, sd1i=sd1, sd2i=sd2, n1i=n1,
                   n2i=n2,data=smd_meta)
or_meta <- escalc(measure="OR", ai=tg, bi=(tn-tg), ci=cg, di=(cn-cg),
                  data=or_meta)
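As a quick version of the visual check described above, you can plot each study's total sample size against its computed variance and look for the expected inverse relationship; points that buck the trend (like study 045) are the ones worth double-checking:

library(ggplot2)
ggplot(or_meta, aes(x=tn+cn, y=vi))+geom_point()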

These two datasets use different effect size metrics, so I can't combine them unless I convert the effect sizes from one dataset. For now, we'll meta-analyze them separately. First up will be the dataset using standardized mean difference (Cohen's d).

There are two overall methods of meta-analysis - fixed effects and random effects. Fixed effects means you suspect there is only one underlying effect size that your studies are trying to estimate. You can request this by setting method="FE" (for fixed effects) in the rma function.

smd.rma<-rma(yi,vi,method="FE",data=smd_meta)
summary(smd.rma)
## 
## Fixed-Effects Model (k = 11)
## 
##   logLik  deviance       AIC       BIC      AICc  
## -41.4601   97.3274   84.9202   85.3181   85.3647  
## 
## Test for Heterogeneity: 
## Q(df = 10) = 97.3274, p-val < .0001
## 
## Model Results:
## 
## estimate      se    zval    pval   ci.lb   ci.ub     
##   0.3492  0.0526  6.6388  <.0001  0.2461  0.4523  ***
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
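This is also where the weights discussed above come into play. metafor stores the weight each study received, and for a fixed-effects model these are just the inverse-variance weights, rescaled (if I recall the default correctly, the weights function reports them as percentages that sum to 100). You can pull them out and compare them to 1/vi:

weights(smd.rma)
1/smd_meta$vi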

I'll go into the various parts of this output on Sunday. For now, we'll look at the Model Results, which includes the estimate (the overall effect size).

smd.rma$beta
##              [,1]
## intrcpt 0.3491923

The overall Cohen's d for this set of studies is 0.349, a moderate effect. In plain English, the average guilt rating among people who saw pretrial publicity is about 0.349 standard deviations higher than people in the control group. There's an associated Z-value and p-value, which tests whether that metric is significantly different from 0.

options(scipen=999)
smd.rma$zval
## [1] 6.638784
smd.rma$pval
## [1] 0.00000000003162821

The Z-test confirms that this value is significantly different from 0. I'll go into conducting a random effects model (and how to tell if you should) on Sunday. Now, let's run the meta-analysis for the log odds ratio dataset.

or.rma<-rma(yi,vi,method="FE",data=or_meta)
summary(or.rma)
## 
## Fixed-Effects Model (k = 13)
## 
##   logLik  deviance       AIC       BIC      AICc  
## -23.0842   47.8107   48.1684   48.7334   48.5321  
## 
## Test for Heterogeneity: 
## Q(df = 12) = 47.8107, p-val < .0001
## 
## Model Results:
## 
## estimate      se    zval    pval   ci.lb   ci.ub     
##   0.7456  0.0987  7.5566  <.0001  0.5522  0.9389  ***
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
or.rma$beta
##              [,1]
## intrcpt 0.7455659
or.rma$zval
## [1] 7.556647
or.rma$pval
## [1] 0.00000000000004135935

Our estimated log odds ratio for this dataset is 0.7456, which is significantly different from 0. This number is difficult to interpret in isolation, so we want to convert it back to an odds ratio, which is easier to interpret.

exp(or.rma$beta)
##             [,1]
## intrcpt 2.107634

Overall, the odds of a guilty verdict were about twice as high for people who saw pretrial publicity as for people in the control group.

metafor will also let you supply user-defined weights, using the weights argument in the rma function. If you're applying a correction to your effect sizes - such as converting your Pearson's r coefficients to Fisher's Z - you'll probably want to supply custom weights, because the formula is different. If you leave that argument out, as I did, rma will create weights as the inverse of variance by default.
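For example, to make the default behavior explicit - or to substitute your own weighting scheme - you could pass the weights yourself. A minimal sketch (assuming weights, like yi and vi, can be pulled from the data frame):

smd.custom<-rma(yi, vi, weights=1/vi, method="FE", data=smd_meta)
summary(smd.custom)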

I should note that I only picked a handful of studies to include for this example, so these don't match the results I found in my actual meta-analysis, which included many more studies. (If you're conducting research on pretrial publicity and would like the actual results of my meta-analysis, please contact me. You can find contact info on my about page. I'm happy to share my actual results. These numbers are for demonstration purposes only. I still hope to publish this work someday, which is why I'm not using the full dataset or actual results here, but at this point, I need to collect more data on the new pretrial publicity studies that have been conducted since 2011.)

More on meta-analysis Sunday! And tune in tomorrow for the letter X!