
Wednesday, March 14, 2018

Statistical Sins: Not Creating a Codebook

I'm currently preparing for Blogging A-to-Z. It's almost a month away, but I've picked a topic that will be fun but challenging, and I want to get as many posts written early as I can. I also have a busy April lined up, so writing posts during that month would be a challenge even if I had picked an easier topic.

I decided to pull out some data I collected for my Facebook study to demonstrate an analysis technique. I knew right away where the full dataset was stored, since I keep a copy in my backup online drive. This study used a long online survey composed of several published measures. I was going through the dataset, identifying the variables associated with each measure and trying to take stock of which ones needed to be reverse-scored, as well as which ones belonged to subscales.

I couldn't find that information in my backup folder, but I knew exactly which measures I used, so I downloaded the articles from which those measures were drawn. As I was going through one of the measures, I realized that I couldn't match up my variables with the items as listed. The variable names didn't easily match up and it looked like I had presented the items within the measure in a different order than they were listed in the article.

Why? I have no idea. I thought for a minute that past Sara was trolling me.

I went through the measure, trying to match up the variables, which I had named as an abbreviated version of the scale name followed by a "keyword" from the item text. But the keywords didn't always match any item in the list. Did I use synonyms? A different (newer) version of the measure? Was I drunk when I analyzed these data?

I frantically began digging through all of my computer folders, online folders, and email messages, desperate to find something that could shed light on my variables. Thank the statistical gods, I found a codebook I had created shortly after completing the study, back when I was much more organized (i.e., had more spare time). It's a simple codebook, but man, did it solve all of my dataset problems. Here's a screenshot of one of the pages:


As you can see, it's just a simple Word document with a table that gives the variable name, the original text of the item, the rating scale used for that item, and finally what scale (and subscale) it belongs to and whether it should be reverse-scored (noted with an "R" under subscale). This page displays items from the Ten-Item Personality Measure.

Sadly, I'm not sure I'd take the time to do something like this now, which is a crime, because I could very easily run into this problem again - where I have no idea how/why I ordered my variables and no way to easily piece the original source material together. And as I've pointed out before, sometimes when I'm analyzing in a hurry, I don't keep well-labeled code showing how I computed different variables.

But all of this is very important to keep track of, and should go in a study codebook. At the very least, I would recommend keeping one annotated copy of your surveys (noting the source, scale/subscale, and whether each item is reverse-coded - information you wouldn't want on the copy your participants see) and the code/syntax for all analyses. Even if your annotations are a bunch of Word comment bubbles and your code/syntax is just a bunch of commands with no additional description, you'll be a lot better off than I was with only the raw data.
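
If you'd rather not build that table entirely by hand, here's a minimal sketch of the idea in Python with pandas - note that the file name and columns are placeholders I made up, and this is not the R package I mention below:

```python
import pandas as pd

# Hypothetical file name - substitute your own dataset
data = pd.read_csv("survey_data.csv")

# Build a codebook skeleton with one row per variable;
# the descriptive columns get filled in by hand afterward.
codebook = pd.DataFrame({
    "variable": data.columns,
    "item_text": "",         # paste the original item wording here
    "rating_scale": "",       # e.g., "1 = Strongly disagree ... 7 = Strongly agree"
    "scale_subscale": "",     # which measure/subscale the item belongs to
    "reverse_scored": False,  # flag items that need reverse-scoring
})

codebook.to_csv("codebook.csv", index=False)
```

You'd still fill in the item text and scale information yourself, but at least the variable names would be pulled straight from the data instead of retyped.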

I recently learned there's an R package that will create a formatted codebook from your dataset. I'll do some research into that package and have a post about it, hopefully soon.

And I sincerely apologize to past Sara for thinking she was trolling me. Lucky for me, she won't read this post. Unless, of course, O'Reilly Auto Parts really starts selling this product.

Monday, November 13, 2017

Statistics Sunday: What is Bootstrapping?

Last week, I posted about the concept of randomness. This is a key concept throughout statistics. For instance, as I may have mentioned before, many statistical tests assume that the cases used to generate the statistics were randomly sampled from the population of interest. That, of course, rarely happens in practice, but this is a key concept in what we call parametric tests - tests that compare to an assumed population distribution.

The reason for this focus on random sampling goes back to the nature of probability. Every case in the population of interest has a chance of being selected - an equal chance in simple random sampling, and unequal but still predictable chances when more complex sampling methods are used, like stratified random sampling. It's true that you could, by chance, draw a bunch of really extreme cases. But there are usually fewer cases in the extremes.


If you look at the normal distribution, for instance, there are so many more cases in the middle that you have a much higher chance of drawing cases that fall close to the middle. This means that, while your random sample may not have as much variance as the population of interest, your measures of central tendency should be pretty close to the underlying population values.

So we have a population, and we draw a random sample from it, hoping that probability will work in our favor and give us a sample data distribution that resembles that population distribution.

But what if we wanted to add one more step, to really give probability a chance (pun intended) to work for us? Just as cases that are typical of the population are more likely to end up in our sample, cases that are typical of our sample are more likely to end up in a sample of the sample (which we'll call a subsample for brevity's sake). And if we repeatedly drew subsamples and plotted the results, we could generate a distribution that gets a little closer to the underlying population distribution. Of course, we're limited by the size of our sample, in that our subsamples can't exceed that size, but we can get around that by random sampling with replacement. That means that after pulling out a case and making a note of it, we put it back into the mix. It could get drawn again. This gives us a theoretically limitless sample from which to draw.

That's how bootstrapping works. Bootstrapping is a method of generating unbiased (though it's more accurate to say less biased) estimates. Those estimates could be things like variance or other descriptive statistics, or bootstrapping could be used in inferential statistical analyses. Bootstrapping means that you use random sampling with replacement to estimate values. Frequently, it means using your observed data as a sort of population and repeatedly drawing large samples with replacement from that data. In our Facebook use study, we used bootstrapping to test our hypothesis.
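
To make the resampling idea concrete, here's a minimal sketch in Python; the simulated data and the choice of the mean as the statistic of interest are just stand-ins for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
sample = rng.normal(loc=50, scale=10, size=200)  # stand-in for observed data

# Draw 5,000 subsamples *with replacement*, each the same size as the
# original sample, and record the statistic of interest (here, the mean).
boot_means = [rng.choice(sample, size=sample.size, replace=True).mean()
              for _ in range(5000)]

# The spread of the bootstrapped means gives a standard error and a
# percentile-based confidence interval for the estimate.
print(np.std(boot_means))
print(np.percentile(boot_means, [2.5, 97.5]))
```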

To summarize, we measured Facebook use among college students, and also gave them measures of rumination (the tendency to fixate on negative feelings) and subjective well-being (life satisfaction, depression, and physical symptoms of ill health). We hypothesized that rumination mediated the effect of Facebook use on well-being. Put in plain language, we believed using Facebook made you more likely to ruminate, which in turn resulted in lower well-being.

The competing hypothesis is that people who already tend to ruminate use Facebook as an outlet for rumination, resulting in lower well-being. In this alternative hypothesis, Facebook is the mediator, not rumination.

Testing mediation means testing for an indirect effect. That is, the independent variable (Facebook use) affects the dependent variable (well-being) indirectly, through the mediator (rumination). We used bootstrapping to estimate these indirect effects, drawing 5,000 random samples (with replacement) from our data to generate our estimates. Just as we're more likely to draw cases typical of our sample (which are hopefully typical of our population), we're more likely to draw subsamples that (hopefully) reflect the typical effect in our population. The indirect effects we get from bootstrapping won't be exactly the same as those from a single analysis of our observed data; we're using probability to reduce the bias in our estimates.
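
For the curious, here's a rough sketch of the general approach to bootstrapping an indirect effect. This is a bare-bones illustration, not the exact model or software we used, and the function and variable names are made up for the example:

```python
import numpy as np

def bootstrap_indirect_effect(x, m, y, n_boot=5000, seed=1):
    """Bootstrap a 95% CI for the indirect effect a*b, where
    a = slope of the mediator (m) regressed on the predictor (x), and
    b = slope of the outcome (y) on the mediator, controlling for the predictor.
    x, m, and y are numpy arrays of equal length."""
    rng = np.random.default_rng(seed)
    n = len(x)
    effects = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)              # resample cases with replacement
        a = np.polyfit(x[idx], m[idx], 1)[0]          # predictor -> mediator
        design = np.column_stack([np.ones(n), m[idx], x[idx]])
        b = np.linalg.lstsq(design, y[idx], rcond=None)[0][1]  # mediator -> outcome
        effects[i] = a * b
    # If this interval excludes zero, that's evidence of mediation.
    return np.percentile(effects, [2.5, 97.5])
```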

And what did we find in our Facebook study? We found stronger support for our hypothesis than the alternative. That is, we had stronger evidence that Facebook use leads to rumination than the alternative that rumination leads to Facebook use. If you're interested in finding out more, you can read the article here.

Saturday, April 22, 2017

S is for Scatterplot

Visualizing your data is incredibly important. I talked previously about the importance of creating histograms of your interval/ratio variables to check the shape of your distribution. Today, I'm going to talk about another way to visualize data: the scatterplot.

Let's say you have two interval/ratio variables that you think are related to each other in some way. You might think they're simply correlated, or you might think that one causes the other one. You would first want to look at the relationship between the two variables. Why? Correlation assumes a linear relationship between variables, meaning a consistent positive (as one increases so does the other) or negative (as one increases the other decreases) relationship across all values. We wouldn't want it to be positive at first, and then flatten out before turning negative. (I mean, we might, if that's the kind of relationship we expect, but we would need to analyze our data with a different statistic - one that doesn't assume a linear relationship.)

So we create a scatterplot, which maps out each participant's pair of scores on the two variables we're interested in. In fact, you've probably done this before in math class, on a smaller scale.
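
If you want to make one yourself, here's a minimal sketch in Python with matplotlib; the file and column names are hypothetical stand-ins for whatever your own dataset uses:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical file and column names, just to show the mechanics
data = pd.read_csv("facebook_study.csv")

plt.scatter(data["rumination"], data["depression"], alpha=0.5)
plt.xlabel("Rumination")
plt.ylabel("Depression")
plt.show()
```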

As I discussed in yesterday's bonus post, I had 257 people respond to a rather long survey about how they use Facebook and how that use impacts health outcomes. My participants completed a variety of measures, including measures of rumination, savoring, life satisfaction, Big Five personality traits, physical health complaints, and depression. There are many potential relationships that could exist between and among these concepts. For instance, people who ruminate more (fixate on negative events and feelings) also tend to be more depressed. In fact, here's a scatterplot created with those two variables from my study data:


And sure enough, these two variables are positively correlated with each other: r = 0.568. (Remember that r ranges from -1 to +1, and that 1 would indicate a perfect relationship. So we have a strong relationship here, but there are still other variables that explain part of the variance in rumination and/or depression.)

Savoring, on the other hand, is in some ways the opposite of rumination; it involves fixating on positive events and feelings. So we would expect these two to be negatively correlated with each other. And they are:


The correlation between these two variables is -0.351, so not as strong as the relationship between rumination and depression, and in the opposite direction.

Unfortunately, I couldn't find any variables in my study that had a nonlinear relationship to show (i.e., one with curves). But I could find two variables that were not correlated with each other: the Extraversion scale from the Big Five and physical health complaints. Unsurprisingly, being an extravert (or introvert) has nothing to do with health problems (r = -0.087; pretty close to 0):


But if you really want to see what a nonlinear relationship might look like, check out this post on the Dunning-Kruger effect; look at the relationship between actual performance and perceived ability.

As I said yesterday, r also comes with a p-value to tell us whether the relationship is larger than we would expect by chance. We would usually report the exact p-value, but for some of these, the p-value is so small (a really small probability of occurring by chance) that the program doesn't display the whole thing. In those cases, we choose a really small value (the convention seems to be 0.001) and say that p was less than that. Here are the r's and p-values for the three scatterplots above:

  1. Rumination and Depression, r = 0.568, p < 0.001
  2. Rumination and Savoring, r = -0.351, p < 0.001
  3. Extraversion and Health Complaints, r = -0.087, p = 0.164
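
If you'd like to generate that kind of summary yourself, here's a small sketch using scipy's pearsonr; the file and column names are again hypothetical, and the thresholding simply mirrors the "p < 0.001" reporting convention described above:

```python
import pandas as pd
from scipy.stats import pearsonr

data = pd.read_csv("facebook_study.csv")  # hypothetical file name

for x, y in [("rumination", "depression"),
             ("rumination", "savoring"),
             ("extraversion", "health_complaints")]:
    r, p = pearsonr(data[x], data[y])
    # Report very small p-values as "p < 0.001" rather than the exact number
    p_text = "p < 0.001" if p < 0.001 else f"p = {p:.3f}"
    print(f"{x} vs. {y}: r = {r:.3f}, {p_text}")
```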

Monday, April 3, 2017

B is for Beta

So we're moving forward with blogging A to Z. Today's topic is really following up from Saturday, where we talked about alpha. Perhaps you were wondering, "If alpha is equal to Type I error, where you say there's something there when there really isn't, does that mean there's also a chance you'll say there's nothing there when there really is?" Yes, and that's what we're talking about today: Beta. Also known as your Type II error rate.


Continuing with the horror movie analogy, this is when you walk into the room and walk right by the killer hiding in the corner without even seeing him. He was right there and you missed him! Once again, you don't ever get any feedback that you missed him; in statistics, unlike in the movies, you won't find out by ending up dead later on. So committing a Type II error won't necessarily kill you, but at the very least, you're missing out on information that might be helpful.

Unlike alpha, which we set ahead of time, we can only approximate beta. We can also do things to minimize beta: use really strong interventions (dosage), minimize any diffusion of the treatment into our control group, and select good measures, to name a few. In fact, if you want to learn more, check out this article on internal validity, which identifies some of the various threats to internal validity (the ability of our study to show that our independent variable causes our dependent variable).

To use my caffeine study example from Saturday, I would want to use a strong dose of caffeine with my experimental group. I would also make sure they haven't had any caffeine before they came in to do the study, and if I can't control that, I would at least want to measure it. I would also probably keep my experimental and control groups separate from each other, to keep them from getting wise to the differences in the coffee. And I would want to use a test that my participants have not taken before.

There's also a way you can minimize beta directly, by maximizing its complement: 1 - β, or power. Power is the probability that you will find an effect if it is there. We usually want that value to be at least 0.8, meaning an 80% probability that you will find an effect if there's one to be found. If you know something about the thing you're studying - that is, other studies have already been performed - you can use the results of those studies to estimate the size of the effect you'll probably find in your study. In my caffeine study, the effect I'm looking for is the difference between the experimental and control groups, in this case a difference between two means (averages). There are different metrics (that I'll hopefully get to this month) that reflect the magnitude of the difference between two groups, metrics that take into account not only the difference between the two means but also how spread out the scores are in the groups.

Using that information from previous studies, I can then do what's called a power analysis. If you're doing a power analysis before you conduct a study (what we would call a priori), you'll probably use that power analysis to tell you how many people you should have in your study. Obviously, having more people in your study is better, because more people will get you closer to the population value you're trying to estimate (don't worry, I'll go into more detail about this aspect later). But you can't get everyone in your study, nor would you want to spend the time and money to keep collecting data - studies would never end! So an a priori power analysis helps you figure out what resources you need while also helping you feel confident that, if there's an effect to be found, you'll find it.
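
As one illustration of what an a priori power analysis looks like in practice, here's a sketch using the statsmodels package in Python; the assumed effect size (a medium difference, Cohen's d of about 0.5) is made up for the example, not taken from an actual caffeine study:

```python
from statsmodels.stats.power import TTestIndPower

# Suppose previous studies suggested a medium standardized difference
# (Cohen's d around 0.5) between the caffeine and decaf groups.
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                                   alternative="two-sided")
print(round(n_per_group))  # roughly 64 participants per group
```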

Of course, you might be studying something completely new. For instance, when I studied Facebook use, rumination, and health outcomes, there was very little research on these different relationships - there's more now, of course. What I did in those cases was to pick the smallest effect I was interested in seeing. Basically, what is the smallest effect that is just big enough to be meaningful or important? In this case, we're not only using statistical information to make a decision; we're also potentially using clinical judgment.

For instance, one of my health measures was depression: how big of a difference in depression is enough for us to say, "This is important"? If we see that using Facebook increases depression scores by that much, then we have a legitimate reason to be concerned. That's what I did for each of the outcomes, when I didn't have any information from previous studies to guide me. Then I used that information to help me figure out how many people I needed for my study.

Power analysis sounds intimidating, but it actually isn't. A really simple power analysis can be done using a few tables. (Yep, other people have done the math for you.) More complex power analyses (when you're using more complex statistical analyses) can be conducted with software. And there are lots of people out there willing to help you. Besides, isn't doing something slightly intimidating better than running a study without knowing whether you're likely to find anything?

Tuesday, May 10, 2016

Your Brain on Stress

Speaking of brain activity and responses, a friend shared this video with me - how stress affects your brain:



This video gives a nice overview of many important brain systems and what they do, while talking about the effect of stress. The video also talks briefly about epigenetics, the ways in which the environment can trigger certain genes to be expressed. This means that, even if you have a genetic predisposition to stress and anxiety, a nurturing environment can keep those genes from being expressed.

An important extension of this concept is the biopsychosocial model, which states that biology, psychology, and social environment combine to determine health across one's lifespan.


Experience can change the brain, though your brain becomes less plastic (changeable) as you age. This is why a small child may recover from a head injury that would be fatal to an adult. The brain is able to rewire itself, especially prior to the age of 6. And while the brain changes discussed in the video are real, they represent a worst-case scenario of stress response. If you experience normal amounts of stress or only occasional instances of high stress, you'll probably be fine. But if you experience high chronic stress, you'll want to do something to cope with that - whether it be talk therapy, lifestyle changes to minimize stress, and/or medications for anxiety.

Saturday, April 16, 2016

N is for Negativity Bias

You probably won't be surprised if I tell you that human beings have a tendency to focus on the negative. Though many people try to be positive and grateful, when something bad happens, we tend to fixate on that thing, complain about it, and in many cases, let it ruin our mood/day/week/month/year/etc. This is known as the negativity bias; unpleasant things have a greater impact than pleasant or neutral things.

You can see how this bias might be important for survival. If something can result in a negative outcome (which could range from something as minimal as discomfort to something as extreme as injury or death), it's going to get more of our attention and more strongly influence our behavior than something with a positive outcome. However, this bias influences a variety of decisions, including ones that would be better served by a more rational consideration of the facts.

During this election year, you've probably heard MANY ads about different candidates, and as with many election cycles, MANY of these ads are actually attacks on other candidates: highlighting negative traits and bad things that candidate has done in his/her past. These ads capitalize on the negativity bias.

I was going to post some examples of negative ads here, but I'm sure you've seen tons. So here's a puppy instead.


Obviously, if you're conscious of this bias, you can try to correct for it. One way is by making an effort to fixate on the positive, through a process called savoring; I've blogged about savoring before, and you can also read more about it here. Or just keep staring at that adorable puppy!

Sunday, September 11, 2011

Forgiveness and the 10-Year Anniversary of 9/11

Today, we remember the 10th anniversary of the attacks of September 11th. In church today, the message was one of forgiveness. There are many religious and spiritual arguments for the importance of forgiveness that I won't go into. Psychologists also have explored this concept, and have discovered how forgiveness (and its converse, unforgiveness) influences an individual's mental and physical health.

Forgiveness is defined in many ways, but all of these definitions add up to one thing: forgiveness is something a wronged person offers to the one (or ones) who perpetrated the wrongdoing. It is generally viewed as a process that the forgiver works toward through many emotions and behaviors. Forgiveness is also often viewed as a personality trait; some people are simply more forgiving than others.

A lot of evidence suggests that being in a state of unforgiveness is damaging to both your mental and physical health (read one of many reviews here). Conversely, forgiveness is associated with better mental and physical health. Forgiveness is something you do, in part for the other person, but also for yourself. Refusing to forgive and continuing to hold a grudge is, for lack of a better word, toxic to your well-being.

This is because, in refusing to forgive, we often dwell on the wrongdoing. Psychologists refer to this constant dwelling on the negative as "rumination", and refer to rumination about perceived wrongdoing as "vengeful rumination". Research on rumination in general finds negative effects. Rumination is negatively correlated with sleep quality (abstract), as well as alcohol misuse, disordered eating, and self-harm (full paper). It makes focusing attention and problem solving difficult, because ruminators tend to be less confident in their problem solving abilities (full paper), and also because rumination uses working memory that could be devoted to the problem (full paper). Ruminators generate more biased interpretations of negative events, are more pessimistic about the future, and are poorer at solving interpersonal problems, as well (full paper).

Rumination is also associated with poor physical health. High ruminators show physiological stress markers, such as increased salivary cortisol (full paper here and abstract here) and immune system activity (full paper). People who ruminate also take longer for their heart rate and blood pressure to return to normal after being made to feel angry, which can put them at risk for organ damage over time (full paper).

Forgiveness is not "letting someone off the hook". It is not the same as condoning or absolving someone of wrong-doing. The old adage of "forgive and forget" doesn't necessarily lead to better outcomes, mainly because of the forgetting part. It is good to forgive, but not necessarily good to forget. Forgetting means failing to learn a lesson - a lesson that may be very important for you later on. This leaves the "forgiver" in a rather difficult position; one in which he or she must remember the wrongdoing without holding a grudge.

How, then, do you forgive? And how do you think about the act in such a way that you can find forgiveness without simply ruminating on the event? Rumination has one key component - it is dwelling on the negative without trying to find a solution. You're stuck in the mud, simply spinning your wheels without really getting anywhere. Reflection, on the other hand, involves thinking through an event and trying to find closure. Reflective thought leads to a change in the thinker.

The review I linked to above (linked again here) discussed some of the reflection “forgivers” engage in. One cognitive process is empathy, in which the forgiver puts him- or herself in the other’s shoes, and attempts to experience the same emotional state. Not only do “trait forgivers” experience more empathy, but people who are randomly assigned to engage in empathy are also able to experience forgiveness. This provides some evidence that anyone, even people who are not naturally empathetic, can use this experience to forgive.

Forgivers are more generous in their appraisals of the one(s) to be forgiven, seeing them as more likable or having more likable traits. They are also better at understanding another person’s explanation for the behavior. In essence, they try to see the situation from the other person’s point-of-view. You don’t have to accept another person’s explanation, but rather, try to understand where they’re coming from. At the very least, this understanding can aid in finding a solution or determining a path to reconciliation. People often have a very self-centered view of the world in that they have difficulty recognizing that other people do not see things in the same way or have the same knowledge (a good blog post for another day).

Of course, one thing that may make forgiveness such a difficult process in the case of 9/11 is the severity of the wrong, as well as the fact that the group responsible holds such a different worldview. In a previous blog entry, I talked about stereotypes and ingroup/outgroup dynamics, all of which are definitely relevant here. Our tendency to dehumanize the outgroup makes forgiveness complicated, because forgiveness is a between-human experience. Forgiveness in this case is not impossible, but it would have to involve an even greater degree of understanding and a greater attempt to characterize the other group's point of view.

Forgiveness is a process. Even 10 years later, the emotions are still very raw, but we can still continue moving forward.

Thoughtfully yours,
Sara