Deeply Trivial: physical health


Wednesday, December 9, 2020

COVID

Hey all,

It's been a long time since I've updated! Though I've commented a bit on the pandemic on this blog, I've mostly stayed pretty quiet. Unfortunately, the COVID pandemic has hit home quite literally.

I'm currently in Kansas City with my family. My parents are older and have a variety of risk factors, so they've been staying in all the time. My brother, who lives with them, works in an elementary school, and though he's always been safe and careful, it appears he caught COVID shortly before Thanksgiving. Other than a bad cough, he reported feeling fine. Late last week, my dad had a COVID test done in advance of a procedure, and though he also felt fine, his test came back positive. Shortly after, my mom got a test that also came back positive. They're both experiencing more symptoms now, like shortness of breath and fatigue. My test done that same day came back negative, but yesterday, I started to feel some COVID symptoms myself, mostly fatigue (which could be as much due to stress as COVID).

We're all very lucky that our cases appear to be mild, and my parents' providers are checking in with them regularly to make sure they're recovering well. After this week, I'll probably take advantage of my excess vacation time and take time off from work to rest and recover. I'm in Kansas City for the rest of the year, and thanks to my parents' huge backyard, I don't even have to leave the house to give Zep his much-needed outdoor time.

Stay safe and healthy, everyone!

Tuesday, August 11, 2020

Coronavirus "Truthers" and Men Without Masks

Two articles related to coronavirus crossed my newsfeed this morning. First is an inside look at the various Coronavirus "Truth" sites on Facebook, which peddle a variety of misinformation, from the argument that mask-wearing is a prelude to the imposition of Sharia law to the claim that masks are a way to increase child sex trafficking:

Just searching “coronavirus” will take you to a host of legitimate resources: pages for the CDC, the World Health Organization and the American Medical Association. But add a word like “truth” and suddenly you’re on a different planet: groups that exist as safe spaces for coronavirus skeptics to share theories of what’s really going on.

For every post or meme that bears a “False Information” label and links to fact-checking sites, there are dozens that elude this moderation, often as they do not present a debunkable statement. How exactly are you supposed to disprove the notion that face-mask enforcement is a prelude to some requirement that women wear the Muslim niqab?

The misinformation is so diversified (yet interconnected and overlapping) that you are bound to find your personal bogeyman at the bottom of the rabbit hole. These memes and talking points are made to frighten while appealing to your “common sense,” to flatter your intellect as it suckers you in with specious “logic” and emotional whataboutery.

Sadly, I've seen a lot of these memes and specious arguments on the pages of friends and acquaintances.

The second article discusses research that points to performative masculinity to explain why men are being hit harder by coronavirus:

Poll after poll, most recently a Gallup poll from July 13, has found American men are more likely to not wear masks compared to women. Specifically, the survey found that 34 percent of men compared to 54 percent of women responded they “always” wore a mask when outside their home and that 20 percent of men said they “never” wore a mask outside their home (compared to just 8 percent of women).

Tyler Reny, a postdoctoral research fellow at Washington University in St. Louis, found [similar results] by combing through data from the Democracy Fund + UCLA Nationscape project, a public opinion survey that’s been interviewing more than 6,000 Americans about the virus per week since March 19.

“Those who had more sexist attitudes were far less likely to report feeling concerned about the pandemic, less likely to support state and local coronavirus policies, less likely to take precautions like washing their hands or wearing masks, and more likely to get sick than those with less sexist attitudes,” Reny told me. “What I found is that sexist attitudes are very predictive of all four sets of [aforementioned] outcomes, even after accounting for differences in partisanship, ideology, age, education, and population density.”

Stay healthy, stay informed, and please:

Friday, March 20, 2020

Talking with Numbers

From Reddit user Itsactuallywhom, a chart of US COVID-19 cases along with Trump's statements about the pandemic:

Illinois residents have been asked to shelter-in-place. I've got plans for writing projects, gardening, and an office to clean/reorganize, so I should be able to keep busy when I'm not working (from home). A few of my writing friends are talking about organizing CoroNaNo - National Novel Writing during COVID-19 quarantine. Anyone want to participate?

Monday, March 16, 2020

Updates

What a strange time we are living in. My CEO has closed our office and is having everyone working from home for at least the next two weeks. On top of that, we've learned that employees in my office building as well as a coworker have been exposed to COVID-19 (so far, none have tested positive). Since my parents are both over 70 and have health conditions that put them at greater risk for complications if they contract COVID, I'm back in Kansas City, working there and helping out by running errands and doing things around the house.

Plus I've got two of the best social distancing buddies on the planet:

Teddy, on the left (my parents' dog), is almost 11 and Zeppelin, on the right, just turned 2 yesterday! We celebrated with a drive to Kansas City, a new toy, and a yummy peanut butter treat he'll get later this evening.

I was invited to complete the 2020 Census, so I hopped online and filled it out. Surprisingly, all they asked me was gender (binary - male or female, with no option for other or even prefer not to say), birthdate, zip, race, and whether I rented or owned. Nothing about marital status, income level, sexual orientation... I expected a series of demographic questions, but what I ended up with took 2 minutes. So yeah, the Census is going to be a joke this year. Of course, I suspected this would be the case when I blogged about it previously (here, here and here). And Trump has been problematic with data from the beginning (see here), and his (lack of) response to COVID is also likely to keep the numbers artificially low and help his reelection odds.

Otherwise, things are quiet here. Schools are closed (mostly because they're on Spring Break, but they're likely going to stay closed even after that) so there's not much traffic, and we have plenty of food plus Netflix, Amazon Prime and Disney+. The dogs have no idea what's going on in the world, so they're pretty happy. I'm crunching away analyzing data for work and texting with friends, dreaming about spring and my garden, and taking inventory of my huge stack of books to read.

That's all for now! Hopefully more blog posts soon!

Tuesday, January 14, 2020

Updates

New year, new job, new blog post describing it all. On January 6, I started working as a Data Analyst at the American Board of Medical Specialties, which oversees certification and maintenance of certification activities for 24 Member Boards (such as the American Board of Dermatology, American Board of Nuclear Medicine, and so on).

The main part of my job will be doing analysis, research, and program evaluation of the CertLink program, which is a really cool online system that tests physician knowledge in their certification area, provides feedback and introduces new information to improve over time, and measures the relevance of items to their practice, so that their maintenance of certification assessments can become more targeted to the population and types of cases they encounter in their practice. We're hoping that this kind of system will become the future of medical specialty certification, so rather than taking a high stakes exam every 10 years, medical specialists can maintain their certifications through targeted, longitudinal assessment and continuing education. And we're hoping to show this approach works by tying it to long-term, quality of care outcomes, like prescribing patterns. I'll share more as I learn more about the company and my role, to the degree that I can based on data privacy. But I'm so excited to be involved with this, using my psychometrics and statistics skills for the data I'm working with, and my research/program evaluation skills to show (how) the system works. I also finally get to use my SQL knowledge as part of my job, and will be using my R and Python programming skills pretty regularly as well.

Zeppelin is adjusting well to me working again. He adores his dog walker, who he sees three times a week, and has made many new friends in the doggy daycare he attends twice a week. He also has a huge crush on Mona, who can be found at Uncharted Books, stopping to stare longingly at her every time we walk by the shop. As is the case with so many crushes, this love seems to be unrequited; Mona tolerates Zeppelin but doesn't like the way he drinks out of her water bowl when we stop in.

On the blogging front, I'm working on an analysis of the 88 books I read last year, and might even do some long-term analysis of my last few years of reading data. Stay tuned for that.

Wednesday, April 3, 2019

C is for Category Function

Up to now, I’ve been talking mostly about Rasch with correct/incorrect or yes/no data. But Rasch can also be used with measures using rating scales or where multiple points can be awarded for an answer. If all of your items have the same scale – that is, they all use a Likert scale of Strongly Disagree to Strongly Agree or they’re all worth 5 points – you can use the Rasch Rating Scale Model.

Note: If your items have differing scales, you could use a Partial Credit Model, which fits each item separately, or if you have sets of items worth the same number of points, you could use a Grouped Rating Scale model, which is sort of a hybrid of the RSM and PCM. I’ll try to touch on these topics later.

Again, in Rasch, every item has a difficulty and every person an ability. But for items worth multiple points or with rating scales, there’s a third thing, which is on the same scale as item difficulty and person ability – the difficulty level for each point on the rating scale. How much ability does a person need to earn all 5 points on a math item? (Or 4 points? Or 3 points? …) How much of the trait is needed to select “Strongly Agree” on a satisfaction with life item? (Or Agree? Neutral? ...) Each point on the scale is given a difficulty. When you examine these values, you’re looking at Category Function.
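To make category difficulties concrete, here is a minimal R sketch of the category probabilities in the Rating Scale Model, assuming Andrich's usual parameterization (one difficulty per item plus a set of thresholds shared by all items). The function name rsm_probs and the threshold values are hypothetical, for illustration only; this is not how Winsteps estimates anything.

```r
# Category probabilities for one item under the Rasch Rating Scale Model
# (Andrich parameterization), for categories scored 0, 1, ..., m.
#   theta: person measure in logits
#   delta: item difficulty in logits
#   tau:   the m thresholds shared by every item on the scale (hypothetical here)
rsm_probs <- function(theta, delta, tau) {
  # log-numerator for category k is the sum over j <= k of (theta - delta - tau_j);
  # category 0 gets an empty sum, i.e. 0
  log_num <- c(0, cumsum(theta - delta - tau))
  num <- exp(log_num - max(log_num))  # subtract the max for numerical stability
  num / sum(num)                      # normalize so the probabilities sum to 1
}

tau <- c(-1.5, 0, 1.5)  # hypothetical thresholds for a 4-category (0-3) item
p <- rsm_probs(theta = 1.5, delta = 0, tau = tau)
round(p, 3)
```

Here the person sits 1.5 logits above the item, which equals the third threshold, so the two highest categories come out equally likely; that crossing point is what a threshold marks.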

When you look at these category difficulties, you want to examine two things. First, you want to make certain that higher points on the scale require more ability or more of the trait. Your category difficulties should stairstep up. When your scale points do this, we say the scale is “monotonic” (or “proceeds monotonically”).

Let’s start by looking at a scale that does not proceed monotonically, where the category difficulties are disordered. There are two types of category function data you’ll look at. The first is the “observed measure,” which is the average ability of the people who selected that category. The second is the set of category thresholds: how much more of the trait is needed to select that particular category. When I did my Facebook study, I used the Facebook Questionnaire (Ross et al., 2009), a 4-item measure assessing intensity of use and attitudes toward Facebook. All 4 items use a 7-point scale from Strongly Disagree to Strongly Agree. Just for fun, I decided to run this measure through a Rasch analysis in Winsteps and see how the categories function. Specifically, I looked at the thresholds. (I also looked at the observed measures, and they were monotonic, which is good. But the thresholds were not; it can happen that one looks good while the other looks bad.) Because these are thresholds between categories, there isn’t one for the first category, Strongly Disagree. But there is one for each category after that, which reflects how much ability (or how much of the trait) people need to be more likely to select that category than the one below it. Here’s what those look like for the Facebook Questionnaire.

The threshold for the neutral category is lower than for slightly disagree. People are not using that category as I intended them to – perhaps they’re using it when they generally have no opinion, for instance, rather than when they’re caught directly between agreement and disagreement. If I were developing this measure, I might question whether to drop this category, or perhaps find a better descriptor for it. Regardless, I would probably collapse this category into another one (which I usually determine based on frequencies), or possibly drop it, and rerun my analysis with a new 6-point scale to see if category function improves.
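If you do collapse a category, the recode itself is simple. Below is a hedged R sketch; collapse_category is a hypothetical helper, the responses are made up, and which neighbour to merge into is left to you (for example, based on the frequencies from table()). You would then rerun the Rasch analysis on the recoded data.

```r
# Merge one category of a k-point rating scale into a neighbouring
# category, then renumber the remaining categories 1..(k - 1)
collapse_category <- function(x, from, into, k = 7) {
  stopifnot(abs(from - into) == 1)      # only merge adjacent categories
  x[!is.na(x) & x == from] <- into      # recode the category being dropped
  match(x, setdiff(seq_len(k), from))   # close the gap it leaves; NA stays NA
}

# Hypothetical responses on a 1-7 scale, where 4 = Neutral
responses <- c(1, 3, 4, 4, 5, 7, 2, NA)
table(responses)                        # frequencies, to help choose `into`
collapse_category(responses, from = 4, into = 3)
```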

The second thing you want to look for is a good spread on those thresholds; you want them to be at least a certain number of logits apart. Each additional option on a rating scale adds cognitive effort to answering the question, so you want to make sure that each additional point actually gives you useful information: information that allows you to differentiate between people at one point on the ability scale and people at another. If two categories have basically the same threshold, it means people are having trouble differentiating the two; maybe they’re having trouble parsing the difference between “much of the time” and “most of the time,” leading people of approximately the same ability level to select these two categories about equally.

I’ve heard different guidelines on how big a “spread” is needed. Linacre, who created Winsteps, recommends 1.4 logits, and recommends collapsing categories until you’re able to attain this spread. That’s not always possible. I’ve also heard smaller values suggested, such as 0.5 logits. But either way, you definitely don’t want two categories to have the exact same observed measure or category threshold.
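Both checks, ordering and spread, can be run on a vector of thresholds (or of observed measures) with a few lines of R. This is a minimal sketch: check_thresholds is a hypothetical helper, the default min_gap of 1.4 follows Linacre's guideline, and the threshold values are made up for illustration, not taken from either measure discussed here.

```r
# Check category thresholds (logits, lowest category first) for ordering
# and for a minimum advance between neighbouring thresholds
check_thresholds <- function(tau, min_gap = 1.4) {
  gaps <- diff(tau)                 # advance from each threshold to the next
  names(gaps) <- names(tau)[-1]     # label each gap by the higher category
  list(
    monotonic  = all(gaps > 0),
    disordered = names(gaps)[gaps <= 0],               # fails to advance
    too_close  = names(gaps)[gaps > 0 & gaps < min_gap] # advances too little
  )
}

# Hypothetical thresholds for a 7-point scale (none for the lowest category)
tau <- c(Disagree = -3.0, SlightDis = -1.2, Neutral = -1.5,
         SlightAg = 0.2, Agree = 1.0, StrongAg = 2.6)
check_thresholds(tau)                 # Linacre's 1.4-logit guideline
check_thresholds(tau, min_gap = 0.5)  # the more lenient 0.5-logit rule
```

With these made-up values, Neutral is flagged as disordered under either rule, while the gap below Agree (0.8 logits) only fails the stricter 1.4-logit guideline.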

Also as part of the Facebook study, I administered the 5-item Satisfaction with Life Scale (Diener et al., 1985). Like the Facebook Questionnaire, this measure uses a 7-point scale (Strongly Disagree to Strongly Agree).

The middle categories are all closer together, and certainly don’t meet Linacre’s 1.4 logits guideline. I’m not as concerned about that, but I am concerned that Neither Agree nor Disagree and Slightly Agree are so close together. Just like above, where the category thresholds didn’t advance, there might be some confusion about what this “neutral” category really means. Perhaps this measure doesn’t need a 7-point scale. Perhaps it doesn’t need a neutral option. These are some issues to explore with the measure.

As a quick note, I don’t want it to appear I’m criticizing either measure. They were not developed with Rasch, and this idea of category function is a Rasch-specific one. It might not be as important for these measures. But if you’re using the Rasch approach to measurement, these are ideas you need to consider. And clearly, these category function statistics can tell you a lot about whether there seems to be confusion about how a point on a rating scale is used or what it means. If you’re developing a scale, they can help you figure out what categories to combine or even drop.

Tomorrow’s post – dimensionality!

References

Diener, E., Emmons, R. A., Larsen, R. J., & Griffin, S. (1985). The Satisfaction with Life Scale. Journal of Personality Assessment, 49, 71-75.

Ross, C., Orr, E. S., Sisic, M., Arseneault, J. M., Simmering, M. G., & Orr, R. R. (2009). Personality and motivations associated with Facebook use. Computers in Human Behavior, 25, 578-586.

Thursday, February 28, 2019

A New Trauma Population for the Social Media Age

Even if you aren't a Facebook user, you're probably aware that there are rules about what you can and cannot post. Images or videos that depict violence or illegal behavior would of course be taken down. But who decides that? You as a user can always report an image or video (or person or group) if you think it violates community standards. But obviously, Facebook doesn't want to traumatize its users if it can be avoided.

That's where the employees of companies like Cognizant come in. It's their job to watch some of the most disturbing content on the internet - and it's even worse than it sounds. In this fascinating article for The Verge, Casey Newton describes just how traumatic doing such a job can be. (Content warning - this post has lots of references to violence, suicide, and mental illness.)

The problem with the way these companies do business is that, not only do employees see violent and disturbing content; they also don't have the opportunity to talk about what they see with their support networks:

Over the past three months, I interviewed a dozen current and former employees of Cognizant in Phoenix. All had signed non-disclosure agreements with Cognizant in which they pledged not to discuss their work for Facebook — or even acknowledge that Facebook is Cognizant’s client. The shroud of secrecy is meant to protect employees from users who may be angry about a content moderation decision and seek to resolve it with a known Facebook contractor. The NDAs are also meant to prevent contractors from sharing Facebook users’ personal information with the outside world, at a time of intense scrutiny over data privacy issues.

But the secrecy also insulates Cognizant and Facebook from criticism about their working conditions, moderators told me. They are pressured not to discuss the emotional toll that their job takes on them, even with loved ones, leading to increased feelings of isolation and anxiety.

The moderators told me it’s a place where the conspiracy videos and memes that they see each day gradually lead them to embrace fringe views. One auditor walks the floor promoting the idea that the Earth is flat. A former employee told me he has begun to question certain aspects of the Holocaust. Another former employee, who told me he has mapped every escape route out of his house and sleeps with a gun at his side, said: “I no longer believe 9/11 was a terrorist attack.”

It's a fascinating read on an industry I really wasn't aware existed, and a population that could be diagnosed with PTSD and other responses to trauma.

Thursday, September 27, 2018

All Thumbs

Recently, I was diagnosed with flexor tendinitis in my left thumb. After weeks of pain, I finally saw an orthopedic specialist yesterday, who confirmed the original diagnosis, let me peek at my hand x-rays, and gave me my first (and hopefully last) cortisone injection. It left my thumb puffy, with a rather interesting almond-shaped bruise at the injection point.

If I thought the pain was bad before, it was nothing compared to the pain that showed up about an hour after injection. So severe, I could feel it in my teeth, and I spent the whole day feeling nauseous and jittery from it. Any movement of my thumb was excruciating. I stupidly ordered a dinner that required a knife and fork last night, but I discovered how to hold my fork between my forefinger and middle finger, with my pinky to steady it, much to the amusement of my dinner companion. I'm sure I looked like a kid just learning to use a fork. Who knows? Perhaps that will be a useful skill in the future.

I slept in my thumb brace with my hand resting flat on a pillow, and even still, woke up after every sleep cycle with my hand curled up and on fire.

Today, my thumb is stiff and sore, but I can almost make a fist before it starts to hurt, and I can grasp objects with it for short periods of time. The pain is minimal. It kind of makes yesterday's suffering worth it.

Best case scenario is that this injection cures my tendinitis. Worst case is that it does nothing, but based on how much better I'm feeling, I'm hopeful that isn't what's happening. Somewhere between best and worst case is that I may need another injection in the future. If I keep recovering and feel better tomorrow, I would probably be willing to do it again. But next time, I'll know not to try going to work after. I was useless and barely able to type.

I'm moving out of my apartment this weekend and I have a draft of a chapter I'm writing due Monday, so sadly, there will probably be no Statistics Sunday post this week. Back to regular posting in October.

Tuesday, May 29, 2018

Stress and Its Effect on the Body

After a stressful winter and spring, I'm finally taking a break from work. So of course, what better time to get sick? After a 4-day migraine (started on my first day of vacation - Friday) with a tension headache and neck spasm so bad I couldn't look left, I ended up in urgent care yesterday afternoon. One injection of muscle relaxer, plus prescriptions for more muscle relaxers and migraine meds, and I'm finally feeling better.

Why does this happen? Why is it that after weeks or months of stress, we get sick when we finally get to "come down"?

I've blogged a bit about stress before. Stress causes your body to release certain hormones: adrenaline and norepinephrine, which give an immediate physiological response to stress, and cortisol, which takes a bit longer for you to feel at work in your body. In fact, cortisol is also involved in many negative consequences of chronic stress. Over time, it can do things like increase blood sugar, suppress the immune system, and contribute to acne breakouts.

You're probably aware that symptoms of sickness are generally caused by your body reacting to and fighting an infection. So the reason you suddenly get sick when the stressor goes away is that your immune system ramps back up, realizes there's a foreign body that doesn't belong, and starts fighting it. You had probably already caught the virus or infection, but didn't have the symptoms, like fever (your body's attempt to "cook" it out) or runny nose (your body increasing mucus production to push out the bug), that let you know you were sick.

And in my case in particular, a study published in Neurology found that migraine sufferers were at increased risk of an attack after the stress "let-down." According to the researchers, this effect is even stronger when there is a huge build-up of stress and a sudden, large let-down; it's better to have mini let-downs throughout the stressful experience.

And here I thought I was engaging in a good amount of self-care throughout my stressful February-May.

Friday, April 27, 2018

X is for By

Today's post will be rather short, demonstrating a set of functions from the psych package that allow you to conduct analyses by group. These commands add "By" to the end of existing functions. But first, a word of caution: With great power comes great responsibility. These functions could very easily turn into a fishing expedition (also known as p-hacking). Conducting planned group comparisons is fine. Conducting all possible group comparisons and cherry-picking any differences is problematic. So use these group by functions with care.
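One way to see why cherry-picking is a problem: if every test uses alpha = .05, the chance of at least one spurious "significant" difference grows quickly with the number of tests. The sketch below computes that familywise error rate, 1 - (1 - alpha)^k, under the simplifying assumption that the k tests are independent; correlated measures, like overlapping subscales, won't follow this formula exactly. The function name fwer is hypothetical.

```r
# Probability of at least one false positive across k tests at level alpha,
# assuming the tests are independent (an idealization)
fwer <- function(k, alpha = 0.05) 1 - (1 - alpha)^k

fwer(1)    # one planned comparison: just alpha
fwer(24)   # e.g., comparing groups on 24 separate measures
```

With 24 independent tests, the chance of at least one false positive is about 0.71.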

Let's pull up the Facebook dataset for this.

# read the tab-delimited file containing every variable collected in the study
Facebook<-read.delim(file="full_facebook_set.txt", header=TRUE)

This is the full dataset, which includes all the variables I collected. I don't want to run analyses on all variables, so I'll pull out the ones most important for this blog post demonstration.

# keep the ID, gender, and scale/subscale score columns (26 variables)
smallFB<-Facebook[,c(1:2,77:80,105:116,122,133:137,170,187)]

First, I'll run descriptives on this smaller data frame by gender.

library(psych)

## Warning: package 'psych' was built under R version 3.4.4

describeBy(smallFB,smallFB$gender)

## 
##  Descriptive statistics by group 
## group: 0
##              vars  n      mean      sd   median   trimmed     mad      min
## RespondentId    1 73 164647.77 1711.78 164943.0 164587.37 2644.96 162373.0
## gender          2 73      0.00    0.00      0.0      0.00    0.00      0.0
## Rumination      3 73     37.66   14.27     37.0     37.41   13.34      8.0
## DepRelat        4 73     21.00    7.86     21.0     20.95    5.93      4.0
## Brood           5 73      8.49    3.76      9.0      8.42    2.97      1.0
## Reflect         6 73      8.16    4.44      8.0      8.24    4.45      0.0
## SavorPos        7 73     64.30   10.93     65.0     64.92    8.90     27.0
## SavorNeg        8 73     33.30   11.48     33.0     33.08   13.34     12.0
## SavorTot        9 73     31.00   20.15     34.0     31.15   19.27    -10.0
## AntPos         10 73     20.85    3.95     21.0     20.93    4.45     10.0
## AntNeg         11 73     11.30    4.23     11.0     11.22    4.45      4.0
## AntTot         12 73      9.55    6.90     10.0      9.31    7.41     -3.0
## MomPos         13 73     21.68    3.95     22.0     21.90    2.97      9.0
## MomNeg         14 73     11.45    4.63     11.0     11.41    5.93      4.0
## MomTot         15 73     10.23    7.63     11.0     10.36    8.90    -11.0
## RemPos         16 73     21.77    4.53     23.0     22.20    4.45      8.0
## RemNeg         17 73     10.55    4.39      9.0     10.27    4.45      4.0
## RemTot         18 73     11.22    8.05     14.0     11.68    7.41     -8.0
## LifeSat        19 73     24.63    6.80     25.0     24.93    7.41     10.0
## Extravert      20 73      4.32    1.58      4.5      4.33    1.48      1.5
## Agreeable      21 73      4.79    1.08      5.0      4.85    1.48      1.0
## Conscient      22 73      5.14    1.34      5.0      5.19    1.48      2.0
## EmotStab       23 73      5.10    1.22      5.0      5.15    1.48      1.0
## OpenExp        24 73      5.11    1.29      5.5      5.20    1.48      2.0
## Health         25 73     28.77   19.56     25.0     26.42   17.79      0.0
## Depression     26 73     10.26    7.27      9.0      9.56    5.93      0.0
##                 max  range  skew kurtosis     se
## RespondentId 168279 5906.0  0.21    -1.36 200.35
## gender            0    0.0   NaN      NaN   0.00
## Rumination       71   63.0  0.12    -0.53   1.67
## DepRelat         42   38.0  0.10    -0.04   0.92
## Brood            17   16.0  0.15    -0.38   0.44
## Reflect          19   19.0 -0.12    -0.69   0.52
## SavorPos         84   57.0 -0.69     0.76   1.28
## SavorNeg         57   45.0  0.14    -0.95   1.34
## SavorTot         72   82.0 -0.17    -0.75   2.36
## AntPos           28   18.0 -0.24    -0.46   0.46
## AntNeg           22   18.0  0.27    -0.55   0.49
## AntTot           24   27.0  0.11    -0.76   0.81
## MomPos           28   19.0 -0.69     0.55   0.46
## MomNeg           22   18.0  0.08    -0.98   0.54
## MomTot           24   35.0 -0.25    -0.55   0.89
## RemPos           28   20.0 -0.88     0.35   0.53
## RemNeg           22   18.0  0.56    -0.66   0.51
## RemTot           24   32.0 -0.53    -0.77   0.94
## LifeSat          35   25.0 -0.37    -0.84   0.80
## Extravert         7    5.5 -0.09    -0.93   0.19
## Agreeable         7    6.0 -0.60     1.04   0.13
## Conscient         7    5.0 -0.24    -0.98   0.16
## EmotStab          7    6.0 -0.60     0.28   0.14
## OpenExp           7    5.0 -0.49    -0.55   0.15
## Health           91   91.0  1.13     1.14   2.29
## Depression       36   36.0  1.02     0.95   0.85
## -------------------------------------------------------- 
## group: 1
##              vars   n      mean      sd    median   trimmed     mad
## RespondentId    1 184 164373.49 1515.34 164388.00 164253.72 1891.80
## gender          2 184      1.00    0.00      1.00      1.00    0.00
## Rumination      3 184     38.09   15.28     40.00     38.16   17.05
## DepRelat        4 184     21.67    8.78     21.00     21.66    8.90
## Brood           5 184      8.57    4.14      8.50      8.47    3.71
## Reflect         6 184      7.84    4.06      8.00      7.73    4.45
## SavorPos        7 184     67.22    9.63     68.00     67.71    8.90
## SavorNeg        8 184     29.75   11.62     27.50     28.72    9.64
## SavorTot        9 184     37.47   19.30     40.00     38.66   20.02
## AntPos         10 184     22.18    3.37     23.00     22.28    2.97
## AntNeg         11 184     10.10    4.44      9.00      9.78    4.45
## AntTot         12 184     12.08    6.85     14.00     12.36    5.93
## MomPos         13 184     22.28    3.88     23.00     22.59    2.97
## MomNeg         14 184     10.60    4.88      9.50     10.13    5.19
## MomTot         15 184     11.68    7.75     13.00     12.29    7.41
## RemPos         16 184     22.76    3.85     23.00     23.10    2.97
## RemNeg         17 184      9.05    3.79      8.00      8.68    2.97
## RemTot         18 184     13.71    6.97     15.00     14.34    5.93
## LifeSat        19 184     23.76    6.25     24.00     24.18    7.41
## Extravert      20 184      4.66    1.57      5.00      4.74    1.48
## Agreeable      21 184      5.22    1.06      5.50      5.26    1.48
## Conscient      22 184      5.32    1.24      5.50      5.42    1.48
## EmotStab       23 184      4.70    1.31      4.75      4.75    1.11
## OpenExp        24 184      5.47    1.08      5.50      5.56    0.74
## Health         25 184     32.54   16.17     30.00     31.43   16.31
## Depression     26 184     12.19    8.48      9.00     11.09    5.93
##                   min    max  range  skew kurtosis     se
## RespondentId 162350.0 167714 5364.0  0.46    -0.90 111.71
## gender            1.0      1    0.0   NaN      NaN   0.00
## Rumination        3.0     74   71.0 -0.05    -0.60   1.13
## DepRelat          0.0     42   42.0  0.00    -0.46   0.65
## Brood             0.0     19   19.0  0.19    -0.62   0.31
## Reflect           0.0     19   19.0  0.25    -0.48   0.30
## SavorPos         33.0     84   51.0 -0.59     0.36   0.71
## SavorNeg         12.0     64   52.0  0.79     0.25   0.86
## SavorTot        -18.0     72   90.0 -0.57    -0.10   1.42
## AntPos            9.0     28   19.0 -0.49     0.41   0.25
## AntNeg            4.0     22   18.0  0.63    -0.39   0.33
## AntTot           -8.0     24   32.0 -0.43    -0.48   0.50
## MomPos           10.0     28   18.0 -0.81     0.54   0.29
## MomNeg            4.0     24   20.0  0.81    -0.03   0.36
## MomTot          -13.0     24   37.0 -0.69    -0.03   0.57
## RemPos            9.0     28   19.0 -0.87     0.81   0.28
## RemNeg            4.0     21   17.0  0.83     0.33   0.28
## RemTot           -9.0     24   33.0 -0.82     0.50   0.51
## LifeSat           8.0     35   27.0 -0.53    -0.32   0.46
## Extravert         1.0      7    6.0 -0.36    -0.72   0.12
## Agreeable         2.5      7    4.5 -0.27    -0.63   0.08
## Conscient         1.0      7    6.0 -0.70     0.13   0.09
## EmotStab          1.5      7    5.5 -0.35    -0.73   0.10
## OpenExp           1.5      7    5.5 -0.91     0.62   0.08
## Health            2.0     85   83.0  0.60    -0.05   1.19
## Depression        0.0     39   39.0  1.14     0.66   0.62

In this dataset, I coded men as 0 and women as 1. The descriptive statistics table generated includes all scale and subscale scores, and gives me mean, standard deviation, median, a trimmed mean (by default, dropping the top and bottom 10% of values), median absolute deviation, minimum and maximum values, range, skewness, kurtosis, and standard error. I'd need to run t-tests to find out if differences were significant, but this still gives me some idea of how men and women might differ on these measures.
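The t-tests mentioned above might look something like this for a couple of planned comparisons. This is a sketch only: the data frame is simulated (both groups drawn from the same distribution, so any "significant" result would be a false positive) and merely stands in for smallFB. The Holm correction via p.adjust is one common choice, swapped in here as an illustration rather than anything from the original analysis.

```r
# Simulated stand-in for smallFB (hypothetical data, NOT the study results);
# group sizes mirror the output above: 73 coded 0, 184 coded 1
set.seed(42)
n <- 73 + 184
toy <- data.frame(
  gender     = rep(c(0, 1), times = c(73, 184)),
  Rumination = round(rnorm(n, mean = 38, sd = 15)),
  Depression = round(rnorm(n, mean = 11, sd = 8))
)

# Decide on the comparisons in advance rather than testing every column
planned <- c("Rumination", "Depression")

# Welch two-sample t-test for each planned variable (t.test's default is
# var.equal = FALSE); reformulate() builds e.g. Rumination ~ gender
p_raw <- sapply(planned, function(v) {
  t.test(reformulate("gender", response = v), data = toy)$p.value
})

# Holm adjustment controls the familywise error rate across these tests
p_adj <- p.adjust(p_raw, method = "holm")
round(cbind(p_raw, p_adj), 3)
```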

There are certain measures I included that we might hypothesize would show gender differences. For instance, some research suggests gender differences for rumination and depression. In addition to running descriptives by group, I might also want to display these differences in a violin plot. The psych package can quickly generate such a plot by group.

violinBy(smallFB,"Rumination","gender",grp.name=c("M","F"))

violinBy(smallFB,"Depression","gender",grp.name=c("M","F"))

Because ggplot2 can also generate violin plots by group, this feature might not be as useful for final displays, but it can help you quickly visualize the data during analysis. And you may find that you prefer the appearance of these plots. To each his own.

Another function is error.bars.by, which plots means and confidence intervals by group for multiple variables. Again, this is a way to get some quick visuals, though you should take differences in scale among measures into account when generating this plot. One set of variables for which this display might be useful is the 5 subscales of the Ten-Item Personality Inventory (TIPI). This 10-item measure assesses where participants fall on the so-called Big Five personality traits: Openness to Experience, Conscientiousness, Extraversion, Agreeableness, and Emotional Stability (the inverse of Neuroticism). These subscales are all on the same metric, so they can share a single axis.

error.bars.by(smallFB[,c(20:24)],group=smallFB$gender,xlab="Big Five Personality Traits",ylab="Score on Subscale")
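If you'd rather see the numbers behind a plot like this, here's a base-R sketch that computes each group's mean and a t-based 95% confidence interval for several variables at once. The helper mean_ci and the random data frame sim are hypothetical, and psych's own interval calculation may differ in its details.

```r
# Hypothetical data: two groups scored on two Big Five subscales (1-7 metric)
set.seed(1)
sim <- data.frame(
  gender    = factor(rep(c("M", "F"), each = 40), levels = c("M", "F")),
  Extravert = sample(1:7, 80, replace = TRUE),
  Agreeable = sample(1:7, 80, replace = TRUE)
)

# Mean and t-based confidence interval for one numeric vector
mean_ci <- function(x, conf = 0.95) {
  x    <- x[!is.na(x)]
  se   <- sd(x) / sqrt(length(x))
  half <- qt(1 - (1 - conf) / 2, df = length(x) - 1) * se
  c(mean = mean(x), lower = mean(x) - half, upper = mean(x) + half)
}

# One row per variable-by-group combination
vars <- c("Extravert", "Agreeable")
ci_table <- do.call(rbind, lapply(vars, function(v) {
  res <- t(sapply(split(sim[[v]], sim$gender), mean_ci))
  data.frame(variable = v, group = rownames(res), res, row.names = NULL)
}))
print(ci_table)
```

You could then pass ci_table to whatever plotting function you prefer.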

Finally, we have the statsBy function, which gives descriptive statistics by group as well as between-group statistics. This function generates a lot of output, and you can read more about everything it gives you here.

FBstats<-statsBy(smallFB[,2:26],"gender",cors=TRUE,method="pearson",use="pairwise")
print(FBstats,short=FALSE)

## Statistics within and between groups  
## Call: statsBy(data = smallFB[, 2:26], group = "gender", cors = TRUE, 
##     method = "pearson", use = "pairwise")
## Intraclass Correlation 1 (Percentage of variance due to groups) 
##     gender Rumination   DepRelat      Brood    Reflect   SavorPos 
##       1.00      -0.01      -0.01      -0.01      -0.01       0.03 
##   SavorNeg   SavorTot     AntPos     AntNeg     AntTot     MomPos 
##       0.03       0.04       0.05       0.02       0.05       0.00 
##     MomNeg     MomTot     RemPos     RemNeg     RemTot    LifeSat 
##       0.00       0.01       0.02       0.05       0.04       0.00 
##  Extravert  Agreeable  Conscient   EmotStab    OpenExp     Health 
##       0.01       0.05       0.00       0.03       0.03       0.01 
## Depression 
##       0.01 
## Intraclass Correlation 2 (Reliability of group differences) 
##     gender Rumination   DepRelat      Brood    Reflect   SavorPos 
##       1.00     -22.34      -2.06     -50.93      -2.21       0.77 
##   SavorNeg   SavorTot     AntPos     AntNeg     AntTot     MomPos 
##       0.80       0.83       0.86       0.75       0.86       0.19 
##     MomNeg     MomTot     RemPos     RemNeg     RemTot    LifeSat 
##       0.39       0.46       0.68       0.87       0.84      -0.04 
##  Extravert  Agreeable  Conscient   EmotStab    OpenExp     Health 
##       0.60       0.88       0.05       0.80       0.81       0.60 
## Depression 
##       0.66 
## eta^2 between groups  
## Rumination.bg   DepRelat.bg      Brood.bg    Reflect.bg   SavorPos.bg 
##          0.00          0.00          0.00          0.00          0.02 
##   SavorNeg.bg   SavorTot.bg     AntPos.bg     AntNeg.bg     AntTot.bg 
##          0.02          0.02          0.03          0.02          0.03 
##     MomPos.bg     MomNeg.bg     MomTot.bg     RemPos.bg     RemNeg.bg 
##          0.00          0.01          0.01          0.01          0.03 
##     RemTot.bg    LifeSat.bg  Extravert.bg  Agreeable.bg  Conscient.bg 
##          0.02          0.00          0.01          0.03          0.00 
##   EmotStab.bg    OpenExp.bg     Health.bg Depression.bg 
##          0.02          0.02          0.01          0.01 
## Correlation between groups 
##               Rmnt. DpRl. Brd.b Rflc. SvrP. SvrN. SvrT. AntP. AntN. AntT.
## Rumination.bg  1                                                         
## DepRelat.bg    1     1                                                   
## Brood.bg       1     1     1                                             
## Reflect.bg    -1    -1    -1     1                                       
## SavorPos.bg    1     1     1    -1     1                                 
## SavorNeg.bg   -1    -1    -1     1    -1     1                           
## SavorTot.bg    1     1     1    -1     1    -1     1                     
## AntPos.bg      1     1     1    -1     1    -1     1     1               
## AntNeg.bg     -1    -1    -1     1    -1     1    -1    -1     1         
## AntTot.bg      1     1     1    -1     1    -1     1     1    -1     1   
## MomPos.bg      1     1     1    -1     1    -1     1     1    -1     1   
## MomNeg.bg     -1    -1    -1     1    -1     1    -1    -1     1    -1   
## MomTot.bg      1     1     1    -1     1    -1     1     1    -1     1   
## RemPos.bg      1     1     1    -1     1    -1     1     1    -1     1   
## RemNeg.bg     -1    -1    -1     1    -1     1    -1    -1     1    -1   
## RemTot.bg      1     1     1    -1     1    -1     1     1    -1     1   
## LifeSat.bg    -1    -1    -1     1    -1     1    -1    -1     1    -1   
## Extravert.bg   1     1     1    -1     1    -1     1     1    -1     1   
## Agreeable.bg   1     1     1    -1     1    -1     1     1    -1     1   
## Conscient.bg   1     1     1    -1     1    -1     1     1    -1     1   
## EmotStab.bg   -1    -1    -1     1    -1     1    -1    -1     1    -1   
## OpenExp.bg     1     1     1    -1     1    -1     1     1    -1     1   
## Health.bg      1     1     1    -1     1    -1     1     1    -1     1   
## Depression.bg  1     1     1    -1     1    -1     1     1    -1     1   
##               MmPs. MmNg. MmTt. RmPs. RmNg. RmTt. LfSt. Extr. Agrb. Cnsc.
## MomPos.bg      1                                                         
## MomNeg.bg     -1     1                                                   
## MomTot.bg      1    -1     1                                             
## RemPos.bg      1    -1     1     1                                       
## RemNeg.bg     -1     1    -1    -1     1                                 
## RemTot.bg      1    -1     1     1    -1     1                           
## LifeSat.bg    -1     1    -1    -1     1    -1     1                     
## Extravert.bg   1    -1     1     1    -1     1    -1     1               
## Agreeable.bg   1    -1     1     1    -1     1    -1     1     1         
## Conscient.bg   1    -1     1     1    -1     1    -1     1     1     1   
## EmotStab.bg   -1     1    -1    -1     1    -1     1    -1    -1    -1   
## OpenExp.bg     1    -1     1     1    -1     1    -1     1     1     1   
## Health.bg      1    -1     1     1    -1     1    -1     1     1     1   
## Depression.bg  1    -1     1     1    -1     1    -1     1     1     1   
##               EmtS. OpnE. Hlth. Dprs.
## EmotStab.bg    1                     
## OpenExp.bg    -1     1               
## Health.bg     -1     1     1         
## Depression.bg -1     1     1     1   
## Correlation within groups 
##               Rmnt. DpRl. Brd.w Rflc. SvrP. SvrN. SvrT. AntP. AntN. AntT.
## Rumination.wg  1.00                                                      
## DepRelat.wg    0.95  1.00                                                
## Brood.wg       0.88  0.78  1.00                                          
## Reflect.wg     0.80  0.63  0.59  1.00                                    
## SavorPos.wg   -0.20 -0.20 -0.18 -0.15  1.00                              
## SavorNeg.wg    0.43  0.43  0.36  0.30 -0.64  1.00                        
## SavorTot.wg   -0.36 -0.36 -0.31 -0.25  0.89 -0.92  1.00                  
## AntPos.wg     -0.06 -0.05 -0.08 -0.03  0.86 -0.49  0.73  1.00            
## AntNeg.wg      0.32  0.32  0.28  0.21 -0.54  0.89 -0.80 -0.50  1.00      
## AntTot.wg     -0.23 -0.23 -0.21 -0.15  0.78 -0.82  0.89  0.83 -0.89  1.00
## MomPos.wg     -0.26 -0.26 -0.22 -0.19  0.86 -0.60  0.80  0.60 -0.47  0.61
## MomNeg.wg      0.46  0.46  0.39  0.35 -0.51  0.88 -0.78 -0.33  0.66 -0.59
## MomTot.wg     -0.42 -0.42 -0.36 -0.32  0.75 -0.85  0.89  0.51 -0.65  0.68
## RemPos.wg     -0.20 -0.19 -0.17 -0.15  0.89 -0.56  0.79  0.66 -0.44  0.62
## RemNeg.wg      0.34  0.35  0.28  0.23 -0.65  0.87 -0.85 -0.49  0.69 -0.69
## RemTot.wg     -0.29 -0.30 -0.25 -0.21  0.85 -0.79  0.90  0.63 -0.62  0.72
## LifeSat.wg    -0.47 -0.47 -0.43 -0.31  0.54 -0.50  0.57  0.39 -0.33  0.41
## Extravert.wg  -0.20 -0.19 -0.11 -0.20  0.34 -0.35  0.38  0.21 -0.29  0.29
## Agreeable.wg  -0.18 -0.18 -0.20 -0.10  0.35 -0.45  0.45  0.28 -0.39  0.39
## Conscient.wg  -0.25 -0.30 -0.20 -0.10  0.24 -0.21  0.25  0.16 -0.14  0.17
## EmotStab.wg   -0.48 -0.44 -0.49 -0.34  0.34 -0.44  0.43  0.20 -0.33  0.32
## OpenExp.wg    -0.16 -0.14 -0.21 -0.10  0.37 -0.31  0.37  0.27 -0.27  0.31
## Health.wg      0.44  0.47  0.36  0.29 -0.30  0.34 -0.35 -0.21  0.26 -0.27
## Depression.wg  0.57  0.58  0.49  0.38 -0.44  0.55 -0.55 -0.27  0.39 -0.39
##               MmPs. MmNg. MmTt. RmPs. RmNg. RmTt. LfSt. Extr. Agrb. Cnsc.
## MomPos.wg      1.00                                                      
## MomNeg.wg     -0.56  1.00                                                
## MomTot.wg      0.86 -0.91  1.00                                          
## RemPos.wg      0.65 -0.42  0.59  1.00                                    
## RemNeg.wg     -0.55  0.63 -0.67 -0.65  1.00                              
## RemTot.wg      0.66 -0.58  0.69  0.91 -0.91  1.00                        
## LifeSat.wg     0.55 -0.55  0.62  0.48 -0.42  0.49  1.00                  
## Extravert.wg   0.39 -0.37  0.43  0.28 -0.25  0.29  0.27  1.00            
## Agreeable.wg   0.33 -0.43  0.43  0.31 -0.36  0.37  0.25  0.12  1.00      
## Conscient.wg   0.25 -0.16  0.22  0.23 -0.26  0.26  0.33  0.03  0.29  1.00
## EmotStab.wg    0.40 -0.50  0.51  0.27 -0.32  0.32  0.44  0.12  0.41  0.27
## OpenExp.wg     0.39 -0.26  0.36  0.30 -0.28  0.32  0.34  0.29  0.36  0.14
## Health.wg     -0.30  0.33 -0.36 -0.27  0.29 -0.31 -0.42 -0.10 -0.25 -0.24
## Depression.wg -0.45  0.56 -0.58 -0.41  0.49 -0.50 -0.65 -0.24 -0.29 -0.26
##               EmtS. OpnE. Hlth. Dprs.
## EmotStab.wg    1.00                  
## OpenExp.wg     0.24  1.00            
## Health.wg     -0.31 -0.18  1.00      
## Depression.wg -0.54 -0.28  0.56  1.00
## 
## Many results are not shown directly. To see specific objects select from the following list:
##  mean sd n F ICC1 ICC2 ci1 ci2 r within pooled sd.r raw rbg pbg rwg nw pwg etabg etawg nwg nG Call

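The variance explained by gender is quite small for all of the variables: eta^2 between groups is 0.03 or less. The between-group correlations are all exactly 1 or -1, but that's an artifact of having only two groups (two group means always fall on a straight line), so they tell us little. Instead, the within-group correlations seem more meaningful; for example, Depression correlates 0.57 with Rumination and -0.65 with LifeSat.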
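To see where numbers like these come from, here's a sketch of eta^2 between groups (the between-group sum of squares divided by the total sum of squares) and of the between-group correlation for a two-group design. The data are random and the helper eta_sq is hypothetical; statsBy's own computations may weight or adjust things differently.

```r
# Hypothetical data: two groups, two outcome variables
set.seed(7)
sim <- data.frame(
  gender = rep(c("Male", "Female"), times = c(60, 40)),
  y1     = rnorm(100, mean = 20, sd = 5),
  y2     = rnorm(100, mean = 50, sd = 10)
)

# eta^2 between groups: share of total variance carried by the group means
eta_sq <- function(y, g) {
  grand       <- mean(y)
  group_means <- tapply(y, g, mean)
  n_g         <- tapply(y, g, length)
  ss_between  <- sum(n_g * (group_means - grand)^2)
  ss_total    <- sum((y - grand)^2)
  ss_between / ss_total
}
eta_sq(sim$y1, sim$gender)

# With two groups, each variable contributes only two group means; two points
# always lie on a line, so their correlation is +1 or -1 (unless the means tie)
m1 <- as.numeric(tapply(sim$y1, sim$gender, mean))
m2 <- as.numeric(tapply(sim$y2, sim$gender, mean))
cor(m1, m2)
```

For a one-way design, this eta^2 matches the R-squared from lm(y1 ~ gender).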

A to Z is almost done! Just Y and Z to go. Plus, look for an A-to-Z-influenced Statistics Sunday post!

Thursday, April 26, 2018

Predictive Analytics and Veteran Suicide Prevention

One of the best things about working for the Department of Veterans Affairs was the vast amount of data available on Veterans receiving care through VA. While my research center often used surveys, focus groups, and interviews to collect data on Veterans, we frequently pulled in data from Veterans' medical records (with their permission, of course). And other researchers were accessing Veteran data directly to understand and improve care.

An issue we frequently heard about, and sometimes dealt with firsthand, was the high rate of suicide among our Veterans. Veterans are at high risk for many physical and mental conditions, and are at heightened risk for suicide. The National Suicide Prevention Lifeline was created to help anyone, including Veterans, who is feeling helpless. Through the Lifeline's partnerships with VA and other federal agencies, we heard many of its success stories.

But with the vast amount of data available on our Veterans, it would be great if we could intervene and help before someone gets to the crisis point. Today, I read about how the REACH Vet program is using predictive analytics to identify Veterans at risk for suicide:

The REACH Vet program draws on the agency’s vast trove of electronic health records and uses predictive analytics to identify patients who might be at risk of suicide. It alerts VA clinicians of veterans who could benefit from more attention, and the program prompts clinicians to call and check in with their patients.

“What we found … not surprisingly is that veterans at highest risk of suicide are also at very high risk of some other things,” Aaron Eagan, VA’s deputy director for innovation, said Thursday at ACT-IAC’s Health Innovation Day in Washington. “They’re at significantly increased rates of all-cause mortality, accident mortality, overdoses, violence … [and] opioids.”

Veterans who engaged with REACH Vet were admitted to mental health inpatient units less often, showed up to more mental health and primary care appointments and visited the VA more frequently, compared to veterans who weren’t part of the program.

Eagan said he expected veterans would be frustrated by the phone calls, but his team hasn’t gotten any complaints.

“It’s a great reminder that people really feel good about us caring about them, and that’s what the response generally is,” he said.

The REACH Vet team is updating its predictive model for the program now, and it’s starting a new collaboration with the Energy Department’s super computer, Eagan said.

Tuesday, April 17, 2018

O is for Overview Reports

One of the best things you can do is to create a study codebook to accompany your dataset. In it, you should include information about the study variables and how they were created/computed. It's also nice to have a summary of your dataset, all in one place, so you can quickly check for any issues and begin cleaning your data, and/or plan your analysis. But creating and formatting such a report, and running the various descriptive statistics and plots, can be a tedious process. What if an R package could do much of that for you?

dataMaid to the rescue! In addition to having some great functions to help streamline data cleaning, dataMaid can create an overview report of your dataset, containing the information you request, and generate an R Markdown file to which you could add descriptive information, like functions used to calculate variables, item text, and so on.

For today's post, I'll use the simulated Facebook dataset I created and shared. You can replicate the exact results I get if you use that dataset. After importing the file, I want to make certain all variables are of the correct type. If necessary, I can make some changes. This becomes important when we go to generate our report. I also want to score all my scales, so I have total and subscale scores in the file.

Facebook<-read.delim(file="simulated_facebook_set.txt", header=TRUE)
str(Facebook)

## 'data.frame': 257 obs. of  111 variables:
##  $ ID            : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ gender        : int  1 1 1 0 1 0 1 0 1 1 ...
##  $ Rum1          : int  3 2 3 2 2 2 0 4 1 0 ...
##  $ Rum2          : int  1 2 4 2 4 1 1 2 2 2 ...
##  $ Rum3          : int  3 3 2 2 2 2 0 1 2 2 ...
##  $ Rum4          : int  1 2 4 0 2 4 1 0 2 2 ...
##  $ Rum5          : int  3 1 2 2 1 3 1 0 0 2 ...
##  $ Rum6          : int  2 3 4 3 2 2 1 1 3 2 ...
##  $ Rum7          : int  3 1 4 3 0 3 3 4 3 2 ...
##  $ Rum8          : int  1 2 4 1 0 1 1 1 3 3 ...
##  $ Rum9          : int  3 0 2 0 1 0 3 2 2 0 ...
##  $ Rum10         : int  1 1 2 2 2 2 2 0 1 1 ...
##  $ Rum11         : int  1 0 0 3 1 2 3 0 4 3 ...
##  $ Rum12         : int  0 2 2 1 0 1 0 2 2 0 ...
##  $ Rum13         : int  4 2 3 3 3 2 1 1 2 2 ...
##  $ Rum14         : int  0 1 3 1 2 2 2 2 4 2 ...
##  $ Rum15         : int  2 2 1 2 2 2 2 1 3 0 ...
##  $ Rum16         : int  2 4 4 0 1 2 0 1 2 4 ...
##  $ Rum17         : int  1 2 2 2 1 3 2 1 2 3 ...
##  $ Rum18         : int  2 2 4 1 2 2 2 1 1 1 ...
##  $ Rum19         : int  0 2 2 1 2 4 2 2 1 0 ...
##  $ Rum20         : int  1 1 2 2 1 1 1 2 4 2 ...
##  $ Rum21         : int  2 2 1 1 1 1 1 3 4 0 ...
##  $ Rum22         : int  2 1 2 2 1 2 0 1 1 1 ...
##  $ Sav1          : int  5 6 7 4 5 6 6 7 6 7 ...
##  $ Sav2          : int  3 2 6 2 2 6 3 6 3 2 ...
##  $ Sav3          : int  7 7 7 6 7 5 6 6 7 7 ...
##  $ Sav4          : int  4 5 5 4 3 5 1 5 2 5 ...
##  $ Sav5          : int  7 6 7 7 5 6 6 6 4 6 ...
##  $ Sav6          : int  4 0 6 4 2 2 3 4 6 4 ...
##  $ Sav7          : int  3 6 5 6 7 7 7 7 7 6 ...
##  $ Sav8          : int  2 2 3 3 2 4 3 3 3 5 ...
##  $ Sav9          : int  6 4 6 6 6 6 6 7 6 5 ...
##  $ Sav10         : int  2 4 1 2 1 3 2 5 1 1 ...
##  $ Sav11         : int  3 3 6 2 6 6 4 1 3 4 ...
##  $ Sav12         : int  0 3 3 3 4 4 3 4 5 3 ...
##  $ Sav13         : int  3 7 7 4 4 3 5 5 7 4 ...
##  $ Sav14         : int  2 2 5 0 3 2 2 2 3 2 ...
##  $ Sav15         : int  5 6 5 5 4 7 4 6 7 7 ...
##  $ Sav16         : int  3 2 2 6 2 3 1 3 2 2 ...
##  $ Sav17         : int  6 3 6 6 5 4 6 6 6 5 ...
##  $ Sav18         : int  2 2 2 3 2 6 3 2 1 2 ...
##  $ Sav19         : int  6 7 6 6 6 7 2 4 6 4 ...
##  $ Sav20         : int  3 2 3 4 6 6 6 3 3 7 ...
##  $ Sav21         : int  6 3 3 6 4 7 6 6 7 4 ...
##  $ Sav22         : int  1 4 2 2 2 2 2 2 3 3 ...
##  $ Sav23         : int  7 7 6 4 6 7 6 4 6 5 ...
##  $ Sav24         : int  2 1 5 1 1 1 3 1 2 2 ...
##  $ LS1           : int  3 5 6 4 7 4 4 6 6 7 ...
##  $ LS2           : int  6 2 6 5 7 5 6 5 6 4 ...
##  $ LS3           : int  7 4 6 4 3 6 5 2 6 6 ...
##  $ LS4           : int  2 6 6 3 6 6 5 6 7 5 ...
##  $ LS5           : int  3 6 7 5 4 4 4 1 4 2 ...
##  $ Extraverted   : int  5 4 6 6 3 3 7 4 4 6 ...
##  $ Critical      : int  5 5 5 3 4 5 1 6 6 5 ...
##  $ Dependable    : int  6 6 6 5 6 6 6 6 7 7 ...
##  $ Anxious       : int  6 6 6 4 6 6 5 6 5 4 ...
##  $ NewExperiences: int  7 6 6 6 6 6 7 6 3 6 ...
##  $ Reserved      : int  3 4 6 5 5 5 3 4 7 2 ...
##  $ Sympathetic   : int  7 6 7 6 5 6 3 7 6 6 ...
##  $ Disorganized  : int  6 5 6 5 5 5 3 4 7 5 ...
##  $ Calm          : int  6 7 6 5 6 7 6 3 5 6 ...
##  $ Conventional  : int  3 4 2 3 2 2 2 2 3 3 ...
##  $ Health1       : int  1 1 4 2 1 3 2 3 3 0 ...
##  $ Health2       : int  2 1 1 0 2 1 0 2 2 1 ...
##  $ Health3       : int  2 1 2 0 1 2 3 2 2 1 ...
##  $ Health4       : int  0 2 0 0 1 0 0 1 0 0 ...
##  $ Health5       : int  1 1 1 0 1 1 1 0 3 0 ...
##  $ Health6       : int  0 0 2 0 0 0 1 0 1 0 ...
##  $ Health7       : int  0 0 3 2 0 1 1 2 1 0 ...
##  $ Health8       : int  2 3 4 2 1 1 1 0 2 2 ...
##  $ Health9       : int  2 3 3 1 2 4 4 2 2 3 ...
##  $ Health10      : int  0 0 1 1 0 1 1 1 0 0 ...
##  $ Health11      : int  0 1 2 1 2 0 0 0 2 0 ...
##  $ Health12      : int  0 2 1 2 0 1 1 0 0 0 ...
##  $ Health13      : int  0 2 2 0 2 3 1 1 0 1 ...
##  $ Health14      : int  2 2 3 0 0 2 1 2 2 0 ...
##  $ Health15      : int  2 1 1 0 1 0 0 0 1 0 ...
##  $ Health16      : int  1 1 3 0 2 3 2 1 0 0 ...
##  $ Health17      : int  0 0 0 2 2 3 0 2 1 0 ...
##  $ Health18      : int  1 4 1 1 0 0 0 1 2 0 ...
##  $ Health19      : int  0 2 2 0 0 1 0 0 2 1 ...
##  $ Health20      : int  2 1 2 0 1 1 0 0 0 0 ...
##  $ Health21      : int  1 0 1 1 0 2 0 1 1 1 ...
##  $ Health22      : int  3 1 2 0 2 4 2 2 0 2 ...
##  $ Health23      : int  1 0 3 2 2 0 2 3 2 2 ...
##  $ Health24      : int  0 0 1 1 1 0 0 2 1 0 ...
##  $ Health25      : int  0 3 1 2 2 0 2 0 1 0 ...
##  $ Health26      : int  1 0 0 0 0 0 2 1 1 2 ...
##  $ Health27      : int  2 1 0 1 1 0 0 1 0 1 ...
##  $ Health28      : int  0 3 2 0 1 3 0 2 3 2 ...
##  $ Health29      : int  1 2 1 0 1 1 2 1 1 2 ...
##  $ Health30      : int  0 0 0 2 0 0 0 0 0 0 ...
##  $ Health31      : int  1 0 0 0 0 2 1 0 0 0 ...
##  $ Health32      : int  2 1 2 1 2 2 2 1 2 0 ...
##  $ Dep1          : int  0 0 2 0 1 1 1 1 0 0 ...
##  $ Dep2          : int  0 1 0 0 0 0 0 0 1 2 ...
##  $ Dep3          : int  0 1 0 0 0 0 1 0 2 2 ...
##  $ Dep4          : int  1 0 1 1 0 0 0 1 1 0 ...
##   [list output truncated]

Facebook$ID<-as.character(Facebook$ID)
Facebook$gender<-factor(Facebook$gender, labels=c("Male","Female"))
Rumination<-Facebook[,3:24]
Savoring<-Facebook[,25:48]
SatwithLife<-Facebook[,49:53]
CHIPS<-Facebook[,64:95]
CESD<-Facebook[,96:111]
Facebook$RRS<-rowSums(Facebook[,3:24])
Facebook$RRS_D<-rowSums(Facebook[,c(3,4,5,6,8,10,11,16,19,20,21,24)])
Facebook$RRS_R<-rowSums(Facebook[,c(9,13,14,22,23)])
Facebook$RRS_B<-rowSums(Facebook[,c(7,12,15,17,18)])
# Reverse-code an item: (max + min) - x swaps the high and low ends of the scale
reverse<-function(max,min,x) {
  y<-(max+min)-x
  return(y)
}
Facebook$Sav2R<-reverse(7,1,Facebook$Sav2)
Facebook$Sav4R<-reverse(7,1,Facebook$Sav4)
Facebook$Sav6R<-reverse(7,1,Facebook$Sav6)
Facebook$Sav8R<-reverse(7,1,Facebook$Sav8)
Facebook$Sav10R<-reverse(7,1,Facebook$Sav10)
Facebook$Sav12R<-reverse(7,1,Facebook$Sav12)
Facebook$Sav14R<-reverse(7,1,Facebook$Sav14)
Facebook$Sav16R<-reverse(7,1,Facebook$Sav16)
Facebook$Sav18R<-reverse(7,1,Facebook$Sav18)
Facebook$Sav20R<-reverse(7,1,Facebook$Sav20)
Facebook$Sav22R<-reverse(7,1,Facebook$Sav22)
Facebook$Sav24R<-reverse(7,1,Facebook$Sav24)
Facebook$SBI<-Facebook$Sav2R+Facebook$Sav4R+Facebook$Sav6R+
  Facebook$Sav8R+Facebook$Sav10R+Facebook$Sav12R+Facebook$Sav14R+
  Facebook$Sav16R+Facebook$Sav18R+Facebook$Sav20R+Facebook$Sav22R+
  Facebook$Sav24R+Facebook$Sav1+Facebook$Sav3+Facebook$Sav5+
  Facebook$Sav7+Facebook$Sav9+Facebook$Sav11+Facebook$Sav13+Facebook$Sav15+
  Facebook$Sav17+Facebook$Sav19+Facebook$Sav21+Facebook$Sav23
# The even-numbered items are the ones reverse-coded above (the negatively
# worded items), so the odd items form the positive subscale
Facebook$SavPos<-Facebook$Sav1+Facebook$Sav3+Facebook$Sav5+
  Facebook$Sav7+Facebook$Sav9+Facebook$Sav11+Facebook$Sav13+Facebook$Sav15+
  Facebook$Sav17+Facebook$Sav19+Facebook$Sav21+Facebook$Sav23
Facebook$SavNeg<-Facebook$Sav2R+Facebook$Sav4R+Facebook$Sav6R+
  Facebook$Sav8R+Facebook$Sav10R+Facebook$Sav12R+Facebook$Sav14R+
  Facebook$Sav16R+Facebook$Sav18R+Facebook$Sav20R+Facebook$Sav22R+
  Facebook$Sav24R
Facebook$Anticipating<-Facebook$Sav1+Facebook$Sav4R+Facebook$Sav7+
  Facebook$Sav10R+Facebook$Sav13+Facebook$Sav16R+Facebook$Sav19+Facebook$Sav22R
Facebook$Moment<-Facebook$Sav2R+Facebook$Sav5+Facebook$Sav8R+
  Facebook$Sav11+Facebook$Sav14R+Facebook$Sav17+Facebook$Sav20R+Facebook$Sav23
Facebook$Reminiscing<-Facebook$Sav3+Facebook$Sav6R+Facebook$Sav9+
  Facebook$Sav12R+Facebook$Sav15+Facebook$Sav18R+Facebook$Sav21+Facebook$Sav24R
Facebook$SWLS<-rowSums(SatwithLife)
Facebook$CritR<-reverse(7,1,Facebook$Critical)
Facebook$AnxR<-reverse(7,1,Facebook$Anxious)
Facebook$ResR<-reverse(7,1,Facebook$Reserved)
Facebook$DisR<-reverse(7,1,Facebook$Disorganized)
Facebook$ConvR<-reverse(7,1,Facebook$Conventional)
Facebook$Extraversion<-(Facebook$Extraverted+Facebook$ResR)/2
Facebook$Agree<-(Facebook$CritR+Facebook$Sympathetic)/2
Facebook$Consc<-(Facebook$Dependable+Facebook$DisR)/2
Facebook$EmoSt<-(Facebook$AnxR+Facebook$Calm)/2
Facebook$Openness<-(Facebook$NewExperiences+Facebook$ConvR)/2
Facebook$Health<-rowSums(CHIPS)
Facebook$Dep4R<-reverse(3,0,Facebook$Dep4)
Facebook$Dep8R<-reverse(3,0,Facebook$Dep8)
Facebook$Dep12R<-reverse(3,0,Facebook$Dep12)
Facebook$CESD<-Facebook$Dep1+Facebook$Dep2+Facebook$Dep3+Facebook$Dep4R+Facebook$Dep5+
  Facebook$Dep6+Facebook$Dep7+Facebook$Dep8R+Facebook$Dep9+Facebook$Dep10+Facebook$Dep11+
  Facebook$Dep12R+Facebook$Dep13+Facebook$Dep14+Facebook$Dep15+Facebook$Dep16
library(dataMaid)

## Loading required package: ggplot2
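The scoring code above spells out each reversed item by hand. As a sketch of a more compact alternative (not the code I used), the same reverse() rule can be applied to several columns at once and combined with rowSums. The helper score_scale and the tiny data frame toy are hypothetical.

```r
# Same reverse-coding rule as above: (max + min) - x
reverse <- function(max, min, x) (max + min) - x

# Hypothetical helper: reverse-code the listed items, then sum all items per row
score_scale <- function(df, items, reversed = character(0), max, min) {
  scored <- df[, items, drop = FALSE]
  if (length(reversed) > 0) {
    scored[reversed] <- lapply(scored[reversed], function(x) reverse(max, min, x))
  }
  rowSums(scored)
}

# Hypothetical two-respondent example on a 1-7 metric
toy <- data.frame(Sav1 = c(7, 2), Sav2 = c(1, 6), Sav3 = c(6, 3), Sav4 = c(2, 5))
score_scale(toy, items = c("Sav1", "Sav2", "Sav3", "Sav4"),
            reversed = c("Sav2", "Sav4"), max = 7, min = 1)
```

For example, score_scale(Facebook, items = paste0("Sav", c(1, 4, 7, 10, 13, 16, 19, 22)), reversed = paste0("Sav", c(4, 10, 16, 22)), max = 7, min = 1) should reproduce the Anticipating score computed above.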

To generate an overview report, we'll use the makeDataReport function, which gives you a great overview of your dataset, with summary statistics, some analysis to assist with data cleaning, and customization options. And you can very easily add codebook information to your data report once it's in R Markdown format.

The makeDataReport function has many arguments, which you can read all about here. Today, we'll focus on some of the key arguments in the function. The very first argument is the dataset you want to perform the function on, in this case Facebook. You could run the function with only this argument if you wanted, and this might be fine if you have a small dataset and want to take a quick look.

Next, use output to define what kind of file you want the function to produce: "pdf", "word", or "html". The default, NULL, picks a format based on three checks: 1) if a LaTeX installation is available, you get a PDF; 2) if not, and the computer runs Windows, you get a Word document; 3) if neither is true, you get HTML. After that is render, which controls whether R renders the output file you selected and saves it. If you plan on making changes to the R Markdown file after it is created, set render to FALSE. The .Rmd file will then automatically open for you to edit and will be saved in your working directory.
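The three-step fallback for the default output can be sketched in a few lines. This is a hypothetical helper written to mirror the description above, not dataMaid's internal code; in particular, looking for pdflatex on the PATH is only a rough stand-in for "a LaTeX installation is available."

```r
# Hypothetical helper mirroring the three checks described above
# (not dataMaid's own code; the LaTeX check is a rough proxy)
choose_report_format <- function(output = NULL) {
  if (!is.null(output)) {
    # An explicit choice wins, as long as it is one of the three formats
    return(match.arg(output, c("pdf", "word", "html")))
  }
  has_latex  <- nzchar(Sys.which("pdflatex"))            # 1) LaTeX available?
  on_windows <- identical(.Platform$OS.type, "windows")  # 2) Windows?
  if (has_latex) "pdf" else if (on_windows) "word" else "html"  # 3) else HTML
}

choose_report_format("html")  # an explicit choice is returned as-is
choose_report_format()        # falls back to the LaTeX / Windows / HTML checks
```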

You can also specify a file name with file=, and/or a volume number (for updated reports) with vol=#. If you've already generated a report and try to create a new one without specifying a new file name, you'll get an error; you can override that and force dataMaid to save over the existing file with replace=TRUE.

Putting some of these arguments together generates a report that includes all variables in the dataset, with default summaries and visualizations.

makeDataReport(Facebook, output="html", render=FALSE, file="report_Facebook.Rmd")

You can take a look at the example report I generated here. In fact, running this report revealed some problems with the simulated dataset. Instead of using R to generate it, I used Winsteps (the Rasch program I use), and I didn't notice until now that some of the items have values out of range for the rating scale used, such as Rumination item scores greater than 3. Thanks to this report, I identified (and fixed) these problems, then updated the simulated dataset available here. (This is the same link used in previous posts; regardless of where you're clicking from, it will take you to the updated file.) I also noticed that I miscoded gender when creating that factor, switching Male and Female, so I can go back up to that code and fix that as well.
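A quick range check like the one the report surfaced can also be run directly in R. The helper out_of_range and the toy data below are hypothetical, and the 0 to 3 range is an assumption based on the Rumination example above.

```r
# Hypothetical helper: for each item column, count values outside [min, max]
out_of_range <- function(df, items, min, max) {
  vapply(df[items], function(x) sum(x < min | x > max, na.rm = TRUE), integer(1))
}

# Hypothetical toy data, assuming rumination items should fall between 0 and 3
toy <- data.frame(Rum1 = c(3, 2, 4), Rum2 = c(1, 0, 2), Rum3 = c(-1, 3, 3))
bad <- out_of_range(toy, c("Rum1", "Rum2", "Rum3"), min = 0, max = 3)
bad                  # count of out-of-range values per item
names(bad)[bad > 0]  # items needing attention
```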

But a report with every single variable might be more than I actually want. I may want to include only certain variables, such as those that appear problematic based on the built-in cleaning analysis dataMaid does. I can set that with onlyProblematic=TRUE:

makeDataReport(Facebook, output="html", render=FALSE, file="report_Facebook_sum.Rmd",
               onlyProblematic = TRUE)

You can check out that report here.

I always used "render=FALSE" so I could make changes to the report before knitting it to an HTML document. Since this opens an R Markdown file, I could add whatever information I wanted to the report, just by clicking to that point in the document and typing it in. For instance, I could add a note on the reverse function I used to reverse-code variables. I could include code for how I defined the gender factor - just to make certain I don't do it wrong again! You can add code that is only displayed (rather than run) by putting backticks (`) on either side of it. I generated one last data report, using the cleaned data with the gender factor correctly coded, and added some codebook details. You can take a look at that document here.
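Here's a sketch of what that gender-factor note might contain. factor() pairs labels with the sorted unique values, so stating the levels explicitly makes the code-to-label mapping visible. The codes below are made up, assuming 0 = men and 1 = women as in the earlier descriptives.

```r
# Hypothetical raw codes, assuming 0 = men and 1 = women
gender_raw <- c(1, 1, 0, 1, 0)

# With labels alone, factor() sorts the unique values (0 before 1) and
# pairs the labels with them in that order, which is easy to get backwards
implicit <- factor(gender_raw, labels = c("Male", "Female"))

# Stating levels explicitly documents exactly which code gets which label
explicit <- factor(gender_raw, levels = c(0, 1), labels = c("Male", "Female"))

table(raw = gender_raw, label = explicit)  # cross-check codes against labels
```

Running the cross-check table after creating a factor is a cheap way to catch a swapped mapping before it reaches a report.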