Deeply Trivial: October 2017

Tuesday, October 31, 2017

Jack O'Lantern Sun

In honor of Halloween, NASA released this picture of the sun - which bears a striking resemblance to a Jack O'Lantern - from October 2014:

Happy Halloween, everyone!

Happy Halloween!

It's my favorite holiday! My costume today is a pun that only makes sense if you know my maiden name. How will you be celebrating?

In honor of the day, Google's Doodle is a cute cartoon - enjoy:

Monday, October 30, 2017

While taking a break from work yesterday, I created a word tracking template for NaNoWriMo. It's a really simple Excel template, enhanced with some basic logic statements. This lets me make goals for each day during the challenge (I made different goals depending on what's going on in my life on a particular day) and make sure I'm on track, and even gives me words of encouragement for different word count milestones.

In case this template sounds useful to you, I've added it as a shared file in my Dropbox. Find it here. The document includes the template plus a second sheet explaining how everything works. Enjoy!

Sunday, October 29, 2017

Statistics Sunday: Likelihood versus Probability

I recently finished reading Inference and Disputed Authorship: The Federalist by Frederick Mosteller and David L. Wallace. This book, which details the study Mosteller and Wallace did to determine (statistically) who authored the disputed Federalist papers, was highly recommended as a good primer on not only authorship studies but Bayesian inference.

As is often the case, this book frequently used a term I see in many statistical texts: likelihood. I breezed over this word multiple times (as I usually do) before I finally stopped and really considered it. (Which I do now, usually before saying, "Ooo, would this make a good blog post?")

Likelihood is a term used often in statistics, but I realized I wasn't completely clear on this concept, or rather, how it differed from related concepts, like odds or probability. If it meant the same thing as, say, probability, why use this other term? And why use it in seemingly specific ways?

It turns out that these terms are, in fact, different from each other, but they reflect related concepts. As with many statistics concepts (like degrees of freedom - see posts here and here), there are simple ways to describe these concepts, and more difficult ways.

First, the simple way. Probability deals with the chance that something will happen. That is, it is generated beforehand. When I flip a fair coin, I know the probability of heads is 50%. And I can use what I know about probability to determine the chance of a certain string of events. Likelihood deals with something that already happened, and gets at the inference for the thing that happened. So if I flip a coin 20 times and get heads each time, you might want to discuss the likelihood that I'm using a fair coin. (Maybe I'm using a double-headed coin, for instance.)

Now, the more complex way. Likelihood is specifically related to our use of data to derive underlying truths. Remember that when we conduct statistical analyses, we're often using sample data to estimate underlying population values (parameters). We may never know what those parameters actually are, because we can't measure everyone in the population, or because of measurement error, or any number of explanations. We can only estimate those values.

We know that sample data can be biased in a number of ways, and we have different corrections to help us turn those sample values (statistics) into estimated population values (parameters). We want to make sure that we estimate population values that make sense with our sample data. We're never going to get sample data that exactly matches population values, so there will be margins of error, but we want our sample data to have a high chance of occurring given our estimated population value is correct. This concept - the chance of observing our sample data given the estimated population value is correct - is likelihood.
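To make the coin example concrete, here's a minimal sketch in Python. It assumes the flips are independent and follow a binomial model, and the function name and the two candidate coins are just illustrative choices. It asks: given that we saw 20 heads in 20 flips, how well does each candidate value of P(heads) explain that data?

```python
from math import comb

def binomial_likelihood(p, heads, flips):
    """Chance of observing `heads` heads in `flips` flips if P(heads) = p."""
    return comb(flips, heads) * p**heads * (1 - p)**(flips - heads)

# Observed data: 20 flips, all heads (the example from the post).
flips, heads = 20, 20

# Two candidate "population values" for P(heads); both are illustrative.
candidates = {"fair coin": 0.5, "double-headed coin": 1.0}

likelihoods = {
    name: binomial_likelihood(p, heads, flips)
    for name, p in candidates.items()
}

for name, value in likelihoods.items():
    print(f"{name}: likelihood = {value:.8f}")
```

The data stay fixed and the candidate parameter changes, which is the difference from a probability calculation. The fair coin isn't impossible here, it's just a much poorer explanation of 20 straight heads than the double-headed coin.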

In the coming weeks, check back for a post discussing an application of this concept of likelihood: maximum likelihood estimation!

Saturday, October 28, 2017

Working This Weekend

We're on a tight schedule with a project at work, so I'm putting in lots of hours this weekend. (Don't worry, I'll make up for it by taking comp time next week). I've been using SPSS to create datasets for psychometric analysis and, though I told myself initially this probably wasn't necessary, I forced myself to create a syntax file for everything I do to the original data. And as I realized I missed an important step and had to start over, I was thankful I'd done that.

So, for some stats humor, I've created the following chart as a reminder to myself why I (and you, if you're doing something similar) should take that extra time to create a syntax file:

NaNoWriMo Resources

Next Wednesday, I'll begin writing my novel for National Novel Writing Month. It's still not too late to join in on this crazy fun! If you're on the fence about doing NaNoWriMo, or you're new to this whole process, here are some of the resources I'll be drawing upon this November:

Baby name websites, like this one - Yes, really. Super helpful when figuring out names for characters. Sure, sometimes I'll create a character and just know instantly what to name them, but since there are certain letters I like more than others, this can lead me to name multiple characters with similar names. These lists help pull me out of the letter rut and are great for throwaway characters that still need a name. And here's another site for surnames.
Color thesaurus, like this one - For when you want a better word than just "blue"
Personality descriptions or free tests, like this one - I've started collecting information about different personality types. At first, I focused on ENFJ (my type) as a method of self understanding*, but I've started branching out and reading about other types, because I realized it could help my writing. Some authors even go so far as to take a personality test as their character. I'm not sure I'll do that, and if I do, it will probably only be for one or two main characters. But whatever helps you understand your character will enhance your writing and advance your plot. If this is something that interests you, here's an article on using the MBTI to create characters.
Articles about writing believable antagonists, such as this one - Sure, you could have the crazy "tie them to the railroad tracks with a death ray pointed at them" type, but in some of the best writing, the antagonist is someone the reader can understand, maybe even identify with. For my backup NaNoWriMo idea, I really struggled with how to explain the villain and give her an understandable motivation for her actions. If I ultimately decide to go with that idea, I'm still going to need a little help/work to create that character.
Sites to help with... you know... when the word you want is right there but you... Here's a great site for when that word you want is on the tip of your tongue - This site is freaking brilliant.
Spotify - Lots of writers create writing soundtracks. I didn't always do this, but recently, I've noticed while listening to music that a song will remind me of a character or set a scene I'm visualizing. So I've started saving those songs to a playlist for later consumption. It's a great short-hand for getting you back into the mindset you were in when you first thought about the character or scene, which is often where your best ideas come from.
A calendar to note word count goals - The recommendation is to shoot for 1667 words per day, which gets you to 50,000 by the end of November. But some days will be better than others. Though I don't always know in advance how my mood will affect word count, I do know in advance what days might be better for writing than others. Weekends will generally be better than weekdays, days I'm traveling (and have downtime on the plane) will generally be better than days I need to run errands, and so on. So I can go through the month and set different word count goals for each day. Then I won't beat myself up for not hitting 1667 on a day I knew in advance would be busy. Relatedly, here's an article on writing a novel while still living a full-time life.
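If you'd rather script the calendar idea than fill it in by hand, here's a rough sketch. The weights are an assumption for illustration (weekends count double, weekdays count once), and `daily_goals` is a made-up helper, not part of any NaNoWriMo tool. It splits 50,000 words across the 30 days of November 2017 in proportion to those weights.

```python
from datetime import date

TARGET = 50_000  # NaNoWriMo goal
DAYS = 30        # days in November

def daily_goals(weights, target=TARGET):
    """Split `target` words across days in proportion to integer weights."""
    total_weight = sum(weights)
    goals = [target * w // total_weight for w in weights]
    # Rounding down leaves a few words unassigned; hand them out one
    # at a time, starting with the most heavily weighted days.
    leftover = target - sum(goals)
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    for i in order[:leftover]:
        goals[i] += 1
    return goals

# Assumed weights: Saturdays and Sundays get 2, weekdays get 1.
weights = [
    2 if date(2017, 11, day).weekday() >= 5 else 1
    for day in range(1, DAYS + 1)
]
goals = daily_goals(weights)

for day, goal in enumerate(goals, start=1):
    print(f"Nov {day:2d}: {goal} words")
```

Swap in your own weights for travel days, busy days, and so on. The total still comes out to exactly 50,000.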

And yes, a book written in a month is likely going to be terrible, but (and this could be inspiring or intimidating) Kazuo Ishiguro, who was recently awarded the Nobel Prize in Literature, wrote The Remains of the Day in just 4 weeks. So, you know, no pressure.

You can find many of the online resources I'll be using on my NaNoWriMo Pinterest board - note that you don't have to be on Pinterest to view this board and its links, but if you are on Pinterest, I highly recommend creating a board for your book, where you can post articles you find helpful (whether about writing or related to the topic of your book), pictures of people your characters look like (great for describing appearance), and anything else you find inspiring.

BTW, if you're participating in NaNoWriMo, feel free to add me as a writing buddy!

*I took the Myers Briggs in college and got ENTJ. I forgot about this result (found it again, very recently) and over the years, began thinking the Myers Briggs was BS. I took it again recently and got ENFJ. Since then I've been 1) shocked/amazed at how accurate this type is for me, and 2) really shocked/amazed (when I found my test results from freshman year) that I have such a consistent personality over the span of 17 years.

Friday, October 27, 2017

This is How Rumors Get Started

After seeing yet another email forward conspiracy theory from his parents, a friend of mine simply started responding with a link to Snopes. Every. Time. And I've certainly encouraged my relatives to do some research on a fact-checker site like Snopes before passing something on.

The thing is, we're all guilty of believing stories with no basis in reality, and recent efforts by social media sites like Facebook and Twitter have done little to stave off misinformation. In fact, there's every reason to believe that Facebook and Twitter make the issue worse:

For all the suspicions about social media companies’ motives and ethics, it is the interaction of the technology with our common, often subconscious psychological biases that makes so many of us vulnerable to misinformation, and this has largely escaped notice.

Skepticism of online “news” serves as a decent filter much of the time, but our innate biases allow it to be bypassed, researchers have found — especially when presented with the right kind of algorithmically selected “meme.”

At a time when political misinformation is in ready supply, and in demand, “Facebook, Google, and Twitter function as a distribution mechanism, a platform for circulating false information and helping find receptive audiences,” said Brendan Nyhan, a professor of government at Dartmouth College (and occasional contributor to The Times’s Upshot column).

Digital social networks are “dangerously effective at identifying memes that are well adapted to surviving, and these also tend to be the rumors and conspiracy theories that are hardest to correct,” Dr. Nyhan said.

One reason is the raw pace of digital information sharing, he said: “The networks make information run so fast that it outruns fact-checkers’ ability to check it. Misinformation spreads widely before it can be downgraded in the algorithms.”

This is especially problematic when you consider recent survey research finding that two-thirds of Americans get at least some news from social media.

Thursday, October 26, 2017

Stuff to Read Later

Here are my currently open tabs that I'm hoping to read at some point today:

Actress and writer Brit Marling talks about why gender inequality feeds into rape culture and complicates the issue of consent
Kristen Kieffer at Well-Storied.com offers some advice on rocking NaNoWriMo this year, including a great tip - have a back-up story in case you run out of steam on your chosen project; I'm way ahead of you Kristen
Google engineer Felix Krause shows that iPhone apps with camera permission can surreptitiously record or photograph you, with either front or back camera
In India, a near-complete ichthyosaur (a marine reptile) was found
The Forté Handbell Quartet plays an incredible handbell rendition of the Hallelujah chorus; considering all the running involved, we'll forgive them for a slightly over-the-top ritardando near the end
And finally, just for fun, Deschutes offers beer pairings for your favorite Halloween candy

Wednesday, October 25, 2017

Statistical Sins: Throw Some Psychometrics At It

Around the time I started learning about social media data mining approaches, I began subscribing to a newsletter from a social media analysis group called Simply Measured. I've shared some of these resources with friends who oversee social media for different nonprofits, and I occasionally do learn something from these newsletters. I was excited when I saw the top story for their recent newsletter was using psychometrics in marketing.

And then confused when I read the one-line summary of the blog post: Learn how psychological motivations can help you build compelling content and drive more sales.

Wait, do they think psychometrics means "measuring something psychological"? I decided to read the post. Here's how they define the term:

The term refers to ways of measuring an individual’s motivations, preferences, and interests without directly asking, adding a level of authenticity to your data as we observe our audience’s natural behaviors. For this reason, psychometrics is often more accurate than direct questioning.

Yeah, not exactly. You could certainly directly question people about something, then use psychometric analysis approaches to identify the best questions. Ways of measuring something refers to the type of test or measure, which could be paper/pencil, observation, etc.

Later on they say, "Psychometrics is an old method of gathering data." Again, no. Psychometrics is not a data gathering method at all. A survey would be a data gathering method, and you could use psychometrics to create and analyze your survey data.

Psychometrics is basically a set of methods and statistical analyses that are used to develop, validate, and standardize measures. Those measures could be about "motivations, preferences, and interests," but they could also be about ability, personality, presence of symptoms of a physical health condition, and so on. It could be about anything you want to measure, not just about individual people but even about organizations or teams. And psychometric analyses provide tools to assess the validity (measures what it's supposed to) and reliability (measures something consistently) of your measure. Many of these tools are general statistics, like correlation coefficients, that are simply being applied to psychometric research, but other tools, like item difficulty, internal reliability, and item fit, were specifically developed for psychometric applications.
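For a sense of what a couple of these tools look like in practice, here's a small sketch. The response data are made up purely for illustration (6 test-takers, 4 right/wrong items), and I'm using Cronbach's alpha as one common index of internal reliability; it's not the only one. Item difficulty here is the classical version: the proportion of people who got each item right.

```python
from statistics import mean, variance

# Made-up data: each row is one person, each column one item (1 = correct).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
]

def item_difficulty(data):
    """Proportion of respondents answering each item correctly."""
    n_items = len(data[0])
    return [mean(person[i] for person in data) for i in range(n_items)]

def cronbach_alpha(data):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total-score variance)."""
    k = len(data[0])
    item_vars = [variance([person[i] for person in data]) for i in range(k)]
    total_var = variance([sum(person) for person in data])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

difficulty = item_difficulty(responses)
alpha = cronbach_alpha(responses)

print("Item difficulty:", [round(p, 2) for p in difficulty])
print("Cronbach's alpha:", round(alpha, 2))
```

Note that a higher "difficulty" value actually means an easier item, since more people got it right. Item fit, the third tool mentioned above, comes from model-based approaches like Rasch measurement and needs more machinery than fits in a short example.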

But the shark-jumping moment of the post was when they started talking about using psychometrics for online quizzes - those quizzes that promise to tell you "what puppy you should adopt based on the color of your aura" or "which Taylor Swift song you embody when you're depressed" and so on. I mean, I love dumb quizzes as much as the next guy, but I don't ascribe any validity to these measures. And if someone told me they used psychometric analyses to develop one of these quizzes, I would be very confused.

I mean, sure, I guess you could do that. It wouldn't really work, but hey, give it a shot. You see, part of developing a strong measure is establishing content validity. As I mentioned in a previous post on content validation studies, there are different ways to do that, but you need some method of showing the items you have on your measure really do relate to the underlying concept. And I'm going to go out on a limb and say there is no gold standard for assessing your personal Taylor Swift song. Maybe if my content validation study panel consisted of the various Taylor Swifts from the "Look What You Made Me Do" video:

The whole post felt like the author wasn't sure what to write about, found a word on a webpage somewhere, and decided to write an entire post on what he thought that thing was without doing any kind of research into what the thing actually is. Yes, you can absolutely use psychometrics in marketing research. In fact, it would be great if people could demonstrate some validity of the measures they use in marketing research. But you won't learn how to do that from that Simply Measured post. In fact, you'll walk away from the post with a flat-out incorrect conception of what psychometricians actually do.

I'm fully aware that in writing this blog post, I'm basically this guy:

Simply Measured, you done me and my field wrong.

Tuesday, October 24, 2017

Want to Write a Bad Novel with Me?

November is 8 days away! What? When did this happen?

I know, hard to believe. And I'm super excited because I can't wait to start on my novel for NaNoWriMo. In fact, I keep oscillating between two different story ideas, so I'm tempted to sit down in front of my computer November 1st and see what I start typing. Will it be my story of an introverted hero who turns the comic book genre on its head? Or the story of a woman who loses her job, moves back in with her parents, and sees where her childhood friends have ended up (for better or worse)? Or maybe something else entirely...

So I was thrilled when a non-writing newsletter I subscribe to included a link to this blog post by Beth Skwarecki on how she binge writes bad novels every November. And she urges you to do the same - because writing a book in 30 days means you have to adopt some rules: just keep writing, don't edit and don't look back:

50,000 words in a month is 1,667 per day. At an average typing speed, you could finish your day’s quota in an hour. But that assumes you already know what you’re going to write.

After agonizing over the first chapter for days, I realized that I could not make each chapter perfect the first time. I started writing chapters quickly, badly, with notes about what to fix when I had the time. Pretty soon the book was flying along, and I even had time before deadline to go back and edit it to perfection. If I had insisted on polishing each chapter as I wrote it, I never would have finished.

Rewriting might make your first chapter better, but it will not get you any closer to your goal of actually finishing a draft of a novel. As NaNoWriMo founder Chris Baty says, revisiting what you’ve already written is like turning around in the middle of a marathon to try to run the earlier miles better.

So when anybody would ask if a novel written in a month-long haze of caffeine could possibly be any good, I gleefully answered that of course my novel will be terrible!

And that’s why you write another draft. But don’t worry about that now. Edits are for December.

And if you want to ~~procrastinate during your writing~~ learn some tricks of the trade, Beth recommends this blog.

I'll probably be carrying my laptop or at least a notebook with me most of the time in November. I have my morning and evening commute, during which I can write, and I've found some great coffee shops and bars near my job that are quiet enough to hang out and write in during evenings after work.

I'm planning on adding a widget to this blog to show where I am in terms of word count.

Burger King Stands Up to Bullies

I think most of us know what it's like to be bullied at some point in our lives. Sometimes the bullying stops when you reach adulthood, and sometimes it doesn't. That's why this ad from Burger King (yes, really!) is something we all should see:

I'm not usually a fan of the hidden camera approach to anything. Not because I don't think the messages are important; it's because, as a social psychologist, I pretty much know what's going to happen. Everyone will notice the person in trouble, but few will do anything to help. As some of the kids say in the video, standing up to bullies puts them in danger of being a target of the bully themselves. If everyone stood up to bullies, they would quickly be outnumbered. Never be afraid to stand up for someone else.

Monday, October 23, 2017

Statistics Sunday: How Does the Grant Review Process Work?

While I've posted my Statistical Sins posts late a few times, this is the first time I missed Statistics Sunday. Unfortunately, once NaNoWriMo starts up, this will become more likely, so I'm hoping to sit down in the coming days and write a few Statistics Sunday posts ahead of time. There's still time for suggestions if there are any topics you want to see here!

Today, I thought I'd follow up on a post from the other day, about a proposal for a new grant review process in the National Science Foundation, a federal agency that supports scientific research in many areas. Though NSF is funded by taxpayer money, it is an independent agency, which is meant to shield it from political influence. And in an America where scientific issues, like vaccination and climate change, become more and more politicized, that division is very important.

I realized, as I was reading about this proposal, that people might not be familiar with how peer review occurs for grants. So I thought I'd quickly outline what that process looks like. I should add a disclaimer that the process I'm about to outline is how things are done for Veterans Affairs merit-based grant reviews, but I'm told the process is similar to what they do in the National Institutes of Health (NIH) (and I would imagine NSF as well).

First, within any institute awarding grants, there are multiple panels. These panels reflect different themes within that area of research. Experts in that area of research are invited to serve on these panels; this is not their full-time job, but a service they do for their profession. Usually, they are researchers themselves, but they might also be practitioners in the field. Each panel also has a program manager (or similar title) who oversees the review process, organizes meetings, assigns proposals, and so on.

When you submit a grant, you can often request which panel your proposal goes to. This information will be taken into account, but there is absolutely no guarantee your proposal will go to the panel you request. If there are too many conflicts of interest (people on the panel who have to recuse themselves when it comes to discussing your proposal), or the proposal doesn't seem to fit that panel, it would probably go somewhere else.

Once those decisions are made, the program manager will send the information on the proposals to the members of the panel. Panel members tell the program manager which proposals they cannot review (because of a conflict of interest). The program manager then assigns each proposal a primary, secondary, and tertiary reviewer.

These reviewers read the proposal ahead of time, and use a scoring template to assign ratings to each part of the proposal. They also note strengths and weaknesses of the proposal. Basically the job of these reviewers is to know the proposal very well. And here's why:

When the full panel meets, it is the job of those three reviewers to present the research to the rest of the panel. (Note: Anyone who has to recuse themselves wouldn't be involved in the discussion.) The primary reviewer will go first, being given the most time to summarize the proposal and then give his or her initial assessment. The secondary reviewer is then asked if he or she has anything to add to the primary reviewer's summary and assessment. The tertiary reviewer goes last, once again being asked for additional comments. The secondary reviewer gets less time than the primary reviewer, and the tertiary reviewer less time than the secondary.

Next, the full panel discusses the project, asking questions of the reviewers (such as, "Does the proposal talk about X?") and discussing the strengths and weaknesses of the proposal. While it seems the opinion of the three reviewers would have a lot of weight when it comes to decisions about whether to fund the proposal (spoiler alert: funding decisions aren't up to anyone on the panel), the rest of the panel can (and do) identify strengths and (more likely) weaknesses the reviewers missed. The reviewers are able to change their scores and the comments in their review as a result of the discussion.

The panel does not decide which proposals get funded. Let me say that again, for emphasis: The panel does not choose which proposals will receive funding. They have a good idea which proposals they think should be funded or not, but that's not their call. Their job is to score each proposal. Decisions about what the funding cut scores are get made later - and by someone else. That is, based on the distribution of scores on the proposals, and how much money is available to award, a score is chosen and anything above that score is funded. Lots of very strong proposals will find themselves on the wrong side of the funding line. And there's nothing anyone on the panel can do about that.
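To illustrate how a funding line can fall out of scores plus a budget, here's a toy sketch. Everything in it is hypothetical: the proposals, scores, costs, and budget are invented, higher scores are assumed to be better, and real agencies may set the cut score differently. The rule used here is simply to fund proposals in score order until the next one no longer fits in the remaining budget.

```python
def funding_line(proposals, budget):
    """proposals: list of (name, score, cost). Returns (cut_score, funded_names)."""
    funded = []
    cut_score = None
    remaining = budget
    # Walk down the list from the best score to the worst.
    for name, score, cost in sorted(proposals, key=lambda p: p[1], reverse=True):
        if cost > remaining:
            break  # the money has run out; everything below is unfunded
        funded.append(name)
        remaining -= cost
        cut_score = score  # lowest score funded so far
    return cut_score, funded

# Invented example data.
proposals = [
    ("Proposal A", 92, 400_000),
    ("Proposal B", 88, 350_000),
    ("Proposal C", 85, 500_000),
    ("Proposal D", 79, 200_000),
]
cut_score, funded = funding_line(proposals, budget=1_000_000)

print("Cut score:", cut_score)
print("Funded:", funded)
```

In this made-up example, Proposal C is a strong proposal that lands on the wrong side of the line purely because of how much money was available, which is the situation described above.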

My point is that, despite what people outside of the research and/or soft money settings think, the possibility of abuse of the system is small. If I wanted to guarantee my project is funded through influence, I'd have to 1) make sure my proposal makes it to the panel I want, 2) make sure at least one of the people I can influence is a reviewer (and also make sure there won't be any obvious conflicts of interest that anyone else on the panel can point out), 3) get my influenced reviewer to convince the other reviewers to give my proposal a good score, 4) get those reviewers to convince the panel that any flaws in my study aren't dealbreakers (and also make sure no one on the panel is aware that one of my reviewers should have recused him- or herself), and 5) get my proposal scored so that it will be above the funding line, which is not known until later.

Dude, it would just be easier to put my energy into writing a strong proposal.

And if the reviewers give your proposal a bad score to begin with, your proposal likely won't even be discussed in the meeting. The focus will be on those proposals that could be fundable, depending on where the cutoff is set. Program managers take notes during the meeting, and can usually share some information about what was discussed with the principal investigator of the proposal. This information is useful if the PI wants to revise their proposal and submit it at the next grant cycle. There are usually limits to how many times you can submit the same proposal. And if your proposal isn't discussed at the meeting (that is, it was "triaged"), you sometimes can't revise and resubmit. You can likely recycle pieces of the proposal, but you basically have to submit it as a brand new proposal.

There's a great deal of uncertainty when it comes to grant money. This is the main reason I left VA; as much as I loved the people, the mission, and the job itself, I didn't want to live my life from grant cycle to grant cycle. It was a very personal choice, and I have nothing but admiration and respect for my colleagues still fighting the good fight.

Friday, October 20, 2017

Reviewer 2 Could Be Anybody

We all dread getting reviewer 2 on our research articles and grant proposals. Now imagine a world where anyone with an ax to grind could be reviewer 2. Rand Paul wants us to live in that world:

Senate Republicans have launched a new attack on peer review by proposing changes to how the U.S. government funds basic research.

New legislation introduced this week by Senator Rand Paul (R–KY) would fundamentally alter how grant proposals are reviewed at every federal agency by adding public members with no expertise in the research being vetted. The bill (S.1973) would eliminate the current in-house watchdog office within the National Science Foundation (NSF) in Alexandria, Virginia, and replace it with an entity that would randomly examine proposals chosen for funding to make sure the research will “deliver value to the taxpayer.” The legislation also calls for all federal grant applications to be made public.

Paul’s proposed solution starts with adding two members who have no vested interest in the proposed research to every federal panel that reviews grant applications. One would be an “expert … in a field unrelated to the research” being proposed, according to the bill. Their presence, Paul explained, would add an independent voice capable of judging which fields are most worthy of funding. The second addition would be a “taxpayer advocate,” someone who Paul says can weigh the value of the research to society.

I mean, we can give him the benefit of the doubt, since we know the Republicans have no idea what happens when they put someone with zero experience into power.</sarcasm>

I've already commented on how incredibly dangerous it is to allow political parties to serve as gatekeepers to education and the media. Now imagine they also serve as gatekeepers of what scientific research is acceptable. And it easily opens the door to grant decisions that are entirely politically motivated, by giving this new office full veto power. That is, with a flick of a pen, it can overrule any decision made by this committee.

Are you scared yet?

On Victim Blaming, Trust, and the "Me Too" Movement

Yesterday, I finally participated in the "me too" movement, sharing mostly the aftermath of an event from my childhood. It's sad how many people in my life have shared their own "me too" story, and how many of them come from people experiencing a "me too" moment multiple times. In addition to sharing their stories, some have shared their thoughts on the movement in general, and whether (for example) using the same tag for both harassment and assault somehow lessens the experience of survivors. For instance, Kaitlyn Buss writes:

[S]exual harassment and sexual assault are very different things. Even with Harvey Weinstein’s reported abuses, most of the accounts describe uncomfortable advances that women were mostly able to reject.

Conflating harassment and assault insults those who have actually been sexually assaulted. It cheapens the trauma they’ve endured.

But I was a victim of repeated sexual assault as a 4-year-old. I’m someone who should feel empowered by the recent wave of attention. Instead, it feels empty.

Harassment involves words and innuendo. It’s uncomfortable and unfair, and can certainly affect career mobility — as Hollywood’s leading women have now decided to emphasize. But it can be rebutted. It typically doesn’t involve violence or physical force. It takes place on street corners, in offices, bars, movie studios and pretty much anywhere people interact.

Assault, on the other hand, is one of the most brutal experiences a person can endure — at any age and in any situation.

I'll admit, when I first saw the "me too" stories, I was initially frustrated for many of the same reasons Buss highlights. Not because I was disappointed that all these women hadn't experienced an assault - this is something I would never wish on anyone, even my worst enemy - but because I worried that people wouldn't understand why my experience of a lewd comment or tasteless joke was so different. Why a street harasser might be an annoying inconvenience for some, but an event that triggers symptoms of years of undiagnosed PTSD for others.

At the same time, the point is not who has had the worse experience. It isn't a competition. What happened to me was horrible. But worse things have happened to thousands of others. Making people justify why their experience is worse than others is nothing more than thinly veiled victim blaming. It's forcing men and women to explain why their experience was most egregious, which usually translates to least preventable and therefore, not the victim's fault. But regardless of what the victim was wearing, consuming, saying, or doing, it isn't the victim's fault that another person took away their autonomy.

And as others point out, sexual harassment is a part of rape culture. If we normalize that, it becomes more and more difficult to draw the line between innocuous and offensive. Instead of empowering women that they don't have to stand for that treatment, we're empowering abusers to keep pushing the line until it ultimately breaks.

In light of the movement, a post from 2014 is making the rounds again: Men Just Don't Trust Women. And This is a Problem by Damon Young. I highly encourage you to read the whole thing, but here's one section that really struck me:

The theme that women’s feelings aren’t really to be trusted by men drives (an estimated) 72.81% of the sitcoms we watch, 31.2% of the books we read, and 98.9% of the conversations men have with other men about the women in their lives. Basically, women are crazy, and we are not. Although many women seem to be very annoyed by it, it’s generally depicted as one of those cute and innocuous differences between the sexes.

And perhaps it would be, if it were limited to feelings about the dishes or taking out the garbage. But, this distrust can be pervasive, spreading to a general skepticism about the truthfulness of their own accounts of their own experiences. If women’s feelings aren’t really to be trusted, then naturally their recollections of certain things that have happened to them aren’t really to be trusted either.

This is part of the reason why it took an entire high school football team full of women for some of us to finally just consider that Bill Cosby might not be Cliff Huxtable. It’s how, despite hearing complaints about it from girlfriends, homegirls, cousins, wives, and classmates, so many of us refused to believe how serious street harassment can be until we saw it with our own eyes. It’s why we needed to see actual video evidence before believing the things women had been saying for years about R. Kelly.

If we want to stop the spread of rape culture... If we want to empower survivors to come forward... If we want to weed out the abusers in schools, and churches, and scout troops... We need to believe people's experiences are valid. We need to trust that their feelings are not overreactions.

Thursday, October 19, 2017

Know Your Friend Limits

In the early 1990s, anthropologist Robin Dunbar suggested that there was an upper limit to the number of people one can include in his or her social circle. That number, known as Dunbar's number, is typically about 150. This limit, he argues, is based on brain size. There's a limit to how many people we can keep on our mind.

In more recent years, Dunbar has taken into account strength of emotional connection, which helped him identify circles of friends, from those we are closest to emotionally (5, our best friends), the next layer (10, close friends), the next (35, people we encounter regularly but aren't especially close to), and the last layer, those we are least close to (100, acquaintances).

Today, a friend shared some recent research that was able to use mobile phone records to test these numbers in real-world settings. A summary of the research is available here. To conduct their analysis, they used records of about 6 billion calls involving 35 million people from an unnamed European country, and used the following to help isolate their sample:

To screen out business calls and casual calls, Dunbar and co include only individuals who make reciprocated calls and focus on individuals who call at least 100 other people. That screens out people who do not regularly use mobile phones to call social contacts.

That leaves some 27,000 people who call on average 130 other people. Each of these people make 3,500 calls per year, about 10 a day.

They used a computer algorithm to look at clusters of results, patterns of calling that would coincide with different levels of closeness. The average size of the inner circle was 4.1, not far off from the 5 estimated above. In fact:

[T]he team says the average cumulative layer turns out to hold 4.1, 11.0, 29.8, and 128.9 users.

“These numbers are a little smaller than the conventional numbers for Dunbar layers, but within their natural range of variation,” they say. The numbers could be smaller because mobile phone data captures only a portion of a person’s total social interactions.

The team also finds some evidence of an extra layer among some people. “This could, for example, mean introverts and extroverts have a different number of layers of friends,” they suggest. But interestingly, extroverts, while having more friends, still have a similar number of layers.
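One thing that is easy to miss when comparing these figures: the study reports cumulative layer sizes, while the layers listed earlier in this post (5, 10, 35, 100) are the sizes of each layer on its own. Here is a minimal sketch of the conversion in Python. The variable names are made up for illustration; the numbers come from the post and the quoted summary.

```python
from itertools import accumulate

# Per-layer sizes as listed earlier in this post.
conventional_layers = [5, 10, 35, 100]
# Running totals, so each value includes all closer layers.
conventional_cumulative = list(accumulate(conventional_layers))

# Cumulative layer sizes reported in the mobile phone study.
observed_cumulative = [4.1, 11.0, 29.8, 128.9]
# Differences between successive totals give the size of each layer alone.
observed_layers = [
    round(current - previous, 1)
    for previous, current in zip([0.0] + observed_cumulative, observed_cumulative)
]

print(conventional_cumulative)
print(observed_layers)
```

Put on the same footing, the observed totals can be compared with the conventional ones layer by layer.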

So don't feel bad if you see someone who seems more popular or has more friends. Chances are, their friend circles are quite similar to yours.

Wednesday, October 18, 2017

Statistical Sins: Know Your Variables (A Confession)

We all have the potential to be a statistical sinner; I definitely have been on more than one occasion. This morning, I was thinking about a sin I committed about a year ago at Houghton Mifflin Harcourt. So this is a confessional post.

We were working on a large language survey, involving 8 tests, one of which was new. This is an individually-administered battery of tests, meaning a trained individual gives the test one-on-one to the examinee. Questions are read aloud and the examinee responds either verbally or in writing. Each test only has one set of questions, and is adaptive: the set of questions the examinee receives depends on their pattern of correct answers. If they get the first few questions right, they go on to harder questions, but if they get the first few wrong, they go back in the book to easier questions. The test ends when the examinee gets a certain number incorrect in a row or reaches the end of the book (whichever comes first).

When giving the test, the administrator won't always start at the beginning of the book. Those are the easiest questions, reserved for the youngest/lowest ability test-takers. Each test has recommended starting places, usually based on age, but the administrator is encouraged to use his or her knowledge of the examinee (these tests are often administered by school psychologists, who may have some idea of the examinee's ability) to determine a starting point.

We had one brand new test and needed to generate starting points, since we couldn't use starting points from a previous revision of the battery. We decided, since this new test was strongly related to another test, to generate recommended starting points based on their raw score on this other test. We knew we would need a regression-based technique, but otherwise, I was given complete control over this set of analyses.

After generating some scatterplots, I found the data followed a pretty standard growth curve, specifically a logistic growth curve:

So standard linear regression would not work, because of the curve. We would deal with this in regression by adding additional terms (squared, cubed, and so on) to address the curve.

But the data violated another assumption of regression, even polynomial regression: the variance was not equal (or even approximately equal) across the curve. There was substantially more variation in some parts of the curve than others. In statistical terms, we call this heteroscedasticity. I did some research and found a solution: quantile regression. It's a really cool technique that is pretty easy to pick up if you can understand regression. Essentially, quantile regression allows for different starting points (constants) and slopes depending on the percentile of the individual data point. You can set those percentiles at whatever value you would like. And quantile regression does not assume equal variance, so heteroscedasticity is not a problem. I read some articles, learned how to do the analysis in R (using the quantreg package), and away I went.
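To give a feel for what quantile regression is doing under the hood, here is a minimal Python sketch of the "check" (or pinball) loss it minimizes, applied to the simplest possible case: a model with only a constant. This is not the analysis described in this post, which was run in R with quantreg; the function names and the toy scores are made up for illustration.

```python
def pinball_loss(residual, tau):
    """Check loss: under-predictions cost tau, over-predictions cost (1 - tau)."""
    return tau * residual if residual >= 0 else (tau - 1) * residual


def total_loss(values, candidate, tau):
    """Sum of the check loss when every value is predicted by one constant."""
    return sum(pinball_loss(v - candidate, tau) for v in values)


def best_constant(values, tau):
    """Brute-force search; a minimizer always exists among the data points."""
    return min(sorted(values), key=lambda c: total_loss(values, c, tau))


# Toy scores with one extreme value (hypothetical data).
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50]
median_fit = best_constant(scores, 0.5)
upper_fit = best_constant(scores, 0.9)
print(median_fit, upper_fit)
```

Changing tau moves the fitted constant to a different percentile of the data. A full quantile regression does the same thing while also fitting slopes, which is why each percentile can have its own constant and slope.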

I was so proud of myself.

We decided to use raw score instead of scale score for the starting points. These tests were developed with the Rasch measurement model, but the test administrator would only get approximate scale score from the tables in the book. Final scores, which are conversions of Rasch logits, are generated by a scoring program used after administering all tests. Since the administrator is obtaining raw scores as he or she goes (you have to know right away if a person responded correctly to determine what question to ask next), this would be readily available and most logical to administrators. I had my Winsteps output, which gave person ID, raw score, Rasch ability, and some other indicators (like standard error of measurement), for each person in our pilot sample. So I imported those outputs from the two tests, matched on ID, and ran my analysis.

I stress once again: I used the Winsteps person output to obtain my raw scores.

My data were a mess. There seemed to be no relationship between scores on the two tests. I went back a step, generating frequencies and correlations. I presented the results to the team and we talked about how this could have happened. Was there something wrong with the test? With the sample? Were we working with the wrong data?

I don't know who figured it out first, but it was not me. Someone asked, "Where did the raw scores come from?" And it hit me.

Winsteps generates raw scores based on the number of items a person answered correctly. Only the questions answered and no others. But for adaptive tests, we don't administer all questions. We only administer the set needed to determine a person's ability. We don't give them easy questions because they don't tell us much about ability. We know the person will get most, if not all, easy questions correct. So when the administrator generates raw scores, he or she adds in points for the easy questions not administered. Winsteps doesn't do that. It simply counts and adds.
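The difference between the two raw scores is easy to see in a small Python sketch. Everything here is a simplified assumption for illustration: items are numbered from 1 in order of difficulty, the responses are invented, and the function names are hypothetical (this is not Winsteps output or the actual scoring rule from the test manual).

```python
def counted_raw_score(responses):
    """Count only the administered items answered correctly."""
    return sum(responses.values())


def administrator_raw_score(responses):
    """Also credit every easier item below the lowest item administered."""
    lowest_item = min(responses)
    credited_items = lowest_item - 1  # assumed correct, never administered
    return credited_items + sum(responses.values())


# Hypothetical examinee who started at item 21 and stopped at item 30.
# Keys are item numbers; values are 1 (correct) or 0 (incorrect).
responses = {21: 1, 22: 1, 23: 1, 24: 1, 25: 1, 26: 0, 27: 1, 28: 1, 29: 0, 30: 0}

print(counted_raw_score(responses))
print(administrator_raw_score(responses))
```

Two examinees with very different starting points can end up with similar counted scores, which is one way a real relationship between two tests can vanish.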

There was no relationship between the two variables because I wasn't using the correct raw score variable. I had a column called raw score and just went on autopilot.

So I had a couple days of feeling super proud of myself for figuring out quantile regression... and at least that long feeling like an idiot for running the analysis without really digging into my data. The lack of relationship between the two tests should have been a dead giveaway that there was something wrong with my data. And problems with data are often caused by human error.

Monday, October 16, 2017

Preparing for NaNoWriMo

I'm preparing once again to write 50,000 words in November, as part of National Novel Writing Month (or NaNoWriMo). October is affectionately known as "Preptober" - it's the month where NaNoWriMos go through whatever planning they need to do to win (i.e., get 50,000 words).

As I've blogged before, I'm a plantser: I like having some freedom to see where the story takes me, but I need a roadmap or I get overwhelmed and don't finish. This makes me a sort of hybrid of the plot-driven writers, like Orson Scott Card, and the character-driven writers, like Stephen King.

Speaking of Stephen King, as part of Preptober, I've been reading advice on writing, and yesterday, just finished Stephen King's book on the topic:

It was nice to learn more about his approach, because last year, I really didn't think his approach was for me. I'd tried just sitting down and writing, seeing where I ended up, and that has worked reasonably well for me on short stories, but for something as long as a novel, I get blocked.

What Stephen King does is he comes up with a situation, which may also include where he thinks things will end up when he's finished writing. Then he comes up with the characters and develops them as they encounter the situation. And that's when he lets things just... unfold. The characters tell him where things go.

This may sound crazy to a non-writer: the characters tell him... They're creations of the author, so how could they tell the author what to write? It's a very strange thing when you're writing, and you create characters that take on a life of their own. I've had this experience several times when I was writing Killing Mr. Johnson, the book I worked on for last year's NaNoWriMo (a book I still need to finish). In fact, I was thinking about the story the other day, and trying to understand a character's motivation for doing something in the story. She reacted in a strange way to the chain of events, and as I was thinking about her, I realized why - or rather, she told me why. And it all made sense. I also have a couple of characters who are demanding more attention, so I plan on writing a few more scenes for them.

For this year's NaNoWriMo, I'll be working on a new idea. And I'm going to try taking the Stephen King approach, at least in part. I already know approximately how things are going to end up, and I've been working on developing the characters. In November, I'm going to try just sitting down and writing. We'll see how it goes.

Sunday, October 15, 2017

Statistics Sunday: These Are a Few of My Favorite Posts

You know how sit-coms would occasionally have a flashback episode? There would be some sort of situation where the main characters are stuck somewhere (like an elevator) and as they wait to get out, reminisce about different things that happened over the last season. You got to review the highlights, and the writers (and actors) got a break.

That's what today is: here are some of my favorite posts from the course of my statistics writing - posts people seemed to enjoy or that I had a lot of fun writing.

Statistical Sins: Handing and Understanding Criticism - I'm really enjoying blogging about the Fisher, Pearson, and Neyman feuds. In fact, in line with the new Edison vs. Westinghouse movie, The Current War, I'm thinking of writing my own dramatic account of these feuds. I mean, if they can make alternating current versus direct current seem exciting, just wait until you watch the scene where Neyman escapes Fisher's criticism because Fisher can't speak a word of French. I just need to figure out who Benedict Cumberbatch should play.

Statistics Sunday: Why Is It Called Data Science? - This post generated so much discussion. It's exactly what I've wanted to see from my blog posts: thoughtful discussion in response. Even vehement disagreement is awesome. My professor's heart grew three sizes that day.

Statistical Sins: Three Things We Love - And the greatest of these is still bacon.

Statistics Sunday: What Are Degrees of Freedom? (Part 2) - My favorite type of post to write; one where I learn something by going through the act of explaining it to others.

Statistical Sins: Women in Tech (Here It Goes Again) - In which I rip apart the Google memo, written by a guy who clearly doesn't remember (know anything about) the long history of women in programming and mathematics. Seriously, didn't he at least watch Hidden Figures?

Statistics Sunday: Everyone Loves a Log (Odds Ratio) - Which helped set the stage for a post about Rasch.

Statistics Sunday: No Really, What's Bayes' Got to Do With It? - When I first encountered Bayes' Theorem, I had some trouble wrapping my head around it. So I did the same thing as I did for degrees of freedom: I made myself sit down and write about it. And I finally understand it. Tversky and Kahneman would be so proud.

Statistics Sunday: Null and Alternative Hypotheses - Philosophy of science is one of my favorite topics to pontificate about. It's even more fun for me than debating semantics... and I love debating semantics.

Great Minds in Statistics: F.N. David versus the Patriarchy - Ooo, another movie idea. I very nearly called this post F.N. David versus the Mother F***ing Patriarchy, but decided against it.

That's all for today! This afternoon, I'll be performing Carmina Burana with the Chicago Philharmonic and my choir, the Apollo Chorus of Chicago. And heading out of town tomorrow.

Also, I'm horribly unoriginal: I did this once before. And of course, you can dig through my April 2017 Blogging A to Z Challenge, in which I wrote about the A to Z of statistics.

I'm working on some new statistics Sunday posts. What topics would you like to see here?

Friday, October 13, 2017

Statistical Sins: Hidden Trends and the Downfall of Sears

Without going into too much detail, it's been a rough week, so I haven't really been blogging much. But today, I made myself sit down and start thinking about what statistical sin to talk about this week. There were many potential topics. In fact, the recent events in Las Vegas have resulted in a great deal of writing about various analyses, trying to determine whether or not shootings are preventable - specifically by assessing what impact various gun laws have had on the occurrence of these events. Obviously this is a difficult thing to study statistically because these types of shootings are still, in the relative sense, rare, and a classical statistics approach is unlikely to uncover many potential predictors with so little variance to partition. (A Bayesian approach would probably be better.) I may write more on this in the future, but I'll admit I don't have the bandwidth at the moment to deal with such a heavy subject.

So instead, I'll write about a topic my friend over at The Daily Parker also covers pretty frequently: the slow death of Sears. You see, Sears made some really great moves by examining the statistics, but also made some really bad moves by failing to look at the statistics and use the data-driven approaches that allowed its competitors to thrive.

The recession following World War I, combined with an increase in chain stores, threatened Sears's mail order business. It was General Robert Wood, who was incredibly knowledgeable about the U.S. Census and Statistical Abstract, who put Sears back on track by urging them to open brick-and-mortar stores. By the mid-20th century, Sears's revenue accounted for 1 percent of U.S. GDP.

But then the market shifted again in the 1970s and 80s, and the decisions Sears made at this time paved the way for its downfall, at least according to Daniel Raff and Peter Temin. As Derek Thompson of CityLab summarizes their insightful essay:

Eager to become America’s largest brokerage, and perhaps even America’s largest community bank, Sears bought the real-estate company Coldwell Banker and the brokerage firm Dean Witter. It was a weird marriage. As the financial companies thrived nationally, their Sears locations suffered from the start. Buying car parts and then insuring them against future damage makes sense. But buying a four-speed washer-dryer and then celebrating with an in-store purchase of some junk bonds? No, that did not make sense.

But the problem with the Coldwell Banker and Dean Witter acquisitions wasn’t that they flopped. It was that their off-site locations didn’t flop—instead, their moderate success disguised the deterioration of Sears’s core business at a time when several competitors were starting to gain footholds in the market.

The largest competitor was Walmart, which not only offered cheap goods, it used data-driven approaches to ensure shelves were stocked with products most likely to sell, and that inventory was determined entirely by those figures. Sears was instead asking local managers to report trends back to headquarters.

As I commented almost exactly 6 years ago (what are the odds of that?), using "big data" to understand the customer is becoming the norm. Businesses unwilling to do this are not going to last.

Sears, sadly, is not going to survive. But Derek Thompson has some advice for Amazon, which he considers today's counterpart for yesterday's Sears.

First, retail is in a state of perpetual metamorphosis. People are constantly seeking more convenient ways of buying stuff, and they are surprisingly willing to embrace new modes of shopping. As a result, companies can’t rely on a strong Lindy Effect in retail, where past success predicts future earnings. They have to be in a state of constant learning.

Second, even large technological advantages for retailers are fleeting. Sears was the logistics king of the middle of the 20th century. But by the 1980s and 1990s, it was woefully behind the IT systems that made Walmart cheaper and more efficient. Today, Amazon now finds itself in a race with Walmart and smaller online-first retailers. Amazon shows few signs of technological complacency, but the company is still only in its early 20s; Sears was overtaken after it had been around for about a century.

Third, there is no strategic replacement for being obsessed with people and their behavior. Walmart didn’t overtake Sears merely because its technology was more sophisticated; it beat Sears because its technology allowed the company to respond more quickly to shifting consumer demands, even at a store-by-store level. When General Robert Wood made the determination to add brick-and-mortar stores to Sears’s mail-order business, his decision wasn’t driven by the pursuit of grandeur, but rather by an obsession with statistics that showed Americans migrating into cities and suburbs inside the next big shopping technology—cars.

Finally, adding more businesses is not the same as building a better business. When Sears added general merchandise to watches, it thrived. When it added cars and even mobile homes to its famous catalogue, it thrived. When it sold auto insurance along with its car parts, it thrived. But then it chased after the 1980s Wall Street boom by absorbing real-estate and brokerage firms. These acquisitions weren’t flops. Far worse, they were ostensibly successful mergers that took management’s eye off the bigger issue: Walmart was crushing Sears in its core business. Amazon should be wary of letting its expansive ambitions distract from its core mission—to serve busy shoppers with unrivaled choice, price, and delivery speed.

Monday, October 9, 2017

Complex Models and Control Files: From the Desk of a Psychometrician

We're getting ready to send out a job analysis survey, part of our content validation study. In the meantime, I'm working on preparing control files to analyze the data when we get it back. I won't be running the analysis for a couple weeks, but the model I'll be using is complex enough (in part because I added in some nerdy research questions to help determine best practices for these types of surveys) that I decided to start thinking about it now.

I realize there's a lot of information to unpack in that first paragraph. Without going into too much detail, here's a bit of background. We analyze survey data using the Rasch model. This model assumes that an individual's response to an item is a function of his/her ability level and the difficulty level of the item itself. For this kind of measure, where we're asking people to rate items on a scale, we're not measuring ability; rather, we're measuring a trait - an individual's proclivity toward a job task. In this arrangement, items are not difficult/easy but more common/less common, or more important/less important, and so on. The analysis gives us probabilities that people at different ability (trait) levels will respond to an item in a certain way:
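For readers who like to see the math in code, here is a minimal Python sketch of how a Rasch rating scale model (Andrich's formulation) turns a trait level and an item location into category probabilities. The function name and the threshold values are hypothetical, chosen only for illustration; this is not the control file or the software output discussed in this post.

```python
import math


def category_probabilities(theta, delta, thresholds):
    """Probability of each rating category under a Rasch rating scale model.

    theta: the person's trait level, in logits
    delta: the item's location (how "hard" it is to endorse), in logits
    thresholds: step thresholds shared by all items that use the scale
    """
    # Category 0 is the reference; each higher category adds one more step term.
    cumulative = [0.0]
    for tau in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta - tau))
    weights = [math.exp(value) for value in cumulative]
    total = sum(weights)
    return [weight / total for weight in weights]


# A four-category scale with made-up thresholds.
probs = category_probabilities(theta=0.5, delta=0.0, thresholds=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])
```

With a single threshold of 0, the same function reduces to the familiar dichotomous Rasch model, where the probability of a "1" depends only on the difference between the person and the item.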

It's common for job analysis surveys to use multiple rating scales on the same set of items, such as having respondents go through and rate items on how frequently they perform them, and then go through again and rate how important it is to complete a task correctly. For this kind of model, we use a Rasch Facets model. A facet is something that affects responses to an item. Technically, any Rasch model is a facets model; in a basic Rasch model, there are two facets: respondents (and their ability/trait level) and items. When you're using multiple rating scales, scale is a facet.

And because I'm a nerd, I decided to add another facet: rating scale order. The reason we have people rate with one scale then go through and rate with the second (instead of seeing both at once) is so that people are less likely to anchor responses on one scale to responses on another scale. That is, if I rate an item as very frequent, I might also view it as more important when viewing both scales than I would have had I used the scales in isolation. But I wonder if there still might be some anchoring effects. So I decided to counterbalance. Half of respondents will get one scale first, and the other half will get the other scale first. I can analyze this facet to see if it affected responses.
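Counterbalancing like this usually comes down to a random split. Here is a minimal Python sketch, assuming a simple list of respondent IDs; the function name, the order labels, and the seed are all hypothetical, and the actual survey platform may handle assignment differently.

```python
import random


def counterbalance(respondent_ids, seed=0):
    """Randomly assign half of respondents to each rating scale order."""
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    shuffled = list(respondent_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        respondent: "frequency_first" if position < half else "importance_first"
        for position, respondent in enumerate(shuffled)
    }


assignment = counterbalance(range(1, 11))
print(assignment)
```

The resulting order label can then be carried along with each response as the extra facet.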

This means we have 4 facets, right? Items, respondents, rating scale, and order. Well, here's the more complex part. We have two different versions of the frequency scale: one for tasks that happen a lot (and assess daily frequency) and one for less common tasks (that assess weekly/monthly frequency). All items use the same importance scale. The two frequency scales have the same number of categories, but because we may need to collapse categories during the analysis phase, it's possible that we'll end up with two very different scales. So I need to factor in that, for one scale, half of items share one common response structure and the other half share the other common response structure, but for the other scale, all items share a common response structure.

I'm working on figuring out how to express that in the control file, which is a text file used by Rasch software to describe all the aspects of the model and analysis. It's similar to any syntax file for statistics software: there's a specific format needed for the program to read the file and run the specified analysis. I've spent the morning digging through help files and articles, and I think I'm getting closer to having a complete control file that should run the analysis I need.

The Voice of the CTA

Whenever I get into a speaking elevator, or follow the sound of "Track 8" to find my way to my Metra home, I wonder about the person behind the voice. Today, a friend shared a great video of Lee Crooks, the voice behind the Chicago Transit Authority:

Now I'll have a face to picture as I'm crowding onto a train to the tune of "Doors closing."

Sunday, October 8, 2017

Statistics Sunday: Why Is It Called 'Data Science'?

In one of the Facebook groups where I share my statistics posts, a user had an excellent question: "Why is it called data science? Isn't any science that uses empirical data 'data science'?"

I thought this was a really good point. Calling this one field data science implies that other scientific fields using data are not doing so scientifically or rigorously. And even data scientists recognize that there's a fair amount of 'art' involved in data science, because there isn't always a right way to do something - there are simply ways that are more justified than others. In fact, I just started working through this book on that very subject:

What I've learned digging into this field of data science, in the hopes of one day calling myself a data scientist, is that statistics is an integral part of the field. Further, data science is a team sport - it isn't necessary (and it may even be impossible) to be an expert in all the areas of data science: statistics, programming, and domain knowledge. As someone with expertise in statistics, I'm likely better off building additional knowledge in statistical analysis used in data science, like machine learning, and building up enough coding knowledge to be able to understand my data science collaborators with expertise in programming.

But that still doesn't answer our main question: why is it called data science? I think what it comes down to is that data science involves teaching (programming) computers to do things that once had to be done by a person. Statistics as a field has been around much longer than computers (and I mean the objects called computers, not the people who were once known as computers). In fact, statistics has been around even prior to mechanical calculators. Many statistical approaches didn't really need calculators or computers. It took a while, but you could still do it by hand. All that was needed was to know the math behind it. And that is how we teach computers - as long as we know the math behind it, we can teach a computer to do just about anything.

First, we were able to teach computers to do simple statistical analyses: descriptives and basic inferential statistics. A person can do this, of course; a computer can just do it faster. We kept building up new statistical approaches and teaching computers to do those analyses for us - complex linear models, structural equation models, psychometric approaches, and so on.

Then, we were able to teach computers to learn from relationships between words and phrases. Whereas before we needed a person to learn the "art" of naming things, we developed the math behind it and taught it to computers. Now we have approaches like machine learning, where you can feed in information to the computer (like names of paint shades or motivational slogans) and have the computer learn how to generate that material itself. Sure, the results of these undertakings are still hilarious and a long way away from replacing people, but as we continue to develop the math behind this approach, computers will get better.

Related to this concept (and big thanks to a reader for pointing this out) is the movement from working with structured data to unstructured data. Once again, we needed a person to enter/reformat data so we could work with it; that's not necessary anymore.

So we've moved from teaching computers to work with numbers to words (really any unstructured data). And now, we've also taught computers to work with images. Once again, you previously needed a person to go through pictures and tag them descriptively; today, a computer can do that. And as with machine learning, computers are only going to get better and more nuanced in their ability to work with images.

Once we know the math behind it, we can teach a computer to work with basically any kind of data. In fact, during the conference I attended, I learned about some places that are working with auditory data, to get computers to recognize (and even translate in real-time) human languages. These were all tasks that needed a human, because we didn't know how to teach the computers to do it for us. That's what data science is about. It still might not be a great name for the field, but I can understand where that name is coming from.

What are your thoughts on data science? Is there a better name we could use to describe it? And what do you think will be the next big achievement in data science?

Friday, October 6, 2017

Reading Challenges and Nobel Prizes

This year, I decided to double last year's reading challenge goal on Goodreads. I've challenged myself to read 48 books this year. I'm doing really well!

This morning, I started Never Let Me Go by Kazuo Ishiguro, which was highly recommended by a friend. Yesterday, that same friend let me know that Kazuo Ishiguro is being awarded the Nobel Prize in Literature:

Mr. Ishiguro, 62, is best known for his novels “The Remains of the Day,” about a butler serving an English lord in the years leading up to World War II, and “Never Let Me Go,” a melancholy dystopian love story set in a British boarding school. He has obsessively returned to the same themes in his work, including the fallibility of memory, mortality and the porous nature of time. His body of work stands out for his inventive subversion of literary genres, his acute sense of place and his masterly parsing of the British class system.

“If you mix Jane Austen and Franz Kafka then you have Kazuo Ishiguro in a nutshell, but you have to add a little bit of Marcel Proust into the mix,” said Sara Danius, the permanent secretary of the Swedish Academy.

At a news conference at his London publisher’s office on Thursday, Mr. Ishiguro was characteristically self-effacing, saying that the award was a genuine shock. “If I had even a suspicion, I would have washed my hair this morning,” he said.

He added that when he thinks of “all the great writers living at this time who haven’t won this prize, I feel slightly like an impostor.”

BTW, I just added a Goodreads widget to my blog to show what I'm currently reading.

Thursday, October 5, 2017

This is Pretty Grool

In the movie Mean Girls, Aaron (the love interest) asks Cady (the heroine) what day it is, and she responds October 3rd. Hence, October 3rd was dubbed "Mean Girls Day," and people celebrate by posting Mean Girls memes, watching the movie, and probably wearing pink.

This year, four members of the cast released this video, asking fans to help victims of the Las Vegas shooting. Here it is:

“On #October3rd, he asked me to help.” #MeanGirls

Please help the victims of the tragedy in Las Vegas at https://t.co/YMwEV1SDsL pic.twitter.com/OhXNSMvCYC

— Jonathan Bennett (@JonathanBennett) October 3, 2017

Wednesday, October 4, 2017

Statistical Sins: Stepwise Regression

This evening, I started wondering: what do other statisticians think are statistical sins? So I'm perusing message boards on a sleepless Tuesday night/Wednesday morning, and I've found one thing that pops up again and again: stepwise regression.

No stairway. Denied.

Why? Stepwise regression is an analysis process in which one adds or subtracts predictors in a regression equation based on whether they are significant or not. There are, then, two types of stepwise regression: forwards and backwards.

In either analysis, you would generally choose your predictors ahead of time. But then, there's nothing that says you can't include far more predictors than you should (that is, more than the data can support), or predictors that have no business being in a particular regression equation.

In forward stepwise regression, the program would select the variable among your identified predictors that is most highly related to the outcome variable. Then it adds the predictor that contributes the most on top of what's already in the model, and so on. It keeps doing this until additional predictors result in no significant improvement of the model (significant improvement being determined by change in R²).

In backward stepwise regression, the program includes all of your predictor variables, then begins removing variables with the smallest effect on the outcome variable. It stops when removing a variable results in a significant decrease in explained variance.
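To make the forward procedure concrete, here's a minimal sketch in Python with NumPy. It's an illustration, not how any particular stats package does it: the function names, the simulated data, and the F-to-enter cutoff of 4.0 (a rough stand-in for a p < .05 test on the change in R²) are all assumptions made for the example.

```python
import numpy as np

def r_squared(X, y):
    """R² from an ordinary least squares fit of y on X (intercept added here)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

def forward_stepwise(X, y, f_to_enter=4.0):
    """Add predictors one at a time until the change in R² is too small.

    f_to_enter is an assumed cutoff for the F statistic on the R² change.
    """
    n, p = X.shape
    selected = []          # column indices chosen so far, in order of entry
    current_r2 = 0.0       # an intercept-only model explains nothing
    while len(selected) < p:
        candidates = [j for j in range(p) if j not in selected]
        # R² of the model if each remaining candidate were added
        r2_if_added = {j: r_squared(X[:, selected + [j]], y) for j in candidates}
        best = max(r2_if_added, key=r2_if_added.get)
        new_r2 = r2_if_added[best]
        # F statistic for the R² change from adding one predictor
        df_resid = n - len(selected) - 2
        f_change = (new_r2 - current_r2) / ((1.0 - new_r2) / df_resid)
        if f_change < f_to_enter:
            break          # no "significant" improvement, so stop
        selected.append(best)
        current_r2 = new_r2
    return selected, current_r2

# Simulated data: only columns 0 and 2 truly relate to the outcome.
rng = np.random.default_rng(2017)
n = 200
X = rng.normal(size=(n, 8))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=n)

selected, final_r2 = forward_stepwise(X, y)
print("Entered, in order:", selected)
print("Final R²:", round(float(final_r2), 3))
```

The two real predictors should enter first. Whether any of the six pure-noise columns also sneak in depends on the random draw, which is exactly the worry with this approach.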

As you can probably guess, this analysis approach is rife with the potential for false positives and chance relationships. Many of the message boards said, rightly, there is basically no situation where this approach is justified. It isn't even good exploratory data analysis; it's just lazy.

But is there a way this analysis technique could be salvaged? Possibly, if one took a page from the exploratory data analysis playbook and first plotted data, examined potential confounds and alternative explanations for relationships between variables, then made an informed choice about the variables to include in the analysis.

And, most importantly, the analyst should have a way of testing a stepwise regression procedure in another sample, to verify the findings. Let's be honest; to use a technique like this one, where you can add in any number of predictors, you should have a reasonably large sample size or else you should find a better statistic. Therefore, you could randomly split your sample into a development sample, where you determine best models, and a testing sample, where you confirm the models created through the development sample. This approach is often used in data science.
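Here's a rough sketch of that split-sample check in Python with NumPy. Everything in it is an assumption for illustration: the data are simulated so that the outcome is unrelated to all 40 predictors, and picking the 5 predictors most correlated with the outcome is a simple stand-in for whatever the stepwise procedure would select.

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 200, 40
X = rng.normal(size=(n, p))
y = rng.normal(size=n)  # pure noise: no predictor is truly related to it

# Randomly split the cases into development and testing halves.
order = rng.permutation(n)
dev_idx, test_idx = order[: n // 2], order[n // 2 :]

def fit_ols(X, y):
    """Least squares coefficients, intercept first."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def r_squared(X, y, coef):
    """R² of already-fitted coefficients applied to (possibly new) data."""
    design = np.column_stack([np.ones(len(y)), X])
    resid = y - design @ coef
    return 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

# "Model building" happens in the development sample only.
corr = np.array([np.corrcoef(X[dev_idx, j], y[dev_idx])[0, 1] for j in range(p)])
chosen = np.argsort(-np.abs(corr))[:5]
coef = fit_ols(X[dev_idx][:, chosen], y[dev_idx])

# Same predictors, same coefficients, evaluated in both halves.
dev_r2 = r_squared(X[dev_idx][:, chosen], y[dev_idx], coef)
test_r2 = r_squared(X[test_idx][:, chosen], y[test_idx], coef)
print("Development R²:", round(float(dev_r2), 3))
print("Testing R²:", round(float(test_r2), 3))
```

If a selected model is capturing something real, the two values should land in the same ballpark. Here the predictors were picked for chance correlations, so the testing-sample R² should fall well below the development one (it can even go negative, meaning the model predicts worse than the testing sample's mean).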

BTW, I've had some online conversations with people about the term data science and I've had the chance to really think about what it is and what it means. Look for more on that in my next Statistics Sunday post!

What do you think are the biggest statistical sins?

Tuesday, October 3, 2017

Free Tools for Meta-Analysis

My boss is attending a two-day course on meta-analysis, and shared these tools with me, available through Brown School of Health:

The Systematic Review Data Repository - as the name suggests, this is a repository of systematic review data, so you can pull out data relevant to your own systematic review as well as contribute your own data for others to use. Systematic reviews are a lot of work, so a tool that lets you build off of the work of others can help systematic reviews be performed (and their findings disseminated and used to make data-driven decisions) much more quickly.
Abstrackr - a free, open-source tool for the citation screening process. Conducting a systematic review or meta-analysis involves an exhaustive literature review, and those citations then have to be inspected to see if they qualify to be included in the study. It isn't unusual to review hundreds of studies only to include a couple dozen (or fewer). This tool lets you upload abstracts and invite reviewers to examine them for inclusion. This tool is still in beta, but they're incorporating machine learning to automate some of the screening process in the future. Plus, they use "automagically" in the description, which is one of my favorite portmanteaus.
Open Meta-Analyst - another free, open-source tool for conducting meta-analysis. You can work with different types of data (binary, continuous, diagnostic), conduct fixed- or random-effects models, and even use different estimation methods, like maximum likelihood or Bayesian.
Open MEE - a free, open-source tool based on Open Meta-Analyst, with extra tools for ecological and evolutionary meta-analysis. This might be the tool to use in general, because it has the ability to conduct meta-regression with multiple covariates.

I think of all of these, I'm looking forward to trying out Abstrackr the most.

And of course, there are many great meta-analysis packages for R. I'm currently working on a methods article describing how to conduct a mini meta-analysis to inform a power analysis using R tools - something I did for my dissertation, but not something everyone knows how to do. (By working, I mean I have an outline and a few paragraphs written. But I'm hoping to have more time to dedicate to it in the near future. I'm toying with the idea of spending NaNoWriMo this year on scholarly pursuits, rather than a novel.)

BTW, if you like free stuff, check out these free data science and statistics resources (and let me know if you know of any not on the list).