Wednesday, February 28, 2018

Statistical Sins: Sensitive Items

Part of my job includes developing and fielding surveys, which we use to gather data that informs our exam efforts and even content. Survey design was a big part of my graduate and postdoctoral training, and surveys are a frequently used methodology in many research institutions. Which is why it is so disheartening to watch the slow implosion of the Census Bureau under the Trump administration.

Now, the Bureau is talking about adding an item about citizenship to the Census - that is, an item asking a person whether they are a legal citizen of the US - which the former director calls "a tremendous risk."

You can say that again.

The explanation makes it at least sound like it is being suggested with good intentions:
In December, the Department of Justice sent a letter to the Census Bureau asking that it reinstate a question on citizenship to the 2020 census. “This data is critical to the Department’s enforcement of Section 2 of the Voting Rights Act and its important protections against racial discrimination in voting,” the department said in a letter. “To fully enforce those requirements, the Department needs a reliable calculation of the citizen voting-age population in localities where voting rights violations are alleged or suspected.”
But regardless of the reasoning behind it, this item is a bad idea. In surveys, this item is what we'd call a sensitive item - an item that relates to a behavior that is illegal or taboo. Some other examples would include questions about behaviors like drug use, abortion, or masturbation. People are less likely to answer these questions honestly, because of fear of legal action or stigma.

Obviously, we have data on some of these sensitive issues. How do we get it? There are some important controls that help:
  • Ensure that data collected is anonymous - that is, the person collecting the data (and anyone accessing the data) doesn't know who it comes from
  • If complete anonymity isn't possible, confidentiality is the next best thing - responses can't be linked back to the respondent by anyone outside the study team, and personal data is stored separately from responses
  • If the topic relates to illegal activity, additional protections (a Certificate of Confidentiality) may be necessary to prevent the data collection team from being forced to divulge information by subpoena 
  • Data collected through forms rather than an interview with a person might also lead to more honest responding, because there's less embarrassment writing something than saying it out loud; but remember, overall response rate drops with paper or online forms
The Census is confidential, not anonymous. Data is collected in person, by an interviewer, and personally identifiable data is collected, though removed when data are processed. And yes, there are rules and regulations about who has access to that data. But even if those protections hold and people who share that they are not legal citizens have no need to fear legal action, the issue really has to do with perception, and how that perception will impact the validity of the data collected.

When people are asked to share sensitive details that they don't want to share for whatever reason, they'll do one of two things: 1) refuse to answer the question completely or 2) lie. Either way, you end up with junk data. 
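To put rough numbers on that, here is a minimal sketch in Python. Everything in it is hypothetical: the function name, the rates, and the simplifying assumption that only people with the sensitive status refuse or lie while everyone else answers honestly. It isn't a model of the Census, just the arithmetic of how refusals and false answers drag an estimate away from the truth.

```python
def observed_share(true_share, refuse_rate, lie_rate):
    """Expected share of respondents who report the sensitive status.

    true_share  -- actual proportion of people with the status (hypothetical)
    refuse_rate -- proportion of that group who skip the item
    lie_rate    -- proportion of that group who deny the status
    Assumes everyone without the status answers, and answers honestly.
    """
    if refuse_rate + lie_rate > 1:
        raise ValueError("refuse_rate + lie_rate cannot exceed 1")
    # People with the status who answer and tell the truth
    honest_yes = true_share * (1 - refuse_rate - lie_rate)
    # Everyone who answers the item at all (refusers drop out)
    answered = 1 - true_share * refuse_rate
    return honest_yes / answered


# Made-up scenario: 10% truly have the status, 20% of them refuse, 30% of them lie
estimate = observed_share(0.10, refuse_rate=0.20, lie_rate=0.30)
print(f"true share: 0.100, observed share: {estimate:.3f}")
```

With those made-up rates, the observed share comes out at roughly half the true one, and nothing in the collected data tells you how far off you are.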

I'll be honest - I don't think the stated good intentions are the real reason for this item. We may disagree on how to handle people who are in this country illegally, but I think the issue we need to focus on here is that, methodologically, this item doesn't make sense and is going to fail. But because of the source and government seal, the data are going to be perceived as reliable, with the full weight of the federal government behind them. That's problematic. Census data influences policies, funding decisions, and distribution of other resources. If we cannot guarantee the reliability and validity of that data, we should not be collecting it.
