Accuracy and Privacy by Mark Hansen
April 1, 2020, Census Day. It seems far away, and eclipsed by other large political events taking place the same year. But decisions are being made now about how the data collected from the census will be published.
From apportionment and legislative redistricting, to urban planning and economic development, to journalism and social science research, census data are a basemap of public life. But the way these data are released is going to change in 2020 as the Census Bureau adopts a new “formal privacy” framework to protect respondents' confidentiality.
I recently published a story in the New York Times about these plans, but reserved the technical details for a different forum. Hence this weekly (fingers crossed) email.
The Census Bureau is required by Title 13 of the U.S. Code to keep our data private. They are in the business of publishing aggregate statistics and are not allowed to release data that identifies individuals. The bureau takes various steps to comply with this mandate, but there are new concerns that the larger data ecosystem we now live in makes the bureau’s previous “disclosure limitation” procedures ineffective.
For me, regular writing is new. I am a statistician by training and only masquerading as a journalist. But I first experienced the sometimes scratchy interface between statistical procedures and the practical requirements of the census when I helped a team working as experts on a lawsuit filed against the Commerce Department about the bureau’s plans to adjust the 1990 count. I was a lowly graduate student, but faculty in my program approached me to help out with the computation they needed to evaluate statistical procedures developed by the bureau.
It’s hard to admit, but at that stage in my training, I had never seen statistics with any consequences. Nor had I seen significant disagreement between academics. In this situation, it was my faculty on one side and people whose names were assigned to important estimators and statistical methods on the other. I was young and had never experienced this before.
But the same kinds of concerns are playing out now, with the mathematical guarantees of a formal privacy mechanism trading off against the concerns of end users of census numbers. The debate is often technical and I will try to explain the issues as best I can.
I hope someone finds this useful. At the very least it will help me organize my thoughts.