Accuracy and Privacy

Regular updates about the Census and its plans for data publication in 2020

April 1, 2020, is Census Day. It seems far away, and it is eclipsed by other large political events taking place the same year. But decisions are being made now about how the data collected from the census will be published.

From apportionment and legislative redistricting, to urban planning and economic development, to social science research and journalism, census data are a basemap of public life. But the way these data are released is going to change in 2020 as the Census Bureau adopts a new “formal privacy” framework to protect respondents' confidentiality.

I recently published a story in the New York Times about these plans, but reserved the technical details for a different forum. Hence this weekly (fingers crossed) email.

The Census Bureau is required by Title 13 of the U.S. Code to keep our data private. It is in the business of publishing aggregate statistics and is not allowed to release data that identifies individuals. The bureau takes various steps to comply with this mandate, but there are new concerns that the larger data ecosystem we now live in makes the bureau’s previous “disclosure limitation” procedures ineffective.

For me, regular writing is new. I am a statistician by training and only masquerading as a journalist. But I feel qualified to report on these issues: it was because of the census that I first experienced the sometimes scratchy interface between pristine mathematics and the practical requirements of its implementation.

While I was still a graduate student, I helped a team of my faculty members who were working as expert witnesses in a lawsuit filed against the Commerce Department over the bureau’s plans to adjust the 1990 count. I was approached to help with the computation they needed to evaluate statistical adjustment procedures developed by the bureau. (The final report was submitted as evidence, prompting lawyers to talk about logspline density estimation!)

It’s hard to admit, but at that stage in my training, I had never seen a statistical application with any consequences. Nor had I seen significant disagreement between academics. In this situation, it was my faculty on one side and people whose names were attached to important estimators and statistical methods on the other. I was young and had never experienced anything like this before.

But the same kinds of concerns are playing out now, with the mathematical guarantees of a formal privacy mechanism trading off against the concerns of end users of census numbers. The debate is often technical and I will try to explain the issues as best I can. There is a lot at stake and I have been told by many that the privacy issues related to census data — and the ways the bureau will resolve them — will, no doubt, again end up in court.

I hope someone finds this useful. At the very least, it will help me organize my thoughts.