Mining a Demographic Mother Lode

Mine census data to enhance your organization's marketing programs and customer relationships

By Seth Grimes

The first detailed results from the U.S. Census 2000 - a demographic snapshot of the American people - hit the streets back in March 2001, with increasingly informative data sets slated for release through 2003 and beyond. You won't find a better statistical picture of the country. Study the New Economy all you want, but you'll make the best sense of your findings in light of the baseline provided by census data.

Census summary (aggregated) data sets are yours to use as you see fit - for state and local government, marketing, or academic research - to better understand your neighbors, customers, and business opportunities. They're available on CD and for download on the U.S. Census Bureau (USCB) Web site. You can query them and draw thematic maps at the USCB's American FactFinder Web site or obtain value-added analytic and geospatial-display packages from commercial vendors. The first steps, however, are to learn what data is available and understand how you can (and can't) use it: the subject of this column.

What You Get

Apportionment of congressional seats among the states is the raison d'etre of the U.S. census, and the most important data set is the one the states use to draw congressional district boundaries. Sets of redistricting data files, one set for each state plus the District of Columbia and Puerto Rico, were released in February and March 2001. This data focuses on total and voting-age population counts. Given voting-rights sensitivity, it breaks out tallies by membership in one or more of six major racial categories and by Hispanic or Latino origin.

Data sets slated for release starting in July will cover housing and family, as well as population. They will constitute an in-depth demographic analysis with, for instance, detailed age categories and more than 250 iterations of race, ancestry, ethnicity, and much more.
Data drawn from the detailed questionnaire sent to a sample of one in six households will be weighted to extend their validity to the full population and analyzed in data sets slated for release in 2002 and 2003. Because the sample data products draw on a smaller survey universe and report more detail than the 100 percent coverage data products, they will be computed only to the census tract level rather than the census block level, to avoid disclosing information about individuals.

What You Don't Get

U.S. law backs its requirement that each U.S. household file a census form with guaranteed confidentiality: Untreated responses from individual forms may not be released until 72 years after the census. These rules, in Titles 13 and 44 of the U.S. Code, clearly say that legislative representation is more important than gleaning evidence of illegal immigration or criminal activity from household forms.

So what you don't get, first and foremost, is access to unsummarized microdata. The USCB does accommodate academic and commercial researchers by creating "public use microdata samples," large-geographic-area subsets of the raw census form responses, with information that can identify individuals removed.

Other steps are taken to prevent disclosure of individual information in the aggregated macro data. They include thresholding (the USCB suppresses certain results if the population of a geographic area is less than 100) and rounding of aggregate income values if too few individuals contribute to them. Lastly, the USCB alters raw data through techniques such as swapping: exchanging selected records between geographic areas in ways that should not alter the statistical characteristics of results. These techniques protect against dangers like complementary disclosure, where manipulating published values can reveal protected, unpublished values. Disclosure-avoidance alterations should make little difference relative to survey errors, however.
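To make the thresholding and rounding rules concrete, here is a minimal Python sketch. It is an illustration only, not the USCB's actual procedure: the 100-person population threshold comes from the description above, but the contributor cutoff, the rounding granularity, and every name in the code (AreaSummary, publishable_income, and so on) are hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional

POPULATION_THRESHOLD = 100  # from the article: suppress results below this population
MIN_CONTRIBUTORS = 3        # hypothetical: what counts as "too few" contributors
ROUNDING_BASE = 1000        # hypothetical: round aggregate income to the nearest 1,000


@dataclass
class AreaSummary:
    """Aggregated figures for one geographic area (hypothetical layout)."""
    area_id: str
    population: int
    income_contributors: int
    aggregate_income: int


def round_to_base(value: int, base: int) -> int:
    """Round a non-negative integer to the nearest multiple of base (halves round up)."""
    return (value + base // 2) // base * base


def publishable_income(area: AreaSummary) -> Optional[int]:
    """Return the aggregate income as it might be published, or None if suppressed."""
    if area.population < POPULATION_THRESHOLD:
        # Thresholding: the area is too small, so the result is withheld.
        return None
    if area.income_contributors < MIN_CONTRIBUTORS:
        # Rounding: too few individuals contribute, so blur the exact total.
        return round_to_base(area.aggregate_income, ROUNDING_BASE)
    return area.aggregate_income


if __name__ == "__main__":
    areas = [
        AreaSummary("A", population=80, income_contributors=40, aggregate_income=2_000_000),
        AreaSummary("B", population=500, income_contributors=2, aggregate_income=123_456),
        AreaSummary("C", population=500, income_contributors=40, aggregate_income=123_456),
    ]
    for area in areas:
        print(area.area_id, publishable_income(area))
```

In this sketch, area A is suppressed outright, area B is published only as a rounded figure, and area C is published unchanged. Swapping is left out because it operates on individual records rather than on the published aggregates.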
The USCB computed summary results both with and without statistical adjustments designed to counteract the effects of undercounting and double counting. The USCB conducted an exhaustive control survey of 314,000 households, from which it estimated a net undercount of 1.18 percent, or about 3.3 million people. The USCB ultimately determined that efforts to improve the accuracy of full census results through reference to the control survey could introduce their own errors. Given the current balance of power in Washington, D.C. - the consensus understanding is that correcting the miscount would add a high proportion of minority voters who have not historically voted Republican - the government decided not to release the adjusted figures.
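As a rough check on the undercount figures quoted above, the short Python sketch below multiplies the 1.18 percent rate by the published Census 2000 resident population. That population figure (281,421,906) does not appear in this column and is supplied here as an outside assumption, as is the choice to apply the rate to the published count rather than to the estimated true population. The function name is hypothetical.

```python
CENSUS_2000_COUNT = 281_421_906  # assumption: published Census 2000 resident population
NET_UNDERCOUNT_RATE = 0.0118     # from the article: 1.18 percent net undercount


def implied_undercount(count: int, rate: float) -> int:
    """Estimate the number of people missed, given a count and a net undercount rate."""
    return round(count * rate)


if __name__ == "__main__":
    missed = implied_undercount(CENSUS_2000_COUNT, NET_UNDERCOUNT_RATE)
    print(f"Implied net undercount: about {missed / 1_000_000:.1f} million people")
```

Under these assumptions the result comes to roughly 3.3 million, consistent with the figure reported from the control survey.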