About the Data

A barrier to public participation in redistricting is access to data needed to draw legal districts. These data include geographical, population, and election data.

Geographical Data

The Census Bureau collects individuals' characteristics through the decennial census as required by Article 1, Section 2 of the U.S. Constitution and through other large-scale surveys such as the American Community Survey and the Current Population Survey. The Census Bureau protects individuals' confidentiality by tallying their responses within areas that are known as census geography.

The smallest unit of census geography is known as the census block. These census blocks are literally the building blocks for districts. A census block is analogous to a city block in an urban area. In rural areas, blocks also follow roads, but more often follow natural features such as rivers and lakes, as they sometimes do in urban areas too. Collections of blocks are known as block groups, collections of block groups are known as tracts, and tracts are nested within counties. As this implies, states work with the Census Bureau to identify their political boundaries -- their counties, cities, townships, and legislative districts -- and almost all also identify their precinct boundaries. Census blocks also follow these political boundaries. There are numerous other geographies that make up the census geography.

The Standard Hierarchy of Census Geography

Source: Census Bureau documentation.

Most census blocks are not squares. This should be obvious since rivers do not follow straight lines. Indeed, city boundaries in particular can be oddly shaped, resulting from annexation battles among county residents as cities encroach on their land. Some people prefer city services and some do not, so these battles often have a distinct ideological flavor and can be meaningful from a representation perspective.

A map of Springfield, Illinois illustrates these complexities. The census blocks are in green and the city's boundaries are in yellow. The census blocks generally follow roads, but are not regularly shaped since the roads themselves are not laid out on a regular grid. Lake Springfield is the blue extension at the bottom of the image, and it is not regularly shaped, either. Throughout, pockets of Sangamon County are interspersed within the city of Springfield.

Springfield, Illinois Census Blocks

Some advocate drawing districts to be squares or other compact shapes to combat gerrymandering. It should be obvious that the census geography is not amenable to drawing compact shapes since the lines are not nicely arrayed for such purposes. It is possible to split census blocks, but because districts must be of equal population, a supplementary census would need to be taken in the split blocks to confirm that district populations are equal -- an expensive and time-consuming task. This approach would also ignore the political character of communities, and the reasons why they do not have nice compact shapes. A further issue is that compact districts have predictable political effects, in that they often pack Democrats into urban districts, which is often a Republican gerrymandering strategy.

Population Data

The entity responsible for disseminating population data for redistricting is known as the Census Redistricting Data Office. The office releases counts of the population within census geography for redistricting purposes in what is known as the PL94-171 file -- so named after the public law that authorizes the data release.

The Census Bureau asks only a limited number of questions on the decennial census short form, and only some of these data are released in the PL94-171 file.

Total population counts are used to determine compliance with the federal requirement that each district has an equal number of persons.
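As a rough illustration of how total population counts feed an equal-population check, the sketch below computes each district's percentage deviation from the ideal population (the total divided by the number of districts). The district names and counts are made up, and how much deviation is legally acceptable is a question this sketch does not address.

```python
def population_deviations(district_pops):
    """Return each district's percentage deviation from the ideal population."""
    ideal = sum(district_pops.values()) / len(district_pops)
    return {name: 100.0 * (pop - ideal) / ideal
            for name, pop in district_pops.items()}

# Hypothetical district totals, each built by summing block-level total population.
districts = {"D1": 101_000, "D2": 99_000, "D3": 100_000}
deviations = population_deviations(districts)

for name, dev in deviations.items():
    print(f"{name}: {dev:+.2f}%")
```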

These population data by age are:

The voting-age race and ethnicity categories are used to determine compliance with the federal Voting Rights Act.

There are six racial categories, and people may choose one or more of these categories:

There are two ethnicity categories.

Race and ethnicity are separate questions. A person can choose to identify as a White Non-Hispanic, a White Hispanic, and so on.

Starting in 2000, individuals were able to choose to identify themselves as one or more of six racial categories. This causes some complications for the Voting Rights Act, which requires in some cases that districts have greater than 50% African-American, 50% American Indian and Alaska Native, or 50% Hispanic populations. (At this time, Asian and Native Hawaiian/Pacific Islander are not protected racial classifications under the Voting Rights Act, although states may consider these populations as they draw districts.)

The Census Bureau reports counts for all permutations of the six racial categories, by ethnicity, in separate fields in the PL94-171 database. The Office of Management and Budget provides guidance on how to count people within these racial categories (see also this Department of Justice Federal Register Notice 66 Fed. Reg. 5412). For Voting Rights purposes, the African-American population is considered to be the sum of all the categories where individuals identify themselves as African-American, alone or in combination with one or more other races. Chapter 6 of the Census Bureau documentation lists all of these variables. (Note that the naming convention in the PL94-171 file is slightly different from that in the documentation: the documentation has an extra "0".)
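The summing rule for the African-American population can be sketched as follows. The category labels and counts here are illustrative only; they are not the actual PL94-171 field names, which are listed in the Census Bureau documentation.

```python
# Counts for one hypothetical block, keyed by the combination of races reported.
# Labels and numbers are made up for illustration.
block_counts = {
    ("White",): 120,
    ("Black",): 45,
    ("Asian",): 10,
    ("Black", "White"): 6,
    ("Asian", "Black"): 2,
    ("Asian", "White"): 3,
    ("Asian", "Black", "White"): 1,
}

def alone_or_in_combination(counts, race):
    """Sum every category in which `race` appears, alone or with other races."""
    return sum(n for combo, n in counts.items() if race in combo)

black_any = alone_or_in_combination(block_counts, "Black")
print(black_any)  # 45 + 6 + 2 + 1
```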

For some states, we construct and disseminate our own redistricting databases that we use with the Public Mapping Project (listed below). The variables we use to construct total population and voting-age population are as follows.

American Community Survey Data

In the past, the Census Bureau conducted the long form as part of the decennial census, a survey given to roughly 1 in 6 households. The long form had many other questions, such as income, education, and citizenship. This decade, the census long form has been replaced by the American Community Survey (ACS). The ACS is a monthly survey of thousands of households with questions similar to those on the old long form. This will be the first redistricting cycle in which answers to these questions are available while districts are being drawn. Previously, the census long form data were released well after redistricting was completed in many states.

The Census Bureau is releasing results from the ACS in yearly, 3-year, and 5-year increments. The 5-year increments have the largest number of respondents. Remember that the Census Bureau does not release individuals' information; it reports summary statistics to protect individuals' confidentiality. The 5-year ACS thus has the lowest level of geographical reporting, at the census block group or tract level.

The ACS data are important for redistricting because they may be used to provide evidence to define a community of interest, where this is a state redistricting criterion. They may also be used to generate citizen voting-age population estimates by race and ethnicity, which may be used to determine compliance with the Voting Rights Act. The Department of Justice requested a special tabulation of citizen voting-age population data by race and ethnicity, and it is posted on the Census Redistricting Office's website.

There are a number of issues with the ACS data that limit its accuracy.

Election Data

Some states make available election data merged with census data, which enables one to assess the potential political consequences of proposed redistricting plans.

Election results are reported in precincts, wards, or election districts, or what the Census Bureau generically calls Voting-Tabulation Districts or VTDs. A correspondence between census geography and VTD boundaries is needed to merge election data with census data. The creation of a correspondence between VTDs and blocks can be accomplished by three general methods, described below.

Note that all disaggregation methods will almost always create fractions of people assigned to census blocks, for example a census block with 52.19 votes for Obama and 112.56 votes for McCain. This is an unavoidable consequence of apportioning people from the larger VTDs to the smaller census blocks. In this example, 52.19 represents a raw count of votes, not a percentage vote for Obama of 52.19%. If you do not understand why this is so after reading this section, you should not use these data.
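A minimal sketch of why fractions appear: a VTD's vote count is split across its blocks in proportion to some weight, and the shares are rarely whole numbers. The weight used here (voting-age population), the block names, and all counts are assumptions for illustration.

```python
def disaggregate(vtd_votes, block_weights):
    """Split a VTD's vote count across its blocks in proportion to a weight."""
    total_weight = sum(block_weights.values())
    if total_weight == 0:
        # No basis for apportioning; split evenly (an arbitrary fallback).
        share = vtd_votes / len(block_weights)
        return {block: share for block in block_weights}
    return {block: vtd_votes * w / total_weight
            for block, w in block_weights.items()}

# Hypothetical VTD with 165 votes for one candidate and three blocks,
# weighted by each block's voting-age population.
block_vap = {"block_a": 52, "block_b": 113, "block_c": 135}
block_votes = disaggregate(165, block_vap)

for block, votes in block_votes.items():
    print(block, round(votes, 2))  # raw vote counts, not percentages
```

The block-level values are fractional, but they still sum to the VTD total, which is why fractions are kept rather than rounded.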

Methods of Merging Election and Census Data

Method 1. Geo-spatial Join. Some states make available electronic maps of their VTDs. These electronic maps can be overlaid on electronic maps of census blocks, and a correspondence between VTDs and blocks can be created.

A difficulty with this method is that the precinct and census block boundaries often do not neatly correspond to one another. When this happens, a method of assigning a census block to a precinct must be devised, such as assigning the block to the precinct with which it shares the largest area or the precinct with which it shares the longest street segments.
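The largest-area rule can be sketched with simple rectangles standing in for map shapes. Real precinct and block boundaries are irregular polygons and would need GIS software to intersect; the coordinates and names below are hypothetical.

```python
def overlap_area(a, b):
    """Area shared by two rectangles given as (xmin, ymin, xmax, ymax)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)

def assign_by_largest_area(block, precincts):
    """Return the name of the precinct sharing the largest area with the block."""
    return max(precincts, key=lambda name: overlap_area(block, precincts[name]))

# Two adjacent hypothetical precincts and one block straddling their boundary.
precincts = {"P1": (0, 0, 10, 10), "P2": (10, 0, 20, 10)}
block = (8, 2, 14, 6)

assigned = assign_by_largest_area(block, precincts)
print(assigned)
```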

Method 2. Geo-coding Voter Registration. Another method of creating a correspondence between VTDs and census blocks is to geo-code voter registration addresses. Registered voters are assigned to VTDs so that election administrators know which polling places registered voters should vote at. Geo-coding is a technique whereby addresses are assigned to census blocks using street ranges that the Census Bureau associates with each census block. These are the same data from which popular on-line mapping programs developed the ability to drop an address pin on a street map. By geo-coding, each registered voter's address -- and their VTD -- can be associated with a census block.

A difficulty with this method is that voter registration addresses are notoriously error-prone: they can have incorrect zip codes or street addresses. Sometimes newly built streets are not in the Census Bureau's records. Splits of geography can still occur. When a census block has more than one VTD associated with it, the VTD with the greatest proportion of registered voters located in the census block can be the one uniquely associated with the block.
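The tie-breaking rule for split blocks can be sketched as a simple tally. The input is assumed to be a list of (block, VTD) pairs, one per geo-coded registered voter; the identifiers are hypothetical.

```python
from collections import Counter, defaultdict

def assign_blocks_to_vtds(geocoded_voters):
    """Map each block to the VTD that holds the most of its registered voters."""
    tallies = defaultdict(Counter)
    for block_id, vtd_id in geocoded_voters:
        tallies[block_id][vtd_id] += 1
    # most_common(1) picks the VTD with the highest count; exact ties are
    # resolved arbitrarily (by first appearance) in this sketch.
    return {block: counts.most_common(1)[0][0]
            for block, counts in tallies.items()}

# Block B1 is split: two of its voters are in V10, one is in V11.
voters = [("B1", "V10"), ("B1", "V10"), ("B1", "V11"), ("B2", "V11")]
block_to_vtd = assign_blocks_to_vtds(voters)
print(block_to_vtd)
```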

Method 3. Using Census Geography VTD Identifiers. States transmit their VTD boundaries to the Census Bureau and these boundaries are incorporated into the census geography. Election results reported in precincts, wards, and election districts can then be associated with the VTDs in the census geography.

A difficulty with this method is that precincts, wards, and election district boundaries can change. When they become too large, they may be split into two new VTDs and when they become smaller -- or there is a low turnout election, such as a local election -- two or more VTDs may be consolidated into one VTD. The VTD boundaries are transmitted to the Census Bureau as part of the "Phase 2" program, two years before the census -- in 2008. Even by the time of the 2008 general election, splits and consolidations may occur. These changes can often be resolved by calling local election administrators. Still, these VTD boundary changes can proliferate in fast growing areas. Sometimes a change in polling locations forces a redistricting of the VTD boundaries, making a correspondence using Method 3 impossible. For this reason, the 2008 election is often the easiest election to correspond precincts, wards and election districts to the VTDs found in the census geography.

Mail, Provisional, and Other Special Ballots  

Some states and local jurisdictions report their mail, provisional and other special ballots within a voter's home precinct. Some report the election returns for these ballots in special at-large precincts for the entire jurisdiction. When this happens, it is desirable to apportion these ballots among the geographic precincts.

The most common method of apportioning election results from mail, provisional, and other special ballots reported in jurisdiction-wide precincts into geographic precincts is to apportion the votes by the proportion of the candidate vote within each precinct. For example, suppose a Republican candidate received 100 votes in two geographic precincts, 60 in Precinct 1 and 40 in Precinct 2. There is one at-large absentee precinct with 20 absentee votes, 10 for the Republican candidate. These absentee votes are then apportioned as follows: Precinct 1 receives 10*60/100 or 6 votes for a total of 66 votes for the Republican candidate, and Precinct 2 receives 10*40/100 or 4 votes for a total of 44 votes for the Republican candidate.
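The worked example above can be reproduced in a few lines. The function name is hypothetical; the numbers are the ones from the example.

```python
def apportion_at_large(precinct_votes, at_large_votes):
    """Add a candidate's at-large votes to geographic precincts in proportion
    to the candidate's vote in each precinct."""
    total = sum(precinct_votes.values())
    if total == 0:
        # Nothing to base the proportions on; leave the precincts unchanged.
        return dict(precinct_votes)
    return {precinct: votes + at_large_votes * votes / total
            for precinct, votes in precinct_votes.items()}

# Republican candidate: 60 and 40 votes in the geographic precincts,
# plus 10 votes in the at-large absentee precinct.
republican = {"Precinct 1": 60, "Precinct 2": 40}
adjusted = apportion_at_large(republican, 10)
print(adjusted)
```

The same step would be repeated separately for each candidate's at-large votes.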

Disaggregation of Election Data to Census Blocks

VTDs are larger than census blocks, so multiple blocks are assigned to each VTD. Census blocks are what districts are built from, so election results must be disaggregated to census blocks for use in redistricting. There are multiple ways to do this, consistent with the three methods described above.

In most instances Method 2 is the most accurate, Method 3 is the next most accurate, and Method 1 is the least accurate method of disaggregation. Method 2 is labor intensive and requires access to well-maintained voter registration files. Method 3 requires only census data and elbow grease.

 

Redistricting databases created for the Public Mapping Project use Method 3. States may expend more resources to execute other methods. For consistency, our preference is to use states' redistricting databases where they are made public.

Public Redistricting Databases

Some states publicly release redistricting databases augmented with election data; some do not. We provide links here to states that make redistricting databases publicly available in some format. We expect more states will be added to this page as redistricting progresses. For some states that do not provide public databases, we will create our own databases to be used with our District Builder software.

We do not provide user support for these databases. To use these databases, you should have large-scale database management skills and be familiar with census and election data.

Alabama

Alaska

Arizona

Arkansas

California

Colorado

Connecticut

Delaware

Florida

Georgia

Hawaii

Idaho

Illinois

Indiana (Marion County Only)

Iowa

Kansas

Kentucky

Louisiana

Maine

Maryland

Massachusetts

Michigan

Minnesota

Mississippi

Missouri

Montana

Nebraska

Nevada

New Hampshire

New Jersey

New Mexico

New York

North Carolina

North Dakota

Ohio

Oklahoma

Oregon

Pennsylvania

Rhode Island

South Carolina

South Dakota

Tennessee

Texas

Utah

Vermont

Virginia

Washington

West Virginia

Wisconsin

Wyoming

 

 

This block level database is in comma delimited format. It may be accessed here.

The unique census block identifier is GEOID10, and is consistent with census naming conventions.
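When loading any of these files, it can help to keep GEOID10 as text so that leading zeros survive, and to split it into its parts. A 2010 census block identifier is 15 digits: 2 for the state, 3 for the county, 6 for the tract, and 4 for the block (the first block digit is the block group). The sketch below uses a made-up row and a hypothetical helper name.

```python
import csv
import io

def parse_geoid10(geoid):
    """Split a 15-digit census block identifier into its components."""
    geoid = str(geoid)
    if len(geoid) != 15 or not geoid.isdigit():
        raise ValueError(f"expected 15 digits, got {geoid!r}")
    return {
        "state": geoid[:2],
        "county": geoid[2:5],
        "tract": geoid[5:11],
        "block_group": geoid[11],
        "block": geoid[11:],
    }

# Made-up one-row file; csv.DictReader keeps every value as a string,
# so the leading zero in the identifier is preserved.
sample = io.StringIO("GEOID10,TOTPOP\n060750201001000,87\n")
rows = list(csv.DictReader(sample))
parts = parse_geoid10(rows[0]["GEOID10"])
print(parts)
```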

The database includes total population and voting-age population by race and ethnicity variables, as described above.

2008 and 2010 election data are disaggregated to census blocks by the following steps:

As one can imagine, in practice, these election data are not disaggregated to census blocks in whole numbers. To ensure that the statewide totals are preserved, candidates' votes are reported with fractions in double precision format.   

Election data for eight statewide races are provided in the following fields.

 

 California Statewide Database at UC Berkeley

 Redistricting in Colorado

 

 

This block level database is in comma delimited format. It may be accessed here.

The unique census block identifier is GEOID10, and is consistent with census naming conventions.

The database includes total population and voting-age population by race and ethnicity variables, as described above.

2006 and 2010 election data are provided by Florida Redistricting, the Florida House redistricting web site. The data reside on the Open Data portion of the web site. We have converted these data into a flat file data structure for use with our software. The Florida House redistricting staff have disaggregated election data to census blocks by the proportion of the area of a census block that is contained within a precinct. This method is unlike many other databases available here, where data are disaggregated by the proportion of the voting-age population. While we could modify these data to apportion the election data by voting-age population, we have chosen not to in order to keep the data in a format that is consistent with what the Florida House is using.

Election data for selected statewide races are provided in the following fields.

 This block level database is in comma delimited format. It may be accessed here.

The unique census block identifier is GEOID10, and is consistent with census naming conventions.

The database includes total population and voting-age population by race and ethnicity variables, as described above.

2008 election data are disaggregated to census blocks by the following steps:

As one can imagine, in practice, these election data are not disaggregated to census blocks in whole numbers. To ensure that the statewide totals are preserved, candidates' votes are reported with fractions in double precision format.

Note that Georgia reports mail-in absentee and in-person early votes in separate jurisdiction-wide precincts. These election results are not apportioned to the geographic precincts. As a result, the sum of the block level election results does not equal the statewide election results reported by the Georgia Secretary of State.

Election data for four statewide races are provided in the following fields.

 

 

This block level database is in comma delimited format. It may be accessed here.

The unique census block identifier is GEOID10, and is consistent with census naming conventions.

The database includes total population and voting-age population by race and ethnicity variables, as described above.

2008 election data are disaggregated to census blocks by the following steps:

As one can imagine, in practice, these election data are not disaggregated to census blocks in whole numbers. To ensure that the statewide totals are preserved, candidates' votes are reported with fractions in double precision format.   

Election data for four statewide races are provided in the following fields.

This block level database is in comma delimited format. It may be accessed here.

The unique census block identifier is GEOID10, and is consistent with census naming conventions.

The database includes total population and voting-age population by race and ethnicity variables, as described above. In addition, a variable VAP_NH_DOJ_BLK is the Department of Justice definition of Black that includes non-Hispanic persons who identify as single-race Black or bi-racial Black and White.

2008 election results for the Marion County Coroner race are disaggregated as follows:

As one can imagine, in practice, these election data are not disaggregated to census blocks in whole numbers. To ensure that the county totals are preserved, candidates' votes are reported at the census block level with fractions in double precision format.

Election data are provided in the following fields.

CORN08D: 2008 Votes for Democratic Coroner candidate

CORN08R: 2008 Votes for Republican Coroner candidate

This block level database is in comma delimited format. It may be accessed here.

The unique census block identifier is GEOID10, and is consistent with census naming conventions.

The database includes total population and voting-age population by race and ethnicity variables, as described above.

2010 partisan voter registration data are disaggregated to census blocks by the following steps:

As one can imagine, in practice, these registration data are not disaggregated to census blocks in whole numbers. To ensure that the statewide totals are preserved, registration totals are reported at the census block level with fractions in double precision format.   

Partisan registration data are provided in the following fields.

This block level database is in comma delimited format. It may be accessed here.

The unique census block identifier is GEOID10, and is consistent with census naming conventions.

The database includes total population and voting-age population by race and ethnicity variables, as described above.

2008 and 2006 election data are disaggregated to census blocks by the following steps:

As one can imagine, in practice, these election data are not disaggregated to census blocks in whole numbers. To ensure that the statewide totals are preserved, candidates' votes are reported with fractions in double precision format.   

Election data for four statewide races are provided in the following fields.

A census block level database of merged election and census data has been created by Minnesota's Legislative Coordinating Commission Geographic Information Systems. The LCC-GIS requests the following disclaimer be noted about these data:

LCC-GIS makes no representation or warranties, express or implied, with respect to the reuse of data provided herewith, regardless of its format or the means of its transmission. There is no guarantee or representation to the user as to the accuracy, currency, suitability, or reliability of this data for any purpose. The user accepts the data 'as is', and assumes all risks associated with its use. By accepting this data, the user agrees not to transmit this data or provide access to it or any part of it to another party unless the user shall include with the data a copy of this disclaimer.

The data are available on this website. Merged election and census data for census blocks, census voting tabulation districts (precincts and wards), and cities can be downloaded from the "Shapefiles - Statewide" file. Do not be distracted by the other data, which at first blush may appear to be the redistricting data you seek.

The available election data include every statewide, congressional, and state legislative election from 2002 to 2010.

We have processed data provided by LCC-GIS so that selected variables conform to our database format. This block level database is in comma delimited format. It may be accessed here.

The unique census block identifier is GEOID10, and is consistent with census naming conventions.

The database includes total population and voting-age population by race and ethnicity variables, as described above.

Election data for four statewide races are provided in the following fields.

 

 

 

 

 

 

This block level database is in comma delimited format. It may be accessed here.

The unique census block identifier is GEOID10, and is consistent with census naming conventions.

The database includes total population and voting-age population by race and ethnicity variables, as described above. At this time, we do not provide any election data.

This block level database is in comma delimited format. It may be accessed here.

The unique census block identifier is GEOID10, and is consistent with census naming conventions.

The database includes total population and voting-age population by race and ethnicity variables, as described above.

2008 and 2010 election data are disaggregated to census blocks by the following steps:

As one can imagine, in practice, these election data are not disaggregated to census blocks in whole numbers. To ensure that the statewide totals are preserved, candidates' votes are reported with fractions in double precision format.

Election data are provided in the following fields.

This block level database is in comma delimited format. It may be accessed here.

The unique census block identifier is GEOID10, and is consistent with census naming conventions.

The database includes total population and voting-age population by race and ethnicity variables, as described above.

Prison populations are obtained from Prisoners of the Census, which has analyzed the Census Bureau's Group Quarters data to identify prison populations that qualify for reallocation for state legislative redistricting according to New York law. For the 79 census blocks identified with a qualifying prison population, we subtract these prison populations from the total population, or TOTPOP, to create a variable called TOTPOP_PRISON_ADJ. Note that New York law requires prisoners to be reallocated to their originating jurisdiction, while the Prisoners of the Census data only enable subtracting these populations. When the NYS Legislative Task Force releases reallocated prison counts, we will update this database.
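The subtraction step can be sketched as below. TOTPOP, TOTPOP_PRISON_ADJ, and GEOID10 are the field names used in the database; the block identifiers, counts, and function name are made up, and the sketch does not attempt the reallocation to originating jurisdictions.

```python
def add_prison_adjusted_total(blocks, prison_pops):
    """Return copies of the block rows with TOTPOP_PRISON_ADJ added.

    `prison_pops` maps GEOID10 to the qualifying prison population; blocks
    not listed there keep their original total.
    """
    adjusted = []
    for row in blocks:
        new_row = dict(row)
        new_row["TOTPOP_PRISON_ADJ"] = row["TOTPOP"] - prison_pops.get(row["GEOID10"], 0)
        adjusted.append(new_row)
    return adjusted

# Hypothetical rows: the first block contains a prison, the second does not.
blocks = [
    {"GEOID10": "360010001001000", "TOTPOP": 1500},
    {"GEOID10": "360010001001001", "TOTPOP": 80},
]
prison_pops = {"360010001001000": 1200}

result = add_prison_adjusted_total(blocks, prison_pops)
print(result)
```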

2006, 2008 and 2010 election data are disaggregated to census blocks by the following steps:

Election data are provided in the following fields.

 

 

We are providing block level election data created by Mark Salling at Cleveland State University for the state of Ohio's redistricting efforts. We expect the full redistricting database will be posted at the Ohio Secretary of State's office soon.

 

This block level database is in comma delimited format. It may be accessed here.

The unique census block identifier is GEOID10, and is consistent with census naming conventions.

The database includes total population and voting-age population by race and ethnicity variables, as described above. At this time, we do not provide any election data.

 

This block level database is in comma delimited format. It may be accessed here.

The unique census block identifier is GEOID10, and is consistent with census naming conventions.

The database includes total population and voting-age population by race and ethnicity variables, as described above. At this time, we do not provide any election data.

This block level database is in comma delimited format. It may be accessed here.

The unique census block identifier is GEOID10, and is consistent with census naming conventions.

The database includes total population and voting-age population by race and ethnicity variables, as described above.

Election data were obtained from the Senate Judiciary Committee. These data have 2008 and 2010 election returns associated with the 2010 census VTDs. They do not include returns from county-wide absentee, emergency, failsafe, provisional, or failsafe/provisional precincts, which the South Carolina State Election Commission does make available in county level election return reports. We have apportioned these county-wide precincts to the geographical precincts within counties, since approximately one third of South Carolina ballots are cast absentee and our analysis determined that these votes are not cast for candidates at a rate proportional to the geographical precinct vote shares.

Election data are disaggregated to census blocks by the following steps:

As one can imagine, in practice, these election data are not disaggregated to census blocks in whole numbers. To ensure that the statewide totals are preserved, candidates' votes are reported with fractions in double precision format.   

Election data are provided in the following fields.

 

 

 Texas Legislative Council

 

 

Source: Public Mapping Project Database

This block level database is in comma delimited format. A revised version that fixes a Census Bureau error which allocated a naval base population to the wrong census block can be accessed here. The previous version without the correction can be accessed here.

The unique census block identifier is GEOID10, and is consistent with census naming conventions.

The database includes total population and voting-age population by race and ethnicity variables, as described above.

Election data are disaggregated to census blocks by the following steps:

As one can imagine, in practice, these election data are not disaggregated to census blocks in whole numbers. To ensure that the statewide totals are preserved, candidates' votes are reported with fractions in double precision format.   

Election data are provided in the following fields.

 

 

This block level database is in comma delimited format. It may be accessed here.

The unique census block identifier is GEOID10, and is consistent with census naming conventions.

The database includes total population and voting-age population by race and ethnicity variables, as described above.

2008 election data are disaggregated to census blocks by the following steps:

As one can imagine, in practice, these election data are not disaggregated to census blocks in whole numbers. To ensure that the statewide totals are preserved, candidates' votes are reported with fractions in double precision format.   

Election data for statewide races are provided in the following fields.

 

 

Database Management Tips

Note that there are some poorly documented steps to import the PL94-171 data. The Census Bureau has a tips page. Here are two further tips: