This repository contains data, analytic code, and findings on gentrification and on economic and demographic changes in 50 US cities. It is based on the analysis behind the BuzzFeed News article, “These 11 Maps Show How Black People Have Been Driven Out Of Neighborhoods In Five Of The Most Gentrified US Cities,” published February 27, 2020. Please read that article, which contains important context and details, before proceeding.
The data used in this analysis come from three sources: the US Census Bureau, censusreporter.org, and Logan et al.’s Longitudinal Tract Data Base.
The analysis uses two Census datasets, described below.
The analysis uses data from the American Community Survey’s 2013–2017 estimates, the most recent five-year demographic estimates available from the Census Bureau.
We downloaded this data from the Census Bureau’s API. (The Python code used to do so can be found in this repository’s 01-download-census-data.ipynb notebook.)
The results can be found in output/census_tracts.csv. For each tract in the MSA, that dataset includes the following variables:
- geoid: Census tract ID
- total_population: The tract’s total population
- total_population_25_over: The tract’s population of people age 25 or older
- median_income: Median income (1,000,001 is the upper limit for this column)
- median_home_value: Median home value (1,000,001 is the upper limit for this column)
- educational_attainment: The number of people who are 25 or older and have the equivalent of a 4-year college degree
- white_alone: The number of people whose race is white alone, and are not Hispanic
- black_alone: The number of people who are black or African American alone, and are not Hispanic
- native_alone: The number of people who are American Indian and Alaska Native alone
- asian_alone: The number of people who are Asian alone
- native_hawaiian_pacific_islander: The number of people who are Native Hawaiian and Other Pacific Islander alone
- some_other_race_alone: The number of people who are some other race alone
- two_or_more: The number of people who are two or more races
- hispanic_or_latino: The number of people who are Hispanic or Latino
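Because median_income and median_home_value are capped at 1,000,001, a value at the cap is a ceiling rather than a true median. The following is a minimal sketch of reading the file while keeping geoid as text and flagging capped values. It assumes the columns are stored as plain numbers; the helper name `read_tracts` and the sample rows are made up for illustration and are not part of the repository.

```python
import csv
import io

TOP_CODE = 1_000_001  # upper limit noted above for median_income and median_home_value


def read_tracts(handle):
    """Read census_tracts.csv-style rows, keeping geoid as text and flagging capped medians."""
    rows = []
    for row in csv.DictReader(handle):
        for col in ("median_income", "median_home_value"):
            # Empty cells become None instead of raising on float("")
            value = float(row[col]) if row[col] else None
            row[col] = value
            row[col + "_top_coded"] = value is not None and value >= TOP_CODE
        rows.append(row)
    return rows


# Tiny in-memory stand-in for output/census_tracts.csv (values are invented).
sample = io.StringIO(
    "geoid,median_income,median_home_value\n"
    "01001020100,52000,1000001\n"
    "01001020200,,180000\n"
)
tracts = read_tracts(sample)
```

To run it against the real file, pass `open("output/census_tracts.csv", newline="")` instead of the in-memory sample.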
We also downloaded shapefiles detailing the geographic boundaries and Census tracts for all 50 US states from the Census Bureau’s website. These files have been saved in data/censusTracts/states/.
We used Census Reporter to obtain a list of Census tracts that intersect with the official Census boundaries of the 50 cities to be analyzed.
These files have been saved in data/city_tracts/. A spreadsheet of the cities included is in data/cities_metroareas_tracts_walkover.csv.
Every decade, the Census updates some of its tract boundaries, based on population increases and decreases. To make tract-level Census data from the 2000s and 2010s comparable, Logan et al. have created the Longitudinal Tract Data Base (LTDB). BuzzFeed News used this dataset to obtain demographic estimates for the year 2000, and to link them to the tract-level data from the 2013–2017 American Community Survey.
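Linking the two periods comes down to matching tract IDs, and a common pitfall is that IDs parsed as numbers lose their leading zeros. The sketch below is one way to guard against that, not the notebook's actual code. It relies on the standard 11-digit tract ID (2-digit state, 3-digit county, 6-digit tract); the keys `tract_id` and `population` and both function names are hypothetical.

```python
def normalize_tract_id(raw):
    """Return an 11-character tract ID (2-digit state + 3-digit county + 6-digit tract)."""
    digits = str(raw).strip().split(".")[0]  # drop a trailing ".0" left by float parsing
    return digits.zfill(11)  # restore leading zeros lost when the ID was read as a number


def link_tracts(rows_2000, rows_2017):
    """Inner-join two lists of dicts on their normalized tract ID."""
    by_id = {normalize_tract_id(r["tract_id"]): r for r in rows_2000}
    linked = []
    for row in rows_2017:
        key = normalize_tract_id(row["tract_id"])
        if key in by_id:
            linked.append(
                {
                    "tract_id": key,
                    "pop_2000": by_id[key]["population"],
                    "pop_2017": row["population"],
                }
            )
    return linked


# Invented rows: the 2000 ID was parsed as an integer, so its leading zero is gone.
rows_2000 = [{"tract_id": 1001020100, "population": 1900}]
rows_2017 = [
    {"tract_id": "01001020100", "population": 2100},
    {"tract_id": "06001400100", "population": 3000},
]
linked = link_tracts(rows_2000, rows_2017)
```

Tracts that appear in only one period are dropped by this inner join; whether that matches the notebook's handling is not something this sketch claims.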
Due to republishing limitations, the LTDB files are not included in this repository, but can be downloaded in full from the project’s website. To replicate the analysis, follow these steps:
- Open the download page in your web browser
- At the prompt, enter your email address and agree to the listed restrictions; press "Continue"
- In the first set of dropdowns, choose Select a file type: Full and Select a year: 2000; press "Download Standard Data Files"
- Unzip the downloaded file, and then move the LTDB_Std_2000_fullcount.csv file directly into this repository’s data folder
- In the same set of dropdowns, change Select a file type: Full to Select a file type: Sample; press "Download Standard Data Files" again
- Move the downloaded LTDB_Std_2000_Sample.csv file directly into this repository’s data folder
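Once the steps above are done, a quick check that both files landed in the right place can save a failed notebook run. This is a small optional sketch; the function name `missing_ltdb_files` is hypothetical, and it assumes the script is run from the repository root.

```python
from pathlib import Path

# File names taken from the download steps above.
LTDB_FILES = ("LTDB_Std_2000_fullcount.csv", "LTDB_Std_2000_Sample.csv")


def missing_ltdb_files(data_dir="data"):
    """Return the LTDB file names that are not yet present in data_dir."""
    folder = Path(data_dir)
    return [name for name in LTDB_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_ltdb_files()
    print("Missing LTDB files:", ", ".join(missing) if missing else "none")
```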
The LTDB’s data dictionary can be found here.
BuzzFeed News’ analysis uses a methodology devised by Governing Magazine (which in turn is similar to the definition from a Columbia University study). The methodology focuses on three metrics: median income, median home value, and educational attainment.
The methodology consists of the following two tests, as described by Governing Magazine:
A tract qualifies for potential gentrification if it meets all three of the following criteria at the beginning of the study period (in this case, the year 2000):
- Had a population of at least 500 residents and was located within a central city*
- Its median household income was in the bottom 40th percentile when compared to all tracts within its metro area
- Its median home value was in the bottom 40th percentile when compared to all tracts within its metro area
* The city must also still have at least 500 residents at the end of the study period.
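The first test can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the notebook's implementation: the dictionary keys are hypothetical, "bottom 40th percentile" is read as a percentile rank of at most 0.40 (inclusive), and the end-of-period population condition is left out for brevity.

```python
def percentile_rank(value, values):
    """Fraction of values that are less than or equal to value."""
    return sum(v <= value for v in values) / len(values)


def eligible_tracts(tracts, cutoff=0.40, min_population=500):
    """Return the IDs of tracts that pass Test 1.

    Each tract is a dict with hypothetical keys: id, metro, in_central_city,
    population, median_income, median_home_value (all as of the start year).
    """
    # Percentiles are computed against every tract in the same metro area,
    # including tracts outside the central city.
    by_metro = {}
    for t in tracts:
        by_metro.setdefault(t["metro"], []).append(t)

    eligible = set()
    for members in by_metro.values():
        incomes = [t["median_income"] for t in members]
        home_values = [t["median_home_value"] for t in members]
        for t in members:
            if (
                t["in_central_city"]
                and t["population"] >= min_population
                and percentile_rank(t["median_income"], incomes) <= cutoff
                and percentile_rank(t["median_home_value"], home_values) <= cutoff
            ):
                eligible.add(t["id"])
    return eligible


# Invented tracts in a single metro area.
sample = [
    {"id": "A", "metro": "M", "in_central_city": True, "population": 800, "median_income": 20000, "median_home_value": 100000},
    {"id": "B", "metro": "M", "in_central_city": True, "population": 400, "median_income": 30000, "median_home_value": 90000},
    {"id": "C", "metro": "M", "in_central_city": True, "population": 900, "median_income": 40000, "median_home_value": 300000},
    {"id": "D", "metro": "M", "in_central_city": False, "population": 1200, "median_income": 50000, "median_home_value": 400000},
    {"id": "E", "metro": "M", "in_central_city": False, "population": 1500, "median_income": 60000, "median_home_value": 500000},
]
eligible = eligible_tracts(sample)
```

In this sample only tract A qualifies: B falls below the population floor, and C, D, and E sit above the income cutoff.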
A tract is considered to have gentrified if it passes the test above, and also if it meets these three additional criteria at the end of the study period (in this case, in the 2013-2017 ACS survey results):
- Its median home value increased when adjusted for inflation
- Its increase in inflation-adjusted median home value was in the top third of all tracts within a metro area
- Its increase in educational attainment (as measured by the percentage of residents age 25 or older who hold bachelor’s degrees) was in the top third of all tracts within a metro area
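The second test can be sketched in the same style. Several choices here are assumptions rather than details confirmed by the text above: the home-value increase is measured as a percentage change, the education increase as a percentage-point change, "top third" means a percentile rank above 2/3, and `inflation_factor` is a hypothetical multiplier that converts start-year dollars into end-period dollars. All dictionary keys are made up for illustration.

```python
def top_third(value, values):
    """True if value's percentile rank (share of values <= value) is above 2/3."""
    return sum(v <= value for v in values) / len(values) > 2 / 3


def gentrified_tracts(tracts, inflation_factor):
    """Return the IDs of tracts that pass Test 2, given an 'eligible' flag from Test 1."""
    by_metro = {}
    for t in tracts:
        # Express the start-year home value in end-period dollars.
        real_start = t["home_value_start"] * inflation_factor
        enriched = dict(
            t,
            value_change=(t["home_value_end"] - real_start) / real_start,
            edu_change=t["pct_bachelors_end"] - t["pct_bachelors_start"],
        )
        by_metro.setdefault(t["metro"], []).append(enriched)

    result = set()
    for members in by_metro.values():
        value_changes = [t["value_change"] for t in members]
        edu_changes = [t["edu_change"] for t in members]
        for t in members:
            if (
                t["eligible"]
                and t["value_change"] > 0
                and top_third(t["value_change"], value_changes)
                and top_third(t["edu_change"], edu_changes)
            ):
                result.add(t["id"])
    return result


# Invented tracts; 1.5 is a made-up inflation factor, not a real CPI figure.
sample = [
    {"id": "A", "metro": "M", "eligible": True, "home_value_start": 100000, "home_value_end": 300000, "pct_bachelors_start": 10, "pct_bachelors_end": 40},
    {"id": "B", "metro": "M", "eligible": True, "home_value_start": 100000, "home_value_end": 160000, "pct_bachelors_start": 10, "pct_bachelors_end": 15},
    {"id": "C", "metro": "M", "eligible": False, "home_value_start": 200000, "home_value_end": 280000, "pct_bachelors_start": 30, "pct_bachelors_end": 32},
]
gentrified = gentrified_tracts(sample, inflation_factor=1.5)
```

Here only tract A is flagged. Tract B's home value rose in real terms, but its increase ranks in the middle third of the metro area.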
The data analysis was performed in the following Jupyter notebook, using the Python programming language.
The Python code for BuzzFeed News’ analysis, implementing the methodology above, can be found in the 02-analyze-gentrification-and-demographic-changes.ipynb notebook. The notebook additionally calculates percentage-point changes for six non-overlapping race/ethnicity groups.
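A percentage-point change compares a group's share of the population at two points in time, which is different from the change in its head count. A minimal sketch with a hypothetical helper and invented numbers:

```python
def pct_point_change(count_start, total_start, count_end, total_end):
    """Change in a group's share of the population, in percentage points.

    Returns None when either total is zero, since the share is undefined.
    """
    if total_start == 0 or total_end == 0:
        return None
    share_start = 100 * count_start / total_start
    share_end = 100 * count_end / total_end
    return share_end - share_start


# A group of 600 people in a tract that grows from 1,000 to 1,500 residents:
# the head count is unchanged, but the share falls from 60% to 40%.
change = pct_point_change(600, 1000, 600, 1500)
```

In this example `change` is -20.0, a drop of 20 percentage points even though the group's size did not change.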
The notebook produces the following files:
- output/census_data_metro.csv: a merged spreadsheet of Census data for all metro areas in 2000 and 2017.
- output/gentrification.csv: a spreadsheet, covering all Census tracts for the cities of interest (Atlanta, Baltimore, New York, Oakland and Washington, DC), with the following columns:
  - GEOID: Census tract ID
  - total_population_19: The tract’s total population in 2017
  - gentrified: Whether the tract gentrified between 2000 and 2017
  - low_population: Whether the tract’s population was too low to qualify for gentrification
  - eligible_for_gentrification: Whether a tract was eligible for gentrification, based on Test 1 above
  - pct_white_alone_change: Percentage-point change for population that was white alone
  - pct_black_alone_change: Percentage-point change for population that was black alone
  - pct_native_alone_change: Percentage-point change for population that was Native American
  - pct_asian_alone_change: Percentage-point change for population that was Asian alone
  - pct_hispanic_or_latino_alone_change: Percentage-point change for population that was Hispanic or Latino alone
  - pct_native_hawaiian_pacific_islander_change: Percentage-point change for population that was Native Hawaiian or Pacific Islander
- for_maps/: a directory that contains CSV and GeoJSON files related to the maps displayed with the story, with the following variables:
  - Geographic information:
    - GEOID: the Census tract’s official ID
    - name: the name of the Census tract
    - city: the name of the city
    - INTPTLAT, INTPTLON, geometry: information from the shapefiles
  - Person-counts, overall and by race/ethnicity:
    - total_population_19
    - white_alone_19
    - black_alone_19
    - asian_alone_19
    - hispanic_or_latino_19
  - Analysis findings (among others):
    - gentrified: Whether the tract gentrified, according to the two tests described above
    - pct_white_alone_change: Percentage-point change for population that was white alone
    - pct_black_alone_change: Percentage-point change for population that was black alone
    - pct_native_alone_change: Percentage-point change for population that was Native American
    - pct_asian_alone_change: Percentage-point change for population that was Asian alone
    - pct_hispanic_or_latino_alone_change: Percentage-point change for population that was Hispanic or Latino alone
All code in this repository is available under the MIT License. All data files in the output/ directory are available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. All data files in the data/ directory are available, under their own terms, from the sources described above.
Contact Lam Thuy Vo at [email protected].
Looking for more from BuzzFeed News? Click here for a list of our open-sourced projects, data, and code.