Merge pull request #113 from PolicyEngine/MaxGhenis/issue111
Add methodology paper
MaxGhenis authored Nov 12, 2024
2 parents 728cc41 + 97f4e66 commit 7bc6b19
Showing 26 changed files with 1,201 additions and 1 deletion.
14 changes: 13 additions & 1 deletion Makefile
@@ -1,4 +1,4 @@
-.PHONY: all format test install download upload docker documentation data clean build
+.PHONY: all format test install download upload docker documentation data clean build paper clean-paper

all: data test

@@ -49,3 +49,15 @@ build:

publish:
	twine upload dist/*

paper: paper/main.pdf

# GNU make's wildcard does not recurse on "**", so list both directory levels explicitly.
paper/main.pdf: $(wildcard paper/sections/*.tex paper/sections/*/*.tex) $(wildcard paper/bibliography/*.bib) paper/main.tex paper/macros.tex
	cd paper && \
	BIBINPUTS=./bibliography pdflatex main && \
	BIBINPUTS=./bibliography bibtex main && \
	pdflatex main && \
	pdflatex main

clean-paper:
	rm -f paper/*.aux paper/*.bbl paper/*.blg paper/*.log paper/*.out paper/*.toc paper/main.pdf paper/sections/*.aux paper/sections/*/*.aux
49 changes: 49 additions & 0 deletions README.md
@@ -1 +1,50 @@
# PolicyEngine US Data

## Installation

```bash
pip install policyengine-us-data
```

## Building the Paper

### Prerequisites

The paper requires a LaTeX distribution (e.g., TeX Live or MiKTeX) with the following packages:

- graphicx (for figures)
- amsmath (for mathematical notation)
- natbib (for bibliography management)
- hyperref (for PDF links)
- booktabs (for tables)
- geometry (for page layout)
- microtype (for typography)
- xcolor (for colored links)

On Ubuntu/Debian, you can install these with:

```bash
sudo apt-get install texlive-latex-base texlive-latex-recommended texlive-latex-extra texlive-fonts-recommended
```

On macOS with Homebrew:

```bash
brew install --cask mactex
```
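
As an optional sanity check before building, the Python sketch below reports which of the prerequisites listed above appear to be missing. It assumes the `kpsewhich` lookup tool that ships with TeX Live is on your `PATH`; the function names are illustrative and are not part of this repository.

```python
import shutil
import subprocess

# LaTeX packages listed in the Prerequisites section above.
REQUIRED_PACKAGES = [
    "graphicx", "amsmath", "natbib", "hyperref",
    "booktabs", "geometry", "microtype", "xcolor",
]
# Command-line tools invoked by `make paper`.
REQUIRED_TOOLS = ["pdflatex", "bibtex"]


def tool_on_path(tool):
    """Return True if the executable can be found on PATH."""
    return shutil.which(tool) is not None


def kpsewhich_finds(package):
    """Return True if kpsewhich can locate <package>.sty (assumes kpsewhich exists)."""
    if shutil.which("kpsewhich") is None:
        return False
    result = subprocess.run(
        ["kpsewhich", f"{package}.sty"], capture_output=True, text=True
    )
    return result.returncode == 0 and bool(result.stdout.strip())


def missing_prerequisites(has_tool=tool_on_path, has_package=kpsewhich_finds):
    """List missing tools and packages; the two checks are injectable for testing."""
    missing = [tool for tool in REQUIRED_TOOLS if not has_tool(tool)]
    missing += [f"{pkg}.sty" for pkg in REQUIRED_PACKAGES if not has_package(pkg)]
    return missing


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All prerequisites found.")
```

If the script reports missing items, the install commands above should cover them on most systems.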

### Building

To build the paper:

```bash
make paper
```

To clean LaTeX build files:

```bash
make clean-paper
```

The output PDF will be at `paper/main.pdf`.
4 changes: 4 additions & 0 deletions changelog_entry.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
- bump: minor
changes:
added:
- Paper on methodology.
32 changes: 32 additions & 0 deletions paper/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
## Core latex/pdflatex auxiliary files:
*.aux
*.lof
*.log
*.lot
*.fls
*.out
*.toc
*.fmt
*.fot
*.cb
*.cb2
.*.lb

## Generated if empty string is given at "Please type another file name for output:"
.pdf

## Bibliography auxiliary files (bibtex/biblatex/biber):
*.bbl
*.bcf
*.blg
*-blx.aux
*-blx.bib
*.run.xml

## Build tool auxiliary files:
*.fdb_latexmk
*.synctex
*.synctex(busy)
*.synctex.gz
*.synctex.gz(busy)
*.pdfsync
186 changes: 186 additions & 0 deletions paper/bibliography/references.bib
Original file line number Diff line number Diff line change
@@ -0,0 +1,186 @@
@techreport{cbo2018,
title = {An Overview of CBO's Microsimulation Tax Model},
author = {{Congressional Budget Office}},
institution = {Congressional Budget Office},
year = {2018},
url = {https://www.cbo.gov/publication/54096}
}

@techreport{jct2023,
title = {Overview of JCT Revenue Estimating Methods},
author = {{Joint Committee on Taxation}},
institution = {Joint Committee on Taxation},
number = {JCX-48-23},
year = {2023},
url = {https://www.jct.gov/publications/2023/jcx-48-23/}
}

@techreport{ota2012,
title = {Revenue Estimating Models at the U.S. Treasury Department},
author = {{Office of Tax Analysis}},
institution = {U.S. Department of the Treasury},
number = {Technical Paper 12},
year = {2012},
url = {https://home.treasury.gov/system/files/131/TP-12.pdf}
}

@article{saez2012,
title = {The Elasticity of Taxable Income with Respect to Marginal Tax Rates: A Critical Review},
author = {Saez, Emmanuel and Slemrod, Joel and Giertz, Seth H},
journal = {Journal of Economic Literature},
volume = {50},
number = {1},
pages = {3--50},
year = {2012}
}

@misc{tpc2022,
title = {Brief Description of the Tax Model},
author = {{Tax Policy Center}},
year = {2022},
url = {https://www.taxpolicycenter.org/resources/brief-description-tax-model},
note = {Updated March 2022}
}

@misc{itep2024,
title = {ITEP Tax Model Overview},
author = {{Institute on Taxation and Economic Policy}},
year = {2024},
url = {https://itep.org/itep-tax-model/}
}

@misc{tf2024,
title = {Overview of the Tax Foundation's Taxes and Growth Model},
author = {{Tax Foundation}},
year = {2024},
url = {https://taxfoundation.org/research/all/federal/overview-tax-foundations-taxes-growth-model/}
}

@misc{trim2024,
title = {TRIM3 Project Documentation: Transfer Income Model, Version 3},
author = {{Urban Institute}},
year = {2024},
url = {https://boreas.urban.org/documentation/input/Concepts%20and%20Procedures/Modifications%20to%20the%20Underlying%20Surveys.php}
}

@misc{attis2024,
title = {ATTIS Microsimulation Model},
author = {{Urban Institute}},
year = {2024},
url = {https://www.urban.org/research-methods/attis-microsimulation-model}
}

@misc{budgetlab2024,
title = {Tax Microsimulation at The Budget Lab},
author = {{Budget Lab}},
institution = {Yale University},
year = {2024},
url = {https://budgetlab.yale.edu/research/tax-microsimulation-budget-lab}
}

@misc{psl2024,
title = {Tax-Data Documentation},
author = {{Policy Simulation Library}},
year = {2024},
url = {https://github.com/PSLmodels/taxdata}
}

@article{ohare2009,
title = {Statistical Matching Using the Current Population Survey as the Donor: Techniques and Issues},
author = {O'Hare, William P},
journal = {National Tax Journal},
volume = {62},
number = {3},
pages = {519--537},
year = {2009}
}

@techreport{piketty2018,
title = {Distributional National Accounts: Methods and Estimates for the United States},
author = {Piketty, Thomas and Saez, Emmanuel and Zucman, Gabriel},
institution = {National Bureau of Economic Research},
number = {w22945},
year = {2018}
}

@article{burkhauser2012,
title = {Recent Trends in Top Income Shares in the United States: Reconciling Estimates from March CPS and IRS Tax Return Data},
author = {Burkhauser, Richard V and Feng, Shuaizhang and Jenkins, Stephen P and Larrimore, Jeff},
journal = {Review of Economics and Statistics},
volume = {94},
number = {2},
pages = {371--388},
year = {2012}
}

@article{auerbach2018,
title = {Macroeconomic Modeling of Tax Policy: A Comparison of Current Methodologies},
author = {Auerbach, Alan J and Kotlikoff, Laurence J and Koehler, Darryl},
journal = {National Tax Journal},
volume = {71},
number = {3},
pages = {541--576},
year = {2018}
}

@techreport{bryant2023a,
title = {General Description Booklet for the 2015 Public Use Tax File},
author = {Bryant, Victoria},
institution = {Statistics of Income Division, Internal Revenue Service},
year = {2023},
month = {February},
type = {Technical Documentation},
url = {https://drive.google.com/file/d/1WoTU70GEjYMO0KHsHvTTH0NwCc-kN5cE/view}
}

@techreport{bryant2023b,
title = {General Description Booklet for the 2015 Public Use Tax File Demographic File},
author = {Bryant, Victoria},
institution = {Statistics of Income Division, Internal Revenue Service},
year = {2023},
month = {February},
type = {Technical Documentation},
url = {https://drive.google.com/file/d/1WoTU70GEjYMO0KHsHvTTH0NwCc-kN5cE/view}
}

@techreport{census2024,
title = {Current Population Survey, 2024 Annual Social and Economic (ASEC) Supplement},
author = {{U.S. Census Bureau}},
institution = {U.S. Census Bureau},
year = {2024},
url = {https://www2.census.gov/programs-surveys/cps/datasets/2024/march/asec2024_ddl_pub_full.pdf}
}

@article{meinshausen2006quantile,
title = {Quantile regression forests},
author = {Meinshausen, Nicolai},
journal = {Journal of Machine Learning Research},
volume = {7},
pages = {983--999},
number = {6},
year = {2006}
}

@misc{zillow2024quantile,
title = {quantile-forest: Scikit-learn compatible quantile regression forests},
author = {{Zillow Group}},
year = {2024},
howpublished = {\url{https://zillow.github.io/quantile-forest/}}
}

@article{pytorch2019,
title = {PyTorch: An Imperative Style, High-Performance Deep Learning Library},
author = {Paszke, Adam and Gross, Sam and Massa, Francisco and Lerer, Adam and Bradbury, James and Chanan, Gregory and Killeen, Trevor and Lin, Zeming and Gimelshein, Natalia and Antiga, Luca and others},
journal = {Advances in Neural Information Processing Systems},
volume = {32},
year = {2019}
}

@techreport{woodruff2023survey,
title = {Surveying the (loss) landscape: using machine learning to improve household survey accuracy},
author = {Woodruff, Nikhil},
institution = {University of Durham},
year = {2023},
month = {April},
note = {Demonstrates superiority of machine learning approaches over traditional methods for survey enhancement through comprehensive benchmarking},
url = {https://github.com/policyengine/survey-enhance/blob/main/docs/paper/project_paper.pdf}
}
Binary file added paper/figures/data_flow.png
Binary file added paper/figures/ecps_vs_cps_puf.png
4 changes: 4 additions & 0 deletions paper/macros.tex
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
% Custom commands and mathematics macros
\newcommand{\policyengine}{\textsc{PolicyEngine}}
\newcommand{\cps}{\textsc{CPS}}
\newcommand{\puf}{\textsc{PUF}}
Binary file added paper/main.pdf
Binary file not shown.
55 changes: 55 additions & 0 deletions paper/main.tex
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
\documentclass[12pt]{article}

\usepackage{graphicx}
\usepackage{amsmath}
\usepackage[round]{natbib} % Author-year citations in round brackets
\usepackage{hyperref}
\usepackage{booktabs}
\usepackage{geometry}
\usepackage{microtype}
\usepackage{xcolor}

% Citation style: author-year, round brackets, semicolon between multiple citations
\bibpunct{(}{)}{;}{a}{,}{,}
\setcitestyle{authoryear,round}

\input{macros}

\geometry{margin=1in}
\hypersetup{
colorlinks=true,
linkcolor=blue,
filecolor=magenta,
urlcolor=blue,
citecolor=blue,
}


\title{Enhancing Survey Microdata with Administrative Records: \\ A Novel Approach to Microsimulation Dataset Construction}
% Define the \samethanks command
\newcommand*\samethanks[1][\value{footnote}]{\footnotemark[#1]}

% Define authors with the same affiliation
\author{
Nikhil Woodruff\thanks{PolicyEngine} \and
Max Ghenis\samethanks
}
\date{\today}

\begin{document}

\maketitle

\input{sections/abstract}
\input{sections/introduction}
\input{sections/background}
\input{sections/data}
\input{sections/methodology}
\input{sections/results}
\input{sections/discussion}
\input{sections/conclusion}

\bibliographystyle{plainnat}
\bibliography{./bibliography/references}

\end{document}
3 changes: 3 additions & 0 deletions paper/sections/abstract.tex
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
\section*{Abstract}

We combine the demographic detail of the Current Population Survey (CPS) with the tax precision of the IRS Public Use File (PUF) to create an enhanced microsimulation dataset. Our method uses quantile regression forests to transfer income and tax variables from the PUF to demographically similar CPS households. We create a synthetic CPS-structured dataset using PUF tax information, stack it alongside the original CPS records, then use dropout-regularized gradient descent to reweight households toward administrative targets from IRS Statistics of Income, Census population estimates, and program participation data. This approach preserves the CPS's granular demographic and geographic information while leveraging the PUF's tax reporting accuracy. The enhanced dataset provides a foundation for analyzing federal tax policy, state tax systems, and benefit programs. We release both the enhanced dataset and our open-source enhancement procedure to support transparent policy analysis.
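
The reweighting step named in the abstract (dropout-regularized gradient descent toward administrative targets) can be illustrated with a minimal, standard-library Python sketch. Everything specific here is an assumption made for illustration: the toy households, the three targets, the squared relative-error loss, the log-weight parameterization, plain gradient descent with a hand-picked learning rate, and inverted dropout. It is not the paper's implementation, which this commit page does not show.

```python
import math
import random


def weighted_totals(weights, metrics):
    """Return one weighted total per target column."""
    n_targets = len(metrics[0])
    return [
        sum(w * row[j] for w, row in zip(weights, metrics))
        for j in range(n_targets)
    ]


def relative_loss(weights, metrics, targets):
    """Mean squared relative error between weighted totals and targets."""
    totals = weighted_totals(weights, metrics)
    return sum(((e - t) / t) ** 2 for e, t in zip(totals, targets)) / len(targets)


def reweight(metrics, targets, init_weights, steps=500, lr=10.0, dropout=0.05, seed=0):
    """Gradient descent on log-weights, with dropout applied to the weights each step."""
    rng = random.Random(seed)
    log_w = [math.log(w) for w in init_weights]  # log scale keeps weights positive
    keep_scale = 1.0 / (1.0 - dropout)           # inverted-dropout rescaling
    n_targets = len(targets)
    for _ in range(steps):
        # Zero a random subset of weights and rescale the survivors.
        masked = [
            math.exp(u) * keep_scale if rng.random() >= dropout else 0.0
            for u in log_w
        ]
        totals = weighted_totals(masked, metrics)
        residuals = [(e - t) / t for e, t in zip(totals, targets)]
        for i, row in enumerate(metrics):
            if masked[i] == 0.0:
                continue  # dropped households receive no gradient this step
            # d(loss)/d(log w_i) = (2 / J) * sum_j r_j * M_ij * w_i / t_j
            grad = sum(
                2.0 * r * row[j] * masked[i] / targets[j]
                for j, r in enumerate(residuals)
            ) / n_targets
            log_w[i] -= lr * grad
    return [math.exp(u) for u in log_w]


# Toy data (hypothetical): each row is (household count, income, has-children flag).
data_rng = random.Random(42)
metrics = [
    (1.0, data_rng.uniform(10.0, 100.0), 1.0 if data_rng.random() < 0.4 else 0.0)
    for _ in range(200)
]
# Hypothetical administrative targets: households, total income, households with children.
targets = [300.0, 20000.0, 100.0]
init_weights = [1.0] * len(metrics)

initial_loss = relative_loss(init_weights, metrics, targets)
new_weights = reweight(metrics, targets, init_weights)
final_loss = relative_loss(new_weights, metrics, targets)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

Dropout here acts as a regularizer: because any household can be masked on a given step, the fit cannot lean on a handful of records with very large weights. A production version would more likely use an adaptive optimizer from an autodiff library than the fixed learning rate chosen for this toy problem.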