Merge pull request #113 from PolicyEngine/MaxGhenis/issue111
Add methodology paper
Showing 26 changed files with 1,201 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,50 @@ | ||
# PolicyEngine US Data | ||
|
||
## Installation | ||
|
||
```bash | ||
pip install policyengine-us-data | ||
``` | ||
|
||
## Building the Paper | ||
|
||
### Prerequisites | ||
|
||
The paper requires a LaTeX distribution (e.g., TeX Live or MiKTeX) with the following packages: | ||
|
||
- graphicx (for figures) | ||
- amsmath (for mathematical notation) | ||
- natbib (for bibliography management) | ||
- hyperref (for PDF links) | ||
- booktabs (for tables) | ||
- geometry (for page layout) | ||
- microtype (for typography) | ||
- xcolor (for colored links) | ||
|
||
On Ubuntu/Debian, you can install these with: | ||
|
||
```bash | ||
sudo apt-get install texlive-latex-base texlive-latex-recommended texlive-latex-extra texlive-fonts-recommended | ||
``` | ||
|
||
On macOS with Homebrew: | ||
|
||
```bash | ||
brew install --cask mactex | ||
``` | ||
|
||
### Building | ||
|
||
To build the paper: | ||
|
||
```bash | ||
make paper | ||
``` | ||
|
||
To clean LaTeX build files: | ||
|
||
```bash | ||
make clean-paper | ||
``` | ||
|
||
The output PDF will be at `paper/main.pdf`. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
- bump: minor | ||
  changes: | ||
    added: | ||
    - Paper on methodology. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
## Core latex/pdflatex auxiliary files: | ||
*.aux | ||
*.lof | ||
*.log | ||
*.lot | ||
*.fls | ||
*.out | ||
*.toc | ||
*.fmt | ||
*.fot | ||
*.cb | ||
*.cb2 | ||
.*.lb | ||
|
||
## Generated if empty string is given at "Please type another file name for output:" | ||
|
||
## Bibliography auxiliary files (bibtex/biblatex/biber): | ||
*.bbl | ||
*.bcf | ||
*.blg | ||
*-blx.aux | ||
*-blx.bib | ||
*.run.xml | ||
|
||
## Build tool auxiliary files: | ||
*.fdb_latexmk | ||
*.synctex | ||
*.synctex(busy) | ||
*.synctex.gz | ||
*.synctex.gz(busy) | ||
*.pdfsync |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,186 @@ | ||
@techreport{cbo2018, | ||
title = {An Overview of CBO's Microsimulation Tax Model}, | ||
author = {{Congressional Budget Office}}, | ||
institution = {Congressional Budget Office}, | ||
year = {2018}, | ||
url = {https://www.cbo.gov/publication/54096} | ||
} | ||
|
||
@techreport{jct2023, | ||
title = {Overview of JCT Revenue Estimating Methods}, | ||
author = {{Joint Committee on Taxation}}, | ||
institution = {Joint Committee on Taxation}, | ||
number = {JCX-48-23}, | ||
year = {2023}, | ||
url = {https://www.jct.gov/publications/2023/jcx-48-23/} | ||
} | ||
|
||
@techreport{ota2012, | ||
title = {Revenue Estimating Models at the U.S. Treasury Department}, | ||
author = {{Office of Tax Analysis}}, | ||
institution = {U.S. Department of the Treasury}, | ||
number = {Technical Paper 12}, | ||
year = {2012}, | ||
url = {https://home.treasury.gov/system/files/131/TP-12.pdf} | ||
} | ||
|
||
@article{saez2012, | ||
title = {The Elasticity of Taxable Income with Respect to Marginal Tax Rates: A Critical Review}, | ||
author = {Saez, Emmanuel and Slemrod, Joel and Giertz, Seth H}, | ||
journal = {Journal of Economic Literature}, | ||
volume = {50}, | ||
number = {1}, | ||
pages = {3--50}, | ||
year = {2012} | ||
} | ||
|
||
@misc{tpc2022, | ||
title = {Brief Description of the Tax Model}, | ||
author = {{Tax Policy Center}}, | ||
year = {2022}, | ||
url = {https://www.taxpolicycenter.org/resources/brief-description-tax-model}, | ||
note = {Updated March 2022} | ||
} | ||
|
||
@misc{itep2024, | ||
title = {ITEP Tax Model Overview}, | ||
author = {{Institute on Taxation and Economic Policy}}, | ||
year = {2024}, | ||
url = {https://itep.org/itep-tax-model/} | ||
} | ||
|
||
@misc{tf2024, | ||
title = {Overview of the Tax Foundation's Taxes and Growth Model}, | ||
author = {{Tax Foundation}}, | ||
year = {2024}, | ||
url = {https://taxfoundation.org/research/all/federal/overview-tax-foundations-taxes-growth-model/} | ||
} | ||
|
||
@misc{trim2024, | ||
title = {TRIM3 Project Documentation: Transfer Income Model, Version 3}, | ||
author = {{Urban Institute}}, | ||
year = {2024}, | ||
url = {https://boreas.urban.org/documentation/input/Concepts%20and%20Procedures/Modifications%20to%20the%20Underlying%20Surveys.php} | ||
} | ||
|
||
@misc{attis2024, | ||
title = {ATTIS Microsimulation Model}, | ||
author = {{Urban Institute}}, | ||
year = {2024}, | ||
url = {https://www.urban.org/research-methods/attis-microsimulation-model} | ||
} | ||
|
||
@misc{budgetlab2024, | ||
title = {Tax Microsimulation at The Budget Lab}, | ||
author = {{Budget Lab}}, | ||
institution = {Yale University}, | ||
year = {2024}, | ||
url = {https://budgetlab.yale.edu/research/tax-microsimulation-budget-lab} | ||
} | ||
|
||
@misc{psl2024, | ||
title = {Tax-Data Documentation}, | ||
author = {{Policy Simulation Library}}, | ||
year = {2024}, | ||
url = {https://github.com/PSLmodels/taxdata} | ||
} | ||
|
||
@article{ohare2009, | ||
title = {Statistical Matching Using the Current Population Survey as the Donor: Techniques and Issues}, | ||
author = {O'Hare, William P}, | ||
journal = {National Tax Journal}, | ||
volume = {62}, | ||
number = {3}, | ||
pages = {519--537}, | ||
year = {2009} | ||
} | ||
|
||
@techreport{piketty2018, | ||
title = {Distributional National Accounts: Methods and Estimates for the United States}, | ||
author = {Piketty, Thomas and Saez, Emmanuel and Zucman, Gabriel}, | ||
institution = {National Bureau of Economic Research}, | ||
number = {w22945}, | ||
year = {2018} | ||
} | ||
|
||
@article{burkhauser2012, | ||
title = {Recent Trends in Top Income Shares in the United States: Reconciling Estimates from March CPS and IRS Tax Return Data}, | ||
author = {Burkhauser, Richard V and Feng, Shuaizhang and Jenkins, Stephen P and Larrimore, Jeff}, | ||
journal = {Review of Economics and Statistics}, | ||
volume = {94}, | ||
number = {2}, | ||
pages = {371--388}, | ||
year = {2012} | ||
} | ||
|
||
@article{auerbach2018, | ||
title = {Macroeconomic Modeling of Tax Policy: A Comparison of Current Methodologies}, | ||
author = {Auerbach, Alan J and Kotlikoff, Laurence J and Koehler, Darryl}, | ||
journal = {National Tax Journal}, | ||
volume = {71}, | ||
number = {3}, | ||
pages = {541--576}, | ||
year = {2018} | ||
} | ||
|
||
@techreport{bryant2023a, | ||
title = {General Description Booklet for the 2015 Public Use Tax File}, | ||
author = {Bryant, Victoria}, | ||
institution = {Statistics of Income Division, Internal Revenue Service}, | ||
year = {2023}, | ||
month = {February}, | ||
type = {Technical Documentation}, | ||
url = {https://drive.google.com/file/d/1WoTU70GEjYMO0KHsHvTTH0NwCc-kN5cE/view} | ||
} | ||
|
||
@techreport{bryant2023b, | ||
title = {General Description Booklet for the 2015 Public Use Tax File Demographic File}, | ||
author = {Bryant, Victoria}, | ||
institution = {Statistics of Income Division, Internal Revenue Service}, | ||
year = {2023}, | ||
month = {February}, | ||
type = {Technical Documentation}, | ||
url = {https://drive.google.com/file/d/1WoTU70GEjYMO0KHsHvTTH0NwCc-kN5cE/view} | ||
} | ||
|
||
@techreport{census2024, | ||
title = {Current Population Survey, 2024 Annual Social and Economic (ASEC) Supplement}, | ||
author = {{U.S. Census Bureau}}, | ||
institution = {U.S. Census Bureau}, | ||
year = {2024}, | ||
url = {https://www2.census.gov/programs-surveys/cps/datasets/2024/march/asec2024_ddl_pub_full.pdf} | ||
} | ||
|
||
@article{meinshausen2006quantile, | ||
title = {Quantile regression forests}, | ||
author = {Meinshausen, Nicolai}, | ||
journal = {Journal of Machine Learning Research}, | ||
volume = {7}, | ||
number = {6}, | ||
year = {2006} | ||
} | ||
|
||
@misc{zillow2024quantile, | ||
title = {quantile-forest: Scikit-learn compatible quantile regression forests}, | ||
author = {{Zillow Group}}, | ||
year = {2024}, | ||
howpublished = {\url{https://zillow.github.io/quantile-forest/}} | ||
} | ||
|
||
@inproceedings{pytorch2019, | ||
title = {PyTorch: An Imperative Style, High-Performance Deep Learning Library}, | ||
author = {Paszke, Adam and Gross, Sam and Massa, Francisco and Lerer, Adam and Bradbury, James and Chanan, Gregory and Killeen, Trevor and Lin, Zeming and Gimelshein, Natalia and Antiga, Luca and others}, | ||
booktitle = {Advances in Neural Information Processing Systems}, | ||
volume = {32}, | ||
year = {2019} | ||
} | ||
|
||
@techreport{woodruff2023survey, | ||
title = {Surveying the (loss) landscape: using machine learning to improve household survey accuracy}, | ||
author = {Woodruff, Nikhil}, | ||
institution = {University of Durham}, | ||
year = {2023}, | ||
month = {April}, | ||
note = {Demonstrates superiority of machine learning approaches over traditional methods for survey enhancement through comprehensive benchmarking}, | ||
url = {https://github.com/policyengine/survey-enhance/blob/main/docs/paper/project_paper.pdf} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
% Custom commands and mathematics macros | ||
\newcommand{\policyengine}{\textsc{PolicyEngine}} | ||
\newcommand{\cps}{\textsc{CPS}} | ||
\newcommand{\puf}{\textsc{PUF}} |
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
\documentclass[12pt]{article} | ||
|
||
\usepackage{graphicx} | ||
\usepackage{amsmath} | ||
\usepackage[round]{natbib} % Round parentheses for citations | ||
\usepackage{hyperref} | ||
\usepackage{booktabs} | ||
\usepackage{geometry} | ||
\usepackage{microtype} | ||
\usepackage{xcolor} | ||
|
||
% Citation style: author-year with round parentheses | ||
\bibpunct{(}{)}{;}{a}{,}{,} | ||
\setcitestyle{authoryear,round} | ||
|
||
\input{macros} | ||
|
||
\geometry{margin=1in} | ||
\hypersetup{ | ||
colorlinks=true, | ||
linkcolor=blue, | ||
filecolor=magenta, | ||
urlcolor=blue, | ||
citecolor=blue, | ||
} | ||
|
||
|
||
\title{Enhancing Survey Microdata with Administrative Records: \\ A Novel Approach to Microsimulation Dataset Construction} | ||
% Define the \samethanks command | ||
\newcommand*\samethanks[1][\value{footnote}]{\footnotemark[#1]} | ||
|
||
% Define authors with the same affiliation | ||
\author{ | ||
Nikhil Woodruff\thanks{PolicyEngine} \and | ||
Max Ghenis\samethanks | ||
} | ||
\date{\today} | ||
|
||
\begin{document} | ||
|
||
\maketitle | ||
|
||
\input{sections/abstract} | ||
\input{sections/introduction} | ||
\input{sections/background} | ||
\input{sections/data} | ||
\input{sections/methodology} | ||
\input{sections/results} | ||
\input{sections/discussion} | ||
\input{sections/conclusion} | ||
|
||
\bibliographystyle{plainnat} | ||
\bibliography{./bibliography/references} | ||
|
||
\end{document} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
\section*{Abstract} | ||
|
||
We combine the demographic detail of the Current Population Survey (CPS) with the tax precision of the IRS Public Use File (PUF) to create an enhanced microsimulation dataset. Our method uses quantile regression forests to transfer income and tax variables from the PUF to demographically similar CPS households. We create a synthetic CPS-structured dataset from PUF tax information, stack it alongside the original CPS records, and then use dropout-regularized gradient descent to reweight households toward administrative targets from IRS Statistics of Income, Census population estimates, and program participation data. This approach preserves the CPS's granular demographic and geographic information while leveraging the PUF's tax reporting accuracy. The enhanced dataset provides a foundation for analyzing federal tax policy, state tax systems, and benefit programs. We release both the enhanced dataset and our open-source enhancement procedure to support transparent policy analysis. |
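For readers of this diff, the reweighting step named in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the function names (`totals`, `loss`, `reweight`), the loss form (mean squared relative error), the hyperparameters, and the toy data are all assumptions made for illustration. It uses only the Python standard library. It optimizes log-weights so weights stay positive, and at each step it randomly drops a fraction of households, rescaling the survivors so the expected totals are unchanged.

```python
import math
import random


def totals(weights, features):
    """Weighted column sums: one aggregate per target."""
    n_targets = len(features[0])
    return [
        sum(w * row[j] for w, row in zip(weights, features))
        for j in range(n_targets)
    ]


def loss(weights, features, targets):
    """Mean squared relative error between weighted totals and targets."""
    agg = totals(weights, features)
    return sum(((a - t) / t) ** 2 for a, t in zip(agg, targets)) / len(targets)


def reweight(features, targets, init_weights, steps=3000, lr=2.0,
             dropout=0.05, seed=0):
    """Gradient descent on log-weights with inverted dropout on households."""
    rng = random.Random(seed)
    log_w = [math.log(w) for w in init_weights]
    n_targets = len(targets)
    keep_scale = 1.0 / (1.0 - dropout)
    for _ in range(steps):
        # Randomly drop households; rescale survivors so that the expected
        # masked total equals the unmasked total.
        masked = [
            math.exp(u) * keep_scale if rng.random() >= dropout else 0.0
            for u in log_w
        ]
        agg = totals(masked, features)
        rel_err = [(a - t) / t for a, t in zip(agg, targets)]
        for i, row in enumerate(features):
            # d(loss)/d(log_w_i); this is zero for dropped households.
            grad = sum(
                2.0 * rel_err[j] * masked[i] * row[j] / targets[j]
                for j in range(n_targets)
            ) / n_targets
            log_w[i] -= lr * grad
    return [math.exp(u) for u in log_w]


# Toy data (hypothetical): each row is (person count, income).
features = [(1.0, 5_000.0 * (i + 1)) for i in range(40)]
targets = [50_000.0, 6.0e9]  # assumed population and total-income targets
init_weights = [1_000.0] * len(features)

new_weights = reweight(features, targets, init_weights)
initial_loss = loss(init_weights, features, targets)
final_loss = loss(new_weights, features, targets)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.6f}")
```

Evaluating the loss without dropout after training shows how closely the reweighted toy households reproduce both targets at once. The bibliography cites PyTorch, so the real procedure presumably relies on automatic differentiation rather than a hand-written gradient like this one.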