Add the Enhanced CPS #2

Merged on Sep 2, 2024 (51 commits)
51 commits
7c963fd
Lower employment income by 10%
nikhilwoodruff Aug 15, 2024
15c82ab
Try adding PR comment
nikhilwoodruff Aug 15, 2024
34915e9
Add missing dep
nikhilwoodruff Aug 15, 2024
342071c
Try gh actions fix
nikhilwoodruff Aug 15, 2024
e40ecc9
Test that subsequent commits don't create a new comment
nikhilwoodruff Aug 15, 2024
e728f5b
Re-add evaluation
nikhilwoodruff Aug 15, 2024
54b5518
Change review format
nikhilwoodruff Aug 15, 2024
9cb547a
Re-add comment action
nikhilwoodruff Aug 15, 2024
b9f319b
Revert to normal CPS
nikhilwoodruff Aug 18, 2024
230553c
Add 2015 and 2021 PUF data
nikhilwoodruff Aug 18, 2024
737a792
Add uprating table
nikhilwoodruff Aug 18, 2024
d91726f
Add uprating tools
nikhilwoodruff Aug 19, 2024
3da84d9
Add SOI CSV
nikhilwoodruff Aug 19, 2024
10fd55c
Add diffing ability
nikhilwoodruff Aug 19, 2024
ac3d65e
Format
nikhilwoodruff Aug 19, 2024
bf4fdbe
Fix bugs
nikhilwoodruff Aug 19, 2024
6f19ae5
Fix address
nikhilwoodruff Aug 19, 2024
92bd14b
Merge into one action
nikhilwoodruff Aug 19, 2024
9e759c5
Add uprating
nikhilwoodruff Aug 23, 2024
c28821b
Improve calibration for ECPS
nikhilwoodruff Aug 24, 2024
a9f1ad2
Add census population projections
nikhilwoodruff Aug 26, 2024
f2b9db4
Add census and cbo targets
nikhilwoodruff Aug 26, 2024
9b7a8d8
Add ECPS improvements
nikhilwoodruff Aug 27, 2024
0eb1939
Add run step
nikhilwoodruff Aug 27, 2024
8a9689d
Fix docker run action
nikhilwoodruff Aug 27, 2024
eb5083b
Minor improvements
nikhilwoodruff Aug 27, 2024
014a145
Silly error that I don't want to talk about
nikhilwoodruff Aug 27, 2024
8841612
Relax policyengine-us dep
nikhilwoodruff Aug 27, 2024
da1cb2c
Add secret token
nikhilwoodruff Aug 27, 2024
38c3f55
Add env to step
nikhilwoodruff Aug 27, 2024
05d72ec
Amend gh action
nikhilwoodruff Aug 27, 2024
7ce6ea2
Add error stdout
nikhilwoodruff Aug 27, 2024
ad66ec7
Remove download from build
nikhilwoodruff Aug 27, 2024
905d7ad
Amend make
nikhilwoodruff Aug 27, 2024
2940a1f
Move data input to before docker build
nikhilwoodruff Aug 27, 2024
85cd0c1
Move cmd to run
nikhilwoodruff Aug 27, 2024
649d909
Add missing secret
nikhilwoodruff Aug 27, 2024
1502f2e
Use correct URL
nikhilwoodruff Aug 27, 2024
28f0f27
Set require=True
nikhilwoodruff Aug 27, 2024
49ca1fa
Update docs
nikhilwoodruff Aug 28, 2024
ca1ef76
Add updates to actions
nikhilwoodruff Aug 31, 2024
37e8c07
Add dep
nikhilwoodruff Aug 31, 2024
b43da9a
Add setup python
nikhilwoodruff Aug 31, 2024
96e209c
Extend calibration to ten years
nikhilwoodruff Aug 31, 2024
c0df878
Fix indices
nikhilwoodruff Aug 31, 2024
c3793cd
Add updates
nikhilwoodruff Sep 1, 2024
b383eb6
Add model splitting
nikhilwoodruff Sep 2, 2024
3b3bd81
Remove ECPS restriction
nikhilwoodruff Sep 2, 2024
51cde43
Update data (final!)
nikhilwoodruff Sep 2, 2024
7c5f885
Add download links
nikhilwoodruff Sep 2, 2024
1637b8e
Update PE-US
nikhilwoodruff Sep 2, 2024
21 changes: 0 additions & 21 deletions .github/review_pull_request.py

This file was deleted.

13 changes: 0 additions & 13 deletions .github/upload_evaluation.py

This file was deleted.

35 changes: 20 additions & 15 deletions .github/workflows/pull_request.yaml
@@ -7,26 +7,31 @@ on:

 jobs:
   build:
-    name: Build and test
+    name: Test
     runs-on: ubuntu-latest
-    env:
-      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-
     steps:
       - name: Checkout code
         uses: actions/checkout@v2
-
       - name: Set up Python
-        uses: actions/setup-python@v2
-
-      - name: Install dependencies
+        uses: actions/setup-python@v5
+        with:
+          python-version: 3.12
+      - name: Install package
         run: make install
-
+      - name: Download data inputs
+        run: make download
+        env:
+          POLICYENGINE_US_DATA_GITHUB_TOKEN: ${{ secrets.POLICYENGINE_US_DATA_GITHUB_TOKEN }}
+      - name: Build datasets
+        run: make data
       - name: Run tests
         run: make test
-
-      - name: Run evaluation
-        run: make evaluate
-
-      - name: Add review comment
-        run: python .github/review_pull_request.py
+  lint:
+    runs-on: ubuntu-latest
+    name: Lint
+    steps:
+      - uses: actions/checkout@v4
+      - name: Check formatting
+        uses: "lgeiger/black-action@master"
+        with:
+          args: ". -l 79 --check"
60 changes: 45 additions & 15 deletions .github/workflows/push.yaml
@@ -7,26 +7,56 @@ on:

 jobs:
   build:
-    name: Build and test
+    name: Test
     runs-on: ubuntu-latest
-    env:
-      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-
     steps:
       - name: Checkout code
         uses: actions/checkout@v2
-
       - name: Set up Python
-        uses: actions/setup-python@v2
-
-      - name: Install dependencies
+        uses: actions/setup-python@v5
+        with:
+          python-version: 3.12
+      - name: Install package
         run: make install
-
+      - name: Download data inputs
+        run: make download
+        env:
+          POLICYENGINE_US_DATA_GITHUB_TOKEN: ${{ secrets.POLICYENGINE_US_DATA_GITHUB_TOKEN }}
+      - name: Build datasets
+        run: make data
       - name: Run tests
         run: make test
-
-      - name: Run evaluation
-        run: make evaluate
-
-      - name: Upload evaluation
-        run: python .github/upload_evaluation.py
+      - name: Upload completed datasets
+        run: make upload
+        env:
+          POLICYENGINE_US_DATA_GITHUB_TOKEN: ${{ secrets.POLICYENGINE_US_DATA_GITHUB_TOKEN }}
+  lint:
+    runs-on: ubuntu-latest
+    name: Lint
+    steps:
+      - uses: actions/checkout@v4
+      - name: Check formatting
+        uses: "lgeiger/black-action@master"
+        with:
+          args: ". -l 79 --check"
+  publish:
+    runs-on: ubuntu-latest
+    name: Publish
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v2
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: 3.12
+      - name: Install package
+        run: make install
+      - name: Build package
+        run: make build
+      - name: Publish a Python distribution to PyPI
+        uses: pypa/gh-action-pypi-publish@release/v1
+        with:
+          user: __token__
+          password: ${{ secrets.PYPI }}
+          skip-existing: true
+
2 changes: 2 additions & 0 deletions .gitignore
@@ -4,3 +4,5 @@
 **/*.h5
 *.ipynb
 **/*.csv
+!uprating_factors.csv
+!uprating_growth_factors.csv
3 changes: 1 addition & 2 deletions Dockerfile
@@ -2,5 +2,4 @@ FROM python:latest
 COPY . .
 # Install
 RUN make install
-# Run tests
-CMD ["make", "test"]
+RUN ["make", "data"]
22 changes: 20 additions & 2 deletions Makefile
@@ -1,3 +1,5 @@
+all: data test
+
 format:
 	black . -l 79

@@ -7,8 +9,24 @@ test:
 install:
 	pip install -e .[dev]

+download:
+	python policyengine_us_data/data_storage/download_public_prerequisites.py
+	python policyengine_us_data/data_storage/download_private_prerequisites.py
+
+upload:
+	python policyengine_us_data/data_storage/upload_completed_datasets.py
+
 docker:
 	docker buildx build --platform linux/amd64 . -t policyengine-us-data:latest

-evaluate:
-	python policyengine_us_data/evaluation/summary.py
+documentation:
+	streamlit run docs/Home.py
+
+data:
+	python policyengine_us_data/datasets/cps/enhanced_cps.py
+
+clean:
+	rm policyengine_us_data/data_storage/puf_2015.csv
+	rm policyengine_us_data/data_storage/demographics_2015.csv
+build:
+	python setup.py sdist bdist_wheel
8 changes: 8 additions & 0 deletions docs/Dockerfile
@@ -0,0 +1,8 @@
+FROM python:latest
+COPY . .
+# Install
+RUN make download
+RUN make install
+RUN python docs/download.py
+EXPOSE 8080
+ENTRYPOINT ["streamlit", "run", "docs/Home.py", "--server.port=8080", "--server.address=0.0.0.0"]
37 changes: 37 additions & 0 deletions docs/Home.py
@@ -0,0 +1,37 @@
+import streamlit as st
+
+st.title("PolicyEngine-US-Data")
+
+st.write(
+    """PolicyEngine-US-Data is a package to create representative microdata for the US, designed for input in the PolicyEngine tax-benefit microsimulation model."""
+)
+
+st.subheader("What does this repo do?")
+
+st.write(
+    """Principally, this package creates a (partly synthetic) dataset of households (with incomes, demographics and more) that describes the U.S. household sector. This dataset synthesises multiple sources of data (the Current Population Survey, the IRS Public Use File, and administrative statistics) to improve upon the accuracy of **any** of them."""
+)
+
+st.subheader("What does this dataset look like?")
+
+st.write(
+    "The below table shows an extract of the person records in one household in the dataset."
+)
+
+
+@st.cache_data
+def sample_household():
+    import pandas as pd
+    from policyengine_us_data.datasets import EnhancedCPS_2024
+    from policyengine_us import Microsimulation
+
+    df = Microsimulation(dataset=EnhancedCPS_2024).to_input_dataframe()
+
+    household_id = df.person_household_id__2024.values[10]
+    people_in_household = df[df.person_household_id__2024 == household_id]
+    return people_in_household
+
+
+people_in_household = sample_household()
+
+st.dataframe(people_in_household.T, use_container_width=True)
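The household lookup in `sample_household` above can be tried without the real dataset. The following is a minimal sketch, not the PR's code: plain Python dictionaries stand in for the pandas DataFrame, the `person_household_id__2024` column name is taken from the diff, and the `person_id` and `age` values and the `household_of_person` helper are made up for illustration.

```python
# Toy person records; values are illustrative, not taken from the Enhanced CPS.
people = [
    {"person_id": 1, "person_household_id__2024": 100, "age": 34},
    {"person_id": 2, "person_household_id__2024": 100, "age": 36},
    {"person_id": 3, "person_household_id__2024": 200, "age": 71},
    {"person_id": 4, "person_household_id__2024": 300, "age": 8},
    {"person_id": 5, "person_household_id__2024": 300, "age": 41},
]


def household_of_person(records, position):
    """Return every record sharing a household with the person at `position`."""
    household_id = records[position]["person_household_id__2024"]
    return [
        r for r in records if r["person_household_id__2024"] == household_id
    ]


# Pick the household of the fourth person row (index 3) and list its members.
members = household_of_person(people, 3)
print([m["person_id"] for m in members])
```

The diff does the same thing with a fixed row index of 10, so the page always shows the household of the eleventh person record in the dataset.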
26 changes: 26 additions & 0 deletions docs/download.py
@@ -0,0 +1,26 @@
+from policyengine_us_data.utils.github import download
+from policyengine_us_data.data_storage import STORAGE_FOLDER
+
+download(
+    "PolicyEngine",
+    "policyengine-us-data",
+    "release",
+    "enhanced_cps_2024.h5",
+    STORAGE_FOLDER / "enhanced_cps_2024.h5",
+)
+
+download(
+    "PolicyEngine",
+    "policyengine-us-data",
+    "release",
+    "cps_2024.h5",
+    STORAGE_FOLDER / "cps_2024.h5",
+)
+
+download(
+    "PolicyEngine",
+    "irs-soi-puf",
+    "release",
+    "puf_2024.h5",
+    STORAGE_FOLDER / "puf_2024.h5",
+)
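`docs/download.py` relies on a `download` helper whose implementation is not part of this diff. As a rough sketch of what resolving a release asset could involve, the code below assumes the helper's positional arguments are organisation, repository, release tag, asset name and local path (inferred from the calls above), and uses the public GitHub REST endpoint for fetching a release by tag. The function names `release_by_tag_url` and `find_asset_url` are hypothetical, and no network request is made: the release payload is a hand-written stand-in with invented asset IDs.

```python
def release_by_tag_url(org: str, repo: str, tag: str) -> str:
    """Build the GitHub REST API URL that describes one tagged release."""
    return f"https://api.github.com/repos/{org}/{repo}/releases/tags/{tag}"


def find_asset_url(release: dict, file_name: str) -> str:
    """Return the API URL of the named asset in a release payload."""
    for asset in release.get("assets", []):
        if asset["name"] == file_name:
            return asset["url"]
    raise FileNotFoundError(f"{file_name} not found in release assets")


# Hand-written stand-in for the JSON the endpoint returns (IDs are invented).
ASSETS_BASE = (
    "https://api.github.com/repos/PolicyEngine/policyengine-us-data"
    "/releases/assets"
)
fake_release = {
    "tag_name": "release",
    "assets": [
        {"name": "cps_2024.h5", "url": f"{ASSETS_BASE}/1"},
        {"name": "enhanced_cps_2024.h5", "url": f"{ASSETS_BASE}/2"},
    ],
}

print(release_by_tag_url("PolicyEngine", "policyengine-us-data", "release"))
print(find_asset_url(fake_release, "enhanced_cps_2024.h5"))
```

If a repository is private, the real request would additionally need a token in the `Authorization` header, and fetching the file itself means requesting the asset URL with `Accept: application/octet-stream`. Both are left out of this sketch.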
43 changes: 43 additions & 0 deletions docs/pages/Aggregates.py
@@ -0,0 +1,43 @@
+import streamlit as st
+
+st.title("Aggregates")
+
+st.write(
+    """The table below shows the totals for calendar year 2024 for the Enhanced CPS dataset variables."""
+)
+
+
+@st.cache_data
+def sample_household():
+    from policyengine_us import Microsimulation
+    from policyengine_us_data import EnhancedCPS_2024
+    from policyengine_us_data.datasets.cps.extended_cps import (
+        IMPUTED_VARIABLES as FINANCE_VARIABLES,
+    )
+    import pandas as pd
+
+    sim = Microsimulation(dataset=EnhancedCPS_2024)
+
+    df = (
+        pd.DataFrame(
+            {
+                "Variable": FINANCE_VARIABLES,
+                "Total ($bn)": [
+                    round(
+                        sim.calculate(variable, map_to="household").sum()
+                        / 1e9,
+                        1,
+                    )
+                    for variable in FINANCE_VARIABLES
+                ],
+            }
+        )
+        .sort_values("Total ($bn)", ascending=False)
+        .set_index("Variable")
+    )
+    return df
+
+
+df = sample_household()
+
+st.dataframe(df, use_container_width=True)
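The aggregation in `docs/pages/Aggregates.py` boils down to four steps: sum each variable, convert to billions, round to one decimal place, and sort in descending order. A minimal sketch with invented numbers follows. It uses plain Python instead of pandas and ignores any survey weighting the simulation may apply. The `totals_in_billions` helper and the variable names are illustrative assumptions, not the contents of `IMPUTED_VARIABLES`.

```python
def totals_in_billions(values_by_variable):
    """Sum each variable, express it in $bn to one decimal, sort descending."""
    totals = {
        name: round(sum(values) / 1e9, 1)
        for name, values in values_by_variable.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Invented household-level dollar amounts, for illustration only.
toy_values = {
    "interest_income": [1_200_000_000, 340_000_000],
    "employment_income": [4_000_000_000, 6_500_000_000],
}

for name, total in totals_in_billions(toy_values):
    print(f"{name}: {total}")
```

With these toy inputs the larger total (`employment_income`, 10.5) is listed first, mirroring the `sort_values("Total ($bn)", ascending=False)` call in the diff.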