A Python package for downloading IRS SOI Tax Stats data, with utilities for parsing the data dictionary.
The Internal Revenue Service's Statistics of Income (IRS SOI) division annually publishes selected tax statistics at the U.S., state, and county levels.
This package provides functions to:
- Download data from IRS SOI Tax Stats historical tables
- Download documentation for historical table 2
- Convert the documentation from .doc to .xls format
- Produce a dictionary of labels from the documentation
# dev version
pip install git+https://github.com/raheem03/taxstats
from taxstats import *
# create an instance of the taxstats object
irs = taxstats(year=2016)
irs = taxstats(year=2016, level='state', state='md')
irs = taxstats(year=2016, level='county', state='va')
irs = taxstats(year=2016, level='us')
irs = taxstats(table=3)
Once you have created an instance of the taxstats object, you can call its get_table method to download the file matching those parameters to your current working directory.
irs.get_table()
Similarly, you can download any available documentation (for historical table 2):
filename = irs.get_docs()
The IRS provides the documentation only as a .doc file. This package includes a utility function that converts the file to .xls format and returns the data dictionary as a DataFrame.
# Convert .doc to .xls and return as dataframe
df = parse_docs(filename)
Finally, you can create a dictionary of labels using the parsed dictionary.
labels = create_labels(filename)
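One common use for such labels is making column headers readable. The sketch below is illustrative only and does not call the package: it assumes that `labels` maps variable codes to descriptions, which may not match what `create_labels` actually returns. The `relabel_header` helper, the stand-in `labels` dictionary, and the sample values are all hypothetical.

```python
import csv
import io

# Hypothetical stand-in for the output of create_labels(); the real
# structure may differ. Assumed here: variable code -> description.
labels = {
    "N1": "Number of returns",
    "A00100": "Adjusted gross income (AGI)",
}

# Small made-up sample shaped like a downloaded table (values are invented).
sample_csv = "STATE,N1,A00100\nMD,100,5000\nVA,200,9000\n"


def relabel_header(header, labels):
    """Replace each column code with its label, keeping unknown codes as-is."""
    return [labels.get(code, code) for code in header]


rows = list(csv.reader(io.StringIO(sample_csv)))
header = relabel_header(rows[0], labels)
data = rows[1:]

print(header)
```

Codes without an entry in `labels` (such as `STATE` here) are left unchanged, so a partial dictionary does not drop columns.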
Code released under the MIT License.