This repo collects information and links to people, papers, projects, and code that apply novel data methods, ML, NLP, and related techniques in government.
This is in very early stages! If you have anything you'd like added to this list, please reach out to [email protected] or [email protected], or just go ahead and submit a pull request.
- Gebru, T., Krause, J., Wang, Y., Chen, D., Deng, J., Aiden, E. L., & Fei-Fei, L. (2017). Using deep learning and Google Street View to estimate the demographic makeup of neighborhoods across the United States. Proceedings of the National Academy of Sciences, 114(50), 13108-13113.
- Gebru, T., Krause, J., Wang, Y., Chen, D., Deng, J., Aiden, E. L., & Fei-Fei, L. (2017). Using deep learning and Google Street View to estimate the demographic makeup of the US. arXiv preprint arXiv:1702.06683. (arXiv version of the paper above)
- Gebru, T., Krause, J., Wang, Y., Chen, D., Deng, J., & Fei-Fei, L. (2015). Visual census: Using cars to study people and society. Bigvision.
- Chew, Rob et al. Toward Model-Generated Household Listing in Low- and Middle-Income Countries Using Deep Learning. https://www.mdpi.com/2220-9964/7/11/448
- Chew, Rob et al. Residential scene classification for gridded population sampling in developing countries using deep convolutional neural networks on satellite imagery. https://ij-healthgeographics.biomedcentral.com/track/pdf/10.1186/s12942-018-0132-1
- (BLS) Measure, Alex (2017). Deep neural networks for worker injury autocoding.
- (BLS) Measure, Alex (2014). Automatic coding of worker injury narratives. Basis for the 2017 paper above.
- (Census) Dumbacher, Brian; Hanna, Demetria (2017). Passive Data Collection, System-to-system data collection, machine learning to improve economic surveys.
- (BLS) Measure, Alex. "Autocoding Class." https://github.com/ameasure/autocoding-class
- (Census) Moscardi, Christian et al. Using NLP to improve the Commodity Flow Survey
- (Census) Cuffe, John et al. Using Public Data to Generate Industrial Classification Codes (paper and slides)
- Skinner, Michael (2018). Product categorization with LSTMs and balanced pooling views.
- Also check the other "data challenge papers" here: https://sigir-ecom.github.io/accepted-papers.html
- Ding, Liya et al. (2015). Auto-Categorization of HS Code Using Background Net Approach.
- Banks, Duren et al. Arrest-Related Deaths Program Redesign Study [by mining news coverage]: https://www.bjs.gov/content/pub/pdf/ardprs1516pf.pdf
- Pierce, Cynthia et al. (2013). Crowd Sourcing data through Amazon Mechanical Turk.
- Chew, Rob et al. SMART: a platform for labelling data. https://rtiinternational.github.io/SMART/ (arXiv paper as well).
- Settles, Burr (2010). Active Learning.
- (BLS) Yu, Erica et al., 2015: https://www.bls.gov/osmr/pdf/st150260.pdf
- (NCI), Fowler, Stephanie et al., 2015: https://s3.amazonaws.com/sitesusa/wp-content/uploads/sites/242/2016/03/C2_Fowler_2015FCSM.pdf
- (NCSES), Morrison, Rebecca et al., 2017: https://www.census.gov/fedcasic/fc2018/ppt/5AMorrison.pdf
- (DOE), Greenblatt, Jeffery et al., 2013: https://www.osti.gov/biblio/1171618
- (ORNL) Wang, Chieh (Ross) et al.: Web Scraping rail photos to track crude oil shipments. http://onlinepubs.trb.org/onlinepubs/Conferences/2019/FreightData/CrudebyRailRoutesWang.pdf
- (Census) Dumbacher, Brian et al.: Scraping Assisted By Learning. https://www.census.gov/content/dam/Census/newsroom/press-kits/2018/jsm/jsm-presentation-web-scraping.pdf
- Chew, Rob et al.: Assessing Target Audiences of Digital Public Health Campaigns. https://link.springer.com/chapter/10.1007%2F978-3-319-93372-6_32 (if curious, please feel free to reach out for more info!)
- UK House of Commons Science and Technology Committee: “Algorithms in Decision Making”
- ACM Conference on Fairness, Accountability, and Transparency (“ACM FAT*”; the related workshop series is “FAT/ML”)
- Friedler et al. (2016). “On the (im)possibility of fairness”
- Corbett-Davies et al. (2017). “Algorithmic decision making and the cost of fairness”
- IMPACTnet: aicommons.com (launching in Jan. 2019)
- NVIDIA Conference, https://www.nvidia.com/en-us/gtc-dc/ (November 2019 in Washington, DC; free registration for feds)
- Systems@Scale Conference, https://atscaleconference.com/ (Next: June 6, 2019 in San Jose)
- O'Reilly Artificial Intelligence Conference Series, https://www.oreilly.com/conferences/ (several times a year in different cities)
- Strata Data Conference Series (several times a year in different cities)
- PyData Conference
- Strange Loop, September 2019, St. Louis, Missouri
- FedCASIC
- Federal Statistical Meetings