ML-NLP-Resource-List

This repo collects information on, and links to, people, papers, projects, and code that apply novel data methods, ML, NLP, and related techniques in government.

This list is at a very early stage! If you have anything you'd like added, please reach out to [email protected] or [email protected], or just go ahead and submit a Pull Request.

Resource List

Computer Vision + Visual Census

Gebru, T., Krause, J., Wang, Y., Chen, D., Deng, J., Aiden, E. L., & Fei-Fei, L. (2017). Using deep learning and Google Street View to estimate the demographic makeup of neighborhoods across the United States. Proceedings of the National Academy of Sciences, 114(50), 13108-13113.
Gebru, T., Krause, J., Wang, Y., Chen, D., Deng, J., Aiden, E. L., & Fei-Fei, L. (2017). Using deep learning and Google Street View to estimate the demographic makeup of the US. arXiv preprint arXiv:1702.06683. (arXiv version of the paper above)
Gebru, T., Krause, J., Wang, Y., Chen, D., Deng, J., & Fei-Fei, L. (2015). Visual census: Using cars to study people and society. Bigvision.
Chew, Rob et al. Toward Model-Generated Household Listing in Low- and Middle-Income Countries Using Deep Learning. https://www.mdpi.com/2220-9964/7/11/448
Chew, Rob et al. Residential scene classification for gridded population sampling in developing countries using deep convolutional neural networks on satellite imagery. https://ij-healthgeographics.biomedcentral.com/track/pdf/10.1186/s12942-018-0132-1

Free-text to automatically assign codes (aka autocoding)

Within gov

(BLS) Measure, Alex (2017). Deep neural networks for worker injury autocoding.
(BLS) Measure, Alex (2014). Automatic coding of worker injury narratives. Basis for the 2017 paper above.
(Census) Dumbacher, Brian; Hanna, Demetria (2017). Passive Data Collection, System-to-system data collection, machine learning to improve economic surveys.
(BLS) Measure, Alex. "Autocoding Class." https://github.com/ameasure/autocoding-class
(Census) Moscardi, Christian et al. Using NLP to improve the Commodity Flow Survey
(Census) Cuffe, John et al. Using Public Data to Generate Industrial Classification Codes (paper and slides)

Outside gov

Skinner, Michael (2018). Product categorization with LSTMs and balanced pooling views.
Also check the other "data challenge papers" here: https://sigir-ecom.github.io/accepted-papers.html
Ding, Liya et al. (2015). Auto-Categorization of HS Code Using Background Net Approach.

General text processing + classification

Banks, Duren et al. Arrest-Related Deaths Program Redesign Study [by mining news coverage]: https://www.bjs.gov/content/pub/pdf/ardprs1516pf.pdf

Amazon Mechanical Turk / Active Learning for better training data

Within Gov

Pierce, Cynthia et al. (2013). Crowd Sourcing data through Amazon Mechanical Turk.
Chew, Rob et al. SMART: a platform for labelling data. https://rtiinternational.github.io/SMART/ (arXiv paper as well).

Outside Gov

Settles, Burr (2010). Active Learning.

MTurk for Survey Pretesting

(BLS) Yu, Erica et al., 2015: https://www.bls.gov/osmr/pdf/st150260.pdf
(NCI) Fowler, Stephanie et al., 2015: https://s3.amazonaws.com/sitesusa/wp-content/uploads/sites/242/2016/03/C2_Fowler_2015FCSM.pdf
(NCSES) Morrison, Rebecca et al., 2017: https://www.census.gov/fedcasic/fc2018/ppt/5AMorrison.pdf
(DOE) Greenblatt, Jeffery et al., 2013: https://www.osti.gov/biblio/1171618

Alternative Data Sources + Web Scraping

(ORNL) Wang, Chieh (Ross) et al.: Web Scraping rail photos to track crude oil shipments. http://onlinepubs.trb.org/onlinepubs/Conferences/2019/FreightData/CrudebyRailRoutesWang.pdf
(Census) Dumbacher, Brian et al.: Scraping Assisted By Learning. https://www.census.gov/content/dam/Census/newsroom/press-kits/2018/jsm/jsm-presentation-web-scraping.pdf

Network Analysis

Chew, Rob et al.: Assessing Target Audiences of Digital Public Health Campaigns. https://link.springer.com/chapter/10.1007%2F978-3-319-93372-6_32 (if curious, please feel free to reach out for more info!)

ML fairness/algorithmic bias (not govt specific):

UK House of Commons Science and Technology Committee: “Algorithms in Decision Making”
ACM Conference on Fairness, Accountability, and Transparency (“FATML”)
Friedler et al. (2016). “On the (im-)possibility of fairness”
Corbett-Davies et al. (2017). “Algorithmic decision making and the cost of fairness”
IMPACTnet: aicommons.com (launching in Jan. 2019)

Conferences of Interest

NVIDIA Conference, https://www.nvidia.com/en-us/gtc-dc/ (November 2019 in Washington, DC; free registration for feds)
Systems@Scale Conference, https://atscaleconference.com/ (Next: June 6, 2019 in San Jose)
O'Reilly Artificial Intelligence Conference Series, https://www.oreilly.com/conferences/ (several times a year in different cities)
Strata Data Conference Series (several times a year in different cities)
PyData Conference
Strange Loop, September 2019, St. Louis, Missouri
FedCASIC
Federal Statistical Meetings
