XDgov/ML-NLP-Resource-List

ML-NLP-Resource-List

This repo collects information and links on people, papers, projects, and code that apply novel data methods, ML, NLP, and related techniques in government.

This list is in its very early stages! If you have anything you'd like added, please reach out to [email protected] or [email protected], or just go ahead and submit a Pull Request.

Resource List

Computer Vision + Visual Census

  1. Gebru, T., Krause, J., Wang, Y., Chen, D., Deng, J., Aiden, E. L., & Fei-Fei, L. (2017). Using deep learning and Google Street View to estimate the demographic makeup of neighborhoods across the United States. Proceedings of the National Academy of Sciences, 114(50), 13108-13113.
  2. Gebru, T., Krause, J., Wang, Y., Chen, D., Deng, J., Aiden, E. L., & Fei-Fei, L. (2017). Using deep learning and Google Street View to estimate the demographic makeup of the US. arXiv preprint arXiv:1702.06683. (arXiv version of the paper above.)
  3. Gebru, T., Krause, J., Wang, Y., Chen, D., Deng, J., & Fei-Fei, L. (2015). Visual census: Using cars to study people and society. Bigvision.
  4. Chew, Rob et al. Toward Model-Generated Household Listing in Low- and Middle-Income Countries Using Deep Learning. https://www.mdpi.com/2220-9964/7/11/448
  5. Chew, Rob et al. Residential scene classification for gridded population sampling in developing countries using deep convolutional neural networks on satellite imagery. https://ij-healthgeographics.biomedcentral.com/track/pdf/10.1186/s12942-018-0132-1

Free-text to automatically assign codes (aka autocoding)

Within gov

  1. (BLS) Measure, Alex (2017). Deep neural networks for worker injury autocoding.
  2. (BLS) Measure, Alex (2014). Automatic coding of worker injury narratives. (Basis for item 1 above.)
  3. (Census) Dumbacher, Brian; Hanna, Demetria (2017). Passive Data Collection, System-to-system data collection, machine learning to improve economic surveys.
  4. (BLS) Measure, Alex. "Autocoding Class." https://github.com/ameasure/autocoding-class
  5. (Census) Moscardi, Christian et al. Using NLP to improve the Commodity Flow Survey
  6. (Census) Cuffe, John et al. Using Public Data to Generate Industrial Classification Codes (paper and slides).

Outside gov

  1. Skinner, Michael (2018). Product categorization with LSTMs and balanced pooling views.
  2. Also check the other "data challenge papers" here: https://sigir-ecom.github.io/accepted-papers.html
  3. Ding, Liya et al. (2015). Auto-Categorization of HS Code Using Background Net Approach.

General text processing + classification

  1. Banks, Duren et al. Arrest-Related Deaths Program Redesign Study [by mining news coverage]: https://www.bjs.gov/content/pub/pdf/ardprs1516pf.pdf

Amazon Mechanical Turk / Active Learning for better training data

Within gov

  1. Pierce, Cynthia et al. (2013). Crowd Sourcing data through Amazon Mechanical Turk.
  2. Chew, Rob et al. SMART: a platform for labelling data. https://rtiinternational.github.io/SMART/ (arXiv paper as well).

Outside gov

  1. Settles, Burr (2010). Active Learning.

MTurk for Survey Pretesting

  1. (BLS) Yu, Erica et al., 2015: https://www.bls.gov/osmr/pdf/st150260.pdf
  2. (NCI) Fowler, Stephanie et al., 2015: https://s3.amazonaws.com/sitesusa/wp-content/uploads/sites/242/2016/03/C2_Fowler_2015FCSM.pdf
  3. (NCSES) Morrison, Rebecca et al., 2017: https://www.census.gov/fedcasic/fc2018/ppt/5AMorrison.pdf
  4. (DOE) Greenblatt, Jeffery et al., 2013: https://www.osti.gov/biblio/1171618

Alternative Data Sources + Web Scraping

  1. (ORNL) Wang, Chieh (Ross) et al.: Web scraping rail photos to track crude oil shipments. http://onlinepubs.trb.org/onlinepubs/Conferences/2019/FreightData/CrudebyRailRoutesWang.pdf
  2. (Census) Dumbacher, Brian et al.: Scraping Assisted By Learning. https://www.census.gov/content/dam/Census/newsroom/press-kits/2018/jsm/jsm-presentation-web-scraping.pdf

Network Analysis

  1. Chew, Rob et al.: Assessing Target Audiences of Digital Public Health Campaigns. https://link.springer.com/chapter/10.1007%2F978-3-319-93372-6_32 (if curious, please feel free to reach out for more info!)

ML fairness/algorithmic bias (not govt specific)

  1. UK House of Commons Science and Technology Committee: “Algorithms in Decision Making”
  2. ACM Conference on Fairness, Accountability, and Transparency (“ACM FAT*”; related to the earlier FAT/ML workshop series)
  3. Friedler et al. (2016). “On the (im-)possibility of fairness”
  4. Corbett-Davies et al. (2017). “Algorithmic decision making and the cost of fairness”
  5. IMPACTnet: aicommons.com (launching in Jan. 2019)

Conferences of Interest

  1. NVIDIA GTC DC Conference, https://www.nvidia.com/en-us/gtc-dc/ (November 2019 in Washington, DC; free registration for feds)
  2. Systems@Scale Conference, https://atscaleconference.com/ (Next: June 6, 2019 in San Jose)
  3. O'Reilly Artificial Intelligence Conference Series, https://www.oreilly.com/conferences/ (several times a year in different cities)
  4. Strata Data Conference Series (several times a year in different cities)
  5. PyData Conference
  6. Strange Loop, September 2019, St. Louis, Missouri
  7. FedCASIC
  8. Federal Statistical Meetings
