Extract Features from Basic Business License Data #18

Open
jasonasher opened this issue Apr 1, 2018 · 0 comments

The purpose of this issue is to create a feature engineering script that can be run repeatedly on the Basic Business License dataset as new entries or yearly files are added.

This issue was originally posted in the dc_doh_hackathon repository, which can be found here: issue_11

Start with the Basic Business License data in the /Data Sets/Basic Business Licenses/ folder in Dropbox.

Write a script that uses this data to produce a feature data table for

  1. The number of new business licenses issued in the last 4 weeks; and
  2. The number of business licenses in effect in each week.
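A minimal sketch of how the two counts could be computed for a single week, using only the standard library. The issue does not name the date columns or define the window boundaries, so the inputs here (plain issue dates and hypothetical `(start, end)` validity pairs) and the reading of "last 4 weeks" as the 28 days before the week's Monday are assumptions, not the required method.

```python
from datetime import date, timedelta


def week_start(day):
    """Return the Monday of the ISO week containing `day`."""
    return day - timedelta(days=day.weekday())


def issued_last_4_weeks(issue_dates, week_monday):
    """Count licenses issued in the 28 days before `week_monday` (assumed window)."""
    window_start = week_monday - timedelta(weeks=4)
    return sum(1 for d in issue_dates if window_start <= d < week_monday)


def in_effect(validity_periods, week_monday):
    """Count licenses whose (start, end) period overlaps the week starting `week_monday`."""
    week_end = week_monday + timedelta(days=6)
    return sum(
        1 for start, end in validity_periods
        if start <= week_end and end >= week_monday
    )
```

In a full script these counts would be taken separately for each license category and census block before being written out.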

You can find the data format and examples on the Feature Dataset Format tab in this document

Basic business licenses can have one or more categories, found in the LICENSECATEGORY column of the source data.

Input:
CSV files with data for each given year

Output:
A CSV file with

- 1 row for each feature ID, business license category, week, year, and census block
- The dataset should include the following columns:

feature_id: The ID for the feature, in this case, "business_licenses_issued_last_4_weeks" or "business_licenses_in_effect"
feature_type: Business license category, found in the LICENSECATEGORY column of the source data.
feature_subtype: Left blank
year: The ISO-8601 year of the feature value
week: The ISO-8601 week number of the feature value
census_block_2010: The 2010 Census Block of the feature value
value: The value of the feature, i.e. the number of business licenses of the specified category in the given census block that were either issued in the previous 4 weeks or in effect during the week and year in question.
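The column layout above could be produced along these lines. This is a sketch only: `make_row` and `rows_to_csv` are hypothetical helper names, and the census block ID in the usage example is a placeholder. The point it illustrates is that the ISO-8601 year can differ from the calendar year near January 1, so both `year` and `week` are taken from `date.isocalendar()`.

```python
import csv
import io
from datetime import date

COLUMNS = ["feature_id", "feature_type", "feature_subtype",
           "year", "week", "census_block_2010", "value"]


def make_row(feature_id, category, day, census_block, value):
    """Build one output row; year and week come from the ISO-8601 calendar."""
    iso_year, iso_week = day.isocalendar()[:2]
    return {
        "feature_id": feature_id,
        "feature_type": category,
        "feature_subtype": "",  # left blank, as the column list specifies
        "year": iso_year,
        "week": iso_week,
        "census_block_2010": census_block,
        "value": value,
    }


def rows_to_csv(rows):
    """Serialize rows to CSV text with the columns in the documented order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, `make_row("business_licenses_in_effect", "Example Category", date(2017, 1, 1), "000000000000000", 3)` yields `year` 2016 and `week` 52, because January 1, 2017 falls in the last ISO week of 2016.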

The final script must be able to be run from the command line taking three arguments:

  1. A folder with the basic business license data files (the script should concatenate and merge the files in the directory as appropriate)
  2. The shapefile for census blocks
  3. The output CSV filename
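One possible shape for the command-line interface, sketched with `argparse`. The argument names (`license_dir`, `census_shapefile`, `output_csv`) and the helper `read_license_rows` are illustrative choices, not names from the issue. The sketch assumes every file in the folder is a `.csv` with a header row, and it leaves out deduplication, the census-block join, and the feature computation.

```python
import argparse
import csv
from pathlib import Path


def build_parser():
    """Define the three positional arguments described in the issue."""
    parser = argparse.ArgumentParser(
        description="Extract features from Basic Business License data.")
    parser.add_argument("license_dir", help="folder with the yearly license CSV files")
    parser.add_argument("census_shapefile", help="shapefile for census blocks")
    parser.add_argument("output_csv", help="output CSV filename")
    return parser


def read_license_rows(license_dir):
    """Concatenate the rows of every CSV file in the folder, in filename order."""
    rows = []
    for path in sorted(Path(license_dir).glob("*.csv")):
        with open(path, newline="") as handle:
            rows.extend(csv.DictReader(handle))
    return rows


def main(argv=None):
    args = build_parser().parse_args(argv)
    rows = read_license_rows(args.license_dir)
    print(f"Read {len(rows)} license rows from {args.license_dir}")
    # Feature computation and the census-block join are omitted from this sketch.
```

A real script would call `main()` under the usual `if __name__ == "__main__":` guard so that it can be run directly from the command line.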

Please also provide a README.md that describes the script and how to run it.

You can model the command-line handling of your solution after the files here or here.

Place all of your files in the codefordc/the-rat-hack repository under a new scripts/feature_engineering/extract_business_license_features/ folder

@jasonasher jasonasher self-assigned this Apr 1, 2018