The purpose of this issue is to create a feature engineering script that may be run repeatedly on the basic business permit dataset, as new entries or files (by year) are added.
This issue was originally posted in the dc_doh_hackathon repository, which can be found here: issue_11
Start with the Basic Business License data in the /Data Sets/Basic Business Licenses/ folder in Dropbox.
Write a script that uses this data to produce a feature data table for:
- The number of new business licenses issued in the last 4 weeks; and
- The number of business licenses in effect in each week.
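As a rough illustration of the two counts, here is a minimal Python sketch. It rests on assumptions that this issue does not settle: that each license has an issue date and a validity period (start and end dates), that "last 4 weeks" means the 28 days ending on the Sunday of the given ISO week, and that a license is "in effect" if its validity period overlaps the week at all. The function names are hypothetical.

```python
from datetime import date, timedelta


def week_bounds(iso_year, iso_week):
    """Return (Monday, Sunday) of the given ISO-8601 week."""
    monday = date.fromisocalendar(iso_year, iso_week, 1)
    return monday, monday + timedelta(days=6)


def issued_last_4_weeks(issue_dates, iso_year, iso_week):
    """Count licenses issued in the 28 days ending on the week's Sunday.

    Assumption: this is one possible reading of "last 4 weeks".
    """
    _, sunday = week_bounds(iso_year, iso_week)
    window_start = sunday - timedelta(days=27)
    return sum(window_start <= d <= sunday for d in issue_dates)


def in_effect(periods, iso_year, iso_week):
    """Count licenses whose (start, end) period overlaps the given week.

    Assumption: any overlap with the week counts as "in effect".
    """
    monday, sunday = week_bounds(iso_year, iso_week)
    return sum(start <= sunday and end >= monday for start, end in periods)
```

In the real script these counts would be computed per license category and census block; the sketch leaves that grouping out to keep the date logic visible.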
You can find the data format and examples on the Feature Dataset Format tab in this document
Basic business licenses can have one or more categories, found in the LICENSECATEGORY column of the source data.
Input:
CSV files with data for each given year
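One possible way to read the yearly files as a single stream of rows, using only the Python standard library. The `*.csv` pattern, the UTF-8 encoding, and the function name `read_license_rows` are assumptions made for illustration; the sketch does not de-duplicate licenses that appear in more than one file.

```python
import csv
import glob
import os


def read_license_rows(folder):
    """Yield one dict per row from every CSV file in `folder`, in filename order."""
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        # Assumption: the files are UTF-8 encoded and each has a header row.
        with open(path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                yield row
```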
Output:
A CSV file with
- One row for each feature ID, business license category (feature type), subtype, week, year, and census block
- The dataset should include the following columns:
  - feature_id: The ID for the feature, in this case "business_licenses_issued_last_4_weeks" or "business_licenses_in_effect"
  - feature_type: Business license category, found in the LICENSECATEGORY column of the source data
  - feature_subtype: Left blank
  - year: The ISO-8601 year of the feature value
  - week: The ISO-8601 week number of the feature value
  - census_block_2010: The 2010 Census Block of the feature value
  - value: The value of the feature, i.e. the number of business licenses of the specified type that were either new in the previous 4 weeks or active during the week and year in question, in the given census block
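For reference, a small Python sketch of building one output row with the columns listed above and writing it as CSV. The ISO-8601 year and week come from `date.isocalendar()`, which matters near year boundaries: 1 January 2017 falls in ISO week 52 of 2016. The helper names, the category string, and the census block ID in the usage lines are made up for illustration.

```python
import csv
import io
from datetime import date

COLUMNS = [
    "feature_id", "feature_type", "feature_subtype",
    "year", "week", "census_block_2010", "value",
]


def feature_row(feature_id, category, day, census_block, value):
    """Build one output row; year and week are the ISO-8601 values for `day`."""
    iso_year, iso_week, _ = day.isocalendar()
    return {
        "feature_id": feature_id,
        "feature_type": category,
        "feature_subtype": "",  # left blank, per the column description
        "year": iso_year,
        "week": iso_week,
        "census_block_2010": census_block,
        "value": value,
    }


def write_features(rows, handle):
    """Write a header line followed by the rows to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


# Usage with hypothetical values
buffer = io.StringIO()
write_features(
    [feature_row("business_licenses_in_effect", "Example Category",
                 date(2017, 1, 1), "110010001001000", 3)],
    buffer,
)
```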
The final script must be runnable from the command line and take three arguments:
- A folder with the basic business license data files (the script should concatenate and merge the files in the directory as appropriate)
- The shapefile for census blocks
- The output CSV filename
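If the script is written in Python, the three arguments could be handled with `argparse` along these lines. This is only a sketch: the argument names `license_dir`, `census_shapefile`, and `output_csv`, and the example paths, are placeholders rather than a required interface.

```python
import argparse


def build_parser():
    """Build a parser for the three positional arguments described above."""
    parser = argparse.ArgumentParser(
        description="Extract business license features by week and census block."
    )
    parser.add_argument("license_dir",
                        help="folder with the basic business license CSV files")
    parser.add_argument("census_shapefile",
                        help="shapefile for census blocks")
    parser.add_argument("output_csv",
                        help="filename for the output feature CSV")
    return parser


# Usage with placeholder paths; a real script would call parse_args() with
# no arguments so that the values are read from the command line.
args = build_parser().parse_args(["licenses/", "blocks.shp", "features.csv"])
print(args.license_dir, args.census_shapefile, args.output_csv)
```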
Please also provide a README.md that describes the script and how to run it.
You can model the solution for the command line modifications after the files here or here
Place all of your files in the codefordc/the-rat-hack repository under a new scripts/feature_engineering/extract_business_license_features/ folder