created early lambda function for running XGBoost
mfrost433 committed May 16, 2019
1 parent f403113 commit 3deab82
Showing 3 changed files with 31 additions and 6 deletions.
7 changes: 7 additions & 0 deletions lambda/package.json
@@ -0,0 +1,7 @@
+{
+  "name": "ASHA-xgboost-evaluation",
+  "version": "1.0.0",
+  "description": "Trains and evaluates an XGBoost model for a parallel ASHA implementation",
+  "author": "",
+  "license": "MIT"
+}
15 changes: 15 additions & 0 deletions lambda/serverless.yml
@@ -0,0 +1,15 @@
+service: ASHA-xgboost-evaluation
+
+frameworkVersion: ">=1.2.0 <2.0.0"
+
+provider:
+  name: aws
+  runtime: python3.7 # python3.7 supported as of November 2018
+
+functions:
+  run:
+    handler: run_xgboost.run
+    events:
+      - http:
+          path: run
+          method: post
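The serverless.yml above wires an HTTP POST at /run to a handler named run_xgboost.run, which is not included in this commit. A minimal sketch of what such a handler might look like, assuming hyperparameters arrive in the JSON request body (the parameter name max_depth is an assumption, not part of this commit):

```python
import json


def run(event, context):
    # Parse hyperparameters from the POST body; the "max_depth" key is
    # a hypothetical example, not defined anywhere in this commit.
    body = json.loads(event.get("body") or "{}")
    max_depth = int(body.get("max_depth", 3))

    # A real handler would train and evaluate the XGBoost model here and
    # return its score; this sketch just echoes the parsed input back in
    # the API Gateway response shape.
    return {
        "statusCode": 200,
        "body": json.dumps({"max_depth": max_depth}),
    }
```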
15 changes: 9 additions & 6 deletions xgboost_test.py
@@ -10,12 +10,12 @@
from sklearn.model_selection import train_test_split

# read csv into dataframe
-df = pd.read_csv("./weatherAUS.csv")
+df = pd.read_csv("./weatherAUS.csv", parse_dates=['Date'])

print('Size of weather data frame is :', df.shape)

# drop columns with useless data (too many null values)
-df = df.drop(columns=['Sunshine','Evaporation','Cloud3pm','Cloud9am','Location','RISK_MM','Date'],axis=1)
+df = df.drop(columns=['Sunshine', 'Evaporation', 'Cloud3pm', 'Cloud9am', 'Location', 'RISK_MM', 'Date'], axis=1)

# get rid of nulls
df = df.dropna(how='any')
@@ -37,22 +37,25 @@
scaler.fit(df)
df = pd.DataFrame(scaler.transform(df), index=df.index, columns=df.columns)


df.to_csv("preprocessed.csv")
# separate data from target
X = df.loc[:,df.columns!='RainTomorrow']
y = df[['RainTomorrow']]

# k = 5 has 84.25%, k=5 has 84.01%, 85.19% on all data ( bad cols removed), with date: 85.17
# select the k most useful columns
-selector = SelectKBest(chi2, k=3)
-selector.fit(X, y)
-X_new = selector.transform(X)
+#selector = SelectKBest(chi2, k=10)
+#selector.fit(X, y)
+#X_new = selector.transform(X)

# fit and evaluate using k_fold

model = XGBClassifier()
kfold = KFold(n_splits=5, random_state=7)
print("Cross eval starting")

-results = cross_val_score(model, X_new, y, cv=kfold, verbose=3)
+results = cross_val_score(model, X, y, cv=kfold, verbose=3)
print("Accuracy: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
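One caveat about the cross-validation step above: in scikit-learn 0.24 and later, KFold raises a ValueError when random_state is set without shuffle=True. A small sketch of the same fit-and-score pattern that stays valid on newer versions; LogisticRegression stands in for XGBClassifier so the sketch runs without the xgboost package, and the toy arrays stand in for the preprocessed weather data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Toy stand-in for the preprocessed weather features and target.
rng = np.random.default_rng(7)
X = rng.normal(size=(100, 4))
y = (X[:, 0] + rng.normal(scale=0.5, size=100) > 0).astype(int)

# random_state only takes effect (and, on scikit-learn >= 0.24, is only
# allowed) when shuffle=True.
kfold = KFold(n_splits=5, shuffle=True, random_state=7)
model = LogisticRegression()
results = cross_val_score(model, X, y, cv=kfold)
print("Accuracy: %.2f%% (%.2f%%)" % (results.mean() * 100, results.std() * 100))
```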


