kaggle-titanic

This is the Python/scikit-learn code I wrote during my stab at the Kaggle Titanic competition. It includes code for several different algorithms, but the primary and highest-performing one is the random forest implemented in randomforest2.py.

Requirements:

Python (a 2.x release, version 2.6 or later)
scikit-learn/NumPy/SciPy (http://scikit-learn.org/stable/install.html)
pandas (http://pandas.pydata.org/pandas-docs/stable/install.html)
matplotlib (http://matplotlib.org/faq/installing_faq.html)

Usage:
> python randomforest2.py

Key files:

loaddata.py: Contains all the feature engineering, including options for generating different variable types and for performing PCA, clustering, and class balancing
randomforest2.py: The code that executes the pipeline
scorereport.py: Inspects and reports on the results of hyperparameter search
learningcurve.py: Includes code to generate a learning curve
roc_auc.py: Includes code to generate a ROC curve
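
For readers unfamiliar with ROC curves, here is a minimal, library-free sketch of how the curve points and the area under the curve can be computed. It is not the code in roc_auc.py; the function names `roc_points` and `auc` are made up for this illustration, and the sketch assumes binary labels (1 = survived, 0 = did not) with at least one example of each class.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs.

    Hypothetical helper for illustration only. Assumes y_true holds
    0/1 labels and contains at least one 0 and one 1.
    """
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    # Sort by descending score: lowering the threshold admits samples in this order.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample that shares this score so ties yield one point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / float(neg), tp / float(pos)))
    return points


def auc(points):
    """Area under the curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

The resulting points can be passed to matplotlib (for example `plt.plot` on the two coordinate lists) to draw the curve. An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking.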

Other files contain other algorithms that were used during experimentation and are in various stages of completeness. Only randomforest2.py is fully up to date.
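
As a rough idea of what reporting on a hyperparameter search involves, the sketch below ranks candidate parameter settings by mean validation score. It is an illustration only, not the contents of scorereport.py: the result layout (dicts with `params`, `mean_score`, and `std_score` keys), the function names, and the numbers in `example_results` are all placeholders invented for this example, not actual competition results.

```python
def top_results(results, n_top=3):
    """Return the n_top entries with the highest mean validation score."""
    return sorted(results, key=lambda r: r["mean_score"], reverse=True)[:n_top]


def format_report(results, n_top=3):
    """Build a plain-text report, one line per ranked parameter setting."""
    lines = []
    for rank, r in enumerate(top_results(results, n_top), start=1):
        lines.append(
            "Rank %d: mean score %.3f (std %.3f), params %s"
            % (rank, r["mean_score"], r["std_score"], sorted(r["params"].items()))
        )
    return "\n".join(lines)


# Placeholder values for illustration only; these are NOT real results.
example_results = [
    {"params": {"n_estimators": 100, "max_features": 3}, "mean_score": 0.80, "std_score": 0.02},
    {"params": {"n_estimators": 500, "max_features": 3}, "mean_score": 0.82, "std_score": 0.03},
    {"params": {"n_estimators": 500, "max_features": 5}, "mean_score": 0.81, "std_score": 0.01},
]

if __name__ == "__main__":
    print(format_report(example_results, n_top=2))
```

In practice, the input list would be built from whatever the search object exposes in the installed scikit-learn version, so the keys above would need to be adapted.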



jiangjingyao/kaggle-titanic
