- Go to https://ufc-predictions.herokuapp.com/
- Select the weight class of the bout
- Select the number of 5-minute rounds the fight is scheduled for
- Select whether or not the fight is a title fight
- Select the fighter names
- Click predict
- Scraped event and fight stats data from 1993 to the present using Beautiful Soup.
- Cleaned, preprocessed and feature-engineered the data so that each row is a historical representation of both fighters and their individual fights/fight stats.
- Uploaded the dataset, which is now available on Kaggle at: https://www.kaggle.com/rajeevw/ufcdata
- Oversampled the minority class, then created and tested predictive models using `RandomForestClassifier` and XGBoost's `XGBClassifier`
- Created a web app using Dash and deployed it with Docker on Heroku.
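The README does not say how the minority class was oversampled. Plain random oversampling (duplicating randomly chosen rows of the smaller class until both classes are the same size) is one common option; the standard-library sketch below illustrates it as an assumption, not as the project's actual code, and `random_oversample` is a hypothetical helper.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Hypothetical helper: duplicate randomly chosen rows of each smaller
    class until every class has as many rows as the largest one."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        members = [i for i, y in enumerate(labels) if y == cls]
        for _ in range(target - n):  # zero iterations for the majority class
            j = rng.choice(members)
            out_rows.append(rows[j])
            out_labels.append(cls)
    return out_rows, out_labels

# Toy training split: 1 = Red win (majority), 0 = Blue win (minority)
X_train = [[0.1], [0.4], [0.3], [0.9], [0.7]]
y_train = [1, 1, 1, 0, 0]
X_bal, y_bal = random_oversample(X_train, y_train)
print(Counter(y_bal))  # both classes now have 3 rows
```

Oversampling is normally applied to the training split only, so duplicated rows cannot leak into the validation set used for the reported metrics. Libraries such as imbalanced-learn (`RandomOverSampler`) implement the same idea.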
- Accuracy (valid): 0.7218
- AUC Score (valid): 0.7763
- `0` corresponds to Blue: the fighter in the blue corner
- `1` corresponds to Red: the fighter in the red corner
- Generally, the underdog is in the blue corner and the favourite is in the red corner.
- The model therefore (understandably) has a hard time predicting when the underdog wins. This is because the sport is very volatile: anything from an injury or psychological loss/trauma to pure luck can determine the winner.
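As a rough illustration of the 0/1 target and the two validation metrics, this sketch encodes `Winner` as 1 for Red and 0 for Blue and computes accuracy and ROC AUC in plain Python. It assumes the raw `Winner` values are the strings 'Red' and 'Blue' and that any other value (such as a draw) is dropped; the predicted probabilities are made up, and the helper names are hypothetical.

```python
def encode_winner(winner):
    """Assumed encoding: 'Red' -> 1, 'Blue' -> 0, anything else -> None."""
    return {"Red": 1, "Blue": 0}.get(winner)

def accuracy(y_true, y_pred):
    """Fraction of fights whose predicted label matches the actual winner."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """Pairwise AUC: share of (Red win, Blue win) pairs in which the Red win
    got the higher score; ties count as half a pair."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    hits = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return hits / (len(pos) * len(neg))

winners = ["Red", "Blue", "Red", "Draw", "Red", "Blue"]
p_red = [0.8, 0.3, 0.6, 0.5, 0.4, 0.55]  # made-up model outputs, P(Red wins)

kept = [(encode_winner(w), p) for w, p in zip(winners, p_red)
        if encode_winner(w) is not None]  # drop the draw
y_true = [y for y, _ in kept]
scores = [p for _, p in kept]
y_pred = [int(p >= 0.5) for p in scores]  # threshold at 0.5

print(accuracy(y_true, y_pred))  # 0.6
print(roc_auc(y_true, scores))   # 0.8333...
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported validation AUC of 0.7763 in context.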
This is a list of every UFC fight in the history of the organisation. Every row contains information about both fighters, fight details and the winner. The data was scraped from the ufcstats website, which came into the picture after FightMetric ceased to exist. I saw that there was a lot of information on the website about every fight and every event, and there was no existing way of capturing all of it. I used Beautiful Soup to scrape the data and pandas to process it. It was a long and arduous process, so please forgive any mistakes. I have provided the raw files in case anybody wants to process them differently. This is my first time creating a dataset; any suggestions and corrections are welcome!
Each row is a compilation of both fighters' stats. Fighters are represented by 'red' and 'blue' (for the red and blue corner). For instance, the red fighter's columns hold the compiled average stats of all of that fighter's fights except the current one. The stats include the damage done by the red fighter on the opponent and the damage done by the opponent on the fighter (represented by 'opp' in the columns) in all the fights this particular red fighter has had, except this one, as it has not occurred yet (in the data). The same information exists for the blue fighter. The target variable is 'Winner', which is the only column that tells you what happened. Here are some column definitions:
- `R_` and `B_` prefixes signify red and blue corner fighter stats, respectively
- Columns containing `_opp_` hold the average of the damage done by the opponent on the fighter
- `KD` is the number of knockdowns
- `SIG_STR` is the no. of significant strikes, 'landed of attempted'
- `SIG_STR_pct` is the significant strikes percentage
- `TOTAL_STR` is the total strikes, 'landed of attempted'
- `TD` is the no. of takedowns
- `TD_pct` is the takedown percentage
- `SUB_ATT` is the no. of submission attempts
- `PASS` is the no. of times the guard was passed?
- `REV` is the number of reversals
- `HEAD` is the no. of significant strikes to the head, 'landed of attempted'
- `BODY` is the no. of significant strikes to the body, 'landed of attempted'
- `CLINCH` is the no. of significant strikes in the clinch, 'landed of attempted'
- `GROUND` is the no. of significant strikes on the ground, 'landed of attempted'
- `win_by` is the method of win
- `last_round` is the last round of the fight (e.g. if it was a KO in the 1st, then this will be 1)
- `last_round_time` is when the fight ended in the last round
- `Format` is the format of the fight (3 rounds, 5 rounds, etc.)
- `Referee` is the name of the referee
- `date` is the date of the fight
- `location` is the location in which the event took place
- `Fight_type` is the weight class and whether it's a title bout or not
- `Winner` is the winner of the fight
- `Stance` is the stance of the fighter (orthodox, southpaw, etc.)
- `Height_cms` is the height in centimeters
- `Reach_cms` is the reach of the fighter (arm span) in centimeters
- `Weight_lbs` is the weight of the fighter in pounds (lbs)
- `age` is the age of the fighter
- `title_bout` is a Boolean value of whether it is a title fight or not
- `weight_class` is the weight class the fight is in (Bantamweight, Heavyweight, Women's Flyweight, etc.)
- `no_of_rounds` is the number of rounds the fight was scheduled for
- `current_lose_streak` is the count of the fighter's current consecutive losses
- `current_win_streak` is the count of the fighter's current consecutive wins
- `draw` is the number of draws in the fighter's UFC career
- `wins` is the number of wins in the fighter's UFC career
- `losses` is the number of losses in the fighter's UFC career
- `total_rounds_fought` is the average of total rounds fought by the fighter
- `total_time_fought(seconds)` is the count of total time spent fighting in seconds
- `total_title_bouts` is the total number of title bouts the fighter has taken part in
- `win_by_Decision_Majority` is the number of wins by majority judges' decision in the fighter's UFC career
- `win_by_Decision_Split` is the number of wins by split judges' decision in the fighter's UFC career
- `win_by_Decision_Unanimous` is the number of wins by unanimous judges' decision in the fighter's UFC career
- `win_by_KO/TKO` is the number of wins by knockout in the fighter's UFC career
- `win_by_Submission` is the number of wins by submission in the fighter's UFC career
- `win_by_TKO_Doctor_Stoppage` is the number of wins by doctor stoppage in the fighter's UFC career
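The per-row averaging described above (a fighter's stats are averaged over all earlier fights, never including the current one) and the 'landed of attempted' values can be sketched in plain Python. This is a minimal illustration, not the preprocessing notebooks' actual code: it assumes raw values look like `'12 of 30'` and that fights are already sorted oldest first, and the `fighter`, `SIG_STR_landed` and `SIG_STR_att` keys are hypothetical.

```python
from collections import defaultdict

def parse_landed_of_attempted(value):
    """Split a 'landed of attempted' string such as '12 of 30' into ints."""
    landed, attempted = value.split("of")
    return int(landed), int(attempted)  # int() ignores surrounding spaces

def prior_averages(fights, stat):
    """For each fight (oldest first), average the fighter's `stat` over
    their earlier fights only; None for a fighter's UFC debut."""
    history = defaultdict(list)  # fighter name -> values from past fights
    averages = []
    for fight in fights:
        past = history[fight["fighter"]]
        averages.append(sum(past) / len(past) if past else None)
        # Record the current fight only after averaging, so a fight never
        # contributes to its own row (it "has not occurred yet").
        past.append(fight[stat])
    return averages

# Hypothetical raw per-fight rows, already sorted by date (oldest first)
raw = [
    {"fighter": "A", "SIG_STR": "30 of 60"},
    {"fighter": "B", "SIG_STR": "10 of 40"},
    {"fighter": "A", "SIG_STR": "50 of 80"},
    {"fighter": "A", "SIG_STR": "40 of 70"},
]
for row in raw:
    row["SIG_STR_landed"], row["SIG_STR_att"] = parse_landed_of_attempted(row["SIG_STR"])

print(prior_averages(raw, "SIG_STR_landed"))  # [None, None, 30.0, 40.0]
```

Appending only after the average is computed is what keeps each row free of information about the fight it is trying to predict.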
- Clear out the data folder and simply run `scrape_all_data.py` (Note: this will scrape everything from the beginning and hence will take a long time.)
- Run `EDA_and_preprocessing-1.ipynb` and after that `EDA_and_preprocessing-2b.ipynb` (`EDA_and_preprocessing-2a.ipynb` is an alternative in which the rows with missing stat values are removed instead of being treated.)
- Try a weighted moving average instead of a simple mean to give more importance to the stats of each fighter's recent fights
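One possible form of this idea is an exponentially weighted mean, in which each older fight counts `decay` times as much as the fight after it. This is a sketch of a potential future change, not existing project code; `weighted_average` and the default `decay` value are hypothetical choices, and a linearly weighted average would be another reasonable option.

```python
def weighted_average(values, decay=0.5):
    """Exponentially weighted mean of `values`, ordered oldest to newest.
    The newest value has weight 1, the one before it `decay`, then
    decay**2, and so on. Returns None for an empty history (a debut)."""
    if not values:
        return None
    n = len(values)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

strikes = [10, 20, 60]  # hypothetical stat from a fighter's last three fights
print(sum(strikes) / len(strikes))  # simple mean: 30.0
print(weighted_average(strikes))    # about 41.43, pulled toward the recent 60
```

With `decay=1.0` every fight gets the same weight and the function reduces to the current simple mean.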
- Inspiration: https://github.com/Hitkul/UFC_Fight_Prediction, which provided ideas on how to store per-fight data. Unfortunately, both the UFC website and the FightMetric website have changed completely, so I couldn't reuse any of the code.
- Print Progress Bar: https://gist.github.com/aubricus/f91fb55dc6ba5557fbab06119420dd6a, used to display in the terminal how much of the download is complete.
- Web app: https://github.com/jasonchanhku/, which provided ideas on how to use Dash and the Google Search API to show fighter images.