The UFC (Ultimate Fighting Championship) is a global mixed martial arts (MMA) organization, hosting weekly competitive events that showcase fighters from a range of weight classes and backgrounds.
This repository contains code and resources relating to the UFC. This includes one of the most comprehensive public UFC datasets available, encompassing official match outcomes and history compiled from the UFC, fighter statistics, as well as historic betting odds.
These datasets were compiled out of personal interest in data analysis and to test building a predictive model for match outcomes. They are also made publicly available for anyone else who is interested.
`/data/complete_ufc_data.csv` captures a comprehensive UFC dataset that uniquely combines 30 years of match history (from 1994), individual fighter statistics and 9 years of historic betting odds (from Nov 2014).
Data dictionary
Column | Sample values | Description | Source |
---|---|---|---|
`event_date` | `2023-09-16` | Date of UFC event | Scraped from UFC match history |
`event_name` | `UFC Fight Night: Grasso vs. Shevchenko 2` | Name of UFC event | Scraped from UFC match history |
`weight_class` | `Women's Flyweight` | Weight class of UFC match | Scraped from UFC match history |
`fighter1`, `fighter2` | `Alexa Grasso`, `Valentina Shevchenko` | Fighter names; note that `fighter1` should usually be the winner of the match, as this is how the names are ordered in the official match history | Scraped from UFC match history |
`favourite`, `underdog` | `Valentina Shevchenko`, `Alexa Grasso`, `NaN` | Names of the betting favourite and the betting underdog. Note that betting odds do not exist for older years, and that where odds do exist, values are missing wherever the fighter names on the betting site and in the official UFC match history did not match | Scraped from historic odds on betmma.tips |
`favourite_odds`, `underdog_odds` | `1.67`, `2.88`, `NaN` | Betting odds (decimal) | Scraped from historic odds on betmma.tips |
`betting_outcome` | `favourite`, `underdog`, `NaN` | Whether the favourite or the underdog won the match. Provided in this format for easier querying on odds | Scraped from historic odds on betmma.tips |
`outcome` | `fighter1`, `fighter2`, `Draw` | Match outcome - will usually be `fighter1`, as this is how names are ordered in the official match history | Derived from UFC match history |
`method` | `S-DEC`, `U-DEC`, `KO/TKO Punches` | Method of victory | Scraped from UFC match history |
`round` | `5` | Round of victory | Scraped from UFC match history |
`fighter1_*`, e.g. `fighter1_height`, `fighter1_dob`, `fighter1_reach`, `fighter1_sig_strikes_landed_pm`, `fighter1_takedown_avg_per15m` | | Fighter attributes for `fighter1` at the time the data was scraped | Derived from UFC fighter statistics |
`fighter2_*` | | Fighter attributes for `fighter2` at the time the data was scraped | Derived from UFC fighter statistics |
`events_extract_ts`, `odds_extract_ts`, `fighter_extract_ts` | `2023-09-21 02:02:55.178363` | Timestamp when the dataset was scraped | |
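
Since `favourite_odds` and `underdog_odds` are decimal odds, they can be turned into implied win probabilities by taking the inverse of each value. The snippet below is a minimal sketch of that conversion, not code from this repository: the function name `implied_probabilities` is hypothetical, and normalising the two inverses so that they sum to 1 is one common convention among several.

```python
import math


def implied_probabilities(favourite_odds, underdog_odds):
    """Convert a pair of decimal odds into normalised implied win probabilities.

    Returns None when either value is missing (NaN), which is the case for
    older fights and for fights where fighter names could not be matched.
    """
    if math.isnan(favourite_odds) or math.isnan(underdog_odds):
        return None
    raw_favourite = 1.0 / favourite_odds
    raw_underdog = 1.0 / underdog_odds
    # The raw inverses rarely sum to exactly 1, so rescale them to do so.
    total = raw_favourite + raw_underdog
    return raw_favourite / total, raw_underdog / total


# Sample values taken from the data dictionary above.
print(implied_probabilities(1.67, 2.88))
print(implied_probabilities(float("nan"), 2.88))
```

Rows without odds are returned as `None` here so that they can be filtered out explicitly rather than silently treated as a 50/50 fight.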
The raw datasets (scraped from the official UFC website and betmma.tips) are also available under `/data/`.
🏃 Code:
- To run the web scraper and update match results, fighter stats and betting odds: `python -m ufc.scraper`
  The following arguments are permitted: `--events`, `--fighters`, `--odds`, to scrape one or several sources individually rather than all. The default is to scrape all.
- To run pre-processing and data cleaning on the scraped data: `python -m ufc.preprocessing`
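
For readers curious how the "scrape all by default, or only the selected sources" behaviour can be expressed, here is a minimal sketch using `argparse`. It illustrates the flag semantics described above and is not the actual implementation of `ufc.scraper`; the helper name `parse_sources` and the `SOURCES` tuple are assumptions.

```python
import argparse

# The three sources named by the documented flags.
SOURCES = ("events", "fighters", "odds")


def parse_sources(argv=None):
    """Return the list of sources to scrape for the given command-line arguments."""
    parser = argparse.ArgumentParser(prog="python -m ufc.scraper")
    for name in SOURCES:
        parser.add_argument(f"--{name}", action="store_true",
                            help=f"scrape {name}")
    args = parser.parse_args(argv)
    selected = [name for name in SOURCES if getattr(args, name)]
    # No flag given means scrape everything.
    return selected or list(SOURCES)


print(parse_sources([]))                      # ['events', 'fighters', 'odds']
print(parse_sources(["--events", "--odds"]))  # ['events', 'odds']
```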
✅ Features completed:
- Scrape UFC data - fighter stats, and match results
- Scrape historic betting odds from betmma.tips
- Pre-processing to clean data, reformat/restructure, data checks
🚧 Feature backlog
- Update by appending instead of replacing all, for more efficient refreshes - fetch only new events, but update all fighter stats
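
One possible shape for this backlog item is sketched below. It assumes that events are represented as dictionaries with an ISO-formatted `event_date` (as in the data dictionary) and that anything dated after the latest stored event counts as new; the function name `append_new_events` is hypothetical and nothing here exists in the repository yet.

```python
from datetime import date


def append_new_events(existing_rows, scraped_rows):
    """Return the existing rows plus the scraped rows dated after the latest existing event."""
    if not existing_rows:
        return list(scraped_rows)
    latest = max(date.fromisoformat(row["event_date"]) for row in existing_rows)
    new_rows = [row for row in scraped_rows
                if date.fromisoformat(row["event_date"]) > latest]
    # Existing rows are kept untouched; only newer events are appended.
    return list(existing_rows) + new_rows


# Toy rows for illustration only.
existing = [{"event_date": "2023-09-09", "event_name": "Event A"}]
scraped = [
    {"event_date": "2023-09-09", "event_name": "Event A"},
    {"event_date": "2023-09-16", "event_name": "Event B"},
]
print(append_new_events(existing, scraped))
```

Fighter statistics would still need a full refresh, since they change after every fight.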
Some interesting insights and visualisations are shared here:
🚧 Development of ML model to test how well match outcome can be predicted based on fighter stats is WiP:
- Initial PoCs (GBM, logistic regression) predicting match outcome from fighter attributes alone (betting odds had not yet been scraped) reached an accuracy of ~65%
- This is comparable to a betting strategy of always picking the favourite (65%), which suggests that betting market sentiment may capture most information the model is currently trained on.
- There is still significant opportunity to iterate by testing further features:
  - Fight win streak, finish rate (knockouts, submissions)
  - Derived features - durability, tagging fighters as wrestler/striker/grappler etc.
  - Whether the fighter is the favourite (where odds have been scraped)
- Note that MMA is a highly dynamic and unpredictable sport, frequently characterised by upsets, and that match outcome may not be consistently predictable
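
The "always pick the favourite" baseline mentioned above can be computed directly from the `betting_outcome` column. The following is a minimal sketch using only the standard library; the function name `favourite_win_rate` is hypothetical and the in-memory sample is toy data for illustration, not real results.

```python
import csv
import io


def favourite_win_rate(csv_file):
    """Share of fights with known odds that were won by the betting favourite."""
    wins = 0
    total = 0
    for row in csv.DictReader(csv_file):
        outcome = row.get("betting_outcome", "")
        if outcome == "favourite":
            wins += 1
            total += 1
        elif outcome == "underdog":
            total += 1
        # Any other value (NaN, empty) means no odds were available: skip it.
    return wins / total if total else float("nan")


# Toy sample: three favourites, one underdog, one fight without odds.
sample = io.StringIO("betting_outcome\nfavourite\nunderdog\nfavourite\nNaN\nfavourite\n")
print(favourite_win_rate(sample))  # 0.75
```

To run it against the full dataset, open `data/complete_ufc_data.csv` with `newline=""` and pass the file object to the function.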
Dependency management: Poetry (more actively maintained) or pip (`requirements.txt` exists but is less frequently updated). To install with Poetry: `poetry install`