This is a thesis project completed as part of the COMPGW99 - MSc Thesis module (MSc Web Science and Big Data Analytics) at University College London.
It proposes the application of a novel approach in graph theory to identify coherent clusters of topics and researchers within Networks of Topics and Researchers. These networks are constructed from current (2010 to 2016) and historical (1990 to 2000, 2000 to 2010) data collected from EPSRC. For more information, please refer to the main thesis report.
Sergiu Tripon, MSc Web Science and Big Data Analytics candidate
Dr. Shi Zhou, Senior lecturer at University College London
- Main thesis report
- Supplementary material
- Research Data Management Plan
- Repository contents
- Running the source code
- algorithms/ - Louvain community detection algorithm for NetworkX
- analysis/ - source code written for the analysis of the network, communities and sub-communities
- data/ - various .gephi, .graphml, .png, .tsv and .txt files related to the networks and the visualisation of the networks
- documents/ - main thesis report, supplementary material and research data management plan
- experiments/ - experiments carried out to trial different network analysis packages
- literature/ - research papers in PDF format representing the state of the art in several different topics
- network-maker - source code written to collect the data and convert it into networks, together with the data itself in the form of HTML files downloaded from the EPSRC Grants on the Web (GoW) service
- wiki - various .png, .psd, .pptx and .xlsx files used to add different media to the main thesis report and the GitHub wiki
- requirements.txt - pip requirements file listing the project dependencies, which can be installed by running the following in a terminal window:
$ pip install -r requirements.txt
Note: In order to run the source code, a virtual environment is required. The code is written in Python 3.5. The packages used in the project are listed in the requirements.txt file and can be installed using pip.
To run the network analysis, run the analysis.py file with the desired parameters: -n selects the network (topic or researcher), -i selects the interpretation, i.e. what the edges represent (grants, researchers or topics), and -d selects the data set (1990-2000, 2000-2010 or 2010-2016). Follow the steps below:
# activate virtual environment
$ source venv/bin/activate
# navigate to analysis source folder
$ cd msc-thesis-na-epsrc/analysis/src/
# analyse topic (grants as edges, 2010-2016)
$ python analysis.py -n topic -i grants -d 2010-2016
# analyse topic (grants as edges, 2000-2010)
$ python analysis.py -n topic -i grants -d 2000-2010
# analyse topic (grants as edges, 1990-2000)
$ python analysis.py -n topic -i grants -d 1990-2000
# analyse topic (researchers as edges, 2010-2016)
$ python analysis.py -n topic -i researchers -d 2010-2016
# analyse topic (researchers as edges, 2000-2010)
$ python analysis.py -n topic -i researchers -d 2000-2010
# analyse topic (researchers as edges, 1990-2000)
$ python analysis.py -n topic -i researchers -d 1990-2000
# analyse researcher (grants as edges, 2010-2016)
$ python analysis.py -n researcher -i grants -d 2010-2016
# analyse researcher (grants as edges, 2000-2010)
$ python analysis.py -n researcher -i grants -d 2000-2010
# analyse researcher (grants as edges, 1990-2000)
$ python analysis.py -n researcher -i grants -d 1990-2000
# analyse researcher (topics as edges, 2010-2016)
$ python analysis.py -n researcher -i topics -d 2010-2016
# analyse researcher (topics as edges, 2000-2010)
$ python analysis.py -n researcher -i topics -d 2000-2010
# analyse researcher (topics as edges, 1990-2000)
$ python analysis.py -n researcher -i topics -d 1990-2000
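The three flags could be parsed and checked along the lines of the minimal sketch below. It is an illustration only, not the actual contents of analysis.py: the names `build_parser`, `parse_args` and `all_runs` are hypothetical, and the pairing of networks with interpretations (topic with grants or researchers, researcher with grants or topics) is an assumption inferred from the example comments above rather than something this README states.

```python
import argparse
import itertools

# Values taken from the flag descriptions in this README.
NETWORKS = ("topic", "researcher")
DATA_SETS = ("1990-2000", "2000-2010", "2010-2016")

# Assumption: each network only accepts the edge interpretations
# that the commented examples pair it with.
INTERPRETATIONS = {
    "topic": ("grants", "researchers"),
    "researcher": ("grants", "topics"),
}


def build_parser():
    """Build a parser mirroring the -n, -i and -d flags (hypothetical sketch)."""
    parser = argparse.ArgumentParser(prog="analysis.py")
    parser.add_argument("-n", dest="network", choices=NETWORKS, required=True,
                        help="network to analyse")
    parser.add_argument("-i", dest="interpretation", required=True,
                        choices=("grants", "researchers", "topics"),
                        help="what the edges of the network represent")
    parser.add_argument("-d", dest="data_set", choices=DATA_SETS, required=True,
                        help="data set (period) to analyse")
    return parser


def parse_args(argv):
    """Parse argv and reject network/interpretation pairs assumed to be invalid."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.interpretation not in INTERPRETATIONS[args.network]:
        # parser.error prints a usage message and exits with status 2.
        parser.error("-i {} is not available for the {} network".format(
            args.interpretation, args.network))
    return args


def all_runs():
    """Yield the argument list for every assumed-valid combination of flags."""
    for network in NETWORKS:
        for interpretation, data_set in itertools.product(
                INTERPRETATIONS[network], DATA_SETS):
            yield ["-n", network, "-i", interpretation, "-d", data_set]
```

Under these assumptions, `all_runs()` yields one argument list per command listed above, so it could be used to drive every analysis in a loop rather than typing each command by hand.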