In this highly selective, global competition for PhD students, I worked individually on a confidential problem statement and documented my findings in a detailed report. The main findings are in the report itself; the code is supplementary material only.
For more details about the competition, check out these links:
This research examined the textual characteristics of clickbait and how they affect user engagement. Using Natural Language Processing (NLP) techniques, I analyzed the sentiments, emotions, and topics present in clickbait articles. I then statistically evaluated and ranked these factors by their effect on user interaction, and developed two null models to test the reliability of this ranking. My methodological approach is encapsulated in the Clickbait Defender product concept:
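To make the ranking and null-model idea concrete, here is a minimal Python sketch of one common choice, a permutation null model. It is not the pair of null models described in the report. It assumes each headline already carries a categorical label (for example an emotion) and a click-through rate (CTR); the function names rank_by_mean_ctr and permutation_null and the toy numbers are hypothetical. Shuffling CTRs across headlines destroys any link between label and engagement, so a label whose observed mean CTR is rarely matched under shuffling holds its ranking position for reasons unlikely to be chance.

```python
import random
from statistics import mean


def rank_by_mean_ctr(labels, ctrs):
    """Rank each label (e.g. a sentiment, emotion or topic) by the mean CTR of its headlines."""
    groups = {}
    for label, ctr in zip(labels, ctrs):
        groups.setdefault(label, []).append(ctr)
    means = {label: mean(values) for label, values in groups.items()}
    return sorted(means, key=means.get, reverse=True), means


def permutation_null(labels, ctrs, n_iter=1000, seed=0):
    """Permutation null model: shuffle CTRs across headlines, keeping the labels fixed.

    Returns, for each label, the fraction of shuffles whose mean CTR is at least
    as high as the observed one (a one-sided empirical p-value).
    """
    rng = random.Random(seed)
    _, observed = rank_by_mean_ctr(labels, ctrs)
    exceed = {label: 0 for label in observed}
    shuffled = list(ctrs)
    for _ in range(n_iter):
        rng.shuffle(shuffled)
        _, null_means = rank_by_mean_ctr(labels, shuffled)
        for label in observed:
            if null_means[label] >= observed[label]:
                exceed[label] += 1
    # The +1 terms keep the estimate from ever being exactly zero.
    return {label: (count + 1) / (n_iter + 1) for label, count in exceed.items()}


# Hypothetical toy data: four headlines per emotion label.
labels = ["joy"] * 4 + ["anger"] * 4 + ["neutral"] * 4
ctrs = [0.030, 0.032, 0.029, 0.031,
        0.018, 0.020, 0.019, 0.021,
        0.024, 0.025, 0.023, 0.026]
ranking, _ = rank_by_mean_ctr(labels, ctrs)
print(ranking)  # ['joy', 'neutral', 'anger']
print(permutation_null(labels, ctrs))
```

On the toy data, "joy" holds the four highest CTRs, so almost no shuffle matches it and its empirical p-value is small, while "anger" holds the four lowest and every shuffle matches it (p = 1.0).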
My technical work on this project is divided into four main parts:
- Google Analytics Analysis: Using Google Analytics, I extracted and analyzed user engagement metrics, studying user behavior patterns, click-through rates, and other relevant metrics to understand how users interact with clickbait content. The notebook google_analytics_analysis.ipynb details this process.
- Data Cleaning, Processing and Exploratory Data Analysis: I refined the dataset used in the original study, cleaning, categorizing, and preparing the data for deeper analysis. The notebook data_cleaning_processing_eda.ipynb contains the entire process.
- NLP Classification: I developed algorithms for classifying the text of clickbait articles, covering sentiment analysis, emotion detection, and topic categorization, as seen in nlp_text_classyfing_algorithms.ipynb.
- Statistical Rank Analysis, Null Models and Insights: I applied statistical models to the processed data to derive insights into user engagement. The notebook text_analysis_sentiment_emotion_topic.ipynb outlines this analysis.
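The classification step can be illustrated with a deliberately simple stand-in: a keyword-lexicon emotion tagger in Python. This is not the approach implemented in nlp_text_classyfing_algorithms.ipynb; the lexicon, the emotion set, and the names EMOTION_LEXICON and classify_emotion are hypothetical. The sketch only shows the shape of the output that a rank analysis consumes: one categorical label per headline.

```python
import re
from collections import Counter

# Hypothetical toy lexicon, for illustration only.
EMOTION_LEXICON = {
    "joy": {"amazing", "love", "happy", "wonderful", "best"},
    "anger": {"outrage", "furious", "hate", "worst", "angry"},
    "surprise": {"shocking", "unbelievable", "unexpected", "surprising"},
}


def tokenize(headline):
    """Lowercase a headline and split it into alphabetic tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", headline.lower())


def classify_emotion(headline, lexicon=EMOTION_LEXICON, default="neutral"):
    """Return the emotion with the most lexicon hits, or `default` if none match.

    Ties go to the emotion whose first hit appears earliest in the headline.
    """
    counts = Counter()
    for token in tokenize(headline):
        for emotion, words in lexicon.items():
            if token in words:
                counts[emotion] += 1
    if not counts:
        return default
    return counts.most_common(1)[0][0]


print(classify_emotion("This Amazing Dog Will Make You Happy"))  # joy
print(classify_emotion("The Worst Outrage Of The Year"))  # anger
print(classify_emotion("City Council Meets On Tuesday"))  # neutral
```

Counting lexicon hits is transparent but ignores negation and context ("not happy" still counts as joy), which is one reason trained classifiers are usually preferred for this step.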
I cannot share the processed datasets obtained for the competition, but their source is publicly available: The Upworthy Research Archive, https://upworthy.natematias.com/
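Engagement in an A/B-test archive like this one is usually summarized as click-through rate, CTR = clicks / impressions. The Python sketch below computes it per headline from a CSV export. The column names headline, impressions and clicks, and the function name click_through_rates, are assumptions for illustration; check them against the archive's documentation before use.

```python
import csv
import io


def click_through_rates(csv_file):
    """Compute pooled CTR = total clicks / total impressions for each headline.

    Assumes (hypothetically) 'headline', 'impressions' and 'clicks' columns;
    rows with zero impressions are skipped.
    """
    totals = {}
    for row in csv.DictReader(csv_file):
        impressions = int(row["impressions"])
        if impressions == 0:
            continue
        clicks_sum, impressions_sum = totals.get(row["headline"], (0, 0))
        totals[row["headline"]] = (clicks_sum + int(row["clicks"]),
                                   impressions_sum + impressions)
    return {headline: clicks / imps for headline, (clicks, imps) in totals.items()}


# Hypothetical toy export: headline "A" appears in two tests.
sample = io.StringIO(
    "headline,impressions,clicks\n"
    "A,1000,30\n"
    "B,2000,50\n"
    "A,1000,10\n"
)
print(click_through_rates(sample))  # {'A': 0.02, 'B': 0.025}
```

Pooling clicks and impressions before dividing weights each test by its traffic, so a headline shown in a small test does not count as much as one shown to many users.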