Quantifying and ranking user engagement with clickbait articles using NLP-created features

In this highly selective global PhD student competition, I worked individually on a confidential problem statement and documented my findings in a detailed report. The main findings are in the report itself; the code is supplementary material.

For more details about the competition, check out these links:

Research Overview

This research examined the textual characteristics of clickbait and how they affect user engagement. Using Natural Language Processing (NLP) techniques, I analyzed the sentiments, emotions, and topics present in clickbait articles. I then statistically evaluated and ranked these factors by their effect on user interaction, and developed two null models to validate the reliability of this ranking. My methodological approach is encapsulated in the Clickbait Defender product concept.
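
The null-model idea can be illustrated with a simple label-shuffling test. The sketch below is not the code from the report: it assumes a single categorical feature (one sentiment label per headline), uses click-through rate as the engagement measure, and every function name and number in it is made up for illustration. It ranks categories by mean engagement, then checks how often randomly shuffled labels produce a gap between the best and worst category at least as large as the observed one.

```python
import random
from statistics import mean


def category_means(labels, values):
    """Mean engagement value for each category label."""
    groups = {}
    for label, value in zip(labels, values):
        groups.setdefault(label, []).append(value)
    return {label: mean(vals) for label, vals in groups.items()}


def rank_categories(labels, values):
    """Categories sorted from highest to lowest mean engagement."""
    means = category_means(labels, values)
    return sorted(means, key=means.get, reverse=True)


def spread(labels, values):
    """Gap between the highest and lowest category mean."""
    means = category_means(labels, values)
    return max(means.values()) - min(means.values())


def permutation_p_value(labels, values, n_permutations=2000, seed=0):
    """Share of label shuffles whose spread is at least the observed one."""
    rng = random.Random(seed)
    observed = spread(labels, values)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any real link between label and value
        if spread(shuffled, values) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Toy data (invented): a sentiment label and a click-through rate per headline.
labels = ["negative"] * 6 + ["neutral"] * 6 + ["positive"] * 6
ctr = [0.031, 0.029, 0.034, 0.030, 0.033, 0.032,
       0.015, 0.017, 0.014, 0.016, 0.018, 0.015,
       0.022, 0.020, 0.024, 0.021, 0.023, 0.022]

ranking = rank_categories(labels, ctr)
p_value = permutation_p_value(labels, ctr)
print(ranking, p_value)
```

A small p-value suggests the ranking is unlikely to come from chance label assignment alone. The two null models used in the report may be constructed differently.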

Technical Overview

My technical work on this project is divided into four main parts:

Google Analytics Analysis: Leveraging Google Analytics, I extracted and analyzed user engagement metrics. This involved studying user behavior patterns, click-through rates, and other relevant metrics to understand how users interact with clickbait content. The notebook google_analytics_analysis.ipynb details this process.
Data Cleaning, Processing and Exploratory Data Analysis: I refined the dataset used in the original study, focusing on cleaning, categorizing, and preparing the data for deeper analysis. The notebook data_cleaning_processing_eda.ipynb contains the entire process.
NLP Classification: Here, I developed algorithms for classifying the text of clickbait articles. This part involves sentiment analysis, emotion detection, and topic categorization, as seen in nlp_text_classyfing_algorithms.ipynb.
Statistical Rank Analysis, Null Models and Insights: This section involves applying statistical models to the processed data to glean insights into user engagement. The Jupyter notebook text_analysis_sentiment_emotion_topic.ipynb outlines this analysis.
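
To make the NLP classification step more concrete, here is a deliberately tiny lexicon-based sentiment labeller. It is a stand-in built on assumptions, not the approach used in nlp_text_classyfing_algorithms.ipynb: the word lists and the classify_sentiment function are hypothetical, and a real pipeline would more likely rely on a trained or pretrained model. The point is only to show the shape of the step, which maps each headline to a categorical feature that can later be ranked against engagement.

```python
import re

# Hypothetical mini-lexicons; real sentiment lexicons are far larger.
POSITIVE_WORDS = {"amazing", "love", "best", "happy", "wonderful"}
NEGATIVE_WORDS = {"worst", "hate", "terrible", "sad", "awful"}


def classify_sentiment(headline):
    """Label a headline as 'positive', 'negative', or 'neutral'."""
    tokens = re.findall(r"[a-z']+", headline.lower())
    score = (sum(token in POSITIVE_WORDS for token in tokens)
             - sum(token in NEGATIVE_WORDS for token in tokens))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Invented example headlines.
headlines = [
    "You Will Love This Amazing Trick",
    "The Worst Mistake People Make Every Morning",
    "Nine Facts About Bridges",
]
sentiments = [classify_sentiment(h) for h in headlines]
print(sentiments)
```

Emotion and topic labels could be attached in the same way, one function per feature, so that every headline ends up with a small set of categorical features.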

Data

I cannot share the processed datasets obtained for the competition, but the source data are available from The Upworthy Research Archive: https://upworthy.natematias.com/
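
As a rough starting point for working with the source data, the sketch below computes a click-through rate per headline. It assumes a CSV export with headline, impressions, and clicks columns plus a test identifier; check the column names against the archive's own documentation before relying on them. The sample rows are invented so the example runs without downloading anything, and load_click_through_rates is a hypothetical helper.

```python
import csv
import io

# Invented sample rows; column names are assumptions about the archive export.
SAMPLE_CSV = """clickability_test_id,headline,impressions,clicks
t1,Headline A,3000,90
t1,Headline B,3000,45
t2,Headline C,2000,0
t2,Headline D,0,0
"""


def load_click_through_rates(csv_file):
    """Read rows and attach clicks / impressions as 'ctr'."""
    rows = []
    for row in csv.DictReader(csv_file):
        impressions = int(row["impressions"])
        clicks = int(row["clicks"])
        if impressions == 0:
            continue  # the rate is undefined without impressions
        rows.append({
            "test_id": row["clickability_test_id"],
            "headline": row["headline"],
            "ctr": clicks / impressions,
        })
    return rows


packages = load_click_through_rates(io.StringIO(SAMPLE_CSV))
for package in packages:
    print(package["headline"], round(package["ctr"], 4))
```

To use a downloaded file instead, pass an open file handle in place of the io.StringIO object.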

lukablagoje/citadel_correlation_one_global_phd_datathon_2023
