Podcast Reviews - 2 million reviews on 100k podcasts

Overarching Investigation

What categories of podcasts are popular?
What makes a 5-star podcast? What makes a 1-star podcast?
How do reviewer tendencies vary and differ?

Technical Goals

Working with a SQLite Database
Using Python-based Data Analysis Packages such as Pandas
Data Visualization
Hypothesis Testing and Statistical Inference
Making a Dashboard Using Looker Studio

Description

This project delves into a dataset containing 2 million reviews on 100k podcasts, available on Kaggle (Podcast Reviews Dataset). The analysis encompasses statistical examination of podcast ratings across various categories, employing both traditional statistical methods and resampling techniques like bootstrap and permutation procedures. Assumptions such as normality and independence of samples are rigorously assessed.

Additional Analysis

Analyze podcast ratings statistically across different categories using traditional statistics as well as bootstrap and permutation procedures.
Assess statistical assumptions such as normality and independence of samples.
Analyze the length of 1-star vs 5-star comedy podcasts.
Compare paired non-parametric samples of podcast ratings on two podcasts.

Conclusions

5-star reviews are the most common for all categories.
Most podcasts are rated similarly (5-stars).
Ratings are not normally distributed.
Negatively reviewed podcast categories (e.g., News-Government) receive more 3-star reviews than positively reviewed podcast categories (e.g., Business).
1-star reviews are slightly longer than 5-star reviews.
People who rate two podcasts do so similarly.

Dependencies

python>=3.10.13
- ipython
- jupyter
- matplotlib
- numpy
- pandas
- pyarrow
- scipy
- seaborn
- statsmodels

see requirements.txt for more details and versions used in development environment

Files Included

podcast_reviews.ipynb: Jupyter Notebook containing the analysis code
podcast_utils.py: Python script with utility functions
requirements.txt: File containing the project dependencies
LICENSE: MIT

Getting Started

Clone the repository
Create a virtual environment using the requirements.txt file provided
e.g.

python3 -m venv podcast_reviews/
# activate the venv and install all requirements provided 
source podcast_reviews/bin/activate
pip install -r requirements.txt

Open the Jupyter notebook file, podcast_reviews.ipynb, in your Jupyter environment and step through to see analysis.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Podcast Reviews - 2 million reviews on 100k podcasts

Overarching Investigation

Technical Goals

Description

Additional Analysis

Conclusions

Dependencies

Files Included

Getting Started

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
utils		utils
LICENSE		LICENSE
README.md		README.md
podcast_reviews.ipynb		podcast_reviews.ipynb
requirements.txt		requirements.txt

License

migueldiazacevedo/podcast_reviews

Folders and files

Latest commit

History

Repository files navigation

Podcast Reviews - 2 million reviews on 100k podcasts

Overarching Investigation

Technical Goals

Description

Additional Analysis

Conclusions

Dependencies

Files Included

Getting Started

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages