This project analyzes Goodreads publications over several years. Using Tableau for visualization and data analytics, we explore the factors that influence the popularity of publications, the authors and publishers behind them, and seasonal and historical publishing trends.
The primary objectives of this project are:
- Analyze Book Popularity Factors:
  - Identify key factors influencing book popularity, such as book type (series/novel) and length.
  - Assess the impact of author popularity on book ratings and reviews.
- Compare Publisher Outputs:
  - Evaluate the publication output of leading authors and their respective publishers.
  - Analyze the market share of top publishers among the most prolific authors.
- Investigate Seasonal Trends:
  - Examine the timing of book releases throughout the year and identify peak publication months.
  - Analyze historical trends in publication rates across different years.
- Generate Strategic Insights:
  - Develop actionable recommendations for publishers to enhance their release strategies based on identified trends and patterns.
Data
- Contains the original books.csv and the cleaned, processed datasets duplicated_author_goodsread.csv and singular_author_goodsread.csv.
Preprocessing (Goodreads.ipynb)
- Contains a Python notebook that cleans and processes the data, performs exploratory data analysis (EDA), and engineers features for analysis and visualization.
Visualization (Goodreads.twb)
- Contains a Tableau workbook with visualizations, a dashboard, and a data story that summarizes the key insights from the analysis.
The dataset was cleaned to reconcile differences in spelling and suffixes in the Publisher column, so that each publisher appears under a single name. The publication_year column was split into Month and Year for seasonal trend analysis. The Author column was cleaned so that each book title has a single author. Two columns were created through feature engineering: one categorizing books by their num_pages (short, medium, long), and one indicating whether the book is a novel, part of a series, or a boxed-set collection.
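As a rough illustration of the cleaning steps described above, here is a minimal pandas sketch. It is not the code from Goodreads.ipynb. The toy rows, the "/" author separator, the publisher normalization rule, the page-count thresholds, and the title-based rules for book type are all assumptions made for the example, and the date column is named `publication_date` here to match the column list in the dataset description.

```python
import pandas as pd

# Toy rows standing in for books.csv; all values are illustrative only.
books = pd.DataFrame({
    "title": ["Book A (Series #1)", "Book B", "Book C Boxed Set"],
    "authors": ["Author One/Author Two", "Author Three", "Author One"],
    "publisher": ["Vintage Books", "vintage", "Tor Books"],
    "publication_date": ["9/16/2006", "11/1/2005", "5/5/2004"],
    "num_pages": [352, 150, 2690],
})

# Keep only the first listed author (assumes "/" separates co-authors).
books["author"] = books["authors"].str.split("/").str[0].str.strip()

# Collapse publisher variants (hypothetical rule: lowercase, drop a trailing "books").
books["publisher_clean"] = (
    books["publisher"]
    .str.lower()
    .str.replace(r"\s+books$", "", regex=True)
    .str.strip()
)

# Split the publication date into month and year.
dates = pd.to_datetime(books["publication_date"], format="%m/%d/%Y", errors="coerce")
books["month"] = dates.dt.month
books["year"] = dates.dt.year

# Bin page counts into length categories (thresholds are assumptions).
books["length"] = pd.cut(
    books["num_pages"],
    bins=[0, 200, 500, float("inf")],
    labels=["short", "medium", "long"],
)

def book_type(title: str) -> str:
    """Classify a title with simple keyword rules (assumed, not the notebook's)."""
    lowered = title.lower()
    if "boxed set" in lowered or "box set" in lowered:
        return "boxed set"
    if "#" in lowered:
        return "series"
    return "novel"

books["book_type"] = books["title"].map(book_type)
print(books[["author", "publisher_clean", "month", "year", "length", "book_type"]])
```

Real publisher names vary in more ways than a trailing "Books", so the actual notebook likely needs a larger mapping than the single regular expression used here.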
Visit the Tableau Dashboard to explore the visualizations and insights.
Analysis: I explored how factors such as book length and book type affect the popularity of authors and books.
Visualization: A dashboard comprising Rating of Book Types, Book Length Popularity (Pie Chart), Author Popularity (Bubble Chart), and Book Popularity (Bar Chart).
Insight: Series and medium-length books are the most popular categories, with three of the top five books falling into them. The most popular author, J.K. Rowling, is known for her Harry Potter series. (Popularity is defined by the number of ratings and the average rating.)
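The analysis itself was done in Tableau, but the popularity definition above (number of ratings plus average rating) can be approximated in pandas. The sketch below uses made-up rows and assumes an engineered `book_type` column. Ranking by total ratings first and mean rating second is one possible reading of the definition, not necessarily the one used in the dashboard.

```python
import pandas as pd

# Illustrative rows only; column names follow the dataset description.
books = pd.DataFrame({
    "book_type": ["series", "series", "novel", "boxed set"],
    "average_rating": [4.5, 4.3, 3.9, 4.7],
    "ratings_amount": [2000, 1500, 800, 100],
})

# Summarize each book type by total number of ratings and mean rating.
summary = (
    books.groupby("book_type")
    .agg(
        total_ratings=("ratings_amount", "sum"),
        mean_rating=("average_rating", "mean"),
    )
    .sort_values(["total_ratings", "mean_rating"], ascending=False)
)

most_popular = summary.index[0]
print(summary)
print("Most popular type:", most_popular)
```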
Analysis: I compared the leading authors and publishers, and checked whether the publishers of the leading authors differ from the leading publishers overall.
Visualization: Top Authors (Bar Chart), Top Publishers (Pie Chart), Top Publishers of Top Authors (Chart)
Insight: Stephen King is the most published author, and Vintage is the leading publisher overall. However, among the top 10 authors, Tor Books contributes 3.54% (20 books) of their publications. This suggests that while Vintage is the top publisher overall, it may not concentrate on highly prolific authors, which illustrates how publishers differ in strategy and focus.
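A publisher's share among the top authors, such as the 3.54% figure above, is a percentage of all books written by those authors. The following pandas sketch shows one way to compute such a share. The data is invented, the `author` and `publisher` column names are assumptions, and `top_n` is set to 2 only to keep the example small (the analysis uses the top 10).

```python
import pandas as pd

# Invented rows; one row per book after the single-author cleaning step.
books = pd.DataFrame({
    "author": ["A", "A", "A", "B", "B", "C"],
    "publisher": ["P1", "P2", "P2", "P2", "P3", "P1"],
})

top_n = 2  # the analysis uses the top 10 authors

# Most prolific authors by number of books.
top_authors = books["author"].value_counts().head(top_n).index

# Restrict to books by those authors, then compute each publisher's share in percent.
top_books = books[books["author"].isin(top_authors)]
share = top_books["publisher"].value_counts(normalize=True).mul(100).round(2)

print(share)
```

Comparing `share` with the same calculation over the full table shows whether the publishers of prolific authors differ from the leading publishers overall.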
3. How do seasonal influences and historical trends in publishing impact the timing of book releases?
Analysis: I evaluated trends in the timing of book releases across years and months.
Visualization: Top Months for Publications (Bar Chart), Top Years for Publications (Bubble Chart)
Insight: September ("National Literacy Month") is the peak month for book releases, while December (the busy holiday season) sees the fewest. The years 2006, 2005, and the early 2000s marked the highest publication rates, perhaps driven by the popularity of series like Harry Potter and Twilight, which may have inspired publishers to release similar titles.
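The monthly and yearly counts behind these charts can be reproduced outside Tableau with a few lines of pandas. This is a minimal sketch on made-up dates, not the project's actual data or code.

```python
import pandas as pd

# Made-up publication dates for illustration.
dates = pd.to_datetime(
    pd.Series(["2006-09-01", "2005-09-15", "2006-10-03", "2004-12-25"])
)

# Count releases per calendar month and per year.
by_month = dates.dt.month.value_counts().sort_index()
by_year = dates.dt.year.value_counts().sort_index()

# Month number with the most releases (1 = January, 12 = December).
peak_month = by_month.idxmax()

print(by_month)
print(by_year)
print("Peak month:", peak_month)
```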
Based on the insights from this analysis, publishers may find it valuable to:
1. Leverage Popular Genres: Invest in developing series and medium-length books, as they are the most popular among readers in this dataset.
2. Adapt Publication Strategies: Consider timing releases around September to maximize visibility and engagement, while keeping in mind that December sees the fewest releases.
3. Support Prolific Authors: Actively seek out and support prolific authors to increase their publication output, as their established fan bases can drive sales and enhance market presence.
4. Cultivate Author Relationships: Foster long-term partnerships with prolific authors to encourage consistent output and collaboration on new projects, which can help maintain reader interest and loyalty.
The dataset used in this analysis was scraped from Goodreads via the Goodreads API and comprises 11,000 rows.
- Book Data: title, authors, isbn13, num_pages
- Review Data: average_rating, ratings_amount, text_reviews_count
- Publication Data: publisher, publication_date, language_code
- Sentiment Data: Sentiment scores derived from review text, common terms used
This project uses Goodreads data sourced from Kaggle.