A spider chart with absolute feature values on the axis instead of relative ones (percentage or similar)

Ringomed/Plotnine-contest

Plotnine Contest 2024

Inspired by ggplot2, the plotnine library is also based on the grammar of graphics, which builds a graph by stacking multiple layers on top of one another. This powerful concept lets us create essentially any visualization, as long as we know how to code it. I’ll be using it to construct an advanced version of a spider chart from scratch. The chart will present a comparison of Titanic passengers across the three passenger classes.

I'll be using R for data preparation and then plotnine for the plotting (the actual code is run within a Quarto notebook which is available in the repository).

The philosophy behind chart construction using the grammar of graphics approach.

Data preparation and tidying

library(titanic)
library(dplyr)
library(tidyr)
library(tibble)
library(purrr)
library(scales)
library(ggplot2)


# Prepare the Titanic data
Titanic <- titanic_train

Titanic_gr1 <-
  Titanic %>%
  select(Survived:Fare) %>%
  group_by(Pclass) %>%
  # Passing extra arguments through across() is deprecated in recent dplyr; use a lambda
  summarise(across(c(Age, Fare), ~ mean(.x, na.rm = TRUE)))

Titanic_gr2 <-
  Titanic %>%
  select(Survived:Fare) %>%
  group_by(Pclass, Survived) %>%
  summarise(N = n()) %>%
  pivot_wider(names_from = Survived, values_from = N) %>%
  mutate("Survived (%)" = `1`/(`0` + `1`)) %>%
  select(1,4)

Titanic_gr3 <-
  Titanic %>%
  select(Survived:Fare) %>%
  group_by(Pclass, Sex) %>%
  summarise(N = n()) %>%
  pivot_wider(names_from = Sex, values_from = N)

Titanic_gr <- reduce(list(Titanic_gr1, 
                          Titanic_gr2, 
                          Titanic_gr3), left_join) %>%
  rename(Male = "male", Female = "female")

p_data <- Titanic_gr %>% 
  rename(group = "Pclass") %>%
  mutate(group = as.factor(case_when(group == 1 ~ "1st Class",
                                     group == 2 ~ "2nd Class",
                                     group == 3 ~ "3rd Class")),
         `Survived (%)` = 100*`Survived (%)`)

# Calculate the coordinates of polygon tips
circle_coords <- function(r, n_axis = ncol(p_data) - 1){
  fi <- seq(0, 2*pi, (1/n_axis)*2*pi) + pi/2
  x <- r*cos(fi)
  y <- r*sin(fi)
  
  tibble(x, y, r)
}

central_distance <- 0.2
axis_name_offset <- 0.2

circle_df <- map_df(seq(0, 1, 0.25) + central_distance, circle_coords)

# Calculate the coordinates for the axis lines
axis_coords <- function(n_axis){
  fi <- seq(0, (1 - 1/n_axis)*2*pi, (1/n_axis)*2*pi) + pi/2
  x1 <- central_distance*cos(fi)
  y1 <- central_distance*sin(fi)
  x2 <- (1 + central_distance)*cos(fi)
  y2 <- (1 + central_distance)*sin(fi)
  
  tibble(x = c(x1, x2), y = c(y1, y2), id = rep(1:n_axis, 2))
}

# Absolute tick values for each axis (from min to max in five steps)
text_data <- p_data %>%
  select(-group) %>%
  map_df(~ min(.) + (max(.) - min(.)) * seq(0, 1, 0.25)) %>%
  mutate(r = seq(0, 1, 0.25)) %>%
  pivot_longer(-r, names_to = "parameter", values_to = "value")

text_coords <- function(r, n_axis = ncol(p_data) - 1){
  fi <- seq(0, (1 - 1/n_axis)*2*pi, (1/n_axis)*2*pi) + pi/2 + 0.01*2*pi/r
  x <- r*cos(fi)
  y <- r*sin(fi)
  
  tibble(x, y, r = r - central_distance)
}

# Coordinates for the axis labels
labels_data <- map_df(seq(0, 1, 0.25) + central_distance, text_coords) %>%
  bind_cols(text_data %>% select(-r)) %>%
  group_by(parameter) %>%
  mutate(value = signif(value, 2) %>% as.character)


rescaled_coords <- function(r, n_axis){
  fi <- seq(0, 2*pi, (1/n_axis)*2*pi) + pi/2
  tibble(r, fi) %>% mutate(x = r*cos(fi), y = r*sin(fi)) %>% select(-fi)
}

# Coordinates for the passenger class feature value points
rescaled_data <- p_data %>% 
  mutate(across(-group, rescale)) %>%
  # Repeat the first feature so that each polygon closes on itself
  mutate(copy = pull(., 2)) %>%
  pivot_longer(-group, names_to = "parameter", values_to = "value") %>%
  group_by(group) %>%
  mutate(coords = rescaled_coords(value + central_distance, ncol(p_data) - 1)) %>%
  unnest(coords)
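
For readers who prefer Python, the geometry behind the R helpers above can be sketched roughly as follows. This is not part of the original notebook: the function names `ring_vertices` and `axis_segments` are made up for illustration, and the sketch only mirrors what `circle_coords` and `axis_coords` appear to compute (closed polygon rings and one line segment per axis, with the first axis pointing straight up).

```python
import math

CENTRAL_DISTANCE = 0.2  # same inner offset as central_distance in the R code


def ring_vertices(r, n_axis):
    """Vertices of a closed regular polygon of radius r, first vertex pointing up."""
    step = 2 * math.pi / n_axis
    # n_axis + 1 angles: the last one repeats the first so the outline closes.
    angles = [i * step + math.pi / 2 for i in range(n_axis + 1)]
    return [(r * math.cos(a), r * math.sin(a)) for a in angles]


def axis_segments(n_axis, inner=CENTRAL_DISTANCE, outer=1 + CENTRAL_DISTANCE):
    """One (start, end) pair per axis, running from the inner to the outer ring."""
    step = 2 * math.pi / n_axis
    segments = []
    for i in range(n_axis):
        a = i * step + math.pi / 2
        start = (inner * math.cos(a), inner * math.sin(a))
        end = (outer * math.cos(a), outer * math.sin(a))
        segments.append((start, end))
    return segments


# Five rings at 0, 0.25, ..., 1 of the axis range, shifted outwards by the inner offset.
rings = [ring_vertices(f + CENTRAL_DISTANCE, n_axis=5) for f in (0, 0.25, 0.5, 0.75, 1)]
```

The inner offset keeps the smallest values away from the exact centre, where all axes would otherwise collapse into a single point.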

Making the chart

Now for the Python and plotnine part! Let's first set up the libraries and fonts.

import pandas as pd
import numpy as np
from plotnine import *
from scipy.stats import zscore
import matplotlib
import matplotlib.pyplot as plt
from matplotlib import font_manager
import requests
import os
import zipfile

# I'll be using Google's Roboto Condensed font for the plot
# Path to the directory containing the fonts
font_dir = "roboto_condensed_fonts"

# Register the font files
for font_file in os.listdir(font_dir):
    if font_file.endswith(".ttf"):
        font_path = os.path.join(font_dir, font_file)
        font_manager.fontManager.addfont(font_path)

# Font family name under which the files above are registered
font_family = 'Roboto Condensed'

# Use the font in the plot
plt.rcParams['font.family'] = font_family

# Same offsets as in the R code
central_distance = 0.2
axis_name_offset = 0.2

# The plotting code below refers to p_data directly. Assuming the notebook
# exposes R objects through reticulate's `r` object, it can be pulled in like this:
p_data = r.p_data

We are finally ready to start making the plot. The layered approach calls for separate construction of different aspects of the graph. First we will create the chart outline.

step_1 = (ggplot(r.circle_df, aes('x', 'y')) +
          geom_polygon(data=r.circle_coords(1 + central_distance, p_data.shape[1] - 1),
                       alpha=1, fill='beige') +
          geom_path(aes(group='r'), linetype='dashed', alpha=0.5) +
          theme_void() +
          theme(legend_title=element_blank(),
                legend_direction='horizontal',
                legend_position='bottom',
                legend_box_spacing=0)
          )

image

Next, we add the axes to the chart...

step_2 = (step_1 +
          geom_line(data=r.axis_coords(p_data.shape[1] - 1),
                    mapping=aes(x='x', y='y', group='id'), alpha=0.3)
          )

image

...and overlay the data points.

step_3 = (step_2 +
          geom_point(data=r.rescaled_data,
                     mapping=aes(x='x', y='y', group='group', color='group'), size=3) +
          geom_path(data=r.rescaled_data,
                    mapping=aes(x='x', y='y', group='group', color='group'), size=1) +
          geom_polygon(data=r.rescaled_data,
                       mapping=aes('x', 'y', group='group', color='group', fill='group'),
                       size=1, alpha=0.05, show_legend=False)
          )

image
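
To make the overlay step less of a black box, here is a rough, stdlib-only Python sketch of what the points in `rescaled_data` represent: each feature is min-max rescaled across the groups, shifted outwards by the inner offset, and placed on its own axis, with the first feature repeated at the end so the outline closes. The function names and the toy numbers are invented for illustration and are not the Titanic summary; the sketch also assumes that the values on each axis are not all identical.

```python
import math

CENTRAL_DISTANCE = 0.2  # assumed to match central_distance above


def min_max(values):
    """Rescale a list of numbers linearly onto [0, 1] (values must not all be equal)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def polygon_for_groups(table):
    """table maps group name -> list of feature values (same feature order for all).

    Returns group name -> list of closed (x, y) polygon points.
    """
    groups = list(table)
    n_axis = len(table[groups[0]])
    # Rescale each feature across groups, so every axis spans its own min..max.
    columns = [min_max([table[g][j] for g in groups]) for j in range(n_axis)]
    step = 2 * math.pi / n_axis
    polygons = {}
    for i, g in enumerate(groups):
        radii = [columns[j][i] + CENTRAL_DISTANCE for j in range(n_axis)]
        radii.append(radii[0])  # repeat the first feature to close the outline
        polygons[g] = [
            (r * math.cos(k * step + math.pi / 2), r * math.sin(k * step + math.pi / 2))
            for k, r in enumerate(radii)
        ]
    return polygons


# Made-up numbers, purely to exercise the function (not the Titanic summary).
toy = {"A": [1.0, 10.0, 5.0], "B": [3.0, 20.0, 0.0], "C": [2.0, 30.0, 10.0]}
polygons = polygon_for_groups(toy)
```

Because every axis is rescaled independently, the group with the lowest value on an axis always sits on the innermost ring and the group with the highest value on the outermost one.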

The only thing left is to add the textual labels and names of the axes.

step_4 = (step_3 +
          geom_text(data=r.labels_data, mapping=aes(x='x', y='y', label='value'),
                    alpha=0.65, size=8, color='#303030') +
          geom_text(data=r.text_coords(1 + central_distance + 0.25, p_data.shape[1] - 1),
                    mapping=aes(x='x', y='y'),
                    label=[param for param in p_data.columns[1:]],
                    size=9)
          )

image
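
The numbers printed along each axis are what make this chart show absolute rather than relative values. As a hedged illustration (the helper names `tick_values` and `signif2` are hypothetical, and the input numbers are made up), the tick labels can be thought of as evenly spaced steps between the smallest and largest value of a feature, rounded to two significant digits, which is roughly what `text_data` and `labels_data` do on the R side:

```python
def tick_values(values, fractions=(0, 0.25, 0.5, 0.75, 1)):
    """Absolute values shown at each ring: evenly spaced between min and max."""
    lo, hi = min(values), max(values)
    return [lo + (hi - lo) * f for f in fractions]


def signif2(x):
    """Round to two significant digits and format without trailing zeros."""
    return f"{float(f'{x:.2g}'):g}"


# Made-up feature values for three groups, not the Titanic summary.
labels = [signif2(v) for v in tick_values([10, 30, 20])]
```

Each axis gets its own set of labels, so features measured in different units can share one chart without being converted to percentages.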

Putting all the steps together, using the Roboto font and polishing the plot a little more, we get:

plot = (ggplot(r.circle_df, aes('x', 'y')) +
        geom_polygon(data=r.circle_coords(1 + central_distance, p_data.shape[1] - 1),
                     alpha=1, fill='beige') +
        geom_path(aes(group='r'), linetype='dashed', alpha=0.5) +
        theme_void() +
        theme(legend_title=element_blank(),
              legend_direction='horizontal',
              legend_position='bottom',
              legend_box_spacing=0) +
        geom_line(data=r.axis_coords(p_data.shape[1] - 1),
                  mapping=aes(x='x', y='y', group='id'), alpha=0.3) +
        geom_point(data=r.rescaled_data,
                   mapping=aes(x='x', y='y', group='group', color='group'), size=3) +
        geom_path(data=r.rescaled_data,
                  mapping=aes(x='x', y='y', group='group', color='group'), size=1) +
        geom_polygon(data=r.rescaled_data,
                     mapping=aes('x', 'y', group='group', color='group', fill='group'),
                     size=1, alpha=0.05, show_legend=False) +
        geom_text(data=r.labels_data, mapping=aes(x='x', y='y', label='value'),
                  alpha=0.65, size=9, color='#303030') +
        geom_text(data=r.text_coords(1 + central_distance + 0.15, p_data.shape[1] - 1),
                  mapping=aes(x='x', y='y'),
                  label=[param for param in p_data.columns[1:]],
                  size=9) +
        labs(color='', title='Comparison of Titanic passengers') +
        theme(legend_position='bottom',
              legend_text=element_text(size=10, face='bold'),
              legend_title=element_blank(),
              text=element_text(family="Roboto Condensed"),
              legend_box_margin=0,
              legend_margin=-30,
              plot_title=element_text(size=12, margin={'b': -40}, face='bold'),
              axis_title=element_blank(),
              axis_text=element_blank()) +
        lims(x=(-1.75, 1.75), y=(-1.5, 1.8)))

Ta-daa, our work here is done. Let’s take a moment to comment on the numbers displayed. The 1st class passengers were, on average, the oldest and, judging by the fares paid, the wealthiest of the three. The 3rd class had the highest number of both male and female passengers and was the youngest group, probably mostly young people and families in search of a better life abroad. However, the 1st class passengers had the highest survival rate, and the 3rd class the lowest. This is probably partly due to the 1st class quarters being closer to the boat deck and partly due to the higher proportion of women in that class (since women and children were rescued first).

P.S. The code necessary to produce the above plots is also available in the spider_titanic.qmd Quarto notebook in the repository.

Bonus: mtcars plot

For the mtcars version, please consult the spider_mtcars.qmd notebook in the repository.
