index.qmd

---
# title: "`R` in Action"
# subtitle: "Efficient data science with `R`"
format:
    revealjs:
        width: 900
        height: 1600
        pagetitle: "Research fair 2024"
        menu: false
        #hash: false
        fragment-in-url: false
        history: false
        transition: slide
        background-transition: slide
        controls: true
        controls-layout: bottom-right
        navigation-mode: linear
        # theme: serif
        center: true
        fig-align: center
        auto-slide: 5000
        auto-slide-stoppable: true
        loop: true
        embed-resources: true
        # standalone: true
        # self-contained: true
        # self-contained-math: true
execute: 
    cache: true
editor_options: 
  chunk_output_type: console
---

## `R` in Action {.center}
::: {.r-fit-text}
Efficient data science with `R`
:::
A demonstration by <span style="color:blue;">Md. Aminul Islam Shazid</span>.


## {.center data-autoslide="5000"}
::: {.r-fit-text}
Grammar of graphics with `ggplot2`
:::

## Plots using grammar of graphics with `ggplot2` {data-autoslide="15000"}
- `ggplot2` is an `R` package that implements the grammar of graphics.
- Can provide beautiful graphics with some simple building blocks.
- Variables/features/columns are mapped to various elements of the plot called "aesthetics", e.g., axis, colours, point size, line type etc.
- Then a geometry transforms that "aesthetic" mapping into a plot.

## A simple example
```{r}
library(ggplot2)
library(palmerpenguins)
data(package = 'palmerpenguins')
# attributes(penguins$bill_length_mm)$label <- "Bill length (mm)"
```

```{r}
#| echo: true
#| warning: false
ggplot(data = penguins,
       mapping = aes(x = bill_length_mm, 
                     y = flipper_length_mm)) +
    geom_point()
```

## Adding a grouping variable
```{r}
#| echo: true
#| warning: false
ggplot(penguins, 
       mapping = aes(x = bill_length_mm, 
                     y = flipper_length_mm, 
                     color = species)) + 
    geom_point()
```

## Let's add another dimension to the plot!
```{r}
#| echo: true
#| warning: false
ggplot(penguins, 
       mapping = aes(x = bill_length_mm, 
                     y = flipper_length_mm, 
                     color = species, 
                     size = body_mass_g)) + 
    geom_point(alpha = 0.5)
```

## Adding yet another dimension!
```{r}
#| echo: true
#| warning: false
ggplot(penguins, 
       mapping = aes(x = bill_length_mm, 
                     y = flipper_length_mm, 
                     color = species, 
                     size = body_mass_g)) + 
    geom_point(alpha = 0.5) +
    facet_wrap(~island)
```

## Comparing a variable across groups with boxplot
```{r}
#| echo: true
#| warning: false
ggplot(penguins,
       mapping = aes(y = body_mass_g, 
                     x = species, 
                     fill = species)) +
    geom_boxplot(width = 0.2, show.legend = FALSE)
```

```{r}
# geom_boxplot(position = position_dodge2(preserve = "single"))
```

## Violon plots as alternative to boxplot
More informative: gives a sense of the density too!
```{r}
#| echo: true
#| warning: false
ggplot(penguins,
       mapping = aes(y = body_mass_g, 
                     x = species, 
                     fill = species)) +
    geom_violin(width = 0.5, show.legend = FALSE) + 
    geom_boxplot(fill = "white", width = 0.1, show.legend = FALSE)
```

## Bar diagrams
```{r}
library(dplyr)
# nba = read.csv("http://datasets.flowingdata.com/ppg2008.csv")
```


```{r}
#| echo: true
penguins |> 
    count(island, species) |> 
    ggplot() + 
    aes(x = island, y = n, fill = species) + 
    geom_bar(stat = "identity", 
             position = position_dodge2(preserve = "single"))
```

## Line chart
To show trend or evolution.

```{r}
#| echo: true
#| warning: false
ggplot() + 
    aes(x = time(AirPassengers), y = AirPassengers) + 
    geom_line()
```

## Line chart with a trend line!
`LOESS` smoother added as a trend line.
```{r}
#| echo: true
#| warning: false
ggplot() + 
    aes(x = time(AirPassengers), y = AirPassengers) + 
    geom_line() + 
    geom_smooth()
```


## {.center}
::: {.r-fit-text}
Fast data exploration with  `DataExplorer`
:::

## Basic info about a dataset
```{r}
#| echo: true
library(DataExplorer)
plot_intro(penguins)
```

## Find missing values
```{r}
#| echo: true
plot_missing(penguins)
```

## Frequency distribution of all discrete variables
```{r}
#| echo: true
plot_bar(diamonds)
```

## Frequency distribution by a discrete variable
```{r}
#| echo: true
plot_bar(diamonds, by = "cut")
```

## Histogram of all continuous variables
```{r test0}
#| echo: true
plot_histogram(diamonds)
```

## Kernel density of all continuous variables
```{r test1}
#| echo: true
plot_density(diamonds)
```

## Boxplot
Boxplots of all continuous variables with groups formed with respect to a categorical variable
```{r test2}
#| echo: true
plot_boxplot(diamonds, by = "cut")
```

## Scatterplot of one variable with all other continuous variable
```{r test3}
#| echo: true
plot_scatterplot(
    split_columns(diamonds)$continuous, 
    by = "price", 
    sampled_rows = 1000L
)
```

## Quantile-quantile plot of all continuous variables
```{r qqplot}
#| echo: true
plot_qq(diamonds)
```

## Correlogram
```{r}
#| echo: true
plot_correlation(split_columns(diamonds)$continuous)
```


## {.center}
::: {.r-fit-text}
Publication ready tables with `gtsummary`
:::

## Table describing the sample
This is the so-called `table-1`

```{r}
# data(mtcars)
# 
# mtcars$am <- factor(mtcars$am, levels = 0:1, labels = c("Auto", "Manual"))
# mtcars$cyl <- factor(mtcars$cyl)
# 
# # NOTE: categorical variables need to be converted to factors before applying variable labels
# 
# attributes(mtcars$mpg)$label <- "Miles per galon"
# attributes(mtcars$hp)$label <- "Horsepower"
# attributes(mtcars$wt)$label <- "Weight"
# attributes(mtcars$qsec)$label <- "1/4 mile time"
# attributes(mtcars$am)$label <- "Transmission"
# attributes(mtcars$gear)$label <- "Gears"
# 
# attr(mtcars$cyl, "label") <- "Cylinders"
# attr(mtcars$disp, "label") <- "Displacement"
# 
# mtcars <- mtcars |>
#     select(c("mpg", "hp", "wt", "qsec", "am", "gear", "cyl", "disp"))
library(gtsummary)
trial$death <- factor(trial$death, levels = 0:1, labels = c("No", "Yes"))
attributes(trial$response)$label <- "Patient died"

# trial$response <- factor(trial$response)
# attributes(trial$response)$label <- "Tumor Response"
```

```{r}
#| echo: true
library(gtsummary)
tbl_summary(
    data = trial, 
    missing_text = "NA",
    include = c("age", "trt", "marker", "stage", "grade", "death")
    ) |> 
    bold_labels()
```

## Cross table

```{r}
#| echo: true
tbl_summary(
    data = trial, 
    include = c("age", "trt", "marker", "stage", "grade"),
    by = "death",
    percent = "row",
    missing_text = "NA",
    ) |> 
    add_p() |> 
    bold_p() |> 
    bold_labels() |> 
    modify_spanning_header(all_stat_cols() ~ "**Death**")
```

## Regression model summary table
`gtsummary` many different kinds of statistical models. Adding support for new models is also very easy.
```{r}
#| echo: true
logit_model <- glm(death ~ age + trt + marker + stage + grade, 
                   data = trial, family = binomial)
tbl_regression(
    logit_model,
    exponentiate = TRUE
    ) |> 
    bold_p() |> 
    bold_labels()
```


## {.center}
::: {.r-fit-text}
Decision tree classifier in `R`
:::

## Fitting a decision tree model
Classifying disease outcome using decision tree
```{r}
#| echo: true
library(tree)
tree1 <- tree(death ~ age + trt + marker + stage + grade, 
     data = trial)
plot(tree1)
text(tree1, pretty = 0)
```


## {.center}
::: {.r-fit-text}
Hierarchical clustering in `R`
:::

## Finding similar cars
```{r}
#| echo: true
#| fig.height: 6
library(colorhcplot)
d <- dist(mtcars)
hc1 <- hclust(d)
plot(hc1, hang = -1, cex = 0.8)
rect.hclust(hc1, k = 3)
```

## {.center}
::: {.r-fit-text}
KNN clustering in `R`
:::

## Finding clusters of flowers in the `iris` dataset
```{r}
#| echo: true
library(factoextra)
km1 <- kmeans(iris[, 1:4], centers = 4, nstart = 1000)
fviz_cluster(km1, data = iris[, 1:4], geom = "point")
```


## {.center}
::: {.r-fit-text}
Time series analysis in `R`
:::

## Plotting a time series
```{r}
#| echo: true
library(ggfortify)
autoplot(AirPassengers)
```

## Decompose a time series
Decomposing the `AirPassengers` data into trend, seasonality etc.
```{r}
#| echo: true
dAP <- decompose(AirPassengers)
autoplot(dAP)
```

## Forecasting future values
```{r}
#| echo: true
library(forecast)
AP_arima <- auto.arima(AirPassengers)
AP_f <- forecast(AP_arima, h = 30)
autoplot(AP_f)
```