Data collection quarto #81

Merged 2 commits on Jan 24, 2024
51 changes: 31 additions & 20 deletions quarto/main/data_collection.qmd
@@ -11,25 +11,24 @@ all_wave_data <- CARS::get_all_waves(mode = "file")
data <- CARS::get_tidy_data_file("2023_data.csv") %>%
CARS::rename_cols() %>%
CARS::apply_skip_logic() %>%
CARS::clean_workplace() %>%
CARS::clean_departments() %>%
CARS::clean_data() %>%
CARS::derive_vars()

```

# How we collect data

The Coding in Analysis and Research Survey (CARS) data collection takes place for approximately one month, every autumn. The survey is self-selecting and participation is voluntary. Launch dates vary slightly by year to maximise response rate, for example by avoiding clashes with other internal surveys. In 2022, data collection took place from 3 October to 11 November.
The Coding in Analysis and Research Survey (CARS) data collection takes place for approximately one month, every autumn. The survey is self-selecting and participation is voluntary. Launch dates vary slightly by year to maximise response rate, for example by avoiding clashes with other internal surveys. In 2023, data collection took place from 16 October to 4 December.

We invite analysts to participate in the survey using a variety of online channels, mailing lists, networks and newsletters. For the past three years, the most common source of data was through departmental Reproducible Analytical Pipeline (RAP) champions, who promote the survey in their organisations. We rely on various champion networks, Heads of Profession (HoPs) for analysis and Departmental Directors of Analysis (DDans) to promote the survey in their departments and encourage their analytical communities to participate. This means the response rate and any selection bias will vary across organisations.
We invite analysts to participate in the survey using a variety of online channels, mailing lists, networks and newsletters. For the past four years, the most common source of data has been through departmental Reproducible Analytical Pipeline (RAP) champions, who promote the survey in their organisations. We rely on various champion networks, Heads of Profession (HoPs) for analysis and Departmental Directors of Analysis (DDans) to promote the survey and encourage their analytical communities to participate. This means the response rate and any selection bias will vary across organisations.

Our promotional materials make it clear that we are interested in responses from all analysts, whether or not they use coding in their work. However, it may be the case that the survey attracts a disproportionate number of respondents who have an interest in coding and RAP. We advise against making strong inferences about differences between professions and departments or attempting to estimate real frequencies from the data because of these potential limitations.
Our promotional materials make it clear that we are interested in responses from all analysts, whether or not they use coding in their work. However, the survey may attract a disproportionate number of respondents who have an interest in coding and RAP. Because of these potential limitations, we advise against making strong inferences about differences between professions and departments or attempting to estimate real frequencies from the data.

Lastly, while the survey is open to all public sector analysts, the vast majority of responses come from the UK and devolved Civil Service (`r round(sum(data$workplace == "Civil service, including devolved administations") / nrow(data) * 100, 1)`%). As such, follow-up questions on grade and profession applied only to civil servants.
Lastly, while the survey is open to all public sector analysts, the vast majority of responses come from the UK and devolved Civil Service (`r round(sum(data$workplace == "Civil service, including devolved administrations") / nrow(data) * 100, 1)`% in 2023). As such, follow-up questions on grade and profession applied only to civil servants.
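
For reference, the inline expression in the paragraph above amounts to a simple share calculation. A minimal sketch on made-up data follows; the toy `data` frame and the `civil_service_pct` name are illustrative assumptions, not part of the CARS package.

```r
# Toy stand-in for the survey data; the real `data` comes from the CARS pipeline.
data <- data.frame(
  workplace = c(
    "Civil service, including devolved administrations",
    "Civil service, including devolved administrations",
    "NHS",
    "Civil service, including devolved administrations"
  )
)

# Count rows matching the category, divide by all rows, and round to 1 decimal place.
civil_service_pct <- round(
  sum(data$workplace == "Civil service, including devolved administrations") /
    nrow(data) * 100,
  1
)

civil_service_pct
```

Because the comparison is an exact string match, a misspelled label matches no rows and the percentage silently becomes 0.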

## Where our data comes from

Link tracking allows us to see where responses are coming from. Links promoted by RAP champions were the most commonly used for the past three waves, but consistently account for fewer than half of responses.
Link tracking allows us to see where responses are coming from. Links promoted by RAP champions were the most commonly used for the past three waves, and accounted for over half of responses in 2023.

```{r}
rename_list <- list(
@@ -71,8 +70,19 @@ rename_list <- list(
"Government digital DS slack" = "Slack",
"GSS slack" = "Slack",
"RAS mailing list/newsletter" = "ONS RAS mailing list",
"RAS mailing list" = "ONS RAS mailing"
"RAS mailing list" = "ONS RAS mailing list",
"HoPs managers support network + GSG Teams Channel" = "HoP/DDan mailing list",
"HoPs weekly email" = "HoP/DDan mailing list",
"RAS newsletter" = "ONS RAS mailing list",
"AF newsletter" = "Profession newsletters/mailing lists",
"DDaT newsletter" = "Profession newsletters/mailing lists",
"GSR Friday Bulletin" = "Profession newsletters/mailing lists",
"GORS Newsletter" = "Profession newsletters/mailing lists",
"GSS Newsletter" = "Profession newsletters/mailing lists",
"RAP Champions Network" = "RAP champions",
"DATA SCIENCE SLACK" = "Slack"
)

all_wave_data$tracking_link %<>% dplyr::recode(!!!rename_list)

links <- table(all_wave_data$tracking_link)
@@ -87,12 +97,12 @@ tracking_link_freqs <- table(all_wave_data$year, all_wave_data$tracking_link) %>
data.frame()

# Reorder by 2023 frequencies
# As the dataset is ordered by year, the code below works out the correct order for the 2022 "block" and applies it to all three
# As the dataset is ordered by year, the code below works out the correct order for the 2023 "block" and applies it to all four

order <- rev(order(tracking_link_freqs$percent[17:24]))
tracking_link_freqs <- tracking_link_freqs[c(order, order+8, order+16) ,]
order <- rev(order(tracking_link_freqs$percent[25:32]))
tracking_link_freqs <- tracking_link_freqs[c(order, order+8, order+16, order+24) ,]

CARS::df_to_table(tracking_link_freqs[c(2,1,5)], column_headers = c("Tracking link", "2020", "2021", "2022"), crosstab = T)
CARS::df_to_table(tracking_link_freqs[c(2,1,5)], column_headers = c("Tracking link", "2020", "2021", "2022", "2023"), crosstab = T)

```
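
The block-wise reordering in the chunk above can be sketched on toy data as follows. This is an illustration under assumptions, not the code in the PR: the data frame, `n_links`, `n_years` and the other names are hypothetical, and it assumes every year block lists the links in the same order.

```r
# Toy long-format frequencies: one block of rows per year,
# with the links in the same order within each block.
freqs <- data.frame(
  year = rep(c("2022", "2023"), each = 3),
  link = rep(c("Slack", "RAP champions", "Newsletter"), times = 2),
  percent = c(30, 50, 20, 10, 60, 30)
)

n_links <- 3
n_years <- 2

# Row indices of the latest year's block (here rows 4 to 6).
latest_block <- ((n_years - 1) * n_links + 1):(n_years * n_links)

# Within-block positions, sorted from highest to lowest percentage.
block_order <- order(freqs$percent[latest_block], decreasing = TRUE)

# Apply the same within-block order to every year by adding each block's offset.
offsets <- rep((seq_len(n_years) - 1) * n_links, each = n_links)
row_order <- rep(block_order, times = n_years) + offsets

freqs_sorted <- freqs[row_order, ]
freqs_sorted
```

Deriving the offsets from `n_links` and `n_years` avoids hard-coding ranges such as `25:32` when a new wave is added. Note that `order(..., decreasing = TRUE)` and `rev(order(...))` can order tied values differently.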

@@ -143,24 +153,24 @@ CARS::wrap_outputs("code-freq", plot, table)

## Grade

Across all waves, over 80% of Civil Service respondents reported that they are at H, S or Grade 7 grades. While this will be representative of the grade distribution of analysts in some government organisations, it may not be the case for all organisations.
Across all years, over 80% of Civil Service respondents reported that they are at Higher Executive Officer (HEO), Senior Executive Officer (SEO) or Grade 7 level. While this will be representative of the grade distribution of analysts in some government organisations, it may not be the case for all organisations.
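
A share like the one quoted above could be computed along these lines. This is a hedged sketch on invented responses: the `grades`, `target` and `share` names are assumptions for illustration, and the real figure comes from the full `all_wave_data` set.

```r
# Toy grades as they might appear in raw responses (values are illustrative).
grades <- c(
  "Higher Executive Officer (or equivalent)",
  "Grade 7 (or equivalent)",
  "Senior Executive Officer (or equivalent)",
  "Grade 6 (or equivalent)",
  "Grade 7 (or equivalent)"
)

# Strip the " (or equivalent)" suffix; the parentheses are escaped
# because they are regex metacharacters.
grades <- gsub(" \\(or equivalent\\)", "", grades)

# Share of respondents at HEO, SEO or Grade 7, as a percentage.
target <- c("Higher Executive Officer", "Senior Executive Officer", "Grade 7")
share <- round(mean(grades %in% target) * 100, 1)

share
```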

```{r}
all_wave_data$CS_grade[all_wave_data$CS_grade == "Research Officer"] <- "Higher Executive Officer (or equivalent)"

all_wave_data$CS_grade <- gsub(" \\(or equivalent\\)", "", all_wave_data$CS_grade)

recode_list <- list(
"Administrative Officer" = "Administrative officer or executive officer",
"Executive Officer" = "Administrative officer or executive officer",
"Administrative Officer" = "Administrative officer or Executive officer",
"Executive Officer" = "Administrative officer or Executive officer",
"Grade 6" = "Grade 6 or above",
"SCS Pay Band 1" = "Grade 6 or above"
"SCS Pay Band 1" = "Grade 6 or above"
)

all_wave_data$CS_grade <- dplyr::recode(all_wave_data$CS_grade, !!!recode_list)

all_wave_data$CS_grade <- factor(all_wave_data$CS_grade, levels = c(
"Administrative officer or executive officer",
"Administrative officer or Executive officer",
"Higher Executive Officer",
"Senior Executive Officer",
"Grade 7",
@@ -182,7 +192,7 @@ CARS::wrap_outputs("grades-by-year", plot, table)

## Profession

Below is a breakdown of the proportion of respondents in different Civil Service professions. These cover the [Analysis Function professions](https://analysisfunction.civilservice.gov.uk/about-us/frequently-asked-questions/) and do not apply outside of the civil service. The exception to this are data scientists who do not have an official government profession. They are included separately here to avoid skewing the data for other professions. Note that respondents can be members of more than one analytical profession. Profession data is difficult to compare across waves as these questions have changed in line with changes to the Analysis Function.
Below is a breakdown of the proportion of respondents in different Civil Service professions. These cover the [Analysis Function professions](https://analysisfunction.civilservice.gov.uk/about-us/frequently-asked-questions/) and do not apply outside of the Civil Service. The exceptions are data scientists and data engineers, who do not have an official government profession. They are included separately here to avoid skewing the data for other professions. Note that respondents can be members of more than one analytical profession. Profession data is difficult to compare across years as these questions have changed in line with changes to the Analysis Function.

The CARS sample has high representation from statisticians compared with other professions. This again may be representative of some organisations but not all.

@@ -199,13 +209,14 @@ recode_vals <- c(
"prof_GSG" = "Statisticians",
"prof_DS" = "Data scientists",
"prof_GSR" = "Social researchers",
"prof_CS_none" = "civil servant - no profession membership",
"prof_CS_none" = "Civil servant - no profession membership",
"prof_GORS" = "Operational researchers",
"prof_GES" = "Economists",
"prof_DDAT" = "Digital, data and technology profession",
"prof_CS_other" = "Civil servant - other profession",
"prof_GAD" = "Actuaries",
"prof_geog" = "Georgraphers"
"prof_geog" = "Geographers",
"prof_DE" = "Data engineers"
)
frequencies$Profession <- dplyr::recode(frequencies$Profession, !!!recode_vals)
