Data collection quarto (#81)

* Updated text and tables for 2023
* Corrected typos in text
best-practice-and-impact · Jan 24, 2024 · e41d7da
1 parent 48adc8f
Showing 1 changed file with 30 additions and 18 deletions.
diff --git a/quarto/main/data_collection.qmd b/quarto/main/data_collection.qmd
@@ -18,17 +18,17 @@ data <- CARS::get_tidy_data_file("2023_data.csv") %>%
 
 # How we collect data
 
-The Coding in Analysis and Research Survey (CARS) data collection takes place for approximately one month, every autumn. The survey is self-selecting and participation is voluntary. Launch dates vary slightly by year to maximise response rate, for example by avoiding clashes with other internal surveys. In 2022, data collection took place from 3 October to 11 November.
+The Coding in Analysis and Research Survey (CARS) data collection takes place for approximately one month, every autumn. The survey is self-selecting and participation is voluntary. Launch dates vary slightly by year to maximise response rate, for example by avoiding clashes with other internal surveys. In 2023, data collection took place from 16 October to 4 December.
 
-We invite analysts to participate in the survey using a variety of online channels, mailing lists, networks and newsletters. For the past three years, the most common source of data was through departmental Reproducible Analytical Pipeline (RAP) champions, who promote the survey in their organisations. We rely on various champion networks, Heads of Profession (HoPs) for analysis and Departmental Directors of Analysis (DDans) to promote the survey in their departments and encourage their analytical communities to participate. This means the response rate and any selection bias will vary across organisations.
+We invite analysts to participate in the survey using a variety of online channels, mailing lists, networks and newsletters. For the past four years, the most common source of data has been through departmental Reproducible Analytical Pipeline (RAP) champions, who promote the survey in their organisations. We rely on various champion networks, Heads of Profession (HoPs) for analysis and Departmental Directors of Analysis (DDans) to promote the survey and encourage their analytical communities to participate. This means the response rate and any selection bias will vary across organisations.
 
-Our promotional materials make it clear that we are interested in responses from all analysts, whether or not they use coding in their work. However, it may be the case that the survey attracts a disproportionate number of respondents who have an interest in coding and RAP. We advise against making strong inferences about differences between professions and departments or attempting to estimate real frequencies from the data because of these potential limitations.
+Our promotional materials make it clear that we are interested in responses from all analysts, whether or not they use coding in their work. The survey may, however, attract a disproportionate number of respondents who have an interest in coding and RAP. Because of these potential limitations, we advise against making strong inferences about differences between professions and departments, or attempting to estimate real frequencies from the data.
 
-Lastly, while the survey is open to all public sector analysts, the vast majority of responses come from the UK and devolved Civil Service (`r round(sum(data$workplace == "Civil service, including devolved administations") / nrow(data) * 100, 1)`%). As such, follow-up questions on grade and profession applied only to civil servants.
+Lastly, while the survey is open to all public sector analysts, the vast majority of responses come from the UK and devolved Civil Service (`r round(sum(data$workplace == "Civil service, including devolved administrations") / nrow(data) * 100, 1)`% in 2023). As such, follow-up questions on grade and profession applied only to civil servants.
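
The inline expression above only counts rows whose `workplace` value matches the label exactly, so a misspelt label yields 0% without raising an error. The sketch below illustrates this on invented toy data; `percent_matching` is a hypothetical helper, not part of the CARS package, and it assumes the stored label uses the corrected spelling.

```r
# Toy stand-in for the survey data; values are illustrative, not real CARS responses.
data <- data.frame(
  workplace = c(
    "Civil service, including devolved administrations",
    "Civil service, including devolved administrations",
    "NHS",
    "Civil service, including devolved administrations"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: percentage of rows whose workplace equals `label` exactly.
percent_matching <- function(df, label) {
  round(sum(df$workplace == label) / nrow(df) * 100, 1)
}

correct  <- percent_matching(data, "Civil service, including devolved administrations")
misspelt <- percent_matching(data, "Civil service, including devolved administations")

print(correct)   # 75
print(misspelt)  # 0: the misspelt label matches no rows
```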
 
 ## Where our data comes from
 
-Link tracking allows us to see where responses are coming from. Links promoted by RAP champions were the most commonly used for the past three waves, but consistently account for fewer than half of responses.
+Link tracking allows us to see where responses are coming from. Links promoted by RAP champions were the most commonly used for the past four waves, and accounted for over half of responses in 2023.
 
 ```{r}
 rename_list <- list(
@@ -70,8 +70,19 @@ rename_list <- list(
   "Government digital DS slack" = "Slack",
   "GSS slack" = "Slack",
   "RAS mailing list/newsletter" = "ONS RAS mailing list",
-  "RAS mailing list" = "ONS RAS mailing"
+  "RAS mailing list" = "ONS RAS mailing",
+  "HoPs managers support network + GSG Teams Channel" = "HoP/DDan mailing list",
+  "HoPs weekly email" = "HoP/DDan mailing list",
+  "RAS newsletter" = "ONS RAS mailing list",
+  "AF newsletter" = "Profession newsletters/mailing lists",
+  "DDaT newsletter" = "Profession newsletters/mailing lists",
+  "GSR Friday Bulletin" = "Profession newsletters/mailing lists",
+  "GORS Newsletter" = "Profession newsletters/mailing lists",
+  "GSS Newsletter" = "Profession newsletters/mailing lists",
+  "RAP Champions Network" = "RAP champions",
+  "DATA SCIENCE SLACK" = "Slack"
 )
+
 all_wave_data$tracking_link %<>% dplyr::recode(!!!rename_list)
 
 links <- table(all_wave_data$tracking_link)
@@ -86,12 +97,12 @@ tracking_link_freqs <- table(all_wave_data$year, all_wave_data$tracking_link) %>
   data.frame()
 
 # Reorder by 2022 frequencies
-# As the dataset is ordered by year, the code below works out the correct order for the 2022 "block" and applies it to all three
+# As the dataset is ordered by year, the code below works out the correct order for the 2023 "block" and applies it to all four
 
-order <- rev(order(tracking_link_freqs$percent[17:24]))
-tracking_link_freqs <- tracking_link_freqs[c(order, order+8, order+16) ,]
+order <- rev(order(tracking_link_freqs$percent[25:32]))
+tracking_link_freqs <- tracking_link_freqs[c(order, order+8, order+16, order+24) ,]
 
-CARS::df_to_table(tracking_link_freqs[c(2,1,5)], column_headers = c("Tracking link", "2020", "2021", "2022"), crosstab = T)
+CARS::df_to_table(tracking_link_freqs[c(2,1,5)], column_headers = c("Tracking link", "2020", "2021", "2022", "2023"), crosstab = T)
 
 ```
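
The reordering above relies on hard-coded row positions (`25:32`, `+8`, `+16`, `+24`), which have to be edited each year and whenever the number of link categories changes. As a possible alternative, the following base R sketch derives the order from the latest year in the data. It assumes a long table with one row per year and link; the data frame `freqs`, its column names and all numbers are invented for illustration.

```r
# Toy long-format frequencies, one row per year and link. All numbers are made up.
freqs <- data.frame(
  year = rep(c("2022", "2023"), each = 3),
  link = rep(c("RAP champions", "Slack", "HoP/DDan mailing list"), times = 2),
  percent = c(40, 35, 25, 55, 15, 30),
  stringsAsFactors = FALSE
)

# Latest year present; comparing as character works for four-digit years.
latest <- max(freqs$year)

# Link names sorted by descending share in the latest year.
latest_rows <- freqs[freqs$year == latest, ]
link_order <- latest_rows$link[order(latest_rows$percent, decreasing = TRUE)]

# Apply that link order within every year block.
freqs <- freqs[order(freqs$year, match(freqs$link, link_order)), ]
rownames(freqs) <- NULL
print(freqs)
```

With this approach, adding a new wave or a new link category would not require changing any index arithmetic.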
 
@@ -142,24 +153,24 @@ CARS::wrap_outputs("code-freq", plot, table)
 
 ## Grade
 
-Across all waves, over 80% of Civil Service respondents reported that they are at HEO, SEO or Grade 7 grades. While this will be representative of the grade distribution of analysts in some government organisations, it may not be the case for all organisations.
+Across all years, over 80% of Civil Service respondents reported that they are at HEO, SEO or Grade 7 grades. While this will be representative of the grade distribution of analysts in some government organisations, it may not be the case for all organisations.
 
 ```{r}
 all_wave_data$CS_grade[all_wave_data$CS_grade == "Research Officer"] <- "Higher Executive Officer (or equivalent)"
 
 all_wave_data$CS_grade <- gsub(" \\(or equivalent\\)", "", all_wave_data$CS_grade)
 
 recode_list <- list(
-  "Administrative Officer" = "Administrative officer or executive officer",
-  "Executive Officer" = "Administrative officer or executive officer",
+  "Administrative Officer" = "Administrative officer or Executive officer",
+  "Executive Officer" = "Administrative officer or Executive officer",
   "Grade 6" = "Grade 6 or above",
-  "SCS Pay Band 1" = "Grade 6 or above"  
+  "SCS Pay Band 1" = "Grade 6 or above"
 )
 
 all_wave_data$CS_grade <- dplyr::recode(all_wave_data$CS_grade, !!!recode_list)
 
 all_wave_data$CS_grade <- factor(all_wave_data$CS_grade, levels = c(
-  "Administrative officer or executive officer",
+  "Administrative officer or Executive officer",
   "Higher Executive Officer",
   "Senior Executive Officer",
   "Grade 7",
@@ -181,7 +192,7 @@ CARS::wrap_outputs("grades-by-year", plot, table)
 
 ## Profession
 
-Below is a breakdown of the proportion of respondents in different Civil Service professions. These cover the [Analysis Function professions](https://analysisfunction.civilservice.gov.uk/about-us/frequently-asked-questions/) and do not apply outside of the civil service. The exception to this are data scientists who do not have an official government profession. They are included separately here to avoid skewing the data for other professions. Note that respondents can be members of more than one analytical profession. Profession data is difficult to compare across waves as these questions have changed in line with changes to the Analysis Function.
+Below is a breakdown of the proportion of respondents in different Civil Service professions. These cover the [Analysis Function professions](https://analysisfunction.civilservice.gov.uk/about-us/frequently-asked-questions/) and do not apply outside of the civil service. The exceptions are data scientists and data engineers, who do not have an official government profession. They are included separately here to avoid skewing the data for other professions. Note that respondents can be members of more than one analytical profession. Profession data is difficult to compare across years as these questions have changed in line with changes to the Analysis Function.
 
 The CARS sample has high representation from statisticians compared with other professions. This again may be representative of some organisations but not all.
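
Because respondents can belong to more than one profession, the per-profession percentages need not sum to 100. A minimal sketch of this, assuming each profession is stored as a 0/1 column per respondent: the column names follow the recode keys used in this file, but the 0/1 coding and all values are assumptions for illustration.

```r
# Toy responses: one row per respondent, one 0/1 column per profession.
# The 0/1 coding is an assumption; the values are invented.
responses <- data.frame(
  prof_GSG = c(1, 1, 0, 1),
  prof_DS  = c(1, 0, 0, 1),
  prof_DE  = c(0, 0, 1, 0)
)

# Share of respondents selecting each profession, as a percentage.
percent <- round(colMeans(responses) * 100, 1)
print(percent)

# Respondents may select several professions, so the shares can exceed 100 in total.
print(sum(percent))
```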
 
@@ -198,13 +209,14 @@ recode_vals <- c(
   "prof_GSG" = "Statisticians",
   "prof_DS" = "Data scientists",
   "prof_GSR" = "Social researchers",
-  "prof_CS_none" = "civil servant - no profession membership",
+  "prof_CS_none" = "Civil servant - no profession membership",
   "prof_GORS" = "Operational researchers",
   "prof_GES" = "Economists",
   "prof_DDAT" = "Digital, data and technology profession",
   "prof_CS_other" = "Civil servant - other profession",
   "prof_GAD" = "Actuaries",
-  "prof_geog" = "Georgraphers"
+  "prof_geog" = "Geographers",
+  "prof_DE" = "Data engineers"
 ) 
 frequencies$Profession <- dplyr::recode(frequencies$Profession, !!!recode_vals)