From 7e688590889ede5b129d4701c98976472d84bd11 Mon Sep 17 00:00:00 2001 From: Rowley Date: Wed, 24 Jan 2024 15:39:40 +0000 Subject: [PATCH 1/6] Updated text for 2023 survey --- quarto/main/_quarto.yml | 2 +- quarto/main/summary.qmd | 122 +++++++++++++++++----------------------- 2 files changed, 52 insertions(+), 72 deletions(-) diff --git a/quarto/main/_quarto.yml b/quarto/main/_quarto.yml index b0da09e..76fecdd 100644 --- a/quarto/main/_quarto.yml +++ b/quarto/main/_quarto.yml @@ -3,7 +3,7 @@ project: output-dir: ../../docs/ website: - title: Coding in Analysis and Research Survey 2022 + title: Coding in Analysis and Research Survey 2023 navbar: background: primary left: diff --git a/quarto/main/summary.qmd b/quarto/main/summary.qmd index 9603333..07be6f9 100644 --- a/quarto/main/summary.qmd +++ b/quarto/main/summary.qmd @@ -1,5 +1,5 @@ --- -title: "The state of UK public sector analysis code: 2022" +title: "The state of UK public sector analysis code: 2023" output: html: self-contained: true @@ -31,11 +31,9 @@ For more detail, [see the data collection page](data_collection.qmd). ## Coding frequency and tools -### Most respondents regularly use code at work +We asked respondents "In your current role, how often do you write code to complete your work objectives?" -Over the past three years, most respondents reported coding regularly or all the time to complete work objectives. This may be in part due to respondent bias, where those with an interest in coding are more likely to respond to the survey. However, we can conclude that coding is now, and has been for at least a few years, a normal aspect of analysis work in the public sector for many analysts. 
- -#### 2022 data +#### 2023 data ```{r} @@ -76,18 +74,10 @@ CARS::wrap_outputs("code-freq", plot, table) ### Access to and knowledge of programming languages -Given a list of programming tools, we asked respondents to answer "Yes", "No" or "Don't know" for the following questions; - -- Do you know how to program with this tool to a level suitable for your work? -- Is this tool available to use for your work? +Given a list of programming tools, we asked respondents if the tool was available to use for their work. Access to tools does not necessarily refer to official policy. Some analysts may have access to tools others cannot access within the same organisation. -Please note that capability in programming languages is self-reported here and was not objectively defined or tested - -#### Most respondents have access to open source tools - -More respondents reported having access to R and Python than any other programming language listed here. R, Python and SQL are the most accessible programming languages across government, ahead of well established licensed tools such as SPSS, SAS and Stata. ```{r} @@ -98,9 +88,9 @@ CARS::wrap_outputs("access", plot, table) ``` -#### Open source tools have overtaken proprietary tools in capability +Given the same list of programming tools, respondents were asked if they knew how to program with the tool to a level suitable for their work, answering "Yes", "No" or "Not required for my work". -More respondents reported having the knowledge to use R, Python and SQL at work than any other coding tools, with SPSS being the most popular proprietary software. This shift towards open source tooling is in line with the [RAP strategy](https://analysisfunction.civilservice.gov.uk/policy-store/reproducible-analytical-pipelines-strategy/) and cross-government RAP standards. +Please note that capability in programming languages is self-reported here and was not objectively defined or tested. 
```{r} @@ -111,9 +101,9 @@ CARS::wrap_outputs("knowledge", plot, table) ``` -#### Open source capability is increasing over time +#### Open source capability over time -The proportion of respondents who report having the capability to use R and Python has been increasing over the past three waves of CARS. In contrast, the proportion who are able to use SAS, SPSS or Stata has been decreasing during this time. R and Python has risen dramatically in popularity in recent years due to their use in data science. Python in particular is now along the most used and [most popular programming languages globally](https://survey.stackoverflow.co/2022/), while both R and Python are in the top three most [popular programming languages for data science](https://www.kaggle.com/kaggle-survey-2022). +The proportion of respondents who report having the capability to use R and Python is shown alongside the proportion who are able to use SAS, SPSS or Stata, for the past four years of the survey. ```{r} @@ -139,28 +129,30 @@ CARS::wrap_outputs("tools-over-time", plot, table) ``` -#### Different professions have capability in different tools - -Differences in preferred languages may lead to silos between analytical professions. For digital and data professionals, operational researchers, data scientists and geographers capability is highest in R, SQL and python. R is among the two top languages for capability for every analytical profession. +#### Capability in different tools by profession -However, proprietary tools tend to be more profession-specific. For example, economists have much higher stata capability than any other profession, while social researchers have the highest SPSS capability. Open source tooling offers better cross-profession as well as cross-department collaboration due to easier access to tools. +Differences in preferred languages may lead to silos between analytical professions. Here we show the tool capability of members of the different analytical professions. 
Please note that respondents might be members of more than one profession. ```{R} colnames(tables$languages_by_prof)[2] <- "Profession" -tables$languages_by_prof[c(2,1,3)] %>% CARS::df_to_table(n = samples$profs, crosstab = T, proportion_col = 3) +tables$languages_by_prof[c(2,1,3)] %>% CARS::df_to_table(crosstab = T, proportion_col = 3) ``` -### Most respondents have access to git and know how to use it +### Access to and knowledge of git -Access to git is generally high across government. However, many have access to git but do not have the capability to use it, meaning there is more work to do to ensure analysts are able to learn these skills. However, some departments have no access, or limited access to git (see department pages). + +We asked respondents to answer "Yes", "No" or "Don't know" for the following questions: + +- Is git available to use in your work? +- Do you know how to use git to version-control your work? Please note these outputs include people who do not code at work. -#### Access to git +### Git access and knowledge ```{r} plot <- CARS::plot_freqs(tables$git_access, font_size = 14, n = samples$all, xlab = "Access to git") %>% CARS::set_axis_range(0, 1, axis = "y") @@ -178,11 +170,12 @@ CARS::wrap_outputs("git-knowledge", plot, table) ``` -## Capability -### Most respondents first learned to code in education +### Where respondents first learned to code + +Respondents with coding experience outside their current role were asked where they first learned to code. Those analysts who code in their current role but reported no other coding experience, are included as having learned 'In current role'. -Half of respondents learned to code for the first time in education. Nevertheless, nearly a third reported learning code for the first time in public sector employment, mostly in their current role. 
These data show analysts are actively up-skilling in their roles and the organisation is able to draw experienced programmers from other parts of government, as well as those who leave education with some coding abilities. +These data only show where people first learned to code. They do not show all the settings in which they had learned to code, to what extent, or how long ago. ```{r} @@ -198,9 +191,11 @@ CARS::wrap_outputs("where-learned", plot, table) ``` -### Most analysts' coding capabilities are actively improving +### Change in coding ability during current role -Most respondents with prior coding experience reported that their coding capability has improved while in their current role. Again, this shows the Analysis Function is building capability in-house as well as recruiting analysts with existing capability. However, results also show a minority of respondents who feel they are losing their capability. As more analysts gain capability, it is important that existing skills are retained. +We asked "Has your coding ability changed during your current role?" + +This question was only asked of respondents with coding experience outside of their current role. This means analysts who first learned to code in their current role are not included in the data. ```{r} @@ -217,15 +212,8 @@ CARS::wrap_outputs("ability-change", plot, table) ``` -### How often people write code is a strong predictor of capability change - -Management responsibility and civil service grade are both negatively correlated with improvements to coding capability. In other words, more senior analysts are more likely to report that their coding abilities are becoming worse. +#### Capability change by coding frequency -We used cross-government data to create an ordinal regression model. How often people write code was a very strong predictor at all levels (p \< 0.001) of whether respondents report improvement to their coding capability. 
Analysts who use code at work at least some of the time were 4.5 times more likely to report more improvement/less decline to their coding abilities than those who rarely or never wrote code at work. The effect was even greater for those who reported writing code regularly or all the time. - -Civil service grade had a smaller effect, with those at grade 7 or 6 reporting less improvement to their coding abilities (p \< .001), but no significant difference between those at HEO and SEO grades (p \> .05). Line management responsibility and civil service grade had no significant effect when the model includes coding frequency regardless of whether it involved managing others who code (p \> .05). - -These findings show that allowing analysts to continue coding is key to retaining and building on existing skills. While seniority is less predictive of capability change, it is correlated with how often people write code at work, meaning more senior analysts are more likely to lose their capability and less likely to build capability. ```{r} plot <- CARS::plot_likert(tables$capability_change_by_freq, mid = 3, @@ -255,14 +243,12 @@ The following links contain more resources on RAP: * you can find minimum RAP standards in the [RAP MVP](rap_mvp_maturity_guidance/Reproducible-Analytical-Pipelines-MVP.md%20at%20master%20·%20best-practice-and-impact/rap_mvp_maturity_guidance%20·%20GitHub) * you can find guidance on quality assuring code in the [Duck Book](https://best-practice-and-impact.github.io/qa-of-code-guidance/intro.html) -### Awareness of RAP across government has increased +### Awareness of RAP over time ```{r} freqs <- CARS::summarise_rap_awareness_over_time(all_wave_data) ``` -Over the past three years, awareness of RAP has increased year on year across the Analysis Function. In 2022, `r round(freqs[3, 5]*100, digits = 0) %>% paste0("%")` of respondents had heard of RAP, a `r round((freqs[3, 5]-freqs[1, 5])/freqs[1, 5]*100, digits = 0) %>% paste0("%")` increase. 
- ```{r} plot <- CARS::plot_freqs(freqs[c(2, 5)], type = "line", xlab = "Year", font_size = 14, error_y = CARS::set_error_bars(freqs$lower_ci, freqs$upper_ci)) %>% CARS::set_axis_range(0, 1, axis = "y") @@ -276,19 +262,27 @@ table <- CARS::df_to_table(freqs[c(2,5:7)], CARS::wrap_outputs("rap-awareness-over-time", plot, table) ``` -### Most respondents have heard of RAP champions - -[RAP champions](https://analysisfunction.civilservice.gov.uk/support/reproducible-analytical-pipelines/reproducible-analytical-pipeline-rap-champions/) support and promote the use of RAP across government. Although most respondents who have heard of RAP had heard of the RAP champions' network, most did not know who their RAP champions are. More work is needed to increase awareness of the support available across government, including who its RAP champions are. - -Please [contact the analysis standards and pipelines team](mailto:asap@ons.gov.uk) for any enquiries about RAP or the champions network. +#### 2023 data ```{r} - plot <- CARS::plot_freqs(tables$rap_knowledge, n = samples$code_at_work, xlab = "Heard of RAP?", font_size = 14, orientation = "v") table <- CARS::df_to_table(tables$rap_knowledge, n = samples$code_at_work, column_headers = c("Knowledge", "Percent")) CARS::wrap_outputs("rap-knowledge", plot, table) +``` + +### Awareness of RAP champions + +We asked respondents who had heard of RAP whether they knew who their RAP champions are. + +[RAP champions](https://analysisfunction.civilservice.gov.uk/support/reproducible-analytical-pipelines/reproducible-analytical-pipeline-rap-champions/) support and promote the use of RAP across government. + +Please [contact the analysis standards and pipelines team](mailto:asap@ons.gov.uk) for any enquiries about RAP or the champions network. 
+ + +```{r} + plot <- CARS::plot_freqs(tables$rap_champ_status, n = samples$heard_of_RAP, break_q_names_col = "value", max_lines = 2, xlab = "Heard of RAP champions?", font_size = 14, orientation = "h") table <- CARS::df_to_table(tables$rap_champ_status, n = samples$heard_of_RAP, column_headers = c("Knowledge", "Percent")) @@ -296,9 +290,11 @@ CARS::wrap_outputs("rap-champ-status", plot, table) ``` -### Most respondents have heard of the RAP strategy +### Awareness of RAP strategy + +We asked respondents who had heard of RAP whether they had heard of the RAP strategy. -The [Analysis Function RAP strategy](https://analysisfunction.civilservice.gov.uk/policy-store/reproducible-analytical-pipelines-strategy/) was released in June 2022 and sets out plans for adopting RAP across government. Although most respondents had not read the strategy, 77% of those who had heard of RAP were also aware of the RAP strategy. +The [Analysis Function RAP strategy](https://analysisfunction.civilservice.gov.uk/policy-store/reproducible-analytical-pipelines-strategy/) was released in June 2022 and sets out plans for adopting RAP across government. ```{r} plot <- CARS::plot_freqs(tables$strategy_knowledge, break_q_names_col = 1, max_lines = 3, font_size = 14, orientation = "v", n = samples$heard_of_RAP) @@ -313,13 +309,6 @@ CARS::wrap_outputs("RAP-strat", plot, table) We asked respondents who had heard of RAP whether they agreed with a series of statements. - -### Respondents see the benefits of RAP but most are not currently implementing it - -We asked respondents who had heard of RAP a series of questions about their opinions on RAP. 
The majority agreed with the statement "I think it is important to implement RAP in my work" (`r round(sum(data$RAP_important %in% c("Agree", "Strongly Agree")) / samples$heard_of_RAP * 100, 1) %>% paste0("%")`), but only only `r round(sum(data$RAP_implementing %in% c("Agree", "Strongly Agree")) / samples$heard_of_RAP * 100, 1) %>% paste0("%")` agreed they are currently implementing RAP. - -Similarly , around half of the respondents agreed on various statements on understanding RAP and having the support and resources to implement it. While awareness and buy-in is high, this highlights the need to ensure analysts are aware of the resources currently available, and for additional resources to be made available where gaps currently exist. - ```{r} plot <- CARS::plot_likert(tables$rap_opinions, @@ -348,11 +337,6 @@ CARS::wrap_outputs("rap-opinions", plot, table) We asked respondents who reported writing code at work about the good practices they apply when writing code at work. These questions cover many of the coding practices recommended in the quality assurance of code for analysis and research guidance, as well as the [minimum RAP standards](rap_mvp_maturity_guidance/Reproducible-Analytical-Pipelines-MVP.md%20at%20master%20·%20best-practice-and-impact/rap_mvp_maturity_guidance%20·%20GitHub) set by the cross-government [RAP champions network](https://analysisfunction.civilservice.gov.uk/support/reproducible-analytical-pipelines/reproducible-analytical-pipeline-rap-champions/). -### Respondents do not consistently apply RAP practices - -While many respondents make use of RAP practices, these are very inconsistently applied. The chart below present the frequency of respondents reporting that they apply these practices "regularly" or "all the time". For documentation, writing readme files was considered a minimum requirement. For dependency management and continuous integration respondents were only asked whether they use these at all. 
- -While most respondents who write code use open source software and peer review code regularly, this is not the case for other practices. Basic RAP practices, defined by the RAP champions network as being part of the [minimum RAP standards](rap_mvp_maturity_guidance/Reproducible-Analytical-Pipelines-MVP.md%20at%20master%20·%20best-practice-and-impact/rap_mvp_maturity_guidance%20·%20GitHub) are presented in blue. Among these, open sourcing code is particularly uncommon, despite being part of the digital service standard. ```{r} @@ -377,9 +361,10 @@ CARS::wrap_outputs("rap-comp", plot, table) ``` -### Many have the capability to apply good practices, but do not always do so +### Consistency of good coding practices + +We asked respondents who reported writing code at work how frequently they apply good coding practices when writing code at work. -As the detailed breakdowns below show, analysts often apply these good practices some of the time. However, in most cases fewer than half responded "regularly" or "all the time", meaning they often do not use these despite having the capability to do so. ```{r} @@ -407,9 +392,9 @@ CARS::wrap_outputs("good-practices", plot, table) ``` -### Analysts rely primarily on code comments for documentation +### Code documentation -Many analysts do not regularly document code in any form other than code comments. While code comments are useful, other forms of documentation are needed to ensure the code is easy to review and work with, and truly reproducible. +We asked respondents who reported writing code at work how frequently they write different forms of documentation when programming in their current role. ```{r} @@ -437,9 +422,4 @@ CARS::wrap_outputs("doc", plot, table) ``` -## Summary - -The findings above show that the Analysis Function has made great strides towards RAP adoption. Using code for analysis is widespread. 
Open source coding tools such as R, Python, SQL and git are widely available and used (although this varies by organisation). RAP awareness has increased dramatically and many analysts feel supported to apply RAP in their work. Capability is generally increasing, but there is work to do to ensure analysts can retain these coding skills, especially as they move to more senior positions. - -However, there is still much work to be done before RAP becomes the default way of working. While many analysts write code, few consistently apply RAP principles while doing so. Although most respondents see the value of RAP in their work and are actively implementing it, the data on good practices suggest that teams across the Analysis Function still have a work to do before consistently meeting the minimum RAP standards. From 8a966ed3e60033fffe02090031221e3814b1a0f4 Mon Sep 17 00:00:00 2001 From: Rowley Date: Wed, 24 Jan 2024 15:44:50 +0000 Subject: [PATCH 2/6] Moved summary QA file --- .gitignore | 3 +- quarto/QA/summary_qa.qmd | 330 +++++++++++++++++++++++++++++++++++++++ 2 files changed, 332 insertions(+), 1 deletion(-) create mode 100644 quarto/QA/summary_qa.qmd diff --git a/.gitignore b/.gitignore index 8b19f43..595ee10 100644 --- a/.gitignore +++ b/.gitignore @@ -8,4 +8,5 @@ temp/ *.rda exploratory_scripts/ docs/ -docs/summary_qa.html \ No newline at end of file +quarto/qa/summary_qa.html +quarto/summary_qa_files/ diff --git a/quarto/QA/summary_qa.qmd b/quarto/QA/summary_qa.qmd new file mode 100644 index 0000000..06fa7d1 --- /dev/null +++ b/quarto/QA/summary_qa.qmd @@ -0,0 +1,330 @@ +--- +title: "Summary QA" +execute: + echo: false +output: + html: + self-contained: true +--- + +```{r echo=FALSE, output=FALSE} +library(magrittr) + +data <- CARS::get_tidy_data_file("2023_data.csv") %>% + CARS::rename_cols() %>% + CARS::apply_skip_logic() %>% + CARS::clean_data() %>% + CARS::derive_vars() + + +raw_data <- CARS::get_tidy_data_file("2023_data.csv") %>% + CARS::rename_cols() 
%>% + CARS::clean_data() %>% + CARS::derive_vars() + +all_wave_data <- CARS::get_all_waves(mode = "file") + +tables <- CARS::summarise_all(data, all_tables = TRUE, sample = TRUE) + +exp_samples <- CARS::sample_sizes(raw_data) + +``` + +### QA checklist: + +* Spelling, grammar and readability +* All charts and tables are present +* All charts have titles, legends and axis labels +* All links work as expected + +In addition, this document can be used to QA the data underlying each of the frequency tables and charts. Denominator checks take the expected sample size based on the raw data following the logic rules of the sample_sizes function, as an additional check for question routing. The expected sample size will vary for each question depending on question streaming rules. Other checks include raw data tables used for percentage calculations, which can be used to cross-check calculations are correct. + +The datasets used in this document are: + +* data: data as used in the final publication, with question skip logic applied +* raw_data: data without question skip logic applied, used to determine the expected sample sizes based on question streaming logic +* all_wave_data: data as used in the final publication for each year, with question skip logic applied + +## Coding frequency and tools +#### Summarise coding frequency +Check data against figure +```{r echo = FALSE} +knitr::kable(tables$code_freq) +``` + +Denominator check: +```{r echo = FALSE} +if(tables$code_freq$sample[1] != exp_samples$all) { + warning("Denominator different from expected") +} else { + print(paste0("Denominator as expected: ", exp_samples$all)) +} +``` +#### Coding frequency over time +Sample size should be the total response for each year. Percentages are calculated within the summary.qmd code. 
+```{r echo = FALSE} + +all_wave_data$code_freq <- factor(all_wave_data$code_freq, levels = c( + "Never", + "Rarely", + "Sometimes", + "Regularly", + "All the time" +)) + +table(all_wave_data$year, all_wave_data$code_freq) %>% + data.frame %>% + dplyr::group_by(Var1) %>% + dplyr::summarise(sample = sum(Freq)) %>% + knitr::kable() + +``` + +### Access to and knowledge of programming languages +#### Access +Check data against figure, check proportions are correct +```{r echo = FALSE} +knitr::kable(tables$access) +``` +Denominator check: +```{r echo = FALSE} + +if(tables$access$sample[1] != exp_samples$all) { + warning("Denominator different from expected") +} else { + print(paste0("Denominator as expected: ", exp_samples$all)) + +} +``` + +#### Knowledge +Check data against figure, check proportions are correct +```{r echo = FALSE} +knitr::kable(tables$knowledge) +``` +Denominator check: +```{r echo = FALSE} + +if(tables$knowledge$sample[1] != exp_samples$all) { + warning("Denominator different from expected") +} else { + print(paste0("Denominator as expected: ", exp_samples$all)) + +} +``` + +#### Open source capability +Check percentages are correct from the data in the table: +```{r echo = FALSE} +knitr::kable(CARS::summarise_os_vs_prop(all_wave_data)) +``` + +#### Different professions have capability in different tools +Check percentages are correct from the data in the table (final column = group sample size): +```{r echo = FALSE} +knitr::kable(tables$languages_by_prof) +``` +Denominator check - numbers of respondents in each profession, cross check with above: +```{r echo = FALSE} +raw_data %>% + tidyr::pivot_longer(contains("prof"), names_to = "prof", values_to = "value") %>% + dplyr::group_by(prof) %>% + dplyr::summarise(n = sum(value == "Yes")) %>% + knitr::kable() +``` +#### Access to git +Check data against figure, check proportions are correct +```{r echo = FALSE} +knitr::kable(tables$git_access) +``` +Denominator check: +```{r echo = FALSE} + 
+if(tables$git_access$sample[1] != exp_samples$all) { + warning("Denominator different from expected") +} else { + print(paste0("Denominator as expected: ", exp_samples$all)) +} +``` + +## Capability +#### First learned +Check data against figure, check proportions are correct +```{r echo = FALSE} +knitr::kable(tables$where_learned) +``` +Denominator check: +```{r echo = FALSE} + +if(tables$where_learned$sample[1] != exp_samples$code_at_work) { + warning("Denominator different from expected") +} else { + print(paste0("Denominator as expected: ", exp_samples$code_at_work)) + +} +``` + +#### Ability change +Check data against figure, check proportions are correct +```{r echo = FALSE} +knitr::kable(tables$ability_change) +``` +Denominator check: +```{r echo = FALSE} + +if(tables$ability_change$sample[1] != exp_samples$other_code_experience) { + warning("Denominator different from expected") +} else { + print(paste0("Denominator as expected: ", exp_samples$other_code_experience)) +} +``` + +#### Ability change by frequency +Check data against figure, check proportions are correct +```{r echo = FALSE} +knitr::kable(tables$capability_change_by_freq) +``` + +Sample size check: +```{r echo = FALSE} + +if(tables$capability_change_by_freq$sample[1] != exp_samples$other_code_experience) { + warning("Sample size different from expected") + print(paste0("Expected: ", exp_samples$other_code_experience)) + print(paste0("Actual: ", tables$capability_change_by_freq$sample[1])) +} else { + print(paste0("Sample size as expected: ", exp_samples$other_code_experience)) +} +``` + +## RAP +#### Awareness of RAP +Check that the percentages in the chart and the figures in the text are correct +```{r echo = FALSE} +knitr::kable(CARS::summarise_rap_awareness_over_time(all_wave_data)) +``` + +#### RAP knowledge +Check data against figure, check proportions are correct +```{r echo = FALSE} +knitr::kable(tables$rap_knowledge) +``` + +Denominator check: +```{r echo = FALSE} + 
+if(tables$rap_knowledge$sample[1] != exp_samples$code_at_work) { + warning("Denominator different from expected") +} else { + print(paste0("Denominator as expected: ", exp_samples$code_at_work)) +} +``` + + +#### RAP champs +Check data against figure, check proportions are correct +```{r echo = FALSE} +knitr::kable(tables$rap_champ_status) +``` + +Denominator check: +```{r echo = FALSE} + +if(tables$rap_champ_status$sample[1] != exp_samples$heard_of_RAP) { + warning("Denominator different from expected") +} else { + print(paste0("Denominator as expected: ", exp_samples$heard_of_RAP)) +} +``` + + +#### RAP strategy knowledge +Check data against figure, check proportions are correct +```{r echo = FALSE} +knitr::kable(tables$strategy_knowledge) +``` + +Denominator check: +```{r echo = FALSE} + +if(tables$strategy_knowledge$sample[1] != exp_samples$heard_of_RAP) { + warning("Denominator different from expected") +} else { + print(paste0("Denominator as expected: ", exp_samples$heard_of_RAP)) +} +``` + + +#### RAP opinions +Check data against figure, check proportions are correct +```{r} +knitr::kable(tables$rap_opinions) +``` + +Denominator check: +```{r echo = FALSE} + +if(tables$rap_opinions$sample[1] != exp_samples$heard_of_RAP) { + warning("Denominator different from expected") + print(paste0("Expected: ", exp_samples$heard_of_RAP)) + print(paste0("Actual: ", tables$rap_opinions$sample[1])) +} else { + print(paste0("Denominator as expected: ", exp_samples$heard_of_RAP)) +} +``` + +### Coding practices +Check data against figure, check proportions are correct +```{r} +knitr::kable(tables$rap_components) +``` + +Denominator check: +In this function, denominator is derived directly from data based on logic rules as below +```{r echo = FALSE} + +if(sum(data$code_freq != "Never", na.rm = TRUE) != exp_samples$code_at_work) { + warning("Denominator different from expected") +} else { + print(paste0("Denominator as expected: ", exp_samples$code_at_work)) + +} +``` + +#### Coding 
practices: frequency +Check data against figure, check proportions are correct +```{r} +knitr::kable(tables$coding_practices) +``` + +Denominator check: +```{r} + +if(tables$coding_practices$sample[1] != exp_samples$code_at_work) { + warning("Denominator different from expected") + print(paste0("Expected: ", exp_samples$code_at_work)) + print(paste0("Actual: ", tables$coding_practices$sample[1])) +} else { + print(paste0("Denominator as expected: ", exp_samples$code_at_work)) +} + +``` + +#### Documentation +Check data against figure, check proportions are correct +```{r} +knitr::kable(tables$doc) +``` + +Denominator check: +```{r} + +if(tables$doc$sample[1] != exp_samples$code_at_work) { + warning("Denominator different from expected") + print(paste0("Expected: ", exp_samples$code_at_work)) + print(paste0("Actual: ", tables$doc$sample[1])) +} else { + print(paste0("Denominator as expected: ", exp_samples$code_at_work)) +} + +``` From a2d13e6df3189329357ebe77cfac07affcf762c7 Mon Sep 17 00:00:00 2001 From: Rowley Date: Wed, 24 Jan 2024 15:46:02 +0000 Subject: [PATCH 3/6] Moved summary QA file --- quarto/main/summary_qa.qmd | 328 ------------------------------------- 1 file changed, 328 deletions(-) delete mode 100644 quarto/main/summary_qa.qmd diff --git a/quarto/main/summary_qa.qmd b/quarto/main/summary_qa.qmd deleted file mode 100644 index 6cd91d0..0000000 --- a/quarto/main/summary_qa.qmd +++ /dev/null @@ -1,328 +0,0 @@ ---- -title: "Summary QA" -output: - html: - self-contained: true ---- - -```{r echo=FALSE} -library(magrittr) - -data <- CARS::get_tidy_data_file("2023_data.csv") %>% - CARS::rename_cols() %>% - CARS::apply_skip_logic() %>% - CARS::clean_data() %>% - CARS::derive_vars() - - -raw_data <- CARS::get_tidy_data_file("2023_data.csv") %>% - CARS::rename_cols() %>% - CARS::clean_data() %>% - CARS::derive_vars() - -all_wave_data <- CARS::get_all_waves(mode = "file") - -tables <- CARS::summarise_all(data, all_tables = TRUE, sample = TRUE) - -exp_samples <- 
CARS::sample_sizes(raw_data) - -``` - -### QA checklist: - -* Spelling, grammar and readability -* All charts and tables are present -* All charts have titles, legends and axis labels -* All links work as expected - -In addition, this document can be used to QA the data underlying each of the frequency tables and charts. Denominator checks take the expected sample size based on the raw data following the logic rules of the sample_sizes function, as an additional check for question routing. The expected sample size will vary for each question depending on question streaming rules. Other checks include raw data tables used for percentage calculations, which can be used to cross-check calculations are correct. - -The datasets used in this document are: - -* data: data as used in the final publication, with question skip logic applied -* raw_data: data without question skip logic applied, used to determine the expected sample sizes based on question streaming logic -* all_wave_data: data as used in the final publication for each year, with question skip logic applied - -## Coding frequency and tools -#### Summarise coding frequency -Check data against figure -```{r echo = FALSE} -knitr::kable(tables$code_freq) -``` - -Denominator check: -```{r echo = FALSE} -if(tables$code_freq$sample[1] != exp_samples$all) { - warning("Denominator different from expected") -} else { - print(paste0("Denominator as expected: ", exp_samples$all)) -} -``` -#### Coding frequency over time -Sample size should be the total response for each year. Percentages are calculated within the summary.qmd code. 
-```{r echo = FALSE} - -all_wave_data$code_freq <- factor(all_wave_data$code_freq, levels = c( - "Never", - "Rarely", - "Sometimes", - "Regularly", - "All the time" -)) - -table(all_wave_data$year, all_wave_data$code_freq) %>% - data.frame %>% - dplyr::group_by(Var1) %>% - dplyr::summarise(sample = sum(Freq)) %>% - knitr::kable() - -``` - -### Access to and knowledge of programming languages -#### Access -Check data against figure, check proportions are correct -```{r echo = FALSE} -knitr::kable(tables$access) -``` -Denominator check: -```{r echo = FALSE} - -if(tables$access$sample[1] != exp_samples$all) { - warning("Denominator different from expected") -} else { - print(paste0("Denominator as expected: ", exp_samples$all)) - -} -``` - -#### Knowledge -Check data against figure, check proportions are correct -```{r echo = FALSE} -knitr::kable(tables$knowledge) -``` -Denominator check: -```{r echo = FALSE} - -if(tables$knowledge$sample[1] != exp_samples$all) { - warning("Denominator different from expected") -} else { - print(paste0("Denominator as expected: ", exp_samples$all)) - -} -``` - -#### Open source capability -Check percentages are correct from the data in the table: -```{r} -knitr::kable(CARS::summarise_os_vs_prop(all_wave_data)) -``` - -#### Different professions have capability in different tools -Check percentages are correct from the data in the table (final column = group sample size): -```{r echo = FALSE} -knitr::kable(tables$languages_by_prof) -``` -Denominator check - numbers of respondents in each profession, cross check with above: -```{r} -raw_data %>% - tidyr::pivot_longer(contains("prof"), names_to = "prof", values_to = "value") %>% - dplyr::group_by(prof) %>% - dplyr::summarise(n = sum(value == "Yes")) %>% - knitr::kable() -``` -#### Access to git -Check data against figure, check proportions are correct -```{r} -knitr::kable(tables$git_access) -``` -Denominator check: -```{r echo = FALSE} - -if(tables$git_access$sample[1] != 
exp_samples$all) { - warning("Denominator different from expected") -} else { - print(paste0("Denominator as expected: ", exp_samples$all)) -} -``` - -## Capability -#### First learned -Check data against figure, check proportions are correct -```{r echo = FALSE} -knitr::kable(tables$where_learned) -``` -Denominator check: -```{r echo = FALSE} - -if(tables$where_learned$sample[1] != exp_samples$code_at_work) { - warning("Denominator different from expected") -} else { - print(paste0("Denominator as expected: ", exp_samples$code_at_work)) - -} -``` - -#### Ability change -Check data against figure, check proportions are correct -```{r echo = FALSE} -knitr::kable(tables$ability_change) -``` -Denominator check: -```{r echo = FALSE} - -if(tables$ability_change$sample[1] != exp_samples$other_code_experience) { - warning("Denominator different from expected") -} else { - print(paste0("Denominator as expected: ", exp_samples$other_code_experience)) -} -``` - -#### Ability change by frequency -Check data against figure, check proportions are correct -```{r} -knitr::kable(tables$capability_change_by_freq) -``` - -Sample size check: -```{r echo = FALSE} - -if(tables$capability_change_by_freq$sample[1] != exp_samples$other_code_experience) { - warning("Sample size different from expected") - print(paste0("Expected: ", exp_samples$other_code_experience)) - print(paste0("Actual: ", tables$capability_change_by_freq$sample[1])) -} else { - print(paste0("Sample size as expected: ", exp_samples$other_code_experience)) -} -``` - -## RAP -#### Awareness of RAP -Check that the percentages in the chart and the figures in the text are correct -```{r} -knitr::kable(CARS::summarise_rap_awareness_over_time(all_wave_data)) -``` - -#### RAP knowledge -Check data against figure, check proportions are correct -```{r} -knitr::kable(tables$rap_knowledge) -``` - -Denominator check: -```{r echo = FALSE} - -if(tables$rap_knowledge$sample[1] != exp_samples$code_at_work) { - warning("Denominator 
different from expected") -} else { - print(paste0("Denominator as expected: ", exp_samples$code_at_work)) -} -``` - - -#### RAP champs -Check data against figure, check proportions are correct -```{r} -knitr::kable(tables$rap_champ_status) -``` - -Denominator check: -```{r echo = FALSE} - -if(tables$rap_champ_status$sample[1] != exp_samples$heard_of_RAP) { - warning("Denominator different from expected") -} else { - print(paste0("Denominator as expected: ", exp_samples$heard_of_RAP)) -} -``` - - -#### RAP strategy knowledge -Check data against figure, check proportions are correct -```{r} -knitr::kable(tables$strategy_knowledge) -``` - -Denominator check: -```{r echo = FALSE} - -if(tables$strategy_knowledge$sample[1] != exp_samples$heard_of_RAP) { - warning("Denominator different from expected") -} else { - print(paste0("Denominator as expected: ", exp_samples$heard_of_RAP)) -} -``` - - -#### RAP opinions -Check data against figure, check proportions are correct -```{r} -knitr::kable(tables$rap_opinions) -``` - -Denominator check: -```{r echo = FALSE} - -if(tables$rap_opinions$sample[1] != exp_samples$heard_of_RAP) { - warning("Denominator different from expected") - print(paste0("Expected: ", exp_samples$heard_of_RAP)) - print(paste0("Actual: ", tables$rap_opinions[1])) -} else { - print(paste0("Denominator as expected: ", exp_samples$heard_of_RAP)) -} -``` - -### Coding practices -Check data against figure, check proportions are correct -```{r} -knitr::kable(tables$rap_components) -``` - -Denominator check: -In this function, denominator is derived directly from data based on logic rules as below -```{r echo = FALSE} - -if(sum(data$code_freq != "Never", na.rm = TRUE) != exp_samples$code_at_work) { - warning("Denominator different from expected") -} else { - print(paste0("Denominator as expected: ", exp_samples$code_at_work)) - -} -``` - -#### Coding practices: frequency -Check data against figure, check proportions are correct -```{r} 
-knitr::kable(tables$coding_practices) -``` - -Denominator check: -```{r} - -if(tables$coding_practices$sample[1] != exp_samples$code_at_work) { - warning("Denominator different from expected") - print(paste0("Expected: ", exp_samples$code_at_work)) - print(paste0("Actual: ", tables$coding_practices$sample[1])) -} else { - print(paste0("Denominator as expected: ", exp_samples$code_at_work)) -} - -``` - -#### Documentation -Check data against figure, check proportions are correct -```{r} -knitr::kable(tables$doc) -``` - -Denominator check: -```{r} - -if(tables$doc$sample[1] != exp_samples$code_at_work) { - warning("Denominator different from expected") - print(paste0("Expected: ", exp_samples$code_at_work)) - print(paste0("Actual: ", tables$doc$sample[1])) -} else { - print(paste0("Denominator as expected: ", exp_samples$code_at_work)) -} - -``` From b62bfe2577b2c458943cf179440f77ddcf9fffc2 Mon Sep 17 00:00:00 2001 From: Rowley Date: Wed, 24 Jan 2024 15:59:21 +0000 Subject: [PATCH 4/6] excluded current employment from cap change freq tables --- R/frequency-tables.R | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/R/frequency-tables.R b/R/frequency-tables.R index 2341f7f..f28e58e 100644 --- a/R/frequency-tables.R +++ b/R/frequency-tables.R @@ -600,6 +600,8 @@ summarise_ability_change <- function(data, sample = FALSE) { stop("unexpected_input: no column called 'coding_ability_change'") } + data <- data[data$first_learned != "Current employment", ] + questions <- "coding_ability_change" levels <- c("It has become significantly worse", @@ -790,7 +792,7 @@ summarise_cap_change_by_freq <- function(data, sample = FALSE){ col2 <- "coding_ability_change" - data <- dplyr::filter(data, (code_freq != "Never" & other_coding_experience == "Yes")) + data <- dplyr::filter(data, (code_freq != "Never" & other_coding_experience == "Yes" & data$first_learned != "Current employment")) levels1 <- c( "Rarely", @@ -826,6 +828,8 @@ summarise_cap_change_by_line_manage 
<- function(data){ col2 <- "coding_ability_change" + data <- dplyr::filter(data, (code_freq != "Never" & other_coding_experience == "Yes" & data$first_learned != "Current employment")) + levels1 <- c("Yes", "No - I manage people who do not write code", "No - I don't line manage anyone") @@ -858,6 +862,8 @@ summarise_cap_change_by_CS_grade <- function(data){ col2 <- "coding_ability_change" + data <- dplyr::filter(data, (code_freq != "Never" & other_coding_experience == "Yes" & data$first_learned != "Current employment")) + levels1 <- c("Higher Executive Officer (or equivalent)", "Senior Executive Officer (or equivalent)", "Grade 6 and 7") From c305b46aced1f600fd190d115ec5261f402deed2 Mon Sep 17 00:00:00 2001 From: Rowley Date: Wed, 24 Jan 2024 17:33:16 +0000 Subject: [PATCH 5/6] Updated tests for cap change functions --- R/frequency-tables.R | 6 +---- .../testthat/test-summarise_ability_change.R | 23 +++++++++++++------ .../test-summarise_cap_change_by_freq.R | 14 +++++++---- 3 files changed, 27 insertions(+), 16 deletions(-) diff --git a/R/frequency-tables.R b/R/frequency-tables.R index f28e58e..30c6492 100644 --- a/R/frequency-tables.R +++ b/R/frequency-tables.R @@ -70,7 +70,7 @@ sample_sizes <- function(data) { list( all = nrow(data), code_at_work = sum(!is.na(data$code_freq) & data$code_freq != "Never"), - other_code_experience = sum(!is.na(data$code_freq) & data$code_freq != "Never" & data$other_coding_experience == "Yes"), + other_code_experience = sum(!is.na(data$code_freq) & data$code_freq != "Never" & data$other_coding_experience == "Yes" & data$first_learned != "Current employment"), heard_of_RAP = sum(!is.na(data$code_freq) & data$code_freq != "Never" & data$heard_of_RAP == "Yes"), not_RAP_champ = sum(is.na(data$know_RAP_champ) | data$know_RAP_champ != "I am a RAP champion"), @@ -828,8 +828,6 @@ summarise_cap_change_by_line_manage <- function(data){ col2 <- "coding_ability_change" - data <- dplyr::filter(data, (code_freq != "Never" & 
other_coding_experience == "Yes" & data$first_learned != "Current employment")) - levels1 <- c("Yes", "No - I manage people who do not write code", "No - I don't line manage anyone") @@ -862,8 +860,6 @@ summarise_cap_change_by_CS_grade <- function(data){ col2 <- "coding_ability_change" - data <- dplyr::filter(data, (code_freq != "Never" & other_coding_experience == "Yes" & data$first_learned != "Current employment")) - levels1 <- c("Higher Executive Officer (or equivalent)", "Senior Executive Officer (or equivalent)", "Grade 6 and 7") diff --git a/tests/testthat/test-summarise_ability_change.R b/tests/testthat/test-summarise_ability_change.R index 3d2aeca..781d75d 100644 --- a/tests/testthat/test-summarise_ability_change.R +++ b/tests/testthat/test-summarise_ability_change.R @@ -1,10 +1,19 @@ -dummy_data <- data.frame(coding_ability_change = c(NA, - rep("It has become significantly worse", 2), - rep("It has become slightly worse", 3), - rep("It has stayed the same", 4), - rep("It has become slightly better", 5), - rep("It has become significantly better", 6))) +dummy_data <- data.frame(first_learned =rep(c(NA, + "Current employment", + "Education", + "Previous public sector employment", + "Previous private sector employment", + "Other"), + times = 6), + coding_ability_change = rep(c(NA, + "It has become significantly worse", + "It has become slightly worse", + "It has stayed the same", + "It has become slightly better", + "It has become significantly better"), + each = 6) +) test_that("summarise_ability_change validation works", { @@ -36,7 +45,7 @@ test_that("summarise_ability_change output is as expected", { "Stayed the same", "Slightly better", "Significantly better")), - n=c(0.10, 0.15, 0.20, 0.25, 0.30)) + n=c(0.2, 0.2, 0.2, 0.2, 0.2)) expect_equal(got, expected) }) diff --git a/tests/testthat/test-summarise_cap_change_by_freq.R b/tests/testthat/test-summarise_cap_change_by_freq.R index 7ed025c..2af697d 100644 --- 
a/tests/testthat/test-summarise_cap_change_by_freq.R +++ b/tests/testthat/test-summarise_cap_change_by_freq.R @@ -7,21 +7,27 @@ dummy_data <- data.frame( "It has stayed the same", "It has become slightly better", "It has become significantly better"), - each = 15), + each = 90), code_freq = rep(c( NA, "Sometimes", "All the time", "Rarely", "Regularly"), - times = 18), + times = 108), other_coding_experience = rep(c( NA, "Yes", "No"), - times = 30 + times = 180), + first_learned = rep(c(NA, + "Current employment", + "Education", + "Previous public sector employment", + "Previous private sector employment", + "Other"), + times =90) ) -) test_that("summarise_cap_change_by_freq missing data is handled correctly", { From 61667ee1ff23b8ad0d7bba13c2f81a6423ff09cd Mon Sep 17 00:00:00 2001 From: Rowley Date: Wed, 24 Jan 2024 17:48:13 +0000 Subject: [PATCH 6/6] Updated text on summary quarto --- quarto/main/summary.qmd | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/quarto/main/summary.qmd b/quarto/main/summary.qmd index 07be6f9..0870e5c 100644 --- a/quarto/main/summary.qmd +++ b/quarto/main/summary.qmd @@ -272,9 +272,9 @@ CARS::wrap_outputs("rap-knowledge", plot, table) ``` -### Awareness of RAP Champions +### RAP Champions -We asked respondents who had heard of RAP, if they knew who their RAP champions are. +We asked respondents who had heard of RAP, if their department has a RAP champion and if they know who it is. [RAP champions](https://analysisfunction.civilservice.gov.uk/support/reproducible-analytical-pipelines/reproducible-analytical-pipeline-rap-champions/) support and promote the use of RAP across government. 
@@ -283,7 +283,7 @@ Please [contact the analysis standards and pipelines team](mailto:asap@ons.gov.u ```{r} -plot <- CARS::plot_freqs(tables$rap_champ_status, n = samples$heard_of_RAP, break_q_names_col = "value", max_lines = 2, xlab = "Heard of RAP champions?", font_size = 14, orientation = "h") +plot <- CARS::plot_freqs(tables$rap_champ_status, n = samples$heard_of_RAP, break_q_names_col = "value", max_lines = 2, xlab = "Department RAP champions?", font_size = 14, orientation = "h") table <- CARS::df_to_table(tables$rap_champ_status, n = samples$heard_of_RAP, column_headers = c("Knowledge", "Percent")) CARS::wrap_outputs("rap-champ-status", plot, table)