From ec864dc0f0670e8b6cafb379eadbb714e2e694dc Mon Sep 17 00:00:00 2001
From: chelseamcg <134400730+chelseamcg@users.noreply.github.com>
Date: Wed, 13 Nov 2024 09:20:57 +0000
Subject: [PATCH] Add files via upload

---
 rmarkdown/Page_2_Question_Bank_Use.rmd  |  73 +++
 rmarkdown/Page_3_Quality_Dimensions.rmd | 839 ++++++++++++++++++++++++
 rmarkdown/Page_4_Data_Linkage.rmd       | 114 ++++
 rmarkdown/Page_5_Accessibility.rmd      |  40 ++
 rmarkdown/_navbar.html                  |  30 +
 rmarkdown/_site.yml                     |  18 +
 rmarkdown/index.rmd                     |  82 +++
 rmarkdown/question_bank.css             |  22 +
 8 files changed, 1218 insertions(+)
 create mode 100644 rmarkdown/Page_2_Question_Bank_Use.rmd
 create mode 100644 rmarkdown/Page_3_Quality_Dimensions.rmd
 create mode 100644 rmarkdown/Page_4_Data_Linkage.rmd
 create mode 100644 rmarkdown/Page_5_Accessibility.rmd
 create mode 100644 rmarkdown/_navbar.html
 create mode 100644 rmarkdown/_site.yml
 create mode 100644 rmarkdown/index.rmd
 create mode 100644 rmarkdown/question_bank.css
diff --git a/rmarkdown/Page_2_Question_Bank_Use.rmd b/rmarkdown/Page_2_Question_Bank_Use.rmd
new file mode 100644
index 0000000..aef3cd0
--- /dev/null
+++ b/rmarkdown/Page_2_Question_Bank_Use.rmd
@@ -0,0 +1,73 @@
+---
+title: "How to use the question bank"
+output:
+  html_document:
+    css: "question_bank.css"
+    toc: yes
+    toc_depth: 4
+    toc_float:
+      collapsed: yes
+  pdf_document:
+    toc: yes
+    toc_depth: '4'
+---
+
+```{r global-options, include=FALSE}
+# Set echo=false for all chunks
+knitr::opts_chunk$set(echo=FALSE)
+```
+
+---
+
+## Question bank structure and how to use it
+
+
+This question bank groups questions into sections, some of which have further sub-sections. These sections and sub-sections are as follows: 
+
+ 
+
+1. Quality dimensions as defined by the Data Management Association UK (DAMA), grouped into the following pairs: 
+
+    * accuracy and validity 
+
+    * completeness and uniqueness 
+
+    * consistency and timeliness 
+
+ 
+
+2. Data linkage 
+
+
+
+These sections and their sub-sections have been chosen to give you a wide selection of questions to gain further insights into administrative data quality. According to the [Code of Practice for Statistics](https://code.statisticsauthority.gov.uk/), “quality means that statistics fit their intended uses, are based on appropriate data and methods, and are not materially misleading.” 
+
+In essence, quality centres on fitness for purpose, including: 
+
++ Are the data good enough for what I want to use them for? 
+
++ Did the statistic I produce meet the needs of the people who are using it?  
+
+<br>
+ 
+The questions in this question bank can be used to assess the different aspects of the data’s fitness for your intended use. Data are rarely perfect for statistical and research purposes, so understanding which dimensions are important for your specific uses will help you decide whether the data are fit for purpose. To this end, the question bank has been designed to be flexible: you can tailor your approach in proportion to your needs, for example, by adapting the questions to the variables you are interested in. The questions provide a structure for assessing the data’s fitness for purpose and help ensure that you cover the key issues needed to understand the data’s quality.  
+
+
+The first step is to identify the dataset you are interested in and wish to assess. You can then select questions from the Administrative Data Quality Question Bank (ADQQB) and tailor them to find out more about that specific dataset. 
+
+
+The Administrative Data Quality Question Bank (ADQQB) focuses on assessing data quality at the input data level, that is, the point at which your organisation receives the data. Quality at this stage refers to how well the data fit the purpose(s) you want to use them for: for example, whether the data are suitable for producing statistics, or for carrying out analysis or research. 
+
+<br>
+
+We have included questions on the [DAMA quality dimensions](https://www.gov.uk/government/news/meet-the-data-quality-dimensions) because these dimensions are widely used across government to assess whether data are good enough to use or need to be improved. We have also included questions on data linkage in data collection and production, because the answers can provide further information and context on how the data are produced. These questions also serve as a reminder to check in regularly with data suppliers about any changes that may affect the resulting data. 
+
+
+As we further develop the Administrative Data Quality Question Bank (ADQQB) beyond the current publication, we will add further sections. These will include output data quality: “how well your ‘final’ output meets your users’ needs”. We will do this by integrating relevant dimensions from the European Statistical System’s (ESS) dimensions of quality. 
+
+
+<br>
+
+ 
+
+
diff --git a/rmarkdown/Page_3_Quality_Dimensions.rmd b/rmarkdown/Page_3_Quality_Dimensions.rmd
new file mode 100644
index 0000000..0f83b5a
--- /dev/null
+++ b/rmarkdown/Page_3_Quality_Dimensions.rmd
@@ -0,0 +1,839 @@
+---
+title: "Quality dimensions"
+output:
+  html_document:
+    css: "question_bank.css"
+    toc: yes
+    toc_depth: 4
+    toc_float:
+      collapsed: yes
+  pdf_document:
+    toc: yes
+    toc_depth: '4'
+---
+
+```{r global-options, include=FALSE}
+# Set echo=false for all chunks
+knitr::opts_chunk$set(echo=FALSE)
+```
+
+---
+
+This question bank covers the data quality dimensions defined by the Data Management Association UK (DAMA) and outlined in [The Government Data Quality Framework](https://www.gov.uk/government/publications/the-government-data-quality-framework/the-government-data-quality-framework#Data-quality-dimensions). These dimensions and their definitions are the same as those in our [Administrative Data Quality Framework (ADQF)](https://analysisfunction.civilservice.gov.uk/policy-store/quality-of-administrative-data-in-statistics/). The dimensions covered are: 
+
++ accuracy 
+
++ validity  
+
++ completeness  
+
++ uniqueness 
+
++ consistency 
+
++ timeliness
+
+We have used these dimensions because they were developed by data quality experts to assess whether data are fit for purpose. Identifying which dimensions are important for you will help you decide how fit for purpose the data are for your needs. 
+
+In future publications of this question bank, we intend to include questions based on selected relevant principles from the [European Statistics Code of Practice](https://ec.europa.eu/eurostat/web/quality/european-quality-standards/european-statistics-code-of-practice). The principles covered will be:
+
++ relevance: coverage, content, purpose and collection  
+
++ accessibility and clarity: accessing the data, data format, availability of supporting information, quality and sufficiency of metadata, illustrations and accompanying advice  
+
+Before the questions for each data quality dimension, we provide some general questions to help ensure that you have a fundamental understanding of the data and its quality.  
+
+## Questions to ask to gain insights into the data's quality in general 
+
+
+<table class="simpleTable">
+<TR VALIGN=TOP>
+    <th scope="col" style="width: 8%;">   </th>
+    <th scope = "col"> </th>
+
+</tr>
+
+ <tr>
+  <td align="center">  Q1  </td>
+  <td> How are the data from this dataset collected? For example, through public contact with services over the phone, registration forms, etc. </td>
+ </tr>
+ 
+
+ 
+ <tr>
+  <td align="center">  Q2  </td>
+  <td> Which organisation(s) collect the data? </td>
+ </tr>
+ 
+  <tr>
+  <td align="center">  Q3  </td>
+  <td> Do different organisations collect different data or variables in the dataset? For example, one organisation may be responsible for collecting income-related data and another for collecting demographic data, with these data combined to create one composite dataset. </td>
+ </tr>
+ 
+  <tr>
+  <td align="center">  Q4  </td>
+  <td> If so, which data are collected by which organisation? </td>
+ </tr>
+
+  <tr>
+  <td align="center">  Q5  </td>
+  <td> Are the data collected differently by each organisation?</td>
+ </tr>
+ 
+   <tr>
+  <td align="center">  Q6  </td>
+  <td> Does this affect the quality of the data? </td>
+ </tr>
+
+
+  <tr>
+  <td align="center">  Q7  </td>
+  <td> How do suppliers quality assure the data?</td>
+ </tr>  
+
+  
+  <tr>
+  <td align="center">  Q8  </td>
+  <td> Are there any known quality issues? </td>
+ </tr>  
+ 
+
+  <tr>
+  <td align="center">  Q9  </td>
+  <td> What thresholds have the suppliers put in place regarding the data's quality? For example, an acceptable number of duplicate records, or an acceptable amount of missing data. </td>
+ </tr>  
+
+  
+  <tr>
+  <td align="center">  Q10  </td>
+  <td> How is the quality of this dataset documented? </td>
+ </tr>  
+
+  
+  <tr>
+  <td align="center">  Q11  </td>
+  <td> Are there any supplementary documents related to the dataset that can be shared? For example, a data dictionary or a metadata list. </td>
+ </tr>  
+
+  
+  <tr>
+  <td align="center">  Q12  </td>
+  <td> Are there training manuals related to the work that can be shared? For example, for coding, updating or maintaining the dataset. </td>
+ </tr> 
+ 
+  </table>
+
+
+<br>
+
+# Accuracy and Validity
+
+
+
+## Accuracy and validity definition 
+
+Administrative data accuracy refers to how well the data match reality: do the data capture what you are trying to measure? 
+
+Validity is the extent to which the data conform to the expected format, type and range. For example, an email address must contain an ‘@’ symbol. 
+
+
+## Questions to ask to gain insights into accuracy of data  
+
+<table class="simpleTable">
+<TR VALIGN=TOP>
+    <th scope="col" style="width: 8%;">   </th>
+    <th scope = "col"> </th>
+
+</tr>
+
+ <tr>
+  <td align="center">  Q13  </td>
+  <td> How accurate are the supplied data? </td>
+ </tr>
+
+ <tr>
+  <td align="center">  Q14  </td>
+  <td> How well do the data meet your intended statistical use? </td>
+ </tr>  
+
+  <tr>
+  <td align="center">  Q15 </td>
+  <td> How accurate are the items, or variables, in the supplied data? </td>
+ </tr> 
+
+ <tr>
+  <td align="center">  Q16  </td>
+  <td> How accurate are the units, or records, in the supplied data? </td>
+ </tr> 
+  
+ <tr>
+  <td align="center">  Q17  </td>
+  <td> What are the accuracy issues in the supplied data? </td>
+ </tr> 
+  
+ <tr>
+  <td align="center">  Q18 </td>
+  <td> If there are accuracy issues, how are they identified? For example, through a formal auditing process, or an automatic flagging system.  </td>
+ </tr> 
+  
+ <tr>
+  <td align="center">  Q19  </td>
+  <td> What methods are implemented by the suppliers to prevent any accuracy issues? For example, checks built into the data collection instrument. </td>
+ </tr> 
+  
+ <tr>
+  <td align="center">  Q20  </td>
+  <td> If there are accuracy issues, how do the suppliers resolve them, and for which variables and types of records? </td>
+ </tr> 
+  
+
+ <tr>
+  <td align="center">  Q21  </td>
+  <td> What data accuracy issues are not addressed?  </td>
+ </tr>
+ 
+ <tr>
+  <td align="center">  Q22  </td>
+  <td> Why are the issues not addressed? </td>
+ </tr>
+ 
+ <tr>
+  <td align="center">  Q23  </td>
+  <td> What happens to data accuracy issues that are not addressed? For example, are they logged or reported to a specific team? </td>
+ </tr>
+ 
+ <tr>
+  <td align="center">  Q24  </td>
+  <td> How are users of the data informed about these data accuracy issues?  </td>
+ </tr>
+
+</table>
+ 
+
+### Invalid entry questions
+
+<table class="simpleTable">
+<TR VALIGN=TOP>
+    <th scope="col" style="width: 8%;">   </th>
+    <th scope = "col"> </th>
+    
+</tr>
+ <tr>
+  <td align="center">  Q25 </td>
+  <td>What are the types of invalid data entries in the data?  </td>
+ </tr>
+ 
+ <tr>
+  <td align="center">  Q26  </td>
+  <td> How many invalid entries are there in the data?  </td>
+ </tr>
+ 
+ <tr>
+  <td align="center">  Q27  </td>
+  <td>What variables have invalid data entries?  </td>
+ </tr>
+ 
+ <tr>
+  <td align="center">  Q28  </td>
+  <td> What types of records have invalid data entries? </td>
+ </tr>
+ 
+ <tr>
+  <td align="center">  Q29  </td>
+  <td>What methods are used to identify invalid data entries? </td>
+ </tr>
+ 
+ <tr>
+  <td align="center">  Q30  </td>
+  <td>What methods are used to resolve invalid data entries?  </td>
+ </tr>
+ 
+
+</table>
+
+
+### Errors, typos or mistakes questions
+
+<table class="simpleTable">
+<TR VALIGN=TOP>
+    <th scope="col" style="width: 8%;">   </th>
+    <th scope = "col"> </th>
+    
+</tr>
+ <tr>
+  <td align="center">  Q31 </td>
+  <td>What kinds of errors, typos or mistakes are there in the data? </td>
+ </tr>
+ 
+
+ <tr>
+  <td align="center">  Q32 </td>
+  <td>Which variables have typos, errors or mistakes? </td>
+ </tr>
+ 
+
+ <tr>
+  <td align="center">  Q33 </td>
+  <td>Which types of records have typos, errors or mistakes? </td>
+ </tr>
+ 
+
+ <tr>
+  <td align="center">  Q34 </td>
+  <td>What are the causes of these errors, typos or mistakes in the data?</td>
+ </tr>
+ 
+
+ <tr>
+  <td align="center">  Q35 </td>
+  <td>How are errors, typos or mistakes identified in the data? </td>
+ </tr>
+ 
+
+ <tr>
+  <td align="center">  Q36 </td>
+  <td>How are errors, typos or mistakes in the data resolved?</td>
+ </tr>
+
+</table> 
+
+
+<br>
+
+# Completeness and Uniqueness
+
+
+
+## Completeness definition 
+ 
+Completeness describes the degree to which all values within each variable are present (that is, not blank, null or empty). Completeness applies at both the data item (variable) level and the unit (record) level. At the data item level, a value may be missing from an individual’s record within a dataset, for example their date of birth. At the unit level, a whole record may be missing, meaning that the individual is absent from the dataset entirely. 
+
+
+Depending on the completeness of your data, there may be under-coverage or over-coverage. See [the ‘Completeness’ section of the Administrative Data Quality Framework](https://best-practice-and-impact.github.io/admin-data-quality-stats/departments.html#completeness) for more information on over- and under-coverage. 
+
+
+To assess completeness, you will need to identify how many items or records are missing compared with how many are present. This dimension is sometimes described in terms of “missingness” rather than “completeness”, but the quality issue is the same. Data are ‘complete’ when all the data required for your purposes are both present and available for use. This does not mean that 100% of the fields in your data need to be complete, only that the values and units you need are present. A ‘complete’ dataset may still be inaccurate if some of its values are incorrect. 
+ 
+## Questions to ask to gain insights into completeness of data
+
+### Unit completeness questions
+
+<table class="simpleTable">
+<TR VALIGN=TOP>
+    <th scope="col" style="width: 8%;">   </th>
+    <th scope = "col"> </th>
+    
+</tr>
+ <tr>
+  <td align="center">  Q37 </td>
+  <td>How complete or incomplete are the data?  </td>
+ </tr>
+
+
+ <tr>
+  <td align="center">  Q38 </td>
+  <td>How many records in the data are considered complete, or to have good coverage?   </td>
+ </tr>
+ 
+
+
+  <tr>
+  <td align="center">  Q39 </td>
+  <td>What types of records need to be in the data to be considered complete?  </td>
+ </tr>
+
+
+  <tr>
+  <td align="center">  Q40 </td>
+  <td>What types of records are missing from the data that should be included? </td>
+ </tr>
+ 
+ 
+  <tr>
+  <td align="center">  Q41 </td>
+  <td>Why are they missing?  </td>
+ </tr>
+
+ 
+
+ <tr>
+  <td align="center">  Q42 </td>
+  <td>What types of records are included in the data that should not be? </td>
+ </tr>
+ 
+ <tr>
+  <td align="center">  Q43 </td>
+  <td>Why are these included?  </td>
+ </tr>
+
+ 
+ <tr>
+  <td align="center">  Q44 </td>
+  <td>How are records missing from the data identified as missing?  </td>
+ </tr>
+
+ <tr>
+  <td align="center">  Q45 </td>
+  <td>How are records missing from the data resolved? </td>
+ </tr>
+ 
+ 
+ <tr>
+  <td align="center">  Q46 </td>
+  <td>How are records that are wrongly included in the data identified? </td>
+ </tr>
+
+ 
+ <tr>
+  <td align="center">  Q47 </td>
+  <td>How are records that are wrongly included in the data resolved? </td>
+ </tr>
+
+
+ <tr>
+  <td align="center">  Q48 </td>
+  <td>Unit imputation is when missing data are replaced with a whole record or unit. Are any records in the supplied data imputed records? </td>
+ </tr>
+ 
+ 
+ <tr>
+  <td align="center">  Q49 </td>
+  <td>Why are these records imputed?  </td>
+ </tr>
+
+ <tr>
+  <td align="center">  Q50 </td>
+  <td>How are they imputed? </td>
+ </tr>
+ 
+
+ <tr>
+  <td align="center">  Q51 </td>
+  <td>What changes have been made to exclusion and inclusion criteria in the data over time? For example, due to policy changes. </td>
+ </tr>
+ 
+</table>
+
+
+
+### Item completeness questions
+
+<table class="simpleTable">
+<TR VALIGN=TOP>
+    <th scope="col" style="width: 8%;">   </th>
+    <th scope = "col"> </th>
+    
+</tr>
+ <tr>
+  <td align="center">  Q52 </td>
+  <td> Which variables or values have missing data?  </td>
+ </tr>
+
+ <tr>
+  <td align="center">  Q53 </td>
+  <td>If there are missing data in variables or values, are there particular types of records in which these data are missing? </td>
+ </tr> 
+
+
+  <tr>
+  <td align="center">  Q54 </td>
+  <td> How are missing data within variables or values identified? </td>
+ </tr>
+ 
+
+ <tr>
+  <td align="center">  Q55 </td>
+  <td> How are missing data within variables or values resolved?</td>
+ </tr> 
+ 
+
+ <tr>
+  <td align="center">  Q56 </td>
+  <td>How are data, variables or values that are wrongly included in the dataset identified? </td>
+ </tr> 
+
+
+  <tr>
+  <td align="center">  Q57 </td>
+  <td> How are data, variables or values that are wrongly included in the dataset resolved?</td>
+ </tr>
+ 
+
+  <tr>
+  <td align="center">  Q58 </td>
+  <td>Item imputation is when missing data are replaced with a value within a record. Which variables or values in the data are imputed? </td>
+ </tr>
+ 
+ 
+ <tr>
+  <td align="center">  Q59 </td>
+  <td>Why are these variables or values imputed? </td>
+ </tr>
+
+
+  <tr>
+  <td align="center">  Q60 </td>
+  <td>How are they imputed? </td>
+ </tr>
+ 
+</table>
+ 
+<br>
+
+
+## Uniqueness definition
+
+Uniqueness describes the degree to which there is no duplication in records. This means that the data contain only one record for each entity they represent, and each value is stored only once. 
+
+ 
+A record is unique if it appears only once in a dataset. A record can be a duplicate even if some of its fields differ. For example, a person may have two patient records that match on some fields (such as name and date of birth) but have different addresses and contact numbers, so they are treated as two separate people. Depending on what you are using the data for, this may or may not be a uniqueness issue. If you want to know the total number of patient visits, this is not a problem. However, if you want to know how many patients you have on your roster, you could be counting the same person twice. It is therefore important to consider uniqueness in the context of your intended use, both when assessing data quality and when combining datasets, as it can affect the coverage of the data. 
+
+## Questions to ask to gain insights into uniqueness of data
+
+<table class="simpleTable">
+<TR VALIGN=TOP>
+    <th scope="col" style="width: 8%;">   </th>
+    <th scope = "col"> </th>
+    
+</tr>
+ <tr>
+  <td align="center">  Q61 </td>
+  <td>How often do identical records appear in the data more than once? </td>
+ </tr>
+
+ 
+
+ 
+ <tr>
+  <td align="center">  Q62 </td>
+  <td>Should any records appear more than once in the data? </td>
+ </tr>
+ 
+
+ 
+ <tr>
+  <td align="center">  Q63 </td>
+  <td>If records appear more than once, what is the reason? </td>
+ </tr>
+
+
+ 
+ <tr>
+  <td align="center">  Q64 </td>
+  <td> How unique are the records in the data? </td>
+ </tr>
+
+
+ 
+ <tr>
+  <td align="center">  Q65 </td>
+  <td> What types of records appear in the data more than once? </td>
+ </tr>
+ 
+
+ 
+ <tr>
+  <td align="center">  Q66 </td>
+  <td>What does each row in the dataset represent?  </td>
+ </tr>
+  
+
+ 
+ <tr>
+  <td align="center">  Q67 </td>
+  <td>How is each unique record identified? For example, a record ID number. </td>
+ </tr>
+   
+
+ 
+ <tr>
+  <td align="center">  Q68 </td>
+  <td> What measures are carried out to prevent records appearing more than once in the data during data collection?  </td>
+ </tr>
+
+
+ 
+ <tr>
+  <td align="center">  Q69 </td>
+  <td>What measures are carried out to prevent records appearing more than once in the data during data processing?  </td>
+ </tr>
+
+
+ 
+ <tr>
+  <td align="center">  Q70 </td>
+  <td>How are records that appear more than once in the data identified? </td>
+ </tr>
+ 
+
+ 
+ <tr>
+  <td align="center">  Q71 </td>
+  <td> What do duplicate records look like in the data?</td>
+ </tr>
+ 
+
+ 
+ <tr>
+  <td align="center">  Q72 </td>
+  <td>How are records that appear more than once in the data resolved? </td>
+ </tr>
+ 
+
+ </table>
+
+<br>
+
+# Consistency and Timeliness
+
+## Consistency definition
+
+Consistency is achieved when data values do not conflict with other values within a dataset or across different datasets. For example, a person’s date of birth should be recorded as the same date within a dataset and across datasets, and it should match the age recorded for that person. Similarly, their postcode should not conflict with their address. As another example, two people who are each other’s spouses should both have the same marital status recorded. 
+
+## Questions to ask to gain insights into consistency of data
+
+<table class="simpleTable">
+<TR VALIGN=TOP>
+    <th scope="col" style="width: 8%;">   </th>
+    <th scope = "col"> </th>
+    
+</tr>
+ <tr>
+  <td align="center">  Q73 </td>
+  <td>How consistent are the data between variables?</td>
+ </tr>
+ 
+
+ <tr>
+  <td align="center">  Q74 </td>
+  <td>Which variables have inconsistent information? What is the reason for this?</td>
+ </tr>
+
+
+<tr>
+  <td align="center">  Q75 </td>
+  <td>Which types of records, if any, have inconsistent information?</td>
+ </tr> 
+ 
+
+<tr>
+  <td align="center">  Q76 </td>
+  <td>What is the reason for this? </td>
+ </tr> 
+
+
+<tr>
+  <td align="center">  Q77 </td>
+  <td>If you have a composite dataset (a dataset compiled from different sources), how consistent are the data across the different sources?</td>
+ </tr> 
+ 
+
+<tr>
+  <td align="center">  Q78 </td>
+  <td>How consistent are the data over time? </td>
+ </tr> 
+
+
+<tr>
+  <td align="center">  Q79 </td>
+  <td>Have there been any changes to the way the data are collected over time?</td>
+ </tr>
+ 
+
+<tr>
+  <td align="center">  Q80 </td>
+  <td>What changes have there been to the variables over time? For example, changes to definition.</td>
+ </tr>
+ 
+
+<tr>
+  <td align="center">  Q81 </td>
+  <td>Which variables, if any, were changed? </td>
+ </tr>  
+
+
+<tr>
+  <td align="center">  Q82 </td>
+  <td>What methods are used to measure consistency or identify inconsistencies in the supplied data?</td>
+ </tr> 
+ 
+
+<tr>
+  <td align="center">  Q83 </td>
+  <td>What aspects of the data are checked for consistency? For example, all data items, certain variables or certain time points.</td>
+ </tr> 
+  
+
+<tr>
+  <td align="center">  Q84 </td>
+  <td>How are inconsistencies in the data resolved?</td>
+ </tr> 
+ 
+</table>
+
+## Timeliness definition
+
+Timeliness refers to how well the data reflect the period they are supposed to represent. It also describes how up to date the data are. 
+
+ 
+Some attributes represented in data stay the same over time: for example, your date of birth does not change, no matter how much time passes. Other attributes, such as income, may change. 
+
+ 
+Your data are also ‘timely’ if the lag between their collection and their availability for your use is appropriate for your needs. Are the data available when expected and needed? Do they reflect the time they are supposed to? 
+
+
+## Questions to ask to gain insights into timeliness of data
+
+
+<table class="simpleTable">
+<TR VALIGN=TOP>
+    <th scope="col" style="width: 8%;">   </th>
+    <th scope = "col"> </th>
+    
+</tr>
+ <tr>
+  <td align="center">  Q85 </td>
+  <td>When are the data collected? For example, continuously or over a set period?</td>
+ </tr>
+ 
+
+ <tr>
+  <td align="center">  Q86 </td>
+  <td>‘Up to date’ refers to whether the supplied data are the latest version. For example, if new data have been collected but are not reflected in the current data, then the data are not up to date.</td>
+ </tr>
+ 
+
+ <tr>
+  <td align="center">  Q87 </td>
+  <td>How up to date are the data at the point they are supplied?</td>
+ </tr>
+ 
+
+<tr>
+  <td align="center">  Q88 </td>
+  <td>What can impact how up to date the data are?</td>
+ </tr>
+  
+
+<tr>
+  <td align="center">  Q89 </td>
+  <td>Reference dates are timestamps that indicate when the data have been changed. Are there reference dates for each record?</td>
+ </tr>
+  
+
+<tr>
+  <td align="center">  Q90 </td>
+  <td>At what point in the data collection process are reference dates produced? For example, when the data are collected, or when the data were last updated. </td>
+ </tr> 
+
+
+<tr>
+  <td align="center">  Q91 </td>
+  <td>How up to date are the variables at the point the data are supplied?</td>
+ </tr>
+  
+
+ <tr>
+  <td align="center">  Q92 </td>
+  <td>Which types of records do not have up-to-date information in these variables? </td>
+ </tr>
+
+<tr>
+  <td align="center">  Q93 </td>
+  <td>What methods are used to check that the data are up to date?</td>
+ </tr>
+  
+
+ <tr>
+  <td align="center">  Q94 </td>
+  <td>What methods are used to resolve data that are not up to date?</td>
+ </tr>
+ 
+
+<tr>
+  <td align="center">  Q95 </td>
+  <td>How often are the data updated? </td>
+ </tr> 
+
+
+<tr>
+  <td align="center">  Q96 </td>
+  <td>What information is updated?</td>
+ </tr> 
+  
+
+<tr>
+  <td align="center">  Q97 </td>
+  <td>Are there any time lags between the reference dates in the data and the date on which the data are supplied?</td>
+ </tr> 
+ 
+
+<tr>
+  <td align="center">  Q98 </td>
+  <td>What are the different processes by which new records are added?</td>
+ </tr> 
+  
+
+<tr>
+  <td align="center">  Q99 </td>
+  <td>How often are existing records within the data updated with new information?</td>
+ </tr>
+ 
+
+<tr>
+  <td align="center">  Q100 </td>
+  <td>What are the different processes by which existing records are updated with new information? </td>
+ </tr> 
+
+
+ <tr>
+  <td align="center">  Q101</td>
+  <td>What are the different processes by which variables or values are updated with new information? </td>
+ </tr>
+
+
+<tr>
+  <td align="center">  Q102</td>
+  <td>How often are the data updated to remove records from the data?</td>
+ </tr> 
+   
+
+<tr>
+  <td align="center">  Q103 </td>
+  <td>Under what circumstances are records removed from the data?</td>
+ </tr>
+   
+
+<tr>
+  <td align="center">  Q104 </td>
+  <td>What are the different processes by which unwanted records are removed?</td>
+ </tr>
+ 
+
+<tr>
+  <td align="center">  Q105 </td>
+  <td>When records meet the criteria for removal, how long would it typically take for the record to be deleted from the data supplied? </td>
+ </tr>
+ 
+
+<tr>
+  <td align="center">  Q106 </td>
+  <td>How often are existing records within the data updated to correct for any errors?</td>
+ </tr>
+   
+
+ <tr>
+  <td align="center">  Q107 </td>
+  <td>How often are variables within records updated to correct for any errors? </td>
+ </tr>
+
+
+<tr>
+  <td align="center">  Q108 </td>
+  <td>What are the different processes by which existing records are updated to correct for any errors?</td>
+ </tr>
+ 
+
+</table>
+
+<br>
diff --git a/rmarkdown/Page_4_Data_Linkage.rmd b/rmarkdown/Page_4_Data_Linkage.rmd
new file mode 100644
index 0000000..fc543d1
--- /dev/null
+++ b/rmarkdown/Page_4_Data_Linkage.rmd
@@ -0,0 +1,114 @@
+---
+title: "Data linkage"
+output:
+  html_document:
+    css: "question_bank.css"
+    toc: yes
+    toc_depth: 4
+    toc_float:
+      collapsed: yes
+  pdf_document:
+    toc: yes
+    toc_depth: '4'
+---
+
+```{r global-options, include=FALSE}
+# Set echo=false for all chunks
+knitr::opts_chunk$set(echo=FALSE)
+```
+
+---
+
+## Data linkage definition
+
+Data linkage is the process of classifying whether two or more records refer to the same entity. Entities can be anything that a dataset contains, for example people, addresses or households. These questions aim to establish what data linkage, if any, the data supplier carries out on the dataset you are interested in.  
+
+## Questions to ask to gain insights into data linkage 
+
+<table class="simpleTable">
+<TR VALIGN=TOP>
+    <th scope="col" style="width: 8%;">   </th>
+    <th scope = "col"> </th>
+    
+</tr>
+ <tr>
+  <td align="center">  Q109 </td>
+  <td>How, if at all, are the data linked?  </td>
+ </tr>
+
+
+
+ <tr>
+  <td align="center">  Q110 </td>
+  <td>Why is the data linkage conducted? </td>
+ </tr> 
+ 
+
+ <tr>
+  <td align="center">  Q111 </td>
+  <td>Why are these methods used to link data? </td>
+ </tr>
+ 
+
+ <tr>
+  <td align="center">  Q112 </td>
+  <td> How often are the data linkage methods changed?</td>
+ </tr> 
+ 
+
+ <tr>
+  <td align="center">  Q113 </td>
+  <td>What changes have been made?  </td>
+ </tr> 
+
+
+  <tr>
+  <td align="center">  Q114 </td>
+  <td> Why were these changes made?</td>
+ </tr>
+ 
+
+ <tr>
+  <td align="center">  Q115 </td>
+  <td>What assessments, checks or preparations are carried out on the different datasets before data linkage occurs? </td>
+ </tr> 
+ 
+
+ <tr>
+  <td align="center">  Q116 </td>
+  <td>What variables are used to link or match the datasets?  </td>
+ </tr> 
+ 
+
+  <tr>
+  <td align="center">  Q117 </td>
+  <td>Why are these variables used? </td>
+ </tr>
+ 
+
+  <tr>
+  <td align="center">  Q118 </td>
+  <td>A match key is created by combining pieces of information into a unique key, which is then hashed and used for automated matching. How does your organisation use match keys? </td>
+ </tr>
+ 
+
+  <tr>
+  <td align="center">  Q119 </td>
+  <td>How is the success of these match keys evaluated? </td>
+ </tr>
+ 
+
+  <tr>
+  <td align="center">  Q120 </td>
+  <td> How are decisions made about whether records should be declared a match or a non-match? </td>
+ </tr>
+
+
+  <tr>
+  <td align="center">  Q121 </td>
+  <td> How is data linkage quality assessed?</td>
+ </tr>
+  
+ </table>
+
+<br>
diff --git a/rmarkdown/Page_5_Accessibility.rmd b/rmarkdown/Page_5_Accessibility.rmd
new file mode 100644
index 0000000..c602dd6
--- /dev/null
+++ b/rmarkdown/Page_5_Accessibility.rmd
@@ -0,0 +1,40 @@
+---
+title: "Accessibility and planned developments"
+output:
+  html_document:
+    css: "question_bank.css"
+    toc: yes
+    toc_depth: 4
+    toc_float:
+      collapsed: yes
+  pdf_document:
+    toc: yes
+    toc_depth: '4'
+---
+
+```{r global-options, include=FALSE}
+# Set echo=false for all chunks
+knitr::opts_chunk$set(echo=FALSE)
+```
+
+---
+
+### Accessibility
+
+This is an early version of the question bank; we welcome any feedback on accessibility to inform the next stages of development. If you find any problems, please contact us by emailing methods.research@ons.gov.uk. Please also get in touch if you are unable to access any part of this question bank or require the content in a different format. We will consider your request and aim to get back to you within five working days. 
+
+
+Current accessibility features include: 
+
+* Font is Arial, size 12 minimum. 
+* You can navigate the question bank using a keyboard: use the up and down arrow keys to scroll, Tab to highlight the different sections at the top, and Enter to select them. 
+* In the Microsoft Edge browser, text can be read aloud to you by right-clicking and selecting “Read aloud”. 
+* You can change colours, contrast levels and font: in Microsoft Edge, using the “Immersive Reader” feature (right-click and select “Immersive Reader”); in Google Chrome, by editing the style sheet (right-click and select “Inspect”). 
+* In Microsoft Edge’s “Immersive Reader” (right-click and select “Immersive Reader”), you can zoom in up to 300% without the text spilling off the screen.
+
+
+### Feedback on this question bank
+
+If you have thoughts or preferences on how the question bank is structured and how we include questions, please email methods.research@ons.gov.uk to provide feedback. 
+
+<br>
diff --git a/rmarkdown/_navbar.html b/rmarkdown/_navbar.html
new file mode 100644
index 0000000..512bfef
--- /dev/null
+++ b/rmarkdown/_navbar.html
@@ -0,0 +1,30 @@
+
+  <div class="navbar navbar-default  navbar-fixed-top" role="navigation">
+    <div class="container">
+      <div class="navbar-header">
+        <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-controls="navbar" aria-label="Display main menu">
+          Main menu
+        </button>
+        <a class="navbar-brand" href="index.html">The Administrative Data Quality Question Bank</a>
+          </div>
+            <div id="navbar" class="navbar-collapse collapse">
+              <ul class="nav navbar-nav">
+                   <li>
+                    <a href="index.html">Home  </a>
+                  </li>
+                  <li>
+                    <a href="Page_2_Question_Bank_Use.html">How to use the question bank  </a>
+                  </li>
+                  <li>
+                    <a href="Page_3_Quality_Dimensions.html">Quality dimensions  </a>
+                  </li>
+                  <li>
+                    <a href="Page_4_Data_Linkage.html">Data linkage  </a>
+                  </li>
+                  <li>
+                    <a href="Page_5_Accessibility.html">Accessibility and planned developments </a>
+                  </li>
+                </ul>
+            </div><!--/.nav-collapse -->
+          </div><!--/.container -->
+        </div><!--/.navbar -->
\ No newline at end of file
diff --git a/rmarkdown/_site.yml b/rmarkdown/_site.yml
new file mode 100644
index 0000000..5728765
--- /dev/null
+++ b/rmarkdown/_site.yml
@@ -0,0 +1,18 @@
+name: "Question Bank"
+output_dir: "../../docs"
+output:
+  html_document:
+    css: "question_bank.css"
+navbar:
+  title: "The Administrative Data Quality Question Bank"
+  left:
+    - text: "Home"
+      href: index.html
+    - text: "How to use the question bank"
+      href: Page_2_Question_Bank_Use.html
+    - text: "Quality dimensions"
+      href: Page_3_Quality_Dimensions.html
+    - text: "Data linkage"
+      href: Page_4_Data_Linkage.html
+    - text: "Accessibility and planned developments"
+      href: Page_5_Accessibility.html
diff --git a/rmarkdown/index.rmd b/rmarkdown/index.rmd
new file mode 100644
index 0000000..54469db
--- /dev/null
+++ b/rmarkdown/index.rmd
@@ -0,0 +1,82 @@
+---
+title: "Home"
+output:
+  html_document:
+    css: "question_bank.css"
+    toc: yes
+    toc_depth: 4
+    toc_float:
+      collapsed: yes
+  pdf_document:
+    toc: yes
+    toc_depth: '4'
+---
+
+```{r global-options, include=FALSE}
+# Set echo=false for all chunks
+knitr::opts_chunk$set(echo=FALSE)
+```
+
+---
+
+## Background
+
+The Administrative Data Quality Question Bank (ADQQB) is a collection of specially designed questions that guide analysts in understanding the quality of their administrative data for research and statistical purposes. These questions can be used in two ways. 
+
+
+Analysts can ask data suppliers these questions. Analysts are strongly encouraged to communicate with data suppliers before undertaking any research using administrative data, so that they fully understand the data in question and their quality. 
+
+
+Analysts can also use these questions as a guide when assessing the quality of their data.  
+
+ <br> 
+ 
+The questions have been designed to guide the analyst to do the following: 
+
++ understand the data at the point at which either the analyst or the organisation has acquired and accessed the administrative data  
+
++ make the right decisions on how to treat the data at the data processing and analysis phase and onwards 
+
++ determine whether the data are fit for purpose 
+
++ transparently communicate the quality of the data in statistical research and outputs 
+
+
+
+
+As such, this question bank aligns with the three principles of the Quality pillar set out in the [Code of Practice for Statistics](https://code.statisticsauthority.gov.uk/): 
+
++ suitable data sources 
+
++ sound methods 
+
++ assured quality
+
+
+
+<br> 
+
+The question bank has also been designed to be consistent with, and to be used alongside, [the Administrative Data Quality Framework (ADQF)](https://analysisfunction.civilservice.gov.uk/policy-store/quality-of-administrative-data-in-statistics/). The question bank consists of:
+
++ an explanation of what it is and how it can be used 
+
++ examples of questions, organised by theme and sub-theme  
+
++ a description of the themes the questions are organised into, including definitions where appropriate 
+
+
+
+
+<br>
+
+
+## What are administrative data in statistics and research? 
+
+
+Administrative data are data collected during the operations of an organisation. Government produces a large amount of administrative data, which can be a valuable resource for producing statistics. Administrative data must be accessed securely and through legal gateways. Their use represents an opportunity for analysts; however, it is important to remember that the subjects of the data must be protected from misuse. This question bank does not help you make decisions about access to data, but this is something you need to consider. Your organisation will have data protection policies, such as [these data protection guidelines from the ONS](https://www.ons.gov.uk/aboutus/transparencyandgovernance/dataprotection). If you have questions, you should contact your Data Protection Officer.
+
+ 
+
+Administrative data are collected for operational rather than statistical purposes. This can create challenges when using them for statistics; these are summarised in David Hand’s paper, [“Statistical challenges of administrative and transaction data”](https://rss.onlinelibrary.wiley.com/doi/10.1111/rssa.12315).  
+
+<br>
diff --git a/rmarkdown/question_bank.css b/rmarkdown/question_bank.css
new file mode 100644
index 0000000..e601051
--- /dev/null
+++ b/rmarkdown/question_bank.css
@@ -0,0 +1,22 @@
+table.simpleTable {
+  background-color: #FFFFFF;
+  color: #000000;
+  width: 100%;
+  text-align: left;
+  border-collapse: collapse;
+}
+
+table.simpleTable td {
+  padding: 6px 4px;
+}
+
+table.simpleTable thead, table.simpleTable th {
+  border-bottom: 2px solid #000000;
+  font-weight: bold;
+  color: #000000;
+  height: 30px;
+}
+
+table.simpleTable tr:nth-child(even) {
+  background: #EEEEEE;
+}


Q1	How are the data from this dataset collected? For example, through public contact with services over the phone, registration forms, etc.
Q2	What organisation(s) collects the data?
Q3	Are there different organisations which collect different data or variables in the dataset? For example, one organisation may be responsible for collecting income-related data and another for collecting demographic data, with these data combined to create one composite dataset.
Q4	If so, which data are collected by which organisation?
Q5	Are the data collected differently?
Q6	Does this have an impact on the quality?
Q7	How do suppliers quality assure the data?
Q8	Are there any known quality issues?
Q9	What thresholds have the suppliers put in place regarding the data's quality? For example, an acceptable number of duplicate records, or an acceptable amount of missing data.
Q10	How is the quality for this dataset documented?
Q11	Are there any supplementary documents related to the dataset that can be shared? For example, a data dictionary or a metadata list.
Q12	Are there training manuals related to the work that can be shared? For example, for coding, updating or maintaining the dataset.

Q13	How accurate are the supplied data?
Q14	How well do the data meet the statistical use?
Q15	How accurate are the items, or variables in the supplied data?
Q16	How accurate are the units, or records in the supplied data?
Q17	What are the accuracy issues in the supplied data?
Q18	If there are accuracy issues, how are they identified? For example, through a formal auditing process, or an automatic flagging system.
Q19	What methods are implemented by the suppliers to prevent any accuracy issues? For example, checks built into the data collection instrument.
Q20	If there are accuracy issues, how do the suppliers resolve them, and for which variables and types of records?
Q21	What data accuracy issues are not addressed?
Q22	Why are the issues not addressed?
Q23	What happens to data accuracy issues that are not addressed? For example, logged or reported to a specific team.
Q24	How are users of the data informed about these data accuracy issues?

Q25	What are the types of invalid data entries in the data?
Q26	How many invalid entries are there in the data?
Q27	What variables have invalid data entries?
Q28	What types of records have invalid data entries?
Q29	What methods are used to identify invalid data entries?
Q30	What methods are used to resolve invalid data entries?

Q31	What kinds of errors, typos or mistakes, are there in the data?
Q32	Which variables have typos, errors or mistakes?
Q33	Which types of records have typos, errors or mistakes?
Q34	What are the causes of these errors, typos or mistakes in the data?
Q35	How are errors, typos or mistakes identified in the data?
Q36	How are errors, typos or mistakes in the data resolved?

Q37	How complete or incomplete are the data?
Q38	How many records in the data are considered complete, or to have good coverage?
Q39	What types of records need to be in the data to be considered complete?
Q40	What types of records are missing from the data where they should be included?
Q41	Why are they missing?
Q42	What types of records are included in the data where they should not be?
Q43	Why are these included?
Q44	How are records missing from the data identified as missing?
Q45	How are records missing in the data resolved?
Q46	How are records that are wrongly included in the data identified?
Q47	How are records that are wrongly included in the data resolved?
Q48	Unit imputation is when missing data are replaced with a record or unit. Are any records in the data supplied, imputed records?
Q49	Why are these records imputed?
Q50	How are they imputed?
Q51	What changes have been made to exclusion and inclusion criteria in the data over time? For example, due to policy changes.

Q52	Which variables or values have missing data?
Q53	If there are missing data in variables or values, are there any particular types of records that have data within variables or values missing?
Q54	How are missing data within variables or values identified?
Q55	How are missing data within variables or values resolved?
Q56	How are data, variables or values that are wrongly included in the dataset identified?
Q57	How are data, variables or values that are wrongly included in the dataset resolved?
Q58	Item imputation is when missing data are replaced with a value or variable. Which variables, or values in the data are imputed?
Q59	Why are these variables, or values imputed?
Q60	How are they imputed?

Q61	How often do identical records appear in the data more than once?
Q62	Should records ever appear more than once in the data?
Q63	If records appear more than once, what is the reason?
Q64	How unique are the records in the data?
Q65	What type of records appear in the data more than once?
Q66	What does each row in the dataset represent?
Q67	How is each unique record identified? For example, a record ID number.
Q68	What measures are carried out to prevent records appearing more than once in the data during data collection?
Q69	What measures are carried out to prevent records appearing more than once in the data during data processing?
Q70	How are records that appear more than once in the data identified?
Q71	What do duplicate records look like in the data?
Q72	How are records that appear more than once in the data resolved?

Q73	How consistent are the data between variables?
Q74	Which variables have inconsistent information? What is the reason for this?
Q75	Which types of records, if any, have inconsistent information?
Q76	What is the reason for this?
Q77	If you have a composite dataset (a dataset compiled from different sources), how consistent are the data across the different sources?
Q78	How consistent are the data over time?
Q79	Have there been any changes to the way the data are collected over time?
Q80	What changes have there been to the variables over time? For example, changes to definition.
Q81	Which variables, if any, were changed?
Q82	What is used to measure consistency or identify inconsistencies in the supplied data?
Q83	What aspects of the data are checked for consistency? For example, all data items, certain variables or certain time points.
Q84	How are inconsistencies in the data resolved?

Q85	When are the data collected? For example, continuously or over a certain timeframe.
Q86	‘Up to date’ refers to whether the data supplied are the latest version. For example, if new data are being collected but are not reflected in the current data, then the data are not up to date.
Q87	How up to date are the data at the point they are supplied?
Q88	What can impact how up to date the data are?
Q89	Reference dates refer to timestamps which indicate when the data have been changed. Are there any reference dates for each record?
Q90	At what point of the data collection phase are reference dates produced? For example, when the data are collected, or when the data were last updated.
Q91	How up to date are the variables at the point they are supplied?
Q92	Which types of records do not have up-to-date information in these variables?
Q93	What methods are used to check that the data are up to date?
Q94	What methods are carried out to resolve data if they are not up to date?
Q95	How often are the data updated?
Q96	What information is updated?
Q97	Are there any time lags between the reference dates in the data and the date on which the data are supplied?
Q98	What are the different processes by which new records are added?
Q99	How often are existing records within the data updated with new information?
Q100	What are the different processes by which existing records are updated with new information?
Q101	What are the different processes by which variables or values are updated with new information?
Q102	How often are the data updated to remove records from the data?
Q103	Under what circumstances are records removed from the data?
Q104	What are the different processes by which unwanted records are removed?
Q105	When records meet the criteria for removal, how long would it typically take for the record to be deleted from the data supplied?
Q106	How often are existing records within the data updated to correct for any errors?
Q107	How often are variables within records updated to correct for any errors?
Q108	What are the different processes by which existing records are updated to correct for any errors?

Q109	How, if at all, are the data linked?
Q110	Why is the data linkage conducted?
Q111	Why are these methods used to link data?
Q112	How often are the data linkage methods changed?
Q113	What changes have been made?
Q114	Why were these changes made?
Q115	What assessments, checks or preparations are carried out on the different datasets before data linkage occurs?
Q116	What variables are used to link or match the datasets?
Q117	Why are these variables used?
Q118	A match key is created by combining pieces of information into unique keys, which are then hashed and used for automated matching. How does your organisation use match keys?
Q119	How is the success of these match keys evaluated?
Q120	How are decisions made about whether records should be declared a match or a non-match?
Q121	How is data linkage quality assessed?
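
Q118 describes match keys only briefly. As a rough illustration, the Python sketch below builds a match key from a normalised first initial, surname, date of birth and postcode, joins the parts, hashes the result with SHA-256, and treats records with identical keys as matches. The choice of fields, the normalisation rules, the optional salt and the names normalise, make_match_key and link are hypothetical assumptions for illustration only; they are not any data supplier's actual method.

```python
import hashlib
import unicodedata


def normalise(value: str) -> str:
    """Upper-case, strip accents and keep only letters and digits (an assumed rule)."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.upper() if ch.isalnum())


def make_match_key(record: dict, salt: str = "") -> str:
    """Hypothetical match key: first initial + surname + date of birth + postcode.

    The combined string is hashed (SHA-256) so that keys can be compared
    without exposing the underlying personal information. The salt is a
    hypothetical secret value; a keyed hash (Python's hmac module) is a
    common alternative.
    """
    parts = [
        normalise(record["first_name"])[:1],
        normalise(record["surname"]),
        normalise(record["dob"]),       # "1980-01-31" -> "19800131"
        normalise(record["postcode"]),  # "AB1 2CD" -> "AB12CD"
    ]
    combined = salt + "|".join(parts)
    return hashlib.sha256(combined.encode("utf-8")).hexdigest()


def link(left: list, right: list) -> list:
    """Return (left_index, right_index) pairs whose match keys are identical."""
    index = {}
    for j, rec in enumerate(right):
        index.setdefault(make_match_key(rec), []).append(j)
    pairs = []
    for i, rec in enumerate(left):
        for j in index.get(make_match_key(rec), []):
            pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    a = [{"first_name": "Zoë", "surname": "O'Brien", "dob": "1980-01-31", "postcode": "AB1 2CD"}]
    b = [
        {"first_name": "zoe", "surname": "OBRIEN", "dob": "19800131", "postcode": "ab12cd"},
        {"first_name": "Zoe", "surname": "Smith", "dob": "1980-01-31", "postcode": "AB1 2CD"},
    ]
    print(link(a, b))  # [(0, 0)]
```

Because exact key equality tolerates no differences beyond what normalisation removes, a linkage process may use several match keys built from different combinations of fields and apply them one after another. A secret salt or keyed hash is commonly used so that keys cannot be recreated by anyone able to guess the inputs.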