From 22301e27080986086c1448341b662fdb3c18d461 Mon Sep 17 00:00:00 2001
From: chelseamcg <134400730+chelseamcg@users.noreply.github.com>
Date: Tue, 12 Nov 2024 15:51:26 +0000
Subject: [PATCH] Delete docs directory

---
 docs/Page_1_Homepage.html           | 1630 ----------------
 docs/Page_2_Question_Bank_Use.html  | 1633 ----------------
 docs/Page_3_Quality_Dimensions.html | 2699 ---------------------------
 docs/Page_4_Data_Linkage.html       | 1693 -----------------
 docs/Page_5_Accessibility.html      | 1595 ----------------
 docs/question_bank.css              |   22 -
 6 files changed, 9272 deletions(-)
 delete mode 100644 docs/Page_1_Homepage.html
 delete mode 100644 docs/Page_2_Question_Bank_Use.html
 delete mode 100644 docs/Page_3_Quality_Dimensions.html
 delete mode 100644 docs/Page_4_Data_Linkage.html
 delete mode 100644 docs/Page_5_Accessibility.html
 delete mode 100644 docs/question_bank.css

diff --git a/docs/Page_1_Homepage.html b/docs/Page_1_Homepage.html
deleted file mode 100644
index a52995c..0000000
--- a/docs/Page_1_Homepage.html
+++ /dev/null
@@ -1,1630 +0,0 @@
The Administrative Data Quality Question Bank (ADQQB) is a collection of specially designed questions that act as a guide to help analysts understand the quality of their administrative data for research and statistical purposes. These questions can be used in two ways.
Analysts can ask data suppliers these questions. Analysts are strongly encouraged to communicate with data suppliers before undertaking any research using administrative data, to fully understand the data in question and its quality.

Analysts can also use these questions as their own guide when assessing the quality of their data.
The questions have been designed to guide the analyst to do the following:

- understand the data at the point that either the analyst or organisation has acquired and accessed the administrative data
- make the right decisions on how to treat the data at the data processing and analysis phase and onwards
- determine whether the data are fit for purpose
- transparently communicate the quality of the data in statistical research and outputs
As such, this question bank falls within the requirements of the three quality principles for government statistical data, as set out in the Code of Practice for Statistics:

- suitable data sources
- sound methods
- assured quality
The question bank has also been designed to be consistent with, and is encouraged to be used alongside, the Administrative Data Quality Framework (ADQF). The question bank consists of:

- an explanation of what it is and how it can be used
- examples of questions, organised by theme and sub-theme
- a description of the themes the questions are organised into, including definitions where appropriate
Administrative data are data which have been collected during the operations of an organisation. Government produces a large amount of administrative data, which can provide a valuable resource in the production of statistics. Administrative data must be accessed securely and via legal gateways. Their use represents an opportunity for analysts; however, it is important to remember that the subjects of the data must be protected from misuse. This question bank does not support you with making decisions about access to data, but this is something you need to consider. Your organisation will have data protection policies, such as these data protection guidelines from the ONS. If you have questions, you should contact your Data Protection Officer.
Administrative data are collected for operational purposes rather than statistical purposes. This can lead to challenges when using them for statistics, a summary of which can be found in David Hand's paper, "Statistical challenges of administrative and transaction data".
This question bank groups questions into sections, some of which have further sub-sections. These sections and sub-sections are as follows:

- Quality dimensions as defined by the Data Management Association UK (DAMA), paired as follows:
  - accuracy and validity
  - completeness and uniqueness
  - consistency and timeliness
- Data linkage
These sections and their sub-sections have been chosen to give you a wide selection of questions to gain further insights into administrative data quality. According to the Code of Practice for Statistics, "quality means that statistics fit their intended uses, are based on appropriate data and methods, and are not materially misleading."

In essence, quality centres on fitness for purpose, including:

- Are the data good enough for what I want to use them for?
- Did the statistic I produce meet the needs of the people who are using it?
The questions in this question bank can be used to assess the different aspects of fitness for your use. Very rarely will there be data that are completely perfect for statistical and research purposes. Understanding which dimensions are important for your specific uses will help you when deciding if data are fit for purpose. To this end, the question bank has been designed to be flexible, and your approach can be tailored in proportion to your needs, for example, by making the questions relevant to the variables you are interested in. These questions provide a structured approach to assessing the data's fitness for purpose and ensure that you cover the key issues needed to understand the data's quality.
The first key step is to identify the dataset you are interested in and wish to assess using the set of questions in this question bank. You can then use the questions in the Administrative Data Quality Question Bank (ADQQB) and tailor them to find out more about that specific dataset.
The Administrative Data Quality Question Bank (ADQQB) focuses on assessing the quality of data at the input data level. Input data level refers to the point at which your organisation receives the data. Quality at this stage refers to how well the data fit the purpose(s) you want to use them for. Essentially, this could be the suitability of the data to produce statistics, or their suitability for analysis or research.
We have included questions on the DAMA quality dimensions because these dimensions are widely used across government to assess whether data are good enough to use, or whether improvements need to be made. We have also chosen to include questions on data linkage in data collection and production, as answers to these questions can supply further information and context around how the data are produced. They also provide a reminder that you should check in regularly with the data suppliers regarding any changes that may affect the resulting data.
As we further develop the Administrative Data Quality Question Bank (ADQQB) beyond the current publication, we will add further sections. These will include output data quality: "how well your 'final' output meets your users' needs". This will be done by integrating relevant dimensions from the European Statistical System's (ESS) dimensions of quality.
The question bank includes data quality themes as defined by the Data Management Association UK (DAMA) dimensions outlined in The Government Data Quality Framework. These dimensions and the definitions used are the same as those outlined in our Administrative Data Quality Framework (ADQF). The dimensions covered are:

- accuracy
- validity
- completeness
- uniqueness
- consistency
- timeliness
We have used these because they were developed by experts in data quality to assess the fitness for purpose of data. Finding which dimensions are important for you will help you make decisions around how fit for purpose the data are for your needs.
In future publications of this question bank, we intend to include questions based on relevant selected principles from the European Statistics Code of Practice. The principles covered will be:

- relevance: coverage, content, purpose and collection
- accessibility and clarity: accessing the data, data format, availability of supporting information, quality and sufficiency of metadata, illustrations and accompanying advice
Before going into the questions for each data quality dimension, we have provided some general questions which can be used to ensure that you have a fundamental understanding of the data and their quality.

| Number | Question |
|---|---|
| Q1 | How are the data from this dataset collected? For example, through public contact with services over the phone, registration forms, etc. |
| Q2 | What organisation(s) collects the data? |
| Q3 | Are there different organisations which collect different data or variables in the dataset? For example, where one organisation is responsible for collecting income-related data, another is responsible for collecting demographic data, and these data are combined to create one composite dataset. |
| Q4 | If so, which data are collected by which organisation? |
| Q5 | Are the data collected differently by each organisation? |
| Q6 | Does this have an impact on the quality? |
| Q7 | How do suppliers quality assure the data? |
| Q8 | Are there any known quality issues? |
| Q9 | What thresholds have the suppliers put in place regarding the data's quality? For example, an acceptable number of duplicate records, or an acceptable amount of missing data. |
| Q10 | How is the quality for this dataset documented? |
| Q11 | Are there any supplementary documents related to the dataset that can be shared? For example, a data dictionary or a metadata list. |
| Q12 | Are there training manuals related to the work that can be shared? For example, for coding, updating or maintaining the dataset. |
Administrative data accuracy refers to how well the data match reality: do the data capture what you are trying to measure?

Validity is the extent to which the data conform to the expected format, type and range. For example, an email address must have an '@' symbol.

| Number | Question |
|---|---|
| Q13 | How accurate are the supplied data? |
| Q14 | How well do the data meet the statistical use? |
| Q15 | How accurate are the items, or variables, in the supplied data? |
| Q16 | How accurate are the units, or records, in the supplied data? |
| Q17 | What are the accuracy issues in the supplied data? |
| Q18 | If there are accuracy issues, how are they identified? For example, through a formal auditing process, or an automatic flagging system. |
| Q19 | What methods are implemented by the suppliers to prevent any accuracy issues? For example, checks built into the data collection instrument. |
| Q20 | If there are accuracy issues, how are they resolved by the suppliers, and for which variables and types of records? |
| Q21 | What data accuracy issues are not addressed? |
| Q22 | Why are the issues not addressed? |
| Q23 | What happens to data accuracy issues that are not addressed? For example, are they logged or reported to a specific team? |
| Q24 | How are users of the data informed about these data accuracy issues? |

| Number | Question |
|---|---|
| Q25 | What are the types of invalid data entries in the data? |
| Q26 | How many invalid entries are there in the data? |
| Q27 | What variables have invalid data entries? |
| Q28 | What types of records have invalid data entries? |
| Q29 | What methods are used to identify invalid data entries? |
| Q30 | What methods are used to resolve invalid data entries? |

| Number | Question |
|---|---|
| Q31 | What kinds of errors, typos or mistakes are there in the data? |
| Q32 | Which variables have typos, errors or mistakes? |
| Q33 | Which types of records have typos, errors or mistakes? |
| Q34 | What are the causes of these errors, typos or mistakes in the data? |
| Q35 | How are errors, typos or mistakes identified in the data? |
| Q36 | How are errors, typos or mistakes in the data resolved? |
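The kind of format check that the validity questions (for example Q25, Q26 and Q29) ask suppliers about can be sketched in a few lines. This is a minimal illustration, not the suppliers' method: the email rule, field values and function names are all hypothetical.

```python
import re

def is_valid_email(value: str) -> bool:
    # Minimal validity rule: some text, an "@" symbol, some text, a dot, some text.
    return bool(re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value))

def count_invalid(values, check) -> int:
    # Tally the entries that fail a validity rule (the figure Q26 asks for).
    return sum(1 for v in values if not check(v))

emails = ["a.person@example.com", "no-at-symbol.example", "b@example.org"]
print(count_invalid(emails, is_valid_email))  # 1 invalid entry
```

In practice each variable would have its own rule (format, type and range), with the failing records flagged rather than just counted, so that Q27 and Q28 can also be answered.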
Completeness describes the degree to which all values within each variable are present (no blank, null or empty values). Completeness applies both at data item (variable) level and at unit (record) level. At data item level, an individual value, for example a date of birth, may be missing from a record within a dataset. Alternatively, at unit level, a full record may be missing; that individual is missing from the dataset entirely.

Depending on the completeness of your data there may be under-coverage or over-coverage. Please see the 'Completeness' section of the Administrative Data Quality Framework for more information on over- and under-coverage.

To assess completeness, you will need to identify how many items or records are missing versus present. This dimension is sometimes described in terms of "missingness" rather than "completeness", but the quality issue is the same. Data are 'complete' when all the data required for your purposes are both present and available for use. This does not mean 100% of the fields need to be complete, but that the values and units you need are present. A 'complete' dataset may still be inaccurate if it contains values that are not correct.

| Number | Question |
|---|---|
| Q37 | How complete or incomplete are the data? |
| Q38 | How many records in the data are considered complete, or to have good coverage? |
| Q39 | What types of records need to be in the data for them to be considered complete? |
| Q40 | What types of records are missing from the data where they should be included? |
| Q41 | Why are they missing? |
| Q42 | What types of records are included in the data where they should not be? |
| Q43 | Why are these included? |
| Q44 | How are records missing from the data identified as missing? |
| Q45 | How are missing records resolved? |
| Q46 | How are records that are wrongly included in the data identified? |
| Q47 | How are records that are wrongly included in the data resolved? |
| Q48 | Unit imputation is when missing data are replaced with a record or unit. Are any records in the supplied data imputed records? |
| Q49 | Why are these records imputed? |
| Q50 | How are they imputed? |
| Q51 | What changes have been made to exclusion and inclusion criteria in the data over time? For example, due to policy changes. |

| Number | Question |
|---|---|
| Q52 | Which variables or values have missing data? |
| Q53 | If there are missing data in variables or values, are there particular types of records with values missing? |
| Q54 | How are missing data within variables or values identified? |
| Q55 | How are missing data within variables or values resolved? |
| Q56 | How are data, variables or values that are wrongly included in the dataset identified? |
| Q57 | How are data, variables or values that are wrongly included in the dataset resolved? |
| Q58 | Item imputation is when missing data are replaced with a value or variable. Which variables or values in the data are imputed? |
| Q59 | Why are these variables or values imputed? |
| Q60 | How are they imputed? |
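An item-level completeness check of the kind described above can be sketched as follows. The record structure and field names are hypothetical, chosen only to mirror the date-of-birth example.

```python
def completeness_rate(records, field):
    # Share of records with a non-missing value for one data item (variable).
    present = sum(1 for r in records if r.get(field) not in (None, ""))
    return present / len(records)

patients = [
    {"id": 1, "date_of_birth": "1980-01-02"},
    {"id": 2, "date_of_birth": None},  # item-level missingness
    {"id": 3, "date_of_birth": "1999-12-31"},
]
print(round(completeness_rate(patients, "date_of_birth"), 2))  # 0.67
```

Unit-level completeness cannot be measured from the dataset alone, because a wholly missing record leaves no trace; it requires comparison against an external source or expected population, which is what Q39 and Q40 probe.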
Uniqueness describes the degree to which there is no duplication in records. This means that the data contain only one record for each entity they represent, and each value is stored only once.

Data are unique if each record appears only once in a dataset. A record can be a duplicate even if some of its fields are different. For example, a person may have two patient records with matching information in some fields (for example, name and date of birth) but different addresses and contact numbers in each record, so they are treated as two separate people. Depending on what you are using the data for, this may or may not be a uniqueness issue. If you want to know the total number of visits for every patient, this is not a problem. However, if you want to know how many patients you have on your roster, you could be counting the same person twice. As such, it is important to take uniqueness into account, and into context, when assessing quality and when combining datasets, as it can impact the coverage of the data.

| Number | Question |
|---|---|
| Q61 | How often do identical records appear in the data more than once? |
| Q62 | Should there, or shouldn't there, be records appearing more than once in the data? |
| Q63 | If records appear more than once, what is the reason? |
| Q64 | How unique are the records in the data? |
| Q65 | What types of records appear in the data more than once? |
| Q66 | What does each row in the dataset represent? |
| Q67 | How is each unique record identified? For example, by a record ID number. |
| Q68 | What measures are carried out to prevent records appearing more than once in the data during data collection? |
| Q69 | What measures are carried out to prevent records appearing more than once in the data during data processing? |
| Q70 | How are records that appear more than once in the data identified? |
| Q71 | What do duplicate records look like in the data? |
| Q72 | How are records that appear more than once in the data resolved? |
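The patient example above, where the same person is or is not a duplicate depending on which fields define an entity, can be made concrete with a short sketch. The records and field names are illustrative only.

```python
from collections import Counter

def duplicate_counts(records, key_fields):
    # Count how often each combination of identifying fields appears,
    # keeping only combinations that occur more than once.
    keys = [tuple(r[f] for f in key_fields) for r in records]
    return {k: n for k, n in Counter(keys).items() if n > 1}

visits = [
    {"name": "A N Other", "dob": "1970-05-05", "address": "1 High St"},
    {"name": "A N Other", "dob": "1970-05-05", "address": "9 Low Rd"},
    {"name": "B Person", "dob": "1985-03-03", "address": "2 Mill Ln"},
]
# Counting visits, every row can stand alone; counting people by name and
# date of birth, one person appears twice:
print(duplicate_counts(visits, ["name", "dob"]))
```

The same dataset gives different answers for different key fields, which is why Q66 ("what does each row represent?") comes before any duplicate count.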
Consistency is achieved when data values do not conflict with other values within a dataset or across different datasets. For example, the date of birth for the same person should be recorded as the same date within a dataset and between datasets. It should also match the age recorded for that person, and their postcode should not conflict with their address. Another example is that two people who are each other's spouses should both have the same marital status recorded.

| Number | Question |
|---|---|
| Q73 | How consistent are the data between variables? |
| Q74 | Which variables have inconsistent information? What is the reason for this? |
| Q75 | Which types of records, if any, have inconsistent information? |
| Q76 | What is the reason for this? |
| Q77 | If you have a composite dataset (a dataset compiled from different sources), how consistent are the data across the different sources? |
| Q78 | How consistent are the data over time? |
| Q79 | Have there been any changes to the way the data are collected over time? |
| Q80 | What changes have there been to the variables over time? For example, changes to definitions. |
| Q81 | Which variables, if any, were changed? |
| Q82 | What is used to measure consistency or identify inconsistencies in the supplied data? |
| Q83 | What aspects of the data are checked for consistency? For example, all data items, certain variables, or certain time points. |
| Q84 | How are inconsistencies in the data resolved? |
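The date-of-birth and age example above is a between-variable consistency rule, and rules like it are easy to automate. This sketch assumes a record layout of our own invention; real rules would come from the supplier's answers to Q82 and Q83.

```python
from datetime import date

def age_on(dob: date, ref: date) -> int:
    # Age in whole years on a reference date.
    return ref.year - dob.year - ((ref.month, ref.day) < (dob.month, dob.day))

def dob_age_consistent(record: dict, ref: date) -> bool:
    # Flag records whose recorded age conflicts with their recorded date of birth.
    return record["age"] == age_on(record["date_of_birth"], ref)

rec = {"date_of_birth": date(1990, 6, 15), "age": 30}
print(dob_age_consistent(rec, date(2021, 1, 1)))  # True: aged 30 on that date
print(dob_age_consistent(rec, date(2021, 7, 1)))  # False: should be 31
```

Note that such a check needs a reference date: the same record can be consistent at one time point and inconsistent at another, which is one reason Q78 asks about consistency over time.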
Timeliness refers to how well the data reflect the period they are supposed to represent. It also describes how up to date the data are.

The attributes represented in some data might stay the same over time; for example, the day you were born does not change, no matter how much time passes. Other attributes, such as income, may change.

Your data are also 'timely' if the lag between their collection and their availability for your use is appropriate for your needs. Are the data available when expected and needed? Do they reflect the period they are supposed to?

| Number | Question |
|---|---|
| Q85 | When are the data collected? For example, constantly, or over a certain timeframe? |
| Q86 | Up to date refers to whether the data supplied are the latest version. For example, if new data are being collected but are not reflected in the current data, then the data are not up to date. |
| Q87 | How up to date are the data at the point of being supplied? |
| Q88 | What can impact how up to date the data are? |
| Q89 | Reference dates refer to timestamps which indicate when the data have been changed. Are there any reference dates for each record? |
| Q90 | At what point of the data collection phase are reference dates produced? For example, when the data are collected, or when the data were last updated. |
| Q91 | How up to date are the variables at the point of being supplied? |
| Q92 | Which types of records do not have up-to-date information in these variables? |
| Q93 | What methods are used to check that the data are up to date? |
| Q94 | What methods are carried out to resolve data if they are not up to date? |
| Q95 | How often are the data updated? |
| Q96 | What information is updated? |
| Q97 | Are there any time lags between the reference dates in the data and the date on which the data are supplied? |
| Q98 | What are the different processes by which new records are added? |
| Q99 | How often are existing records within the data updated with new information? |
| Q100 | What are the different processes by which existing records are updated with new information? |
| Q101 | What are the different processes by which variables or values are updated with new information? |
| Q102 | How often are the data updated to remove records from the data? |
| Q103 | Under what circumstances are records removed from the data? |
| Q104 | What are the different processes by which unwanted records are removed? |
| Q105 | When records meet the criteria for removal, how long does it typically take for the record to be deleted from the data supplied? |
| Q106 | How often are existing records within the data updated to correct for any errors? |
| Q107 | How often are variables within records updated to correct for any errors? |
| Q108 | What are the different processes by which existing records are updated to correct for any errors? |
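The lag between the period the data represent and the date they are supplied, which Q97 asks about, can be measured directly once reference dates are available. The dates and the 30-day threshold below are illustrative; an acceptable lag depends entirely on your use.

```python
from datetime import date

def supply_lag_days(reference_date: date, supplied_date: date) -> int:
    # Lag between the period the data represent and the date they arrived.
    return (supplied_date - reference_date).days

def is_timely(reference_date: date, supplied_date: date, max_lag_days: int) -> bool:
    # "Timely" here means the lag is within what your use can tolerate.
    return supply_lag_days(reference_date, supplied_date) <= max_lag_days

lag = supply_lag_days(date(2024, 3, 31), date(2024, 5, 10))
print(lag)  # 40 days between the end of the reference period and supply
print(is_timely(date(2024, 3, 31), date(2024, 5, 10), max_lag_days=30))  # False
```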
Data linkage is the process of classifying whether two or more records refer to the same entity. Entities can be anything that a dataset contains, for example, people, addresses or households. These questions aim to understand what data linkage, if any, the data supplier carries out on the dataset you are interested in.

| Number | Question |
|---|---|
| Q109 | How, if at all, are the data linked? |
| Q110 | Why is the data linkage conducted? |
| Q111 | Why are these methods used to link data? |
| Q112 | How often are the data linkage methods changed? |
| Q113 | What changes have been made? |
| Q114 | Why were these changes made? |
| Q115 | What assessments, checks or preparations are carried out on the different datasets before data linkage occurs? |
| Q116 | What variables are used to link or match the datasets? |
| Q117 | Why are these variables used? |
| Q118 | A match-key is created by putting together pieces of information to create unique keys that are then hashed and used for automated matching. How does your organisation use match keys? |
| Q119 | How is the success of these match keys evaluated? |
| Q120 | How are decisions made about whether records should be declared a match or a non-match? |
| Q121 | How is data linkage quality assessed? |
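The match-key idea described in Q118 can be sketched as follows. The field names, the standardisation (strip and lower-case) and the choice of SHA-256 are all assumptions for illustration, not any supplier's actual method.

```python
import hashlib

def match_key(record: dict, fields: list) -> str:
    # Concatenate standardised identifying fields, then hash the result so the
    # key can be compared across datasets without exposing the raw values.
    raw = "|".join(str(record[f]).strip().lower() for f in fields)
    return hashlib.sha256(raw.encode()).hexdigest()

a = {"name": "Ann Example", "dob": "1970-05-05", "postcode": "AB1 2CD"}
b = {"name": "ANN EXAMPLE ", "dob": "1970-05-05", "postcode": "ab1 2cd"}
fields = ["name", "dob", "postcode"]
print(match_key(a, fields) == match_key(b, fields))  # True: same entity
```

Because hashing is exact, any residual difference in the standardised fields (a typo, a changed postcode) produces a different key, so match-keys find exact matches only; that limitation is what the evaluation asked about in Q119 and Q120 must address.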
This is an early version of the question bank; we welcome any feedback on accessibility to inform the next stages of development. If you find any problems, please contact us by emailing methods.research@ons.gov.uk. Please also get in touch if you are unable to access any part of this question bank or require the content in a different format. We will consider your request and aim to get back to you within five working days.
-Current accessibility features include:
If you have thoughts or preferences on how the question bank is structured and how we include questions, please email methods.research@ons.gov.uk to provide feedback.