-
Notifications
You must be signed in to change notification settings - Fork 0
Home
SurveyCTO is an easy and flexible platform for building electronic surveys especially for offline data collection. SurveyCTO provides an easy to program interface through the online form builder and the use of an XLS form, it is common for programmers to make simple mistakes or omissions that may lead to problems in the data workflow process.
The SurveyCTO server and desktop applications contain form validation tools that validate XLS forms and reject forms that do not meet the syntax requirements. However, these tools are lenient on various other issues and may accept them with or without warnings. ipacheckscto
is therefore created to complement the checks of the already existing SurveyCTO form validation tools by running checks for the following:
- common programming errors
- common programming practices that are known to cause problems in the data workflow
ipacheckscto
is designed to complement SurveyCTO's validation tools and should therefore be used only after the form validating the XLS form through the server or the desktop application.
NB: Not all issues flagged by
ipacheckscto
will require correction. Review the output carefully and decide if you need to make any changes to your form. For instance, althoughipacheckscto
will flag a required note as an issue, a required note is also often used as a soft constraint check.
ipacheckscto
makes extensive use of Stata. Some parts of the program are also heavily dependent on Stata’s excel modules in order to create output files that are easy to use and disseminate. The excel modules are only available in Stata 14 or later and therefore requires that the user has Stata 14.0 or later installed on their machine prior to running ipacheckscto. IPA employees with older versions of Stata should contact IT for access to a newer version.
ipacheckscto is collated and distributed as a Stata package through github. You can install ipacheckscto directly from github by running the following line of code in your command line or by using a do-file:
net install ipacheckscto, all replace from("https://raw.githubusercontent.com/PovertyAction/ipacheckscto/master/ado")
ipacheckscto
package file includes the following files:
- ipacheckscto.ado – Stata program file
- ipacheckscto.sthlp – Stata helpfile
- ipacheckscto.dlg – Stata dialog file
help ipacheckscto
outfile(filename)
Specifies the output path and filename for exporting results into excel. The filename must include the extension .xls or .xlsx.
If
outfile()
is not specified, results will be displayed in the Stata result window only.
other(integer)
Specifies the integer value for other specify option. If this option is specified, ipacheckscto will flag select_one/select_multiple fields which
- use
other specify
option in their choice list but are missing a childother specify
field; - have a child
other specify
field but appear after the childother specify
field; - use the or_other syntax. for instance
select_one fruits or_other
If the
other()
option is not specified,ipacheckscto
will only check foror_other
syntax.
dontknow(integer)
Specifies the integer value for don't know
option. If specified, ipacheckscto
will flag;
- select_one/select_multiple fields which use choices list that does not include a
don't know
option; - integer/decimal/text fields that do not accept don't know values
If
dontknow()
option is not specified,ipacheckscto
will skip the don't know entirely.
refuse(integer)
Specifies the integer value for refuses to answer option. If specified, ipacheckscto
will flag;
- select_one/select_multiple fields which use choices list that does not include a
refuses to answer
option; - integer/decimal/text fields that do not accept refuses to answer values
If the
refuse()
option is not specified,ipacheckscto
will skip the refuse check entirely.
Check a SurveyCTO XLS form and display results on the Stata window.
ipacheckscto using "C:\Users\Documents\Bontanga Baseline.xlsx"
Check a SurveyCTO XLS form and display results on the Stata window. Include checks for other specify with the value of -666
ipacheckscto using "C:\Users\Documents\Bontanga Baseline.xlsx", other(-666)
Check a SurveyCTO XLS form and display results on the Stata window. Include checks for other specify (-666), dontknow(-999) and refuse to answer (-888).
ipacheckscto using "C:\Users\Documents\Bontanga Baseline.xlsx", other(-666) dontknow(-999) refuse(-888)
Check a SurveyCTO XLS form and display results on the Stata window. Include checks for other specify (-666), dontknow(-999) and refuse to answer (-888). Export results to excel file "C:\Users\Documents\output\botanga_baseline_check.xlsx"
ipacheckscto using "C:\Users\Documents\Bontanga Baseline.xlsx", other(-666) dontknow(-999) refuse(-888) outfile("C:\Users\Documents\output\botanga_baseline_check.xlsx")
Get the dialog box by typing db ipacheckscto
in the Stata command window:
Check a SurveyCTO XLS form and display results in the Stata results window. Include checks for other specify (-666), dontknow(-999) and refuse to answer (-888).
Check a SurveyCTO XLS form and export results to "C:\Users\Ishmamial Boako\Bontanga Baseline.xlsx"
Check a SurveyCTO XLS form and export results to "C:\Users\Ishmamial Boako\Bontanga Baseline.xlsx" Include checks for other specify (-666), dontknow(-999) and refuse to answer (-888).
While using ipacheckscto
the programmer may need to run ipacheckscto
multiple times on their instrument. The diagram below shows a suggested workflow for programming in SurveyCTO.
-
Get the XLS version of SurveyCTO form: Most programmers already program their survey in excel and already have the XLS form. If the XLS form is on google sheets then you will need to download it in .xlsx format. You may also download an XLS form from the server if you used the online form editor.
-
Upload your XLS form to a SurveyCTO server or run it through the validate form tool on SurveyCTO Desktop application to verify and correct any syntax issues.
You can skip this step if you already downloaded your form from the server.
- Run
ipacheckscto
and review results carefully. Make any necessary adjustments to your form.
Repeat steps 2 and 3 until you are satisfied with the status of your form.
- Download the form onto the collect app for bench testing/piloting.
Check 0. summary
The summary results/sheet shows information about the XLS form as well as a summary of the results from the various checks. In the `output()` sheet, the sheet "summary" contains the result of this check.The form details section of the summary sheet shows basic information about the XLS form that was checked. This include:
Filename: Actual filename of the XLS form that was checked
Form Title: This is the title of your form. The form title is under the column “form_title” in the “settings” sheet of the XLS form.
Form ID: This is the unique ID that will identify the form. The form ID is under the column “form_id” in the “settings” sheet of the XLS form.
Form Definition Version: This is the version number of the form, which you must increase each time you modify an existing form. If you started with a form template or with one of the sample forms, then this is determined by a formula and is therefore automatic: so may not necessarily match the version number on the server.
Number of Languages: This is the number of languages in the XLS form. This number is determined by counting the number of label columns in the “survey” sheet of the XLS form.
Default Language: This is the name of the language associated with labels, images, and other content when no other language is specified. Form Encrypted:** This indicates “Yes” if the form data is encrypted using the SurveyCTO encryption keys and “No” if otherwise. IPA requires that all SurveyCTO forms be encrypted.
Number of Publishable Fields: This indicates the number of fields marked as publishable in the XLS form if the XLS form is encrypted. This is left blank if the XLS form is not encrypted.
Submission URL: This is the submission URL to use when submitting encrypted forms. The form will not accept form submissions if it is uploaded to a server that is different from what is indicated in the “submission_url” URL column of the “settings” sheet. The SurveyCTO server debug tool does not detect this discrepancy. However, enumerators may not be able to submit data from this form to our server and will continually get a prompt to re-enter the password whenever they try to submit the form.
This issue can be fixed by unchecking the “Respect submission_url if configured in forms” option which can be found in the admin settings.
The check summary section of the “summary” sheet indicates the results of each check. The results are colour-coded based on the following general rules:
- issues identified [] – This indicates that the specified check was run, and a certain number of issues were identified in the XLS form.
- no issues identified [] – This indicates that the specified check was run, and no issues were identified in the XLS form.
- check was skipped [] – This indicates that the specified check was skipped because it was not applicable to the XLS form. This only applies to checks 7, 8 and 9.
Check 1. recommend fields
Checks SurveyCTO XLS form for IPA recommended fields and shows the results in the “1. recommended fields” sheet. This sheet is divided into “required” and “recommended” sections.
The starttime, endtime & duration fields are required for the IPA DMS and should be included in each XLS form. These fields are automatically included in the SurveyCTO XLS templates.
Fields in the recommended section are not required but can be useful for data quality checks. Note the comments, text audits, audio audits, and sensor stream data may severely affect data download times.
Check 2. field names
Field names are expected to be short, unique, and without any spaces or punctuation. The SurveyCTO server debug tool already checks for spaces and punctuations in the field names; however, ipacheckscto complements this by also checking for “.” & “-”. Although fields that have invalid names may not necessarily cause any problems in the data workflow, they however may lead to problems when the data is imported into Stata. For instance, Stata will automatically substitute "." & "-" for "_".
Long field names in the SurveyCTO form may cause some problems in the data workflow due to Stata variable name restrictions. Stata will truncate any long variable name to 32 characters, and this could lead to an error if the first 32 characters of the multiple fields are the same. For instance:
In this specific scenario, the field “e01_household_purchase_1_month_chicken” is imported as the variable “e01_household_purchase_1_month_c” while the field “e01_household_purchase_1_month_corned_beef” is imported as variable “v30”. This is because the first 32 characters of “e01_household_purchase_1_month_corned_beef” is “e01_household_purchase_1_month_c” which already exist in the dataset.
However, in the Stata import do-file generated by SurveyCTO, the variable names for both fields are exactly the same although the second variable was renamed. This will lead to the error message shown below
ipacheckscto
flags for review if the field name is greater than 22 characters.
NB. 22 characters is an arbitrary number.
While naming your fields, it is important to allow for expansion of field names during data quality checks, data cleaning and analysis. Capping the length of the field name at 22 allows for 10 characters to be added to the varname when necessary.
Check 3. disabled, read only
The disablaed & readonly column allows users to mark fields as disabled & readonly respectively when needed. Although the readonly column is seldomly used, the disabled column is mostly to disable fields that are no longer needed. However, since the disabled column is just before the required column, it is often common for programmers to mistakenly disable a field when they intended to require it.
ipacheckscto
therefore shows a list of all fields that are disabled or readonly so the programmer can review and ensure that these fields are disabled or readonly on purpose.
Check 4. field requirements
The required column allows programmers to require individual fields so as to ensure that enumerators do not ignore or forget to fill out responses for relevant questions. However, it is common for programmers make some errors regarding field requirements and these may include:
- integer, text, date, datetime, time, select_one or select_multiple field is not required
- note field is required
- field is readonly and required
- select_one or select_multiple field is required and has the appearance type “label”
- Visible geopoint field is required: Required geopoint fields can prevent the user from finalizing the form if they are unable to record a value for the field.
ipacheckscto
flags issues with field requirement and export issues to the “4. requirements” sheet. Reviewing this output will help identify and ensure that fields that need to be required are correctly required while fields that should not be required are either not required or are required on purpose.
Check 5. constraint
The constraint column in the XLS form is used to constrain the values that can be entered for each field. Constraints are very important for data quality because it prevents the enumerators from entering invalid responses to fields. To ensure good data quality, it is important to carefully include the appropriate constraints and constraint messages into the SurveyCTO form.
ipacheckscto
flags issues with constraint and exports these issues to the “5. constraints” sheet. This sheet will contain a list of fields with at least one of the following violations.
- integer or decimal fields with missing constraint
- text fields with appearance type “numbers” or “numbers_phone” and with missing constraint
- fields that are constrained but are missing a constraint message
Check 6. other specify
"Close-ended" questions are a common question type for quantitative surveys. Although programmers may carefully consider choices and list out most of the expected responses, it is often common to include an option for enumerators to enter valid responses which are not already included in the pre-coded choice list. This is generally included in SurveyCTO by defining one of the choices in the choice list as "Other Specify" which will trigger a text field to collect the _**other specify**_ response.
ipacheckscto
checks the XLS form for issues with other specify and exports results to the sheet “6. other specify”. This will include fields with at least one of the following violations.
- or_other syntax: SurveyCTO technically still allows the use of or_other with select_one and select_multiple variables. For instance, select_one gender or_other will automatically create an other specify field if the order option is selected. However, the parent field will be stored as a string variable instead of numeric and will cause the import do-file to crush.
-
Child field is not specified: Programmers may sometimes forget to include a field that will be triggered when other specify option is selected.
ipacheckscto
checks that the child field is specified if the choice_list for the parent field includes an other (specify) option -
Child field is placed before the parent field: While rearranging fields, programmers may sometimes leave a child field before the parent field and the child field will be skipped until the user attempts to finalize the form.
ipacheckscto
checks that the child field is ordered after the parent field.
If the option
other()
is not used, then only the or_other syntax will be checked. Theother()
check only works effectively if a single value is consistently used to define other specify choice throughout the choice list.
Check 7. dont know, refuse
During data collections, respondents may agree to the survey but refuse to answer specific questions or may respond that they do not know the answer to these questions. To ensure that enumerators are able to appropriately record the responses when in these scenarios it is important that the programmer allows for _**don't know**_ and _**refuses to answer**_ responses to each field that collects respondent's responses.
ipacheckscto
flags fields that do not allow for don’t know and/or refuses to answer responses. This check will only apply field types such as integer, decimal & text fields.
If the options
dontknow()
and/orrefuse()
are not used, then this check will be skipped. Thedontknow()
and/orrefuse()
checks only works effectively if a single value is consistently used to define these choices.
Check 8. group names
Groups are field types that are to put fields in groups. Groups may be used to organize fields, apply relevance or format the appearance of multiple fields. Although SurveyCTO allows users to define a group name at the beginning and end of each group, it does not require that these names be the same and does not even require a name to be defined for the end group/repeat field type. However, as a best practice, it is important to keep the names the same for the begin & end groups so it is easier to identify where groups begin and where they end. This will come in handy when troubleshooting the XLS form.
ipacheckscto
checks that the names for a begin & end group pair for each group are the same and exports the results to the "8. group names" sheet.
Check 9. repeat fields
In SurveyCTO, fields defined in repeat groups can be referenced just like other fields when used within the same repeat group. However, when these fields are used outside of their repeat group, then they have to be used with special functions to ensure the SurveyCTO is able to determine the appropriate instance it is referencing. When repeat fields are used illegally outside their repeat groups, it causes the forms to crash and shows an error message on the collect app.
ipacheckscto
verifies that repeat fields that are used outside their repeat group are used with the correct functions. These include checking all repeat fields used in the survey, choices or setting sheets. ipacheckscto
also checks this across multiple columns including all label, filter, constraint, relevance and media columns.
Check 10. choices
In SurveyCTO, all choices for select_one and select_multiple type fields are defined in the choices sheet of the XLS form. When an XLS form is uploaded, SurveyCTO checks that the value column for each defined row is completed and at least 1 of the label columns is defined as well. However, SurveyCTO is lenient on a lot of other issues including some issues that will affect the quality of your data.
ipacheckscto
checks for various errors in the choices sheet of the XLS form. This include:
- duplicates in value column for the same choice group
- duplicates in label column for the same choices group
- missing label