fokus

fokus allows you to (pre-)process and clean the raw data, analyze and visualize the cleaned data, and create the questionnaires. It also provides other utility functions around the FOKUS post-voting surveys conducted by the Centre for Democracy Studies Aarau (ZDA) at the University of Zurich, Switzerland.

Porting all the functionality from the legacy fokus_aargau repository to this R package and the separate fokus_reports Quarto project is still a work in progress.

Documentation

The documentation of this package is found here.

Installation

To install the latest development version of fokus, run the following in R:

if (!("remotes" %in% rownames(installed.packages()))) {
  install.packages(pkgs = "remotes",
                   repos = "https://cloud.r-project.org/")
}

remotes::install_gitlab(repo = "zdaarau/rpkgs/fokus")

Package configuration

Some of fokus’s functionality is controlled via package-specific global configuration which can either be set via R options or environment variables (the former take precedence). This configuration includes:

::: table-wide

| Description | R option | Environment variable | Default value |
| --- | --- | --- | --- |
| FOKUS-covered ballot date to fall back to in various functions of this package when it isn’t explicitly specified. Basically a means to globally define the ballot date to be processed. One of "2018-09-23", "2018-11-25", "2019-10-20", "2020-09-27", "2020-10-18", "2021-11-28", "2023-06-18" or "2024-10-20". | fokus.ballot_date | R_FOKUS_BALLOT_DATE | as.Date("2018-09-23") |
| FOKUS-covered canton name to fall back to in various functions of this package when it isn’t explicitly specified. Basically a means to globally define the canton to be processed. Currently only "aargau". | fokus.canton | R_FOKUS_CANTON | "aargau" |
| Language to fall back to in various functions of this package when it isn’t explicitly specified. Basically a means to globally define the language to process. One of "de" or "en". | fokus.lang | R_FOKUS_LANG | "de" |
| Maximum timespan to preserve the package’s pkgpins cache. Cache entries older than this will be deleted upon package loading. | fokus.global_max_cache_age | R_FOKUS_GLOBAL_MAX_CACHE_AGE | "1 day" |
| Personal access token of a gitlab.com account with access to the private FOKUS repository. | fokus.token_repo_private | R_FOKUS_TOKEN_REPO_PRIVATE | Sys.getenv("GITLAB_COM_TOKEN") |
:::
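
As a minimal sketch of how this configuration might be set in a session, the following uses only base R and the option and environment variable names listed above. Passing the ballot date as a `Date` object is an assumption that mirrors the type of the documented default; the values chosen are arbitrary examples.

```r
# Set fallback values via R options (these take precedence over environment variables).
# Assumption: the ballot date is given as a `Date`, like the documented default.
options(fokus.ballot_date = as.Date("2020-09-27"),
        fokus.lang = "en")

# Alternatively, set the corresponding environment variable, here for the canton.
Sys.setenv(R_FOKUS_CANTON = "aargau")

# Inspect the values that are currently set.
getOption("fokus.ballot_date")
getOption("fokus.lang")
Sys.getenv("R_FOKUS_CANTON")
```

Environment variables are convenient for non-interactive runs (for example in `.Renviron` or a CI job), while `options()` is handy for overriding a value temporarily within a session.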

Questionnaires

Generated survey questionnaires are automatically deployed to qstnr.fokus.ag/{ballot_date}_{canton}.{ext}, where {ext} is one of html, md, csv or xlsx. The following questionnaires are available:

  • 2018-09-23 Aargau
  • 2018-11-25 Aargau
  • 2019-10-20 Aargau
  • 2020-09-27 Aargau
  • 2020-10-18 Aargau
  • 2021-11-28 Aargau
  • 2023-06-18 Aargau
  • 2024-10-20 Aargau
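
The URL pattern above can be assembled programmatically. The helper below is a hypothetical illustration in base R, not a function of fokus; the `https://` scheme is an assumption, since the pattern is given without one.

```r
# Hypothetical helper (not part of fokus): build the URL of a deployed questionnaire.
# Assumption: the site is served via HTTPS.
qstnr_url <- function(ballot_date,
                      canton = "aargau",
                      ext = c("html", "md", "csv", "xlsx")) {
  # restrict `ext` to the documented file extensions
  ext <- match.arg(ext)
  sprintf("https://qstnr.fokus.ag/%s_%s.%s", ballot_date, canton, ext)
}

qstnr_url(ballot_date = "2018-09-23",
          ext = "csv")
# "https://qstnr.fokus.ag/2018-09-23_aargau.csv"
```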

Private FOKUS directory structure

For part of this package’s functionality, a personal access token (PAT) of a gitlab.com account with access to the private FOKUS repository is required. This repository contains additional sensitive, non-public (survey) data under the raw/ subdirectory and certain files are written to its generated/ subdirectory.

Directory structure schema
fokus_private
├── generated
│   ├── for-polling-agency
│   │   ├── {ballot_date}_{canton}_easyvote_municipalities.csv
│   │   ├── {ballot_date}_{canton}_print_recipients.csv
│   │   └── {ballot_date}_{canton}_qr_codes.zip
│   ├── survey_data_de_{ballot_date}_{canton}.rds
│   ├── survey_data_en_{ballot_date}_{canton}.rds
│   ├── survey_data_merged_de_{ballot_date}_{canton}.rds
│   └── survey_data_merged_en_{ballot_date}_{canton}.rds
├── raw
│   ├── easyvote_municipalities_{ballot_date}_{canton}.csv
│   ├── online_participation_codes_{ballot_date}_{canton}.txt
│   ├── survey_data_{ballot_date}_{canton}.xlsx
│   ├── survey_data_{ballot_date}_{canton}_*.xlsx
│   ├── survey_data_preliminary_{ballot_date}_{canton}.xlsx
│   ├── voting_register_data_extra_{date_delivery_statistical_office}_{canton}.xlsx
│   ├── voting_register_ids_{ballot_date}_{canton}.csv
│   └── ...
└── ...

The following placeholders are used in the schema above:

  • ... for further files and/or folders
  • * for a variable character sequence
  • # for a count starting with 1
  • {canton} for the name of the FOKUS-covered canton (in lower case), e.g. aargau
  • {ballot_date} for the FOKUS-covered ballot date (in the format YYYY-MM-DD), e.g. 2018-09-23
  • {date_delivery_statistical_office} for the delivery date of the voting register data provided by the cantonal statistical office (in the format YYYY-MM-DD), e.g. 2019-09-11
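
To illustrate how the placeholders resolve to concrete file names, here is a small base R sketch. The helper `path_survey_data()` is hypothetical and not part of fokus; it only mirrors the naming of the `.rds` files under `generated/` in the schema above and returns a path relative to the root of the private repository.

```r
# Hypothetical helper (not part of fokus): relative path of a generated survey dataset.
path_survey_data <- function(ballot_date,
                             canton = "aargau",
                             lang = c("de", "en"),
                             merged = FALSE) {
  lang <- match.arg(lang)
  # merged datasets carry the `survey_data_merged` prefix in the schema
  prefix <- if (merged) "survey_data_merged" else "survey_data"
  file.path("generated",
            sprintf("%s_%s_%s_%s.rds", prefix, lang, ballot_date, canton))
}

path_survey_data(ballot_date = "2018-09-23",
                 lang = "en",
                 merged = TRUE)
# "generated/survey_data_merged_en_2018-09-23_aargau.rds"
```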

Development

R Markdown format

This package’s source code is written in the R Markdown file format to facilitate practices commonly referred to as literate programming. It allows the actual code to be freely mixed with explanatory and supplementary information in expressive Markdown format instead of having to rely on # comments only.

All the .gen.R-suffixed R source code found under R/ is generated from the respective R Markdown counterparts under Rmd/ using pkgpurl::purl_rmd()¹. Always make changes only to the .Rmd files – never the .R files – and then run pkgpurl::purl_rmd() to regenerate the R source files.

Coding style

This package borrows a lot of the Tidyverse design philosophies. The R code adheres to the principles specified in the Tidyverse Design Guide wherever possible and is formatted according to the Tidyverse Style Guide (TSG) with the following exceptions:

  • Line width is limited to 160 characters, double the limit proposed by the TSG (80 characters is ridiculously little given today’s high-resolution wide screen monitors).

    Furthermore, the preferred style for breaking long lines differs. Instead of wrapping directly after an expression’s opening bracket as suggested by the TSG, we prefer two fewer line breaks and align subsequent lines with the expression’s opening bracket:

    # TSG proposes this
    do_something_very_complicated(
      something = "that",
      requires = many,
      arguments = "some of which may be long"
    )
    
    # we prefer this
    do_something_very_complicated(something = "that",
                                  requires = many,
                                  arguments = "some of which may be long")

    This results in less vertical and more horizontal spread of the code and better readability in pipes.

  • Usage of magrittr’s compound assignment pipe-operator %<>% is desirable².

  • Usage of R’s right-hand assignment operator -> is not allowed³.

  • R source code is not split over several files as suggested by the TSG but instead is (as far as possible) kept in the single file Rmd/fokus.Rmd which is well-structured thanks to its Markdown support.
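
The two operator rules above can be illustrated with a short sketch. It assumes the magrittr package is installed; the variable names are arbitrary examples.

```r
library(magrittr)

vals <- c(3, 1, 2)

# Desirable: the compound assignment pipe sorts `vals` and assigns the result back to it.
vals %<>% sort()

# Equivalent without `%<>%`:
# vals <- vals %>% sort()

# Not allowed: right-hand assignment at the end of a pipe sequence.
# vals %>% rev() -> vals_rev

# Use left-hand assignment instead, so the definition is visible at the start of the line.
vals_rev <- vals %>% rev()
```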

As far as possible, these deviations from the TSG plus some additional restrictions are formally specified in pkgpurl::default_linters, which is (by default) used in pkgpurl::lint_rmd(), which in turn is the recommended way to lint this package.

Abbreviations

The abbreviations used to name things (function and parameter names etc.) in this package include:

Table of abbreviations
Full expression(s) Abbreviation
abbreviate, abbreviation abbr
abbreviations abbrs
absolute abs
argument arg
arguments args
attribute attr
attributes attrs
authenticate, authentication auth
authentications auths
auxiliary aux
back up bkp
background bg
backup bkp
bibliographies bibs
bibliography bib
certificates, certifications certs
certify, certificate, certification cert
chapter chpt
chapters chpts
character chr
characters chrs
column col
columns cols
combination combo
combinations combos
command cmd
commands cmds
condition cnd
conditions cnds
configurations configs
configure, configuration config
connection conn
connections conns
current cur
database db
dataframe df
dataframe column dfc
dataframe row dfr
dataframes dfs
define, definition def
definitions defs
delete, deletion del
deletions dels
depend, dependency dep
dependencies deps
develop, development, developer dev
developments, developers devs
dictionaries dicts
dictionary dict
differences diffs
differentiate, difference diff
directories dirs
directory dir
distribution distro
distributions distros
document doc
documents docs
double dbl
doubles dbls
duplicate, duplication dupl
duplicates, duplications dupls
dynamic dyn
element el
elements els
enumerate, enumeration enum
enumerations enums
environment env
environments envs
evaluate, evaluation eval
evaluations evals
exclude, exclusion excl
execute, execution exec
executions execs
expression expr
expressions exprs
extend, extension ext
extensions exts
factor fct
factors fcts
figure fig
figures figs
filesystem fs
foreign key fk
foreign keys fks
formula fm
formulas, formulae fms
frequencies freqs
frequent, frequency freq
function fn
functions fns
generate, generation gen
generations gens
google g
identifiers ids
identify, identifier id
image img
images imgs
include, inclusion incl
index i
indexes, indices ix
information info
initialize, initialization init
install, installation instl
integer int
integers ints
iterate, iteration, iterator itr
iterations, iterators itrs
label lbl
labels lbls
language lang
languages langs
left-hand side lhs
level lvl
levels lvls
libraries libs
library lib
limit lim
limits lims
list ls
logical lgl
logicals lgls
management mgmt
Markdown md
matrices mats
matrix mat
message msg
messages msgs
modifications mods
modify, modification mod
not a number nan
not available na
number nr
number of n
numbers nrs
numeric num
numerics nums
object obj
objects objs
operate, operation, operator op
operations, operators ops
option opt
options opts
organizations orgs
organize, organization org
package pkg
packages pkgs
parameterize, parameter param
parameters params
position pos
PostgreSQL pg
predicate pred
predicates preds
preparations preps
prepare, preparation prep
primary key pk
primary keys pks
procedures prcds
proceed, procedure prcd
projection, project proj
projections, projects projs
properties props
property prop
prototype ptype
prototypes ptypes
Quarto Markdown qmd
question qstn
questionnaire qstnr
questionnaires qstnrs
questions qstns
R Markdown rmd
refer, reference ref
references refs
referendum rfrnd
referendums, referenda rfrnds
regular expression, regular expressions regex
relative rel
remove, removal rm
repositories repos
repository repo
request req
requests reqs
respond, response resp
responses resps
right-hand side rhs
roxygen2 roxy
separate, separator sep
separators seps
sequence seq
sequences seqs
snippet snip
snippets snips
source src
sources srcs
specifications specs
specify, specification spec
string str
strings strs
structure struct
structures structs
supplement, supplemental, supplementary suppl
symbolize, symbol sym
symbols syms
tables tbls
tabulate, table tbl
template tpl
templates tpls
temporary tmp
user experience ux
user interface ui
value val
values vals
variable var
variables vars
vectorize, vector vctr
vectors vctrs
verbatim verb
version vrsn
versions vrsns
working directory wd

Footnotes

  1. The very idea to leverage the R Markdown format to author R packages was originally proposed by Yihui Xie. See his excellent blog post for his point of view on the advantages of literate programming techniques and some practical examples. Note that using pkgpurl::purl_rmd() is a less cumbersome alternative to the Makefile approach outlined by him.

  2. The TSG explicitly instructs to avoid this operator – presumably because it’s relatively unknown and therefore might be confused with the forward pipe operator %>% when skimming code only briefly. I don’t consider this to be an actual issue since there aren’t many sensible usage patterns of %>% at the beginning of a pipe sequence inside a function – I can only think of creating side effects and relying on R’s implicit return of the last evaluated expression. Therefore – and because I really like the %<>% operator – its usage is welcome.

  3. The TSG explicitly accepts -> for assignments at the end of a pipe sequence while Google’s R Style Guide considers this bad practice because it “makes it harder to see in code where an object is defined”. I second the latter.