-
Notifications
You must be signed in to change notification settings - Fork 0
/
testing.tex
59 lines (53 loc) · 2.4 KB
/
testing.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
\section{Testing}
\label{sec:test}
% We created
% test harnesses for regular expression implementations using our algorithms.
We implemented our algorithms in a library to implement a test harness for the \ocaml Re library%
\footnote{\url{https://github.com/ocaml/ocaml-re}},
a commonly used \ocaml regular expression implementation.
%
We also created a set of test cases for student projects in \haskell
to help them write better implementations.
Both libraries provide test harnesses which generate
regular expressions with positive and negative samples. The
implementation under test compiles the regular expression and applies it to the samples.
The library exposes a sample generator in the style of
property testing libraries such as QuickCheck~\cite{DBLP:conf/icfp/ClaessenH00}.
This way we can use the tooling already available in such libraries.
%
The simplified API of the \ocaml version is shown below.
The main function \code{arbitrary n alphabet} returns a generator
which provides on average \code{n} samples using the given alphabet.
\begin{lstlisting}
type test = {
re : Regex.t ;
pos : Word.t list ;
neg : Word.t list ;
}
val arbitrary:
int -> Word.char list -> test QCheck.arbitrary
\end{lstlisting}
Regular expressions are easy to generate using QuickCheck-like
libraries as they are represented by an algebraic datatype. We
restrict generated regular expressions to star-heights less than
3. While our technique can be used for regular expressions with
arbitrarily nested repetitions, it can cause slowdown and large memory
consumption which are inconvenient in the context of automated
testing.
Our library returns a finite number of samples even for an infinite language. We want to generate
test-cases that exercise the implementation under test as much as
possible. For this purpose, we use a technique similar to the fast
approximation for reservoir
sampling~\citep{DBLP:journals/toms/Vitter87}. When considering the
sequence of words in the language, we skip $k$ elements where $k$
follows a power law of mean $n$. We then return the given sample, and
stop the sampling with a probability $1/n$.
%
This technique results on average in $k$ samples that are regularly
spaced at the beginning of the stream, but will occasionally skip ahead
and return very long words. This approach has proven satisfactory at finding good
testing samples in practice.
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End: