Proof read (#103)
* Change introduction of ethics

* Proof read the introduction

* Fix cross referencing

* Remove introduction for the relational model

* Proof read relational model

* Proof read benchmarking databases

* Proof section on category theory

* Proof read database implementation

* Proof read implementation chapter

* Proof read benchmark chapter

* Proof read evaluation

* Proof check the conclusion

* Fix significant figures in mean table

* Fix figure and table placements and size

* Fix listing style

* Add information on shuffling table rows
MatBon01 authored Aug 28, 2023
1 parent cfb6bc4 commit 3a93eb2
Showing 55 changed files with 5,413 additions and 3,816 deletions.
168 changes: 84 additions & 84 deletions analysis/joinbench.ipynb

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions analysis/src/joinbench/benchmark_group_table_generator.py
@@ -30,8 +30,8 @@ def get_percentage_change_of_indexed_equijoin_for_counts(
p_percent_change = (i_mean - p_mean) / p_mean * 100
c_percent_change = (i_mean - c_mean) / c_mean * 100
percentage_change_of_means[f"{benchmark.get_tuple_count()} tuples"] = [
p_percent_change,
c_percent_change,
rf"{p_percent_change:.3g}\%",
rf"{c_percent_change:.3g}\%",
]

return percentage_change_of_means
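
The `.3g` specifier in the changed lines rounds to three significant figures, not three decimal places, which is what the commit message means by fixing significant figures in the mean table. A minimal sketch of that behaviour follows; the helper name `format_percent` and the sample values are illustrative assumptions, not part of the repository.

```python
def format_percent(change: float) -> str:
    """Format a percentage change to three significant figures for a LaTeX cell.

    Hypothetical helper for illustration only.
    """
    # A raw f-string keeps the backslash literal, so LaTeX receives an escaped \%.
    return rf"{change:.3g}\%"


print(format_percent(12.3456))   # 12.3\%
print(format_percent(-0.04567))  # -0.0457\%
print(format_percent(1234.5))    # 1.23e+03\%
```

Note that values with four or more integer digits switch to scientific notation under `.3g`, which may or may not be wanted in a table.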
15 changes: 15 additions & 0 deletions report/.hunspell
@@ -266,3 +266,18 @@ Bina
Vishnuram
Elysia
indexedTableRelAlgOps
mystyle
backgroundcolor
commentstyle
keywordstyle
numberstyle
stringstyle
basicstyle
breakatwhitespace
captionpos
keepspaces
showspaces
showstringspaces
showtabs
tabsize
mystyle
218 changes: 108 additions & 110 deletions report/background/benchmarking.tex

Large diffs are not rendered by default.

24 changes: 15 additions & 9 deletions report/background/categorytheory.tex
@@ -1,5 +1,5 @@
\section{Category theory}
Category theory will be the main tool in describing the structure and operations
Category theory will be the main tool for describing the structure and operations
of our database system. Category theory is a very powerful tool in mathematics
and can be seen as an ``abstraction of abstractions''. The category theory that
will appear in this paper is very limited and does not do justice to the full
@@ -18,7 +18,8 @@ \subsection{Categories}
\begin{categorydef}
A \emph{category} \cat{C} is a set\footnote{In more rigorous definitions one must be careful of defining the collections of objects as a set lest Russell's paradox comes into play} of \emph{objects} \objs{C}, such as \obj{a}, \obj{b}, \obj{c}, and \emph{morphisms} (or \emph{arrows}) \morphs{C} between them, such as \morph{f}, \morph{g}. We require that:
\begin{itemize}
\item There are two operations; \emph{domain} which associates with every arrow \morph{f} an object $\obj{a} = \domain{f}$ and \emph{codomain} which associates with every arrow \morph{f} an object $\obj{b} = \codomain{f}$. We can now express this information as \explicitmorph{f}{a}{b}.\footnote{Though we emphasise the distinction between a function and a morphism.}
\item There are two operations: \emph{domain}, which associates with every
arrow \morph{f} an object $\obj{a} = \domain{f}$ and \emph{codomain}, which associates with every arrow \morph{f} an object $\obj{b} = \codomain{f}$. We can now express this information as \explicitmorph{f}{a}{b}.\footnote{Though we emphasise the distinction between a function and a morphism.}
\item There is a composition rule between morphisms such that given \explicitmorph{f}{a}{b} and \explicitmorph{g}{b}{c}, there is another arrow \explicitmorph{\morph{g} \circ \morph{f}}{a}{c} in \morphs{C}.
\item Composition of arrows is associative. That is, for an additional object \obj{d} and arrow \explicitmorph{h}{c}{d} the resulting morphisms $\morph{h} \circ \left(\morph{g} \circ \morph{f}\right)$ and $\left(\morph{h} \circ \morph{g}\right) \circ \morph{f}$ coincide in \morphs{C}.
\item Every object \obj{a} is assigned an arrow $\id{a}: \obj{a} \to \obj{a}$ in \morphs{C}, called the \emph{identity morphism}.
@@ -53,7 +54,9 @@ \subsection{Functors}

\subsection{Natural transformations}
\theoremstyle{definition}\newtheorem*{nattransdef}{Natural Transformation}
We now can explore an intuitive way of relating two functors, called an \emph{natural transformation}. A natural transformation can be seen as a projection of one functor space to another and is in essence a family of arrows that describes this translation.
We now can explore an intuitive way of relating two functors, called a
\emph{natural transformation}. A natural transformation can be seen as a
projection from one functor space to another and is in essence a family of arrows that describes this translation.
\begin{nattransdef}
Given two functors $\functor{F}, \functor{G} : \cat{C} \to \cat{D}$ a
\emph{natural transformation} $\explicitnattrans{\tau}{F}{G}$ associates to each element $\obj{a} \in \objs{C}$ an arrow $\nattransapply{\tau}{a} \in \morphs{D}$ such that $\nattransapply{\tau}{a}: \functorobj{F}{a} \to \functorobj{G}{a}$. Additionally, for $\obj{b} \in \cat{C}$ and \explicitmorph{f}{a}{b} a morphism in \cat{C}, we also require that $\nattransapply{\tau}{b} \circ \functormorph{F}{f} = \functormorph{G}{f} \circ \nattransapply{\tau}{a}$.
@@ -66,7 +69,8 @@ \subsection{Natural transformations}
Given two natural transformations
\explicitnattrans{\tau}{E}{F} and
\explicitnattrans{\nu}{F}{G} where $\functor{E, F, G}: \cat{C} \to
\cat{D}$ are functors. The vertical composition \cite{RelationalAlgebraByWayOfAdjunctions}
\cat{D}$ are functors. The vertical
composition~\cite{RelationalAlgebraByWayOfAdjunctions}
$\explicitnattrans{\verticalcomposition{\tau}{\nu}}{E}{G}$
is defined by the composition of the underlying arrows for every
object $\obj{a} \in \cat{C}$. Explicitly, the components
@@ -77,7 +81,8 @@ \subsection{Natural transformations}


\subsection{Adjunctions}
An adjunction expresses an intersection of arrows of two different functions. For instance Say you have the category \commoncatname{Set} of sets \cite{RelationalAlgebraByWayOfAdjunctions} and two functions $\morph{f}, \morph{g}$ with signatures \explicitmorph{f}{X}{A} and \explicitmorph{g}{X}{B} with $\obj{X}, \obj{A}, \obj{B} \in \commoncatname{Set}$.
An adjunction expresses an intersection of arrows of two different functions.
For instance, say you have the category \commoncatname{Set} of sets~\cite{RelationalAlgebraByWayOfAdjunctions} and two functions $\morph{f}, \morph{g}$ with signatures \explicitmorph{f}{X}{A} and \explicitmorph{g}{X}{B}, where $\obj{X}, \obj{A}, \obj{B} \in \commoncatname{Set}$
and the functions share a common domain $\domain{f} = \domain{g} = \obj{X}$.
How might we interpret the application of both functions to one element?
We could create a new function \explicitmorph{h}{X}{A
@@ -96,7 +101,7 @@ \subsection{Adjunctions}
become clear. We introduce the diagonal functor $\functor{\Delta}:
\commoncatname{Set} \to \commoncatname{Set}^2$, s.t. $\functorobj{\Delta}{A} =
(\obj{A}, \obj{A})$ and $\functormorph{\Delta}{f} = (\morph{f}, \morph{f})$.
Furthermore, the Cartesian product can also be seen view the lens of a functor $\functor{\times}: \commoncatname{Set}^2 \to \commoncatname{Set}$ that takes the pair of elements $(\obj{A}, \obj{B})$ and maps it to the set $\obj{A} \times \obj{B}$ and a pair of functions $(\morph{i}, \morph{j})$ to a function $k: \left(\domain{i} \times \domain{j}\right) \to \left(\codomain{i} \times \codomain{j}\right)$.
Furthermore, the Cartesian product can also be seen through the lens of a functor $\functor{\times}: \commoncatname{Set}^2 \to \commoncatname{Set}$ that takes the pair of elements $(\obj{A}, \obj{B})$ and maps it to the set $\obj{A} \times \obj{B}$ and a pair of functions $(\morph{i}, \morph{j})$ to a function $k: \left(\domain{i} \times \domain{j}\right) \to \left(\codomain{i} \times \codomain{j}\right)$.
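
The two functors just described can be sketched on finite sets as follows. This is only an illustration under stated assumptions: Python sets stand in for objects of \commoncatname{Set}, Python callables stand in for arrows, and the names `delta_obj`, `delta_arrow`, `product_obj` and `product_arrow` are hypothetical.

```python
from itertools import product


def delta_obj(a):
    """Object part of the diagonal functor: A is sent to the pair (A, A)."""
    return (a, a)


def delta_arrow(f):
    """Arrow part of the diagonal functor: f is sent to the pair (f, f)."""
    return (f, f)


def product_obj(pair):
    """Object part of the product functor: (A, B) is sent to the set A x B."""
    a, b = pair
    return set(product(a, b))


def product_arrow(pair):
    """Arrow part of the product functor: (i, j) is sent to a function on pairs."""
    i, j = pair
    return lambda xy: (i(xy[0]), j(xy[1]))


A = {1, 2}
print(product_obj(delta_obj(A)))        # the set A x A

double = lambda n: 2 * n
k = product_arrow(delta_arrow(double))  # double x double, acting on pairs
print(k((1, 2)))                        # (2, 4)
```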
Considering the above problem again, we can see very clear links between the two
functors. We can work our problem in \commoncatname{Set} in the domain of
$\commoncatname{Set}^2$ by considering $\functorobj{\Delta}{A} = (\obj{A},
@@ -110,7 +115,7 @@ \subsection{Adjunctions}
Given two functors $\functor{L}: \cat{D} \to \cat{C}$ and $\functor{R}:
\cat{C} \to \cat{D}$, we define an adjunction $\adunction{L}{R}$
such that there is a natural isomorphism between the
hom-sets as follows \cite{RelationalAlgebraByWayOfAdjunctions}:
hom-sets as follows~\cite{RelationalAlgebraByWayOfAdjunctions}:
\[
\lfloor - \rfloor: \homset{C}{\functorobj{L}{A}}{B} \cong
\homset{D}{A}{\functorobj{R}{B}} :\lceil - \rceil
@@ -123,7 +128,8 @@ \subsection{Adjunctions}
\emph{counit}:
$\explicitnattrans{\epsilon}{L \circ R}{\mathrm{Id}}$ such that
$\nattransapply{\epsilon}{B} = \lceil \id{\functorobj{R}{B}} \rceil$. We
require that the unit and counit obey the `triangle identities' \cite{RelationalAlgebraByWayOfAdjunctions}:
require that the unit and counit obey the `triangle
identities'~\cite{RelationalAlgebraByWayOfAdjunctions}:
$
\verticalcomposition
{\nattransapply{\eta}{\functor{R}}}
@@ -148,4 +154,4 @@ \subsection{Adjunctions}
that
$\nattransapply{\left(\verticalcomposition{\eta}{\epsilon}\right)}{\obj{A}} =
\morphcomp{\nattransapply{\eta}{\obj{A}}}{\nattransapply{\epsilon}{\obj{A}}}$
where the right hand side is simple the composition of arrows.
where the right-hand side is simply the composition of arrows.
17 changes: 11 additions & 6 deletions report/background/databaserepresentation.tex
@@ -4,15 +4,20 @@ \section{Evolution of database representation}\label{sec:background:dbrep}
\fref{chap:database}.

\subsection{Bags}
\paragraph{Characteristics of a database} We expect our database approximation to not be ordered and admit multiplicities and a finite bag of values is one of the simplest constructions that does so. Like a finite set, a bag contains a collection of unordered values. However, unlike a set, bags can contain duplicate elements. This multiplicity is key for processing non-idempotent aggregations. For instance, if summing up the ages of a database of people, without admitting multiplicity we would only sum each unique age once.
\paragraph{Characteristics of a database} We expect our database approximation
to be unordered and to admit multiplicities, and a finite bag of values is one of
the simplest constructions that does so. Like a finite set, a bag contains a
collection of unordered values. However, unlike a set, a bag can contain
duplicate elements. This multiplicity is key for processing non-idempotent
aggregations. For instance, summing the ages of a database of people without admitting multiplicity would count each unique age only once.
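
The age example can be made concrete with a small sketch. It assumes Python's `collections.Counter` as a stand-in for a finite bag (a mapping from value to multiplicity); the sample ages are made up for illustration.

```python
from collections import Counter

ages = [30, 25, 30, 41]   # two people share the age 30

bag = Counter(ages)       # bag: value -> multiplicity, order is irrelevant
as_set = set(ages)        # set: the duplicate 30 is lost

# Summation is not idempotent, so multiplicities change the result.
bag_total = sum(age * count for age, count in bag.items())
set_total = sum(as_set)

print(bag_total)  # 126
print(set_total)  # 96

# Bags ignore ordering: the same values in another order give an equal bag.
print(bag == Counter([41, 30, 30, 25]))  # True
```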

\subparagraph{Generalisation} Furthermore, going forward we generalise to bags
of any type instead of the classical ``bags of records''. This also allows us
to deal with intermediate tables that contain non-record values which, again,
may be useful for describing intermediate states of aggregations or projections.

In \fref{tab:BagRelAlgOps} we summarise the implementation of relational algebra operators with bags
as their bulk type \cite{RelationalAlgebraByWayOfAdjunctions}.
as their bulk type~\cite{RelationalAlgebraByWayOfAdjunctions}.
\begin{table}[h]
\centering
\begin{tabular}{r|l}
@@ -48,7 +53,7 @@ \subsection{Indexed tables}

We now have the mathematical tools required to define a map. In its finite form a map is widely known in computer science by many other names, such as a dictionary, an association list or a key-value map.

Let $\keyset$ be a set and $\valset$ a pointed set. To those already familiar with maps, it may help to think of $\keyset$ as keys and $\valset$ as values.
Let $\keyset$ be a set and $\valset$ a pointed set. For those already familiar with maps, it may help to think of $\keyset$ as keys and $\valset$ as values.
\begin{mapdef}
A map of type $\map{\keyset}{\valset}$ is a total function from $\keyset$ to $\valset$.
\end{mapdef}
@@ -71,11 +76,11 @@ \subsection{Indexed tables}
\end{split}
\end{equation*}

The functions above tell us some extremely important information on creating
empty maps and calculating their unions. As you can see $empty$ returns a
The functions above tell us some extremely important information about creating
empty maps and calculating the unions of maps with the same key type. As you can see $empty$ returns a
function that maps any key to the neutral element $()$. This is to be expected
as there are no values in an empty map. More interestingly, we see the merge
of two maps as a function that returns a function that maps a key to a pair of
of two maps is a function that maps a key to a pair of
values, each of which holds the result of the key lookup in the respective
table.
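
The behaviour described in this paragraph can be sketched by modelling a map as a total function from keys to values. This is an illustration only: the exact signatures of $empty$ and $merge$ in the report are not visible in this hunk, so the curried forms below are assumptions, and `from_dict` is a hypothetical helper that uses `None` as the distinguished point of the value set.

```python
def empty():
    """The empty map: every key is sent to the neutral element ()."""
    return lambda key: ()


def merge(left, right):
    """Merge two maps over the same key type.

    The result sends a key to the pair of lookups in the two tables.
    """
    return lambda key: (left(key), right(key))


def from_dict(entries, point=None):
    """Hypothetical helper: a total function that returns the point for absent keys."""
    return lambda key: entries.get(key, point)


names = from_dict({1: "Ada"})
ages = from_dict({1: 36, 2: 41})
joined = merge(names, ages)

print(joined(1))    # ('Ada', 36)
print(joined(2))    # (None, 41)
print(empty()(99))  # ()
```

Because every map is total, a key that is absent from one table still yields a pair, with the distinguished point filling the missing side.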
