Skip to content

Commit

Permalink
Implement a database implementation with bags (#37)
Browse files Browse the repository at this point in the history
* Add LANGUAGE extensions for the library in cabal

* Add file to hold example from appendix

* Remove old test file

* Add types for the example

* Define cartesian product for bags

* Start example with cartesian product

* Add test cases for bag cartesian product

* Add file to hold database operations

* Add table with overview of relational algebra in bags

* Implement empty table

* Define singleton table and bag

* Fix organisation of BagSpec tests

* Test the singleton Bag function

* Define singleton bag

* Test singleton table function

* Define table union

* Define cartesian product for database tables

* Define neutral element for a table

* Implement projection

* Implement selection and filtering

* Add aggregation to bag implementation of database

Also declare instance of related monoids as commutative monoids to use within
the test case.

* Rename selection function to select

* Implement equijoin by cartesian product for bags

* Explain design choice with commutative monoids

* Fix some spellings in implementation report

* Fix spellings

* Test CMonoid implementations of Boolean operations
  • Loading branch information
MatBon01 authored Apr 28, 2023
1 parent c94e173 commit 19223cd
Show file tree
Hide file tree
Showing 14 changed files with 283 additions and 46 deletions.
26 changes: 19 additions & 7 deletions a-deeper-dive-into-relational-algebra-by-way-of-adjunctions.cabal
Original file line number Diff line number Diff line change
Expand Up @@ -22,29 +22,36 @@ extra-source-files:
README.md

library
exposed-modules: Data.Bag, Data.PointedSet, Data.CMonoid, Data.Key
exposed-modules:
Data.Bag,
Data.PointedSet,
Data.Key,
Database.Bag

-- Modules included in this library but not exported.
-- other-modules:
other-modules:
Data.CMonoid

-- LANGUAGE extensions used by modules in this package.
-- other-extensions:
other-extensions:
InstanceSigs, TypeFamilies, FlexibleContexts,
FlexibleInstances
build-depends: base >=4.16.4.0
hs-source-dirs: src
default-language: Haskell2010

executable bags-test
main-is: DatabaseBagsTest.hs
executable appendix-example
main-is: AppendixExample.hs

-- Modules included in this executable, other than Main.
-- other-modules:

-- LANGUAGE extensions used by modules in this package.
-- other-extensions:

build-depends:
base >=4.16.4.0,
a-deeper-dive-into-relational-algebra-by-way-of-adjunctions,
multiset ^>=0.3.4

hs-source-dirs: app
default-language: Haskell2010
Expand All @@ -54,6 +61,11 @@ test-suite spec
type: exitcode-stdio-1.0
hs-source-dirs: test
main-is: Spec.hs
other-modules: Data.BagSpec, Data.CMonoidSpec, Data.PointedSetSpec, Data.KeySpec
other-modules:
Data.BagSpec,
Data.CMonoidSpec,
Data.PointedSetSpec,
Data.KeySpec,
Database.BagSpec
build-depends: base >=4.16.4.0, hspec ^>=2.10, a-deeper-dive-into-relational-algebra-by-way-of-adjunctions
build-tool-depends: hspec-discover:hspec-discover == 2.*
20 changes: 20 additions & 0 deletions app/AppendixExample.hs
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
module Main where

import Data.Bag

-- Define types used in example
type Identifier = String
type Name = String
type Date = String
type Amount = Float

data Customer = C { cid :: Identifier, name :: Name}
data Invoice = I { iid :: Identifier, cust :: Identifier, due :: Date, amount :: Amount}

-- Cartesian product of both databases
exampleWithCP :: Bag Customer -> Bag Invoice -> Bag (Customer, Invoice)
exampleWithCP cs is = cp cs is

main :: IO ()
main = do
putStrLn "Example from appendix"
28 changes: 0 additions & 28 deletions app/DatabaseBagsTest.hs

This file was deleted.

1 change: 1 addition & 0 deletions report/.hunspell
Original file line number Diff line number Diff line change
Expand Up @@ -31,3 +31,4 @@ lhs2
theoreticalanalysis
documenttype
secondmarker
BagRelAlgOps
23 changes: 22 additions & 1 deletion report/background/databaserepresentation.tex
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,27 @@ \subsection{Bags}
\todo{Add the mathematical parts about bags}
\todo{When reading about finite maps it says that it's better for databases as non finite maps cannot be aggregated, can non finite bags be aggregates - is this why they are finite?}

\todo{Write rest of section}

In \fref{tab:BagRelAlgOps} we summarise the implementation of relational algebra operators with bags
as their bulk type\cite{RelationalAlgebraByWayOfAdjunctions}.
\begin{table}[h]
\centering
\begin{tabular}{r|l}
table of $V$ values & \bag{V} \\
empty table & \emptybag \\
singleton table & \singletonbag \\
union of tables & $\bagunion{}{}$ \\
Cartesian product of tables & $\times$ \\
neutral element & $\lbag () \rbag$ \\
projection $\projsymb{f}$ & $\bag{f}$ \\
selection $\selectsymb{p}$ & $filter\ p$ \\
aggregation in monoid $\monoid{M}$ & $reduce\ \monoid{M}$\\
\end{tabular}
\caption{Relational algebra operators implemented for bags}
\label{tab:BagRelAlgOps}
\end{table}

\subsection{Indexed tables}
We want to move towards an indexed representation of our table in order to equijoin by indexing. \todo{Understand if this is right and equijoin by indexing}. So in this section we introduce the mathematical concepts required to define such an implementation.
\theoremstyle{definition}\newtheorem*{psetdef}{Pointed set}
Expand Down Expand Up @@ -34,4 +55,4 @@ \subsection{Indexed tables}
\end{finitemapdef}
The advantage of using a finite map in a database is to allow aggregation.
\todo{Understand why only semi-monoidal}
\todo{Introduced indexed tables}
\todo{Introduced indexed tables}
11 changes: 9 additions & 2 deletions report/background/utils.sty
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,11 @@

% Bag
\newcommand{\bag}[1]{\ensuremath{\mathrm{Bag}\ #1}}
\newcommand{\emptybag}{\ensuremath{\emptyset}}
\newcommand{\singletonbag}{\ensuremath{single}}
\newcommand{\bagunion}[2]{\ensuremath{#1 \uplus #2}}
% TODO:: get the correct letter from bags section of original paper
\newcommand{\monoid}[1]{\ensuremath{\mathrm{\textbf{M}}}}

% Pointed set
\newcommand{\pset}[2]{\ensuremath{\left(#1,\,#2\right)}}
Expand All @@ -31,8 +36,10 @@
% Relational model
\newcommand{\relation}[1]{\ensuremath{\mathrm{#1}}}
\newcommand{\attribute}[1]{\ensuremath{\mathnormal{#1}}}
\newcommand{\proj}[2]{\ensuremath{\pi_{#1}\!\left(\relation{#2}\right)}}
\newcommand{\select}[2]{\ensuremath{\sigma_{#1}\!\left(\relation{#2}\right)}}
\newcommand{\projsymb}[1]{\ensuremath{\pi_{#1}}}
\newcommand{\proj}[2]{\ensuremath{\projsymb{#1}\!\left(\relation{#2}\right)}}
\newcommand{\selectsymb}[1]{\ensuremath{\sigma_{#1}}}
\newcommand{\select}[2]{\ensuremath{\selectsymb{#1}\!\left(\relation{#2}\right)}}
\newcommand{\natjoin}[2]{\ensuremath{\relation{#1}\bowtie\relation{#2}}}
\newcommand{\thetajoin}[3]{\ensuremath{\relation{#2}\bowtie_{#1}\relation{#3}}}
\newcommand{\equijoin}[4]{\ensuremath{\relation{#1}\,{}_{#2}\!\bowtie_{\:#4}\relation{#3}}}
3 changes: 2 additions & 1 deletion report/packages.tex
Original file line number Diff line number Diff line change
Expand Up @@ -16,8 +16,9 @@
\usepackage{comment}
\usepackage[numbers]{natbib}
\usepackage{fancyref}
\usepackage{stmaryrd}

\usepackage{title/titlestructure}

%% Utils
\usepackage{background/utils}
\usepackage{background/utils}
16 changes: 15 additions & 1 deletion report/project/benchmark/implementation.tex
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,21 @@ \subsection{Pointed sets}
container type of the Bag (i.e. \texttt{Eq a => Bag a}). This instance is key
for the indexing operation $ix$. The function $ix$ has a type $ix ::
\bag{(k, v)} \to \map{k}{(\bag{v})}$, coupled with the requirement $(Pointed v)
=> \map{k}{v}$ the necessity of the implementaion becomes clear and it is
=> \map{k}{v}$ the necessity of the implementation becomes clear and it is
important to note that when indexing the bag of key-value pairs, if a key has no
value, it is the "null" value for \texttt{Bag} that is used and not for $v$
(given that $v$ is even a \texttt{PointedSet} which is not required).

\subsection{Commutative Monoids}
You will see an implementation of \texttt{CMonoid} which is a `logical'
implementation of a commutative monoid. You will see in
\cite{RelationalAlgebraByWayOfAdjunctions} it is stated that there is no way to
enforce the commutativity of the monoid structure in Haskell and they give an
empty class definition with a comment. I have chosen to also include a
\texttt{CMonoid} type class in the supplied library as I feel it forces anyone
writing future implementations to think about the properties their monoid has.
The commutativity requirement is key for aggregations of tables as implemented
by bags because by definition bags do not have an order and therefore the
outcome of the aggregation should not depend on the internal representation of
the bag as would happen given a non-commutative monoid.
\todo{Write implementation of CMonoid}
12 changes: 12 additions & 0 deletions src/Data/Bag.hs
Original file line number Diff line number Diff line change
Expand Up @@ -52,3 +52,15 @@ b1 `union` b2 = Bag (elements b1 ++ elements b2)
reduceBag :: CMonoid m => Bag m -> m
-- Reduce a bag of ms into an m (e.g. a bag of bags into a bag)
reduceBag = mconcat . elements

-- Cartesian product
cp :: Bag a -> Bag b -> Bag (a, b)
cp (Bag xs) (Bag ys) = Bag [(x, y) | x <- xs, y <- ys]

-- Create singleton bag
single :: a -> Bag a
single = pure

-- Filter
filter :: (a -> Bool) -> Bag a -> Bag a
filter p = Bag . Prelude.filter p . elements
4 changes: 3 additions & 1 deletion src/Data/CMonoid.hs
Original file line number Diff line number Diff line change
Expand Up @@ -6,4 +6,6 @@ class Monoid a => CMonoid a
-- (<>) is commutative

instance Num k => CMonoid (Sum k)
instance Num k => CMonoid (Product k)
instance Num k => CMonoid (Product k)
instance CMonoid All
instance CMonoid Any
38 changes: 38 additions & 0 deletions src/Database/Bag.hs
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
module Database.Bag where

import qualified Data.Bag as Bag
import Data.CMonoid

type Table = Bag.Bag

empty :: Table a
empty = Bag.empty

single :: a -> Table a
single = Bag.single

union :: Table a -> Table a -> Table a
union = Bag.union

cp :: Table a -> Table b -> Table (a, b)
cp = Bag.cp

-- TODO:: check
neutral :: Table ()
neutral = single ()

projection :: (a -> b) -> Table a -> Table b
projection = fmap

select :: (a -> Bool) -> Table a -> Table a
select p = Bag.Bag . filter p . Bag.elements

aggregate :: CMonoid a => Table a -> a
aggregate = Bag.reduceBag

equijoinWithCp :: Eq c => (a -> c) -> (b -> c) -> Table a -> Table b -> Table (a, b)
equijoinWithCp fa fb as bs = select equality (cp as bs)
where
equality (a, b) = fa a == fb b


22 changes: 18 additions & 4 deletions test/Data/BagSpec.hs
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,11 @@ import Control.Applicative
(.+) = liftA2 (+)
addOneOrTwo x = Bag.Bag [x + 1, x + 2]

-- Some examples for the test
b1 = Bag.Bag ['a', 'b', 'c']
b2 = Bag.Bag ['b', 'c', 'd']
b3 = Bag.Bag ['c', 'd', 'e']

spec :: Spec
spec = do
describe "Data.Bag Eq" $ do
Expand Down Expand Up @@ -82,7 +87,16 @@ spec = do
(getSum . Bag.reduceBag) (Bag.Bag [1, 2, 3, 4] :: Bag.Bag (Sum Int)) `shouldBe` 10
it "correctly reduces a bag using product" $ do
(getProduct . Bag.reduceBag) (Bag.Bag [1, 2, 3, 4] :: Bag.Bag (Product Int)) `shouldBe` 24
where
b1 = Bag.Bag ['a', 'b', 'c']
b2 = Bag.Bag ['b', 'c', 'd']
b3 = Bag.Bag ['c', 'd', 'e']
describe "Data.Bag.cp" $ do
it "correctly can calculate the cartesian product of two bags" $ do
Bag.cp b1 b2 `shouldBe` Bag.Bag [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'b'), ('b', 'c'), ('b', 'd'), ('c', 'b'), ('c', 'c'), ('c', 'd')]
it "correctly deals with the empty bag in a cartesian product" $ do
Bag.cp b1 (Bag.empty :: Bag.Bag Int) `shouldBe` (Bag.empty :: Bag.Bag (Char, Int))
describe "Data.Bag.single" $ do
it "creates a singleton bag" $ do
Bag.single 'a' `shouldBe` Bag.Bag ['a']
describe "Data.Bag.filter" $ do
it "can filter a bag without multiplicities" $ do
Bag.filter (== 'a') b1 `shouldBe` Bag.Bag ['a']
it "can filter a bag and maintain multiplicities" $ do
Bag.filter even (Bag.Bag [1, 2, 4, 2, 3, 4, 4, 6]) `shouldBe` Bag.Bag [2, 2, 4, 4, 4, 6]
8 changes: 7 additions & 1 deletion test/Data/CMonoidSpec.hs
Original file line number Diff line number Diff line change
Expand Up @@ -11,4 +11,10 @@ spec = do

describe "CMonoid instance of Product k" $ do
it "is commutative" $ do
(13 :: Product Int) <> (23 :: Product Int) `shouldBe` (23 :: Product Int) <> (13 :: Product Int)
(13 :: Product Int) <> (23 :: Product Int) `shouldBe` (23 :: Product Int) <> (13 :: Product Int)
describe "CMonoid instance of Any" $ do
it "is commutative" $ do
Any True <> Any False `shouldBe` Any False <> Any True
describe "CMonoid instance of All" $ do
it "is commutative" $ do
All True <> All False `shouldBe` All False <> All True
Loading

0 comments on commit 19223cd

Please sign in to comment.