Implement indexed tables as databases (#38)

* Add IndexedTable file
* Add singleton Indexed Table
* Implement unions of indexed tables
* Implement projection
* Implement selection on IndexedTables
* Implement aggregation in indexed table
* Define natural join for indexed tables
* Explain natural join for indexed tables
MatBon01 · Apr 28, 2023 · aecbe7a
1 parent 19223cd
commit aecbe7a

Showing 5 changed files with 91 additions and 4 deletions.
diff --git a/a-deeper-dive-into-relational-algebra-by-way-of-adjunctions.cabal b/a-deeper-dive-into-relational-algebra-by-way-of-adjunctions.cabal
@@ -26,7 +26,8 @@ library
         Data.Bag,
         Data.PointedSet,
         Data.Key,
-        Database.Bag
+        Database.Bag,
+        Database.IndexedTable
 
     -- Modules included in this library but not exported.
     other-modules: 
@@ -66,6 +67,7 @@ test-suite spec
         Data.CMonoidSpec,
         Data.PointedSetSpec,
         Data.KeySpec,
-        Database.BagSpec
+        Database.BagSpec,
+        Database.IndexedTableSpec
     build-depends:      base >=4.16.4.0, hspec ^>=2.10, a-deeper-dive-into-relational-algebra-by-way-of-adjunctions
     build-tool-depends: hspec-discover:hspec-discover == 2.*
diff --git a/report/background/relationalmodel.tex b/report/background/relationalmodel.tex
@@ -199,7 +199,7 @@ \subsubsection{Joins}\label{sec:joins}
   \caption{Relation \relation{S} as example for joins.}
   \label{tab:joinRelationS}
 \end{table}
-\paragraph{Natural join} The natural join is the first way to combine relations. Given that relations \relation{R} and \relation{S} have common attributes \attribute{a_1}, \ldots, \attribute{a_k}, tuples in \relation{R} and \relation{S} are combined if the component of all attributes are equal. This join is expressed as \natjoin{R}{S}.\cite{DatabaseSystems}
+\paragraph{Natural join}\label{sec:natjoin} The natural join is the first way to combine relations. Given that relations \relation{R} and \relation{S} have common attributes \attribute{a_1}, \ldots, \attribute{a_k}, tuples in \relation{R} and \relation{S} are combined if their components agree on all of these attributes. This join is expressed as \natjoin{R}{S}.\cite{DatabaseSystems}
 \subparagraph*{Example of the natural join} Given the relations \relation{R} and \relation{S} in \fref{tab:joinRelationR} and \fref{tab:joinRelationS} respectively, the natural join $\natjoin{R}{S}$ is as in \fref{tab:naturalJoinResult}.\cite{RelationalModel}
 In this example we call the tuple \verb|(1, 2, 4)| a \emph{dangling tuple} as it failed to pair with any other tuple in relation \relation{S}.\cite{DatabaseSystems}
 \begin{table}[h]
@@ -220,4 +220,4 @@ \subsubsection{Joins}\label{sec:joins}
 \paragraph{Equijoin} The most important class of joins concerning this project, a specialisation of the theta-join. Equijoin is used when the operator of predicate $\theta$ between two attributes is an equality\footnote{So common that joins using operators other than $=$, such as $<$, are sometimes called \emph{nonequijoins}.\cite{JoinProcessing}}.\cite{JoinProcessing} An equijoin between relations \relation{R} and \relation{S} where we want to join the values of attributes \attribute{a} and \attribute{b} respectively is denoted \equijoin{R}{a}{S}{b}.
 \todo{Write example for equijoin}
 \subsubsection{Note on permutations}
-Permutations is another specialist operation in relational algebra, though not important to the scope of the project. For completion, despite the fact that relations are domain--unordered, their internal representation in computers is not and so permutation may be done for performance benefits despite no logical difference storing a relation and its permutations.\todo{Make sure I worded the performance benefits thing correctly}\cite{RelationalModel} Furthermore, permutation can be used (and is usually implied) to ensure that tuples with identical schemas differing only in ordering can have the normal set operations applied to them. \cite{DatabaseSystems}
+Permutation is another specialist operation in relational algebra, though it is not important to the scope of the project. For completeness: although relations are domain--unordered, their internal representation in computers is ordered, so a permutation may be applied for performance benefits even though there is no logical difference between storing a relation and storing one of its permutations.\todo{Make sure I worded the performance benefits thing correctly}\cite{RelationalModel} Furthermore, permutation can be used (and is usually implied) to ensure that tuples with identical schemas differing only in ordering can have the normal set operations applied to them.\cite{DatabaseSystems}
diff --git a/report/project/benchmark/implementation.tex b/report/project/benchmark/implementation.tex
@@ -38,3 +38,12 @@ \subsection{Commutative Monoids}
 outcome of the aggregation should not depend on the internal representation of
 the bag as would happen given a non-commutative monoid. 
 \todo{Write implementation of CMonoid}
+
+\subsection{Indexed Tables}
+\paragraph{Natural Joins} In the implementation given, a natural join is defined
+by \todo{Add code and mathematical description here} merging two indexed tables
+and then mapping the Cartesian product of bags over the merged table. This
+amounts to a local Cartesian product: for each key, the bags stored under that
+key are multiplied. In \fref{sec:natjoin} we define the natural join as
+combining tuples that agree on all common attributes, so the implementation
+takes the common attributes to be the key of the finite map.
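The \todo in this paragraph asks for a mathematical description. One possible reading is sketched below. The first equation is transcribed from the Haskell definition of naturalJoin in src/Database/IndexedTable.hs. The second is a pointwise restatement that assumes merge pairs the bags stored under the same key. The symbols t_1, t_2 and k, the \bowtie notation, and the use of \times for the Cartesian product of bags are chosen here for illustration and do not come from the report.

```latex
% Sketch only: assumes merge pairs the bags stored under the same key.
\[
  \mathit{naturalJoin}(t_1, t_2)
    = \mathit{fmap}\,(\mathit{uncurry}\ \mathit{cp})\,(\mathit{merge}\,(t_1, t_2))
\]
% Pointwise view: the join at key k is the product of the two bags at k.
\[
  (t_1 \bowtie t_2)(k) = t_1(k) \times t_2(k)
  \qquad \mbox{for every key } k
\]
```

Under this reading, a key that is missing from either table contributes an empty bag, so its product is empty as well.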
diff --git a/src/Database/IndexedTable.hs b/src/Database/IndexedTable.hs
@@ -0,0 +1,27 @@
+module Database.IndexedTable where
+
+import qualified Data.Bag as Bag
+import qualified Data.Key as Map
+import Data.CMonoid
+
+empty :: (Map.Key k) => Map.Map k (Bag.Bag v)
+empty = Map.empty
+
+singleton :: (Map.Key k) => (k, v) -> Map.Map k (Bag.Bag v)
+singleton (k, v) = Map.single (k, Bag.single v)
+
+union :: (Map.Key k) => Map.Map k (Bag.Bag v) -> Map.Map k (Bag.Bag v) -> Map.Map k (Bag.Bag v)
+union t1 t2 = (fmap (uncurry Bag.union) . Map.merge) (t1, t2)
+
+projection :: (Map.Key k) => (v -> w) -> Map.Map k (Bag.Bag v) -> Map.Map k (Bag.Bag w)
+projection = fmap . fmap
+
+selection :: (Map.Key k) => (v -> Bool) -> Map.Map k (Bag.Bag v) -> Map.Map k (Bag.Bag v)
+selection p = fmap (Bag.filter p)
+
+aggregation :: (Map.Key k, CMonoid m) => Map.Map k (Bag.Bag m) -> Map.Map k m
+aggregation = fmap Bag.reduceBag
+
+-- Joins on common keys
+naturalJoin :: (Map.Key k) => Map.Map k (Bag.Bag v) -> Map.Map k (Bag.Bag w) -> Map.Map k (Bag.Bag (v, w))
+naturalJoin t1 t2 = fmap (uncurry Bag.cp) (Map.merge (t1, t2))
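The module above depends on the repository's own Data.Key and Data.Bag, which are not part of this diff. To illustrate the same ideas in isolation, here is a minimal sketch that swaps in Data.Map.Strict from the containers package for the finite map and plain lists for bags. The names Table, unionT, selectionT, aggregationT and naturalJoinT are hypothetical and exist only for this example. It is not the commit's implementation.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical stand-in types: a bag is a plain list and an indexed table is
-- a Data.Map from keys to bags. Not the repository's Data.Key / Data.Bag.
type Table k v = M.Map k [v]

-- Union: bags stored under the same key are concatenated.
unionT :: Ord k => Table k v -> Table k v -> Table k v
unionT = M.unionWith (++)

-- Selection: filter every bag, leaving the index untouched.
selectionT :: (v -> Bool) -> Table k v -> Table k v
selectionT p = fmap (filter p)

-- Aggregation: reduce each bag with a monoid. The monoid is assumed to be
-- commutative so the result does not depend on the order of the list.
aggregationT :: Monoid m => Table k m -> M.Map k m
aggregationT = fmap mconcat

-- Natural join: for every key present in both tables, take the Cartesian
-- product of the two bags stored under that key.
naturalJoinT :: Ord k => Table k v -> Table k w -> Table k (v, w)
naturalJoinT = M.intersectionWith (\xs ys -> [(x, y) | x <- xs, y <- ys])
```

For instance, joining `M.fromList [(1, "ab"), (2, "c")]` with `M.fromList [(1, [True]), (3, [False])]` keeps only key 1, with the bag `[('a', True), ('b', True)]`. One design difference to keep in mind: `M.intersectionWith` drops keys that appear in only one table, whereas a merge-based definition may keep such keys with an empty bag.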
diff --git a/test/Database/IndexedTableSpec.hs b/test/Database/IndexedTableSpec.hs
@@ -0,0 +1,49 @@
+module Database.IndexedTableSpec (spec) where
+
+import Test.Hspec
+import qualified Database.IndexedTable as Table
+import qualified Data.Key as Map
+import qualified Data.Bag as Bag
+import Data.Monoid
+
+type Name = String
+data Person = Person { firstName :: Name, lastName :: Name} deriving (Show, Eq)
+
+people :: Map.Map () (Bag.Bag Person)
+people = Map.Lone (Bag.Bag [Person "John" "Smith", Person "Jane" "Doe", Person "John" "Doe"])
+
+spec :: Spec
+spec = do
+  describe "empty" $ do
+    it "returns an empty map" $ do
+      (Table.empty :: Map.Map () (Bag.Bag Int)) `shouldBe` (Map.empty :: Map.Map () (Bag.Bag Int))
+  describe "singleton" $ do
+    it "returns a single table" $ do
+      Table.singleton ((), 3) `shouldBe` Map.Lone (Bag.Bag [3])
+  describe "union" $ do
+    it "can correctly handle union of singletons" $ do
+      Table.union (Table.singleton ((), 3)) (Table.singleton ((), 4)) `shouldBe` Map.Lone (Bag.Bag [3, 4])
+    it "can correctly deal with first element empty" $ do
+      Table.union (Table.empty :: Map.Map () (Bag.Bag Char)) (Table.singleton ((), 'a')) `shouldBe` Map.Lone (Bag.Bag ['a'])
+    it "can correctly deal with second element empty" $ do
+      Table.union (Table.singleton ((), 'a')) (Table.empty :: Map.Map () (Bag.Bag Char)) `shouldBe` Map.Lone (Bag.Bag ['a'])
+  describe "projection" $ do
+    it "can correctly do a general projection" $ do
+      Table.projection firstName people `shouldBe` Map.Lone (Bag.Bag ["John", "Jane", "John"])
+    it "can correctly project on an empty map" $ do
+      Table.projection lastName (Table.empty :: Map.Map () (Bag.Bag Person)) `shouldBe` Map.empty
+    it "can correctly use the identity projection" $ do
+      Table.projection id people `shouldBe` people
+  describe "selection" $ do
+    it "can correctly select in general" $ do
+      Table.selection ((== "John") . firstName) people `shouldBe` Map.Lone (Bag.Bag [Person "John" "Smith", Person "John" "Doe"])
+    it "can correctly select all elements of a table" $ do
+      Table.selection (const True) people `shouldBe` people
+    it "can correctly select no elements of a table" $ do
+      Table.selection (const False) people `shouldBe` Map.empty
+  describe "aggregation" $ do
+    it "can correctly aggregate a table in general" $ do
+      Table.aggregation (Map.Lone (Bag.Bag [Any True, Any True, Any False])) `shouldBe` Map.Lone (Any True)
+  describe "natural join" $ do
+    it "is a local cartesian product" $ do
+      Table.naturalJoin (Map.Lone (Bag.Bag [1, 2])) (Map.Lone (Bag.Bag [2, 3])) `shouldBe` Map.Lone (Bag.Bag [(1, 2), (1, 3), (2, 2), (2, 3)])