From f36da8cf22b95fc06b623bcdbc0eb99bdf5a7a6f Mon Sep 17 00:00:00 2001
From: Matteo Bongiovanni <40599507+MatBon01@users.noreply.github.com>
Date: Thu, 26 Jan 2023 13:32:13 +0000
Subject: [PATCH] Draft project plan (#19)

* Write structure for the project plan

* Draft a project plan
---
 report/bibs/interim.bib              | 29 ++++++++++++++++++++++-
 report/evaluation/evaluationplan.tex |  3 ++-
 report/project/projectplan.tex       | 35 ++++++++++++++++++++++++++++
 3 files changed, 65 insertions(+), 2 deletions(-)

diff --git a/report/bibs/interim.bib b/report/bibs/interim.bib
index a73a353..ac31f5e 100644
--- a/report/bibs/interim.bib
+++ b/report/bibs/interim.bib
@@ -119,4 +119,31 @@ @inproceedings{ComprehensiveComprehensions
   title={Comprehensive comprehensions},
   booktitle={Proceedings of the ACM SIGPLAN workshop on Haskell workshop},
   pages={61-72}
-}
\ No newline at end of file
+}
+@book{BasicCategoryTheoryForCS,
+  author={Benjamin C. Pierce},
+  year={1991},
+  title={Basic category theory for computer scientists},
+  publisher={MIT Press},
+  address={Cambridge, Massachusetts},
+  note={Includes bibliographical references (p. 81-91) and index.; ID: alma991000237591701591},
+  abstract={Category theory is a branch of pure mathematics that is becoming an increasingly important tool in theoretical computer science, especially in programming language semantics, domain theory, and concurrency, where it is already a standard language of discourse. Assuming a minimum of mathematical preparation, Basic Category Theory for Computer Scientists provides a straightforward presentation of the basic constructions and terminology of category theory, including limits, functors, natural transformations, adjoints, and cartesian closed categories. Four case studies illustrate applications of category theory to programming language design, semantics, and the solution of recursive domain equations. A brief literature survey offers suggestions for further study in more advanced texts. Benjamin C. Pierce received his doctoral degree from Carnegie Mellon University. Contents: Tutorial. Applications. Further Reading.},
+  isbn={0262660717}
+}
+@InProceedings{CategoriesForModellingConcurrency,
+author="Winskel, Glynn",
+editor="Brookes, Stephen D.
+and Roscoe, Andrew William
+and Winskel, Glynn",
+title="Categories of models for concurrency",
+booktitle="Seminar on Concurrency",
+year="1985",
+publisher="Springer Berlin Heidelberg",
+address="Berlin, Heidelberg",
+pages="246--267",
+isbn="978-3-540-39593-5"
+}
+@misc{HDBC,
+title={HDBC-2.4.0.4: Haskell Database Connectivity},
+url={https://hackage.haskell.org/package/HDBC-2.4.0.4/docs/Database-HDBC.html#g:2},
+}
diff --git a/report/evaluation/evaluationplan.tex b/report/evaluation/evaluationplan.tex
index f89cfd3..db74603 100644
--- a/report/evaluation/evaluationplan.tex
+++ b/report/evaluation/evaluationplan.tex
@@ -1,4 +1,5 @@
 \chapter{Evaluation plan} % 1-2 pages
 \begin{comment}
 Project evaluation is very important, so it's important to think now about how you plan to measure success. For example, what functionality do you need to demonstrate? What experiments to you need to undertake and what outcome(s) would constitute success? What benchmarks should you use? How has your project extended the state of the art? How do you measure qualitative aspects, such as ease of use? These are the sort of questions that your project evaluation should address; this section should outline your plan.
-\end{comment}
\ No newline at end of file
+\end{comment}
+\section{During implementation}\label{sec:evaluationduringimplementation}
\ No newline at end of file
diff --git a/report/project/projectplan.tex b/report/project/projectplan.tex
index 6474293..644f51d 100644
--- a/report/project/projectplan.tex
+++ b/report/project/projectplan.tex
@@ -2,3 +2,38 @@ \chapter{Project Plan} % 1-2 pages
 \begin{comment}
 You should explain what needs to be done in order to complete the project and roughly what you expect the timetable to be. Don’t forget to include the project write-up (the final report), as this is a major part of the exercise. It’s important to identify key milestones and also fall-back positions, in case you run out of time. You should also identify what extensions could be added if time permits. The plan should be complete and should include those parts that you have already addressed (make it clear how far you have progressed at the time of writing). This material will *not* appear in the final report.
 \end{comment}
+
+The project has a clear starting point and a wide range of potential branches to explore. This allows the project to venture down many different routes, ranging from application-focused to theoretically based.
+We outline the stages the project can take below.
+
+\section{Initial research and literature review}\label{sec:researchstage}
+As the project could eventually turn towards the theoretical side of the paper \cite{RelationalAlgebraByWayOfAdjunctions}, this part is crucial. This stage has already started and will be ongoing; however, the bulk should be finished by the end of January. An intuition for, as well as knowledge of, the applications of category theory across computer science will be vital for spotting potential improvements in any results found while implementing and benchmarking the findings of the base paper \cite{RelationalAlgebraByWayOfAdjunctions}.
+
+I do expect, however, that after finishing \fref{sec:implementation} there will be a period in which I return to this step in more detail in order to solidify the direction of the project.
+
+Research is also important for creating a fair and rigorous benchmarking system. This research should be concluded in the coming days to ensure that any implementation is built to work well with the tools to be used (and, initially, perhaps specifically with the datasets chosen).
+\section{Implementation of the differing equijoin algorithms}\label{sec:implementation}
+After understanding the basics of the theoretical implications, as well as how category theory underpins Haskell as a whole, work can start on the first tangible step: implementing a mini database system that can query a given dataset using both the inefficient and the efficient equijoins.
+\subsection{Implementation}
+Initially the implementation will be closely tailored to the data sources chosen. This avoids the difficulties of generalising types and creating a coherent system under Haskell's strict rules. Generalising is neither too difficult nor out of the scope of the project, but I feel that focusing on the results of benchmarking will be more efficient in the earlier stages. I hope to get this done by the end of February.
+\subsection{Evaluation and analysis}
+The analysis should begin as soon as the implementation has been completed. Benchmarking seems to be quite heuristic and to benefit greatly from experience, so some time should be set aside to make sure that the results are as accurate as possible. At this stage, if experience suggests that changing the implementation would not be too much effort, I might also consider changing the data source the tests are based on to obtain more interesting or accurate results, should any issues emerge. This process should hopefully take a couple of weeks.
+
+Evaluation, however, should be an ongoing process throughout the project. A plan to evaluate will be presented in \fref{sec:evaluationduringimplementation}.
+
+\section{Theoretical analysis of the remaining relational algebra}\label{sec:theoreticalanalysis}
+After implementing the findings of the paper, I hope to have gained a much deeper understanding of the practicalities of database querying. This, along with the mathematical intuition I hope to have gained during the stage described in \fref{sec:researchstage}, should prompt thought about which other operations currently lack a monadic structure or an efficient implementation. This would be greatly aided by research into the large field of database queries, and I might instead choose to describe already known optimisations using this new model and adjunctions. This stage of the project could last until no new results are being found.
+\section{Reporting the project}
+% Report due Monday 19 June 2022
+Reporting the project is a necessity. My project is unique in that the first stage is itself an analysis and therefore, if done correctly, can be written up in a final-report style almost as it is being conducted. Later parts of the project might need dedicated time for writing, as mathematics is rarely linear and the report should tell a compelling story.
+
+My plan is to continue developing the tools for writing my report efficiently alongside the implementation of the database, and to begin drafting and validating the final parts of the report a month before the deadline. Useful tools I have already integrated include both manual and commit-triggered checks on the spelling and code style of the reports. Early in December I also worked on automatic releases and the management of different report versions. A few of these systems are helpful but not yet at their full potential, so I might spend some time on possible extensions, for instance custom dictionaries for the spell checker so that it does not flag domain-specific terminology as errors. I definitely need to migrate my existing report structure to lhs2\TeX{} so that code snippets can be included in the report with ease. This should be finished by February, in time to develop the database within the report if that is the workflow I find most beneficial.
+\section{Possible extensions}
+\subsection{Progress in optimising other aspects of relational algebra}
+A successful stage in \fref{sec:theoreticalanalysis} might give the project a lot of scope to grow as an extension over the final months. I could choose between using the category theory learned to write proofs of the theoretical advantage of any novel approach I think of, or giving a novel description of what is already there. Alternatively, I could extend my implementation to include the modified operations and carry out another analysis of the change in their practical performance.
+\subsection{A full implementation}
+If I choose to take this project down a more practical route, a full, general implementation of the database system could be made as an extension. This would allow a specific schema and an interface, with the data stored in Haskell's own data structures, differing from the database abstraction layers currently used in production, which communicate with SQL servers \cite{HDBC}. Although this extension would not be production ready, I hope that it would demonstrate whether list comprehensions have a place in relational calculus, both through the viability of implementing them and through developer satisfaction when using them.
+\subsection{Applications of related fields to relational calculus}
+Category theory is extremely general and has countless applications in computer science, from describing lambda calculi to describing type systems \cite{BasicCategoryTheoryForCS}. A particularly interesting paper about modelling concurrency \cite{CategoriesForModellingConcurrency} caught my eye as a topic for future research, given how vital concurrency is to database management systems \cite{DatabaseSystems}.
+
+Research could also be done into the viability of modelling, via referential calculus, mutating behaviours that change either schema or components.