Adding Q&A to the intensionally defined relation

DavidPratten · Jan 11, 2023 · 4407cea
1 parent e6e2e45
commit 4407cea
Showing 4 changed files with 252 additions and 6 deletions.
diff --git a/Goal_seeking_covid_vaccination_and_work.ipynb b/Goal_seeking_covid_vaccination_and_work.ipynb
@@ -0,0 +1,131 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "%reload_ext jetisu.query_idr_magic"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "# Interviewing using Intensionally Defined Relations\n",
+    "Intensionally defined relations support goal-directed question and answer sessions with users. This notebook demonstrates goal seeking over arbitrary columns of the ```covid_vaccinations_and_work``` relation.\n",
+    "\n",
+    "Goal seeking begins by nominating the relation to search and the list of columns sought. Jetisu then chooses the questions that should lead to an answer fastest. The principles for choosing which column to ask about are explained below, but first here is an example.\n",
+    "\n",
+    "_To see this in action, you will need to [run the example notebooks](docs/run_notebooks.md) for yourself._"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "outputs": [
+    {
+     "data": {
+      "text/plain": "<IPython.core.display.Markdown object>",
+      "text/markdown": "## Answer\n|covid_vaccination_work_recommended_doses|covid_vaccination_work_mandatory|\n|----|----|\n|3|True|\n### Because\nwork_sector='aged_care' and work_location='new_south_wales' and aged_care_facility\n\n### Along the way, the following additional values were determined:\n|disability_worker_in_school|\n|----|\n|False|\n\n|nsw_health_worker|\n|----|\n|False|\n\n|specialist_school|\n|----|\n|False|\n\n\n### And the following values were under-determined:\n|private_home_only|\n|----|\n|False|\n|True|\n\n"
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "%%jetisu_seek_goal\n",
+    "{\n",
+    "    \"table_name\": \"covid_vaccinations_and_work\",\n",
+    "    \"goal_list\": [\n",
+    "        \"covid_vaccination_work_recommended_doses\",\n",
+    "        \"covid_vaccination_work_mandatory\"\n",
+    "    ]\n",
+    "}"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "# In what sequence are questions asked?\n",
+    "The best column to ask next is computed 'on-the-fly' by jetisu using the following scoring system:\n",
+    "- The more values in the column, the higher the column's score.\n",
+    "- If the column is a cross-product of the other columns the score is lower.\n",
+    "- The more equal the distribution of values in the column the higher the score.\n",
+    "\n",
+    "Here is another example, which interviews the person to determine their work sector from the COVID rules."
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "outputs": [
+    {
+     "data": {
+      "text/plain": "<IPython.core.display.Markdown object>",
+      "text/markdown": "## Answer\n|work_sector|\n|----|\n|aged_care|\n### Because\ncovid_vaccination_work_recommended_doses=3 and work_location='western_australia' and aged_care_facility\n\n### Along the way, the following additional values were determined:\n|covid_vaccination_work_mandatory|\n|----|\n|True|\n\n|private_home_only|\n|----|\n|False|\n\n|disability_worker_in_school|\n|----|\n|False|\n\n|nsw_health_worker|\n|----|\n|False|\n\n|specialist_school|\n|----|\n|False|\n\n\n### And the following values were under-determined:\n"
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "%%jetisu_seek_goal\n",
+    "{\n",
+    "    \"table_name\": \"covid_vaccinations_and_work\",\n",
+    "    \"goal_list\": [\n",
+    "        \"work_sector\"\n",
+    "    ]\n",
+    "}"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "# Next step\n",
+    "You can edit and re-run this example notebook by following the instructions here:\n",
+    "[How to run the example notebooks](https://github.com/DavidPratten/jetisu/blob/main/docs/run_notebooks.md)\n",
+    "\n",
+    "You could, for example, seek the ```work_location``` column."
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 2
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython2",
+   "version": "2.7.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
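
Review note: the three scoring rules listed in the notebook can be sketched in plain Python, without sqlglot. This is a minimal sketch that mirrors the arithmetic in `jetisu_ask_next_question` from this commit. The function name `score_columns` and the toy rows are hypothetical, introduced only for illustration, and the relation is assumed to be a list of tuples.

```python
import math
from collections import Counter


def score_columns(rows, columns, goal_list):
    """Score every non-goal column; the highest score would be asked next.

    Hypothetical helper: mirrors the formulas in the diff, but works on a
    plain list of tuples instead of a sqlglot table.
    """
    idx = {name: i for i, name in enumerate(columns)}

    def distinct(cols):
        # Number of distinct value combinations over the given columns.
        return len({tuple(row[idx[c]] for c in cols) for row in rows})

    n_rows = len(rows)
    n_goal = distinct(goal_list)
    scores = {}
    for q in columns:
        if q in goal_list:
            continue
        n_q = distinct([q])
        # Equals 1 when q and the goal columns form a cross product.
        cross_product_ratio = (n_q * n_goal) / distinct(goal_list + [q])
        # Sum of squared group sizes for q, normalised as in the diff.
        counts = Counter(row[idx[q]] for row in rows)
        sum_of_squares = sum(c * c for c in counts.values())
        randomness_proxy = math.sqrt(sum_of_squares) / math.sqrt(
            n_rows * (n_rows / n_q) ** 2)
        scores[q] = cross_product_ratio * randomness_proxy * n_q
    return scores


# Toy relation (made up): "sector" decides "mandatory", "flag" does not.
columns = ["sector", "flag", "mandatory"]
rows = [
    ("aged_care", True, True),
    ("aged_care", False, True),
    ("retail", True, False),
    ("retail", False, False),
]
scores = score_columns(rows, columns, ["mandatory"])
best = max(scores, key=scores.get)
print(best)
```

In this toy relation `sector` scores higher than `flag`, because `flag` combines with the goal column as a cross product and so tells the interviewer nothing about it.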
diff --git a/README.md b/README.md
@@ -5,14 +5,19 @@ Jetisu is a toolset for modelling with __intensionally defined relations__.
 
 Jetisu is understood to be the first implementation of intensionally defined relations. If you are aware of an earlier implementation, please raise an issue.
 
-## Modelling Rules As Code
+## Querying Rules As Code
 The examples are chosen to illustrate the benefits (and/or challenges) of using intensionally defined relations to model rules as code.
 - [ACT Conveyance Duty](ACT_Conveyance_Duty.ipynb)
 - [Australian GST](Australian_GST.ipynb)
 - [Birthday Money](Birthday%20Money.ipynb)
 - [Australian COVID vaccinations up-to-date and eligibility](COVID_vaccinations.ipynb)
 - [Australian COVID vaccinations mandatory for work roles](COVID_vaccinations_and_work.ipynb)
 - [Range](Range.ipynb)
+
+## Interactive Q&A with Rules as Code
+This example shows user Q&A using the same rule-set as the above "Australian COVID vaccinations mandatory for work roles" example:
+- [Tell me my status for mandatory vaccinations](Goal_seeking_covid_vaccination_and_work.ipynb)
+
 ## Edit and re-run the example notebooks
 You can  ```docker run``` the example Jupyter notebooks for yourself.
 

diff --git a/jetisu/idr_query.py b/jetisu/idr_query.py
@@ -9,6 +9,7 @@
 import subprocess
 import sys
 import tempfile
+import math
 
 from sqlglot import exp, parse_one
 from sqlglot.executor import execute
@@ -38,6 +39,11 @@ def mzn_output_quote(k, v, typed_parameters_dict):
     else:
         return v
 
+def sqlglot_table2md(res):
+    return '|' + '|'.join(res.columns) + '|' + "\n" + '|' + '|'.join(
+            ["----" for x in res.columns]) + '|' + "\n" + "\n".join(
+            ['|' + '|'.join([str(val) for val in r]) + '|' for r in res.rows])
+
 def idr_query(sql_query, return_data):
     assert return_data in ['raw', 'data', 'markdown table', 'model', 'constrained model']
 
@@ -221,8 +227,85 @@ def idr_query(sql_query, return_data):
     elif return_data == 'data':
         return res.columns, res.rows
     elif return_data == 'markdown table':
-        return '|' + '|'.join(res.columns) + '|' + "\n" + '|' + '|'.join(
-            ["----" for x in res.columns]) + '|' + "\n" + "\n".join(
-            ['|' + '|'.join([str(val) for val in r]) + '|' for r in res.rows])
+        return sqlglot_table2md(res)
     else:
         return 'Programming error - this should never occur'
+
+def jetisu_goal_directed(goal_list, table_name):
+    tables = {table_name: idr_query(f"select * from {table_name};", 'raw')}
+    where_condition = ''
+    residual_columns_list = []
+    while not residual_columns_list:
+        goal_list, tables, where_condition, residual_columns_list = jetisu_ask_next_question(goal_list, tables, where_condition)
+    return tables, where_condition, residual_columns_list
+
+def jetisu_ask_next_question(goal_list, tables, where_condition, residual_columns=''):
+    table_name = list(tables.keys())[0]
+    schema = tables[table_name].columns
+    search_prompt = f"Finding {', '.join(goal_list)}"
+    search_over_list = set(schema) - set(goal_list)
+    cross_product_ratio = {}  # 1 = cross product, higher is better
+    randomness_proxy = {}  # 1 = apparently random, higher is better
+    effectiveness_ratio = {}
+
+    res_goal = execute(f"select count(*) from (select distinct {', '.join(goal_list)} from {table_name})",
+                       tables=tables)
+    res_search_over = execute(f"select count(*) from (select distinct {', '.join(search_over_list)} from {table_name})",
+                       tables=tables)
+    if res_goal.rows[0][0] == 1 or res_search_over.rows[0][0] == 1:
+        # found solution
+        # tables contains the residual cases to which the relation is indifferent
+        return goal_list, tables, where_condition, search_over_list
+
+    for q in search_over_list:
+        other_columns = set(schema) - set((q,))
+        # count the distinct values of q and of the other columns, then take the ratio and save it
+        res_all = execute(f"select count(*) from {table_name}", tables=tables)
+        res_q_and_goal = execute(
+            f"select count(*) from (select distinct {', '.join(goal_list + [q])} from {table_name})", tables=tables)
+        res_q = execute(f"select count(*) from (select distinct {q} from {table_name})", tables=tables)
+        res_other = execute(f"select count(*) from (select distinct {', '.join(other_columns)} from {table_name})",
+                            tables=tables)
+        # for cross product this will be 1, for primary key this will be N the cardinality of the table.
+        cross_product_ratio[q] = (res_q.rows[0][0] * res_goal.rows[0][0]) / res_q_and_goal.rows[0][
+            0]  # / (1+res_all.rows[0][0]-res_q_and_goal.rows[0][0])
+
+        # calculate sum of squares
+        res_sum_of_squares = execute(
+            f"select sum(sos) ssos from (select {q}, count(*)*count(*) sos from {table_name} group by {q})",
+            tables=tables)
+        randomness_proxy[q] = math.sqrt(res_sum_of_squares.rows[0][0]) / math.sqrt(
+            res_all.rows[0][0] * pow(res_all.rows[0][0] / res_q.rows[0][0], 2))
+
+        effectiveness_ratio[q] = cross_product_ratio[q] * randomness_proxy[q] * res_q.rows[0][0]
+    sortedq = sorted(effectiveness_ratio.items(), key=lambda x: x[1], reverse=True)
+    chosen_q = sortedq[0][0]
+    chosen_other_columns = set(schema) - set((chosen_q,))
+    qlist = execute(f"select distinct {chosen_q} from {table_name}", tables=tables).rows
+    enumerated_qlist = enumerate(qlist, start=1)
+    prompt = '\n'.join([f"{x}) {y[0]}" for (x, y) in enumerated_qlist])
+    response_valid = False
+    while not response_valid:
+        response = input(f"{search_prompt}\n{chosen_q}?\n{prompt}")
+        if response == '':
+            # The caller unpacks four values, so returning a bare string would fail there
+            raise RuntimeError("Search cancelled by user")
+        try:
+            response_int = int(response)
+        except ValueError:
+            continue
+        if response_int >= 1 and response_int <= len(qlist):
+            response_valid = True
+            chosen_where_condition = where_condition + (" and " if where_condition else '')
+            this_where_condition = f"{chosen_q}" if str(qlist[response_int - 1][0]) == "True" else (
+                    f"not {chosen_q}" if str(
+                        qlist[response_int - 1][0]) == "False" else (f"{chosen_q}={qlist[response_int - 1][0]}" if (
+                        isinstance(qlist[response_int - 1][0], int) or isinstance(qlist[response_int - 1][0],
+                                                                                  float)) else f"{chosen_q}='{qlist[response_int - 1][0]}'"))
+            chosen_where_condition += this_where_condition
+
+    retdata = execute(
+        f"select distinct {', '.join(chosen_other_columns)} from {table_name} where {this_where_condition}",
+        tables=tables)
+
+    chosen_tables = {table_name: retdata}
+    return goal_list, chosen_tables, chosen_where_condition, False
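
Review note: the where-clause fragment that `jetisu_ask_next_question` builds from the chosen answer could be factored into a small helper. The sketch below assumes a hypothetical function name `format_condition`. It is not part of the commit, and it differs slightly from the diff: it tests `isinstance(value, bool)` rather than comparing `str(value)` with `"True"` and `"False"`, so a string column holding the text `'True'` would be quoted here.

```python
def format_condition(column, value):
    """Render one answered question as a SQL where-clause fragment.

    Hypothetical helper, not taken from the commit.
    """
    # bool must be tested before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return column if value else f"not {column}"
    if isinstance(value, (int, float)):
        return f"{column}={value}"
    # Anything else is treated as a string literal.
    return f"{column}='{value}'"


# Answers in the order they might be given during an interview.
answers = [
    ("work_sector", "aged_care"),
    ("work_location", "new_south_wales"),
    ("aged_care_facility", True),
]
where_condition = " and ".join(format_condition(c, v) for c, v in answers)
print(where_condition)
```

The joined result has the same shape as the "Because" line shown in the notebook output.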
diff --git a/jetisu/query_idr_magic.py b/jetisu/query_idr_magic.py
@@ -1,7 +1,8 @@
 from IPython.display import display, Markdown, Latex
-from jetisu.idr_query import idr_query
+from jetisu.idr_query import idr_query, jetisu_goal_directed, sqlglot_table2md
 import hashlib
-
+import json
+from sqlglot.executor import execute
 
 def jetisu_query(line, cell):
     display(Markdown(idr_query(cell, 'markdown table')))
@@ -20,6 +21,31 @@ def jetisu_show(line, cell):
 def jetisu_show_prepared(line, cell):
     display(Markdown("```\n\n" + idr_query(cell, 'constrained model') + "\n```"))
 
+def jetisu_seek_goal(line, cell):
+    # {
+    #     "table_name": "covid_vaccinations_and_work",
+    #     "goal_list": [
+    #         "covid_vaccination_work_recommended_doses",
+    #         "covid_vaccination_work_mandatory"
+    #     ]
+    # }
+    request = json.loads(cell)
+    goal_list = request["goal_list"]
+    table_name = request["table_name"]
+    tables, where_condition, residual_columns_list = jetisu_goal_directed(goal_list, table_name)
+    answer = "## Answer\n"+sqlglot_table2md(execute(f"select distinct {', '.join(goal_list)} from {table_name}", tables=tables))
+    answer += f"\n### Because\n{where_condition}\n"
+    answer += "\n### Along the way, the following additional values were determined:\n"
+    for col in residual_columns_list:
+        res = execute(f"select distinct {col} from {table_name}", tables=tables)
+        if len(res.rows) == 1:
+            answer += sqlglot_table2md(res)+"\n\n"
+    answer += "\n### And the following values were under-determined:\n"
+    for col in residual_columns_list:
+        res = execute(f"select distinct {col} from {table_name}", tables=tables)
+        if len(res.rows) > 1:
+            answer += sqlglot_table2md(res)+"\n\n"
+    display(Markdown(answer))
 
 def load_ipython_extension(ipython):
     """This function is called when the extension is
@@ -31,3 +57,4 @@ def load_ipython_extension(ipython):
     ipython.register_magic_function(jetisu_show, 'cell')
     ipython.register_magic_function(jetisu_testcase, 'cell')
     ipython.register_magic_function(jetisu_show_prepared, 'cell')
+    ipython.register_magic_function(jetisu_seek_goal, 'cell')
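
Review note: `jetisu_seek_goal` walks the residual columns twice, once to report those with a single remaining value and once for those with several. A single-pass version of that split might look like the sketch below. The name `partition_columns` and the toy rows are hypothetical, and the relation is assumed to be a plain list of tuples rather than a sqlglot table.

```python
def partition_columns(rows, columns, residual_columns):
    """Split residual columns into determined and under-determined ones.

    Hypothetical helper: a column is 'determined' when only one distinct
    value remains in the filtered relation.
    """
    idx = {name: i for i, name in enumerate(columns)}
    determined, under_determined = {}, {}
    for col in residual_columns:
        # Sort by string form so the order is stable for mixed value types.
        values = sorted({row[idx[col]] for row in rows}, key=str)
        if len(values) == 1:
            determined[col] = values[0]
        else:
            under_determined[col] = values
    return determined, under_determined


# Toy filtered relation (made up) after the interview has finished.
columns = ["nsw_health_worker", "private_home_only"]
rows = [(False, False), (False, True)]
determined, under_determined = partition_columns(rows, columns, columns)
print(determined)
print(under_determined)
```

Here `nsw_health_worker` is determined, while `private_home_only` remains under-determined, as in the first notebook example.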