Adding Q&A to the intensionally defined relation
DavidPratten committed Jan 11, 2023
1 parent e6e2e45 commit 4407cea
Showing 4 changed files with 252 additions and 6 deletions.
131 changes: 131 additions & 0 deletions Goal_seeking_covid_vaccination_and_work.ipynb
@@ -0,0 +1,131 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"%reload_ext jetisu.query_idr_magic"
]
},
{
"cell_type": "markdown",
"source": [
"# Interviewing using Intensionally Defined Relations\n",
"Intensionally defined relations support goal-directed question-and-answer sessions with users. This notebook demonstrates this for arbitrary columns in the ```covid_vaccinations_and_work``` relation.\n",
"\n",
"Goal seeking begins by nominating the relation to search and the list of columns sought. Jetisu then chooses which questions to ask so as to reach an answer as quickly as possible. The principles behind choosing the columns to query are explained below, but first here is an example.\n",
"\n",
"_To see this in action, you will need to [run the example notebooks](docs/run_notebooks.md) for yourself._"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 2,
"outputs": [
{
"data": {
"text/plain": "<IPython.core.display.Markdown object>",
"text/markdown": "## Answer\n|covid_vaccination_work_recommended_doses|covid_vaccination_work_mandatory|\n|----|----|\n|3|True|\n### Because\nwork_sector='aged_care' and work_location='new_south_wales' and aged_care_facility\n\n### Along the way, the following additional values were determined:\n|disability_worker_in_school|\n|----|\n|False|\n\n|nsw_health_worker|\n|----|\n|False|\n\n|specialist_school|\n|----|\n|False|\n\n\n### And the following values were under-determined:\n|private_home_only|\n|----|\n|False|\n|True|\n\n"
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%jetisu_seek_goal\n",
"{\n",
" \"table_name\": \"covid_vaccinations_and_work\",\n",
" \"goal_list\": [\n",
" \"covid_vaccination_work_recommended_doses\",\n",
" \"covid_vaccination_work_mandatory\"\n",
" ]\n",
"}"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"# In what sequence are questions asked?\n",
"Jetisu computes the best column to ask about next on the fly, using the following scoring system:\n",
"- The more distinct values in the column, the higher the column's score.\n",
"- If the column forms a cross-product with the goal columns (that is, it is independent of them), the score is lower.\n",
"- The more equal the distribution of values in the column, the higher the score.\n",
"\n",
"Here is another example, which interviews the user to find their work sector based on the COVID rules."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 3,
"outputs": [
{
"data": {
"text/plain": "<IPython.core.display.Markdown object>",
"text/markdown": "## Answer\n|work_sector|\n|----|\n|aged_care|\n### Because\ncovid_vaccination_work_recommended_doses=3 and work_location='western_australia' and aged_care_facility\n\n### Along the way, the following additional values were determined:\n|covid_vaccination_work_mandatory|\n|----|\n|True|\n\n|private_home_only|\n|----|\n|False|\n\n|disability_worker_in_school|\n|----|\n|False|\n\n|nsw_health_worker|\n|----|\n|False|\n\n|specialist_school|\n|----|\n|False|\n\n\n### And the following values were under-determined:\n"
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%jetisu_seek_goal\n",
"{\n",
" \"table_name\": \"covid_vaccinations_and_work\",\n",
" \"goal_list\": [\n",
" \"work_sector\"\n",
" ]\n",
"}"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"# Next step\n",
"You can edit and re-run this example notebook by following the instructions here:\n",
"[How to run the example notebooks](https://github.com/DavidPratten/jetisu/blob/main/docs/run_notebooks.md)\n",
"\n",
"You could, for example, seek the ```work_location``` column."
],
"metadata": {
"collapsed": false
}
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
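The scoring bullets in the notebook's markdown cell can be tried out on a toy relation. The following is a minimal sketch, not jetisu's implementation: it assumes the relation is a list of dicts and replaces the sqlglot queries in `jetisu_ask_next_question` (in `jetisu/idr_query.py`, part of this commit) with plain Python counting, while following that function's arithmetic as written. The name `score_columns` and the sample rows are hypothetical.

```python
import math
from collections import Counter


def score_columns(rows, goal_cols):
    """Score every non-goal column of a toy relation (a list of dicts).

    Hypothetical helper: it follows the arithmetic of jetisu_ask_next_question
    but counts in plain Python instead of running SQL through sqlglot.
    """
    n_rows = len(rows)
    # Number of distinct combinations of the goal columns.
    n_goal = len({tuple(r[g] for g in goal_cols) for r in rows})
    scores = {}
    for q in sorted(rows[0].keys() - set(goal_cols)):
        counts = Counter(r[q] for r in rows)
        n_q = len(counts)  # distinct values in column q
        n_q_and_goal = len({(r[q],) + tuple(r[g] for g in goal_cols) for r in rows})
        # 1 when q and the goal columns form a cross product, larger otherwise.
        cross_product_ratio = (n_q * n_goal) / n_q_and_goal
        # Sum of squared group sizes, scaled as in the commit.
        sum_of_squares = sum(c * c for c in counts.values())
        randomness_proxy = math.sqrt(sum_of_squares) / math.sqrt(n_rows * (n_rows / n_q) ** 2)
        scores[q] = cross_product_ratio * randomness_proxy * n_q
    return scores


# Assumed sample data: "sector" determines the goal, "remote" is independent of it.
rows = [
    {"sector": "aged_care", "remote": True, "mandatory": True},
    {"sector": "aged_care", "remote": False, "mandatory": True},
    {"sector": "retail", "remote": True, "mandatory": False},
    {"sector": "retail", "remote": False, "mandatory": False},
]
scores = score_columns(rows, goal_cols=["mandatory"])
best = max(scores, key=scores.get)
print(best, scores)
```

In this toy data `sector` fully determines the goal while `remote` is independent of it, so `sector` receives the higher score and would be the first question asked.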
7 changes: 6 additions & 1 deletion README.md
@@ -5,14 +5,19 @@ Jetisu is a toolset for modelling with __intensionally defined relations__.

Jetisu is understood to be the first implementation of intensionally defined relations. If you are aware of an earlier implementation, please raise an issue.

## Modelling Rules As Code
## Querying Rules As Code
The examples are chosen to illustrate the benefits (and/or challenges) of using intensionally defined relations to model rules as code.
- [ACT Conveyance Duty](ACT_Conveyance_Duty.ipynb)
- [Australian GST](Australian_GST.ipynb)
- [Birthday Money](Birthday%20Money.ipynb)
- [Australian COVID vaccinations up-to-date and eligibility](COVID_vaccinations.ipynb)
- [Australian COVID vaccinations mandatory for work roles](COVID_vaccinations_and_work.ipynb)
- [Range](Range.ipynb)

## Interactive Q&A with Rules as Code
This example demonstrates interactive Q&A using the same rule set as the "Australian COVID vaccinations mandatory for work roles" example above:
- [Tell me my status for mandatory vaccinations](Goal_seeking_covid_vaccination_and_work.ipynb)

## Edit and re-run the example notebooks
You can ```docker run``` the example Jupyter notebooks for yourself.

89 changes: 86 additions & 3 deletions jetisu/idr_query.py
@@ -9,6 +9,7 @@
import subprocess
import sys
import tempfile
import math

from sqlglot import exp, parse_one
from sqlglot.executor import execute
@@ -38,6 +39,11 @@ def mzn_output_quote(k, v, typed_parameters_dict):
else:
return v

def sqlglot_table2md(res):
    return '|' + '|'.join(res.columns) + '|' + "\n" + '|' + '|'.join(
        ["----" for x in res.columns]) + '|' + "\n" + "\n".join(
        ['|' + '|'.join([str(val) for val in r]) + '|' for r in res.rows])

def idr_query(sql_query, return_data):
    assert return_data in ['raw', 'data', 'markdown table', 'model', 'constrained model']

@@ -221,8 +227,85 @@ def idr_query(sql_query, return_data):
elif return_data == 'data':
return res.columns, res.rows
elif return_data == 'markdown table':
return '|' + '|'.join(res.columns) + '|' + "\n" + '|' + '|'.join(
["----" for x in res.columns]) + '|' + "\n" + "\n".join(
['|' + '|'.join([str(val) for val in r]) + '|' for r in res.rows])
return sqlglot_table2md(res)
else:
return 'Programming error - this should never occur'

def jetisu_goal_directed(goal_list, table_name):
    tables = {table_name: idr_query(f"select * from {table_name};", 'raw')}
    where_condition = ''
    residual_columns_list = []
    while not residual_columns_list:
        goal_list, tables, where_condition, residual_columns_list = jetisu_ask_next_question(goal_list, tables, where_condition)
    return tables, where_condition, residual_columns_list

def jetisu_ask_next_question(goal_list, tables, where_condition, residual_columns=''):
    table_name = list(tables.keys())[0]
    schema = tables[table_name].columns
    search_prompt = f"Finding {', '.join(goal_list)}"
    search_over_list = set(schema) - set(goal_list)
    cross_product_ratio = {}  # 1 = cross product, higher is better
    randomness_proxy = {}  # 1 = apparently random, higher is better
    effectiveness_ratio = {}

    res_goal = execute(f"select count(*) from (select distinct {', '.join(goal_list)} from {table_name})",
                       tables=tables)
    res_search_over = execute(f"select count(*) from (select distinct {', '.join(search_over_list)} from {table_name})",
                              tables=tables)
    if res_goal.rows[0][0] == 1 or res_search_over.rows[0][0] == 1:
        # Found a solution.
        # tables contains the residual cases to which the relation is indifferent.
        return goal_list, tables, where_condition, search_over_list

    for q in search_over_list:
        other_columns = set(schema) - set((q,))
        # Count distinct values for q and for the other columns, take the ratio and save it.
        res_all = execute(f"select count(*) from {table_name}", tables=tables)
        res_q_and_goal = execute(
            f"select count(*) from (select distinct {', '.join(goal_list + [q])} from {table_name})", tables=tables)
        res_q = execute(f"select count(*) from (select distinct {q} from {table_name})", tables=tables)
        res_other = execute(f"select count(*) from (select distinct {', '.join(other_columns)} from {table_name})",
                            tables=tables)
        # For a cross product this will be 1; for a primary key it will be N, the cardinality of the table.
        cross_product_ratio[q] = (res_q.rows[0][0] * res_goal.rows[0][0]) / res_q_and_goal.rows[0][
            0]  # / (1+res_all.rows[0][0]-res_q_and_goal.rows[0][0])

        # Calculate the sum of squares of the group sizes.
        res_sum_of_squares = execute(
            f"select sum(sos) ssos from (select {q}, count(*)*count(*) sos from {table_name} group by {q})",
            tables=tables)
        randomness_proxy[q] = math.sqrt(res_sum_of_squares.rows[0][0]) / math.sqrt(
            res_all.rows[0][0] * pow(res_all.rows[0][0] / res_q.rows[0][0], 2))

        effectiveness_ratio[q] = cross_product_ratio[q] * randomness_proxy[q] * res_q.rows[0][0]
    sortedq = sorted(effectiveness_ratio.items(), key=lambda x: x[1], reverse=True)
    chosen_q = sortedq[0][0]
    chosen_other_columns = set(schema) - set((chosen_q,))
    qlist = execute(f"select distinct {chosen_q} from {table_name}", tables=tables).rows
    enumerated_qlist = enumerate(qlist, start=1)
    prompt = '\n'.join([f"{x}) {y[0]}" for (x, y) in enumerated_qlist])
    response_valid = False
    while not response_valid:
        response = input(f"{search_prompt}\n{chosen_q}?\n{prompt}")
        if response == '':
            # An empty response cancels the search. Raising avoids returning a string
            # that the caller would fail to unpack into four values.
            raise RuntimeError("Search Cancelled ...")
        try:
            response_int = int(response)
        except ValueError:
            continue
        if 1 <= response_int <= len(qlist):
            response_valid = True
    chosen_where_condition = where_condition + (" and " if where_condition else '')
    this_where_condition = f"{chosen_q}" if str(qlist[response_int - 1][0]) == "True" else (
        f"not {chosen_q}" if str(
            qlist[response_int - 1][0]) == "False" else (f"{chosen_q}={qlist[response_int - 1][0]}" if (
                isinstance(qlist[response_int - 1][0], int) or isinstance(qlist[response_int - 1][0],
                                                                          float)) else f"{chosen_q}='{qlist[response_int - 1][0]}'"))
    chosen_where_condition += this_where_condition

    retdata = execute(
        f"select distinct {', '.join(chosen_other_columns)} from {table_name} where {this_where_condition}",
        tables=tables)

    chosen_tables = {table_name: retdata}
    return goal_list, chosen_tables, chosen_where_condition, False
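
The nested conditional expression that builds `this_where_condition` can be hard to follow. A minimal sketch of the same case analysis as standalone helpers is given here; `condition_for` and `extend_where` are hypothetical names, not part of jetisu, and the sample answers are taken from the notebook's first example.

```python
def condition_for(column, value):
    """Render one answered question as a SQL-style condition (hypothetical helper)."""
    if str(value) == "True":
        return column                      # boolean true: bare column name
    if str(value) == "False":
        return f"not {column}"             # boolean false: negated column name
    if isinstance(value, (int, float)):
        return f"{column}={value}"         # numbers are left unquoted
    return f"{column}='{value}'"           # everything else is quoted as a string


def extend_where(where_condition, column, value):
    """Append a new condition to an existing where clause, joined with 'and'."""
    prefix = where_condition + (" and " if where_condition else "")
    return prefix + condition_for(column, value)


where = ""
answers = [
    ("work_sector", "aged_care"),
    ("work_location", "new_south_wales"),
    ("aged_care_facility", True),
]
for column, value in answers:
    where = extend_where(where, column, value)
print(where)
```

The boolean checks come before the numeric check, as in the original expression, because `True` and `False` are also instances of `int` in Python.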
31 changes: 29 additions & 2 deletions jetisu/query_idr_magic.py
@@ -1,7 +1,8 @@
from IPython.display import display, Markdown, Latex
from jetisu.idr_query import idr_query
from jetisu.idr_query import idr_query, jetisu_goal_directed, sqlglot_table2md
import hashlib

import json
from sqlglot.executor import execute

def jetisu_query(line, cell):
    display(Markdown(idr_query(cell, 'markdown table')))
@@ -20,6 +21,31 @@ def jetisu_show(line, cell):
def jetisu_show_prepared(line, cell):
    display(Markdown("```\n\n" + idr_query(cell, 'constrained model') + "\n```"))

def jetisu_seek_goal(line, cell):
    # {
    #     "table_name": "covid_vaccinations_and_work",
    #     "goal_list": [
    #         "covid_vaccination_work_recommended_doses",
    #         "covid_vaccination_work_mandatory"
    #     ]
    # }
    request = json.loads(cell)
    goal_list = request["goal_list"]
    table_name = request["table_name"]
    tables, where_condition, residual_columns_list = jetisu_goal_directed(goal_list, table_name)
    answer = "## Answer\n"+sqlglot_table2md(execute(f"select distinct {', '.join(goal_list)} from {table_name}", tables=tables))
    answer += f"\n### Because\n{where_condition}\n"
    answer += "\n### Along the way, the following additional values were determined:\n"
    for col in residual_columns_list:
        res = execute(f"select distinct {col} from {table_name}", tables=tables)
        if len(res.rows) == 1:
            answer += sqlglot_table2md(res)+"\n\n"
    answer += "\n### And the following values were under-determined:\n"
    for col in residual_columns_list:
        res = execute(f"select distinct {col} from {table_name}", tables=tables)
        if len(res.rows) > 1:
            answer += sqlglot_table2md(res)+"\n\n"
    display(Markdown(answer))

def load_ipython_extension(ipython):
    """This function is called when the extension is
@@ -31,3 +57,4 @@ def load_ipython_extension(ipython):
    ipython.register_magic_function(jetisu_show, 'cell')
    ipython.register_magic_function(jetisu_testcase, 'cell')
    ipython.register_magic_function(jetisu_show_prepared, 'cell')
    ipython.register_magic_function(jetisu_seek_goal, 'cell')
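
The two loops in `jetisu_seek_goal` split the residual columns into those with exactly one remaining value and those with several. A minimal sketch of that split over plain Python rows is given here; `partition_columns` and the sample rows are assumptions for illustration and stand in for the sqlglot `execute` calls.

```python
def partition_columns(rows, columns):
    """Split columns into determined (one value left) and under-determined (several).

    Hypothetical helper: rows is a list of dicts standing in for the residual table.
    """
    determined, under_determined = {}, {}
    for col in columns:
        # Distinct remaining values, sorted by their text form for stable output.
        values = sorted({r[col] for r in rows}, key=str)
        if len(values) == 1:
            determined[col] = values[0]
        elif len(values) > 1:
            under_determined[col] = values
    return determined, under_determined


# Assumed residual rows, modelled on the notebook's first example output.
residual_rows = [
    {"nsw_health_worker": False, "private_home_only": False},
    {"nsw_health_worker": False, "private_home_only": True},
]
determined, under_determined = partition_columns(
    residual_rows, ["nsw_health_worker", "private_home_only"]
)
print(determined)
print(under_determined)
```

Computing both groups in one pass avoids querying each column twice, which the two separate loops in the commit do.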
