CHI-1855: Refactor HRM SQL to use a composable function for selecting a full contact #444

stephenhand · 2023-08-24T14:45:45Z

Description

A POC PR as part of investigating permissions changes that might be required for case merging - so just for consideration & discussion rather than something that I absolutely want to merge

This PR refactors our contact selection logic to use a PostgreSQL function to encapsulate the repeated selection relationship logic in a reusable way.

This does 3 things:

Simplifies the client side SQL we need to generate significantly
Provides an easier way to enforce DB permissions around contacts than having to change all the client SQL we use
These functions are visible to the Postgres permissions system, so in future, if we applied this pattern more widely we could revoke some direct table SELECT permissions from the HRM database user and force it to go via more restrictive functions we define

Checklist

Corresponding issue has been opened
N/A New tests added

Related Issues

CHI-1855

Verification steps

Automated tests

…ntact

stephenhand · 2023-08-24T14:52:13Z

hrm-domain/hrm-service/migrations/20230824113300-contact-function.js

+  ) media ON true
+$$
+LANGUAGE SQL
+STABLE;


Marking the functions 'STABLE' should mean that the Postgres query planner should be able to optimise it's query plan for the whole query, taking the logic inside the function and the logic of the calling statement and treating it as one big query, meaning it shouldn't be any less performant than not using a function

GPaoloni · 2023-08-31T20:40:01Z

Feeling like I get the idea here, but I have a few questions, maybe you can answer in a short meeting if you don't mind, but I will write them here so I don't forget.

This does not changes anything from the current behavior, is only prep work for something you are planning to do, that is using functions to "retrieve contacts with relations" so we can be more restrictive in the queries rather than post processing in the "business layer", right?
I'm not against moving all the relations code within the contact function itself, but I remember we explicitly broke that one into pieces intentionally, so we'll have "an explicit reference" (even in the form of "import join csam SQL statement") to the relations that are bundled with a contact. What is the reasoning to undo that? If it's just convenience I'm fine with it, just asking :P
While we plan to tear apart this huge payloads, and only send the csam/referral/conversation media ids with the contact payload, this functions will be useful to filter those at query time, so I'm guessing this is not intended to be a short term thing but rather a change in the way we are handling permissions, right?
Do we plan to extend this functions to what extent? Is the idea to get rid of the "permissions framework" we currently have?

stephenhand · 2023-09-01T06:43:10Z

@GPaoloni - the point of this is to demonstrate a different way of encapsulating & reusing SQL that can be used in combination with client side techniques we currently use or ones we could potentially use with Knex.

So for example, instead of attaching the same extra WHERE clause to lots of different queries on the client to enforce SQL level permissions, we can create a function that implements the same logic and use that in the queries instead.

To answer your specific questions:

As said above, this is just demonstrating an alternative approach to encapsulating SQL, it's not about doing more or less in the SQL layer than we were originally planning to do, just suggesting an additional technique we can employ to manage our SQL. You can see from the PR it significantly reduces the complexity of our client side SQL generation code
At the moment we don't make practical use of that split, we always take all the relations - so for the current logic we can combine it & simplify things. If the relations are in the application code it makes sense to put the related SQL logic in the application's function area, but if we use a function we don't need to have those relations in the application layer at all
We might still pull those relations when we request a single contact, not sure exactly how granular we want to make the future API at this point, but we could use functions in some or all of the new queries we create implementing it. Remember we can treat function results like any other table in a query and join extra stuff onto them etc., so the function doesn't need to take care of the whole query. I'm thinking one of their primary uses will be implementing SQL level permission checks, as an alternative to reusing WHERE clauses in the client layer. But this would only change permissions queries that we are going to handle in the SQL layer anyway, not doing more than we already intend to in the DB layer.
Using functions wouldn't entail changing anything we currently plan to do with permissions beyond how we encapsulate the SQL involved. We wouldn't be doing any more or less in the DB layer than we currently plan to, it would just be a technique we can use to simplify our client side query generation for logic we want to put in the DB layer anyway. To put it another way, using functions should never change the answer to the question 'Should we do this in the database or the service?' - it just might change HOW we might go about implementing it if we DO decide to do something in the DB layer

Happy to talk more on a call tho!

stephenhand · 2023-09-11T08:19:22Z

So any further thoughts? Merge or no merge? @robert-bo-davis @murilovmachado

robert-bo-davis · 2023-09-11T11:03:08Z

What are the primary benefits you see in moving this abstraction into postgres functions? Does that benefit outweigh the cognitive load of the developer having to think about "does this happen in our data access layer or is is based on some SQL that is hidden away in a stored function somewhere?"

My gut reaction is that this muddles data retrieval sql generation responsibilities in a way that will end up being more difficult to understand and maintain in the future as complexity grows than generating the SQL (or using a query builder) in the data access layer would be, but I don't have any well researched arguments to back that up.

stephenhand · 2023-09-11T14:09:37Z

What are the primary benefits you see in moving this abstraction into postgres functions? Does that benefit outweigh the cognitive load of the developer having to think about "does this happen in our data access layer or is is based on some SQL that is hidden away in a stored function somewhere?"

My gut reaction is that this muddles data retrieval sql generation responsibilities in a way that will end up being more difficult to understand and maintain in the future as complexity grows than generating the SQL (or using a query builder) in the data access layer would be, but I don't have any well researched arguments to back that up.

The 2 main benefits I see are

This is a neater way to abstract common filters such as those used for permissions vs 'always add this WHERE clause everywhere' type features in query builders / ORMs, which add a similar cognitive load, if anything moreso because it's not always obvious from the SQL itself you are using such 'ambient' where clauses, whereas a parameterised call in the SQL is obviously not just a table, even if you don't immediately know what it is. It's also worth noting that this project has already ran into issues using data access side approaches to do this sort of thing already - I think a database side function composes more easily and flexibly than SQL chunks when they are being reused a lot, because they are used just like a table.

I'm not proposing we move loads of business logic into functions, just the highly reused filters & joins, i.e. those case when the times we DON'T want to apply this logic are the exception (permissions being a good example). I used the 'full contact join' for the example because we always do those joins when we select a contact in the current data access logic

Just a future possibility, but like I suggested, we have the option in the future to restrict database users to having permission to call functions rather than granting direct table SELECT access, thus giving a more tailored and granular set of permissions and better 'defense in depth' vs attackers that get access to run SQL on the DB

Refactor HRM SQL to use a composable function for selecting a full co…

acc87c3

…ntact

stephenhand requested review from murilovmachado, robert-bo-davis, nickhurlburt, GPaoloni, acedeywin and mythilytm August 24, 2023 14:45

Oops, revert changes that shouldn't have been pushed

f7d8643

stephenhand commented Aug 24, 2023

View reviewed changes

stephenhand marked this pull request as ready for review August 24, 2023 14:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CHI-1855: Refactor HRM SQL to use a composable function for selecting a full contact #444

CHI-1855: Refactor HRM SQL to use a composable function for selecting a full contact #444

stephenhand commented Aug 24, 2023

stephenhand Aug 24, 2023

GPaoloni commented Aug 31, 2023

stephenhand commented Sep 1, 2023

stephenhand commented Sep 11, 2023

robert-bo-davis commented Sep 11, 2023

stephenhand commented Sep 11, 2023

CHI-1855: Refactor HRM SQL to use a composable function for selecting a full contact #444

Are you sure you want to change the base?

CHI-1855: Refactor HRM SQL to use a composable function for selecting a full contact #444

Conversation

stephenhand commented Aug 24, 2023

Description

Checklist

Related Issues

Verification steps

stephenhand Aug 24, 2023

Choose a reason for hiding this comment

GPaoloni commented Aug 31, 2023

stephenhand commented Sep 1, 2023

stephenhand commented Sep 11, 2023

robert-bo-davis commented Sep 11, 2023

stephenhand commented Sep 11, 2023