Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CHI-1855: Refactor HRM SQL to use a composable function for selecting a full contact #444

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

stephenhand
Copy link
Collaborator

Description

A POC PR as part of investigating permissions changes that might be required for case merging - so just for consideration & discussion rather than something that I absolutely want to merge

This PR refactors our contact selection logic to use a PostgreSQL function to encapsulate the repeated selection relationship logic in a reusable way.

This does 3 things:

  • Simplifies the client side SQL we need to generate significantly
  • Provides an easier way to enforce DB permissions around contacts than having to change all the client SQL we use
  • These functions are visible to the Postgres permissions system, so in future, if we applied this pattern more widely we could revoke some direct table SELECT permissions from the HRM database user and force it to go via more restrictive functions we define

Checklist

  • Corresponding issue has been opened
  • N/A New tests added

Related Issues

CHI-1855

Verification steps

Automated tests

) media ON true
$$
LANGUAGE SQL
STABLE;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marking the functions 'STABLE' should mean that the Postgres query planner should be able to optimise it's query plan for the whole query, taking the logic inside the function and the logic of the calling statement and treating it as one big query, meaning it shouldn't be any less performant than not using a function

@stephenhand stephenhand marked this pull request as ready for review August 24, 2023 14:53
@GPaoloni
Copy link
Collaborator

Feeling like I get the idea here, but I have a few questions, maybe you can answer in a short meeting if you don't mind, but I will write them here so I don't forget.

  • This does not changes anything from the current behavior, is only prep work for something you are planning to do, that is using functions to "retrieve contacts with relations" so we can be more restrictive in the queries rather than post processing in the "business layer", right?
  • I'm not against moving all the relations code within the contact function itself, but I remember we explicitly broke that one into pieces intentionally, so we'll have "an explicit reference" (even in the form of "import join csam SQL statement") to the relations that are bundled with a contact. What is the reasoning to undo that? If it's just convenience I'm fine with it, just asking :P
  • While we plan to tear apart this huge payloads, and only send the csam/referral/conversation media ids with the contact payload, this functions will be useful to filter those at query time, so I'm guessing this is not intended to be a short term thing but rather a change in the way we are handling permissions, right?
  • Do we plan to extend this functions to what extent? Is the idea to get rid of the "permissions framework" we currently have?

@stephenhand
Copy link
Collaborator Author

@GPaoloni - the point of this is to demonstrate a different way of encapsulating & reusing SQL that can be used in combination with client side techniques we currently use or ones we could potentially use with Knex.

So for example, instead of attaching the same extra WHERE clause to lots of different queries on the client to enforce SQL level permissions, we can create a function that implements the same logic and use that in the queries instead.

To answer your specific questions:

  • As said above, this is just demonstrating an alternative approach to encapsulating SQL, it's not about doing more or less in the SQL layer than we were originally planning to do, just suggesting an additional technique we can employ to manage our SQL. You can see from the PR it significantly reduces the complexity of our client side SQL generation code
  • At the moment we don't make practical use of that split, we always take all the relations - so for the current logic we can combine it & simplify things. If the relations are in the application code it makes sense to put the related SQL logic in the application's function area, but if we use a function we don't need to have those relations in the application layer at all
  • We might still pull those relations when we request a single contact, not sure exactly how granular we want to make the future API at this point, but we could use functions in some or all of the new queries we create implementing it. Remember we can treat function results like any other table in a query and join extra stuff onto them etc., so the function doesn't need to take care of the whole query. I'm thinking one of their primary uses will be implementing SQL level permission checks, as an alternative to reusing WHERE clauses in the client layer. But this would only change permissions queries that we are going to handle in the SQL layer anyway, not doing more than we already intend to in the DB layer.
  • Using functions wouldn't entail changing anything we currently plan to do with permissions beyond how we encapsulate the SQL involved. We wouldn't be doing any more or less in the DB layer than we currently plan to, it would just be a technique we can use to simplify our client side query generation for logic we want to put in the DB layer anyway. To put it another way, using functions should never change the answer to the question 'Should we do this in the database or the service?' - it just might change HOW we might go about implementing it if we DO decide to do something in the DB layer

Happy to talk more on a call tho!

@stephenhand
Copy link
Collaborator Author

So any further thoughts? Merge or no merge? @robert-bo-davis @murilovmachado

@robert-bo-davis
Copy link
Collaborator

What are the primary benefits you see in moving this abstraction into postgres functions? Does that benefit outweigh the cognitive load of the developer having to think about "does this happen in our data access layer or is is based on some SQL that is hidden away in a stored function somewhere?"

My gut reaction is that this muddles data retrieval sql generation responsibilities in a way that will end up being more difficult to understand and maintain in the future as complexity grows than generating the SQL (or using a query builder) in the data access layer would be, but I don't have any well researched arguments to back that up.

@stephenhand
Copy link
Collaborator Author

What are the primary benefits you see in moving this abstraction into postgres functions? Does that benefit outweigh the cognitive load of the developer having to think about "does this happen in our data access layer or is is based on some SQL that is hidden away in a stored function somewhere?"

My gut reaction is that this muddles data retrieval sql generation responsibilities in a way that will end up being more difficult to understand and maintain in the future as complexity grows than generating the SQL (or using a query builder) in the data access layer would be, but I don't have any well researched arguments to back that up.

The 2 main benefits I see are

  1. This is a neater way to abstract common filters such as those used for permissions vs 'always add this WHERE clause everywhere' type features in query builders / ORMs, which add a similar cognitive load, if anything moreso because it's not always obvious from the SQL itself you are using such 'ambient' where clauses, whereas a parameterised call in the SQL is obviously not just a table, even if you don't immediately know what it is. It's also worth noting that this project has already ran into issues using data access side approaches to do this sort of thing already - I think a database side function composes more easily and flexibly than SQL chunks when they are being reused a lot, because they are used just like a table.

I'm not proposing we move loads of business logic into functions, just the highly reused filters & joins, i.e. those case when the times we DON'T want to apply this logic are the exception (permissions being a good example). I used the 'full contact join' for the example because we always do those joins when we select a contact in the current data access logic

  1. Just a future possibility, but like I suggested, we have the option in the future to restrict database users to having permission to call functions rather than granting direct table SELECT access, thus giving a more tailored and granular set of permissions and better 'defense in depth' vs attackers that get access to run SQL on the DB

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants