fix calling rpc with variadic argument #1552

wolfgangwalther · 2020-07-07T19:45:26Z

Still todo: Update docs

Adding the changelog item is hard on the first commit - because you have to guess the number of the PR. Let's see whether I guessed right.

Resolves #1470

wolfgangwalther · 2020-07-09T10:15:34Z

rebased to current master.

test/Feature/RpcSpec.hs

wolfgangwalther · 2020-07-10T06:35:28Z

So I had a look at the existing INOUT tests and I am confused.

This is what the test looks like:

it "returns a scalar result when there is a single INOUT param" $
        get "/rpc/single_inout_param?num=2" `shouldRespondWith`
          [json|3|] { matchHeaders = [matchContentTypeJson] }

and this is the SQL for the function

create function test.single_inout_param(INOUT num int) AS $$
  select num + 1;
$$ language sql;

Nowhere in the function definition the volatility is set to STABLE or IMMUTABLE - so according to the docs it should be marked as VOLATILE per default. Yet, the function is called with GET and not with POST.

I guess I am not understanding the PostgREST docs correctly, where it says:

The API endpoint supports POST (and in some cases GET) to execute the function.
[...]
Stable and immutable functions can be called with the HTTP GET verb if desired.

So does the volatility actually have any effect on PostgREST? Or is it just that function that modify data will fail in the read-only transaction and that's it - independent of the volatility? Does it have an affect on the json schema exposed at the api root?

@steve-chavez Can you help my understanding here?

steve-chavez · 2020-07-10T14:39:33Z

@wolfgangwalther Yes, the volatility is just a hint for the pg planner. For GET we call the function in a read only transaction, disregard the actual volatility and let pg err if the function modifies the database. For POST we do consider the volatility for the transaction.

postgrest/src/PostgREST/App.hs

Lines 115 to 121 in 343e41c

    
           ActionInvoke InvGet  -> HT.Read 
        
           ActionInvoke InvHead -> HT.Read 
        
           ActionInvoke InvPost -> 
        
             let v = maybe Volatile pdVolatility proc in 
        
             if v == Stable || v == Immutable 
        
                then HT.Read 
        
                else HT.Write

(For reference: the idea came from #328 (comment))

Does it have an affect on the json schema exposed at the api root?

You mean if the volatility is considered for the openapi output to hint whether a function should be called with GET/POST? I believe this isn't being done now.

I guess I am not understanding the PostgREST docs correctly, where it says:

You're right about the docs. Perhaps we can clarify and mention that GET enforces non-mutability with the read-only tx, despite the volatility.

wolfgangwalther · 2020-07-11T11:19:17Z

Thanks for clearing that up for me and providing that background - makes a lot more sense now. Improving the docs in that regard might help others as well.

You mean if the volatility is considered for the openapi output to hint whether a function should be called with GET/POST? I believe this isn't being done now.

Yes, it only shows POST I guess: #1222 #1151

Now, of course I asked myself what kind of error will be thrown in case of calling a volatile proc with GET. I found this:

    context "only for GET rpc" $ do
      it "should fail on mutating procs" $ do
        get "/rpc/callcounter" `shouldRespondWith` 500
        get "/rpc/setprojects?id_l=1&id_h=5&name=FreeBSD" `shouldRespondWith` 500

Instead of a generic 500, it would probably be better to return 405 Method Not Allowed, which seems like it's designed for exactly that case. Do you think the error returned on "mutation inside the read-only transaction" can easily be caught and mapped to 405 instead?

steve-chavez · 2020-07-13T23:15:55Z

Do you think the error returned on "mutation inside the read-only transaction" can easily be caught and mapped to 405 instead?

@wolfgangwalther I think it would be a matter of detecting the "25006 read_only_sql_transaction" pg error code in Errors.hs:

postgrest/src/PostgREST/Error.hs

Lines 178 to 187 in e8b4e37

    
           pgErrorStatus authed (P.SessionError (H.QueryError _ _ (H.ResultError rError))) = 
        
             case rError of 
        
               (H.ServerError c m _ _) -> 
        
                 case toS c of 
        
                   '0':'8':_ -> HT.status503 -- pg connection err 
        
                   '0':'9':_ -> HT.status500 -- triggered action exception 
        
                   '0':'L':_ -> HT.status403 -- invalid grantor 
        
                   '0':'P':_ -> HT.status403 -- invalid role specification 
        
                   "23503"   -> HT.status409 -- foreign_key_violation 
        
                   "23505"   -> HT.status409 -- unique_violation

(For reference https://www.postgresql.org/docs/12/errcodes-appendix.html)

I was thinking if there is a case where it wouldn't make sense to map a 25006 to a Method not allowed but couldn't come up with one. So I think 405 would be good.

src/PostgREST/DbStructure.hs

src/PostgREST/QueryBuilder.hs

test/fixtures/schema.sql

test/Feature/RpcSpec.hs

steve-chavez · 2020-07-15T18:02:45Z

@wolfgangwalther Thought you were going to do the 405 logic here, but it's actually better on another PR. Check my above review.

Co-authored-by: Steve Chavez <[email protected]>

wolfgangwalther · 2020-07-16T06:18:47Z

Yeah, that 405 stuff wasnt really related to this here - just noticed it while working on the stuff.

But there might be some other code that I would add to this PR instead of creating a new one: I have the "multiple query params with the same name should make an array" in the pipeline as suggested in #1470 (comment) and realized that this might have implications on how variadic handling (should) work(s) as well.

…arrays

wolfgangwalther · 2020-07-16T16:55:51Z

So as a first step, I extended the existing tests about RPC params a little bit, because I noticed passing various argument types via query string was not tested for, yet. The same thing about application/x-www-form-urlencoded, which was only tested for Prefer params=single-object, but not for regular function arguments. Then I added an array argument to those tests as well, because that was not included at all.

I noticed a couple of things - see my comments below.

test/Feature/RpcSpec.hs

wolfgangwalther · 2020-07-16T17:04:23Z

test/Feature/RpcSpec.hs

+              { matchHeaders = [matchContentTypeJson] }
+
+      it "accepts a variety of arguments with GET" $
+        -- without JSON / JSONB here, because passing those via query string is useless - they just become a "json string" all the time


So passing JSON / JSONB via query string does not seem to work. At least if I try to pass something that should match an object, it seems like this is just returned as a json string.

Looking at the tests below that ("parses xxx JSON as JSON"), where this is explicitely tested for POST, I wonder: Is this expected behaviour for GET / query string (composing json objects in a query string is really ugly...) or is this to be considered a bug?

The transformation from query params to sql params happens in Haskell code. I think it may be possible to fix this, but don't think is worth it. As you mentioned is ugly to deal with json(urlencoding) in the params.

I think users that wanted to pass json on GET params have been served well by Prefer: params=single-object http://postgrest.org/en/v7.0.0/api.html#calling-functions-with-a-single-json-parameter.

@wolfgangwalther Now that I think about it maybe the shortcoming is in the second JSON.toJSON in:

rpcPrmsToJson = ProcessedJSON (JSON.encode $ mapFromListWithMultiple $ second JSON.toJSON <$> rpcQParams) (S.fromList $ fst <$> rpcQParams)

The fix might could be using something like this?(Not sure):

postgrest/src/PostgREST/Middleware.hs

Lines 90 to 95 in 8e4687f

unquoted :: JSON.Value -> Text

unquoted (JSON.String t) = t

unquoted (JSON.Number n) =

toS $ formatScientific Fixed (if isInteger n then Just 0 else Nothing) n

unquoted (JSON.Bool b) = show b

unquoted v = toS $ JSON.encode v

wolfgangwalther · 2020-07-16T17:05:33Z

test/Feature/RpcSpec.hs

+      it "accepts a variety of arguments from an html form" $
+        request methodPost "/rpc/varied_arguments"
+            [("Content-Type", "application/x-www-form-urlencoded")]
+            "double=3.1&varchar=hello&boolean=true&date=20190101&money=0&enum=foo&arr=%7Ba,b,c%7D&integer=43"


I did not try to pass in JSON here, yet, but I think that would take a slightly different route then the query params. So that might yield a different result.

wolfgangwalther · 2020-07-16T17:10:25Z

test/Feature/StructureSpec.hs

@@ -452,6 +453,10 @@ spec = do
                    "format": "enum_menagerie_type",
                    "type": "string"
                  },
+                  "arr": {
+                    "format": "text[]",
+                    "type": "string"


So I had to add those lines to make the tests pass, because I added the array to the varied_arguments function. The result does not seem correct, though - I think this looks like #1405. Now I don't think it makes sense to keep this in here like this, because this is not what we want it to do.

If we just changed that here accordingly - did I accidently create a test for #1405 (comment) ?

wolfgangwalther · 2020-07-16T21:02:35Z

Now to my suggestion to implement support for GET /rpc/array_argument?arg=a&arg=b&arg=c to be passed as an array. On top of #1470 (comment) I found some more (better) reasons as to why to do this:

this has actually been asked for in the past already: rpc: specify argument types (GET requests on RPCs) #328 (comment)
other libraries do it as well: e.g. NodeJS , PHP (with different syntax), and probably many others
browsers send forms with multiple fields with the same name like this. Think multiple <input type="checkbox" name="x"> or <select multiple name="y">.... HTML5 allows it: https://stackoverflow.com/a/37579281

Ok, to be honest - I did the implementation first, because I was curious how the query string was parsed etc. - and only then looked for a reason ;)

So how did I implement this and what did I have in mind?

query strings and application/x-www-form-urlencoded bodies are now parsed, so that all params are put into a list
all lists with only one element are unpacked again. For non-repeated params, this is the same before. Repeated params are now passed as array to the RPCs.
to use this properly with a function taking an array, this function would have to be overloaded for the "single item" case, like this:

CREATE FUNCTION repeated(arg INT) RETURNS INT[]
LANGUAGE SQL AS $$
  SELECT repeated(ARRAY[arg])
$$;

CREATE FUNCTION repeated(arg INT[]) RETURNS INT[]
LANGUAGE SQL AS $$
  SELECT arg
$$;

This could now be called with both GET /rpc/repeated?arg=1&arg=2 and GET /rpc/repeated?arg=1. At least that's the theory, I couldn't get this to work, yet. Right now the tests for the overloaded function fail, because the wrong function is chosen. Not sure whether that's related to #1469 (see next paragraph) or something else we can control. Maybe postgres just chooses the wrong overload consistently...

For VARIADIC functions this will lead to another problem: We can't define an overloaded function with VARIADIC taking a single element instead of an array (that would be pointless...) - and we can't define the overloaded function without VARIADIC as well, because we can't reliably determine whether to actually use the VARIADIC keyword when passing the arguments or not. This is somewhat related to #1469 - we are not resolving the "overloading" in haskell code, so don't know beforehand which of the functions is going to be used.

But what we might be able to do is to call the VARIADIC function something like this in SQL:

SELECT * FROM variadic_func(VARIADIC arg := CASE jsonb_typeof(args.arg) WHEN 'array' THEN args.arg ELSE ARRAY[args.arg] END)

If that works, then there would suddenly be a valid use-case for VARIADIC functions in the first place: To support that "repeated query params" syntax properly. Not implemented yet, so the test for it still fails.

wolfgangwalther · 2020-07-16T21:11:46Z

This could now be called with both GET /rpc/repeated?arg=1&arg=2 and GET /rpc/repeated?arg=1. At least that's the theory, I couldn't get this to work, yet. Right now the tests for the overloaded function fail, because the wrong function is chosen. Not sure whether that's related to #1469 (see next paragraph) or something else we can control. Maybe postgres just chooses the wrong overload consistently...

Maybe this can be solved by putting those lists that are generated from the repeated query params into {...,...} format instead of JSON array format. Would need to do that for PG < 10 anyways, at least some tests fail for 9.4 right now because of that.

wolfgangwalther · 2020-07-20T06:27:46Z

Couple more thoughts about the overloading example above (and the corresponding failing test):

Why does this not work?

CREATE FUNCTION repeated(arg INT) RETURNS INT[]
LANGUAGE SQL AS $$
  SELECT repeated(ARRAY[arg])
$$;

CREATE FUNCTION repeated(arg INT[]) RETURNS INT[]
LANGUAGE SQL AS $$
  SELECT arg
$$;

I think we currently can't use overloaded functions with arguments with the same name. This is because we are resolving which overloaded function to use in Haskell and Postgres does not have to make any choice itself. This is because of:

postgrest/src/PostgREST/QueryBuilder.hs

Line 140 in 8e4687f

    
           "SELECT * FROM json_to_recordset(" <> selectBody <> ") AS _(" <> fmtArgs (\a -> " " <> pgaType a) <> ")",

where we cast all the arguments to postgres types (and thereby choosing which function to call). To do this, we are currently resolving the overloaded functions only by argument name, so we will only ever use one of the casts.

To improve this, we would need to either:

Move this logic to postgres somehow. I'm not sure this is even possible.
Add the actual value to the "which-overload-to-use" resolving process. That's not easy as well and I'm not sure if it's possible for query strings at all, because we don't have json types there, but just strings.

Will think about this a bit more.

steve-chavez · 2020-07-23T16:46:07Z

This is because we are resolving which overloaded function to use in Haskell and Postgres does not have to make any choice itself. This is because of:

@wolfgangwalther Yeah, that was needed to fix #1005. I think it was worth it.

Now to my suggestion to implement support for GET /rpc/array_argument?arg=a&arg=b&arg=c to be passed as an array

This PR got a bit complicated, how about if we leave the multiple params as array for another PR?

You already did solve #1470. We could merge the variadic part and improve it later.

variadic parameters are not common on API functions but being compliant with pg syntax is definitely a win for us.

wolfgangwalther · 2020-07-23T17:56:28Z

Yes, you're right it's gotten a bit more. I have thought about it and I think we can make it work now - at least in parts.

I would definitely keep the overloading out of this (and remove the tests in that direction for now), because that's a lot more complicated - and is an issue we need to solve more generally as well.

I might be able to get the "multiple params as array" working for variadic functions by looking at the function arguments while parsing the query string already. In this case there would be no overloading involved. If it doesn't work, then I will keep the "multiple params as array" out completely for now.

Would that work for you?

I think it would be good if we could include that in the next release. What's your current plan: When do you want to release? So I know by when I have to get this done. Had a bit of other stuff on my plate the last couple of days.

steve-chavez · 2020-07-23T23:01:15Z

I will keep the "multiple params as array" out completely for now.
Would that work for you?

Yes, that would work! It would also help me in reviewing the multiple param array idea better.

What's your current plan: When do you want to release? So I know by when I have to get this done. Had a bit of other stuff on my plate the last couple of days.

I was thinking to release when we merge this one and #1546. No pressure from my part :)

My current plan is to release what we have - so users can try the new reloading features and see if there's any bugs - add docs, and do PostGIS features for the next release.

wolfgangwalther · 2020-08-07T13:01:15Z

I thought about this in-depth again - and I don't think it would be a good idea to merge the variadic part I did so far as it is now. This is to avoid breaking changes later on.

My reasoning is as follows:

I strongly believe that it would be good to support some form of "multiple params as array" for the reasons somewhere upthread.
Whenever we support this, we have to deal with the zero and one params cases as well. It's easy to turn 2+ params of the same name into an array and pass this to a function - but the hard part will be to decide whether to turn zero repetitions into an empty array and one repetition into a 1-item array or not.
Since we are potentially dealing with multi-dimensional arrays (arrays of arrays), this zero/one-as-array-decision will not be easily possible by checking the function arguments type. Also this would be a major challenge with overloaded functions, because our current approach for choosing the function to call (based on argument names only) would not be able to handle this at all. Even though we probably need to improve on the overloading stuff in general, I'm not sure whether we will be able to do everything we would need to support this.
Variadic functions seems like a perfect candidate to solve this: We still need to handle the overloading case to be able to decide whether we are dealing with a variadic function or not - but this can use the current implementation (of course with the same drawbacks as currently the case). We can use the variadic marker for the decision to turn non-repeated params into an array or not: Once the argument name is variadic, we will just create an array, no matter how often the param is repeated.
This would have the implication, however, that we can't call variadic functions with array syntax anymore ?variadic_argument={a,b} - for the same reason as above (multi-dimensional arrays and the decision whether this is already the variadic array or not). I don't think this is a negative in itself - because if you want to pass an array in one param in the query string, you can easily use a regular array argument already. So "calling variadic with array syntax" doesn't actually add any benefit.
Now if we implement the "calling variadic with array syntax" now, it would be a breaking change to go back to the "multiple params as array" approach.

To avoid having to break this again, if we decide to support repeated params, I suggest to just implement the variadic in the form of repeated params from the start. This also makes a lot more sense conceptually - because that's exactly what "variadic" is all about.

I do already have a very good idea of how to implement this. I still need to do a smaller refactor, to have the function definition (including arguments + overloading decision) available where the query params are parsed, but that will not be a problem. I will not be able to do it this week, though. so @steve-chavez you might want to do the next release without variadic.

steve-chavez · 2020-08-12T22:42:58Z

@wolfgangwalther Sure, let's think better about variadic for the next release.

Also, check my latest review on PostgREST/postgrest-docs#339. It would be great to merge those additions.

wolfgangwalther · 2020-10-04T13:25:07Z

closed in favor of #1603

wolfgangwalther force-pushed the fix-variadic branch from 66a0d5d to 811587a Compare July 9, 2020 10:14

wolfgangwalther commented Jul 10, 2020

View reviewed changes

test/Feature/RpcSpec.hs Outdated Show resolved Hide resolved

wolfgangwalther mentioned this pull request Jul 11, 2020

Improve docs PostgREST/postgrest-docs#339

Merged

wolfgangwalther added 2 commits July 14, 2020 11:46

fix calling rpc with variadic argument

20907ae

refactor test

2f392d7

wolfgangwalther force-pushed the fix-variadic branch from 329544e to 2f392d7 Compare July 14, 2020 09:46

wolfgangwalther mentioned this pull request Jul 15, 2020

Return 405 Method not Allowed for GET of volatile RPC instead of 500 #1560

Merged