Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for custom output format on GET #766

Closed
steve-chavez opened this issue Dec 7, 2016 · 9 comments
Closed

Support for custom output format on GET #766

steve-chavez opened this issue Dec 7, 2016 · 9 comments

Comments

@steve-chavez
Copy link
Member

I have a table:

  create table workers(
    id char(8) primary key,
    first_name text not null,
    last_name text not null,
    birth_date date
  )

I need to download a file with this table rows in the following format:

09310817|JOHN|DOE|15/04/88|
42152780|FRED|BLOGGS|20/02/85|
43006541|OTTO|NORMALVERBRAUCHER|01/07/90|
02452492|ERIKA|MUSTERMANN|11/01/80|

Sort of a pipe separated values format with that extra pipe at the end, this is a requirement for a local goverment entity.

Currently there's no way to output that format with just postgREST I would need some custom application code, this is a problem since I also need the security and filtering that postgREST provides.

I can construct the format in a column of a custom view, it would be convenient to have a way to tell postgREST that the JSON or CSV wrapping is not required and just output that column.

Seeing the following issues #257, #506 I see there's some overlap with my use case.

My proposal is similar to this one differing in that the required parameter only be selecting a single column select=col when doing an Accept: application/octet-stream if no single col is selected a 406 Not Acceptable response could be applied.

The single valued rows of the response could be concatenated taking advantage of the string_agg pg function, that supports text and bytea.

For my text case it's appropriate that postgREST concats the rows with a '\n' but not sure if this is good for the binary output, if that's the case then we could just concat with '' and let the client decide the delimiter; though it would be better if the users of multiple row binary output clarify this.

I already have a working custom build of postgREST with this feature, I think it would be great to support this flexibility. Glad to do a PR if we reach an agreement.

@ruslantalpa
Copy link
Contributor

@steve-chavez i think you are overcomplicating things with this approach.
Maybe in some other cases your proposed method with one column would be needed but in this case i think it's not a good approach.
The way to do this is to postprocess the response from postgrest at the proxy level.
Since you are talking gov/security ... you need nginx in front of postgrest for security, since it's already there, go for openresty, once you have openresty, do this

  • make a normal request to postgrest and request back a CSV format.
  • when the request comes back from postgrest, use body_filter_by_lua_block (or other methods to postprocess the response) to turn every , to | (i am sure it's a bit more complicating then replacing a single char, but you get the idea), it will be literally a few lines of lua that will work reliably without the need for a hack on the postgrest/db side

@ruslantalpa
Copy link
Contributor

Although not specifically about this issue, i'd like to give you an example of what this setup can give you (and i use it in one of my examples).

Say i have an RPC that returns rows of a certain type.
In normal conditions, one can use &select to request only the columns he wants.
In my particular case however, i needed to let the use do just that, and get that reply he wanted, but i also wanted that when the request actually got to postgrest, a few mandatory fields would be requested, so i was able to do just that

  • before the request was sent to postgrest, the select parameter was altered and the original form remembered
  • when the request came back, i was able to postprocess it and remove the data the user did not request

this is just one example of the power of having a scriptable proxy in front of postgrest

@ruslantalpa
Copy link
Contributor

ruslantalpa commented Dec 7, 2016

what this method might look in you case is this:

  • have your clients make requests with Accept: my_custom_gove_mime_type
  • the proxy preprocesses the request and translates that header to request CSV (and remembers it needs postprocessing)
  • when the response comes back, it alteres the response body and sets back the Content-Type to my_custom_gov_mime_type.

This way with a few lines of lua, it will work on all your tables/views and the clients making the request can use all the features of postgrest (filters/select ...etc) and no special code is needed in postgrest or the db for this to work

@steve-chavez
Copy link
Member Author

steve-chavez commented Dec 8, 2016

@ruslantalpa I understand the advantages of having nginx in front of postgrest and I have too considered doing this at the proxy level as an option with a form similar to your suggestion.

But I think this formatting would be better done in the db, I don't consider doing a
select concat(id, '|', first_name, '|', last_name, '|') and defining that in a view column or a computed column such an ugly hack, plus I believe it goes better with the postgREST way of doing most of the application logic in the db.

Also I only need this for a couple of tables, I think that doing custom logic in the proxy adds more complexity than just defining a couple of views in the db.

Now aside from my use case, the other reason for this proposed feature is that I saw the similarities with the binary output case, with the same code I could support both.

@ruslantalpa
Copy link
Contributor

@steve-chavez this is going to be a long post, i'll have a potato at the end for you :)
Also don't consider this as a "no, this will never happen" but a debate to the merits of the idea, the usecases it enables and the costs/tradeoffs. Also, as the saying goes, the views are my own and not PostgREST (community/leadership/ownership however you want to call it)

In terms of rules for adding new features, i always think of it as a sum/relation between the following points:

  • Is there absolutely now other way of implementing it besides a change in PostgREST?
  • Is it a non trivial amount of code/logic to implement this?
  • Does it enable a big class of usecases that would make it worth overlooking a bit the 2 points above

If you take your suggestion and look at it from the perspective of your example/usecase, i would say it fails all three points.
If however you consider the problem of binary data, that has a better chance of being accepted since it scores better in 1 and 3. Now if somehow the the implementation for the binary data, somehow enables your usecase, i have no problem with that.

Comparing my proposed solution to yours for your specific usecase:
You can only refer to it as complex in terms of adding a new tool (nginx) to the toolchain, because i disagree 10 lines of lua is complexity. Now about a new tool, unless this is an internal system, you ALWAYS put nginx in front of postgrest if you expose it to the internet, and so, if it's already there, those 10 lines do not add complexity.
Another difference is that my method allows for the use of select and filters and in general of this feature on all the outputs from all the endpoints available, your api client is in control. With your implementation, you only have the ability to filter, but not the ability to use select, in adition to that, it can only be used on the views you specifically implemented it (so i would say it's more complexity since you have to implement the same logic in multiple places)

The way to implement this (looking from the perspective of binary data).
It was discussed somewhere that we should have the possibility to allow database side logic/code to have better control of the http response, i.e. have the decision of what headers, response code, content type, body, reside in the database. The conclusion was that it's a good idea to have that, and the way to implement it would be to handle separately, in a special way, responses coming from functions that have a special return type, for example http_response(code integer, headers text[], response_type text, body binary) or something along those lines, the downside of course is that one can not use filters/select with that. But at the same time, most of the cases needing binary data, do not need the filters or the select capabilities.

As promised (potatoes) -> Oo :)

@ruslantalpa
Copy link
Contributor

@steve-chavez have you tried requesting only that column and specifying that you want the reply in csv format? Don't you get in that case exactly the thing you want?

@steve-chavez
Copy link
Member Author

steve-chavez commented Dec 10, 2016

@ruslantalpa If the request is made specifying csv format it also includes the name of the column as a header so it's not exactly what I want, still requires custom proxy logic to strip the first line, having said that I totally get your point in that my particular use case doesn't merit being a new feature in postgREST seeing that I'm the only user that requested this and that there's a workaround.

So I'll focus instead on the binary case and see my use case only as a possible added benefit.

Regarding the control of the http response, I see that the way you mentioned could add great flexibility but don't think that is necessary for the binary use case, my initial proposal could be just enough to satisfy all the requested use cases and now that plurality=singular support is being removed currently there's no way to enforce a single row so I think an option could be concatenating the binary rows without a delimiter.

I'll close this issue and continue the discussion in the binary case issues.
Thank you for the potatoes by the way.

@ruslantalpa
Copy link
Contributor

@steve-chavez
plurality-singular is not being removed, it will be just another way to specify it using the Accept header.
About the csv thing, it seems there is a very small step needed for your usecase.
I think the implementation is very simple, it might be beneficial to other users too, maybe you should implement that:

something along the line of what the current prefer singular discussion, i.e. say
Accept: text/csv; noheader

@ruslantalpa
Copy link
Contributor

you would need to add a custom accept content type https://github.com/begriffs/postgrest/blob/master/src/PostgREST/ApiRequest.hs#L61
then based on that type have this function skip the header
https://github.com/begriffs/postgrest/blob/master/src/PostgREST/QueryBuilder.hs#L430

(but wait a bit before the current PRs are merged, me and @begriffs will need to touch a few core files)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants