Query building performance discussion #435

tomjaguarpaw · 2019-06-30T10:06:51Z

Migrating from #434

tomjaguarpaw · 2019-06-30T10:10:11Z

I would be very happy to have benchmarks in the CI. That would be a good way of catching performance regressions early. I'm also open to improving rendering performance by switching away from String. What exactly did you have in mind? I'm surprised that the pretty package is not already optimised as much as possible!

csasarak · 2019-06-30T12:19:19Z

I had a look at the pretty source code and it appears that while the Doc type is a rope structure, when it comes to rendering it uses Strings, which have the O(n) cost of appending. I'm actually not too surprised, the documentation says its well suited to things like compilers and if the whole point is to generate user-facing messages then the slowness probably isn't worth fixing. That SPJ did a lot of work on it suggests to me also that this is the case, given his work on GHC.

I was running into some trouble with the more complex queries in my program at work (I do data warehousing/analytics), what I found is that if I dump the generated code into postgres it runs quite quickly, but for whatever reason opaleye was taking a long time. When I profiled, most of the time was attributed to formatAndShowSql and the only CAF that it creates is the query String. You can see an example of flamegraph (made with ghc-prof-flamegraph) showing this here. This graph shows where time is spent proportionally, but for larger queries it can turn into a lot of real time.

For most of my simpler queries this isn't much of a problem, but I have a few which aggregate data across multiple tables, with multiple left-joins, inner joins, and unions. Opaleye is really great because I can abstract a lot of the common behavior into query combinators, but maybe this is abusive?

In any case, you can see a really rough non-working example of what I'm envisioning here: https://github.com/csasarak/haskell-opaleye/tree/bytestrings It's basically a direct copy of the existing rendering code, just with bytestring's Builder type. One thing it doesn't do is print a pretty version though.

The bytestring builders have O(1) append and postgresql-simple appears to be able to handle them directly. I can try to work on an example in the benchmark repo that I worked in for the unionAll question. I'd like to convince someone else that this is indeed a problem or that it can be done better before pushing too hard for change.

tomjaguarpaw · 2019-06-30T12:55:08Z

Opaleye is really great because I can abstract a lot of the common behavior into query combinators, but maybe this is abusive?

This is exactly what Opaleye is designed for so this is definitely a use case I want to perform well.

Could it be possible just to rewrite the internals of pretty to use bytestring, or come up with a roughly API-compatible version that does? That sounds like the right place to fix the problem and would benefit the community.

I will take a look at your analysis and example later.

csasarak · 2019-06-30T13:07:11Z

That is also possible, I was looking for a render function from Doc -> Text but could not find one - text also doesn't seem to have the O(1) api for appending. There does seem to be some work in this direction: haskell/pretty#42 and haskell/pretty#36.

I will familiarize myself more with the pretty library in order to know what writing an API compatible version would look like. I'm hoping I'm understanding what's happening correctly.

ocharles · 2019-06-30T13:51:21Z

You may want to explore the prettyprinter library - imo it's best in class

csasarak · 2019-06-30T16:07:55Z

Thanks for the tip! I'll look too

tomjaguarpaw · 2019-06-30T18:39:16Z

Thanks @ocharles. I happy to change library if there's something with better performance.

ocharles · 2020-02-18T22:43:16Z

I'm going to do some work on this, because I'm also getting hit by the performance of formatAndShowSQL:

This is from a Hedgehog test run - it takes 20s to run 100 tests, and the majority of this time is just formatting queries - this is pretty mad!

csasarak · 2020-02-19T03:07:35Z

Hey, I apologize for not working more diligently on this. The company I was working for closed-up so I've been busy working on other things. So, anecodotally PR #436 shows a bit of a decrease in time when we switch to prettyprinter.

I have more time recently and since there's interest here I'm happy to try to help out if you like. I have a few very simple benchmarks which might be useful here: https://github.com/csasarak/haskell-opaleye/tree/bench
The main thing they try to do is a few successive unions or joins since this seems to be where queries start getting bigger. In particular, I recall seeing a lot of time spent in the commaV function, I think where it would be concatenating column names.

tomjaguarpaw · 2020-02-19T10:32:04Z

Great, it would be good to get some more insight into what is causing the slow performance.

ocharles · 2020-02-19T22:16:15Z

Investigations so far:

prettyprinter doesn't make a difference
I defined Opaleye.Pretty, which has just the pretty printing API we need, but just uses Endo [String] and doesn't actually do any pretty printing at all (just concatenates strings). This is barely faster than the pretty printer we already have.

This could be tougher than we think!

tomjaguarpaw · 2020-02-20T11:48:56Z

Is it possible that generating the SQL expression tree from the PrimQuery expression tree is the thing that is taking a lot of time, and it is wrongly being attributed to showSqlExplicit?

ocharles · 2020-02-20T12:00:24Z

I don't think so, but I'll play around with deepseq and see if giving that a fully evaluated thing to print changes anything

saurabhnanda · 2020-09-01T10:53:05Z

I think in view of #476 this issue should be re-opened. Probably prepare if for Hackoberfest so that we can tackle this as a community and get it fixed once and for all.

tomjaguarpaw mentioned this issue Jun 30, 2019

UnionAll increases showSQL time exponentially #434

Closed

csasarak mentioned this issue Jul 1, 2019

Test using the prettyprinter library for rendering SQL #436

Closed

tuomohopia mentioned this issue Mar 9, 2020

Slow query rendering morphismtech/squeal#158

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Query building performance discussion #435

Query building performance discussion #435

tomjaguarpaw commented Jun 30, 2019

tomjaguarpaw commented Jun 30, 2019

csasarak commented Jun 30, 2019 •

edited

Loading

tomjaguarpaw commented Jun 30, 2019

csasarak commented Jun 30, 2019

ocharles commented Jun 30, 2019

csasarak commented Jun 30, 2019 via email •

edited by tomjaguarpaw

Loading

tomjaguarpaw commented Jun 30, 2019

ocharles commented Feb 18, 2020

csasarak commented Feb 19, 2020

tomjaguarpaw commented Feb 19, 2020

ocharles commented Feb 19, 2020

tomjaguarpaw commented Feb 20, 2020 •

edited

Loading

ocharles commented Feb 20, 2020

saurabhnanda commented Sep 1, 2020

Query building performance discussion #435

Query building performance discussion #435

Comments

tomjaguarpaw commented Jun 30, 2019

tomjaguarpaw commented Jun 30, 2019

csasarak commented Jun 30, 2019 • edited Loading

tomjaguarpaw commented Jun 30, 2019

csasarak commented Jun 30, 2019

ocharles commented Jun 30, 2019

csasarak commented Jun 30, 2019 via email • edited by tomjaguarpaw Loading

tomjaguarpaw commented Jun 30, 2019

ocharles commented Feb 18, 2020

csasarak commented Feb 19, 2020

tomjaguarpaw commented Feb 19, 2020

ocharles commented Feb 19, 2020

tomjaguarpaw commented Feb 20, 2020 • edited Loading

ocharles commented Feb 20, 2020

saurabhnanda commented Sep 1, 2020

csasarak commented Jun 30, 2019 •

edited

Loading

csasarak commented Jun 30, 2019 via email •

edited by tomjaguarpaw

Loading

tomjaguarpaw commented Feb 20, 2020 •

edited

Loading