
Understand where styler spends most of the time #759

Open
krlmlr opened this issue Mar 25, 2021 · 21 comments

@krlmlr
Member

krlmlr commented Mar 25, 2021

This is crucial for #558.

Because we use mostly recursive calls, and effectively each function can call each other function, I suspect it is difficult to see where exactly styler spends most of its time, and which parts perhaps should be optimized. Can you confirm that?

To work around, I propose processing the .Rprof files created from a profiler run. If we erase the right places from the call stack for each sample, we might just end up with a clean flat structure that immediately shows where the time is spent. What do you think?

Alternatively, we could (temporarily) replace the recursive calls with later::later() or replace recursion with a queue and a while {} loop. Editing the call stack sounds like less work.
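To illustrate the second alternative, a minimal sketch of replacing a recursive tree walk with a queue and a `while` loop. All names here (`visit_iterative`, the `children` field) are hypothetical stand-ins, not styler's actual code:

```r
# Hypothetical sketch: an explicit-queue traversal instead of recursion.
# `fun` is applied to each node; children are appended to the queue, so the
# call stack stays flat and profiler samples attribute time to `fun` directly.
visit_iterative <- function(root, fun) {
  queue <- list(root)
  while (length(queue) > 0) {
    node <- queue[[1]]
    queue <- queue[-1]
    fun(node)
    queue <- c(queue, node$children)
  }
}
```

With a flat loop like this, each profiler sample contains only one frame of the traversal, which is the same effect the call-stack editing would achieve after the fact.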

@lorenzwalthert
Collaborator

lorenzwalthert commented Mar 25, 2021

The recursive calls are time consuming, yes. In particular, there are two important recursions:

  • building the nested parse table in compute_parse_data_nested()
  • applying the transformer rules in apply_transformers()

I can't remember exactly, but I think they take up a similar amount of time. Recently, two important speed improvements were made (speed change on unseen example code, benchmarked with the {here} package): #739 removed unnecessary sorting in merge() inside a recursion, since we sort right afterwards anyway (-4%), and in #711, transformers that are never used are removed before styling (-10%).

That said, your proposed approach sounds interesting. I can confirm that it's hard for me to understand where the bottlenecks are. I have already spent quite some time optimising R code, though maybe not very systematically, and I think the cost currently outweighs the benefit for me; time is better spent on parallelisation at the moment.

The latest release (1.4.0) introduced speed improvements of about 25%+ with the few changes described in NEWS.md, but edge case fixes and new features slowed things down again in this release, leaving us with roughly a 15% difference. Because minor changes were always so hard to measure, I built {touchstone}: now every PR is benchmarked, and on merge into master we also benchmark against the current release. For all new code, I tried to write it as efficiently as possible while keeping it maintainable and understandable. You can find the speed implications under the comment PR step in the benchmarking workflow:
[screenshot: touchstone benchmark results posted as a PR comment]

I am not familiar enough with {later} to comment on whether or not we should use it. I just think moving large parts of the code base into C++ would force me to learn C++, which I want to avoid xD.

@IndrajeetPatil
Collaborator

@lorenzwalthert If this is still up for grabs, I would be happy to work on trying to improve performance using C++ (via {cpp11} package).

I just think moving large parts of the code base into C++ would force me to learn C++, which I want to avoid xD.

LOL, I have the opposite problem where I have learned a fair bit of C++, but haven't found a "real" project to contribute to.

I should mention though that I'd be slow to implement this because I can work on FOSS projects only during free time, and not during office hours.

Lemme know if that works for you, and I can get started by creating a fork. :)

@lorenzwalthert
Collaborator

Thanks @IndrajeetPatil, I really appreciate your offer. Using C++ code in {styler} is quite a big decision that comes with a number of implications, so I think this needs to be well thought out. As @krlmlr said, we first need to identify the bottlenecks before starting implementing C++ functionality. Also, I would not be able to review your pull requests, we'd have to ask @krlmlr to do that. Would you have time to review these PRs @krlmlr?

That being said, would you want to start with the profiling as Kirill suggested?

@IndrajeetPatil
Collaborator

would you want to start with the profiling as Kirill suggested?

On it!

Also, I would not be able to review your pull requests, we'd have to ask @krlmlr to do that

Yes, I would really appreciate if he could do that! :)

@krlmlr
Member Author

krlmlr commented Mar 17, 2022

Thanks. I agree profiling should be the first step.

There are four steps:

  • Parsing (done by base R, should be fast enough)
  • Nesting of parse data (compute_parse_data_nested())
  • Walking the parse data (apply_transformers() and the transformers)
  • Composition of the styled text

Optimizing the third part will be hardest. Do we have a good understanding which parts take how long for a typical file?
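A rough sketch of how one might time the stages in isolation. Note that compute_parse_data_nested() is internal to styler and its exact signature may differ between versions; the file path is illustrative:

```r
text <- readLines("R/nest.R")
styler::cache_deactivate()  # caching must be off, as noted above

# Stage 2: nesting of the parse data (internal function, signature may vary)
system.time(pd_nested <- styler:::compute_parse_data_nested(text))

# End-to-end run for comparison; the difference is roughly stages 3 and 4
system.time(styler::style_text(text))
```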

Of course caching should be off in all experiments.

@lorenzwalthert
Collaborator

I agree @krlmlr, and I added the names of the relevant functions to your post.

In addition, I want to highlight that styling rules that we know cannot affect the code are removed before the transformers are applied. Which tokens affect which rules is specified via the transformers_drop argument of create_style_guide(). See #711 and the docs of create_style_guide().
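For readers unfamiliar with that mechanism, a hedged sketch of the shape of such a specification. The rule and token names below are purely illustrative, and the exact interface should be checked against the create_style_guide() documentation:

```r
# Illustrative only: declare that a space rule can be dropped when the
# file contains none of the tokens it acts on. Names are made up here.
drop_spec <- styler::specify_transformers_drop(
  spaces = list(
    remove_space_after_opening_paren = c("'('")
  )
)

my_guide <- styler::create_style_guide(
  space = list(remove_space_after_opening_paren = identity),
  transformers_drop = drop_spec,
  style_guide_name = "example-guide",
  style_guide_version = "0.0.1"
)
```

Before styling, styler can then skip remove_space_after_opening_paren entirely for files without a "'('" token, which is where the -10% from #711 comes from.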

@IndrajeetPatil
Collaborator

Thank you both for pointers on what might be the best places to start digging into this.

Will keep you posted as I go through them.

@krlmlr
Member Author

krlmlr commented Aug 21, 2022

I took another look at styling performance.

With the two new PRs combined, and when replacing %>% with |>, we now finally get usable stack traces when profiling. With proffer::pprof(filter.callframes = TRUE, line.profiling = TRUE), we get a helpful call graph. With profvis() we would need to post-process the stack traces to remove recursive calls, which should be easy to do with {profile}.

For a small-ish file from this repo, R/nest.R, the overhead (nesting and composition of the styled text) takes the lion's share of the time. Composition looks like an easy target, optimizing the nesting will be a bit trickier.

For easier profiling, it seems useful to specifically isolate the time spent styling from the overhead, by temporarily injecting calls to Rprof(...) and Rprof(NULL) in the appropriate places.
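A minimal sketch of that isolation idea. The helper name and output file are arbitrary; in practice the Rprof() / Rprof(NULL) pair would be injected directly into styler's internals around the region of interest:

```r
# Start the profiler just before the code of interest and stop it right
# after, so the .Rprof file contains only samples from that region.
profile_region <- function(expr, out = "Rprof-region.out") {
  Rprof(out, filter.callframes = TRUE)
  on.exit(Rprof(NULL))
  force(expr)  # evaluate the expression while the profiler is running
}
```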

By and large, a whole lot of time seems to be spent in $<-.tbl_df() . Could we get noticeable results by consistently using unclass(pd_flat) and new_tibble(pd_flat)?

@krlmlr
Member Author

krlmlr commented Aug 21, 2022

Current state of affairs: https://rpubs.com/krlmlr/styler-prof-1 .

This requires a cleanup of Rprof.out and a tweak to visit_one(), will share soon.

Surprisingly, I didn't have to convert pipes (magrittr -> native) to get this.

@krlmlr
Member Author

krlmlr commented Aug 21, 2022

With #980 and #982 we require only a cleanup of Rprof.out to get https://rpubs.com/krlmlr/styler-prof-2:

```r
text <- readLines("~/git/R/styler/R/nest.R")

styler::cache_deactivate()
#> Deactivated cache.

Rprof(filter.callframes = TRUE, line.profiling = FALSE)
for (i in 1:10) styler::style_text(text)
#> Warning: Invalid stylerignore sequences found, potentially ignoring some of the markers set.
#> See `help("stylerignore", "styler")`.
Rprof(NULL)

prof <- profile::read_rprof("Rprof.out")
prof$samples$locations <- lapply(prof$samples$locations, vctrs::vec_unique)
profile::write_rprof(prof, "Rprof-unique.out")

profvis::profvis(prof_input = "Rprof-unique.out")
```

Created on 2022-08-21 by the reprex package (v2.0.1)

@krlmlr
Member Author

krlmlr commented Aug 21, 2022

The profiling with %>% replaced by |> might be a tad cleaner, I'm not sure:

https://rpubs.com/krlmlr/styler-prof-3

After merging all relevant PRs, I believe a speed improvement by a factor of 2 is achievable. Perhaps more, perhaps a bit less.

@IndrajeetPatil
Collaborator

By and large, a whole lot of time seems to be spent in $<-.tbl_df() . Could we get noticeable results by consistently using unclass(pd_flat) and new_tibble(pd_flat)?

I think this approach is definitely worth a try:

```r
library(tibble)

df1 <- iris
df2 <- df3 <- as_tibble(iris)

bench::mark(
  "data.frame" = df1$x <- "x",
  "tibble" = df2$x <- "x",
  "unclass tibble" = {
    df3 <- unclass(df3)
    df3$x <- "x"
    new_tibble(df3)
  },
  check = FALSE
)[1:5]
#> # A tibble: 3 × 5
#>   expression          min   median `itr/sec` mem_alloc
#>   <bch:expr>     <bch:tm> <bch:tm>     <dbl> <bch:byt>
#> 1 data.frame       2.79µs   3.28µs   284932.    1.22KB
#> 2 tibble          19.35µs  20.54µs    47633.  165.32KB
#> 3 unclass tibble   4.55µs   5.04µs   193397.        0B
```

Created on 2022-09-19 with reprex v2.0.2

@lorenzwalthert
Collaborator

lorenzwalthert commented Sep 20, 2022

Ok that is indeed a big difference. Can’t we fix that at the root and make tibble assignment faster?

@lorenzwalthert
Collaborator

@krlmlr ?

@IndrajeetPatil
Collaborator

Another change I am going to make is using `if`/`else` instead of `ifelse()` when the test condition is of length 1.

```r
mark <- 25

bench::mark(
  "if + else" = if (mark >= 40) {
    "pass"
  } else {
    "fail"
  },
  "ifelse" = ifelse(mark >= 40, "pass", "fail"),
  "dplyr-ifelse" = dplyr::if_else(mark >= 40, "pass", "fail"),
  time_unit = "ns"
)
#> # A tibble: 3 × 6
#>   expression     min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>   <dbl>  <dbl>     <dbl> <bch:byt>    <dbl>
#> 1 if + else       0    41.0 19230132.        0B      0  
#> 2 ifelse        533.  615.   1501476.        0B      0  
#> 3 dplyr-ifelse 8405. 9102.    107454.     8.7MB     86.0
```

Created on 2022-09-20 with reprex v2.0.2

@krlmlr
Member Author

krlmlr commented Sep 20, 2022

I'd only replace ifelse() if benchmarks suggest that this is useful.

We want tibble subsetting to be fast, but it's a larger project. We can:

  • fix in styler now, while making it easy to revert
  • fix tibble
  • revert

@krlmlr
Member Author

krlmlr commented Sep 24, 2022

A quick heads-up: I tested with tidyverse/tibble#1469, which reverts to regular data.frame subsetting.

  • All styler tests still pass
  • For styler's nest.R, style_text(scope = "none") drops from 1.03s to 812ms, and style_text() drops from 1.6s to 1.1s
  • Profiling shows that a lot of time is still spent in e.g. $<-.data.frame or similar

Also, there are major obstacles to making tibble subsetting as fast as data frame subsetting for the $ operator: $ is a primitive and enjoys special treatment in base R, whereas tibble needs to implement an S3 method to emit the "unknown or uninitialized column" warning. We can work around this by overriding $, see tidyverse/tibble#1470 for a proof of concept.
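The core idea can be sketched in a few lines. This is not tibble's actual implementation (see tidyverse/tibble#1470 for that); "my_tbl" is a made-up stand-in class:

```r
# Sketch: dispatch $ to the fast primitive .subset2(), paying for the S3
# machinery only to emit the missing-column warning that tibble guarantees.
`$.my_tbl` <- function(x, name) {
  out <- .subset2(x, name)  # list-style extraction, no method dispatch
  if (is.null(out)) {
    warning("Unknown or uninitialised column: `", name, "`.", call. = FALSE)
  }
  out
}

df <- structure(list(a = 1:3), class = c("my_tbl", "data.frame"))
df$a  # fast path; df$b would warn and return NULL
```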

Suggested course of action:

  • Replace tibbles with data frames or "tbl" objects in our nested data
  • Profile and rework style_text(scope = "none"), which will then become the major bottleneck
  • Use list subsetting/subassignment instead of d.f. subset/subassign in select cases, depending on profiling results

🐌 -> 🏎️

@MichaelChirico
Contributor

If you're doing a lot of in-place overwrites and it's a bottleneck, it may be worth considering data.table as a backend. Happy to advise/author some work towards that.

@krlmlr
Member Author

krlmlr commented Sep 25, 2022

A move to data.table seems like a paradigm shift -- from immutable data to one overarching mutable "parse data" structure. To me, this looks like a major refactoring, touching all styling functions, with uncertain benefits. What's the estimated effort for moving to data.table, compared to using data.frames and doing a more efficient nesting + unnesting?

@MichaelChirico
Contributor

looks like a major refactoring

That's likely true :)

And ah, I didn't read the above closely enough. I thought tidyverse/tibble#1469 was a working branch already shifting styler to data.frames. I think it'll be easier to go from data.frames-as-backend to data.tables-as-backend.

@sebffischer

FWIW: I work a lot with R6 classes, and these seem especially problematic for styler's speed, presumably because there are so many nested functions/objects.
