Replace tibbles with data frames to improve performance #1007

IndrajeetPatil · 2022-09-26T11:08:26Z

~~Need to use continuous benchmarking, so not converting this to a draft.~~

codecov-commenter · 2022-09-26T11:12:49Z

Codecov Report

Merging #1007 (fa98f9c) into main (1f4437b) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head fa98f9c differs from pull request most recent head 94e30f8. Consider uploading reports for the commit 94e30f8 to get more accurate results

@@           Coverage Diff           @@
##             main    #1007   +/-   ##
=======================================
  Coverage   91.14%   91.14%           
=======================================
  Files          46       46           
  Lines        2664     2665    +1     
=======================================
+ Hits         2428     2429    +1     
  Misses        236      236

Impacted Files	Coverage Δ
R/nested-to-tree.R	`92.85% <ø> (ø)`
R/style-guides.R	`99.43% <ø> (ø)`
R/stylerignore.R	`100.00% <ø> (ø)`
R/token-define.R	`66.66% <ø> (ø)`
R/ui-styling.R	`100.00% <ø> (ø)`
R/compat-dplyr.R	`92.85% <100.00%> (ø)`
R/compat-tidyr.R	`100.00% <100.00%> (ø)`
R/nest.R	`100.00% <100.00%> (ø)`
R/parse.R	`88.09% <100.00%> (ø)`
R/token-create.R	`96.92% <100.00%> (ø)`
... and 3 more

github-actions · 2022-09-26T11:42:16Z

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 7c691c9 is merged into main:

❗🐌 cache_applying: 27.1ms -> 31.6ms [+15.61%, +16.95%]
🚀 cache_recording: 1.26s -> 875ms [-31.19%, -30%]
🚀 without_cache: 3.33s -> 2.2s [-34.28%, -33.43%]

Further explanation regarding interpretation and methodology can be found in the documentation.

IndrajeetPatil · 2022-09-26T11:47:25Z

@lorenzwalthert, @krlmlr That's quite the bump in performance when switching to data frames instead of tibbles as our data structure of choice! 😮
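For a rough feel of where such a difference could come from, here is a minimal sketch that times only object construction. It is not the benchmark the CI bot runs above, and the column names and sizes are made up for illustration. It assumes the {bench}, {tibble}, and {vctrs} packages are installed.

```r
# Illustrative micro-benchmark: constructing a small tibble vs. a bare data frame.
# The columns below are hypothetical and not taken from styler's parse tables.
res <- bench::mark(
  tibble = tibble::tibble(id = 1:10, text = letters[1:10]),
  data_frame = vctrs::data_frame(id = 1:10, text = letters[1:10]),
  check = FALSE # the two results have different classes, so skip the equality check
)

# One row per expression; timings vary by machine and package versions.
res[, c("expression", "median", "mem_alloc")]
```

Construction cost is only one ingredient; the end-to-end numbers reported by the bot also include subsetting and binding, so this sketch should not be expected to reproduce them.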

lorenzwalthert · 2022-09-26T16:04:18Z

Wow yes @IndrajeetPatil and without much code change even! Well done. Now the only hurdle is to make it pass on old releases...

MichaelChirico · 2022-09-26T16:29:03Z

wow, quite impressive speed improvement per LoC change!!

krlmlr

Nice speedup! Can we encapsulate the choice of data structure in helper functions? We could add e.g. new_styler_df() and styler_df() that use vctrs::new_data_frame() and vctrs::data_frame() under the hood. This means that we could later change the underlying data structure with less effort.

Do we still need to import tibble?
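A minimal sketch of what such wrappers might look like, assuming {vctrs} is installed. The function names come from the suggestion above; the signatures and the example columns are assumptions for illustration and may differ from what ends up in the package.

```r
# Hypothetical thin wrappers that hide the choice of data structure.

# Low-level constructor: takes a named list of equal-length columns.
new_styler_df <- function(x) {
  vctrs::new_data_frame(x)
}

# Convenience constructor: takes the columns as named arguments.
styler_df <- function(...) {
  vctrs::data_frame(...)
}

# Example usage with made-up columns.
tokens <- styler_df(token = c("SYMBOL", "LEFT_ASSIGN"), text = c("x", "<-"))
same_tokens <- new_styler_df(list(token = c("SYMBOL", "LEFT_ASSIGN"), text = c("x", "<-")))
```

With this layer in place, switching to another backing structure later would mean editing only these two functions rather than every call site.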

github-actions · 2022-09-26T16:41:49Z

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 63a27d0 is merged into main:

❗🐌 cache_applying: 37.9ms -> 41.6ms [+6.86%, +12.7%]
🚀 cache_recording: 1.91s -> 1.28s [-34.03%, -31.47%]
🚀 without_cache: 5.24s -> 3.33s [-37.36%, -35.55%]

Further explanation regarding interpretation and methodology can be found in the documentation.

IndrajeetPatil · 2022-09-27T05:44:04Z

> Nice speedup! Can we encapsulate the choice of data structure in helper functions? We could add e.g. new_styler_df() and styler_df() that use vctrs::new_data_frame() and vctrs::data_frame() under the hood. This means that we could later change the underlying data structure with less effort.

Good idea. Done!

> Do we still need to import tibble?

Only for tibble::tribble(), which we use in a few places. But if we want to drop {tibble} from Imports, removing those calls should be easy. Should I do that?

krlmlr · 2022-09-27T05:56:23Z

Let's keep the tribble() call for now.

github-actions · 2022-09-27T05:58:09Z

This is how benchmark results would change (along with a 95% confidence interval in relative change) if a0d6c20 is merged into main:

❗🐌 cache_applying: 26.8ms -> 30.8ms [+13.87%, +15.99%]
🚀 cache_recording: 1.25s -> 824ms [-34.47%, -33.66%]
🚀 without_cache: 3.32s -> 2.08s [-37.6%, -36.8%]

Further explanation regarding interpretation and methodology can be found in the documentation.

github-actions · 2022-09-27T06:54:56Z

This is how benchmark results would change (along with a 95% confidence interval in relative change) if fa98f9c is merged into main:

❗🐌 cache_applying: 33.8ms -> 38ms [+8.2%, +16.44%]
🚀 cache_recording: 1.92s -> 1.19s [-41.8%, -34.25%]
🚀 without_cache: 4.31s -> 2.58s [-40.78%, -39.48%]

Further explanation regarding interpretation and methodology can be found in the documentation.

lorenzwalthert · 2022-09-27T07:18:05Z

Also, it seems removing {tibble} does not reduce recursive dependencies:

library(magrittr)

# Packages listed under Imports in DESCRIPTION
deps <- desc::desc_get_deps() %>%
  dplyr::filter(type == 'Imports') %>%
  dplyr::pull(package)

# Recursive dependencies of all imports
# (renv_package_dependencies() is an unexported renv internal, hence `:::`)
recursive_deps_before <- purrr::map(deps, ~names(renv:::renv_package_dependencies(.x))) %>%
  unlist() %>%
  unique()

# Same computation with {tibble} dropped from the imports
deps_without_tibble <- setdiff(deps, 'tibble')

recursive_deps_after <- purrr::map(deps_without_tibble, ~names(renv:::renv_package_dependencies(.x))) %>%
  unlist() %>%
  unique()

waldo::compare(recursive_deps_before, recursive_deps_after)
#> ✔ No differences

Created on 2022-09-27 by the reprex package (v2.0.1)

This is because we use {rematch2} (in only one place, so it can probably be worked around somehow), which in turn depends on {tibble}. Removing that {tibble} dependency was suggested in r-lib/rematch2#14, but @krlmlr was not fully on board with the suggested implementation. Given the development over the last 2 years and the additional recursive dependencies tibble has picked up, I think removing that dependency would now be even more beneficial.
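The same kind of check can be sketched with base R only, avoiding renv internals. This is an illustration under assumptions: the `imports` vector is a hypothetical stand-in for the real Imports field, and the result depends on which packages (and versions) are installed locally.

```r
# Use the locally installed packages as the dependency database (no network access).
installed <- utils::installed.packages()

# Recursive strong dependencies (Depends, Imports, LinkingTo) of a set of packages.
recursive_deps <- function(pkgs) {
  deps <- tools::package_dependencies(
    pkgs,
    db = installed,
    which = c("Depends", "Imports", "LinkingTo"),
    recursive = TRUE
  )
  unique(unlist(deps, use.names = FALSE))
}

# Hypothetical stand-in for the package's Imports field.
imports <- c("rematch2", "tibble")

before <- recursive_deps(imports)
after <- recursive_deps(setdiff(imports, "tibble"))

# Packages that would disappear from the recursive set if {tibble} were dropped.
setdiff(before, after)
```

If {rematch2} is installed and still imports {tibble}, the final `setdiff()` should come back empty, which would match the "No differences" result above.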

lorenzwalthert

Great job.

IndrajeetPatil · 2022-09-27T07:45:37Z

I think this is a big enough improvement to consider creating a new CRAN release?

IndrajeetPatil · 2022-10-11T04:36:09Z

> I think this is a big enough improvement to consider creating a new CRAN release?

Any thoughts, @lorenzwalthert and @krlmlr?

We also need to get rid of NOTEs in checks: https://cran.r-project.org/web/checks/check_results_styler.html
Let's not wait to get an email about this 😬

lorenzwalthert · 2022-10-11T07:25:26Z

Yes, I agree. Do you want to make a PR to main similar to #930, plus using fledge? If not, I can do it, but not this week. Once all checks are green, I can submit it.

lorenzwalthert · 2022-10-11T07:26:01Z

I already bumped the version recently and tried to organise the news items a bit.

IndrajeetPatil · 2022-10-12T16:01:52Z

> If not, I can do it, but not this week. Once all checks are green, I can submit it.

@lorenzwalthert I can wait! :)

IndrajeetPatil added 2 commits September 26, 2022 13:04

as_tibble -> as.data.frame

91b7086

new_tibble -> data.frame

916a421

github-actions bot and others added 2 commits September 26, 2022 11:14

pre-commit

922ce62

Update utils.R

63ad83b

Update ui-caching.R

17a66db

IndrajeetPatil requested a review from lorenzwalthert September 26, 2022 16:19

krlmlr reviewed Sep 26, 2022

krlmlr mentioned this pull request Sep 27, 2022

New C callables to support tibble r-lib/vctrs#1679

Open

10 tasks

encapsulate in wrappers around vctrs functions

e81acb6

IndrajeetPatil changed the title ~~Check for performance improvements with data.frame~~ Replace tibbles with data frames to improve performance Sep 27, 2022

IndrajeetPatil and others added 2 commits September 27, 2022 07:40

Add vctrs to DESCRIPTION

035de78

pre-commit

f0de7b6

IndrajeetPatil added 2 commits September 27, 2022 07:58

Update utils.R

1d40618

Update compat-dplyr.R

6313b71

IndrajeetPatil requested a review from krlmlr September 27, 2022 06:09

lorenzwalthert reviewed Sep 27, 2022

R/token-define.R (review thread resolved)

lorenzwalthert reviewed Sep 27, 2022

R/utils.R (review thread resolved)

IndrajeetPatil added 2 commits September 27, 2022 08:25

Update detect-alignment.Rmd

60ff313

Don't import entire tibble package

94e30f8

IndrajeetPatil requested a review from lorenzwalthert September 27, 2022 07:10

lorenzwalthert approved these changes Sep 27, 2022

IndrajeetPatil merged commit 1a8bab3 into r-lib:main Sep 27, 2022

IndrajeetPatil deleted the perf_dataframe branch September 27, 2022 07:23

IndrajeetPatil mentioned this pull request Sep 27, 2022

Simplify styler_df() signature #1009

Merged

krlmlr mentioned this pull request Sep 28, 2022

Remove rematch2 and tibble dependencies #1010

Closed

IndrajeetPatil added a commit that referenced this pull request Sep 28, 2022

Simplify styler_df() signature (#1009)

35519b9

* Get rid of unnecessary `.name_repair` arg
  This is generating warnings. Follow-up on #1007
* make the wrapper even thinner
