POC: Set focus to new and used columns after mutate() #6252
base: main
Hmmmm I like the way this looks, but adding the extra attribute makes a lot of tests fail. I'm confident we can work around this for dplyr itself (see current hack in |
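To illustrate why an extra attribute tends to break equality-based tests, here is a minimal sketch in base R. It uses a toy data frame rather than dplyr's actual test suite; the attribute name pillar_focus is taken from the reprex output in this thread, and everything else (variable names, data) is made up for illustration.

```r
# Toy data; not taken from dplyr's tests.
df <- data.frame(air_time = c(227, 160), distance = c(1400, 1089))

# Expected result without any focus information.
plain <- transform(df, speed = air_time / distance)

# Same data, but tagged with the attribute this PR adds.
tagged <- plain
attr(tagged, "pillar_focus") <- c("speed", "air_time", "distance")

# Strict comparisons now see a difference, although the data are the same.
same_before <- identical(plain, tagged)              # FALSE
all_equal_before <- isTRUE(all.equal(plain, tagged)) # FALSE

# Dropping the attribute restores equality.
attr(tagged, "pillar_focus") <- NULL
same_after <- identical(plain, tagged)               # TRUE
```

Any expectation built on identical() or on all.equal() with attribute checking would need the attribute stripped or ignored before comparing.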
Crazy thought: what if we reversed responsibility here? If each Otherwise: can we set |
Concrete example of what I don't particularly love about focus columns. I'm explicitly requesting that the new column come before
library(dplyr)
nycflights13::flights %>%
mutate(speed = air_time / distance, .before = time_hour)
#> # A tibble: 336,776 × 20
#> # Focus columns: air_time, distance, speed
#> year month day dep_time sched_…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ air_time distance speed
#> <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 2013 1 1 517 515 2 830 819 11 227 1400 0.162
#> 2 2013 1 1 533 529 4 850 830 20 227 1416 0.160
#> 3 2013 1 1 542 540 2 923 850 33 160 1089 0.147
#> 4 2013 1 1 544 545 -1 1004 1022 -18 183 1576 0.116
#> 5 2013 1 1 554 600 -6 812 837 -25 116 762 0.152
#> 6 2013 1 1 554 558 -4 740 728 12 150 719 0.209
#> 7 2013 1 1 555 600 -5 913 854 19 158 1065 0.148
#> 8 2013 1 1 557 600 -3 709 723 -14 53 229 0.231
#> 9 2013 1 1 557 600 -3 838 846 -8 140 944 0.148
#> 10 2013 1 1 558 600 -2 753 745 8 138 733 0.188
#> # … with 336,766 more rows, 8 more variables: carrier <chr>, flight <int>, tailnum <chr>,
#> # origin <chr>, dest <chr>, hour <dbl>, minute <dbl>, time_hour <dttm>, and abbreviated
#> # variable names ¹sched_dep_time, ²dep_delay, ³arr_time, ⁴sched_arr_time, ⁵arr_delay
I also don't really love that it makes the column look like it's in the 12th position, like I could access it with |
Also why is |
Thanks for your feedback. I need to look into the confused column order, something doesn't seem to be right. I have thought about adding a vertical separator if columns are omitted, perhaps
|
I'm not sure I follow, can you mock an example? |
The column order is correct. We have three focus columns as a result of the operation: the two input columns, and the output column. The focus columns are shown in the header. Is this helpful at all? I have tweaked the reprex to show what the vertical separator could look like. Should we treat focus columns differently in non-interactive mode, e.g., ignore them for now?
library(conflicted)
library(dplyr)
options(max.print = 20)
nycflights13::flights %>%
mutate(speed = air_time / distance, .before = time_hour) %>%
attributes()
#> $names
#> [1] "year" "month" "day" "dep_time"
#> [5] "sched_dep_time" "dep_delay" "arr_time" "sched_arr_time"
#> [9] "arr_delay" "carrier" "flight" "tailnum"
#> [13] "origin" "dest" "air_time" "distance"
#> [17] "hour" "minute" "speed" "time_hour"
#>
#> $row.names
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
#> [ reached getOption("max.print") -- omitted 336756 entries ]
#>
#> $class
#> [1] "tbl_df" "tbl" "data.frame"
#>
#> $pillar_focus
#> [1] "speed" "air_time" "distance"
nycflights13::flights %>%
mutate(speed = air_time / distance, .before = time_hour)
#> # A tibble: 336,776 × 20
#> # Focus columns: speed, air_time, distance
#> year month day dep_time sched_de…¹ dep_d…² arr_t…³|air_time distance speed
#> <int> <int> <int> <int> <int> <dbl> <int>| <dbl> <dbl> <dbl>
#> 1 2013 1 1 517 515 2 830 227 1400 0.162
#> 2 2013 1 1 533 529 4 850 227 1416 0.160
#> 3 2013 1 1 542 540 2 923 160 1089 0.147
#> 4 2013 1 1 544 545 -1 1004 183 1576 0.116
#> 5 2013 1 1 554 600 -6 812 116 762 0.152
#> 6 2013 1 1 554 558 -4 740 150 719 0.209
#> 7 2013 1 1 555 600 -5 913 158 1065 0.148
#> 8 2013 1 1 557 600 -3 709 53 229 0.231
#> 9 2013 1 1 557 600 -3 838 140 944 0.148
#> 10 2013 1 1 558 600 -2 753 138 733 0.188
#> # … with 336,766 more rows, 10 more variables: sched_arr_time <int>,
#> # arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>, origin <chr>,
#> # dest <chr>, hour <dbl>, minute <dbl>, time_hour <dttm>, and abbreviated
#> # variable names ¹sched_dep_time, ²dep_delay, ³arr_time
Created on 2022-08-28 by the reprex package (v2.0.1) |
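The difference between where a focus column is displayed and where it is stored can be checked directly from the attributes. The sketch below uses base R only and a one-row stand-in data frame with the same column names as in the attributes() output above; the placeholder values and the variable names are assumptions for illustration, not part of pillar's API.

```r
# Stand-in for the mutated flights tibble: same column order, dummy values.
cols <- c("year", "month", "day", "dep_time", "sched_dep_time", "dep_delay",
          "arr_time", "sched_arr_time", "arr_delay", "carrier", "flight",
          "tailnum", "origin", "dest", "air_time", "distance", "hour",
          "minute", "speed", "time_hour")
df <- as.data.frame(setNames(as.list(seq_along(cols)), cols))
attr(df, "pillar_focus") <- c("speed", "air_time", "distance")

# Look up the stored position of each focus column.
focus <- attr(df, "pillar_focus")
positions <- setNames(match(focus, names(df)), focus)
positions
#>    speed air_time distance
#>       19       15       16

# The 12th stored column is not the one printed in 12th place.
twelfth <- names(df)[12]
twelfth
#> [1] "tailnum"
```

So speed sits at position 19, directly before time_hour as requested by .before, even though the first printout in this thread shows it as the 12th visible column.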
Needs r-lib/pillar#549.
In the example below, speed, air_time and distance are shown because they played a role in the mutate(). Normally none of these columns are visible.
The focus columns are highlighted with an underline; this formatting can't be shown in a reprex.
Created on 2022-04-30 by the reprex package (v2.0.1)