factor levels order is not retained #104

Open
jangorecki opened this issue Nov 3, 2019 · 7 comments · May be fixed by #106

Comments

@jangorecki

jangorecki commented Nov 3, 2019

Factors are handy when you want strings in a specific order, and plotting libraries and functions commonly use them that way. I don't see a reason why a pivot table should behave differently.
The example below shows that the order of levels is ignored. This feature request is about ordering row and column entries according to the order of the factor levels.

library(rpivotTable)
df = data.frame(name = factor(c("b","a","b","b","a"), levels=c("b","a")),
                grp = factor(c("x","x","y","y","y"), levels=c("y","x")),
                val = 1:5)
rpivotTable(df,
            rows = "name",
            cols = "grp",
            aggregatorName = "Average",
            vals = "val")

Factor levels could automatically populate the sorters argument.

@nicolaskruchten
Collaborator

This would have to be passed to the JS layer via sorters.
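A minimal sketch of what "passing levels to the JS layer" could look like for the `name` column from the example above. It only builds the JavaScript string; `$.pivotUtilities.sortAs` is the helper that PivotTable.js exposes for fixed orderings. The variable names `js_levels` and `sorter_js` are made up for illustration, and the assumption that the resulting string is handed to rpivotTable through its sorter-style argument should be checked against the installed version.

```r
# Factor with a deliberately non-alphabetical level order, as in the report.
name <- factor(c("b", "a", "b", "b", "a"), levels = c("b", "a"))

# Quote each level and join them into a JavaScript array literal body.
# Assumption: levels contain no double quotes or backslashes; those would
# need escaping before being embedded in JavaScript.
js_levels <- paste0('"', levels(name), '"', collapse = ", ")

# JavaScript sorter: for attribute "name", sort values in level order.
sorter_js <- sprintf(
  'function(attr) { var sortAs = $.pivotUtilities.sortAs; if (attr == "name") { return sortAs([%s]); } }',
  js_levels
)

cat(sorter_js, "\n")
```

Attributes without a matching branch fall through and return nothing, so PivotTable.js keeps its default ordering for them.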

@rlavelli

rlavelli commented Feb 19, 2020

Hello, I've tried the solution proposed in #106, but in my case it still won't give the correct result. See my example:

set.seed(123)
library(dplyr)
dat <- data.frame(
  x = rnorm(30)*10,
  y = rnorm(30)*10
) %>% 
  mutate(x_cut = cut(x,5),
         y_cut = cut(y,5))

# desired ordering
# x order: (-23.9,-15.2] (-15.2,-6.58] (-6.58,2.06] (2.06,10.7] (10.7,19.4]
# y order: (-19.9,-11.9] (-11.9,-3.97] (-3.97,3.96] (3.96,11.9] (11.9,19.9]

rpivotTable(dat) # wrong order, no sorter

# solution
make_sorters <- function(data) {
  if( !length(data) ) return(NULL)
  f <- sapply(data, is.factor)
  if( !sum(f) ) return(NULL)
  fcols <- names(data)[f]
  flvls <- sapply(fcols, function(fcol, data) levels(data[[fcol]]), data=data, simplify=FALSE)
  jslvls <- sapply(flvls, function(lvls) paste(paste0("\"",lvls,"\""), collapse=", "))
  sorter <- sprintf("if (attr == \"%s\") { return sortAs([%s]); }", fcols, jslvls)
  sprintf("function(attr) {\nvar sortAs = $.pivotUtilities.sortAs;\n%s\n}", paste(sorter, collapse="\n"))
}
s <- make_sorters(dat)

rpivotTable(dat, sorter = s) # wrong order, with sorter

I'm running:

packageVersion("rpivotTable")
[1] ‘0.3.0’

Thanks!

@jangorecki
Author

jangorecki commented Feb 21, 2020

@rlavelli your desired output does not seem to correspond to your data. Are you on an old R version with a different random number generator algorithm? Please include sessionInfo(). Also, the use of dplyr seems irrelevant here; it is best to strip out unrelated code to make sure it is not interfering with the process.

set.seed(123)
dat <- data.frame(
  x = rnorm(30)*10,
  y = rnorm(30)*10
)
dat$x_cut = cut(dat$x,5)
dat$y_cut = cut(dat$y,5)

will do.

Also, please provide the output of levels(dat$x_cut) and levels(dat$y_cut).
Did you actually install the factor-sorters branch? The branch changes some logic inside the package, so it is not only a matter of passing the sorter argument. Note that this argument does not exist in the branch; there is a sorters argument instead.

@rlavelli

rlavelli commented Feb 21, 2020

Thank you for the reply, and sorry to bother you.
No, I didn't install the actual branch; I was under the impression that the new make_sorters function would suffice. I'll try to give an update about that.

For the sake of completeness, here's the info you asked for (from a clean R session).

set.seed(123)
dat <- data.frame(
  x = rnorm(30)*10,
  y = rnorm(30)*10
)
dat$x_cut = cut(dat$x,5)
dat$y_cut = cut(dat$y,5)

levels(dat$x_cut)
# [1] "(-19.7,-12.2]" "(-12.2,-4.65]" "(-4.65,2.86]"  "(2.86,10.4]"   "(10.4,17.9]"  
levels(dat$y_cut)
# [1] "(-15.5,-8.05]"  "(-8.05,-0.617]" "(-0.617,6.82]"  "(6.82,14.3]"    "(14.3,21.7]"   

# sessionInfo()
# R version 3.6.1 (2019-07-05)

I was actually able to fix the sorting problem in my actual case by adding an increasing number before each cut label, like: "1 (-19.7,-12.2]", "2 (-12.2,-4.65]", "3 (-4.65,2.86]", "4 ...". It's not pretty, but it gives the correct result even without the use of sorter.
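For reference, the prefixing workaround can be sketched roughly as follows. `prefix_levels` is a hypothetical helper, not part of rpivotTable, and the levels are copied from the `levels(dat$x_cut)` output above rather than regenerated, so the snippet does not depend on the random number generator.

```r
# Levels copied from the levels(dat$x_cut) output quoted above.
lv <- c("(-19.7,-12.2]", "(-12.2,-4.65]", "(-4.65,2.86]",
        "(2.86,10.4]", "(10.4,17.9]")
x_cut <- factor(lv[c(3, 1, 5, 2, 4, 3)], levels = lv)

# Hypothetical helper: prepend the level index to every label so that plain
# string order matches level order.
prefix_levels <- function(f) {
  old <- levels(f)
  # Zero-pad the index so that "10" does not sort before "2" when there are
  # ten or more levels.
  idx <- formatC(seq_along(old), width = nchar(length(old)), flag = "0")
  levels(f) <- paste(idx, old)
  f
}

x_prefixed <- prefix_levels(x_cut)
levels(x_prefixed)
```

Only the labels change; the underlying integer codes of the factor stay the same, so the data itself is untouched.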

I'll try to install the full branch update and test it. Again, thank you.

@jangorecki
Author

Your workaround basically avoids the problem in the first place. As stated in this issue, alphabetical order is used instead of the order of levels, so adding a numeric prefix sidesteps the issue.
Note that your initial report included x order: (-23.9,-15.2] ..., which was probably generated with a different random seed. The updated one looks fine.
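To illustrate why these labels come out wrong, here is a small sketch comparing level order with plain string order for the levels quoted earlier in the thread. Byte-order sorting in R is used as a stand-in for the ordering applied on the JavaScript side; that is an assumption, and the actual default sort in PivotTable.js may differ in details.

```r
# Levels copied from the levels(dat$x_cut) output quoted earlier.
lv <- c("(-19.7,-12.2]", "(-12.2,-4.65]", "(-4.65,2.86]",
        "(2.86,10.4]", "(10.4,17.9]")

# method = "radix" sorts character vectors in the C locale (byte order),
# which keeps the result independent of the session locale.
alphabetical <- sort(lv, method = "radix")

print(alphabetical)
# "(-12.2,-4.65]" "(-19.7,-12.2]" "(-4.65,2.86]" "(10.4,17.9]" "(2.86,10.4]"

identical(alphabetical, lv)  # FALSE: string order is not level order
```

The labels are compared character by character, so "(-12.2" lands before "(-19.7" and "(10.4" before "(2.86", even though the intervals themselves run the other way.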

To easily install this branch, you can use the remotes or devtools package.

remotes::install_github("jangorecki/rpivotTable@factor-sorters")

Please report back if you still have a problem even when using this branch.

@jangorecki
Author

@rlavelli any news on whether the branch addresses your case?

@rlavelli

I'm sorry for the delay. I've tried the full branch and it works. Thank you!
