Replies: 1 comment
Data Structure
I think we're all in agreement that the primary data for a swimmers plot should be long! Nice! The long data structure is common for making these types of segmented lines, and it's a perfect match for our data. The symbols and points we're adding to the plot, however, are not commonly kept in a long format, and I believe that is our only point of disagreement.
The most common swimmers plot is one that is colored by treatment group, with symbols at the time of death, AEs, etc. In this case, the data set would be one line per subject and the code may end up looking something like this:
data |>
  ggswim(id = MRN, xend = time_last_followup, color = trt) +
  add_marker("time_to_death", symbol = "\u1234") +
  add_marker("time_to_ae", symbol = "\u9876")
We've got the most common use case covered here, and it's super simple for users.
In a somewhat more complex case, we may want to color the bars by on and off treatment (another very common plot). Typically, a user begins with a data set that is one line per subject, with a column for time to last followup and another for time to treatment stop. In that case, the code would look something like this:
data_long <- <code that takes a one line per subject data set and makes it one line per subject per treatment period>
# we now have a long data set and our original data set
# we don't need to require xstart, as I think it can be safely inferred from the data
# in most cases (e.g. the first period starts at time 0, the second starts at the end
# time of the previous period, etc.). But by adding this arg, it's general enough to
# use dates as well, or even relative times that are negative...anything!
data_long |>
  ggswim(id = id, xstart = start_time, xend = end_time, color = on_or_off_treatment) +
  add_marker(data, "time_to_death", symbol = "\u1234") +
  add_marker(data, "time_to_ae", symbol = "\u9876")
This structure of course also allows for a long data frame of markers you want to add to a figure (i.e. the multiple events case mentioned above).
Figure centering
I think we should chat about this in more detail, because I don't think I see the rationale. Isn't this something that would be handled by a quick call to |
Background
This discussion is intended to dive into the design choices and considerations we want to make so that a user is more likely to go "Swimmer plots? No problem!" instead of "Ugh, swimmer plots."
Data Structure
Currently ggswim() only works with the long "event stream" data structure shown in the README. This looks something like:
Where a subject/record can have multiple rows and a single "events" column comprises both lane colors and marker values.
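As a concrete illustration of that shape, here is a minimal sketch with made-up values. The column names (`subject_id`, `time`, `event`) and the event labels are assumptions for the example, not necessarily the ones used in the README.

```r
# Hypothetical long "event stream": one row per subject per event.
# All column names and values are invented for illustration.
events <- data.frame(
  subject_id = c(1, 1, 1, 2, 2, 2),
  time       = c(0, 2, 30, 0, 45, 60),
  event      = c("Infusion", "CRS", "Remission",
                 "Infusion", "Relapse", "Death")
)

# A subject appears on several rows, and the single `event` column holds
# both lane states (e.g. "Remission") and marker events (e.g. "Death").
rows_per_subject <- table(events$subject_id)
rows_per_subject
```

Adding another event for a subject only means adding another row, which is what makes this layout easy to extend.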
A wider subject-per-row structure could also be considered, essentially by pivoting wider by event, in which case the values represent time points:
This structure has several disadvantages, however.
NA.
So in my (SK's) opinion this means we should stick to the long event stream structure.
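To make the comparison concrete, the sketch below pivots a small invented event stream into the wide, subject-per-row layout with `tidyr::pivot_wider()`. The data and column names are assumptions for the example; the point is only to show where `NA` values come from.

```r
library(tidyr)

# Hypothetical long event stream (values invented for illustration)
events <- data.frame(
  subject_id = c(1, 1, 1, 2, 2),
  time       = c(0, 2, 30, 0, 45),
  event      = c("Infusion", "CRS", "Remission", "Infusion", "Relapse")
)

# One row per subject, one column per event type, cell value = time point
wide <- pivot_wider(events, names_from = event, values_from = time)
wide

# Every event a subject did not experience becomes an NA cell
n_missing <- sum(is.na(wide))
```

Each event type becomes its own column, so any subject who never had that event gets an `NA` there. If a subject had the same event more than once, `pivot_wider()` would also have to fall back to list-columns, which is another sign that the long layout fits this kind of data better.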
An important additional consideration is that in swimmer plots, just as in KM plots, "time" is relative to some other event such as an intervention. For example, above time is 0 on the day of CAR-T infusion. It would be useful to have a way to work with an event stream that has absolute times such as timestamps or dates:
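A rough sketch of what this could look like in base R follows. The dates are invented, and the helper's arguments (`id`, `time`, `event`, `reference`) and the `time_rel` output column are assumptions made for illustration; only the name `center_on()` comes from this discussion.

```r
# Hypothetical absolute event stream; all dates are invented
events <- data.frame(
  subject_id = c(1, 1, 1, 2, 2),
  date  = as.Date(c("2023-01-10", "2023-01-12", "2023-02-09",
                    "2023-03-01", "2023-04-15")),
  event = c("Infusion", "CRS", "Remission", "Infusion", "Relapse")
)

# Sketch of a centering helper (signature is an assumption).
# Assumes exactly one `reference` event per subject.
center_on <- function(data, id, time, event, reference) {
  ref_rows <- data[data[[event]] == reference, ]
  # Look up each row's subject-specific reference time
  ref_time <- ref_rows[[time]][match(data[[id]], ref_rows[[id]])]
  # Relative time in days since the reference event
  data$time_rel <- as.numeric(difftime(data[[time]], ref_time, units = "days"))
  data
}

centered <- center_on(events, id = "subject_id", time = "date",
                      event = "event", reference = "Infusion")
centered$time_rel
```

Here each subject's infusion date becomes day 0 and every other event is expressed in days since infusion. Events before the reference would simply get negative times.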
A possible solution could be a center_on() function which takes an absolute event stream tibble and creates relative times like so:
A possible function definition of center_on() could be:
ggswim() vs geom_swim_*()
One of the key pieces to making this package successful is folding it into existing ggplot2 syntax and tools. Currently a ggswim plot is made via:
Where instead, we could consider making this its own geom_* function that builds off of ggplot():
This has the benefits of being familiar to users, simple to use, and allowing easy inclusion of additional ggplot elements regardless of future updates to surrounding packages.
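As a very rough sketch of the geom-style idea, a lane layer could be a thin wrapper around `geom_segment()`. The function name `geom_swim_lane_sketch()`, the data, and the column names are all hypothetical and not part of the package; the `linewidth` argument assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)

# Hypothetical lane data: one row per subject per period (values invented)
lanes <- data.frame(
  subject_id = c("A", "A", "B"),
  start      = c(0, 10, 0),
  end        = c(10, 25, 18),
  status     = c("On treatment", "Off treatment", "On treatment")
)

# Illustrative wrapper, not the package's API: a lane layer built on
# geom_segment() with a thicker default line (needs ggplot2 >= 3.4.0).
geom_swim_lane_sketch <- function(mapping = NULL, data = NULL, ...,
                                  linewidth = 4) {
  geom_segment(mapping = mapping, data = data, ..., linewidth = linewidth)
}

# The layer is added with `+` like any other ggplot2 layer
p <- ggplot(lanes, aes(x = start, xend = end,
                       y = subject_id, yend = subject_id,
                       colour = status)) +
  geom_swim_lane_sketch() +
  labs(x = "Time", y = NULL)
```

Because the result is an ordinary layer, users could keep adding scales, themes, and other geoms in the usual way.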
One downside is this would likely require a preprocessing function that shapes the data before passing it to ggplot().
Shapes, Colors, and Emojis
When using shapes, are colors necessary? Or is this needlessly extra busy? And if not, how do we best control for what single color should be used in response to lane colors?
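One option for the single-color question, sketched with plain ggplot2 layers rather than the package's own functions: map lane color inside `aes()`, but set the marker color as a constant outside `aes()`, so markers differ only by shape. The data and column names are invented for the example, and `linewidth` assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)

# Hypothetical lanes and markers (values invented)
lanes <- data.frame(
  subject_id = c("A", "B"),
  start      = c(0, 0),
  end        = c(25, 18),
  trt        = c("Drug", "Placebo")
)
markers <- data.frame(
  subject_id = c("A", "A", "B"),
  time       = c(5, 25, 12),
  event      = c("AE", "Death", "AE")
)

p <- ggplot() +
  # Lane colour is mapped to treatment and gets a legend
  geom_segment(data = lanes,
               aes(x = start, xend = end,
                   y = subject_id, yend = subject_id, colour = trt),
               linewidth = 4) +
  # Marker colour is a constant set outside aes(); only shape varies by event
  geom_point(data = markers,
             aes(x = time, y = subject_id, shape = event),
             colour = "black", size = 3)
```

With this split, the color legend describes only the lanes and the shape legend describes only the markers, which may keep the figure less busy.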
Emojis, are they worth the struggle? They are attractive and make the package pop, but how likely are they to be used in an academic setting? Does the lack of support or the requirements needed by the user to get them to work detract from the package itself?