Develop a function to create ISO8601 date #10
Comments
Please have a look at |
Quick evaluation of
library(tidyverse)
library(parsedate)
#>
#> Attaching package: 'parsedate'
#> The following object is masked from 'package:readr':
#>
#> parse_date
value <-
c(
"12NOV2020",
"12 NOV 2020",
"12 NOV 202015:15",
"03 NOV 20203:30",
"UN DEC 201914:00",
"3 NOV 2020",
"NOV 2020",
"MAR 2019",
"UN APR 2018",
"UN UNK 2015",
"APR 2018",
"UK APR 2018",
"UN UNK 0",
"UN UN 2015",
"20190212",
"2019020202:20",
"2019-04-041045",
"20200505null"
)
result <-
c(
"2020-11-12",
"2020-11-12",
"2020-11-12T15:15",
"2020-11-03T03:30",
"2019-12--T14:00",
"2020-11-03",
"2020-11",
"2019-03",
"2018-04",
"2015",
"2018-04",
"2018-04",
NA_character_,
"2015",
"2019-02-12",
"2019-02-02T02:20",
"2019-04-04T10:45",
"2020-05-05"
)
spec <- tibble(value, result)
spec |>
dplyr::mutate(
parse_date = parsedate::parse_date(value),
parsedate_chr = as.character(parse_date)
)
#> # A tibble: 18 × 4
#> value result parse_date parsedate_chr
#> <chr> <chr> <dttm> <chr>
#> 1 12NOV2020 2020-11-12 2020-01-12 00:00:00 2020-01-12
#> 2 12 NOV 2020 2020-11-12 2020-11-12 00:00:00 2020-11-12
#> 3 12 NOV 202015:15 2020-11-12T15:15 2015-11-12 00:00:00 2015-11-12
#> 4 03 NOV 20203:30 2020-11-03T03:30 2030-11-03 00:00:00 2030-11-03
#> 5 UN DEC 201914:00 2019-12--T14:00 2023-12-01 00:00:00 2023-12-01
#> 6 3 NOV 2020 2020-11-03 2020-11-03 00:00:00 2020-11-03
#> 7 NOV 2020 2020-11 2020-11-01 00:00:00 2020-11-01
#> 8 MAR 2019 2019-03 2019-03-01 00:00:00 2019-03-01
#> 9 UN APR 2018 2018-04 2018-04-01 00:00:00 2018-04-01
#> 10 UN UNK 2015 2015 2015-01-01 00:00:00 2015-01-01
#> 11 APR 2018 2018-04 2018-04-01 00:00:00 2018-04-01
#> 12 UK APR 2018 2018-04 2018-04-01 00:00:00 2018-04-01
#> 13 UN UNK 0 <NA> 2023-10-10 01:45:05 2023-10-10 01:45:05
#> 14 UN UN 2015 2015 2015-01-01 00:00:00 2015-01-01
#> 15 20190212 2019-02-12 2019-02-12 00:00:00 2019-02-12
#> 16 2019020202:20 2019-02-02T02:20 2033-12-24 06:56:42 2033-12-24 06:56:42
#> 17 2019-04-041045 2019-04-04T10:45 2019-01-04 00:00:00 2019-01-04
#> 18 20200505null 2020-05-05 2020-05-05 00:00:00 2020-05-05
Created on 2023-10-10 with reprex v2.0.2 |
Great stuff. Should we consider minimising dependencies? This is a simple feature that we can develop to suit our needs. @edgar-manukyan @galachad - Please chime in. |
In the meeting, it was decided to develop this functionality in the sdtm.oak package. Please self-assign and start the development. |
Hi @rammprasad and others: Can we agree on an interface for the function?
|
|
@rammprasad, please correct me if I am wrong:
|
Let me see if I understood:
@rammprasad: In your example: Related to tidyverse/lubridate#1142, r-lib/clock#361. |
Packages that we might learn from: SDTMIG v3.4 on dates: SDTMIG v3.4-DTC.pdf. |
@ramiromagno - Sounds good to me. Shall we start putting this in the code and then make improvements? Feel free to take it as you have done quite a bit of research. |
Hi @rammprasad -- Sorry for answering only now. Yes, I can take this. |
Example 1
visit_date vector - c("01NOV2015", "02DEC2015", "5-DEC-2023", "25JAN2015")
calculate_iso_date_time(var_name = visit_date, format = ddmmmyyyy)
Function output - c("2015-11-01","2015-12-02","2023-12-05","2015-01-25")
Example 2
collection_date <- c("01NOV2015", "02DEC2015", "5-DEC-2023", "JAN2015")
collection_time <- c("12:15","10:11","01:15",NA)
calculate_iso_date_time(var_name1 = collection_date, format1 = ddmmmyyyy, var_name2 = collection_time, format2 = hhmm)
Function output - c("2015-11-01T12:15","2015-12-02T10:11","2023-12-05T01:15","2015-01")
Example 3
collection_day <- c("01", "02", "5", NA)
collection_mon <- c("NOV", "DEC", "DEC", "JAN")
collection_year <- c("2015", "2015", "2023", "2015")
collection_hour <- c("12","10","01",NA)
collection_min <- c("15","11","15",NA)
calculate_iso_date_time(
var_name1 = collection_day, format1 = dd,
var_name2 = collection_mon, format2 = mon,
var_name3 = collection_year, format3 = yyyy,
var_name4 = collection_hour, format4 = hh,
var_name5 = collection_min, format5 = mm)
Function output - c("2015-11-01T12:15","2015-12-02T10:11","2023-12-05T01:15","2015-01") |
Although I understand your code was meant as pseudo-code, I think it helps to get closer to actual R code to clarify the spec. Assuming this is the code you meant:
visit_date <- c("01NOV2015", "02DEC2015", "5-DEC-2023", "25JAN2015")
calculate_iso_date_time(visit_date, format = "ddmmmyyyy")
Then the output would be:
[1] "2015-11-01" "2015-12-02" "2023-12-05" "2015-01-25"
What should happen if some of the details are missing?
visit_date <- c("NOV2015", "02-2015", "5-DEC", "25JAN2015")
calculate_iso_date_time(visit_date, format = "ddmmmyyyy")
Possibilities:
# If some fields are missing the result is NA
[1] NA NA NA "2015-01-25"
# Parse what can be parsed and return iso8601 in reduced format
[1] "2015-11--" "2015----02" "--12-05" "2015-01-25"
# Or perhaps this? Note that "-----02" could happen if on "02-2015", 02 was interpreted as day, then "2015"
# would be attempted at being parsed as a month, failing, and the year would be considered missing resulting
# in "-----02"...
[1] "2015-11--" "-----02" "--12-05" "2015-01-25"
Should multiple formats be supported in one call?
visit_date <- c("01NOV2015", "02DEC2015", "2019-02-03", "20180910")
calculate_iso_date_time(visit_date, format = c("ddmmmyyyy", "ddmmmyyyy", "yyyymmdd", "yyyymmdd"))
Returning:
[1] "2015-11-01" "2015-12-02" "2019-02-03" "2018-09-10"
When you say that the function accepts any number of vectors as an input, could you provide an example? I am asking this because I only see this function taking one input vector of dates/date-times, one vector of formats, and perhaps some other scalar arguments that modify the behavior of the function (e.g., |
Hi @ramiromagno, this cannot happen as all dates are not in the format provided. The date can have missing components though.
visit_date <- c("NOV2015", "02-2015", "5-DEC", "25JAN2015")
calculate_iso_date_time(visit_date, format = "ddmmmyyyy")
The expected output for this will be c("2015-11",NA,NA,"2015-01-25"). "02-2015" is NA because it is not in the expected format. I will add more examples. |
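The expected output above can be reproduced with a small base R sketch. This is only an illustration of the stated behavior, not the sdtm.oak implementation: the helper name `parse_ddmmmyyyy()` is hypothetical, and it assumes that the day is the only optional component, that `-` separators are optional, and that month abbreviations are English.

```r
# Hypothetical helper (not part of sdtm.oak): parse "ddmmmyyyy"-style values
# where the day may be absent, returning reduced-precision ISO 8601 strings.
parse_ddmmmyyyy <- function(x) {
  # Optional 1-2 digit day, optional "-" separators, 3-letter month, 4-digit year.
  pattern <- "^([0-9]{1,2})?-?([A-Za-z]{3})-?([0-9]{4})$"
  matches <- regmatches(x, regexec(pattern, x))
  vapply(matches, function(parts) {
    # No match at all: the value is not in the expected format.
    if (length(parts) == 0L) return(NA_character_)
    day <- parts[2]
    mon <- match(toupper(parts[3]), toupper(month.abb))
    year <- parts[4]
    # Three letters that are not a known month abbreviation.
    if (is.na(mon)) return(NA_character_)
    out <- sprintf("%s-%02d", year, mon)
    # Append the day only when it was actually collected.
    if (nzchar(day)) out <- sprintf("%s-%02d", out, as.integer(day))
    out
  }, character(1), USE.NAMES = FALSE)
}

visit_date <- c("NOV2015", "02-2015", "5-DEC", "25JAN2015")
parse_ddmmmyyyy(visit_date)
```

Anchoring the pattern with `^` and `$` is what makes "02-2015" and "5-DEC" fall through to NA instead of being partially parsed.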
Also, this is impossible, as EDC collects data in such a way that each variable has one defined format. One vector cannot have multiple formats.
visit_date <- c("01NOV2015", "02DEC2015", "2019-02-03", "20180910")
calculate_iso_date_time(visit_date, format = c("ddmmmyyyy", "ddmmmyyyy", "yyyymmdd", "yyyymmdd")) |
@ramiromagno - I have added three examples in my earlier comment. Please review and let me know in case of any questions. |
Thanks, @rammprasad, that helps a lot. Just one more question: assume that the format indicates the month in numeric format and that the year can be in 2-digit format, so |
If the format is |
Should we keep mm as minutes, and mon can handle numeric and character months? |
One of our rules is, if the year is unknown, the month and day cannot be known. So in this case, it will always be May, 2003 |
Can we generalize a bit more, and say that if a date-time component is missing but its precision is higher than the others, then it is fine, otherwise, a less precise component always jeopardizes higher precision components:
And can the same reasoning be applied to times? If seconds are missing, hour and minutes can still be collected but if hour is missing then we drop the minutes and the seconds...? |
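The rule proposed above (a missing, less precise component invalidates the more precise ones that follow it) could be sketched as below. The helper name `drop_after_first_missing()` is hypothetical, and applying it separately to the date group and to the time group is an assumption; the thread leaves that point as an open question.

```r
# Hypothetical helper: within one group of components ordered from least to
# most precise, set every component after the first missing one to NA.
drop_after_first_missing <- function(parts) {
  first_na <- which(is.na(parts))[1]
  # If nothing is missing, first_na is NA and the input is returned unchanged.
  if (!is.na(first_na)) parts[first_na:length(parts)] <- NA
  parts
}

# Date group: month and day are dropped because the year is unknown.
drop_after_first_missing(c(year = NA, mon = "05", mday = "12"))

# Time group: only seconds are missing, so hour and minute survive.
drop_after_first_missing(c(hour = "14", min = "30", sec = NA))
```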
I am good with the approach mentioned in the above comment |
Hi @rammprasad, What qualifies as a valid year? In the first post you had this example with |
What values are admissible for representing missingness? In your first post you had:
Are there other alternative values that may be used to represent missing date-time components? What are the missing values' representations for year and time components? |
To answer both the questions,
|
Hi! Here are a few examples, although this is still a work in progress. NB: the interface is not yet the one intended but the function
These format conventions can be easily changed in my implementation, if you prefer, though this way is more idiomatic R. I think I've covered many of the cases but most likely not all. As you can see, the number of characters in the date-time components is irrelevant, i.e.
Currently, the function does mostly syntactic validation according to the specification indicated above, although there will be cases where we might need semantic validation, e.g. when the year is a two-digit number, some interpretation will be involved in translating that to a four-digit year. Currently, I am using the same rule as lubridate, which is to map years less than or equal to 68 to the 2000s and years above that to the 1900s. This will be made a parameter to control this behavior. Also, if a time component is above 60, should we do something about it? The same goes for month days and numeric months. Currently only syntax is validated.
I am not using any of these other date/time related packages, so I might be reinventing the wheel here. But all implementations I looked at always converted missing components to 0, and did not offer a way of handling funny missing values...
Please provide feedback, and if I'm heading in the right direction then I will write the wrapping function according to the spec, write unit tests and documentation.
library(sdtm.oak)
format_iso8601("15:10", "HH:MM")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 -----T15:10 15:10 - - - 15 10 -
format_iso8601("2:10", "HH:MM")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 -----T02:10 2:10 - - - 02 10 -
format_iso8601("2:1", "HH:MM")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 <NA> <NA> - - - - - -
format_iso8601("2:01", "HH:MM")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 -----T02:01 2:01 - - - 02 01 -
format_iso8601("02:01", "HH:MM")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 -----T02:01 02:01 - - - 02 01 -
format_iso8601("02:01:56", "HH:MM:SS")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 -----T02:01:56 02:01:56 - - - 02 01 56
format_iso8601("02:01:56.5", "HH:MM:SS")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 -----T02:01:56.5 02:01:56.5 - - - 02 01 56.5
format_iso8601("1510", "HHMM")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 -----T15:10 1510 - - - 15 10 -
format_iso8601("210", "HHMM")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 -----T02:10 210 - - - 02 10 -
format_iso8601("21", "HHMM")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 <NA> <NA> - - - - - -
format_iso8601("201", "HHMM")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 -----T02:01 201 - - - 02 01 -
format_iso8601("020156.5", "HHMMSS")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 -----T02:01:56.5 020156.5 - - - 02 01 56.5
format_iso8601("12 NOV 202015:15", "dd mmm yyyyHH:MM")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2020-11-12T15:15 12 NOV 202015:15 2020 11 12 15 15 -
format_iso8601("03 NOV 20203:30", "dd mmm yyyyHH:MM")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2020-11-03T03:30 03 NOV 20203:30 2020 11 03 03 30 -
format_iso8601("02 DEC 201914:00", "dd mmm yyyyHH:MM")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2019-12-02T14:00 02 DEC 201914:00 2019 12 02 14 00 -
format_iso8601("U DEC 201914:00", "dd mmm yyyyHH:MM")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 <NA> <NA> - - - - - -
format_iso8601("U DEC 201914:00", "dd mmm yyyyHH:MM", na = "U")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2019-12--T14:00 U DEC 201914:00 2019 12 - 14 00 -
format_iso8601("UN DEC 201914:00", "dd mmm yyyyHH:MM", na = "UN")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2019-12--T14:00 UN DEC 201914:00 2019 12 - 14 00 -
format_iso8601("UNK DEC 201914:00", "dd mmm yyyyHH:MM", na = "UNK")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2019-12--T14:00 UNK DEC 201914:00 2019 12 - 14 00 -
format_iso8601("3 NOV 2020", "dd mmm yyyy")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2020-11-03 3 NOV 2020 2020 11 03 - - -
format_iso8601("NOV 2020", "mmm yyyy")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2020-11 NOV 2020 2020 11 - - - -
format_iso8601("MAR 2019", "mmm yyyy")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2019-03 MAR 2019 2019 03 - - - -
format_iso8601("MaR 2019", "mmm yyyy")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2019-03 MaR 2019 2019 03 - - - -
format_iso8601("mar 2019", "mmm yyyy")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2019-03 mar 2019 2019 03 - - - -
format_iso8601("UN APR 2018", "dd mmm yyyy")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 <NA> <NA> - - - - - -
format_iso8601("UN APR 2018", "dd mmm yyyy", na = "UN")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2018-04 UN APR 2018 2018 04 - - - -
format_iso8601("UN UNK 2015", "dd mmm yyyy")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 <NA> <NA> - - - - - -
format_iso8601("UN UNK 2015", "dd mmm yyyy", na = "UN")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 <NA> <NA> - - - - - -
format_iso8601("UN UNK 2015", "dd mmm yyyy", na = "UNK")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 <NA> <NA> - - - - - -
format_iso8601("UN UNK 2015", "dd mmm yyyy", na = c("UN", "UNK"))
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2015 UN UNK 2015 2015 - - - - -
format_iso8601("APR 2018", "mmm yyyy")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2018-04 APR 2018 2018 04 - - - -
format_iso8601("UK APR 2018", "dd mmm yyyy", na = "UK")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2018-04 UK APR 2018 2018 04 - - - -
format_iso8601("UN UNK 0", "dd mmm yyyy", na = c("UN", "UNK"))
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 <NA> <NA> - - - - - -
format_iso8601("UN UN 2015", "dd mmm yyyy", na = c("UN", "UNK"))
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2015 UN UN 2015 2015 - - - - -
format_iso8601("20190212", "yyyymmdd")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2019-02-12 20190212 2019 02 12 - - -
format_iso8601("2019020202:20", "yyyymmddHH:MM")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2019-02-02T02:20 2019020202:20 2019 02 02 02 20 -
format_iso8601("2019-04-041045", "yyyy-mm-ddHHMM")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2019-04-04T10:45 2019-04-041045 2019 04 04 10 45 -
format_iso8601("2019-04-041045-", "yyyy-mm-ddHHMM-")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2019-04-04T10:45 2019-04-041045- 2019 04 04 10 45 -
format_iso8601("2019-04-041045-", "y-m-dHM-")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2019-04-04T10:45 2019-04-041045- 2019 04 04 10 45 -
format_iso8601("20200507null", "yyyymmddHHMM")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 <NA> <NA> - - - - - -
format_iso8601("20200507null", "yyyymmdd")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 <NA> <NA> - - - - - -
format_iso8601("20200507null", "yyyymmddnull")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2020-05-07 20200507null 2020 05 07 - - -
format_iso8601(c("20200507null", "2019120602:20:13"), "ymd((HH:MM:SS)|null)")
#> # A tibble: 2 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2020-05-07 20200507null 2020 05 07 - - -
#> 2 2019-12-06T02:20:13 2019120602:20:13 2019 12 06 02 20 13
format_iso8601("2019120602:20:13", "ymdH:M:S")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2019-12-06T02:20:13 2019120602:20:13 2019 12 06 02 20 13
format_iso8601("2019120602:20:13", "yyyymmddHH:MM:SS")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2019-12-06T02:20:13 2019120602:20:13 2019 12 06 02 20 13
format_iso8601("2019120602:20:13.", "ymdH:M:S")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2019-12-06T02:20:13 2019120602:20:13. 2019 12 06 02 20 13
format_iso8601("2019120602:20:13.0", "ymdH:M:S")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2019-12-06T02:20:13 2019120602:20:13.0 2019 12 06 02 20 13
format_iso8601("2019120602:20:13.1", "ymdH:M:S")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2019-12-06T02:20:13.1 2019120602:20:13.1 2019 12 06 02 20 13.1
format_iso8601("2019120602:20:13.123", "ymdH:M:S")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2019-12-06T02:20:13.123 2019120602:20:13.… 2019 12 06 02 20 13.1…
format_iso8601("2019120602:20:13.123000", "ymdH:M:S")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2019-12-06T02:20:13.123 2019120602:20:13.… 2019 12 06 02 20 13.1…
format_iso8601("2019120602:20:13.1230001", "ymdH:M:S")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2019-12-06T02:20:13.1230001 2019120602:20… 2019 12 06 02 20 13.1…
format_iso8601("2019-120602:20:13.1230001", "y-mdH:M:S")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2019-12-06T02:20:13.1230001 2019-120602:2… 2019 12 06 02 20 13.1…
format_iso8601("19-120602:20:13.1230001", "y-mdH:M:S")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2019-12-06T02:20:13.1230001 19-120602:20:… 2019 12 06 02 20 13.1…
format_iso8601("19120602:20:13.1230001", "ymdH:M:S")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2019-12-06T02:20:13.1230001 19120602:20:1… 2019 12 06 02 20 13.1…
format_iso8601("80120602:20:13.1230001", "ymdH:M:S")
#> # A tibble: 1 × 8
#> iso8601 dttm year mon mday hour min sec
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1980-12-06T02:20:13.1230001 80120602:20:1… 1980 12 06 02 20 13.1…
Created on 2023-10-30 with reprex v2.0.2 |
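The two-digit-year rule described in this comment (years up to and including 68 map to the 2000s, the rest to the 1900s) can be illustrated with a short sketch. The function name `two_digit_year_to_four()` and its `cutoff` argument are hypothetical and only show how the threshold could become a parameter; they are not the package's actual interface.

```r
# Hypothetical helper: expand two-digit years using a configurable cutoff.
# Years <= cutoff are placed in the 2000s, the others in the 1900s.
two_digit_year_to_four <- function(yy, cutoff = 68L) {
  yy <- as.integer(yy)
  ifelse(yy <= cutoff, 2000L + yy, 1900L + yy)
}

# "19" and "80" mirror the last examples in the reprex above.
two_digit_year_to_four(c("19", "80", "68", "69"))
```

With the default cutoff, "19" becomes 2019 and "80" becomes 1980, in line with the last two calls in the reprex.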
Here is the input from Pfizer (Venkata Maguluri)
|
Hi Venkata, Thank you for your feedback. I am not sure I understand your suggestions.
Assuming you mean that the feature requested function should automagically detect the format and parse accordingly, then this request is changing the specification indicated by @rammprasad and is considerably more difficult to implement as it will probably require attempting different formats, and in a specific order... it really is a feature request with a more demanding implementation. Perhaps we can discuss this in Slack.
Again, I'm not sure I understand your point/suggestion... but it seems to be hinting at a different spec. Again, Slack might be more appropriate to discuss this. |
|
Most of the time date and time come in two separate variables. |
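When date and time do arrive in two separate variables, one way to picture the final assembly step is the sketch below. It assumes both inputs have already been converted to ISO 8601 pieces; the helper name `combine_date_time()` is hypothetical, and the sketch does not check whether a reduced-precision date should still carry a time.

```r
# Hypothetical helper: join already-formatted ISO 8601 dates and times.
# A missing time leaves the date as is; a missing date yields NA.
combine_date_time <- function(date_iso, time_iso) {
  out <- ifelse(is.na(time_iso), date_iso, paste0(date_iso, "T", time_iso))
  out[is.na(date_iso)] <- NA_character_
  out
}

collection_date <- c("2015-11-01", "2015-12-02", "2023-12-05", "2015-01")
collection_time <- c("12:15", "10:11", "01:15", NA)
combine_date_time(collection_date, collection_time)
```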
@ramiromagno - the examples in your above comment look good to me. Few questions to consider
|
Hi @rammprasad: Thank you for your feedback. Before addressing your points, let me discuss a prior note about the user-facing function name: I think we can perhaps use a different name than "calculate". I think a better word might be a verb that most closely matches what is actually being performed by such a function. We are essentially parsing and then assembling/creating/making the iso8601 formatted date-time string. So something as simple as
|
Ok, that sounds good. TY, @ramiromagno . Let's proceed with the implementation and creating test cases. |
Hi @rammprasad, In line with our adherence to ODM standards, as outlined here (https://wiki.cdisc.org/display/ODM2/Data+Formats), should we integrate a validation mechanism to confirm that the input data complies with these standards, including the handling of time zone information? Given that ODM accommodates time zone data, I'm keen to understand our rationale for not addressing time zone considerations within our function. Insights into this decision would greatly assist in ensuring our oak.SDTM package's features are consistent with standard practices and that our documentation clearly communicates this aspect to users. More broadly, I’m aiming to clarify how strictly our package should mirror ODM specifications, particularly features like time zone support. This understanding will guide us in dealing with similar considerations moving forward. I look forward to your thoughts on this matter. |
Hi @kamilsi, Our original plan was to use the ODM extracts from EDC systems. Later, we found there was no easy way to extract the data in the ODM format from the EDC system due to IT limitations, so we decided to pivot from that option. Slides and other materials still need to be updated. EDC systems do not provide the timezone, so we are fine not considering the timezone component. |
Note to self: use triple colon |
* clean up dummy test * add `dtc_formats` data set * update .Rbuildignore * add tibble support for automatic pretty printing of tibbles * add `create_iso8601()` (closes #10) * clean up `lintr::lint_package()` issues * Automatic renv profile update. * Automatic renv profile update. * Fix typos in R/dtc_utils.R Co-authored-by: edgar-manukyan <[email protected]> * remove `dummy()` function * remove `.onLoad()` function This function was likely added as part of an automatic setup of the R package as a whole but I guess we should add the `.onLoad()` if really needed. * Remove the `is_dtc_fmt()` function Initially I thought of calling this function from within `assert_dtc_fmt()` but I think now that the current usage of `rlang::arg_match()` leads to more concise code, so this is preferred. * Import `.data` from rlang globally Import `.data` from rlang globally by using the R package level documentation (https://roxygen2.r-lib.org/articles/rd-other.html?q=_PACKAGE#packages). * Update WORDLIST * Update `assert_capture_matrix()` and `complete_capture_matrix()` docs * Add `coalesce_capture_matrices()` doc * Fix typo in `assert_dtc_fmt()` doc * Add `regex_or()` doc * Add `fmt_rg()` doc * Add `fmt_c()` doc * Add `parse_dttm_fmt()` doc * Fix doc of `parse_dttm_fmt()` * Add `dttm_fmt_to_regex()` doc * Bump development version to 0.0.0.9001 * Style updates Style updates on R/dtc_create_iso8601.R, R/dtc_parse_dttm.R, R/dtc_utils.R. Mostly indentation corrections, wrapping single line body if-conditions in braces, white space removal. * Style update to tests/testthat/test-yy_to_yyyy.R * Style update Style updates on tests/testthat/test-iso8601.R and tests/testthat/test-reg_matches.R * Blank lines removal * Style update * Update docs after style update * Refactor code about parsing dttm formats Made `dttm_fmt_to_regex()` interface more intitutive by accepting directly the argument `fmt` instead of `tbl_fmt_c` which was an intermediate R object returned by `parse_dttm_fmt()`. 
Also, introduced unit tests for `parse_dttm_fmt_()`. * Make `parse_dttm_fmt()` handle the case of no matching format components * Use `fmt_dttmc()` in unit tests * Small clarification on unit test description * Remove futile assertion from `assert_dtc_fmt()` * Add staged_dependencies for admiraldev (#26) * Add staged_dependencies for admiraldev * Add new line * Fix admiraldev links. * Fix admiraldev articles links. * Remove R 4.1 a it causing dependencies issues. We want to use purrr >= 1.0.0 * Test latest lintr * Test lintr with install package locally * Add install pacakge variable for lintr * Skip multi version pkgdown workflow. * R build ignore staged_dependencies.yaml * Automatic renv profile update. * Automatic renv profile update. * Cleaned up lintr issues * Export `fmt_cmp()` and add early draft of `create_iso8601()` article * Update `create_iso8601()` article * Link `create_iso8601()` doc to article "iso_8601" * Add RM as author to DESCRIPTION * Fix author role of RM * Fix indentation at `fmt_cmp()` source * Remove `.check_format` from examples and add an example with `fmt_cmp()` * Add an example to `create_iso8601()` with involving alternative formats and unk values * Add example to `create_iso8601()` about the interplay of `.format` and `.fmt_c` * Update common.yml * Update style * Change "oak" to "sdtm.oak" in DESCRIPTION * Change "oak" to "sdtm.oak" in README --------- Co-authored-by: ramiromagno <[email protected]> Co-authored-by: edgar-manukyan <[email protected]> Co-authored-by: Adam Foryś <[email protected]>
Feature Idea
Purpose
Converts collected date and optional time values into the iso8601 format. https://en.wikipedia.org/wiki/ISO_8601
Functionality
The feature should parse collected pieces of date and time data and reformat them into a valid iso8601 format. Partial dates are acceptable and common. Invalid dates can be encountered and should be handled as described in this specification.
EDC-collected dates generally arrive in multiple formats. Some examples are dd MON yyyy, yyyymmdd, and yyyy-mm-dd.
Times arrive in the hh:mm or h:mm (leading 0 implied) or hh:mm:ss or hhmm or hhmmss format.
We also need to handle data arriving as a single datetime field, for example dd MON yyyy HH:MM:ss.
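To make the time formats listed above concrete, here is a minimal base R sketch that normalizes them to hh:mm or hh:mm:ss. The helper name `normalize_time()` is hypothetical and the sketch validates syntax only (it does not reject values such as 25:61).

```r
# Hypothetical helper: normalize hh:mm, h:mm, hh:mm:ss, hhmm and hhmmss
# to zero-padded hh:mm or hh:mm:ss; anything else becomes NA.
normalize_time <- function(x) {
  # 1-2 digit hour, optional ":", 2-digit minute, optional 2-digit second.
  pattern <- "^([0-9]{1,2}):?([0-9]{2})(:?([0-9]{2}))?$"
  matches <- regmatches(x, regexec(pattern, x))
  vapply(matches, function(p) {
    if (length(p) == 0L) return(NA_character_)
    # p[2] = hour, p[3] = minute, p[5] = second ("" when absent).
    out <- sprintf("%02d:%s", as.integer(p[2]), p[3])
    if (nzchar(p[5])) out <- paste0(out, ":", p[5])
    out
  }, character(1), USE.NAMES = FALSE)
}

normalize_time(c("9:05", "09:05", "09:05:30", "0905", "090530"))
```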
Relevant Input
one or many vectors and format for each vector.
Relevant Output
A vector in ISO8601 date format.
Reproducible Example/Pseudo Code
calculate_iso_date_time(vector1,format1...)
example
calculate_iso_date_time(ae_start_date,dd MON YYYY)