column type validation #740
Comments
Believe me...I've wanted to do this for a long time. However, there are MANY edge cases where a column that is normally a date (or number or whatever) is just not. Sometimes it is an error in the data, but in most cases it is actually how the data is supposed to be released. For example, there might be a value "ice" in the results_va. If we only output numeric, we'll lose important information. Similarly with dates, there are many examples that might be 12/2004 or 2004. We cannot just add a fake middle date and we also don't want to lose that information.

OK, so what to do? I usually use the argument "convertType=FALSE", which brings everything in as character (and therefore allows bind_cols and other methods like that to work). Then, for WQP queries, you can run parse_WQP to set the column types. The parse_WQP function is similar to what you've got above; it also converts the dateTime columns to POSIXct objects.
thanks for the reference to Your example surprises me, I would expect that "ice" would go in that status flag field rather than the result value field. The incomplete date issue is tricky.
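The character-first approach discussed above can be illustrated without calling any web service. This is only a sketch under stated assumptions: the two data frames stand in for query results that were read with every column as character, and the column names (result_value, sample_date) are made up for illustration rather than taken from a real WQP or NWIS schema.

```r
# Two hypothetical query results, already read in as character columns.
# Column names are illustrative, not a real WQP/NWIS schema.
site_a <- data.frame(
  result_value = c("1.2", "ice", "3.4"),
  sample_date  = c("2004-06-15", "12/2004", "2004"),
  stringsAsFactors = FALSE
)
site_b <- data.frame(
  result_value = c(NA_character_, NA_character_),
  sample_date  = c(NA_character_, NA_character_),
  stringsAsFactors = FALSE
)

# Every column is character, so combining never hits a type conflict.
combined <- rbind(site_a, site_b)

# Convert into NEW columns and keep the raw text alongside,
# so values like "ice" or "12/2004" are not lost.
combined$result_num <- suppressWarnings(as.numeric(combined$result_value))
combined$date_full  <- as.Date(combined$sample_date, format = "%Y-%m-%d")

# Rows where text was present but could not be parsed as a number.
unparsed <- !is.na(combined$result_value) & is.na(combined$result_num)
combined$result_value[unparsed]
```

Keeping the original character columns next to the converted ones is one way to avoid both problems raised in this thread: the types are consistent across queries, and partial dates or text remarks remain available for inspection.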
Different but related issue: how about consistent fields? An empty query can return a 0x0 tibble, but it would be better to return a 0xN tibble with all the field names.
We used to do that, but the issue was that (especially with WQP) there's been an explosion of "dataProfiles", and each dataProfile brings a new set of columns and column names. So while it was easy to set up an empty data frame with the correct names but no rows when a single format was returned, it is now a lot more challenging. However, I think in the new beta services (ResultsWQX), calls are coming back with exactly what you are asking for (from the service, which is much better than having dataRetrieval create it). I could be wrong, but I think NWIS also comes back with the info to make an empty data.frame (we've avoided adding dplyr/tibble as a dependency). If not, they are going through a major modernization this year, so something like that will probably happen.
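For readers who want a 0xN result on the client side in the meantime, a minimal base R sketch follows. The schema vector and the helper empty_from_schema() are hypothetical; they list only three columns for illustration and are not part of dataRetrieval or a complete dataProfile.

```r
# Hypothetical schema: column name -> type. A real dataProfile has many
# more columns; this short list is only for illustration.
schema <- c(
  MonitoringLocationIdentifier = "character",
  ActivityStartDate            = "Date",
  ResultMeasureValue           = "numeric"
)

# Build a zero-row data frame with one typed column per schema entry.
empty_from_schema <- function(schema) {
  cols <- lapply(schema, function(type) {
    switch(type,
      character = character(0),
      numeric   = numeric(0),
      Date      = as.Date(character(0)),
      stop("unsupported type: ", type)
    )
  })
  as.data.frame(cols, stringsAsFactors = FALSE)
}

empty <- empty_from_schema(schema)
dim(empty)
```

One schema per dataProfile would have to be maintained by hand, which is the maintenance burden described in the comment above.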
It would be great if retrieval functions such as readWQPqw(), readNWISdv(), etc. could guarantee returning consistent column types for all fields. I have run across issues where some queries will return fields with different types, typically when the field contains all NA values. For example, I have seen readWQPqw() queries return the "ActivityEndDate" field as a character instead of a Date when all values are NA.

Inconsistent field types can cause issues in split-apply-combine workflows, e.g. using targets to branch over multiple gages/regions/etc. and recombine datasets, or really any workflow that uses dplyr::bind_rows() to merge results from multiple queries. I've written a small validation function in my own workflow to enforce the (presumably correct) field types for readWQPqw() (see below), but I think this is something that should be enforced in dataRetrieval.
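The author's validation function is not reproduced in this copy of the thread. As a rough idea of what such a step could look like, here is a hypothetical sketch: enforce_types() and expected_types are invented names, and the two listed columns are only a small assumed subset of what readWQPqw() returns.

```r
# Hypothetical expected types; a real list for readWQPqw() output
# would cover many more columns.
expected_types <- c(ActivityEndDate = "Date", ResultMeasureValue = "numeric")

# Coerce each known column to its expected type; other columns are untouched.
enforce_types <- function(df, expected) {
  for (col in intersect(names(expected), names(df))) {
    df[[col]] <- switch(expected[[col]],
      Date      = as.Date(as.character(df[[col]]), format = "%Y-%m-%d"),
      numeric   = as.numeric(df[[col]]),
      character = as.character(df[[col]]),
      df[[col]]  # unknown type label: leave the column as-is
    )
  }
  df
}

# Simulated result where an all-NA date column came back as character.
all_na <- data.frame(
  ActivityEndDate    = NA_character_,
  ResultMeasureValue = "1.5",
  stringsAsFactors   = FALSE
)
fixed <- enforce_types(all_na, expected_types)
class(fixed$ActivityEndDate)
```

Note that this kind of coercion silently turns partial dates such as 12/2004 into NA, which is exactly the information loss the maintainer's comment warns about, so it suits workflows where that trade-off is acceptable.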