Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

error handling - column headings #54

Closed
dcooley opened this issue Nov 4, 2020 · 5 comments
Closed

error handling - column headings #54

dcooley opened this issue Nov 4, 2020 · 5 comments
Labels
bug Something isn't working

Comments

@dcooley
Copy link

dcooley commented Nov 4, 2020

Working with Melbourne's transit feed is not always straight forward, and this morning I ran into this particular issue which tripped me up for a few hours, where the column heading has a non-Ascii symbol in it.

So my request is, is it worth adding in a 'clean column names' -type function when extracting the feed to try and handle this?

Here's the exmaple

library(data.table)
library(gtfsrouter)

url <- "http://data.ptv.vic.gov.au/downloads/gtfs.zip"

temp_dir <- tempdir()
download.file(url, destfile = paste0(temp_dir, "/gtfs.zip") )
list.files( temp_dir )

unzip( paste0(temp_dir, "/gtfs.zip"), exdir = paste0(temp_dir) )

## Melbourne GTFS is saved in folders numbered 1:11 for different modes
## <sigh>
list.files( temp_dir )
## 3 == tram
list.files(paste0(temp_dir, "/3"))
gtfs_tram <- extract_gtfs( paste0(temp_dir, "/3/google_transit.zip") )
# unlink(temp_dir)

I found this issue when running gtfsrouter:::join_service_id_to_stops(), as it gave me all NAs for trip_id

tram_stop_service <- gtfsrouter:::join_service_id_to_stops( gtfs_tram )
unique( tram_stop_service$services )
# [1] NA

So diving into the stop_times object I found the issue

dt_stoptimes <- copy( gtfs_tram$stop_times )
dput( dt_stoptimes[1, ] )

Copying this output back into my script highlighted the rogue character

Screen Shot 2020-11-05 at 10 05 04 am

This is fixed by the iconv() command.

setnames(
  gtfs_tram$stop_times
  , names( gtfs_tram$stop_times )
  , iconv( names( gtfs_tram$stop_times ), "latin1", "ASCII", sub = "" )
)

tram_stop_service <- gtfsrouter:::join_service_id_to_stops( gtfs_tram )
head(tram_stop_service )

# stop_id services
# 1:   19725    T0+a6
# 2:   19372    T0+a6
# 3:   19371    T0+a6
# 4:   19370    T0+a6
# 5:   19369    T0+a6
# 6:   18184    T0+a6

Is something like this worth adding to gtfsrouter?

@dcooley dcooley changed the title error handling - column headers error handling - column headings Nov 4, 2020
@mpadge
Copy link
Member

mpadge commented Nov 5, 2020

Yeah, absolutely worth adding, thanks @dcooley! Feel free to PR if you want, otherwise I'll get around to it within the next couple of weeks, I'd reckon. Melbourne is interesting because of all of the separate feeds from separate providers with no transfers (buses to trams, trams to trains, ...). I'd love somebody to take the new gtfs_transfer_table() function for a spin on that feed to see how it performs. Just in case that happens to entice you ... 😄

@dcooley
Copy link
Author

dcooley commented Nov 5, 2020

yeah I was trying gtfs_transfer_table() yesterday, but it wouldn't find any transfers. I dug into the code and kept finding this line

nbs[which(!nbs %in% service_stops)]

returned 0 results every iteration. I didn't go any further into it, but did read through the other issue to see if I could understand the logic a bit better. I'll see if I can find some more time on it this weekend.

@mpadge
Copy link
Member

mpadge commented Nov 6, 2020

Or just put a wee reprex here, and I'll dig in with you

@mpadge
Copy link
Member

mpadge commented Nov 14, 2020

Update: I can't reproduce your initial buggy behaviour. I don't get any rogue characters at all, and everything is read just fine. I'm sure bugs will arise in this regard, but they are also more likely to be in non-UTF or whatever encoding of names of agencies, routes, headsigns, and stops. So if something like your code were to be incorporated, I suspect it would need to be a comprehensive rectification of all potential encoding problems. A naive solution would be to iconv all column names and all "name" columns, so I guess that ability could be exposed through a helper function using latin1 as a default from encoding, but enabling any user-specified encoding. Whaddayareckon?

@mpadge
Copy link
Member

mpadge commented Jun 11, 2021

This should hopefully all be addressed via #84, but i'll keep the issue open to ensure everything is okay

@mpadge mpadge added the bug Something isn't working label Jun 11, 2021
@mpadge mpadge closed this as completed Aug 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants