Large memory consumption(10GB RAM), option to reduce it? #17

Open
mr-tm opened this issue Apr 30, 2023 · 3 comments

Comments

@mr-tm

mr-tm commented Apr 30, 2023

First of all, thank you so much for making this tool, can't wait for v0.3!
That said, would it be possible to add an option to reduce memory consumption (e.g. streaming data from the CSV files, or caching data on disk)?
It doesn't matter if it takes longer to finish. Currently, larger GTFS feeds can take up to 10 GB of memory, which can hit the limits on cloud servers.

For example: https://transitfeeds.com/p/ov/814/latest/download
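
To make the streaming idea concrete, here is a minimal sketch in Go using only the standard library. It is not how gtfstidy or gtfsparser work today; `countByColumn` and `countInText` are hypothetical names made up for illustration. It reads a CSV file one row at a time and keeps only a small aggregate in memory:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// countByColumn streams CSV data row by row and counts how often each value
// of the named column occurs. Only the current row and the counts are kept.
func countByColumn(r io.Reader, column string) (map[string]int, error) {
	cr := csv.NewReader(r)
	cr.ReuseRecord = true // reuse the record slice instead of allocating per row

	header, err := cr.Read()
	if err != nil {
		return nil, err
	}
	col := -1
	for i, name := range header {
		if name == column {
			col = i
		}
	}
	if col < 0 {
		return nil, fmt.Errorf("column %q not found", column)
	}

	counts := make(map[string]int)
	for {
		rec, err := cr.Read()
		if err == io.EOF {
			return counts, nil
		}
		if err != nil {
			return nil, err
		}
		counts[rec[col]]++
	}
}

// countInText is a small helper for trying the function on an in-memory string.
func countInText(text, column string) (map[string]int, error) {
	return countByColumn(strings.NewReader(text), column)
}

func main() {
	// In practice the reader would be an *os.File opened on e.g. stop_times.txt.
	sample := "trip_id,stop_sequence\nt1,1\nt1,2\nt2,1\n"
	counts, err := countInText(sample, "trip_id")
	if err != nil {
		panic(err)
	}
	fmt.Println(counts["t1"], counts["t2"])
}
```

Streaming like this only helps for steps that can work on one row at a time; anything that needs to compare rows across the whole feed still needs some kind of index.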

@patrickbr
Owner

Unfortunately, not at the moment. I have been planning for a while, though, to add a mode to gtfsparser that stores all data on disk and holds only ID->disk references in memory.
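
For readers wondering what "ID->disk references" could look like, here is a rough sketch under my own assumptions, not the planned gtfsparser design. The names `rowRef`, `buildIndex`, `fetchRow` and `lookupInText` are hypothetical, and the row splitting is deliberately naive (it assumes the ID is the first column and that no field contains a quoted newline):

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// rowRef is the only per-row state kept in memory: where the row lives on disk.
type rowRef struct {
	offset int64
	length int
}

// buildIndex scans the data once and records offset and length of every row,
// keyed by the value of the first column. The header line is skipped.
func buildIndex(r io.Reader) (map[string]rowRef, error) {
	idx := make(map[string]rowRef)
	br := bufio.NewReader(r)
	var pos int64
	header := true
	for {
		line, err := br.ReadString('\n')
		if err != nil && err != io.EOF {
			return nil, err
		}
		row := strings.TrimRight(line, "\r\n")
		if !header && row != "" {
			id := row
			if i := strings.IndexByte(row, ','); i >= 0 {
				id = row[:i]
			}
			idx[id] = rowRef{offset: pos, length: len(row)}
		}
		if line != "" {
			header = false
		}
		pos += int64(len(line))
		if err == io.EOF {
			return idx, nil
		}
	}
}

// fetchRow reads a single row back from the underlying storage on demand.
func fetchRow(src io.ReaderAt, ref rowRef) (string, error) {
	buf := make([]byte, ref.length)
	n, err := src.ReadAt(buf, ref.offset)
	if n < len(buf) {
		return "", err
	}
	return string(buf), nil
}

// lookupInText runs both steps on an in-memory string standing in for a file.
func lookupInText(data, id string) (string, error) {
	idx, err := buildIndex(strings.NewReader(data))
	if err != nil {
		return "", err
	}
	ref, ok := idx[id]
	if !ok {
		return "", fmt.Errorf("id %q not found", id)
	}
	return fetchRow(strings.NewReader(data), ref)
}

func main() {
	// With a real feed, both readers would be an *os.File on e.g. stops.txt.
	row, err := lookupInText("stop_id,stop_name\nA,Alpha\nB,Beta\n", "B")
	if err != nil {
		panic(err)
	}
	fmt.Println(row)
}
```

The trade-off is the one mentioned in the original request: memory drops to roughly one small struct per ID, at the cost of a disk read for every lookup.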

@derhuerst

I'm not very well-versed in the technique, but would an (optional) mode that uses memory mapping work?
This way, not all data would have to be held in memory at once, and the change would (presumably) have less of an effect on gtfstidy's architecture than other solutions.
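
As a rough illustration of what memory mapping means here, the following Go sketch maps a file read-only and scans it without copying it onto the Go heap. It assumes a Unix-like system (`syscall.Mmap` is not available on Windows), and `countLinesMapped` and `writeSample` are hypothetical helpers, not part of gtfstidy:

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"syscall"
)

// countLinesMapped maps the file at path read-only and counts newline bytes.
// The kernel pages the data in on demand, and those pages can be dropped
// again under memory pressure because they are backed by the file.
func countLinesMapped(path string) (int, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		return 0, err
	}
	if info.Size() == 0 {
		return 0, nil // a zero-length mapping is rejected, so handle it here
	}

	data, err := syscall.Mmap(int(f.Fd()), 0, int(info.Size()),
		syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		return 0, err
	}
	defer syscall.Munmap(data)

	return bytes.Count(data, []byte{'\n'}), nil
}

// writeSample writes text to a temporary file and returns its path.
func writeSample(text string) (string, error) {
	f, err := os.CreateTemp("", "gtfs-sample-*.txt")
	if err != nil {
		return "", err
	}
	defer f.Close()
	if _, err := f.WriteString(text); err != nil {
		return "", err
	}
	return f.Name(), nil
}

func main() {
	path, err := writeSample("stop_id,stop_name\nA,Alpha\nB,Beta\n")
	if err != nil {
		panic(err)
	}
	defer os.Remove(path)

	n, err := countLinesMapped(path)
	if err != nil {
		panic(err)
	}
	fmt.Println(n)
}
```

Note that this only covers the raw file bytes. Any parsed structures built from them would still live on the heap, so how much it saves depends on how much of the parsed data can be replaced by views into the mapping.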

@mr-tm
Author

mr-tm commented May 22, 2023

@patrickbr @derhuerst That would be great! Currently, increasing swap seems to work, but it's obviously not the best option. Memory mapping could also be worth looking into!

I also noticed something about stop time minimization (-T). The trip IDs are deleted (obviously, since the trips are converted to frequencies), but I tried using keep-trip-ids and couldn't find the deleted trip IDs in the output. An additional column in frequencies.txt (e.g. minimized_trip_ids) that maps all deleted trips to the trip_id in frequencies would be great. The use case would be keeping all trip_ids so they can be mapped to GTFS-realtime data.
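
To show how such a column could be consumed, here is a small Go sketch. Everything about it is an assumption for illustration: `minimized_trip_ids` is only the column proposed above and does not exist in gtfstidy's output, the `;` separator is my own choice, and `expandMinimizedTrips` and `expandFromText` are hypothetical names:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// expandMinimizedTrips reads a frequencies.txt that carries the proposed
// (hypothetical) minimized_trip_ids column and returns a lookup from every
// original trip_id to the trip_id that was kept in frequencies.txt.
func expandMinimizedTrips(r io.Reader) (map[string]string, error) {
	cr := csv.NewReader(r)
	header, err := cr.Read()
	if err != nil {
		return nil, err
	}
	tripCol, minCol := -1, -1
	for i, name := range header {
		switch name {
		case "trip_id":
			tripCol = i
		case "minimized_trip_ids":
			minCol = i
		}
	}
	if tripCol < 0 || minCol < 0 {
		return nil, fmt.Errorf("trip_id or minimized_trip_ids column missing")
	}

	lookup := make(map[string]string)
	for {
		rec, err := cr.Read()
		if err == io.EOF {
			return lookup, nil
		}
		if err != nil {
			return nil, err
		}
		kept := rec[tripCol]
		lookup[kept] = kept // the kept trip maps to itself
		// Assumed format: deleted trip IDs separated by ';'.
		for _, id := range strings.Split(rec[minCol], ";") {
			if id != "" {
				lookup[id] = kept
			}
		}
	}
}

// expandFromText is a small helper for trying the function on a string.
func expandFromText(text string) (map[string]string, error) {
	return expandMinimizedTrips(strings.NewReader(text))
}

func main() {
	sample := "trip_id,start_time,end_time,headway_secs,minimized_trip_ids\n" +
		"t1,06:00:00,09:00:00,600,t2;t3\n"
	lookup, err := expandFromText(sample)
	if err != nil {
		panic(err)
	}
	// A GTFS-realtime update for t3 could then be matched to t1.
	fmt.Println(lookup["t3"])
}
```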


3 participants