v.out.ogr: faster export with many attributes #4741

Open. Wants to merge 4 commits into main.

metzm
Contributor

metzm commented Nov 22, 2024

This PR proposes a new version of v.out.ogr that is about 5x faster when exporting vector maps with many attributes. For example, exporting a copy of streets_wake from the NC dataset, in a mapset that uses sqlite as its database driver, takes 2.5 s instead of 12.5 s.

The current v.out.ogr issues an SQL SELECT statement for each exported feature. The new version instead issues a single SELECT statement at the beginning that retrieves all attributes ordered by category value. Vector features are then also traversed in order of category value, so the attributes for each feature can be fetched directly from the result of that initial query.

This PR introduces the new method as an alternative module to v.out.ogr, named v.out.ograttr. Alternatively, another name could be chosen, or v.out.ogr could be replaced.

metzm added the labels enhancement, vector and C Nov 22, 2024
metzm added this to the 8.5.0 milestone Nov 22, 2024
metzm requested a review from benducke November 22, 2024 17:03
github-actions bot added the labels Python, HTML, module, docs and tests Nov 22, 2024
metzm changed the title from "DRAFT: v.out.ogr: faster export with many attributes" to "v.out.ogr: faster export with many attributes" Nov 22, 2024
metzm requested a review from nilason November 22, 2024 17:24
@metzm
Contributor Author

metzm commented Nov 22, 2024

With dbf as the database driver, the export time for streets_wake drops from 2 min 22.4 s to 2.5 s.

@petrasovaa
Contributor

Perhaps an obvious question, but why is this a separate tool from v.out.ogr? Why not incorporate it there?

@metzm
Contributor Author

metzm commented Nov 22, 2024

Because keeping the complete attribute table, sorted by category value, in memory might substantially increase RAM consumption (depending on the database driver). Therefore I am undecided whether this should be a new module or an improvement to v.out.ogr. Opinions welcome! You would opt to update v.out.ogr, right?

@petrasovaa
Contributor

Exporting many attributes seems like a very common use case, so making it faster, even at the cost of more RAM, sounds good to me.

@ecodiv
Contributor

ecodiv commented Nov 23, 2024

Would it be possible to add this as an option to v.out.ogr, or to make it the default and keep the old method as an option? That way, an alternative would remain if RAM becomes a limitation.

@metzm
Contributor Author

metzm commented Nov 23, 2024

Thanks for the positive feedback! Regarding memory, RAM consumption when exporting streets_wake increases by 0.5% with sqlite and 1.5% with dbf. In practice, this should have no adverse effect.

I have included the new, faster method directly in v.out.ogr. The old, slower method can be selected with the new -o flag. Since v.out.ogr already has quite a few flags, I struggled to come up with a flag letter that makes sense and is not already in use. Suggestions welcome!

@echoix
Member

echoix commented Nov 23, 2024

Since it would only be used in special cases (if ever), when the new method doesn't work, couldn't you use a full-word flag instead of a single letter?

@echoix
Member

echoix commented Nov 23, 2024

Once the adaptation is finished, the v.out.ograttr files won't be needed in this PR anymore, right?

@metzm
Contributor Author

metzm commented Nov 23, 2024

The -o flag has been replaced with a new option, method, which accepts the values slow and fast. The default is the new, fast method.

4 participants