-
-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
initial s3 support (draft) #774
Conversation
Quality Gate failedFailed conditions 30.6% Coverage on New Code (required ≥ 50%) See analysis details on SonarCloud Catch issues before they fail your Quality Gate with our IDE extension SonarLint |
https://github.com/onthegomap/planetiler/actions/runs/7440500393 ℹ️ Base Logs c480b35
ℹ️ This Branch Logs e7f9f1b
|
Nice! I wonder if S3 should be a destination that we vary independently from the archive format...? For example an archive in mbtiles, pmtiles or individual files format can all end up in S3, as a file on disk, or some other destination. A naive pipeline using that approach would write to disk first then upload to s3 second, but a specialized s3-files implementation can skip writing to disk, and pmtiles could upload chunks as its writing them so maybe they might need specialized implementations anyway. |
Also I see the amazon s3 dependency adds ~20% to the bundle size. My understanding is it introduces its own threading model as well. Is there any lower level dependency we can use instead that can just help constructing the upload API requests then we make them from a worker pipeline or virtual threads? |
I think there are multiple things to clarify supported formatsI think the only format that is required to be supported is "files" (uploading maps/{z}/{x}/{y}.pbf). Neither mbtiles nor pmtiles(1) can be "stream uploaded" since you cannot seek while uploading to s3. And adding support to upload after it was written to disk - at least to me - does not make much sense. Support for CSV, JSON, proto would be nice but I would say not really required. (1) the pmtiles header can only be written after everything else was written, but you cannot seek (position=0) while uploading to s3. There may be some hack/workaround by doing a multi-part upload and leave the first part out for the header. approaches2 jarsThe s3-sdk is quite heavy, I agree. I am wondering, however, if it would make sense here to build 2 jars - one with s3 support and one without. s3-nio-spiAnother approach would be to use an s3-nio-spi which adds support for s3://.... paths. And things could be handled relatively transparently. https://github.com/awslabs/aws-java-nio-spi-for-s3 The big advantage here is also that we would get read support for all (except mmbtiles) - not super efficient but still. Users would need to provide that jar and then run planetiler like this:
The changes are minimal, and I have it implemented here: https://github.com/bbilger/planetiler/tree/s3-nio-spi Note: it's current implementation writes to disk before uploading but for "files" it's not a big issue (have some experimental branch to bypass that). planetiler-addons - FIFOThere's already some pretty powerful mechanism to do "whatever" by using CSV/JSON/proto. planetiler-addons - SPIWe could also introduce some SPI-mechanism to planetiler for file formats. And provide a SPI for s3 (either in that or a separate repo). native no SDKImplementing an s3 upload w/o the SDK directly in planetiler is an option as well, but it's quite some work (request-signing, ...). |
not sure why I forgot/overlooked that but https://github.com/minio/minio-java would probably the best compromise, to replace the aws sdk with - if we want to go such a route |
early draft => needs tests and some other adjustments but wanted to share early in case you have any concerns