Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add MapillaryClient to download images from a latitude and longitude #12

Closed

Conversation

dragonejt
Copy link
Contributor

@dragonejt dragonejt commented Feb 15, 2024

Changes

Add a MapillaryClient to call the Mapillary Python SDK to get some images from a latitude or longitude, by following these steps:

  1. Get some GeoJSON data of images near a latitude or longitude. Filter for only panoramic images.
  2. Get the Mapillary image id for each image from the GeoJSON
  3. Get the mapillary image url from the image id
  4. Download the image to a specified path from the url

Steps 3 and 4 execute concurrently through Python multiprocessing.

Testing

Running mapillary_client.py's dunder method, which does the steps above for a selected longitude and latitude, succeeds in downloading images.

@dragonejt dragonejt added the enhancement New feature or request label Feb 15, 2024
@dragonejt dragonejt requested a review from danbjoseph February 15, 2024 01:42
@dragonejt dragonejt linked an issue Feb 15, 2024 that may be closed by this pull request
@danbjoseph danbjoseph requested review from jayqi and ayoakin February 15, 2024 01:56
@danbjoseph
Copy link
Member

Awesome! Thanks for working on this!

Can you briefly describe how to test this? E.g. how does one provide the API token, etc?

Maybe we can chat as a group at tomorrow's project night how this bit interacts with the rest of the toolchain. That is, does the code that loops through the collection of geographic points from the previous step go into this file or somewhere else, etc?

@dragonejt
Copy link
Contributor Author

Yeah, currently the API token is gotten from the environment variables. You could either set it manually with an export, or save it in a .env file as with this PR, I am adding the python-dotenv library which imports environment variables from .env.

I tested it just by running python src/mapillary_client.py, it will run the code block under if __name__ == __main__ (the dunder method), and it will download the images to the directory that your terminal is running in. It is also importable for when we want to integrate it into the other parts of the code, but the integration has not been done yet.

@ayoakin
Copy link
Contributor

ayoakin commented Feb 15, 2024

Thanks for working on this @dragonejt

I am looking into this. So the module (creates_points.py) outputs a .zip file. Could you please add some documentation how a user would work with this file in your module (mapillary_client.py)? I think the outputs are clear but it doesn't hurt to add what the output is and where it would be stored.

Again thanks so much for all your hard work on this.

cc: @danbjoseph

@dragonejt
Copy link
Contributor Author

So as of right now the Mapillary client and create_points are not integrated at all. In order to use the client, you would likely import it into wherever it is needed, instantiate it, and call it with the latitude and longitude as is done in the dunder method. I have not read through create_points to figure out how to integrate it, but integration should probably be done by importing the Mapillary client in create_points.

return pool.map(self.download_image, image_ids)

def download_image(self, image_id: int) -> Path:
mly.set_access_token(self.access_token)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you authenticate already in __init__, you shouldn't need to do it again, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I am using multiprocessing for concurrency, the access token is not set when the image downloads are run concurrently.

Comment on lines +54 to +56
image_content = get(image_url).content
with open(filepath, "wb") as image_file:
image_file.write(image_content)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to use requests.raise_for_status in case of responses with error codes, like 404 or 500 or whatever. Otherwise, we'd end up writing the content of a 404 Not Found HTML page to the image.

https://requests.readthedocs.io/en/latest/user/quickstart/#response-status-codes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this was a WIP so I was focused on just downloading the images.

Comment on lines +13 to +15
self.log = logging.getLogger(self.__class__.__name__)
self.log.addHandler(logging.StreamHandler())
self.log.setLevel(log_level)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a strongly held view, but I personally like loguru a lot as a logging library. It has a lot of nice defaults and lets you avoid needing to set up boilerplate.

https://github.com/Delgan/loguru

Also, I've more commonly seen loggers initialized at the module level, and the level controlled with global log level settings. Attaching the logger to the client class feels unnecessary to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am more familiar with Java's Slf4J, which attaches the logger at the class level to provide the context class and method names at each log. Evidently, this did not happen with the self.__class__.__name__, so I am open to suggestions.



class MapillaryClient:
def __init__(self, access_token: str, log_level: int = logging.INFO, basepath: str = getcwd()) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're using a convention of saving data to data/raw so consider having a default base directory like data/raw/mapillary/

https://github.com/AmericanRedCross/street-view-green-view?tab=readme-ov-file#project-organization

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have implemented the option to set a basepath, so this should be possible by modifying the default basepath



if __name__ == "__main__":
client = MapillaryClient(getenv("MAPILLARY_API_KEY"))
Copy link
Contributor

@jayqi jayqi Feb 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to have a .env.example file that contains

MAPILLARY_API_KEY=somefakeapikey

to help document that this is needed. This is a common convention that I've seen.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would things like the default basepath go in that file as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is possible. Although maybe it should be renamed to image_path or something since the scripts will be run separately.

@banjtheman
Copy link

Was able to run the code and download LOTS of images from. Some notes

  • You need to export the token from Client Token under https://www.mapillary.com/dashboard/developers
  • You can make an app and just use the GitHub for the website and redirect URL, it doesn't matter
  • Must use Python 3.9 or lower due to Mapillary client

@dragonejt
Copy link
Contributor Author

You need to export the token from Client Token under https://www.mapillary.com/dashboard/developers

I'm considering renaming this MAPILLARY_CLIENT_TOKEN to more accurately reflect what it is called on the Mapillary developer portal, should make it easier for an operator to find the token and run the program

@danbjoseph
Copy link
Member

@dragonejt do you have a clear path forward to continue work on this? or do you need more group input?

@jayqi
Copy link
Contributor

jayqi commented Feb 21, 2024

I haven't gotten response on the issue I opened about the mapillary package's pinned dependencies. mapillary/mapillary-python-sdk#154

I'm wondering if it would be better to just directly make REST API calls to Mapillary using requests, rather than using their Python SDK.

@dragonejt
Copy link
Contributor Author

Hey sorry I haven't been able to work on this recently. I think I'll reimplement the client with the API then.

@dragonejt dragonejt closed this Feb 28, 2024
@dragonejt dragonejt deleted the 7-use-mapillary-api-to-retrieve-image-for-each-point branch February 28, 2024 01:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

use Mapillary API to retrieve image for each point
5 participants