Issue w/ process terminating - CWW testing the pipeline #55

Open
trouille opened this issue Sep 2, 2020 · 4 comments

@trouille
Member

trouille commented Sep 2, 2020

Mason of the Chicago Wildlife Watch (CWW) team has been testing out https://subject-assistant.zooniverse.org/#/intro with their data.

His notes on this issue:

Start on hamlet at 8:17, upload the Spring 2019 subject set

Refresh on 8:24, not ready

Refresh on 8:33, not ready

Refresh on 9:07, complete. Follow link to go back to subject assistant and try to fetch; get an error saying that “ML task is still running, check again later”

Try to fetch on 9:26, same response (ML still running)

Try to fetch on 9:45, same response (ML still running)

Try to fetch on 10:30, terminated with this response.

(Screenshot attached: image001 (1))

@trouille
Member Author

trouille commented Sep 2, 2020

Note - Mason has tried this a few times. His internet has remained on the entire time.

@shaunanoordin
Member

shaunanoordin commented Sep 4, 2020

Thank you for reporting this - I think the issue lies with the Subject Assistant Proxy service.

Investigation

Issue: attempting to retrieve Mason's task at https://subject-assistant.zooniverse.org/#/tasks/ad68546e-9ec4-4bed-a023-b779cb7fc40f results in a "Request has been terminated" error.

At this point, we can determine that the misbehaviour is at the proxy service. Question is: why?

⚠️ SIDE NOTE this investigation is time-sensitive, as results on the Microsoft server have a limited lifespan before they're deleted (to save space, presumably.)

Further Observations and Debug Hypothesis

❗ (❓ ) My current hypothesis is that the proxy service isn't able to handle large requests very well.

This makes a certain amount of sense - the Subject Assistant has, up to this point, been mostly tested with limited curations, with fairly small Subject Set sizes in the range of hundreds of images. A "real" job using a full CWW Subject Set could number in the tens of thousands of images, meaning the theoretical server load wasn't calibrated for practical real-world values. 🤦

FAQ

Q: Wait, why use a Proxy service? If you can already pull the actual URL from Microsoft - https://cameratrap.blob.core.windows.net/async-api-zooniverse/ad68546e-9ec4-4bed-a023-b779cb7fc40f/ad68546e_detections_zooniverse-subject-assistant_20200902135513.json?sp=r&sr=b&sv=2019-02-02&se=2020-12-01T14%3A57%3A21Z&sig=GeNOKU10GQDw8Jm1XhqJdb8/nFx5TP0CiXdKVqSV8Rg%3D - why bother with a middle man?

A: JavaScript cross-domain request restrictions, unfortunately. Web browsers don't let code from *.zooniverse.org go willy-nilly downloading files from *.windows.net for safety/security reasons.
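To make the cross-domain point concrete, here is a minimal sketch of the comparison a browser makes when it decides whether a request is cross-origin. It is an illustration only, not the browser's or the Subject Assistant's actual code: the helper names `origin` and `is_cross_origin` are made up for this example. Two URLs share an origin only when scheme, host, and port all match; otherwise the browser stops page scripts from reading the response unless the other server opts in with CORS headers.

```python
from urllib.parse import urlsplit

# Default ports, so that "https://host" and "https://host:443" compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def is_cross_origin(page_url, resource_url):
    """True when the resource lives on a different origin than the page."""
    return origin(page_url) != origin(resource_url)


# The web app's page and the host that stores the ML results (both from this thread).
page = "https://subject-assistant.zooniverse.org/#/intro"
results_host = "https://cameratrap.blob.core.windows.net/"

print(is_cross_origin(page, results_host))
```

Because the hosts differ, the check reports a cross-origin request, which is why the web app asks a proxy on a Zooniverse-controlled server to fetch the file on its behalf.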

Status

We have an idea why things are going wrong - now to confirm it and find a solution.

@camallen
Contributor

camallen commented Sep 8, 2020

I'm happy to help here @shaunanoordin

When you are free, let's schedule a screen share to reproduce the failure and debug the subject assistant proxy server. I've seen a few crashes on this service, so it might be a simple resource limit (i.e. out of RAM).

@shaunanoordin
Member

@camallen You already figured out the problem and the solution - it WAS an issue of memory! PR #57 bumped up the available memory for the Proxy from 50Mi to 100Mi.

This means the example problem (link to Proxy Service) now correctly fetches the results (a 15.6MB JSON file) instead of dying with a 502 error.
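As a rough back-of-the-envelope check on those numbers, the sketch below compares the 15.6MB result file with the old and new memory limits. It rests on assumptions not stated in this thread: that "Mi" means mebibytes (1024 × 1024 bytes), that "15.6MB" means decimal megabytes, and that a proxy might hold several in-memory copies of the payload at once (for example raw bytes, a decoded string, and a parsed object). It also ignores the service's own baseline memory use, which is unknown here. The helper names are hypothetical.

```python
MIB = 1024 * 1024  # assumption: "Mi" means mebibytes


def mi_to_bytes(mi):
    """Convert a memory limit given in Mi to bytes."""
    return mi * MIB


def copies_that_fit(limit_bytes, payload_bytes):
    """How many full copies of the payload fit under the limit, ignoring all other memory use."""
    return limit_bytes // payload_bytes


payload = 15_600_000         # 15.6MB JSON file, assuming decimal megabytes
old_limit = mi_to_bytes(50)  # limit before PR #57
new_limit = mi_to_bytes(100) # limit after PR #57

print(old_limit, copies_that_fit(old_limit, payload))
print(new_limit, copies_that_fit(new_limit, payload))
```

Under these assumptions, only three full copies of the payload fit in the old limit even before counting the runtime itself, versus six in the new one, which is at least consistent with the out-of-memory explanation.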

@trouille please ask Mason to try https://subject-assistant.zooniverse.org/#/tasks/ad68546e-9ec4-4bed-a023-b779cb7fc40f once again, and please thank him for highlighting this issue.

I'm going to open a follow-up issue in this repo in case anyone else with a larger Subject Set encounters the same problem, so we know the issue lies in the available RAM.
