Don't skip training test #751
Comments
Hi @DavidMed13, thanks for your interest in micro-sam! Can you run the following code in a cell in your notebook and share with us what it returns as outputs?

from micro_sam.util import _get_default_device
print(_get_default_device())
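(For context, a helper like this typically picks the first available accelerator. The sketch below is not micro-sam's actual implementation, just an illustration of the kind of check involved, using plain PyTorch:)

import torch

def detect_device():
    # Illustrative heuristic: prefer CUDA, then Apple's MPS, then fall back to CPU.
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"

print(detect_device())  # e.g. "mps" on an Apple Silicon laptop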
Hi @DavidMed13, your issue is likely due to the fact that training on MPS devices is very slow. You can try to train on the CPU by setting the device to "cpu", but CPU training will also be slow. This is why we recommend using a GPU for training. If you don't have access to a GPU you can use cloud resources for this, see https://github.com/computational-cell-analytics/micro-sam/blob/master/notebooks/sam_finetuning.ipynb for details.
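(As an aside, a minimal sketch of forcing CPU training; this assumes train_sam accepts a device keyword, as in recent micro-sam versions, and train_loader / val_loader stand in for your own dataloaders:)

from micro_sam.training import train_sam

# Assumption: the `device` keyword exists in your installed micro_sam version;
# check with inspect.signature(train_sam) if unsure.
train_sam(
    name="sam_cpu_run",          # hypothetical checkpoint name
    model_type="vit_b",
    train_loader=train_loader,   # your training dataloader
    val_loader=val_loader,       # your validation dataloader
    device="cpu",                # force CPU instead of MPS
)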
Hi @DavidMed13, I think we fixed this issue in our latest release (where the dataloader accepts all supported arguments). To make sure of this, could you run the following script in your terminal and share with us the outputs?

python -c "import micro_sam; print(micro_sam.__version__)"
Okay, looks like you are using the latest version. I'll try to reproduce the issue you mentioned above. Meanwhile, could you confirm two things for us: a) how did you install micro-sam, and b) are you using the latest version of the fine-tuning notebook?
The installation was via mamba in an environment, and yes, I'm using the latest fine-tuning notebook :)
Hi @DavidMed13, thanks for sharing the details. Another request: could you run this and send me the outputs?

python -c "import inspect; from micro_sam.training.training import default_sam_loader; print(inspect.getsource(default_sam_loader))"
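(For context: inspect.getsource returns the source code of a function as it is actually installed on your machine, which is an easy way to check which version of it you have. A self-contained example with a standard-library function:)

import inspect
import json

# Prints the source of json.dumps as found in your installed Python.
print(inspect.getsource(json.dumps))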
Of course. And thank you a lot, @anwai98!
Ah yeah, I see the issue now. Seems like we recently updated this part. Can I request you to install micro-sam from source? Since you already have an environment with micro-sam installed, you can do:
mamba activate <INSTALLED_ENVIRONMENT_NAME>  # make sure to activate the environment where micro-sam is already installed
git clone https://github.com/computational-cell-analytics/micro-sam
cd micro-sam
pip install -e .
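(After the editable install, a quick sanity check is to print the version and the loader's signature to confirm it now accepts all supported arguments; this only reuses the module path already mentioned above:)

python -c "import micro_sam; print(micro_sam.__version__)"
python -c "import inspect; from micro_sam.training.training import default_sam_loader; print(inspect.signature(default_sam_loader))"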
Yes, let's try training the model again and see if the error is (hopefully) gone! EDIT: We should now try with the new installation.
Okay, that's great to see that the error is gone. Regarding the training runtime, you should see the progress bar moving forward (i.e. processing a few iterations) within a couple of minutes, and the overall training would probably take a couple of hours. If it's slower than you would like, I can suggest a) using Kaggle (it provides reasonable GPU resources free of charge) or b) using a compute cluster, in case you have access to one. This would take the training overhead off your laptop.
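(On Kaggle or a cluster, a quick way to confirm that a GPU is actually visible to PyTorch before starting a long training run:)

import torch

# True means training can run on the GPU; also print the device name for sanity.
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))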
Hey, thank you very much for all the help. I will try it on Kaggle; my lab has a server, so I will ask for permission. Thank you a lot :)
The training test has been very slow since yesterday. I am skipping it for now, see:
https://github.com/computational-cell-analytics/micro-sam/blob/master/test/test_training.py#L13-L14
But we should figure out why it's slower and reactivate it so that we notice if some change affects training.
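(For reference, the skip at the lines linked above is just a standard test-framework skip. A minimal sketch of the pattern, assuming unittest as used in the test suite:)

import unittest

@unittest.skip("Training test is currently very slow; skipped until the slowdown is understood.")
class TestTraining(unittest.TestCase):
    def test_training(self):
        ...  # the actual training test body lives in the linked file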
I am not sure what the reason for this slowdown is. Nothing has changed in the training code, and it seems to happen irrespective of the PyTorch version, so it is not specific to PyTorch 2.5. Everything still works normally for me locally.
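(One way to narrow down such a slowdown is to time a handful of training iterations in isolation, outside the test harness. In this sketch, model and loader are hypothetical stand-ins for whatever the training test constructs:)

import time
import torch

def time_iterations(model, loader, n_iters=5, device="cpu"):
    # Average the wall-clock time of a few forward/backward passes to see
    # whether the slowdown is in the training loop itself or in test setup.
    model.to(device)
    data_iter = iter(loader)
    start = time.perf_counter()
    for _ in range(n_iters):
        x, y = next(data_iter)
        loss = torch.nn.functional.mse_loss(model(x.to(device)), y.to(device).float())
        loss.backward()
    return (time.perf_counter() - start) / n_iters

# Example (with real objects): print(f"{time_iterations(model, loader):.2f} s/iteration")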