
Don't skip training test #751

Open
constantinpape opened this issue Oct 19, 2024 · 17 comments

@constantinpape (Contributor)

The training test is very slow since yesterday. I am skipping it for now, see:
https://github.com/computational-cell-analytics/micro-sam/blob/master/test/test_training.py#L13-L14

But we should figure out why it's slower and reactivate it so that we notice if some change affects training.

I am not sure what the reason for this slowdown is. Nothing has changed in the training code, it seems to happen irrespective of the PyTorch version (so it is not specific to PyTorch 2.5), and everything still works normally for me locally.

@DavidMed13

Hi, I'm David Medina, from México. I'm trying to train my own model, but the training is not moving forward. I don't know if this comment is related to the issue I'm having. By the way, great work with micro-sam!

[Screenshot: 2024-10-24, 10:10:54 p.m.]

@anwai98 (Contributor) commented Oct 25, 2024

Hi @DavidMed13,

Thanks for your interest in micro-sam.

Can you run the following code in a cell in your notebook and share the output with us?

from micro_sam.util import _get_default_device
print(_get_default_device())
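For context, helpers like this typically prefer a CUDA GPU, then Apple's MPS backend, and fall back to the CPU. Below is a hypothetical sketch of such a helper, not micro-sam's actual implementation; the `torch` import is guarded so the snippet also runs where PyTorch is absent:

```python
def pick_default_device() -> str:
    """Return the name of the best available compute device.

    Hypothetical sketch: prefer CUDA, then Apple's MPS backend,
    then fall back to the CPU. Not micro-sam's actual code.
    """
    try:
        import torch  # guarded: fall back to "cpu" if torch is missing
        if torch.cuda.is_available():
            return "cuda"
        mps = getattr(torch.backends, "mps", None)
        if mps is not None and mps.is_available():
            return "mps"
    except ImportError:
        pass
    return "cpu"

print(pick_default_device())
```

On an Apple-silicon MacBook with PyTorch installed this would typically print `mps`, which matters for the discussion below.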

@DavidMed13

Sure!
[Screenshot: 2024-10-25, 1:35:06 p.m.]

@constantinpape (Contributor, Author)

Hi @DavidMed13, your issue is likely due to the fact that training on the MPS device is very slow. You can try to train on the CPU by setting device="cpu". But please note that training on the CPU will still be quite slow: you will likely have to wait several hours, and you will also need sufficient main memory (ideally >=32 GB).

This is why we recommend using a GPU for training. If you don't have access to a GPU you can use cloud resources for this, see https://github.com/computational-cell-analytics/micro-sam/blob/master/notebooks/sam_finetuning.ipynb for details.

@DavidMed13

Hello, thank you for the information. I have tried putting it on the CPU, but it still won't advance; I don't know if I'm doing something wrong.

If I use the same code from the notebook, I get this error:
[Screenshot: 2024-11-08, 1:07:35 p.m.]

But if I delete "rois" it works, though I don't know if that modification is the problem, because I guess it shouldn't take so much time; it is a small training run with only 5 images with labels.
[Screenshot: 2024-11-08, 1:06:13 p.m.]

I have tried to replicate this in Google Colab and Kaggle, but I always get errors importing the packages.

I really want to create this fine-tuned model, because it is for microglia. I would really appreciate it if you could help me; I feel like I have tried everything, haha.

Thank you in advance :D

@anwai98 (Contributor) commented Nov 8, 2024

Hi @DavidMed13,

I think we fixed this issue in our latest release (where the dataloader accepts all supported arguments).

To make sure of this, could you run the following command in your terminal and share the output with us?

python -c "import micro_sam; print(micro_sam.__version__)"
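To interpret the printed version, dotted version strings can be compared as tuples of integers. The sketch below is generic illustration only; the threshold `(1, 1, 0)` is a placeholder, not the actual release that contains the fix (the thread does not state one):

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version string like '1.1.2' into a comparable tuple.

    Simplified sketch: pre-release suffixes (e.g. 'rc1') are not handled.
    """
    parts = []
    for piece in version.split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

# Placeholder threshold -- check the release notes for the real minimum.
MINIMUM = (1, 1, 0)
print(parse_version("1.1.2") >= MINIMUM)  # → True: 1.1.2 is at least 1.1.0
```

In practice you would pass `micro_sam.__version__` instead of the literal string.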

@DavidMed13

Sure :)
[Screenshot: 2024-11-08, 2:49:38 p.m.]

@anwai98 (Contributor) commented Nov 8, 2024

Okay, looks like you are using the latest micro-sam already. Hmm, that's strange.

I'll try to reproduce the issue you mentioned above.

Meanwhile, could you confirm two things for us: how did you install micro-sam, and are you using the latest version of the finetuning notebook?

@DavidMed13

The installation was via mamba in an environment, and yes, I'm using the latest fine-tuning notebook :)

@anwai98 (Contributor) commented Nov 8, 2024

Hi @DavidMed13,

Thanks for sharing the details.

Another request: could you run this and send me the outputs?

python -c "import inspect; from micro_sam.training.training import default_sam_loader; print(inspect.getsource(default_sam_loader))"

@DavidMed13

Of course

[Screenshot: 2024-11-08, 3:19:12 p.m.]

And thank you a lot @anwai98

@anwai98 (Contributor) commented Nov 8, 2024

Ah yeah, I see the issue now. It seems we recently updated this part.

Could you install micro-sam from source and try again? (See the suggestion below.)

Since you already have micro-sam installed, it is fairly easy to install the package from source:

  • Clone our repo
  • Enter the repo and install micro-sam in development mode
mamba activate <INSTALLED_ENVIRONMENT_NAME>  # make sure to activate the environment where micro-sam is already installed
git clone https://github.com/computational-cell-analytics/micro-sam
cd micro-sam
pip install -e .
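After the editable install, one way to confirm that Python now resolves the package from the cloned repo rather than the previously installed copy is to check where the module would be imported from. This is a generic, stdlib-only check; `micro_sam` and the repo path in the commented example are just the assumed names from the steps above:

```python
import importlib.util


def resolves_inside(package: str, directory: str) -> bool:
    """Return True if `package` would be imported from a file under `directory`.

    Useful after `pip install -e .` to confirm that the editable (source)
    checkout is the copy Python will actually load.
    """
    spec = importlib.util.find_spec(package)
    return (
        spec is not None
        and spec.origin is not None
        and spec.origin.startswith(directory)
    )


# Hypothetical usage with the names assumed from the steps above:
# print(resolves_inside("micro_sam", "/path/to/micro-sam"))
```

If this returns False after the editable install, the wrong environment is probably active.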

@DavidMed13

Okay, done! Should I try the notebook again?

[Screenshot: 2024-11-08, 3:31:15 p.m.]

@anwai98 (Contributor) commented Nov 8, 2024

Yes, let's try training the model again and see if the error is (hopefully) gone!

EDIT: We should now try with the rois parameter in the default_sam_loader.

@DavidMed13

Okay, it worked!
Now I don't know how much time I should wait, since I'm on my MacBook. Like 4 hrs?
Yes, I used the rois hehe :)
[Screenshot: 2024-11-08, 3:39:08 p.m.]

And now I'm here:
[Screenshot: 2024-11-08, 3:39:42 p.m.]

@anwai98 (Contributor) commented Nov 8, 2024

Okay, that's great to see that the error is gone.

Regarding the training runtime: you should see the progress bar moving forward (i.e. processing a few iterations) within a couple of minutes, and the overall training would probably take a couple of hours. If it is slower than you would like, I can suggest a) using Kaggle (it provides reasonable GPU resources free of charge) or b) using a compute cluster, in case you have access to one. Either would take the training load off your laptop.

@DavidMed13 commented Nov 8, 2024 via email
