Training with custom data gets stuck on the first iteration. #15
Comments
Yes, you should be able to see some logs there. Can you please check whether a folder for your logs is created in the maml directory? Also, would it be possible for you to share the code, perhaps on a fork or another branch here? What about the dataset?
A folder is actually created in the master folder, since I moved my script into the master folder before running it. Of course, I would be happy to share my code and dataset.
I have created a fork and uploaded my code. Here are the links to the dataset and the log file. Thank you again for your help. Dataset: https://drive.google.com/file/d/1l37mCGycof3qI58gvjfDJ5tCE_cISEGz/view?usp=sharing Log file: https://drive.google.com/file/d/1yMrQlwn9AqaVWvOBuBt-jyCYPPAEsn27/view?usp=sharing
Thanks! Please share the link to the fork with me. I will look into it soon; however, I am a little busy with other projects right now, so it might take a couple of days before I get back to you.
Here is the link to the repo: https://github.com/SamiurRahman1/MetaLearning-TF2.0
Can you please point me to the Python files you added?
Sorry, I'm not very familiar with GitHub in general. Here are the script links:
Hello, did you have some time to look at the code?
Hi, I am still waiting for a reply on this, if possible.
Hello, I have been a little busy. Unfortunately, I cannot check this from the Python files you shared on Google Drive, since it is hard to track changes. Please put everything on GitHub so that I can check out that particular branch and debug it. Thank you very much.
Hi, uploading files or creating a branch is disabled on your repo, for obvious reasons. I created a pull request with the files. Can you find them there? If not, could you please tell me exactly how I should upload them?
Sorry for my late reply; I have been busy. I looked at the code. It seems to me that your dataset has only 6 classes. Is that correct? In that case, do you want to do meta-learning on it, or do you just want to use it for testing? If you want to do meta-learning, you need different tasks, and with meta-batch-size=4 and n=5 you need at least 20 classes. However, I guess the program should check this before running and give an appropriate error message. Please let me know if this is the case. One thing you can try is to set meta-batch-size=1 and see whether the program still gets stuck. Thanks again for using this repo.
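The arithmetic behind the "at least 20 classes" figure, and the kind of up-front check suggested above, can be sketched as follows. This is a minimal sketch, not code from the repo: `min_classes_required` and `check_enough_classes` are hypothetical names, and the sketch assumes (as the comment implies) that the tasks in one meta-batch each draw their own, non-overlapping n classes.

```python
def min_classes_required(n_way: int, meta_batch_size: int) -> int:
    """Classes needed if every task in a meta-batch uses its own n_way classes (assumption)."""
    return n_way * meta_batch_size


def check_enough_classes(num_classes: int, n_way: int, meta_batch_size: int) -> None:
    """Fail fast with a clear message instead of letting task sampling hang."""
    needed = min_classes_required(n_way, meta_batch_size)
    if num_classes < needed:
        raise ValueError(
            f"{num_classes} classes available, but n={n_way} with "
            f"meta-batch-size={meta_batch_size} needs at least {needed}."
        )


print(min_classes_required(5, 4))  # 20
check_enough_classes(num_classes=6, n_way=5, meta_batch_size=1)  # passes: 6 >= 5
try:
    check_enough_classes(num_classes=6, n_way=5, meta_batch_size=4)
except ValueError as err:
    print(err)  # 6 classes available, but n=5 with meta-batch-size=4 needs at least 20.
```

Raising an error before training starts turns a silent hang into an immediate, readable failure.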
Hi, thanks for your reply. I'm trying to train a model with the dataset. I'll try your suggestion and get back to you.
Hi, I tried running the training with meta-batch-size=1, but unfortunately it still gets stuck. Is there anything else I need to change if I want to train the model with only 6 classes?
Okay, I see that in the dataset class there are 4 classes for training and 2 classes for validation. In this case, there is no way to generate 5-way tasks during training, because there are only 4 training classes. Can you please try using all 6 classes for training, and all 6 for validation and testing as well, just to see whether that is the problem? You can also set n=4 instead of 5 with meta-batch-size=1, but since your validation split has just two classes, I think you might run into the same problem.
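The per-split reasoning above can be checked mechanically: every split must hold at least n classes to form a single n-way task. The sketch below is hypothetical (`splits_too_small` is not part of the repo) and simply plugs in the 4 training / 2 validation classes mentioned in the comment.

```python
# Split sizes taken from the comment above: 4 training and 2 validation classes.
SPLITS = {"train": 4, "val": 2}


def splits_too_small(split_sizes: dict, n_way: int) -> list:
    """Return the names of splits that cannot supply n_way distinct classes for one task."""
    return [name for name, size in split_sizes.items() if size < n_way]


for n in (5, 4):
    print(n, splits_too_small(SPLITS, n))
# 5 ['train', 'val']
# 4 ['val']
```

With n=4 the training split becomes usable, but validation still cannot form a task, which matches the maintainer's expectation that the problem might persist.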
When I set
I do not think you should set num_train_classes and num_val_classes to 6, because that assumes you have at least 12 classes, with the remaining ones used for testing. Can you please make sure your function
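To see why setting both num_train_classes and num_val_classes to 6 implies at least 12 classes, here is a minimal sketch of a sequential split: the first num_train_classes go to training, the next num_val_classes to validation, and the rest to testing. This is an assumption for illustration; the repo's dataset class may split differently, and `split_classes` and the class names are hypothetical.

```python
def split_classes(class_names, num_train_classes, num_val_classes):
    """Assumed scheme: first block for train, next block for val, remainder for test."""
    needed = num_train_classes + num_val_classes
    if len(class_names) < needed:
        raise ValueError(
            f"num_train_classes + num_val_classes = {needed}, "
            f"but only {len(class_names)} classes are available."
        )
    train = class_names[:num_train_classes]
    val = class_names[num_train_classes:needed]
    test = class_names[needed:]  # whatever is left over goes to the test split
    return train, val, test


classes = [f"class_{i}" for i in range(6)]  # stand-in for the 6 custom classes
train, val, test = split_classes(classes, num_train_classes=4, num_val_classes=2)
print(len(train), len(val), len(test))  # 4 2 0

try:
    split_classes(classes, num_train_classes=6, num_val_classes=6)
except ValueError as err:
    print(err)  # num_train_classes + num_val_classes = 12, but only 6 classes are available.
```

Under this scheme, the 4/2 split also leaves no classes for testing at all.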
I made the changes to the function as you suggested, set
Can you run maml_omniglot? |
You mean with the Omniglot dataset?
Hmm, interestingly, I get the same error when I run maml_omniglot.py.
It seems to be related to the TF version. What is your TensorFlow version?
My TF version is 2.3.1.
I encounter the same problem when the TF version is 2.3.1. |
The code currently works with TF version 2.2.0-rc2. I would be glad to receive a pull request updating it to a newer version, if you are interested.
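Until the code is updated, a version guard at the top of a training script could flag the mismatch before training silently stalls. This is a hypothetical sketch, not part of the repo: `check_tf_version` and `major_minor` are illustrative names, and comparing only major.minor is a simplification (the thread reports success specifically with 2.2.0-rc2 and the hang with 2.3.1).

```python
import warnings

# The thread reports the code works with TF 2.2.0-rc2 and gets stuck with 2.3.1.
TESTED_MAJOR_MINOR = (2, 2)


def major_minor(version: str) -> tuple:
    """Extract (major, minor) from strings such as '2.3.1' or '2.2.0-rc2'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def check_tf_version(version: str) -> bool:
    """Warn if the installed TensorFlow differs from the version reported to work."""
    ok = major_minor(version) == TESTED_MAJOR_MINOR
    if not ok:
        warnings.warn(
            f"TensorFlow {version} detected; this code is reported to work with 2.2.0-rc2.",
            RuntimeWarning,
        )
    return ok
```

In a training script this would be called as `check_tf_version(tf.__version__)` right after `import tensorflow as tf`. A warning rather than an exception keeps newer versions usable for anyone testing a port.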
When I try to train a model with my custom data, it gets stuck with the following output:
It stays like this until I interrupt the training. I can also see that the GPU is in use, but nothing else seems to happen. Is this normal? As far as I know, I should see some logs, such as accuracy, loss, epoch count, etc.