
a question about meta-training strategy #45

Open
Sword-keeper opened this issue Jan 14, 2021 · 15 comments

@Sword-keeper

Hi, when I read your code, I noticed that your meta-training strategy has some differences from MAML. Could you tell me which meta-learning paper designed this strategy, or is it your own design? Also, what is the reason you chose this strategy?

@yaoyao-liu
Owner

Hi,

What do you mean by “training strategy”? Do you mean that we introduce a “pre-training” phase?

Best,
Yaoyao

@Sword-keeper
Author

I mean the meta-training phase. In MAML's outer loop, the loss used to update the model's parameters is the sum of all tasks' losses (100 training tasks), so in each outer-loop epoch the model's parameters are updated only once. However, in your PyTorch version, the model's parameters are updated with each task's loss separately, so in each outer-loop epoch they are updated 100 times (the number of training tasks). The picture below may explain this more clearly.

[attached image: diagram of the outer-loop update]

@yaoyao-liu
Owner

I think you have misunderstood MAML.

MAML doesn't use the sum of all tasks' losses to update the model in the outer loop. Our MTL uses a meta-training strategy similar to MAML's. Your figure doesn't show the strategy that MAML actually applies.

In MAML, they use the "meta-batch" strategy, i.e., the average loss over 4 tasks is used for one outer-loop update. In our method, we just set the meta-batch size to 1.
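
For readers following along, here is a minimal first-order sketch of the meta-batch idea. It is not the code of this repository; the task sampler, toy model, and hyperparameters below are made up. With `meta_batch_size = 4` the query losses of four tasks are averaged into one outer-loop update; setting it to 1 gives one outer-loop update per sampled task, as in MTL.

```python
# Minimal first-order sketch of the "meta-batch" idea; NOT the repository's code.
# `sample_task`, the toy linear model, and all hyperparameters are hypothetical.
import copy
import torch
import torch.nn as nn

def sample_task():
    """Hypothetical task sampler: returns (support_x, support_y, query_x, query_y)."""
    return (torch.randn(5, 8), torch.randint(0, 2, (5,)),
            torch.randn(15, 8), torch.randint(0, 2, (15,)))

model = nn.Linear(8, 2)                        # stands in for the meta-learned model
meta_opt = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
meta_batch_size = 4                            # MAML-style; MTL sets this to 1
inner_lr, inner_steps = 1e-2, 5

for outer_iter in range(10):                   # outer loop
    meta_opt.zero_grad()
    for _ in range(meta_batch_size):           # tasks in one meta-batch
        x_s, y_s, x_q, y_q = sample_task()
        # Inner loop: adapt a copy of the model on the support set.
        learner = copy.deepcopy(model)
        inner_opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)
        for _ in range(inner_steps):
            inner_opt.zero_grad()
            loss_fn(learner(x_s), y_s).backward()
            inner_opt.step()
        # Outer loss on the query set, averaged over the meta-batch.
        inner_opt.zero_grad()
        (loss_fn(learner(x_q), y_q) / meta_batch_size).backward()
        # First-order approximation: accumulate the adapted model's gradients
        # into the meta-model's gradients.
        for p, lp in zip(model.parameters(), learner.parameters()):
            p.grad = lp.grad.clone() if p.grad is None else p.grad + lp.grad
    meta_opt.step()                            # one outer-loop update per meta-batch
```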

@Sword-keeper
Author

Oh, I see. Thank you very much. Could you also tell me why you set the meta-batch size to 1? What is the meaning of the meta-batch?

@yaoyao-liu
Owner

If the meta-batch size is 4, then in one outer-loop iteration the model is updated with the average loss of 4 different tasks.
I set the meta-batch size to 1 because it is easier to implement...

@Sword-keeper
Author

well... thank you~

@yaoyao-liu
Owner

No problem.

@yaoyao-liu
Owner

I think your figure is correct, but n is not 100. In the settings used by MAML, it is 4.

Besides, n is not the total number of tasks. In MAML, we can sample, e.g., 10,000 tasks in total, and the four tasks in one meta-batch are drawn from those 10,000 tasks.
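
As a toy illustration of this sampling (hypothetical numbers and names, not the repository's code): the task pool can be very large, while each meta-batch only contains a few tasks drawn from it.

```python
# Toy illustration of meta-batch sampling; numbers and names are hypothetical.
import random

num_total_tasks = 10000            # e.g., the pool of tasks that can be sampled
meta_batch_size = 4                # tasks consumed by one outer-loop update
task_pool = list(range(num_total_tasks))

for outer_iter in range(3):        # a few outer-loop iterations
    meta_batch = random.sample(task_pool, meta_batch_size)
    print(f"iteration {outer_iter}: meta-batch tasks {meta_batch}")
```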

@Sword-keeper
Author

Oh, you are right. I misunderstood this figure.

@LavieLuo

@Sword-keeper Hello, I agree with you, and I also thank the authors for their helpful reply.
I guess the main difference between MTL and MAML w.r.t. the “training strategy” is the setting of meta_batch_size, which is 4 in MAML and 1 in MTL. Besides, I guess "update 100 times" refers to the parameter update_batch_size ($k$ in your figure) in the MAML code, which is set to 5 while in MTL it is 100? I'm actually still puzzled about this (e.g., line 101 in meta-transfer-learning/pytorch/trainer/pre.py):
for _ in range(1, self.update_step):

@yaoyao-liu
Owner

Hi @LavieLuo,

Thanks for your interest in our work.
In MAML, they update all of the network parameters 5 times during base-learning.
In our MTL, we update only the FC layer, and we update it 100 times during base-learning.
As we update far fewer parameters than MAML does, we can afford to update them more times.
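
A minimal sketch of this idea (toy backbone and data, not the actual code in pre.py): the backbone is frozen and only the FC head receives gradients, so running 100 inner updates per task is cheap.

```python
# Sketch only: a stand-in backbone and toy data, not the repository's model.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(8, 16), nn.ReLU())   # stands in for the pre-trained feature extractor
fc = nn.Linear(16, 5)                                    # task-specific classifier head

support_x = torch.randn(25, 8)                           # toy 5-way 5-shot support set
support_y = torch.randint(0, 5, (25,))
loss_fn = nn.CrossEntropyLoss()

for p in backbone.parameters():                          # freeze the backbone
    p.requires_grad_(False)

with torch.no_grad():                                    # backbone is fixed, so features can be precomputed
    features = backbone(support_x)

inner_opt = torch.optim.SGD(fc.parameters(), lr=1e-2)
update_step = 100                                        # many cheap FC-only updates (MAML uses ~5 full updates)
for _ in range(update_step):
    inner_opt.zero_grad()
    loss_fn(fc(features), support_y).backward()
    inner_opt.step()
```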

If you have any further questions, please send me an email or add comments on this issue.

Best,
Yaoyao

@LavieLuo

@yaoyao-liu Woo, thank you for this prompt reply. Now I completely understand the motivation of this strategy. That's cool! :)

@yaoyao-liu
Owner

@LavieLuo In my experience, if the base-learner overfits the training samples of the target task, the performance won't drop. So I just update the FC layer as many times as I can, even letting it overfit.

@LavieLuo

@yaoyao-liu Yes, I agree! I remember some recent works showing that the overfitting of DNNs manifests as over-confidence in the predicted probabilities, which somehow doesn't degrade the accuracy. Also, I had forgotten that MTL only trains a part of the parameters; now I've figured it out. Thanks again!

@yaoyao-liu
Owner

@LavieLuo
No problem.
