I got Nan values during the training stage-1 #135

Closed
sarperkilic opened this issue Jul 3, 2024 · 4 comments

Comments

@sarperkilic

Hi,

I started training stage-1.

In the first iteration everything is fine, but from the second iteration on each model outputs only NaN values. What could be the reason?

This gives me NaN:

face_emb = self.imageproj(face_emb)

This also gives me NaN:

self.reference_unet(
    ref_image_latents,
    ref_timesteps,
    encoder_hidden_states=face_emb,
    return_dict=False,
)

In the first iteration, each model works fine.
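
A minimal sketch of one way to narrow this down: check the outputs in call order and report the first one that contains NaN or inf. The helper `first_nonfinite` and the sample values are hypothetical and not part of the repository. It works on plain Python floats, so real tensors would have to be converted first, for example with `tensor.detach().float().flatten().tolist()`.

```python
import math


def first_nonfinite(named_outputs):
    """Return the name of the first output containing NaN or inf, else None.

    `named_outputs` is a list of (name, flat iterable of floats) pairs,
    listed in the order the modules are called in the training step.
    """
    for name, values in named_outputs:
        if any(not math.isfinite(v) for v in values):
            return name
    return None


# Hypothetical stand-in values for the second training step.
step_outputs = [
    ("ref_image_latents", [0.12, -0.5, 0.33]),
    ("face_emb after imageproj", [float("nan"), 0.1]),
    ("reference_unet output", [float("nan"), float("nan")]),
]

print(first_nonfinite(step_outputs))
```

If a module's input is finite but its output is not, the module's own parameters are worth checking the same way, since they may have become non-finite during the first optimizer step.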

@xumingw
Contributor

xumingw commented Jul 3, 2024

Could you test the pull request #133?

@sarperkilic
Author

I will test it now, but I also encounter this problem when I don't validate.

I changed the condition for entering the validation code like this, so the network is no longer validated after the first iteration:
if global_step % cfg.val.validation_steps == 0: # or global_step == 1:

In the first iteration the network produces results as expected, and in the second iteration it gives me NaN.
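
For reference, a small sketch of what the modified condition does. It assumes `global_step` starts at 1; the function `should_validate` and the value 500 for `validation_steps` are hypothetical and used only for illustration.

```python
def should_validate(global_step, validation_steps, validate_on_first_step=False):
    """Mirror the validation condition, with the first-step clause optional."""
    if global_step % validation_steps == 0:
        return True
    # Corresponds to the commented-out `or global_step == 1` clause.
    return validate_on_first_step and global_step == 1


validation_steps = 500  # hypothetical value, not taken from the config

with_first = [s for s in range(1, 1001)
              if should_validate(s, validation_steps, validate_on_first_step=True)]
without_first = [s for s in range(1, 1001)
                 if should_validate(s, validation_steps)]

print(with_first)     # [1, 500, 1000]
print(without_first)  # [500, 1000]
```

With the clause commented out, no validation runs between the first and second iterations, so under these assumptions a NaN at step 2 cannot be blamed on the validation pass.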

@sarperkilic
Author

It works now, thanks @xumingw.

@HMnZn

HMnZn commented Dec 28, 2024

Hello, may I ask how you solved this?
