i3DB results #23
Comments
The setting for the quantitative results in the paper was slightly different from the configuration provided in the repo. In particular, we used 3 second sequences ( wrt the gap between VPoser-t and HuMoR: the global body joint error is not a great indicator of the key differences between these two methods, since it averages over all body joints in all frames. HuMoR is most helpful when there are heavy occlusions or noise, but the global joint error metric is dominated by joints that are visible and not too noisy, even in i3DB. The difference is more obvious when joint errors are measured for body parts that are often occluded, like the legs. You can also see the qualitative difference in the supplemental comparisons on the webpage.
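To make the point about the metric concrete, here is a minimal sketch of a mean joint error computed over all joints versus over a chosen subset only. This is not the evaluation code from the repo: the function name `mean_joint_error`, the joint indices, and the toy data are all hypothetical and only illustrate how well-estimated joints can dilute the error of a few occluded ones.

```python
import math

def mean_joint_error(pred, gt, joint_ids=None):
    """Mean Euclidean distance between predicted and ground-truth joints.

    pred, gt: lists of frames; each frame is a list of (x, y, z) joints.
    joint_ids: optional list of joint indices to average over
               (hypothetical indices; None means all joints).
    """
    total, count = 0.0, 0
    for pred_frame, gt_frame in zip(pred, gt):
        ids = range(len(gt_frame)) if joint_ids is None else joint_ids
        for j in ids:
            total += math.dist(pred_frame[j], gt_frame[j])
            count += 1
    if count == 0:
        raise ValueError("no joints to average over")
    return total / count

# Toy data (made up): 2 frames, 4 joints. Joints 0-1 stand in for visible
# upper-body joints, joints 2-3 for frequently occluded leg joints.
gt = [[(0.0, 0.0, 0.0)] * 4 for _ in range(2)]
pred = [[(0.1, 0.0, 0.0), (0.1, 0.0, 0.0),
         (0.5, 0.0, 0.0), (0.5, 0.0, 0.0)] for _ in range(2)]

global_err = mean_joint_error(pred, gt)               # all joints
leg_err = mean_joint_error(pred, gt, joint_ids=[2, 3])  # "leg" joints only
print(global_err, leg_err)
```

In this toy case the error averaged over all joints is noticeably smaller than the error on the "leg" subset, which is the effect described above: the more visible joints there are, the less the global number reflects the occluded ones.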
Hello, thanks a lot for your answer! By the way, I have a question about the rollout function. Thank you!
To start stage 3, we have to represent the output of stage 2 (a sequence of SMPL poses) within the VAE, i.e. as an initial pose and a sequence of latent vectors. To do this, we run the VAE encoder on all pairs of frames to get a latent z for each pair. When we then roll out the sequence using this latent sequence (i.e. the stage3_init_result), there are naturally some reconstruction errors that tend to propagate as the sequence gets longer. For long sequences of 10-20 sec the optimization will be quite difficult: the initialization will be worse, as you suggest, but it is also a much larger problem that will take longer and have more local minima (since we must optimize another latent z for every added timestep). This is why we have the option to split up long videos into short sequences of 2-3 sec.
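As a rough illustration of the splitting idea, the sketch below computes frame ranges for short windows over a long video. It is not the repo's implementation: the function name `split_into_windows`, its parameters, and the 30 fps frame rate in the example are assumptions made for illustration only.

```python
def split_into_windows(num_frames, fps, window_sec=3.0, overlap_frames=0):
    """Return (start, end) frame ranges, end exclusive, covering the video.

    Hypothetical helper: each range is at most window_sec long, so each
    sub-sequence can be initialized and optimized on its own.
    """
    window = max(1, int(round(window_sec * fps)))
    if not 0 <= overlap_frames < window:
        raise ValueError("overlap_frames must be in [0, window)")
    step = window - overlap_frames
    windows = []
    start = 0
    while start < num_frames:
        end = min(start + window, num_frames)
        windows.append((start, end))
        if end == num_frames:
            break  # the last window already reaches the end of the video
        start += step
    return windows

# Example: a 20 sec video at an assumed 30 fps, split into 3 sec windows.
windows = split_into_windows(num_frames=600, fps=30, window_sec=3.0)

# One latent z per pair of consecutive frames, so a window of T frames
# has T - 1 latents to optimize instead of 599 for the whole video.
latents_per_window = [end - start - 1 for start, end in windows]
print(windows)
print(latents_per_window)
```

Each window then has far fewer latent vectors to optimize than the full video, and rollout errors in the initialization only accumulate over a few seconds rather than over the whole sequence.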
OK, I see!
Hello, thank you for this great work!
I have a question about the reproducibility of the results on the i3DB dataset: when I run your code, the final global joint error is 33.5 cm for HuMoR and 34.3 cm for VPoser-t (versus 28.15 and 31.59, respectively, in the paper).
Am I missing something or were your testing settings any different from those in the code?
By the way, is it expected to get only about a 1 cm gap between VPoser-t and HuMoR, given that the CVAE prior seems to be crucial for good predictions?
Thank you!