[BUG] Incomplete and risky bias statistics #4424

iProzd · 2024-11-26T06:58:28Z

Bug summary

During data statistics, frames are sampled from the dataset, and some atom types that exist in the dataset may not appear in any sampled frame. The bias statistics for those types are then incomplete, which can cause training problems, especially with the mixed-type data format.
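The following sketch shows why a missing type matters. It assumes, for illustration only, that the per-type energy bias is fitted by least squares of per-frame energies against per-type atom counts. The toy counts and energies are invented. When a type never appears in the sampled frames, its column is all zeros and the fit cannot determine its bias.

```python
import numpy as np

# Invented toy data: each row is a sampled frame, each column is the number
# of atoms of type 0, 1 and 2 in that frame. Type 2 exists in the full
# dataset (by assumption) but in none of the sampled frames.
sampled_counts = np.array([
    [2, 1, 0],
    [4, 2, 0],
    [1, 3, 0],
], dtype=float)

# Per-frame energies generated from assumed per-type energies; the value for
# type 2 never influences them because its count is always zero.
true_bias = np.array([-1.0, -2.0, -3.0])
energies = sampled_counts @ true_bias

# Least-squares fit of the per-type bias: energies ~= sampled_counts @ bias.
bias, _, rank, _ = np.linalg.lstsq(sampled_counts, energies, rcond=None)

print(rank)  # 2: the all-zero type-2 column leaves one direction unconstrained
print(bias)  # types 0 and 1 recover -1 and -2; type 2 gets the minimum-norm value (about 0), not -3
```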

The PyTorch DataLoader could be enhanced in two steps:

1. Count the number of atoms of each type in the dataset and cache a few frame indices for each type.
2. If the sampled frames lack any type that exists in the dataset, use the cached indices to add frames containing the missing types to the sample before computing the bias statistics.
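A minimal sketch of these two steps in plain Python, assuming each frame is represented only by the list of its atom type indices. `build_type_index` and `complete_sample` are hypothetical helpers, not part of the DeePMD-kit API, and the cache size `max_cached` is an arbitrary choice.

```python
from collections import defaultdict


def build_type_index(frames, ntypes, max_cached=5):
    """Step 1 (hypothetical helper): count atoms of each type in the whole
    dataset and cache up to ``max_cached`` frame indices containing that type."""
    atom_counts = [0] * ntypes
    cached = defaultdict(list)
    for idx, atom_types in enumerate(frames):
        for t in atom_types:
            atom_counts[t] += 1
        for t in set(atom_types):
            if len(cached[t]) < max_cached:
                cached[t].append(idx)
    return atom_counts, dict(cached)


def complete_sample(frames, sampled, atom_counts, cached):
    """Step 2 (hypothetical helper): for every type present in the dataset but
    absent from the sampled frames, append one cached frame containing it."""
    sampled = list(sampled)
    covered = {t for idx in sampled for t in frames[idx]}
    for t, count in enumerate(atom_counts):
        if count > 0 and t not in covered:
            extra = cached[t][0]
            sampled.append(extra)
            # The added frame may also cover other missing types.
            covered.update(frames[extra])
    return sampled


# Invented toy dataset: type 2 appears only in frame 2.
frames = [[0, 0, 1], [0, 1, 1], [0, 2], [1, 1]]
counts, cached = build_type_index(frames, ntypes=3)
sample = complete_sample(frames, [0, 1], counts, cached)
print(counts, sample)  # [4, 5, 1] [0, 1, 2]
```

Caching only a few indices per type keeps the extra memory small. Picking a random cached index instead of the first one would avoid always reusing the same frame.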

This ensures that the bias statistics cover every type present in the dataset.

DeePMD-kit Version

3.0.0

Backend and its version

Both PyTorch and TensorFlow

How did you download the software?

Built from source

Input Files, Running Commands, Error Log, etc.

See above

Steps to Reproduce

See above

Further Information, Files, and Links

No response

iProzd added the bug label Nov 26, 2024
