Hello LBANN team,

I would like to evaluate LBANN for strong scalability, as described in the LBANN publication "The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs with Hybrid Parallelism" (arXiv '20).

However, I cannot reproduce the scalability of the CosmoFlow benchmark. The paper states that this result was obtained with spatial-parallel I/O, but I cannot find the corresponding option in LBANN.

Could you help me reproduce the strong scalability of LBANN on NERSC Perlmutter? My questions are:

i) How do I enable spatial-parallel I/O? (Does this correspond to the "distconv" option?)

ii) Could you share the detailed training parameters (batch size and training options for CosmoFlow)?

Thank you for your help.
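For context, here is how I am currently trying to enable distconv through the Python frontend. This is only my best guess from the documentation: I am assuming that setting the `parallel_strategy` attribute on a layer (e.g. `{'depth_groups': N}`) is what activates the spatial decomposition, and the exact parameter names may differ across LBANN versions. Please correct me if this is not the intended mechanism.

```python
# My attempt at spatial parallelism via distconv (assumption: the
# parallel_strategy layer attribute controls the spatial decomposition;
# parameter names are copied from the docs and may be version-dependent).
import lbann

images = lbann.Input(data_field='samples')

# Hypothetical 3D convolution whose depth dimension is split across
# 4 ranks; the rest of the model would follow the CosmoFlow network.
conv = lbann.Convolution(
    images,
    num_dims=3,
    out_channels=16,
    kernel_size=3,
    stride=1,
    padding=1,
    has_bias=True,
    parallel_strategy={'depth_groups': 4},
)
```

I built LBANN with DistConv support enabled, but I do not see any I/O-related effect from this, which is why I suspect the spatial-parallel I/O from the paper is a separate option.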