Separated layers:
During research, we extract features from intermediate layers and also inject other things, such as random features or gradients, between layers. Among the models implemented in this repo, many have bunched layers. For example, in the ResNet models, many Bottleneck blocks are bunched together in the make_layer function. This makes research difficult: we first have to edit the network structure, undo all the 'bunching', and serialize everything before we can use the network for research. Doing so not only takes a lot of time but also takes away the ability to use pretrained weights, since the state dictionaries no longer match, so we would have to write porting code for the weights as well. It would be great if the network structures could be simplified; a hook-based workaround we use today is sketched below.
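To illustrate the friction, here is a rough sketch (assuming the timm resnet50 and its torchvision-style layer3 stage; the hook names are just for illustration) of how we currently have to reach inside a bunched stage with forward hooks just to read per-block features:

```python
import torch
import timm

# Hypothetical workaround: grab the output of every Bottleneck block inside the
# bunched 'layer3' stage via forward hooks, since the blocks are not exposed as
# flat, serialized top-level modules.
model = timm.create_model('resnet50', pretrained=True)
model.eval()

features = {}

def make_hook(name):
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook

# layer3 is an nn.Sequential of Bottleneck blocks built by make_layer;
# we have to walk into it to reach the individual blocks.
for idx, block in enumerate(model.layer3):
    block.register_forward_hook(make_hook(f'layer3.{idx}'))

with torch.no_grad():
    _ = model(torch.randn(1, 3, 224, 224))

print(sorted(features.keys()))  # e.g. ['layer3.0', 'layer3.1', ...]
```

If the stages were exposed as a flat sequence of blocks instead, the same thing would be a plain indexing operation and injecting modules between blocks would not require rewriting the model or porting weights.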
DeepSpeed / Fairscale integration:
As is rightly pointed out in the latest update, the NF-Nets are memory hungry, and training goes OOM on most modern GPUs even at low batch sizes. This trend is just the beginning: in the coming future, more and more models will go OOM even at very small batch sizes. A transparent integration (e.g. a --zero-optimizer fairscale/deepspeed flag) would be an awesome addition.
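As a rough sketch of what such an integration could do under the hood (not a proposal for the exact flag semantics, just the fairscale ZeRO-style wrapping I have in mind; assumes torch.distributed is already initialized, e.g. via torchrun):

```python
import torch
from fairscale.optim.oss import OSS
from fairscale.nn.data_parallel import ShardedDataParallel as ShardedDDP

def wrap_with_zero(model, lr=0.1, momentum=0.9, weight_decay=1e-4):
    # OSS shards the optimizer state across ranks (ZeRO-style), so each GPU
    # only holds its slice of the SGD buffers.
    optimizer = OSS(
        params=model.parameters(),
        optim=torch.optim.SGD,
        lr=lr,
        momentum=momentum,
        weight_decay=weight_decay,
    )
    # ShardedDDP reduces each gradient only to the rank that owns its shard,
    # instead of all-reducing full gradients everywhere.
    model = ShardedDDP(model, optimizer)
    return model, optimizer
```

The rest of the training loop stays unchanged, which is what makes a "transparent" flag feasible.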
Nvidia DALI loader:
Nvidia DALI is an extremely lightweight and fast loader. It is very useful for researchers who don't own a high-grade CPU, where bottlenecks in the data-loading pipeline are imminent. It also helps in scenarios where we run multiple projects on a single node, splitting it across different devices.
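For reference, a GPU-decoded DALI classification pipeline for an ImageNet-style folder is roughly this small (the path, batch size, and image size below are placeholders):

```python
from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn
import nvidia.dali.types as types
from nvidia.dali.plugin.pytorch import DALIClassificationIterator

@pipeline_def
def train_pipeline(data_dir):
    # Read JPEGs + labels from a class-per-subfolder layout, decode on GPU.
    jpegs, labels = fn.readers.file(file_root=data_dir, random_shuffle=True, name="Reader")
    images = fn.decoders.image(jpegs, device="mixed")
    images = fn.random_resized_crop(images, size=224)
    images = fn.crop_mirror_normalize(
        images,
        dtype=types.FLOAT,
        output_layout="CHW",
        mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
        std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
        mirror=fn.random.coin_flip(),
    )
    return images, labels

pipe = train_pipeline(batch_size=64, num_threads=4, device_id=0,
                      data_dir="/path/to/imagenet/train")
pipe.build()
loader = DALIClassificationIterator(pipe, reader_name="Reader")

for batch in loader:
    images, labels = batch[0]["data"], batch[0]["label"]
    # ... training step ...
```

Exposing something like this behind an option in the existing loader factory would let CPU-bound users opt in without touching the rest of the training script.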