If I am not wrong, the proposed adapter contains 183M parameters, whereas the ViT-B encoder has approximately 63M parameters. How can you claim that your adapter is more efficient than fine-tuning the whole encoder itself?
Firstly, fine-tuning the entire encoder would degrade the original ViT's capabilities, so we opted for adapter fine-tuning instead. Secondly, adapter fine-tuning does not reduce efficiency to an intolerable level; for instance, the FPS remains acceptable. Lastly, the adapter layer updates its parameters only during the first iteration of each batch; subsequent iterations do not update them, which maintains training efficiency. If you wish to reduce the number of parameters further, you can increase the down-sampling rate, for example to 0.75.
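To make the effect of the down-sampling rate concrete, here is a rough sketch of how the parameter count of a generic bottleneck adapter might scale. It is not the adapter from this repository. The function name `bottleneck_adapter_params` is hypothetical, and it assumes a single down-projection followed by an up-projection, with the bottleneck width taken as `dim * (1 - rate)` so that a higher rate means fewer parameters. The actual adapter may define the rate differently.

```python
def bottleneck_adapter_params(dim: int, rate: float, bias: bool = True) -> int:
    """Parameter count of a hypothetical down-project/up-project adapter.

    Assumption: the bottleneck width is dim * (1 - rate), so a larger
    rate gives a narrower bottleneck. The real adapter may differ.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    hidden = max(1, round(dim * (1.0 - rate)))
    # Linear layer dim -> hidden: weight matrix plus optional bias vector.
    down = dim * hidden + (hidden if bias else 0)
    # Linear layer hidden -> dim: weight matrix plus optional bias vector.
    up = hidden * dim + (dim if bias else 0)
    return down + up


# 768 is the ViT-B embedding width; the rates are illustrative only.
for rate in (0.25, 0.5, 0.75):
    print(rate, bottleneck_adapter_params(768, rate))
```

Under these assumptions the count falls roughly linearly as the rate rises, which matches the suggestion above that raising the rate to 0.75 shrinks the adapter.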