We already obtain the semantic token ids from HubertKmeans, yet the semantic embeddings are computed by a randomly initialized embedding layer in SemanticTransformer. Why not use the cluster centroids of the pre-trained HuBERT k-means as the embeddings?
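A minimal sketch of what the question proposes, in plain PyTorch rather than the repository's own API: copy the fitted k-means centroids into the semantic token embedding instead of leaving it randomly initialized. The names (`cluster_centers_`-style centroid matrix, `semantic_token_emb`), dimensions, and the projection layer are assumptions for illustration only.

```python
import torch
import torch.nn as nn

num_clusters = 500   # number of k-means clusters / semantic tokens (assumed)
centroid_dim = 768   # HuBERT feature dimension (assumed)
model_dim = 1024     # SemanticTransformer model dimension (assumed)

# centroids from the fitted k-means, shape (num_clusters, centroid_dim);
# a random tensor stands in here for the real cluster centers
centroids = torch.randn(num_clusters, centroid_dim)

# project centroids to the transformer dimension if the sizes differ
proj = nn.Linear(centroid_dim, model_dim, bias=False)
with torch.no_grad():
    init_weight = proj(centroids)

# build the embedding from the (projected) centroids; keep it trainable
semantic_token_emb = nn.Embedding.from_pretrained(init_weight, freeze=False)

token_ids = torch.randint(0, num_clusters, (2, 100))
semantic_embeds = semantic_token_emb(token_ids)  # (2, 100, model_dim)
```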
The idea of the attention mechanism in the transformer is to capture relationships between token ids. The semantic embeddings are randomly initialized, but they are trained along with the rest of the model and learn to capture those relationships between tokens during training.
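A small illustration of that point in generic PyTorch (not the repository's training loop): the randomly initialized embedding is an ordinary learnable parameter, so its rows receive gradients from the transformer's loss and are updated like any other weight. The sizes and the dummy loss below are placeholders.

```python
import torch
import torch.nn as nn

emb = nn.Embedding(500, 1024)  # randomly initialized semantic token embedding
opt = torch.optim.Adam(emb.parameters(), lr=1e-3)

token_ids = torch.randint(0, 500, (2, 100))
before = emb.weight.clone()

# dummy loss standing in for the transformer's language-modeling loss
loss = emb(token_ids).pow(2).mean()
loss.backward()
opt.step()

# only the rows for tokens that appeared in the batch have changed
changed = (emb.weight != before).any(dim=-1)
print(changed.sum().item(), "embedding rows updated")
```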