Training from scratch: Repeated and mangled words #59

Open
maruker opened this issue Jul 5, 2019 · 0 comments
maruker commented Jul 5, 2019

I am trying to use this repository to train a language model with an additional input.
My data looks like this:

┌─────────┬─────┬────┬───┐
│side info│start│The │cat│
└─────────┴─────┴────┴───┘

The labels look like this:

┌────┬───┬─────┐
│The │cat│meows│
└────┴───┴─────┘
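
For reference, this is roughly how one training pair is assembled (a simplified sketch with placeholder token ids; my real preprocessing lives in SortedSentenceDataset):

import torch

START_ID = 1                   # placeholder id for the start token
sentence_ids = [11, 12, 13]    # "The cat meows" as placeholder token ids
side_info = torch.zeros(300)   # placeholder for the averaged embedding injected as side information

# Input positions:   side_info  start  The   cat
# Target positions:      -      The    cat   meows
# i.e. every position after the side info predicts the next word.
input_ids = torch.tensor([START_ID] + sentence_ids[:-1])  # start, The, cat
target_ids = torch.tensor(sentence_ids)                   # The, cat, meows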

Since my objective is quite different from the original training script, I implemented the training loop from scratch. However, I noticed that the model takes much longer than a simple LSTM to become somewhat decent, and the output is still not fully coherent language even after 15 epochs on 2 million sentences. I am getting outputs that look like this:

Gold label:
In most cases , accurate results can only be achieved after a laborious and expensive trial and error process .

Output:
only most accurate cases can be achieved after a laborious error and process results In trial and expensive suit.

Currently I am using a small model with 4 layers and 2 attention heads per layer.

I randomly initialized the position encodings and multiplied them by 0.1 to match the variance of my word embeddings.
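
Concretely, that initialization looks roughly like this (a minimal sketch; the real code folds this into the model, and 300 is just a placeholder embedding size):

import torch

n_positions, n_embd = 120, 300  # max_len and a placeholder embedding size
pos_emb = torch.nn.Embedding(n_positions, n_embd)
torch.nn.init.normal_(pos_emb.weight, mean=0.0, std=1.0)
with torch.no_grad():
    pos_emb.weight.mul_(0.1)  # scale down to roughly match the word embedding variance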

Any ideas what I could have missed?

Here is some of my code:

import torch
from torch.utils.data import DataLoader
# OpenAIAdam and DEFAULT_CONFIG come from this repository;
# load_embeddings, SortedSentenceDataset, load_model, do_epoch, eval and
# generate are my own helpers (omitted here).

batch_size = 32
n_epochs = 100
max_len = 120  # maximum sequence length

# Load pretrained word embeddings and build the training / validation datasets.
embeddings, emb_weights = load_embeddings(data_path + 'de.en.fr.ka.tok.60000.shuf.vec', max_len)
train_dataset = SortedSentenceDataset(data_path + 'train.txt', 200000, max_len, embeddings, 'avg', device)
train_sampler = train_dataset.get_sampler(batch_size)
train_loader = DataLoader(train_dataset, batch_size=1, sampler=train_sampler)
dev_dataset = SortedSentenceDataset(data_path + 'valid.txt', 1000, max_len, embeddings, 'avg', device)
dev_sampler = dev_dataset.get_sampler(batch_size)
dev_loader = DataLoader(dev_dataset, batch_size=1, sampler=dev_sampler)

args = DEFAULT_CONFIG
args.n_embd = emb_weights.size(1)
# Constraint: embedding size % number of heads = 0
args.n_head = 2
args.n_layer = 4
model = load_model(args, emb_weights)

model.to(device)

criterion = torch.nn.CrossEntropyLoss()

optimizer = OpenAIAdam(model.parameters(),
                       lr=6.25e-3,
                       schedule='warmup_linear',
                       warmup=0.02,
                       t_total=n_epochs * len(train_dataset) * 20,
                       b1=0.9,
                       b2=0.999,
                       e=1e-8,
                       l2=0.01,
                       vector_l2=True,
                       max_grad_norm=1)

# Train, keeping the checkpoint with the best validation loss.
best = 1000
for epoch in range(n_epochs):
    do_epoch(train_loader)
    val_loss = eval(dev_loader)
    print('Validation loss: {}'.format(val_loss))
    if val_loss < best:
        best = val_loss
        print('Saving model')
        torch.save(model.state_dict(),
                   "context-at-each-layer-checkpoint-{}k{}e4b.pt".format(len(train_dataset) // 1000, n_epochs))
    # Print a sample generation after every epoch.
    print(' '.join(generate(train_dataset, max_len, embeddings)))