Loss and accuracy don't change when trying to run the GG experiment on a single custom dataset #29

Open
appledora opened this issue May 16, 2024 · 0 comments


appledora commented May 16, 2024

I load my custom dataloaders for a particular dataset from a pickle file, like below:

train_loader = dataloader_dict[TASK_NAME]["train"]
val_loader = dataloader_dict[TASK_NAME]["val"]
test_loader = dataloader_dict[TASK_NAME]["test"]
print(f"Train size: {len(train_loader.dataset)}, Val size: {len(val_loader.dataset)}, Test size: {len(test_loader.dataset)}")
best_acc1 = [0.0 for _ in range(num_tasks+1)]
curr_acc1 = [0.0 for _ in range(num_tasks+1)]
adapt_acc1 = [0.0 for _ in range(num_tasks+1)]
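For completeness, dataloader_dict itself is just unpickled from disk, roughly like this (the path is a placeholder for my local file):

import pickle

# placeholder path; the real pickle holds {task_name: {"train"/"val"/"test": DataLoader}}
with open("dataloaders.pkl", "rb") as f:
    dataloader_dict = pickle.load(f)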

I have slightly modified resnet.py to accept a num_class property during initialization, so my training setup looks like this:

import torch
import torch.nn as nn
from torch.optim.lr_scheduler import CosineAnnealingLR

num_tasks = 1
task_idx = 1
criterion = nn.CrossEntropyLoss()
model = utils.get_model("ResNet34", NUM_CLASSES)
CONFIG.device = device
CONFIG.output_size = NUM_CLASSES
model = model.to(device)
print(device)
# point every module at the task whose mask should be used
model.apply(lambda x: setattr(x, "task", task_idx))
# collect only the mask/score parameters; the backbone weights stay fixed
params = []
param_count = 0
for name, param in model.named_parameters():
    if not param.requires_grad:
        continue
    param_count += param.numel()
    split = name.split(".")
    if split[-1] in ["scores", "s", "t"]:
        params.append(param)
lr = 0.1
optimizer = torch.optim.Adam(params, lr=lr, weight_decay=0.0001)
train_epochs = 250
scheduler = CosineAnnealingLR(optimizer, train_epochs)
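As a quick sanity check (not from the repo, just something I run locally), I confirm the optimizer actually received mask/score parameters:

# verify that the name filter picked up the score/mask tensors
assert len(params) > 0, "no score/mask parameters found - check the name filter"
print(f"Optimizing {len(params)} mask tensors ({sum(p.numel() for p in params)} values) "
      f"out of {param_count} trainable values in total")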

Here are my training and eval functions:

import tqdm

def train(model, writer, train_loader, optimizer, criterion, epoch, task_idx, data_loader=None):
    model.zero_grad()
    model.train()

    num_correct = 0
    total_seen = 0
    for batch_idx, (data, target) in tqdm.tqdm(enumerate(train_loader), desc="TRAIN"):
        optimizer.zero_grad()
        data = data.to(device)
        target = target.to(device)
        output = model(data)
        loss = criterion(output, target)
        # track running train accuracy
        predictions = output.data.max(1, keepdim=True)[1]
        num_correct += predictions.eq(target.data.view_as(predictions)).sum()
        total_seen += target.size(0)
        loss.backward()
        optimizer.step()

        if batch_idx % 10 == 0:
            print(
                f"Train Epoch: {epoch} [{batch_idx * len(data)}/{len(train_loader.dataset)} "
                f"({100. * batch_idx / len(train_loader):.0f}%)]\tLoss: {loss.item():.6f}"
            )
    print(f"Train Epoch: {epoch} Acc@1 {num_correct / total_seen:.4f}")

@torch.no_grad()
def evaluate(model, val_loader, epoch):
    model.eval()
    num_correct = 0
    total_seen = 0
    for batch_idx, (batch, labels) in tqdm.tqdm(enumerate(val_loader), desc="EVAL"):
        batch = batch.to(device)
        labels = labels.to(device)
        logits = model(batch)
        predictions = logits.argmax(dim=-1)
        num_correct += (predictions == labels).float().sum()
        total_seen += logits.size(0)

    print(f"Val Perf after {epoch + 1} epochs Acc@1 {(num_correct / total_seen):0.4f}")
    return num_correct / total_seen
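The outer loop driving these two functions is essentially the standard one (simplified here; logging and checkpointing omitted):

# simplified driver loop over the objects defined above
for epoch in range(train_epochs):
    train(model, None, train_loader, optimizer, criterion, epoch, task_idx)
    acc1 = evaluate(model, val_loader, epoch)
    scheduler.step()
    curr_acc1[task_idx] = float(acc1)
    best_acc1[task_idx] = max(best_acc1[task_idx], float(acc1))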

I am using the config values from the rn50-supsup-adam.yaml file, and the args are set up accordingly.
However, even after 40-50 epochs there is no consistent change in loss or accuracy. What am I doing wrong here?
Additionally, for this single dataset I am using the following module types:

    conv_type="MaskConv",
    bn_type="NonAffineBN",
    conv_init="signed_constant",
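These end up on the shared config before the model is built; this is just how I set them locally (attribute names mirror the yaml keys), not necessarily how the repo expects them:

CONFIG.conv_type = "MaskConv"
CONFIG.bn_type = "NonAffineBN"
CONFIG.conv_init = "signed_constant"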