
Questions about model trained on CIFAR10 #6

Open
v-1024 opened this issue Apr 5, 2023 · 5 comments

@v-1024

v-1024 commented Apr 5, 2023

Hi author,

Thank you for your outstanding work! I recently tried to reproduce it. I trained for 500 epochs on the CIFAR10 dataset and evaluated the checkpoint with the script 'eval_ckpt_cifar10.sh' that you provided, but I encountered some problems during testing.

I used the KNN score and the Mahalanobis score as the OOD detection scores; the metrics are as follows (a sketch of how I compute the KNN score follows the tables):

knn:

           FPR95  AUROC   AUPR
SVHN       4.86   99.23   99.23
places365  25.2   95.36   95.74
iSUN       21.38  96.41   97.39
dtd        17.04  97.29   98.51
LSUN       4.12   99.27   99.34
AVG        14.52  97.51   98.04

Mahalanobis:

           FPR95  AUROC   AUPR
SVHN       96.21  65.56   70.06
places365  94.14  59.71   61.74
iSUN       92.36  57.75   63.97
dtd        96.61  44.15   59.86
LSUN       91.0   63.12   59.14
AVG        94.06  58.06   62.95
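
For reference, here is a minimal sketch of how I compute the KNN score: the negated distance to the k-th nearest training feature on normalized embeddings (k=50 is my own choice, not necessarily the repo default):

import torch

def knn_score(train_feats, test_feats, k=50):
    """OOD score: negated distance to the k-th nearest training feature.
    Both inputs are assumed to be L2-normalized (N, D) tensors."""
    dists = torch.cdist(test_feats, train_feats)              # (N_test, N_train)
    kth_dist = dists.topk(k, dim=1, largest=False).values[:, -1]
    return -kth_dist                                          # larger = more ID-like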

As you can see, with the Mahalanobis score the FPR95 is close to 100, and the results differ substantially from those reported in Appendix D and Table 6. I am very confused by this result. While looking for the cause, I first tested the model's ID classification accuracy on the CIFAR10 test set and got a surprising result: the accuracy is only 5.41%. Here is the main part of my accuracy test code:

import torch
import torch.nn.functional as F
from torchvision import datasets, transforms
from tqdm import tqdm

def accuracy(predictions, labels):
    """Return (number of correct predictions, batch size)."""
    pred = torch.max(torch.softmax(predictions, dim=1), dim=1)[1]
    rights = pred.eq(labels.view_as(pred)).sum()
    return rights, len(labels)

# load the CIFAR10 test set (same normalization as training)
normalize = transforms.Normalize(mean=[x/255.0 for x in [125.3, 123.0, 113.9]],
                                 std=[x/255.0 for x in [63.0, 62.1, 66.7]])
transform_test = transforms.Compose([transforms.ToTensor(), normalize])
test_loader = torch.utils.data.DataLoader(
            datasets.CIFAR10(args.id_loc, train=False, transform=transform_test),
            batch_size=args.batch_size, shuffle=False)

# load model parameters
device = torch.device('cuda:{}'.format(args.gpu) if torch.cuda.is_available() else 'cpu')
pretrained_dict = torch.load(args.ckpt, map_location='cpu')['state_dict']
net = set_model(args)
net.load_state_dict(pretrained_dict)
net = net.to(device)  # move the model to the same device as the data
net.eval()

val_rights = []
with torch.no_grad():
    for (data, target) in tqdm(test_loader):
        data = data.to(device)
        target = target.to(device)

        # classify on the L2-normalized penultimate features
        penultimate = net.encoder(data).squeeze()
        features = F.normalize(penultimate, dim=1)
        out = net.fc(features)
        # accumulate per-batch accuracy counts
        v_right = accuracy(out, target)
        val_rights.append(v_right)

val_r = (sum([tup[0] for tup in val_rights]), sum([tup[1] for tup in val_rights]))

print('acc: {:.2f}%'.format(100. * val_r[0].cpu().numpy() / val_r[1]))

While debugging the program above, I found that the model's predictions are concentrated in classes 3 and 4, which is clearly abnormal. I don't know what causes this, and I would appreciate your help.
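
For reference, I counted the predictions per class roughly like this (reusing the variables from the script above):

# histogram of predicted classes over the whole CIFAR10 test set
all_preds = []
with torch.no_grad():
    for data, _ in test_loader:
        feats = F.normalize(net.encoder(data.to(device)).squeeze(), dim=1)
        all_preds.append(net.fc(feats).argmax(dim=1).cpu())
print(torch.bincount(torch.cat(all_preds), minlength=10))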

Thank you!

@alvinmingsf
Collaborator

Thanks for bringing this up! How did you evaluate the ID accuracy? Is it obtained by a linear probe as in SupCon (https://github.com/HobbitLong/SupContrast/blob/master/main_linear.py)? For the Mahalanobis score, is the covariance matrix ill-conditioned? If you can provide a checkpoint, I can help take a look.
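
As a quick check, something along these lines would reveal an ill-conditioned estimate (a sketch; feats stands for your stacked (N, D) training features, not a variable in the repo):

import torch

# feats: hypothetical (N, D) matrix of the training features used for Mahalanobis
cov = torch.cov(feats.t())            # torch.cov expects variables in rows
print('condition number: {:.3e}'.format(torch.linalg.cond(cov).item()))
# if the condition number is huge, a small ridge usually stabilizes the inverse
cov_reg = cov + 1e-6 * torch.eye(cov.shape[0])
prec = torch.linalg.inv(cov_reg)      # precision matrix for the Mahalanobis score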

@v-1024
Author

v-1024 commented Apr 6, 2023 via email

@v-1024
Author

v-1024 commented Apr 6, 2023

Thank you for your reply!

For the ID accuracy, I will re-test using the code you pointed to. For the Mahalanobis score, since my local torch version does not support torch.cov(), I compute the covariance matrix with the following code:

def cov_matrix(x):
    """
    Compute the covariance matrix of a tensor x of shape
    (num_variables, num_observations), following the torch.cov
    convention (variables in rows, observations in columns).
    """
    x_mean = torch.mean(x, dim=1, keepdim=True)  # per-variable mean
    x_centered = x - x_mean
    # unbiased estimate: divide by (num_observations - 1)
    cov = torch.matmul(x_centered, x_centered.t()) / (x.shape[1] - 1)
    return cov
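
Like torch.cov, this treats rows as variables and columns as observations, so an (N, D) feature matrix has to be transposed before calling it. A quick sanity check against np.cov (my own test, not from the repo):

import numpy as np
import torch

x = torch.randn(4, 100)  # 4 variables, 100 observations
assert torch.allclose(cov_matrix(x),
                      torch.from_numpy(np.cov(x.numpy())).float(),
                      atol=1e-4)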

In addition, I am happy to provide the checkpoint; I have sent it to your email. Thank you for taking the time to look into it.

Thank you!

@emannix

emannix commented Nov 3, 2023

I'm also struggling to reproduce the results in the paper for CIFAR-10 with this code base. I'm getting a similar AUROC (96.89) but a larger FPR95 (19.43). Was there quite a bit of noise in the FPR95 results for CIFAR-10?

@Xiaoyan-Zhou

For the ID accuracy, should we train a linear probe based on https://github.com/HobbitLong/SupContrast/blob/master/main_linear.py?
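
My current understanding is that it freezes the encoder and trains only a linear head on its features, roughly like the sketch below (net, train_loader, device, feat_dim, and num_epochs are placeholders); is that right?

import torch
import torch.nn as nn

# rough sketch of a linear probe in the spirit of SupCon's main_linear.py:
# the encoder is frozen and only the linear classifier is trained
classifier = nn.Linear(feat_dim, 10).to(device)  # feat_dim: encoder output width
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.1, momentum=0.9)
criterion = nn.CrossEntropyLoss()

net.eval()  # keep the encoder frozen
for epoch in range(num_epochs):
    for data, target in train_loader:
        data, target = data.to(device), target.to(device)
        with torch.no_grad():
            feats = net.encoder(data).squeeze()
        loss = criterion(classifier(feats), target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()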
